
A partially functional web browser built by human engineers would likely not generate much discussion. But when Michael Truell, CEO of the coding startup Cursor, announced last week that a swarm of AI agents, operating autonomously for a full week, had built a browser that he said “kind of works,” the story spread rapidly through the tech community, amassing over six million views.
What accounts for the excitement? There are two primary factors. First, AI systems have traditionally had very limited attention spans. When ChatGPT first emerged, models could stay focused on a task for only seconds. That duration expanded to minutes with improved models, and later to hours. The Cursor project is being touted as one of the first instances in which an AI system has managed a complex, open-ended software effort for a full week without any human direction.
Second, individual AI agents are typically confined to narrow, specific tasks. Coordinating hundreds of these agents on a large-scale project has, until now, seemed like a distant possibility. That is why Cursor tested the limits of autonomous coding on a project that might take a human team months, assembling an “orchestra” of AI agents functioning as a unit. The key question was whether an AI system could demonstrate sufficient persistence and collaborative ability to explore code, decompose tasks, debug its own work, and continue progressing for days without losing focus.
An AI agent ‘orchestra’
The researchers discovered that the answer was largely affirmative. Cursor’s experiment organized hundreds of agents into a structure resembling a software team, complete with “planners,” “workers,” and “judges” that coordinated across millions of lines of code. This points toward a future, as described by both Cursor and OpenAI, where AI doesn’t merely aid workers but assumes responsibility for entire projects. Such a shift would fundamentally transform how complex work is accomplished, starting with software development and eventually extending to other fields.
Experiments with AI swarms have been conducted for a few years. However, Cursor notes that current models are more intelligent and can maintain coherence for significantly longer periods. They can now be deployed at a much larger scale, supported by a custom layer that manages hundreds of agents and prevents them from falling into disarray.
Jonas Nelle, a Cursor engineer focused on long-running AI agents, said that because AI models are continually improving, engineers and researchers must reassess their expectations about the models’ capabilities every few months. Although he conceded that he “wouldn’t download it and delete Chrome today,” he said the browser project was “certainly better than anything models previously would have been able to do.”
Bill Chen, an OpenAI engineer who stress-tests and evaluates the real-world performance of the company’s models, emphasized that these long-running agents represent a significant frontier. He said the duration of a task and an AI system’s ability to complete it autonomously and coherently serve as a “very good indicator of how intelligent and how general a system is.” He described the Cursor project, which utilized OpenAI’s GPT-5.2, as “a direct result of us really continuously pushing forward the boundaries of model capabilities,” adding that even longer-duration tests are planned for the future.
AI agent swarms are not ready for business use
Nevertheless, these systems are not yet prepared for production use. Beyond being buggy and incomplete, they are expensive to run: operating swarms of agents for days or weeks incurs significant cost. Although costs have dropped dramatically in the past year, extended operations involving hundreds of AI agents can still accumulate high charges.
Security is another major concern. An autonomous system introduces worries about vulnerabilities, potential data leaks, and other risks, necessitating new layers of control and auditability.
However, Chen predicted a near future where a system like this could be ready “for broad consumption and at a not prohibitive cost.” He explained that progress has been steady, with critical advancements occurring at each stage. For the present, he noted, the enthusiasm stems from this being a tangible, practical demonstration of model capability, as opposed to “how this model performs on academic and public evaluations and benchmarks.”
This rapid development has astonished even seasoned AI watchers. Independent researcher Simon Willison recently noted that he had predicted that a largely AI-built web browser would emerge by 2029, and that its arrival wouldn’t be surprising. “Rolling a new web browser is one of the most complicated software projects I can imagine,” he wrote. Cursor appears to have fast-tracked that schedule. “I may have been off by three years,” Willison wrote. “I have to admit I’m very surprised to see something this capable emerge so quickly.”
This aligns with what OpenAI and others refer to as an “overhang”—the concept that the most advanced AI models possess capabilities far beyond what is currently deployed, and that the right mix of tools, product design, and cost reductions can suddenly make them viable at scale. Therefore, while tools like the Cursor browser are not yet ready for mainstream use, the direction of progress is unmistakable.
