Want better AI agents? Improve information retrieval, Databricks says

Greetings and welcome to Eye on AI. This issue covers… acquiring the team and technology from AI chip startup Groq… purchasing Manus AI… AI becoming more adept at self-improvement… but potential gaps in our understanding of the brain could hinder AGI development.

Happy New Year! Significant developments have occurred in AI since our last update just before Christmas Eve. We’ll bring you up to speed in the Eye on AI News section below.

As I’ve mentioned previously, 2025 was anticipated to be the year of AI agents, yet most organizations faced difficulties implementing them. By year-end, the majority remained in the pilot testing phase. I believe this will shift in the coming year, partly because technology providers are realizing that merely supplying AI models with agent capabilities is insufficient. They must assist clients in designing complete workflows around AI agents—either through direct consultation via forward-deployed engineers serving as “customer success” guides, or via software tools that simplify the process for customers to handle independently.

A critical component for optimizing these workflows is ensuring AI agents can access appropriate information. Since 2023, the conventional approach has been RAG, or retrieval-augmented generation. This method gives AI systems search capabilities to fetch relevant documents or data from internal corporate resources or the public web, enabling models to base responses on this retrieved information rather than solely on training data. Various search tools support RAG systems, with many organizations employing hybrid solutions that integrate vector databases (especially for unstructured content) alongside traditional keyword or Boolean search methods.
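To make the hybrid approach concrete, here is a minimal sketch of blending vector similarity with keyword matching into one ranking. The corpus, the term-frequency "embedding," and the 50/50 blend weight are all invented for illustration; a real system would use a trained embedding model and a proper vector database.

```python
from collections import Counter
import math

# Toy corpus standing in for an internal document store.
DOCS = {
    "d1": "quarterly revenue report for the enterprise analytics platform",
    "d2": "employee onboarding guide and HR policies",
    "d3": "enterprise revenue forecast and analytics dashboard notes",
}

def keyword_score(query: str, doc: str) -> float:
    # Keyword/Boolean-style match: fraction of query terms present in the doc.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query: str, alpha: float = 0.5) -> list[tuple[str, float]]:
    # Blend "vector" similarity and keyword overlap into a single score.
    q_vec = embed(query)
    scored = [
        (doc_id,
         alpha * cosine(q_vec, embed(text)) + (1 - alpha) * keyword_score(query, text))
        for doc_id, text in DOCS.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

results = hybrid_search("enterprise revenue analytics")
```

The blend weight `alpha` is the knob many hybrid systems expose: leaning toward keyword scores helps with exact identifiers, while leaning toward vector scores helps with paraphrased queries.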

However, RAG isn’t a universal solution, and basic RAG implementations often produce significant error rates. One issue is that AI models frequently have difficulty converting user prompts into effective search parameters. Additionally, even when searches execute properly, models may inadequately filter or process the retrieved data—sometimes due to diverse data formats, sometimes because users provide poor instructions, and occasionally because the AI models themselves are unreliable and disregard directions.

“However, AI agents typically fail not because they cannot reason about data, but because they don’t receive the right data from the start,” Bendersky, the research director at Databricks, tells me. Bendersky is a longtime veteran of Google, where he worked on Google Search and other Google products.

Databricks Unveils New Retrieval Architecture That Outperforms RAG

Today, Databricks (recognized for its data analytics platform) is launching a new architecture for retrieval-augmented AI agents named Instructed Retriever, which it claims addresses most of RAG’s limitations.

The system converts a user’s prompt and any persistent custom specifications (such as document recency or product review quality) into a multi-stage search strategy that targets structured and unstructured data—plus crucially, metadata—to deliver the right information to the AI model.

Much of this capability involves converting natural language prompts and search specifications into specialized search query syntax. “The key is in translating natural language effectively, which can be quite challenging, and developing a robust model for query translation,” Tang, Databricks’ CTO for neural networks, says. (Tang co-founded MosaicML, which Databricks acquired in 2023.)
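A rule-based toy can illustrate what query translation means, even though a production system like the one described here would use a model for this step. The field names (`published_days_ago`, `rating`) and filter syntax below are invented for the sketch.

```python
import re

def translate_query(prompt: str) -> dict:
    # Turn a natural-language prompt into free text plus explicit
    # metadata filters a search index could apply. Hypothetical schema.
    filters = []
    text = prompt

    # "last N days" -> recency filter on a metadata field.
    m = re.search(r"last (\d+) days", text)
    if m:
        filters.append(
            {"field": "published_days_ago", "op": "<=", "value": int(m.group(1))}
        )
        text = text.replace(m.group(0), "")

    # "at least N stars" -> minimum-rating filter.
    m = re.search(r"at least (\d+) stars?", text)
    if m:
        filters.append({"field": "rating", "op": ">=", "value": int(m.group(1))})
        text = text.replace(m.group(0), "")

    return {"text": " ".join(text.split()), "filters": filters}

plan = translate_query("jacket reviews from the last 30 days with at least 4 stars")
```

The hard part, per Tang, is doing this translation robustly for arbitrary phrasing, which is why Databricks trains a model for it rather than relying on hand-written rules like these.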

According to a series of benchmark tests developed by Databricks to mirror real-world enterprise scenarios—including instruction adherence, domain-specific searching, report creation, list generation, and searching complex PDF layouts—the company’s Instructed Retriever architecture achieved 70% greater accuracy compared to basic RAG methods. When deployed in multi-step agent workflows, it produced 30% better results than RAG-based processes while needing 8% fewer steps on average to reach a solution.

Enhancing Performance Even With Vague Instructions

The company also developed a new evaluation to assess how effectively the model handles imprecise queries. This test builds upon an existing Stanford University benchmark dataset known as StaRK (Semi-structured Retrieval Benchmark). Specifically, Databricks examined a subset of product search queries called StaRK-Amazon, expanding this dataset with supplementary examples. The focus was on searches containing implicit conditions. For example, the query “find a jacket from FooBrand that is best rated for cold weather” carries multiple hidden constraints: the item must be a jacket, it must come from FooBrand, and it must be the FooBrand jacket with the highest cold-weather rating. The evaluation also covered queries where users seek to exclude specific products or restrict results to items with recent reviews.

The core concept behind Instructed Retriever is its ability to transform these implicit conditions into explicit search parameters. Bendersky identifies this as the key breakthrough: the architecture can convert natural language queries into searches that effectively utilize metadata.
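Using the article’s jacket example, the explicit form of that query looks something like the following. The catalog, field names, and ratings are all invented; the point is that one natural-language sentence decomposes into two equality filters plus a ranking constraint over a metadata field.

```python
# Hypothetical product catalog with metadata fields.
CATALOG = [
    {"name": "Alpine Parka", "brand": "FooBrand", "category": "jacket",
     "cold_weather_rating": 4.8},
    {"name": "City Windbreaker", "brand": "FooBrand", "category": "jacket",
     "cold_weather_rating": 3.1},
    {"name": "Trail Boots", "brand": "FooBrand", "category": "boots",
     "cold_weather_rating": 4.9},
    {"name": "Glacier Shell", "brand": "BarBrand", "category": "jacket",
     "cold_weather_rating": 4.7},
]

def best_cold_weather_jacket(brand: str) -> dict:
    # Explicit version of "find a jacket from FooBrand that is best rated
    # for cold weather": filter on category and brand, then rank by the
    # cold-weather metadata field and keep the top result.
    candidates = [
        p for p in CATALOG
        if p["brand"] == brand and p["category"] == "jacket"
    ]
    return max(candidates, key=lambda p: p["cold_weather_rating"])

top = best_cold_weather_jacket("FooBrand")
```

Note that a plain similarity search could easily surface the boots (highest cold-weather rating overall) or the rival brand’s jacket; it is the metadata filters that make the implicit constraints binding.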

Databricks evaluated the Instructed Retriever architecture using OpenAI’s GPT-5 Nano and GPT-5.2, along with Anthropic’s Claude-4.5 Sonnet, plus a custom fine-tuned 4-billion-parameter model designed specifically for these queries, dubbed InstructedRetriever-4B. All models were tested against a conventional RAG architecture, showing 35% to 50% improvement in result accuracy. The InstructedRetriever-4B performed comparably to the larger frontier models from OpenAI and Anthropic while offering lower deployment costs.

As with any AI implementation, proper data placement and formatting remains essential for success. Bendersky notes that Instructed Retriever functions effectively provided an enterprise’s dataset includes a search index with metadata. (Databricks also provides tools to transform completely unstructured datasets into metadata-enriched formats.)

The company indicates that Instructed Retriever is currently available to beta customers through its Knowledge Assistant product within the Agent Bricks AI agent development platform, with general availability expected soon.

This represents just one of many innovations likely to emerge this year from AI agent vendors, potentially making 2026 the actual year of AI agents.

Now, here are additional AI updates.

Jeremy Kahn

@jeremyakahn