Smart hierarchical chunking is the optimal data preparation for vector database embeddings
BERLIN, March 16, 2026 — POMA AI, a Berlin-based document intelligence company, has launched POMA-OfficeQA, an open-source benchmark demonstrating that POMA AI’s structure-aware document chunking cuts RAG retrieval costs by 77% compared with both naive text splitting and Unstructured.io’s element extraction.

Out of the box, POMA PrimeCut uses 77% fewer tokens than the compared chunking methods; with customized configuration, the reduction reaches 83%.
“Currently, every RAG system in production loses information before it’s even processed by the model,” stated Dr. Alexander Kihm, founder & CEO of POMA AI. “While the industry has focused on optimizing embeddings, rerankers, and prompt engineering, the ingestion layer is the actual source of most retrieval failures. This benchmark quantifies what practitioners have intuitively understood: structure-aware chunking is the fundamental element that enables everything downstream to function effectively.”
The benchmark, available on GitHub, evaluated three document chunking strategies for Retrieval-Augmented Generation (RAG) using identical embeddings, identical retrieval logic, and 20 table-lookup questions across 14 U.S. Treasury Bulletins totaling approximately 2,150 pages. The test measured each method’s ability to retrieve all of the evidence needed to answer factual questions correctly; the reported metric is the minimum token budget a retrieval system needs to reach 100% context recall, i.e., to have all evidence present in the retrieved context.
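For readers who want to reproduce the metric, here is a minimal sketch of how such a minimum-budget measurement can be computed. The function name and data shapes are illustrative assumptions, not POMA’s published benchmark code.

```python
# Minimal sketch of the minimum-token-budget metric described above.
# Function name and data shapes are illustrative assumptions, not the
# benchmark's actual code.

def min_token_budget(ranked_chunks, evidence_ids):
    """Smallest retrieved-token budget at which every ground-truth
    evidence chunk appears in the retrieved context (100% context recall).

    ranked_chunks: list of (chunk_id, token_count), best match first.
    evidence_ids:  set of chunk ids required to answer the question.
    """
    remaining = set(evidence_ids)
    budget = 0
    for chunk_id, tokens in ranked_chunks:
        budget += tokens
        remaining.discard(chunk_id)
        if not remaining:
            return budget  # all evidence retrieved at this budget
    return None  # evidence never fully retrieved
```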
The findings revealed that POMA’s hierarchical chunking, which preserves document structure including table headers, section hierarchy, and semantic connections between content elements, required 77% fewer tokens to achieve 100% context recall. The token counts were as follows:
- Baseline (naive chunking, 500-token chunks with 100-token overlap): 1.45 million tokens
- Unstructured.io (element extraction): 1.48 million tokens
- POMA AI (structure-aware): 0.34 million tokens
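The headline figure follows directly from these counts:

```python
# Token reduction relative to each comparison method
baseline, unstructured, poma = 1_450_000, 1_480_000, 340_000
print(f"{(baseline - poma) / baseline:.0%}")          # -> 77% vs. naive baseline
print(f"{(unstructured - poma) / unstructured:.0%}")  # -> 77% vs. Unstructured.io
```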
All methods employed OpenAI’s text-embedding-3-large model for embeddings and cosine similarity for retrieval ranking. Ground truth was established using exact chunk indices verified against the source documents, thereby eliminating false positives from coincidental numeric matches. Only questions answerable by all three methods were included to ensure a fair comparison. Questions that resulted in extraction failures for any method (such as OCR errors or missing values) were excluded.
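As an illustration of that shared setup, the sketch below embeds a placeholder corpus with text-embedding-3-large via the OpenAI Python SDK and ranks chunks by cosine similarity. The corpus, query, and variable names are assumptions, not the benchmark’s actual code.

```python
# Sketch of the retrieval setup shared by all three methods: embed chunks
# and query with text-embedding-3-large, then rank by cosine similarity.
# Corpus and query are placeholders, not the benchmark's actual data.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([d.embedding for d in resp.data])

chunks = ["...Treasury Bulletin chunk 1...", "...chunk 2..."]  # placeholder corpus
query = "What was the total public debt outstanding in Q3?"    # placeholder question

chunk_vecs = embed(chunks)
query_vec = embed([query])[0]

# Cosine similarity = dot product of L2-normalized vectors
def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(chunk_vecs) @ normalize(query_vec)
ranking = np.argsort(-scores)  # indices of best-matching chunks first
```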
“What impressed us about POMA was the engineering depth behind a seemingly straightforward insight,” commented Till Faida, co-founder of Adblock Plus and an investor and advisor to POMA AI. “They targeted the ingestion layer, the part of the pipeline that everyone assumes is already solved. This benchmark proves that it is not. A 77% reduction in tokens significantly alters the economics of operating RAG at enterprise scale. That represents the kind of structural advantage we seek.”
ABOUT POMA AI: POMA AI is a Berlin-based company specializing in document intelligence, developing infrastructure for enterprise RAG systems. Its core technology converts complex documents into semantically coherent chunks suitable for vector search and LLM processing. POMA’s API handles document processing in a single operation, outputting both granular chunks and grouped chunksets that are compatible with any embedding model and vector store. A free demo is available on the POMA AI website. Further information about POMA AI can be found on LinkedIn or X (Twitter).

POMA PrimeCut’s structure-aware embeddings demonstrated a 119x improvement over context-only embeddings.
Press Inquiries
Florian Athens
fa [at] poma-ai.com
https://poma-ai.com
