[{"data":1,"prerenderedAt":28},["ShallowReactive",2],{"nr-en-baidu-unlimited-ocr-dozens-pages":3},{"slug":4,"title":5,"dek":6,"date":7,"time":8,"publishedAt":9,"updated":10,"updatedAt":10,"dateFmt":11,"updatedFmt":10,"kind":12,"tier":13,"author":14,"authorName":15,"topics":16,"tracker":10,"trackerLabel":10,"headlineStat":22,"image":23,"ogImage":24,"imageAlt":5,"csv":10,"minutes":25,"words":26,"html":27},"baidu-unlimited-ocr-dozens-pages","Baidu Breaks OCR Bottleneck: Dozens of Pages in One Pass","Chinese researchers have built a document recognition system that shatters the previous ten-page limit. A modified attention mechanism keeps memory usage constant regardless of document length.","2026-07-05","18:31","2026-07-05T18:31:00+02:00","","July 5, 2026","news","standard","ideal-syka","Ideal Syka",[17,18,19,20,21],"OCR","Document Recognition","Attention Mechanism","AI Architecture","Baidu","Dozens of pages instead of ten per pass","\u002Fnewsroom\u002Fimg\u002Fbaidu-unlimited-ocr-dozens-pages.webp","\u002Fog-nr\u002Fbaidu-unlimited-ocr-dozens-pages.en.png",3,508,"\u003Cp>Baidu researchers have developed an OCR model that processes dozens of document pages in a single inference pass – while previous systems topped out at around ten pages. 
The system, called \u003Cstrong>Unlimited OCR\u003C\u002Fstrong>, uses a novel attention mechanism named \u003Cstrong>Reference Sliding Window Attention (R-SWA)\u003C\u002Fstrong> to keep memory and processing speed constant, regardless of text volume.\u003C\u002Fp>\n\u003Ch2>Quick Facts\u003C\u002Fh2>\n\u003Cul>\n\u003Cli>\u003Cstrong>Unlimited OCR\u003C\u002Fstrong> handles \u003Cstrong>dozens of pages\u003C\u002Fstrong> in one pass, compared to the previous limit of about \u003Cstrong>ten pages\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>The innovation&#39;s core: \u003Cstrong>R-SWA\u003C\u002Fstrong> keeps the KV-cache at constant size instead of growing linearly\u003C\u002Fli>\n\u003Cli>Baidu uses \u003Cstrong>Deepseek OCR\u003C\u002Fstrong> as its foundation and pairs it with a Mixture-of-Experts architecture (\u003Cstrong>3 billion parameters\u003C\u002Fstrong>, with \u003Cstrong>500 million active\u003C\u002Fstrong> during inference)\u003C\u002Fli>\n\u003Cli>Trained on roughly \u003Cstrong>two million document samples\u003C\u002Fstrong> – the system currently tops the most important OCR benchmark\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch2>The Problem: The KV-Cache Bottleneck\u003C\u002Fh2>\n\u003Cp>Previous OCR systems hit a technical wall. Language models store the keys and values of all processed tokens in a \u003Cstrong>KV-cache\u003C\u002Fstrong> during text generation – a buffer they reference at every later step. With multi-page documents, this cache grows linearly with every generated token, so memory use keeps climbing and each decoding step gets slower. The practical workaround was crude: process each page separately, reset the cache, move to the next page – inefficient and slow.\u003C\u002Fp>\n\u003Ch2>Human Forgetting as a Model\u003C\u002Fh2>\n\u003Cp>Baidu solves this with an elegant analogy to human perception. When copying a book, you don&#39;t constantly re-read everything you&#39;ve written. You focus on the source, the last few characters, and what comes next. 
Older passages fade through a kind of &quot;soft forgetting.&quot;\u003C\u002Fp>\n\u003Cp>That&#39;s exactly what \u003Cstrong>R-SWA\u003C\u002Fstrong> does: each newly generated token sees all visual reference tokens and the prompt – but when looking back at already-generated output, it only attends to the \u003Cstrong>last 128 tokens\u003C\u002Fstrong>. Because that window is fixed, the KV-cache stays at a constant size no matter how much text has already been generated. An additional trick: visual tokens are encoded once and remain unchanged, preventing them from blurring through ongoing state changes.\u003C\u002Fp>\n\u003Cdiv class=\"tbl-scroll\">\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Aspect\u003C\u002Fth>\n\u003Cth>Previous Systems\u003C\u002Fth>\n\u003Cth>Unlimited OCR\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Pages per pass\u003C\u002Ftd>\n\u003Ctd>~10\u003C\u002Ftd>\n\u003Ctd>Dozens\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>KV-cache growth\u003C\u002Ftd>\n\u003Ctd>Linear\u003C\u002Ftd>\n\u003Ctd>Constant\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Latency across decoding steps\u003C\u002Ftd>\n\u003Ctd>Rising\u003C\u002Ftd>\n\u003Ctd>Flat\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\u003C\u002Fdiv>\n\u003Ch2>Architecture and Training\u003C\u002Fh2>\n\u003Cp>Unlimited OCR builds on \u003Cstrong>Deepseek OCR\u003C\u002Fstrong>. The \u003Cstrong>DeepEncoder\u003C\u002Fstrong> compresses a 1024×1024-pixel PDF image down to 256 tokens. The decoder network is a \u003Cstrong>Mixture-of-Experts architecture\u003C\u002Fstrong> with three billion parameters, of which only around 500 million are active during inference – saving compute. Training used roughly two million document samples, split 9-to-1 between single-page and multi-page data.\u003C\u002Fp>\n\u003Ch2>What This Means for You\u003C\u002Fh2>\n\u003Cp>This matters especially for German enterprises handling document processing – insurance, government, logistics, financial services. 
A system processing dozens of pages in one pass could dramatically speed up batch processing and reduce memory demands. Key questions remain: How well does Unlimited OCR handle German-language documents and specialized formats (forms, tables)? When will it become publicly available? Baidu has demonstrated a technical edge here – German and European teams should watch closely.\u003C\u002Fp>\n\u003Ch2>Sources\u003C\u002Fh2>\n\u003Cul>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fthe-decoder.com\u002Fbaidus-unlimited-ocr-processes-dozens-of-document-pages-in-one-pass-by-treating-memory-like-human-forgetting\u002F\">The Decoder\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cem>Editorially owned by \u003Ca href=\"\u002Fen\u002Fautor\u002Fideal-syka\">Ideal Syka\u003C\u002Fa>. Sources and method: \u003Ca href=\"\u002Fen\u002Fredaktion\">Newsroom &amp; method\u003C\u002Fa>. Tips and corrections: \u003Ca href=\"mailto:ai@i6eal.de\">ai@i6eal.de\u003C\u002Fa>.\u003C\u002Fem>\u003C\u002Fp>\n",1783276596189]