DeepSeek AI has announced the release of DeepSeek-OCR, a novel optical character recognition (OCR) system designed to enhance how large language models handle long text contexts through optical 2D mapping.
The technology introduces a vision-based context compression method that transforms text into compact visual tokens. DeepSeek reports OCR decoding accuracy of roughly 97% when text is compressed at ratios below 10x, and around 60% accuracy even at a 20x compression ratio.
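To make those figures concrete, the sketch below (with made-up token counts, not numbers from DeepSeek) shows what the reported compression ratio means: the text tokens a page would normally require divided by the visual tokens used to represent it.

```python
# Illustrative arithmetic only; the token counts are invented for the example.
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of the text tokens a page would need to the visual tokens used."""
    return text_tokens / vision_tokens

# A page worth 1,000 text tokens encoded as 100 visual tokens is compressed 10x,
# the regime where DeepSeek reports roughly 97% decoding accuracy.
print(compression_ratio(1000, 100))  # 10.0

# Squeezing the same page into 50 visual tokens gives 20x compression,
# where accuracy reportedly falls to about 60%.
print(compression_ratio(1000, 50))   # 20.0
```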
DeepSeek-OCR consists of two core components: DeepEncoder, a vision encoder, and DeepSeek3B-MoE-A570M, a 3B-parameter mixture-of-experts decoder with roughly 570M active parameters. DeepEncoder reduces the number of visual tokens before they reach the decoder, keeping activation memory manageable even with high-resolution inputs.
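As a rough mental model (the class names, layer choices, and sizes below are illustrative assumptions, not DeepSeek's actual architecture), DeepEncoder can be pictured as a module that collapses a dense grid of image-patch embeddings into a much smaller set of visual tokens, which the MoE decoder then translates back into text:

```python
# Conceptual sketch only: a strided convolution stands in for whatever
# token-compression mechanism DeepEncoder actually uses.
import torch
import torch.nn as nn

class ToyDeepEncoder(nn.Module):
    """Compresses a dense patch grid into far fewer visual tokens."""
    def __init__(self, dim: int = 1024, reduction: int = 4):
        super().__init__()
        # A 64x64 grid of patch embeddings (4,096 patches) becomes
        # a 16x16 grid, i.e. 256 visual tokens.
        self.compress = nn.Conv2d(dim, dim, kernel_size=reduction, stride=reduction)

    def forward(self, patch_grid: torch.Tensor) -> torch.Tensor:
        # patch_grid: (batch, dim, height, width) of patch embeddings
        compressed = self.compress(patch_grid)        # (batch, dim, h/4, w/4)
        return compressed.flatten(2).transpose(1, 2)  # (batch, tokens, dim)

encoder = ToyDeepEncoder()
patches = torch.randn(1, 1024, 64, 64)  # 4,096 patch embeddings from a page image
visual_tokens = encoder(patches)
print(visual_tokens.shape)  # torch.Size([1, 256, 1024]) -> 16x fewer tokens
```

In the real system, those compressed visual tokens are what the 3B MoE decoder consumes in place of a much longer sequence of text tokens.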
On the OmniDocBench benchmark, the system outperforms existing OCR models such as GOT-OCR2.0 and MinerU2.0 while using significantly fewer visual tokens per page.
According to DeepSeek, the model can process more than 200,000 pages per day on a single NVIDIA A100 GPU, and scales to 33 million pages daily using 20 nodes.
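Those two throughput figures are consistent with a simple back-of-the-envelope check, assuming each node holds eight A100 GPUs (a detail taken from the DeepSeek-OCR paper's reported setup rather than from this announcement):

```python
# Throughput sanity check; the 8-GPUs-per-node figure is an assumption.
pages_per_gpu_per_day = 200_000
nodes = 20
gpus_per_node = 8  # assumed A100 node configuration

total_pages_per_day = pages_per_gpu_per_day * nodes * gpus_per_node
print(f"{total_pages_per_day:,} pages/day")  # 32,000,000 -- close to the ~33M cited
```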
The company states that this scalability makes DeepSeek-OCR well suited to large-scale document digitization and AI training data generation. It also supports multiple input resolutions and document types, including charts, chemical formulas, and multilingual text.
DeepSeek adds that its approach, leveraging visual patterns for compression, represents a new paradigm for language model efficiency. The system design allows smaller language models to effectively decode visual representations, indicating potential applications in memory optimization and long-context processing.
The code and model weights for DeepSeek-OCR are available as open-source on GitHub. The company states that its goal is to support broader research that integrates vision and language for more efficient AI systems.
DeepSeek notes that this paradigm "opens new possibilities for rethinking how visual and language modalities can be combined synergistically to improve computational efficiency in large-scale text processing and agent systems."
This release follows DeepSeek's recent V3.2-Exp model, which reportedly achieves significant efficiency improvements in training and inference, further advancing cost-effective long-context processing for LLMs.