Surya OCR

TL;DR

Open-source OCR system created by Vik Paruchuri, first released in 2024 and developed alongside the related Marker PDF-to-Markdown project.
Document-understanding pipeline: text detection, recognition, layout analysis, reading-order prediction, and table recognition from a single Python package.
Strong multilingual coverage (90+ languages) with competitive accuracy on document-level benchmarks against PaddleOCR and Tesseract.
Free for research and personal use; commercial use depends on revenue and is governed by the Surya licence terms.

What Surya Offers

Surya is a document OCR system rather than just a text recogniser. The default pipeline returns not only the transcribed text and bounding boxes but also a layout decomposition (title, paragraph, list, figure, table), an inferred reading order, and — for tables — a structured cell grid. That makes it a natural input for downstream document-understanding workloads: RAG ingestion, structured-data extraction, accessibility tooling.
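To illustrate the RAG-ingestion case, the sketch below flattens one page's layout regions into text chunks in reading order and drops page headers and footers. The `Region` dataclass, its `position` field, and the label strings are hypothetical stand-ins, not Surya's own result types (which differ between releases). Map whichever fields your installed version returns onto this shape.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical, simplified region record; not Surya's own result type.
    label: str      # layout class, e.g. "title", "text", "list", "header", "footer"
    position: int   # reading-order index predicted for the region
    text: str       # OCR text of the lines inside the region


# Page furniture dropped before indexing (an assumption; tune per corpus).
SKIP_LABELS = {"header", "footer"}


def page_to_chunks(regions: list[Region]) -> list[str]:
    """Return region texts in predicted reading order, minus headers/footers."""
    chunks = []
    for region in sorted(regions, key=lambda r: r.position):
        text = region.text.strip()
        if region.label in SKIP_LABELS or not text:
            continue
        # Mark titles so a downstream splitter can keep sections together.
        chunks.append(f"# {text}" if region.label == "title" else text)
    return chunks


page = [
    Region("text", 2, "Body paragraph."),
    Region("header", 0, "Journal of Examples"),
    Region("title", 1, "Introduction"),
]
print(page_to_chunks(page))  # ['# Introduction', 'Body paragraph.']
```

Sorting on the predicted position, rather than on box coordinates, is what lets the same flattening code work for single-column and multi-column pages alike.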

It is developed in tandem with Marker, a PDF-to-Markdown converter that uses Surya for the OCR layer. The two projects share infrastructure and are commonly used together for document ingestion pipelines.

Components

Detection model — text-line and text-block detection for arbitrary orientations.
Recognition model — transformer-based encoder-decoder text recogniser.
Layout model — segments pages into semantic regions (text, title, list, table, figure, caption, header, footer).
Order model — predicts reading order across detected regions, important for multi-column or magazine layouts.
Table recognition — locates and reconstructs table structure into row/column cells.
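To show what the order model replaces, here is the kind of hand-tuned heuristic a pipeline might otherwise rely on: a naive two-column ordering that treats wide boxes (titles, full-width figures) as breaks and, within each band between breaks, reads the left column before the right. This is not Surya's method. The `Box` type, the 0.6 width threshold, and the page-midline split are all assumptions, and the heuristic fails on three-column pages, sidebars, and pull quotes, which is where a learned order model pays off.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical text-block box in page coordinates (origin at top-left).
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def _order_band(band: list[Box], mid: float) -> list[Box]:
    """Within one horizontal band: left column top-to-bottom, then right column."""
    left = sorted((b for b in band if (b.x0 + b.x1) / 2 < mid), key=lambda b: b.y0)
    right = sorted((b for b in band if (b.x0 + b.x1) / 2 >= mid), key=lambda b: b.y0)
    return left + right


def heuristic_reading_order(boxes: list[Box], page_width: float, span_ratio: float = 0.6) -> list[Box]:
    """Naive two-column reading order.

    Boxes at least `span_ratio` of the page wide are treated as full-width
    breaks; the column boxes between consecutive breaks form one band.
    """
    mid = page_width / 2
    limit = span_ratio * page_width
    spanning = sorted((b for b in boxes if b.x1 - b.x0 >= limit), key=lambda b: b.y0)
    remaining = [b for b in boxes if b.x1 - b.x0 < limit]
    ordered: list[Box] = []
    for span in spanning:
        # Column boxes starting above this full-width box belong to the band before it.
        above = [b for b in remaining if b.y0 < span.y0]
        remaining = [b for b in remaining if b.y0 >= span.y0]
        ordered.extend(_order_band(above, mid))
        ordered.append(span)
    ordered.extend(_order_band(remaining, mid))
    return ordered


page = [
    Box(55, 20, 100, 40, "R1"), Box(0, 0, 100, 10, "Title"),
    Box(0, 50, 45, 70, "L2"), Box(0, 20, 45, 40, "L1"), Box(55, 50, 100, 70, "R2"),
]
print([b.text for b in heuristic_reading_order(page, page_width=100)])
# ['Title', 'L1', 'L2', 'R1', 'R2']
```

Every constant here (the width ratio, the single midline) has to be re-tuned per layout family, which is the maintenance cost the order model removes.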

Multilingual Coverage

Surya supports OCR for 90+ languages out of the box. The text recogniser is trained jointly across languages, so script switches within a document are handled without per-language model swapping. Layout and order models are largely script-agnostic and apply across the language set.

For mixed-script documents, such as academic papers with English captions over Devanagari body text or multilingual government forms, Surya handles the combined vocabulary in a single recognition pass, which is cleaner than running PaddleOCR with two separate language models and reconciling their output.

Deployment

Surya runs on PyTorch and benefits from FP16 inference on NVIDIA GPUs such as the L4 or L40S. For high-throughput document ingestion, batching pages through inference is the dominant optimisation; the layout and order models are small enough that they rarely become the bottleneck.

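```python
from PIL import Image
from surya.ocr import run_ocr
from surya.model.detection.model import load_model as load_det_model, load_processor as load_det_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor

# This call pattern follows earlier Surya releases; newer releases replace
# run_ocr with predictor classes, so check the README for your installed version.

# Load the models once and reuse them for every page.
det_model, det_processor = load_det_model(), load_det_processor()
rec_model, rec_processor = load_rec_model(), load_rec_processor()

image = Image.open("document.png").convert("RGB")  # normalise the image mode

# One list of language hints per image: English and Hindi for this page.
predictions = run_ocr([image], [["en", "hi"]], det_model, det_processor, rec_model, rec_processor)

# Each prediction holds the recognised text lines for one input image.
for line in predictions[0].text_lines:
    print(line.text)
```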
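For multi-page ingestion, a thin wrapper can feed pages to the OCR call in fixed-size batches so the loaded models are reused and the number of decoded page images held at once stays bounded. This is a sketch: `ocr_in_batches` and the `ocr_fn` callback are hypothetical names, and the default batch size of 16 is an arbitrary starting point to tune against GPU memory.

```python
from typing import Any, Callable, Sequence

# Hypothetical callback type: takes a batch of page images and one list of
# language hints per image, returns one OCR result per image.
OcrFn = Callable[[list[Any], list[list[str]]], list[Any]]


def ocr_in_batches(
    pages: Sequence[Any],
    langs: Sequence[list[str]],
    ocr_fn: OcrFn,
    batch_size: int = 16,
) -> list[Any]:
    """Run `ocr_fn` over `pages` in fixed-size batches and concatenate the results."""
    if len(pages) != len(langs):
        raise ValueError("expected one language list per page")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: list[Any] = []
    for start in range(0, len(pages), batch_size):
        stop = start + batch_size
        batch = list(pages[start:stop])
        batch_results = ocr_fn(batch, list(langs[start:stop]))
        # Guard against a callback that silently drops or duplicates pages.
        if len(batch_results) != len(batch):
            raise RuntimeError("ocr_fn returned a different number of results than pages")
        results.extend(batch_results)
    return results


# Stand-in OCR function for demonstration: echoes page names with their hints.
def fake_ocr(images, langs):
    return [f"{img}:{'+'.join(hints)}" for img, hints in zip(images, langs)]


print(ocr_in_batches(["p1", "p2", "p3"], [["en", "hi"]] * 3, fake_ocr, batch_size=2))
# ['p1:en+hi', 'p2:en+hi', 'p3:en+hi']
```

With the earlier `run_ocr` API, `ocr_fn` could be `lambda imgs, langs: run_ocr(imgs, langs, det_model, det_processor, rec_model, rec_processor)`, which keeps model loading outside the loop.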

Licensing and Practical Notes

Surya is freely available for research and personal use. Commercial use is permitted under the licence below a revenue threshold; above that, a commercial agreement with the author is required. Check the LICENCE file on GitHub before deploying in a commercial context — the terms have been refined over the project's lifetime.

Best paired with Marker when the input is PDFs rather than raster images.
Reading-order model is the standout feature for multi-column ingestion — the alternative is hand-tuned heuristics.
Table recognition output is structured enough to be parsed directly into pandas DataFrames.
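A minimal sketch of that last step, assuming the table output has already been reduced to cells carrying a zero-based row index, column index, and text. The `Cell` type and its field names are hypothetical; Surya's own table result objects vary by release, and merged cells (row or column spans) would need extra handling that this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical flattened table cell; not Surya's own result type.
    row: int   # zero-based row index
    col: int   # zero-based column index
    text: str


def cells_to_grid(cells: list[Cell]) -> list[list[str]]:
    """Place cells into a dense row-major grid; positions with no cell become ""."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text.strip()  # a later duplicate overwrites an earlier one
    return grid


def cells_to_dataframe(cells: list[Cell], header: bool = True):
    """Build a pandas DataFrame, using row 0 as column names when `header` is True."""
    import pandas as pd  # deferred so cells_to_grid works without pandas installed

    grid = cells_to_grid(cells)
    if header and grid:
        return pd.DataFrame(grid[1:], columns=grid[0])
    return pd.DataFrame(grid)


cells = [Cell(0, 0, "Item"), Cell(0, 1, "Qty"), Cell(1, 0, "Apples"), Cell(1, 1, "3"), Cell(2, 0, "Pears")]
print(cells_to_grid(cells))  # [['Item', 'Qty'], ['Apples', '3'], ['Pears', '']]
```

Building the dense grid first keeps the DataFrame step trivial and makes gaps from missed cells visible as empty strings rather than misaligned columns.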

References

Surya repository on GitHub
Marker repository on GitHub
