Anuvaad: Domain-Specific Translation Engine for the Supreme Court of India

At a glance

Anuvaad is an open-source, domain-specific neural machine translation (NMT) engine built by Tarento with EkStep Foundation for Indic languages. The engine was selected by the Honourable Supreme Court of India and deployed in November 2019 as SUVAS, the Supreme Court Vidhik Anuvaad Software, an AI-powered legal translation system that converts English judgments and orders into Indian regional languages. The Bangladesh Supreme Court adopted the same engine in February 2021. By 2024, SUVAS had been used to translate tens of thousands of Supreme Court judgments into 16 regional Indian languages.

About the project: bridging India's legal language gap with open-source NMT

The Indian Constitution lists 22 official languages, and the country is also home to more than 6,000 dialects and over 55 languages with more than a million speakers each. The country's law, however, runs in English. Article 348(1)(a) of the Constitution requires Supreme Court and High Court proceedings to be conducted in English, which puts a real barrier between the average citizen and the rulings that affect their lives.

Anuvaad was conceptualised as a general-purpose, open-domain translation module for Indic languages, built to close that gap. The engine is open-sourced under the MIT licence and funded by EkStep Foundation. The judicial domain became its first large-scale, real-world test.

The challenge: legal translation across 22 official Indian languages

The brief from the Supreme Court of India was demanding. Translate large volumes of judgments, orders and legal documents into multiple Indian regional languages with the accuracy that the law actually requires. General-purpose machine translation services were not built for this. They lacked the legal vocabulary, the consistency and the document handling needed for court-grade work.

Three constraints shaped the build:

More than 20,000 domain-specific legal documents had to be digitised and translated with high accuracy.
Translation had to support multiple Indic languages, each with its own script, grammar and legal terminology.
The solution had to scale, stay open, and continue improving as new parallel corpora and benchmarks became available.

Why Tarento: domain-specific NMT and OCR for Indic languages

Tarento brought deep capability in OCR technology, AI engineering and neural machine translation for Indic languages. That mix was the right fit for a project where the input was often a scanned judgment and the output had to be a clean, legally accurate translation in Hindi, Tamil, Gujarati, Punjabi or another regional language.

What Tarento built: an end-to-end translation pipeline

Tarento delivered an end-to-end translation pipeline and supporting toolchain designed for state-of-the-art quality in the judicial domain:

OCR for scanned judgments, so older paper-based rulings could be digitised and brought into the translation flow.
Neural Machine Translation (NMT) models trained on legal-domain parallel corpora to deliver accuracy that general-purpose translators could not match for judicial text.
Parallel corpus and benchmarking tools to expand training data and evaluate translation quality consistently across languages.
Open-source delivery on GitHub, so the engine could be inspected, extended and contributed to by the wider Indic NLP community.

The system supports multiple vernacular Indian languages with high-quality digitisation and translation in the same pipeline.
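
The flow described above (scanned page, OCR, sentence-level translation) can be sketched roughly as follows. This is a minimal illustration and not Anuvaad's actual code: `Page`, `run_pipeline`, `stub_ocr` and `stub_translate` are hypothetical names, and the OCR and NMT stages are replaced by stubs so the sketch runs without any model installed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Page:
    """One scanned page; `image_id` is a hypothetical stand-in for image bytes."""
    image_id: str


@dataclass
class TranslatedDocument:
    """Source and target sentences kept in parallel, index by index."""
    source_sentences: List[str] = field(default_factory=list)
    target_sentences: List[str] = field(default_factory=list)


def split_sentences(text: str) -> List[str]:
    """Naive splitter on full stops; real legal text needs a proper tokenizer."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def run_pipeline(
    pages: List[Page],
    ocr: Callable[[Page], str],
    translate: Callable[[str], str],
) -> TranslatedDocument:
    """Digitise each page, split it into sentences, then translate each sentence."""
    doc = TranslatedDocument()
    for page in pages:
        for sentence in split_sentences(ocr(page)):
            doc.source_sentences.append(sentence)
            doc.target_sentences.append(translate(sentence))
    return doc


# Stub stages (assumptions for illustration only).
FAKE_SCANS: Dict[str, str] = {
    "p1": "The appeal is allowed. The order is set aside.",
}


def stub_ocr(page: Page) -> str:
    """Pretend OCR: looks up pre-typed text for a page id."""
    return FAKE_SCANS[page.image_id]


def stub_translate(sentence: str) -> str:
    """Placeholder: tags the sentence instead of translating it."""
    return f"[hi] {sentence}"


result = run_pipeline([Page("p1")], stub_ocr, stub_translate)
```

Keeping source and target sentences aligned by index is one simple way for a pipeline's output to double as new parallel data, which is the kind of material the corpus tooling above is meant to grow.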

Technology stack: NMT, OCR, ULCA and big data tooling

Anuvaad runs on a stack chosen for scale and openness: Apache Spark, Apache Airflow, Apache HBase and MongoDB on the data side, with Tarento's own OCR and NMT components on top. The engine connects with ULCA (Universal Language Contribution API), the open data platform that hosts datasets and reference models for India's BHASHINI ecosystem, also built by Tarento with EkStep Foundation.

Impact: SUVAS at India's Supreme Court, from 2019 to today

Anuvaad was deployed by the Supreme Court of India as SUVAS, the Supreme Court Vidhik Anuvaad Software, on 26 November 2019, launched by the then Chief Justice of India, S.A. Bobde. Since then:

Anuvaad has been used to digitise over 22 million documents and translate them into multiple Indian languages.
SUVAS now supports 16 regional Indian languages, expanded from the original nine.
By 2024, 36,271 Supreme Court judgments had been translated into Hindi and another 17,142 judgments into 16 other regional languages.
Tens of thousands of translated judgments are now available on the e-SCR portal, making landmark rulings accessible in languages citizens actually speak.

For judicial-domain content, Anuvaad has shown qualitative and quantitative advantages over Google Translate, driven by the additional legal parallel corpus. On general sentences, performance is comparable.
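
A comparison like this is usually made by scoring each system's output against reference translations, separately for legal and general sentences. The sketch below shows one way to do that with a simplified character n-gram F-score. It is an assumption for illustration: the source does not say which metric or test set was used, the function names are hypothetical, and the sample rows are placeholder strings, not real results.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams, ignoring surrounding whitespace."""
    text = text.strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_f1(hypothesis: str, reference: str, n: int = 2) -> float:
    """F1 over character n-gram overlap: 1.0 if identical, 0.0 if nothing is shared."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_by_domain(rows: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """rows are (domain, hypothesis, reference); returns the mean score per domain."""
    buckets: Dict[str, List[float]] = {}
    for domain, hyp, ref in rows:
        buckets.setdefault(domain, []).append(ngram_f1(hyp, ref))
    return {domain: mean(scores) for domain, scores in buckets.items()}


# Placeholder rows (made up; they only exercise the code).
rows = [
    ("legal", "abcd", "abcd"),
    ("legal", "abcd", "wxyz"),
    ("general", "hello", "hello"),
]
scores = score_by_domain(rows)
```

Running the same function over the outputs of two systems on the same test set gives a per-domain score for each, which is the shape of comparison the paragraph above describes. A production evaluation would more likely use an established metric implementation and human review by legal translators.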

Beyond India: adoption by the Bangladesh Supreme Court

In February 2021, the Bangladesh Supreme Court launched the same AI-based translation software, extending Anuvaad's footprint beyond India and validating the engine in a second South Asian judicial system.

Why this case study matters

Anuvaad is what domain-specific, open-source AI looks like when it serves real public infrastructure. The same combination of OCR, NMT, parallel corpus engineering and disciplined open-source delivery is what Tarento brings to legal tech, government and Indic language AI projects across India and beyond.

Explore Tarento's Language AI services & Platform Engineering.


