Tarento Joins EkStep To Build The Pillar For National Language Translation Mission Via ULCA Platform

Client overview: EkStep Foundation

EkStep Foundation is a Bengaluru-based non-profit building large-scale digital public goods. Co-founded by Nandan Nilekani, Rohini Nilekani and Shankar Maruwada, EkStep brings experience from Aadhaar-scale digital infrastructure into education and Indic language AI.

The challenge: building a shared foundation for Indian language AI

India’s National Language Translation Mission needed a common digital foundation for the country’s 22 scheduled languages, many of which remain low-resource for NLP.

To build reliable machine translation, speech recognition, text-to-speech and OCR capabilities, India needed a single platform to collect datasets, host reference models, attribute contributors and benchmark progress. The platform also required standard API contracts so that datasets, models and systems from different research labs could interoperate.

Why EkStep chose Tarento

Tarento had already partnered with EkStep on Anuvaad, a document translation platform used by judicial bodies for legal translations. This experience in Indic NLP, open architecture and government-grade data handling made Tarento a strong engineering partner for ULCA, the Universal Language Contribution API, built under the National Language Translation Mission of the Ministry of Electronics and Information Technology (MeitY).

What Tarento built

Tarento designed and delivered ULCA as an open, scalable and platform-agnostic data layer for the BHASHINI ecosystem.

The work covered:

Architecture and API contracts: A common specification for submitting, describing, attributing, searching, retrieving and benchmarking datasets and models.
Contributor and submission flows: Tooling for research labs, MSMEs and individual contributors to publish datasets and models in a standard format.
Curation and benchmarking: Pipelines for sanity checks, record-level attribution and benchmark datasets to evaluate models against shared metrics.
Open-source foundation: ULCA was published under the MIT licence and became the maintained code base behind the BHASHINI platform.
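
To make the curation step more concrete, here is a minimal Python sketch of what record-level sanity checks and attribution could look like for a parallel translation dataset. It is an illustration only: the ParallelRecord fields, the sanity_check and curate functions, and the specific checks are assumptions made for this example and are not taken from the ULCA specification or code base.

```python
from dataclasses import dataclass


# Hypothetical record shape for one parallel translation pair.
# Field names are illustrative, not the ULCA schema.
@dataclass(frozen=True)
class ParallelRecord:
    source_text: str
    target_text: str
    source_lang: str   # language code, e.g. "en"
    target_lang: str   # language code, e.g. "hi"
    contributor: str   # record-level attribution


def sanity_check(record: ParallelRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.source_text.strip():
        problems.append("empty source text")
    if not record.target_text.strip():
        problems.append("empty target text")
    if record.source_lang == record.target_lang:
        problems.append("source and target language are identical")
    if not record.contributor.strip():
        problems.append("missing contributor attribution")
    return problems


def curate(records):
    """Split records into accepted ones and rejected ones with reasons.

    Exact duplicate pairs are rejected; the first submission is kept,
    so its contributor retains the attribution.
    """
    seen = set()
    accepted, rejected = [], []
    for rec in records:
        problems = sanity_check(rec)
        key = (rec.source_lang, rec.target_lang, rec.source_text, rec.target_text)
        if not problems and key in seen:
            problems.append("duplicate pair")
        if problems:
            rejected.append((rec, problems))
        else:
            seen.add(key)
            accepted.append(rec)
    return accepted, rejected
```

In this sketch every rejected record carries its reasons, so a contributor could see why a submission failed rather than only that it failed. A production pipeline would likely add language-specific checks such as script validation, which are left out here.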

Tarento also worked with India’s NLP research community, including teams from IITs, IIITs, IISc, CDAC and AI4Bharat, to bring early datasets and models into the ULCA-compliant format.

What ULCA hosts

ULCA supports datasets and models for machine translation, automatic speech recognition (ASR), text-to-speech (TTS), OCR, transliteration, named entity recognition and language identification across Indic languages.

By BHASHINI’s launch milestone, ULCA hosted around 215 million parallel translation pairs across 12 Indic languages, roughly 9,800 hours of ASR audio, hundreds of hours of studio-quality TTS data, around 6 million transliteration entries across 19 languages, and more than 240 models across translation, ASR, TTS, OCR and transliteration.

The catalogue has continued to grow through contributions from the wider BHASHINI ecosystem.

Why ULCA matters

BHASHINI now exposes more than 300 pre-trained AI models through Open Bhashini APIs. It has supported high-visibility public use cases, including the Prime Minister’s real-time Tamil speech translation in December 2023 and the Finance Minister’s 2024 Union Budget address.

ULCA provides the data and model backbone for this ecosystem, enabling startups, researchers and government bodies to build Indic language AI products on a shared open foundation.

Technology stack

Category	Technologies
Programming Languages	Java, Python
Frontend / UI	React
API & Gateway	OpenAPI, Zuul
Databases / Data Stores	MongoDB, Redis
Streaming / Real-time Data / Analytics	Apache Kafka, Apache Druid
DevOps / CI-CD	Jenkins
Cloud Platforms	Microsoft Azure, Amazon Web Services (AWS)
Infrastructure / Delivery	Content Delivery Network (CDN)
