About this project
SanskritKatha is a research project by Mahesh Ramakrishnan, based in Bengaluru. The goal: understand how well AI can generate pedagogically sound Sanskrit stories, and where it falls short.
The research
We generated 50,000 synthetic Sanskrit stories using multiple language models, across two difficulty tiers: BalaKatha (age 4–5, simple vocabulary) and KishoraKatha (age 14–15, literary sophistication). Each story embeds a Dharmic principle, uses specific vocabulary, and follows a structural feature — all specified programmatically.
Automated LLM-as-judge evaluations scored every story on six quality dimensions. But LLM evaluators have known biases: a self-preference bias of about 0.8 points on a 5-point scale, and leniency that varies by a point or more across evaluators. We cannot trust AI to grade its own work.
That's where human evaluation comes in. By collecting structured reviews from people who know Sanskrit, we build the ground truth needed to measure exactly how much AI evaluators can be trusted for Sanskrit content. The results will feed into a paper targeting ACL/EMNLP 2026.
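One simple way to express "how much an AI evaluator can be trusted" is the average gap between its scores and the human scores on the same stories. The Python sketch below is only an illustration under assumptions: the function name `mean_signed_gap`, the dictionary layout, and the choice of a plain mean are hypothetical and not taken from the project's actual analysis.

```python
from statistics import mean


def mean_signed_gap(judge_scores, human_scores):
    """Average of (judge - human) over stories scored by both.

    Both arguments map a story id to a score on the same 5-point scale.
    A positive result means the judge scored higher than the human
    reviewers on average (leniency); a negative one means it was harsher.
    """
    shared = sorted(set(judge_scores) & set(human_scores))
    if not shared:
        raise ValueError("no stories in common")
    return mean(judge_scores[s] - human_scores[s] for s in shared)


# Made-up scores, purely to show the call shape.
judge = {"s1": 4.0, "s2": 5.0, "s3": 3.0}
human = {"s1": 3.0, "s2": 4.0, "s3": 3.0}
gap = mean_signed_gap(judge, human)
```

Computing the same gap separately for stories a judge model generated itself and for stories from other models would be one way to look at self-preference.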
The methodology
This platform follows established practices from crowdsourced annotation research:
- Blind review protocol — reviewers never see which AI model generated a story, other reviewers' scores, or the automated scores.
- Calibration stories — expert-scored honeypot stories are mixed in to measure reviewer accuracy.
- Inter-annotator agreement — every story needs at least 3 independent reviews before we compute consensus.
- Trust scoring — a composite score based on honeypot accuracy, agreement with other reviewers, time quality, and score distribution patterns.
- Anti-gaming measures — minimum time thresholds, score distribution monitoring, and rate limiting to ensure review quality.
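To make the consensus and trust-scoring steps above more concrete, here is a minimal Python sketch. Only the minimum of 3 reviews and the four trust components come from the list; the function names, the use of the median, the [0, 1] scaling, and the equal weights are assumptions for illustration, not the platform's actual implementation.

```python
from statistics import median

MIN_REVIEWS = 3  # from the methodology: at least 3 independent reviews


def consensus_score(scores):
    """Return the median score, or None if there are fewer than MIN_REVIEWS."""
    if len(scores) < MIN_REVIEWS:
        return None
    return median(scores)


def trust_score(honeypot_accuracy, peer_agreement, time_quality,
                distribution_quality, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted average of four components, each expected in [0, 1].

    The equal default weights are a placeholder assumption.
    """
    components = (honeypot_accuracy, peer_agreement, time_quality,
                  distribution_quality)
    for c in components:
        if not 0.0 <= c <= 1.0:
            raise ValueError("each component must be in [0, 1]")
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)
```

With these placeholder choices, a story with only two reviews yields no consensus yet, and a reviewer's composite score stays on the same 0 to 1 scale as its inputs.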
The person behind it
Mahesh has spent 26 years in language technology: building NLP products, working with text at scale, and, more recently, exploring what AI can and cannot do for classical languages. He founded Tataatsu Idealabs, where over 13 years he built NLP products including CollabLayer and Seer. Before that, he worked on some of the earliest applications of NLP in enterprise products.
This project sits at the intersection of two things he cares about: Sanskrit and language technology.
This is not a funded lab or a university department. It is one person's serious attempt at contributing to a field that matters. All reviewer contributions will be acknowledged in the resulting publication.
Get involved
If you read Sanskrit — whether as a scholar, teacher, student, or enthusiast — your judgment is valuable to this research. Each batch of 10 stories takes about 25 minutes. Review at your own pace, as many batches as you like.
Questions? Reach out at [email protected]