20M+ Indian legal documents with citation graphs and vector embeddings – potential uses for legal NLP? [D]
About this article
been working on structuring India's legal corpus for the past 2 years and wanted to share what I've built and hear from people working on legal NLP or low-resource Indian language models. dataset is 20M+ Indian court cases from the Supreme Court, all 25 High Courts, and 14 Tribunals. each case has structured metadata (court, bench, date, parties, judges, sections cited, acts referenced, case type). there's a citation graph across the full corpus where I've classified relationships as followed...
You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket