MultiLingua

Cross-lingual NLP benchmarks and models for low-resource languages

MultiLingua provides benchmarks, pre-trained multilingual embeddings, and evaluation suites for cross-lingual NLP research. It targets low-resource languages that are underrepresented in existing benchmarks. The project includes datasets for 45 languages spanning 12 language families, with an emphasis on African and Southeast Asian languages.

Features

  • Benchmarks covering 45 languages and 12 language families
  • Pre-trained multilingual embeddings optimized for low-resource transfer
  • Evaluation suites for NER, POS tagging, sentiment, and QA
  • Data collection tools and annotation guidelines
  • Active community with regular benchmark updates

© 2026 You R. Name.