Sanchit Ahuja

Publications

Most recent publications on Google Scholar.
^‡ indicates equal contribution.

Selected
All

Megaverse: Benchmarking large language models across languages, modalities, models and tasks

Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (NAACL-HLT 2024)

Paper Code

sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

Sanchit Ahuja^‡, Kumar Tanmay^‡, Hardik Hansrajbhai Chauhan, Barun Patra, Kriti Aggarwal, Luciano Del Corro, Arindam Mitra, Tejas Indulal Dhamecha, Ahmed Awadallah, Monojit Choudhary, Vishrav Chaudhary, Sunayana Sitaram

Preprint

Paper

DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures

Agrima Seth, Sanchit Ahuja, Kalika Bali, Sunayana Sitaram

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Paper Code

Scaling Laws for Multilingual Language Models

Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song

Findings of the Association for Computational Linguistics ACL 2025

Paper

Contamination Report for Multilingual Benchmarks

Sanchit Ahuja^‡, Varun Gumma^‡, Sunayana Sitaram

EvalEval Workshop at NeurIPS 2024

Paper

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages

Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine De Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, Seid Muhie Yimam, Saif M Mohammad

Findings of the Association for Computational Linguistics ACL 2024

Paper Code

Hyphen: Hyperbolic hawkes attention for text streams

Shivam Agarwal, Ramit Sawhney, Sanchit Ahuja, Ritesh Soun, Sudheer Chava

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Paper Code

Megaverse: Benchmarking large language models across languages, modalities, models and tasks

Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (NAACL-HLT 2024)

Paper Code

sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

Preprint

Paper

DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures

Agrima Seth, Sanchit Ahuja, Kalika Bali, Sunayana Sitaram

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Paper Code

Scaling Laws for Multilingual Language Models

Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song

Findings of the Association for Computational Linguistics ACL 2025

Paper

Contamination Report for Multilingual Benchmarks

Sanchit Ahuja^‡, Varun Gumma^‡, Sunayana Sitaram

EvalEval Workshop at NeurIPS 2024

Paper

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages

Findings of the Association for Computational Linguistics ACL 2024

Paper Code

SemEval-2024 task 1: Semantic textual relatedness for african and asian languages

Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine De Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M Mohammad

Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). Association for Computational Linguistics

Paper

Hyphen: Hyperbolic hawkes attention for text streams

Shivam Agarwal, Ramit Sawhney, Sanchit Ahuja, Ritesh Soun, Sudheer Chava

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Paper Code

Sanchit Ahuja

Bio

Publications

CV