About Me
I am a recent graduate of the Indian Institute of Technology (IIT) Patna, where I earned a Bachelor of Science in Mathematics and Computing with a CPI of 8.92/10.0. My coursework and research have given me a strong foundation in computer science, mathematics, and machine learning.
My primary research interests are Natural Language Processing, Computer Vision, and Generative AI. I have interned at research labs including Fujitsu Research, TCS Research, and IIIT Hyderabad, where I contributed to projects on multi-modal models, diffusion models, and NLP for Indian languages.
Education
Indian Institute of Technology (IIT) Patna
B.S. in Mathematics and Computing
2021 - 2025
CPI: 8.92/10.0
Experience
Jan 2025 - July 2025
Fujitsu Research of India
AI-Research Intern
- Worked on improving the ability of Multi-modal Large Language Models (MLLMs) to locate and reason about fine-grained details within complex documents.
- Introduced Needle in Images (NiM), a novel benchmark with 1,180 Q/A pairs drawn from newspapers, academic papers, restaurant menus, lecture slides, and magazines.
- Worked on Spot-IT, a novel methodology based on query-guided dynamic attention that enhances MLLM capabilities, achieving a 21.05% improvement on the NiM benchmark with GPT-4o.
- Contributed to a new library for building multi-agent workflows with greater ease and control, and built several workflows with it for applications such as AI Tutor and AI Scientist.
July 2024 - Aug 2024
TCS Research & Innovation Lab
Research Intern
- Researched fine-grained preference alignment in text-to-image diffusion models by developing a controllable synthetic data generation framework that simulates diverse failure modes.
- Designed PreFine, a rule-based synthetic pipeline that applies structured perturbations to high-quality images and ranks them using dispersion-aware scoring and difficulty-based curriculum sampling.
- Fine-tuned Stable Diffusion 1.5 and SDXL-Base using Diffusion-DPO + LoRA on the generated dataset, achieving up to +15.2 gain in win rates for ImageReward and +13.0 in Aesthetic Score.
Aug 2023 - Feb 2024
AI-NLP Lab, IIT Patna
Undergraduate Research Assistant
- Proposed Hin-DPO, a new framework with an improved loss function for Hindi news explanation generation, combining the DPO technique with curriculum learning.
- Introduced a preference-ranking-based synthetic dataset for Hindi news explanations, with preferred responses scraped from multiple websites and rejected responses generated by several LLMs.
- Fine-tuned several LLMs (Gemma2, Llama3.2, and Mistral) and PLMs (mBART and mT5) for the alignment task, achieving a best METEOR score of 31.74 and a best BERTScore of 80.02.
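For readers unfamiliar with DPO, the sketch below shows the standard per-pair DPO loss together with a simple easy-to-hard ordering of training pairs. It is illustrative only: it is not the improved Hin-DPO loss described above, and the `difficulty` field, the function names, and the default `beta` are assumptions made for this example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (not the Hin-DPO variant)."""
    # Margin: how much more the policy favors the chosen response over the
    # rejected one, measured relative to the frozen reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def curriculum_order(pairs):
    """Order preference pairs from easy to hard.

    Each pair is a dict with a hypothetical 'difficulty' score; how that
    score is computed is left open here.
    """
    return sorted(pairs, key=lambda pair: pair["difficulty"])

# Toy usage with made-up log-probabilities and difficulty scores.
pairs = [
    {"id": "b", "difficulty": 0.9},
    {"id": "a", "difficulty": 0.2},
]
ordered_ids = [pair["id"] for pair in curriculum_order(pairs)]
loss = dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=1.0)
```

The loss falls below log 2 when the policy prefers the chosen response more strongly than the reference model does, and rises above it otherwise.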
May 2023 - June 2023
LTRC Lab, IIIT Hyderabad
Research Intern
- Collaborated with Dr. Manish Shrivastava on Natural Language Generation from graph representations, including AMR and Universal Dependency (UD) relations, for Indian languages.
- Fine-tuned the BART and IndicBART models by transforming graphs into a linearized format using Preorder, Postorder, and Inorder traversal techniques.
- Integrated the attention mask with the graph's adjacency matrix to improve token relationship modeling, achieving METEOR scores of 0.76 for Hindi and 0.82 for English in UD Relations.
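The two ideas in the bullets above, linearizing a graph by traversal and tying the attention mask to the adjacency matrix, can be sketched on a toy dependency tree. This is a minimal illustration under stated assumptions, not the project's implementation: node labels are assumed unique, the tree is given as a parent-to-children dict, and the mask simply allows each token to attend to itself and its direct neighbors, which is only one plausible way to combine the two.

```python
def preorder_linearize(tree, root):
    """Return node labels in preorder (parent before its children)."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return order

def adjacency_attention_mask(tree, order):
    """Build a 0/1 mask where mask[i][j] == 1 if token i may attend to token j.

    A token may attend to itself and to nodes it shares an edge with.
    """
    index = {node: i for i, node in enumerate(order)}
    n = len(order)
    mask = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for parent, children in tree.items():
        for child in children:
            p, c = index[parent], index[child]
            mask[p][c] = 1
            mask[c][p] = 1
    return mask

# Toy dependency tree for "She reads books": the verb heads both nouns.
tree = {"reads": ["She", "books"]}
order = preorder_linearize(tree, "reads")
mask = adjacency_attention_mask(tree, order)
```

In this toy example the two nouns do not attend to each other because no edge connects them. A real model might instead add the adjacency information as a soft bias on the attention scores.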
Publications
Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
Association for Computational Linguistics (ACL 2025)
Status: Accepted
The Preference is in the Details: Text-to-Image Preference Alignment with Fine-grained Visual Cues
Winter Conference on Applications of Computer Vision (WACV 2025)
Status: Under Review
Sifting Truth from Spectacle! A Multimodal Hindi Dataset for Misinformation Detection...
IEEE Transactions on Affective Computing
Status: Under Review
From Generation to Detection: Multimodal Generative AI and the Threat of Automated Misinformation
European Conference on Artificial Intelligence (ECAI 2025)
Status: Under Review
From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations
Transactions on Machine Learning Research (TMLR)
Status: Under Review
GANtleTRUTH: Generating Polite, Fluent, and Factual Counter-Narratives for Fake News Using GANs
Empirical Methods in Natural Language Processing (EMNLP 2025)
Status: Under Review
Projects
Multiturn Chatbot with Query Expansion
A conversational AI built by fine-tuning Llama2-7b with LoRA/QLoRA, capable of query expansion and domain prediction, with Gemini-1.5-pro used for conversation summarization.
Satellite Image To Map Translator
A Pix2Pix GAN with an Attention U-Net generator and PatchGAN discriminator, enhanced with Deformable Convolutions, to translate satellite images into map visuals with an SSIM score of 0.78.