Pulkit Bansal - AI Researcher

About Me

I am a recent graduate of the Indian Institute of Technology (IIT) Patna with a Bachelor of Science in Mathematics and Computing and a CPI of 8.92/10.0. My coursework and research have given me a strong foundation in computer science, mathematics, and machine learning.

My primary research interests are Natural Language Processing, Computer Vision, and Generative AI. I have interned at research labs including Fujitsu Research, TCS Research, and IIIT Hyderabad, where I contributed to projects on multi-modal models, diffusion models, and NLP for Indian languages.

Education

Indian Institute of Technology (IIT) Patna

B.S. in Mathematics and Computing

2021 - 2025

CPI: 8.92/10.0

Experience

Jan 2025 - July 2025

Fujitsu Research of India

AI-Research Intern

Worked on improving the ability of Multi-modal Large Language Models (MLLMs) to locate and reason about fine-grained details within complex documents.
Introduced Needle in Images (NiM), a novel benchmark with 1,180 Q/A pairs drawn from newspapers, academic papers, restaurant menus, lecture slides, and magazines.
Worked on Spot-IT, a novel methodology based on query-guided dynamic attention that enhances MLLM capabilities, achieving a 21.05% improvement on the NiM benchmark with GPT-4o.
Worked on a new library that makes multi-agent workflows easier to build and control, and used it to build workflows for applications such as AI Tutor and AI Scientist.

July 2024 - Aug 2024

TCS Research & Innovation Lab

Research Intern

Researched fine-grained preference alignment in text-to-image diffusion models by developing a controllable synthetic data generation framework that simulates diverse failure modes.
Designed PreFine, a rule-based synthetic pipeline that applies structured perturbations to high-quality images and ranks them using dispersion-aware scoring and difficulty-based curriculum sampling.
Fine-tuned Stable Diffusion 1.5 and SDXL-Base using Diffusion-DPO with LoRA on the generated dataset, achieving gains of up to +15.2 in ImageReward win rate and +13.0 in Aesthetic Score.

Aug 2023 - Feb 2024

AI-NLP Lab, IIT Patna

Undergraduate Research Assistant

Proposed Hin-DPO, a new framework with an improved loss function for Hindi news explanation generation, combining the DPO technique with curriculum learning.
Introduced a preference-ranking-based synthetic dataset for Hindi news explanations, with preferred responses scraped from various websites and rejected responses generated by various LLMs.
Fine-tuned several LLMs (Gemma2, Llama3.2, and Mistral) and PLMs (mBART and mT5) for the alignment task, achieving a best METEOR score of 31.74 and a BERTScore of 80.02.

May 2023 - June 2023

LTRC Lab, IIIT Hyderabad

Research Intern

Collaborated with Dr. Manish Shrivastava on a project on natural language generation from graph representations, including Abstract Meaning Representation (AMR) and Universal Dependency (UD) relations, for Indian languages.
Linearized the graphs using preorder, postorder, and inorder traversals and fine-tuned the BART and IndicBART models on the linearized input.
Integrated the attention mask with the graph's adjacency matrix to improve token relationship modeling, achieving METEOR scores of 0.76 for Hindi and 0.82 for English in UD Relations.

Publications

View Google Scholar Profile

Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?

Association for Computational Linguistics (ACL 2025)

Status: Accepted

The Preference is in the Details: Text-to-Image Preference Alignment with Fine-grained Visual Cues

Winter Conference on Applications of Computer Vision (WACV 2025)

Status: Under Review

Sifting Truth from Spectacle! A Multimodal Hindi Dataset for Misinformation Detection...

IEEE Transactions on Affective Computing

Status: Under Review

From Generation to Detection: Multimodal Generative AI and the Threat of Automated Misinformation

European Conference on Artificial Intelligence (ECAI 2025)

Status: Under Review

From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations

Transactions on Machine Learning Research (TMLR)

Status: Under Review

GANtleTRUTH: Generating Polite, Fluent, and Factual Counter-Narratives for Fake News Using GANs

Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)

Status: Under Review

Projects

Multiturn Chatbot with Query Expansion

A multi-turn conversational assistant built by fine-tuning Llama2-7b with LoRA/QLoRA, capable of query expansion and domain prediction, with Gemini-1.5-pro used for conversation summarization.

Python, PyTorch, Hugging Face, LLMs

View Source Code

Satellite Image To Map Translator

A Pix2Pix GAN with an Attention U-Net generator and a PatchGAN discriminator, enhanced with deformable convolutions, that translates satellite images into map images and achieves an SSIM score of 0.78.

Python, PyTorch, GANs, Computer Vision

View Source Code

Skills

Programming & Core CS

C/C++, Python, R, MATLAB, Data Structures & Algorithms, OOP

Machine Learning & Deep Learning

PyTorch, TensorFlow, Hugging Face, scikit-learn, NumPy, pandas, Convex Optimization

Tools & Platforms

Git, Jupyter Notebook, Google Colab, VS Code