I’m a CS Ph.D. student @University of Southern California, advised by Prof. Jieyu Zhao. Before that, I was an M.Eng. student at the Graduate School of Creative Science and Engineering @Waseda University (早稲田大学), Tokyo, supervised by Prof. Masayuki Goto (Japanese only). I also spent time as a research assistant @University of Maryland, advised by Prof. Tianyi Zhou, and @University of Tokyo (東京大学), advised by Prof. Irene Li. I also work closely with Jieyu Zhang, who focuses on interactive and data-centric AI/ML.
Research Interests: My research interests lie in natural language processing and synthetic data. Specifically, I’m trying to answer the following questions:
- How can we comprehensively evaluate an LLM/VLM in different domains?
- How can we extend the abilities of LLMs/VLMs at minimal cost?
- How can we let LLMs/VLMs collaborate safely, efficiently, and effectively to solve real-world problems?
📢 News
[04/08/2025] A new preprint has been released. Check out Efficient Reinforcement Finetuning via Adaptive Curriculum Learning for more details!
[03/31/2025] A new preprint has been released. Check out Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base for more details!
[04/10/2024] I will join CS@USC as a PhD student this fall!
📝 Selected Publications
(* denotes equal contribution)
Improving Language Models
- The Hallucination Tax of Reinforcement Finetuning
Linxin Song*, Taiwei Shi*, Jieyu Zhao
- Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi, Yiyang Wu, Linxin Song, Tianyi Zhou, Jieyu Zhao
- ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
Jieyu Zhang, Le Xue, Linxin Song, Jun Wang, Weikai Huang, Manli Shu, An Yan, Zixian Ma, Juan Carlos Niebles, Silvio Savarese, Caiming Xiong, Zeyuan Chen, Ranjay Krishna, Ran Xu
Posted by: VentureBeat | MarkTechPost
- Investigating the Scaling Effect of Instruction Templates for Training Multimodal Language Model
Shijian Wang*, Linxin Song*, Jieyu Zhang, Ryotaro Shimizu, Ao Luo, Li Yao, Cunjian Chen, Julian McAuley, Haiqian Wu
Language Model Evaluation
- Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song, Xuwei Ding, Jieyu Zhang, Taiwei Shi, Ryotaro Shimizu, Rahul Gupta, Yang Liu, Jian Kang, Jieyu Zhao
Webpage
- NLPBench: Evaluating Large Language Models on Solving NLP Problems
Linxin Song, Jieyu Zhang, Lechao Cheng, Pengyuan Zhou, Tianyi Zhou, Irene Li
ITIF @ NeurIPS 2023.
- Explaining Length Bias in LLM-Based Preference Evaluations
Zhengyu Hu, Linxin Song, Jieyu Zhang, Zheyuan Xiao, Jingang Wang, Zhenyu Chen, Hui Xiong
Language Model Agent
- Adaptive In-conversation Team Building for Language Model Agents
Linxin Song*, Jiale Liu*, Jieyu Zhang, Shaokun Zhang, Ao Luo, Shijian Wang, Qingyun Wu, Chi Wang
- Offline Training of Language Model Agents with Functions as Learnable Weights
Shaokun Zhang*, Jieyu Zhang*, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu
ICML 2024.
Before PhD
- Better Explain Transformers by Illuminating Important Information
Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li
EACL 2024 (findings).
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision
Jieyu Zhang*, Linxin Song*, Alexander Ratner
AISTATS 2023.
- Adaptive Ranking-based Sample Selection for Weakly Supervised Class-imbalanced Text Classification
Linxin Song, Jieyu Zhang, Tianxiang Yang, Masayuki Goto
EMNLP 2022 (findings).
🧑‍🏫 Teaching
- (TA) DSCI-250: Introduction to Data Science, 2024 Fall
- (TA) DSCI-566: Deep Learning and its Applications, 2025 Spring
👨‍💻 Internships
- Salesforce Research - Research Intern
2025.05-now
🏅 Professional Services
- Maintainer of AG2 (AutoGen).
- Reviewer (for multiple years): WACV, KDD, NeurIPS, DMLR, ICLR, AISTATS, ACL, EMNLP, etc.