2026

FIRE-Bench

FIRE-Bench: Evaluating Research Agents on the Rediscovery of Scientific Insights

Zhen Wang*, Fan Bai*, Zhongyan Luo*, Jinyan Su, Kaiser Sun, Xinle Yu, Jieyuan Liu, Kun Zhou, Claire Cardie, Mark Dredze, Eric P. Xing, Zhiting Hu

Preprint

A benchmark for reliably evaluating research agents on their ability to rediscover scientific insights.

Nabla Reasoner

Nabla Reasoner: LLM Reasoning via Test-Time Gradient Descent in Textual Space

Peihao Wang, Ruisi Cai, Zhen Wang, Hongyuan Mei, Qiang Liu, Pan Li, Zhangyang Wang

ICLR 2026

A novel approach to LLM reasoning through test-time gradient descent operations in textual space.

Simulating Humans

Simulating Humans for Personalized Language Modeling

Jinzhou Tang, Yufan Zhou, Zixuan Wang, Xinle Yu, Zhaoxiang Feng, Steven Ngo, Zhengding Hu, Luoshang Pan, Lianhui Qin, Yufei Ding, Tianmin Shu, Jingbo Shang, Zhiting Hu, Zhen Wang

In submission

A framework for simulating human behavior to enable personalized language modeling.

HypoEvolve

HypoEvolve: When Genetic Algorithm Meets Multi-Agents for Discovering Scientific Hypothesis

Jieyuan Liu, Mengzhou Hu, Jefferson Chen, Hsin-Yuan Lee, Dexter Pratt, Lianhui Qin, Trey Ideker, Zhiting Hu, Eric P. Xing, Wei Wang, Zhen Wang

In submission

A novel approach combining genetic algorithms with multi-agent systems for automated scientific hypothesis discovery.

TritonDFT

TritonDFT: Automating DFT with a Multi-Agent Framework

Zhengding Hu, Kuntal Talit, Zhen Wang, Haseeb Ahmad, Yichen Lin, Prabhleen Kaur, Christopher Lane, Elizabeth A. Peterson, Zhiting Hu, Elizabeth A. Nowadnick, Yufei Ding

In submission

A multi-agent framework for automating Density Functional Theory (DFT) calculations.

CellMaster.AI

CellMaster.AI: A Collaborative AI Scientist Agent for Cell Type Annotation in Single-Cell Transcriptomics Analysis

Zhen Wang*, Yiming Gao*, Jieyuan Liu*, Enze Ma, Jefferson Chen, Mark Antkowiak, Mengzhou Hu, JungHo Kong, Dexter Pratt, Zhiting Hu, Wei Wang, Trey Ideker, Eric P. Xing

In submission to ISCB 2026

A collaborative AI scientist agent for automated cell type annotation in single-cell transcriptomics analysis.

M3 Memory

M³: Multi-Tier Memory Managing System for Agentic LLM Serving

Zhengding Hu, Zaifeng Pan, Prabhleen Kaur, Vibha Murthy, Zhongkai Yu, Yue Guan, Zhen Wang, Steven Swanson, Yufei Ding

In submission to OSDI 2026

A multi-tier memory managing system designed for efficient agentic LLM serving.

Modal-mixed CoT

Learning Modal-mixed Chain-of-thought Reasoning with Latent Embedding

Yifei Shao, Kun Zhou, Mohammad Atif Quamar, Ziming Xu, Shibo Hao, Zhen Wang, Zhiting Hu, Biwei Huang

In submission to ICLR 2026

Learning chain-of-thought reasoning that seamlessly mixes different modalities through latent embeddings.

2025

scPilot

scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

Yiming Gao*, Zhen Wang*†, Jefferson Chen, Mark Antkowiak, Mengzhou Hu, JungHo Kong, Dexter Pratt, Jieyuan Liu, Enze Ma, Zhiting Hu, Eric P. Xing

NeurIPS 2025

The first omics-native reasoning agent that grounds LLMs in raw single-cell data for automated analysis and biological discovery.

Nature 2025

Atlas-Guided Discovery of Transcription Factors for T Cell Programming

H. Kay Chung, Cong Liu, Anamika Battu, ... Zhen Wang, Jieyuan Liu, Yiming Gao, Zhiting Hu, ... Wei Wang

Nature 2025 (In Press)

Contributed TaijiChat, a Paper Copilot for multi-omics discovery of transcription factors for T cell programming.

MutationProjector

A Foundation Model of Cancer Genotype Enables Precise Predictions of Therapeutic Response

JungHo Kong, Ingoo Lee, Dean Boecher, Akshat Singhal, Marcus Kelly, Jimin Moon, Chang Ho Ahn, Chan-Young Ock, Tannavee Kumar, Timothy John Sears, David Laub, Sarah Wright, Patrick Wall, Hannah Carter, Zhen Wang†, Trey Ideker† (†co-corresponding authors)

Under Revision at Cancer Discovery

The first cancer genomics foundation model for tumor mutation profiles, enabling precise predictions of therapeutic response.

DeepPersona

DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas

Zhen Wang*, Yufan Zhou*, Zhongyan Luo, Lyumanshan Ye, Adam Wood, Man Yao, Saab Mansour, Luoshang Pan

Spotlight at NeurIPS 2025 LAW Workshop

A generative engine for creating deep synthetic personas that enable realistic human simulation at scale.

Decentralized Arena

Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models

Yanbin Yin*, Kun Zhou*, Zhen Wang*, Xiangdong Zhang, Yifei Shao, Shibo Hao, Yi Gu, Jieyuan Liu, Somanshu Singla, Tianyang Liu, Eric P. Xing, Zhengzhong Liu, Haojian Jin, Zhiting Hu

In submission to ACL 2026

Democratic LLM benchmarking where models judge each other for scalable and fair evaluation.

Self-MoE

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter

ICLR 2025

Transforms monolithic LLMs into modular systems with self-specialized experts for compositional capabilities.

2024

DRPO

Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Somanshu Singla*, Zhen Wang*†, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing

EMNLP 2024 (Main, Long)

First tuning-free method for self-aligning LLMs with human preferences through dynamic rewarding and prompt optimization.

LLM Reasoners

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Shibo Hao*, Yi Gu*, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

COLM 2024 ⭐ 2.3k+ GitHub Stars

A library enabling LLMs to conduct complex reasoning with advanced algorithms, approaching multi-step reasoning as planning with world models and rewards.

PromptAgent

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Xinyuan Wang*, Chenxi Li*, Zhen Wang*†, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu

ICLR 2024

First principled framework to formalize API-based prompt optimization as planning with state, action, and reward; first to benchmark exploration efficiency and show transferability of optimized prompts.

GPT Turing Machine

GPT Is Becoming a Turing Machine: Here Are Some Ways to Program It

Ana Jojic, Zhen Wang, Nebojsa Jojic

ICLR 2024 AGI Workshop

Through appropriate prompting, GPT models can exhibit iterative behavior, executing (not just writing) programs with loops, including algorithms such as logical deduction, bubble sort, and longest common subsequence (LCS).

2023

RAP

Reasoning with Language Model is Planning with World Model

Shibo Hao*, Yi Gu*, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu

EMNLP 2023 (Oral, Main); Featured in State of AI Report 2023

RAP reformulates LLM reasoning as a planning problem, incorporating external world models and principled planning to balance exploration and exploitation.

ToolkenGPT

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu

NeurIPS 2023 Oral (Top 2%); Best Paper @ SoCal NLP 2023

Augments LLMs with massive tools/APIs by representing each tool as a token ("toolken"), enabling tool calls as naturally as word generation. Highly efficient: plugging in a new tool is as easy as learning its embedding.

ThinkSum

ThinkSum: Probabilistic Reasoning Over Sets Using Large Language Models

Batu Ozturkler, Nikolay Malkin, Zhen Wang, Nebojsa Jojic

ACL 2023 (Main)

A two-stage probabilistic inference paradigm to improve LLMs' reasoning over multiple objects through Think (retrieval) and Sum (aggregation), beating chain-of-thought on hard BIG-bench tasks.

MPT

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

ICLR 2023

Multitask Prompt Tuning (MPT) exploits rich cross-task knowledge for efficient and generalizable transfer learning through novel prompt decomposition and distillation.

MeeT

Frustratingly Simple Entity Tracking with Effective Use of Multi-Task Learning Models

Janvijay Singh, Fan Bai, Zhen Wang

EACL 2023 (Main)

Shows how to transfer multi-task knowledge from pre-training to niche downstream tasks like entity tracking, achieving SOTA by fine-tuning T5 with specialized QA prompts and task-specific decoding.

SIGDIAL 2023

Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System

Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

SIGDIAL 2023

A collaborative and engaging task-oriented dialogue system for multi-step cooking and home improvement tasks.

2022

Coherence Boosting

Coherence Boosting: When Your Pretrained Language Model is Not Paying Enough Attention

Nikolay Malkin, Zhen Wang, Nebojsa Jojic

ACL 2022 (Main, Long, Oral)

Demonstrates that LLMs have insufficiently learned the effect of distant words on next-token prediction. Coherence Boosting increases an LM's focus on long context, greatly improving performance on NLG and NLU tasks.

SimultQA

Knowledge Transfer between Structured and Unstructured Sources for Complex Question Answering

Lingbo Mo*, Zhen Wang*, Jie Zhao, Huan Sun

NAACL 2022 SUKI Workshop

Studies knowledge transfer for multi-hop reasoning between structured (KB) and unstructured (text) knowledge. SimultQA unifies KBQA and TextQA to study how reasoning transfers between knowledge sources.

Dissertation

Toward Knowledge-Centric NLP: Acquisition, Representation, Transfer, and Reasoning

Zhen Wang

Ph.D. Dissertation, The Ohio State University, 2022

Doctoral dissertation on building foundations for knowledge-centric AI systems through acquisition, representation, transfer, and reasoning.

2021

TacoBot

Bootstrapping a User-Centered Task-Oriented Dialogue System

Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

Alexa Prize TaskBot Challenge 2021 🏆 3rd Place Winner

TacoBot is a task-oriented dialogue system for cooking and home improvement tasks; it proposes data augmentation methods, including GPT-3 simulation, to bootstrap neural dialogue systems into new domains.

ConPI

Modeling Context Pair Interaction for Pairwise Tasks on Graphs

Zhen Wang, Bo Zong, Huan Sun

WSDM 2021 (Long)

Explicitly models context interactions for pairwise prediction on graphs through node-centric and pair-centric perspectives, with pre-trained pair embeddings to facilitate pair-centric modeling.

2020

X-MedRELA

Rationalizing Medical Relation Prediction from Corpus-level Statistics

Zhen Wang, Jennifer Lee, Simon Lin, Huan Sun

ACL 2020 (Main, Long)

A self-interpretable framework to rationalize neural relation prediction based on corpus-level statistics, inspired by human cognitive theory about recall and recognition, providing structured knowledge triplets as rationales.

BioNEV

Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations

Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon Lin, Wen Zhang, Ping Zhang, Huan Sun

Bioinformatics, Volume 36, Issue 4, February 2020

Benchmarks 11 representative graph embedding methods on five important biomedical tasks, verifying effectiveness and providing general guidelines for their usage.

2019

SurfCon

SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun

KDD 2019 (Research Track, Long, Oral)

Discovers structured knowledge (synonyms) from a privacy-aware clinical text corpus, leveraging both surface-form and context information to find out-of-distribution synonyms.

Before 2019

StaQC

A Comprehensive Study of StaQC for Deep Code Summarization

Jayavardhan Reddy Peddamail, Ziyu Yao, Zhen Wang, Huan Sun

KDD 2018 Deep Learning Day Spotlight

Examines three popular code summarization datasets mined from Stack Overflow, showing that StaQC (Stack Overflow Question-Code pairs) achieves substantially better results.

HessianSC

Hessian Regularized Sparse Coding for Human Action Recognition

Weifeng Liu, Zhen Wang, Dapeng Tao, Jun Yu

MMM 2015

Proposes Hessian regularized sparse coding (HessianSC) for human action recognition, preserving local geometry and encouraging the sparse codes to vary linearly along the data manifold.

Knowledge-Structured Representation Learning

MPT

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

ICLR 2023

Multitask Prompt Tuning (MPT) exploits rich cross-task knowledge for efficient and generalizable transfer learning through novel prompt decomposition and distillation.

Coherence Boosting

Coherence Boosting: When Your Pretrained Language Model is Not Paying Enough Attention

Nikolay Malkin, Zhen Wang, Nebojsa Jojic

ACL 2022 (Main, Long, Oral)

Demonstrates that LLMs have insufficiently learned the effect of distant words on next-token prediction. Coherence Boosting increases an LM's focus on long context, greatly improving performance on NLG and NLU tasks.

ConPI

Modeling Context Pair Interaction for Pairwise Tasks on Graphs

Zhen Wang, Bo Zong, Huan Sun

WSDM 2021 (Long)

Explicitly models context interactions for pairwise prediction on graphs through node-centric and pair-centric perspectives, with pre-trained pair embeddings to facilitate pair-centric modeling.

X-MedRELA

Rationalizing Medical Relation Prediction from Corpus-level Statistics

Zhen Wang, Jennifer Lee, Simon Lin, Huan Sun

ACL 2020 (Main, Long)

A self-interpretable framework to rationalize neural relation prediction based on corpus-level statistics, inspired by human cognitive theory about recall and recognition, providing structured knowledge triplets as rationales.

SurfCon

SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun

KDD 2019 (Research Track, Long, Oral)

Discovers structured knowledge (synonyms) from a privacy-aware clinical text corpus, leveraging both surface-form and context information to find out-of-distribution synonyms.

SimultQA

Knowledge Transfer between Structured and Unstructured Sources for Complex Question Answering

Lingbo Mo*, Zhen Wang*, Jie Zhao, Huan Sun

NAACL 2022 SUKI Workshop

Studies knowledge transfer for multi-hop reasoning between structured (KB) and unstructured (text) knowledge. SimultQA unifies KBQA and TextQA to study how reasoning transfers between knowledge sources.

StaQC

A Comprehensive Study of StaQC for Deep Code Summarization

Jayavardhan Reddy Peddamail, Ziyu Yao, Zhen Wang, Huan Sun

KDD 2018 Deep Learning Day Spotlight

Examines three popular code summarization datasets mined from Stack Overflow, showing that StaQC (Stack Overflow Question-Code pairs) achieves substantially better results.

Efficient Training and Adaptation of Foundation Models

Self-MoE

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter

ICLR 2025

Transforms monolithic LLMs into modular systems with self-specialized experts for compositional capabilities.

MPT

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

ICLR 2023

Multitask Prompt Tuning (MPT) exploits rich cross-task knowledge for efficient and generalizable transfer learning through novel prompt decomposition and distillation.

ToolkenGPT

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu

NeurIPS 2023 Oral (Top 2%); Best Paper @ SoCal NLP 2023

Augments LLMs with massive tools/APIs by representing each tool as a token ("toolken"), enabling tool calls as naturally as word generation. Highly efficient: plugging in a new tool is as easy as learning its embedding.

MeeT

Frustratingly Simple Entity Tracking with Effective Use of Multi-Task Learning Models

Janvijay Singh, Fan Bai, Zhen Wang

EACL 2023 (Main)

Shows how to transfer multi-task knowledge from pre-training to niche downstream tasks like entity tracking, achieving SOTA by fine-tuning T5 with specialized QA prompts and task-specific decoding.

M3 Memory

M³: Multi-Tier Memory Managing System for Agentic LLM Serving

Zhengding Hu, Zaifeng Pan, Prabhleen Kaur, Vibha Murthy, Zhongkai Yu, Yue Guan, Zhen Wang, Steven Swanson, Yufei Ding

In submission to OSDI 2026

A multi-tier memory managing system designed for efficient agentic LLM serving.

Agentic Reasoning and Planning with World Models

scPilot

scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

Yiming Gao*, Zhen Wang*†, Jefferson Chen, Mark Antkowiak, Mengzhou Hu, JungHo Kong, Dexter Pratt, Jieyuan Liu, Enze Ma, Zhiting Hu, Eric P. Xing

NeurIPS 2025

The first omics-native reasoning agent that grounds LLMs in raw single-cell data for automated analysis and biological discovery.

CellMaster.AI

CellMaster.AI: A Collaborative AI Scientist Agent for Cell Type Annotation in Single-Cell Transcriptomics Analysis

Zhen Wang, Yiming Gao, Jieyuan Liu, Mark Antkowiak, Enze Ma, Ding Bai, JungHo Kong, Mengzhou Hu, Dexter Pratt, Jefferson Chen, Jiajun Zhu, Trey Ideker, Zhiting Hu, Eric P. Xing

In submission to ISCB 2026

A collaborative AI scientist agent for automated cell type annotation in single-cell transcriptomics analysis.

Nabla Reasoner

Nabla Reasoner: LLM Reasoning via Test-Time Gradient Descent in Textual Space

Peihao Wang, Ruisi Cai, Zhen Wang, Hongyuan Mei, Qiang Liu, Pan Li, Zhangyang Wang

ICLR 2026

A novel approach to LLM reasoning through test-time gradient descent operations in textual space.

Modal-mixed CoT

Learning Modal-mixed Chain-of-thought Reasoning with Latent Embedding

Yifei Shao, Kun Zhou, Mohammad Atif Quamar, Ziming Xu, Shibo Hao, Zhen Wang, Zhiting Hu, Biwei Huang

In submission to ICLR 2026

Learning chain-of-thought reasoning that seamlessly mixes different modalities through latent embeddings.

LLM Reasoners

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Shibo Hao*, Yi Gu*, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

COLM 2024 ⭐ 2.3k+ GitHub Stars

A library enabling LLMs to conduct complex reasoning with advanced algorithms, approaching multi-step reasoning as planning with world models and rewards.

PromptAgent

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Xinyuan Wang*, Chenxi Li*, Zhen Wang*†, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu

ICLR 2024

First principled framework to formalize API-based prompt optimization as planning with state, action, and reward; first to benchmark exploration efficiency and show transferability of optimized prompts.

RAP

Reasoning with Language Model is Planning with World Model

Shibo Hao*, Yi Gu*, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu

EMNLP 2023 (Oral, Main); Featured in State of AI Report 2023

RAP reformulates LLM reasoning as a planning problem, incorporating external world models and principled planning to balance exploration and exploitation.

ToolkenGPT

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu

NeurIPS 2023 Oral (Top 2%); Best Paper @ SoCal NLP 2023

Augments LLMs with massive tools/APIs by representing each tool as a token ("toolken"), enabling tool calls as naturally as word generation. Highly efficient: plugging in a new tool is as easy as learning its embedding.

ThinkSum

ThinkSum: Probabilistic Reasoning Over Sets Using Large Language Models

Batu Ozturkler, Nikolay Malkin, Zhen Wang, Nebojsa Jojic

ACL 2023 (Main)

A two-stage probabilistic inference paradigm to improve LLMs' reasoning over multiple objects through Think (retrieval) and Sum (aggregation), beating chain-of-thought on hard BIG-bench tasks.

GPT Turing Machine

GPT Is Becoming a Turing Machine: Here Are Some Ways to Program It

Ana Jojic, Zhen Wang, Nebojsa Jojic

ICLR 2024 AGI Workshop

Through appropriate prompting, GPT models can exhibit iterative behavior, executing (not just writing) programs with loops, including algorithms such as logical deduction, bubble sort, and longest common subsequence (LCS).

Human-Aligned Learning & Evaluation

DeepPersona

DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas

Zhen Wang*, Yufan Zhou*, Zhongyan Luo, Lyumanshan Ye, Adam Wood, Man Yao, Saab Mansour, Luoshang Pan

Spotlight at NeurIPS 2025 LAW Workshop; In submission to ICLR 2026

A generative engine for creating deep synthetic personas that enable realistic human simulation at scale.

FIRE-Bench

FIRE-Bench: Evaluating Research Agents on the Rediscovery of Scientific Insights

Zhen Wang*, Fan Bai*, Zhongyan Luo*, Jinyan Su, Kaiser Sun, Weiqi Liu, Albert Chen, Jieyuan Liu, Kun Zhou, Claire Cardie, Mark Dredze, Eric P. Xing, Zhiting Hu

In submission to ICLR 2026

A benchmark for reliably evaluating research agents on their ability to rediscover scientific insights.

Decentralized Arena

Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models

Yanbin Yin*, Kun Zhou*, Zhen Wang*, Xiangdong Zhang, Yifei Shao, Shibo Hao, Yi Gu, Jieyuan Liu, Somanshu Singla, Tianyang Liu, Eric P. Xing, Zhengzhong Liu, Haojian Jin, Zhiting Hu

In submission to ACL 2026

Democratic LLM benchmarking where models judge each other for scalable and fair evaluation.

DRPO

Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Somanshu Singla*, Zhen Wang*†, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing

EMNLP 2024 (Main, Long)

First tuning-free method for self-aligning LLMs with human preferences through dynamic rewarding and prompt optimization.

TacoBot

Bootstrapping a User-Centered Task-Oriented Dialogue System

Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

Alexa Prize TaskBot Challenge 2021 🏆 3rd Place Winner

TacoBot, a task-oriented dialogue system for cooking and home improvement tasks. Proposes data augmentation methods including GPT-3 simulation to bootstrap neural dialogue systems into new domains.

SIGDIAL 2023

Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System

Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

SIGDIAL 2023

A collaborative and engaging task-oriented dialogue system for multi-step cooking and home improvement tasks.

Scientific Foundation Models & AI Scientists

scPilot

scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

Yiming Gao*, Zhen Wang*†, Jefferson Chen, Mark Antkowiak, Mengzhou Hu, JungHo Kong, Dexter Pratt, Jieyuan Liu, Enze Ma, Zhiting Hu, Eric P. Xing

NeurIPS 2025

The first omics-native reasoning agent that grounds LLMs in raw single-cell data for automated analysis and biological discovery.

CellMaster.AI

CellMaster.AI: A Collaborative AI Scientist Agent for Cell Type Annotation in Single-Cell Transcriptomics Analysis

Zhen Wang, Yiming Gao, Jieyuan Liu, Mark Antkowiak, Enze Ma, Ding Bai, JungHo Kong, Mengzhou Hu, Dexter Pratt, Jefferson Chen, Jiajun Zhu, Trey Ideker, Zhiting Hu, Eric P. Xing

In submission to ISCB 2026

A collaborative AI scientist agent for automated cell type annotation in single-cell transcriptomics analysis.

Nature 2025

Atlas-Guided Discovery of Transcription Factors for T Cell Programming

H. Kay Chung, Cong Liu, ... Zhen Wang, Jieyuan Liu, Yiming Gao, Zhiting Hu, ... Wei Wang

Nature 2025 (In Press)

Contributed TaijiChat, a Paper Copilot for multi-omics discovery of transcription factors for T cell programming.

MutationProjector

A Foundation Model of Cancer Genotype Enables Precise Predictions of Therapeutic Response

JungHo Kong, Ingoo Lee, ... Hannah Carter, Zhen Wang†, Trey Ideker†

Under Revision at Cancer Discovery

The first cancer genomics foundation model for tumor mutation profiles, enabling precise predictions of therapeutic response.

BioNEV

Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations

Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon Lin, Wen Zhang, Ping Zhang, Huan Sun

Bioinformatics, Volume 36, Issue 4, February 2020

Benchmarks 11 representative graph embedding methods on five important biomedical tasks, verifying effectiveness and providing general guidelines for their usage.

X-MedRELA

Rationalizing Medical Relation Prediction from Corpus-level Statistics

Zhen Wang, Jennifer Lee, Simon Lin, Huan Sun

ACL 2020 (Main, Long)

A self-interpretable framework to rationalize neural relation prediction based on corpus-level statistics, inspired by human cognitive theory about recall and recognition, providing structured knowledge triplets as rationales.

SurfCon

SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun

KDD 2019 (Research Track, Long, Oral)

Discovers structured knowledge (synonyms) from a privacy-aware clinical text corpus, leveraging both surface-form and context information to find out-of-distribution synonyms.