Zhen Wang

Publications

2025

Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models

Yanbin Yin*, Kun Zhou*, Zhen Wang*, Xiangdong Zhang, Yifei Shao, Shibo Hao, Yi Gu, Jieyuan Liu, Somanshu Singla, Tianyang Liu, Eric P Xing, Zhengzhong Liu, Haojian Jin, Zhiting Hu
PDF

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter
PDF (ICLR 2025)

2024

Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Somanshu Singla*, Zhen Wang*, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing
PDF (EMNLP 2024)

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Shibo Hao*, Yi Gu*, Haotian Luo*, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu
[COLM 2024] 1st Conference on Language Modeling
Also presented at ICLR 2024 Workshop LLMAgents
PDF / Code

LLM Reasoners is a library that enables LLMs to conduct complex reasoning with advanced reasoning algorithms. It treats multi-step reasoning as planning and searches for the optimal reasoning chain, balancing exploration and exploitation using the ideas of a "World Model" and a "Reward."

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Xinyuan Wang*, Chenxi Li*, Zhen Wang*, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric Xing, Zhiting Hu
[ICLR 2024] The Twelfth International Conference on Learning Representations
Also presented at SoCal NLP 2023
PDF / Code / Slides / Poster

Tired of manual prompt engineering? PromptAgent offers the first principled framework to formalize the problem of API-based prompt optimization (state, action, reward, etc.); it is also the first to benchmark exploration efficiency and show the transferability of optimized prompts. With expert-level prompting as the target, there are many exciting directions ahead for PromptAgent!

GPT Is Becoming a Turing Machine: Here Are Some Ways to Program It

Ana Jojic, Zhen Wang, Nebojsa Jojic
Presented at the ICLR 2024 AGI Workshop
PDF / Code

Through appropriate prompting, GPT models can be triggered to perform iterative behaviors necessary to execute (rather than just write or recall) programs that involve loops, including several popular algorithms found in computer science curricula, e.g., logical deduction, bubble sort, longest common subsequence, etc.

2023

Reasoning with Language Model is Planning with World Model

Shibo Hao*, Yi Gu*, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu
[EMNLP 2023] (Oral, Main) The 2023 Conference on Empirical Methods in Natural Language Processing
Also presented at NeurIPS GenPlan'23 Workshop / SoCal NLP 2023
PDF / Code / Slides / Poster / Featured in State of AI Report 2023

LLMs lack internal world models for effective reasoning. Reasoning via Planning (RAP) reformulates LLM reasoning as a planning problem, thus incorporating an external world model and principled planning seamlessly. This is a new framework applicable across varying tasks and an exciting direction for LLM augmentation research.

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu
[NeurIPS 2023] (Oral) Thirty-seventh Conference on Neural Information Processing Systems
Also presented at SoCal NLP 2023, Best Paper Award
PDF / Code / Slides / Poster

ToolkenGPT augments LLMs with massive tools/APIs by representing tools as tokens (“toolken”) and enabling tool calls in the same way as generating regular words. ToolkenGPT is super efficient for learning massive tools, as plugging in new tools is as easy as learning embeddings.

ThinkSum: Probabilistic Reasoning Over Sets Using Large Language Models

Batu Ozturkler, Nikolay Malkin, Zhen Wang, Nebojsa Jojic
[ACL 2023] The 61st Annual Meeting of the Association for Computational Linguistics (Main)
PDF / Code / Slides / Poster

We propose ThinkSum, a two-stage probabilistic inference paradigm that improves LLMs' ability to reason over multiple objects: Think (e.g., retrieval of associations) and Sum (e.g., aggregation of results). It beats chain-of-thought prompting on hard BIG-bench tasks.

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
[ICLR 2023] The Eleventh International Conference on Learning Representations
PDF / Code / Slides / Poster / Huggingface PEFT PR

We propose Multitask Prompt Tuning (MPT) to exploit rich cross-task knowledge for more efficient and generalizable transfer learning. MPT learns a single transferable soft prompt through a novel combination of prompt decomposition and prompt distillation.

Entity Tracking via Effective Use of Multi-Task Learning Models

Janvijay Singh, Fan Bai, Zhen Wang
[EACL 2023] The 17th Conference of the European Chapter of the Association for Computational Linguistics (Main)
PDF / Code / Slides / Poster

How can multi-task knowledge be transferred from pre-training to niche downstream tasks, such as entity tracking on procedural text? We show that you can reach SOTA performance by simply fine-tuning T5 with specialized QA prompts and task-specific decoding.

2022

Toward Knowledge-Centric NLP: Acquisition, Representation, Transfer, and Reasoning

Zhen Wang
The Ohio State University, Ph.D. Dissertation, 2022
PDF

Coherence Boosting: When Your Pretrained Language Model is Not Paying Enough Attention

Nikolay Malkin, Zhen Wang, Nebojsa Jojic
[ACL 2022] The 60th Annual Meeting of the Association for Computational Linguistics
PDF / Code / Slides / Poster (Long Paper, Oral Presentation)

We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction. We present Coherence Boosting, an inference procedure that increases an LM’s focus on a long context, which greatly improves performance on NLG and NLU tasks.

Knowledge Transfer between Structured and Unstructured Sources for Complex Question Answering

Lingbo Mo*, Zhen Wang*, Jie Zhao, Huan Sun
[SUKI@NAACL 2022] NAACL 2022 Structured and Unstructured Knowledge Integration
PDF / Code / Slides / Poster *Equal contribution

We study knowledge transfer for multi-hop reasoning processes between structured (Knowledge Base) and unstructured (text corpus) knowledge. We design SimultQA, which unifies KBQA and TextQA systems, and use it to study how reasoning is transferred between the two knowledge sources.

2021

Bootstrapping a User-Centered Task-Oriented Dialogue System

Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
[Alexa Prize TaskBot Challenge] 1st Proceedings of Alexa Prize TaskBot (Alexa Prize 2021)
PDF / Third-place honor in the TaskBot Finals!

We build TacoBot, a task-oriented dialogue system for the inaugural Alexa Prize TaskBot Challenge, to assist users in multi-step cooking and home improvement tasks. We propose several data augmentation methods, such as GPT-3 simulation, to bootstrap neural dialogue systems into new domains and make them more robust to noisy user initiatives.

Modeling Context Pair Interaction for Pairwise Tasks on Graphs

Zhen Wang, Bo Zong, Huan Sun
[WSDM 2021] The 14th ACM International Conference on Web Search and Data Mining
PDF / Code / Slides / Poster (Long Paper, Online Presentation)

We propose to explicitly model context interactions for pairwise prediction tasks on graphs from two perspectives: node-centric and pair-centric. We also propose to pre-train pair embeddings to facilitate the pair-centric model.

2020

Rationalizing Medical Relation Prediction from Corpus-level Statistics

Zhen Wang, Jennifer Lee, Simon Lin, Huan Sun
[ACL 2020] The 58th Annual Meeting of the Association for Computational Linguistics
PDF / Code / Slides / Poster / Video (Long Paper, Online Presentation)

We propose a self-interpretable framework to rationalize neural relation prediction based on corpus-level statistics. Inspired by human cognitive theory about recall and recognition, the framework provides structured knowledge triplets as rationales.

Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations

Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon Lin, Wen Zhang, Ping Zhang, Huan Sun
[Bioinformatics] Volume 36, Issue 4, 15 February 2020, Pages 1241-1251
PDF / Code / Slides / Poster

We benchmark 11 representative graph embedding methods on five important biomedical tasks. We verify the effectiveness of recent graph embedding methods and provide general guidelines for their usage.

2019

SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun
[KDD 2019] The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PDF / Code / Slides / Poster (Research Track, Long Paper, Oral Presentation)

We propose to discover structured knowledge (synonyms) from a privacy-aware text corpus and present a novel framework that leverages both surface form and context information to discover out-of-distribution synonyms.

Before 2019

A Comprehensive Study of StaQC for Deep Code Summarization

Jayavardhan Reddy Peddamail, Ziyu Yao, Zhen Wang, Huan Sun
[KDD 2018 Deep Learning Day] The 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PDF / Code / Slides / Poster (SPOTLIGHT)

We examine three popular datasets mined from Stack Overflow on the code summarization task and show that StaQC (Stack Overflow Question-Code pairs) helps achieve substantially better results.

Hessian Regularized Sparse Coding for Human Action Recognition

Weifeng Liu, Zhen Wang, Dapeng Tao, Jun Yu
[MMM 2015] The 21st International Conference on Multimedia Modeling
PDF / Code / Slides / Poster / Bibtex

@inproceedings{liu2015hessian,
  title={Hessian regularized sparse coding for human action recognition},
  author={Liu, Weifeng and Wang, Zhen and Tao, Dapeng and Yu, Jun},
  booktitle={International Conference on Multimedia Modeling},
  pages={502--511},
  year={2015},
  organization={Springer}
}

We propose Hessian regularized sparse coding (HessianSC) for action recognition, which preserves the local geometry well and steers the sparse codes to vary linearly along the manifold of the data distribution.