Zhen Wang / 王震

Hi! I'm Zhen, currently a postdoctoral researcher at UC San Diego, working with Prof. Zhiting Hu and Prof. Eric P. Xing, focusing on advancing foundation agentic systems and scientific discovery. I obtained my PhD from The Ohio State University, advised by Prof. Huan Sun, where I developed foundational frameworks for knowledge-centric NLP systems.

I've had the privilege of working with exceptional researchers such as Rameswar Panda, Yoon Kim, Nebojsa Jojic, Nikolay Malkin, Leonid Karlinsky, and Bo Zong across premier industrial labs (MIT-IBM Lab, Microsoft Research, NEC Labs America) and academic institutions (UCSD, CMU, MBZUAI). I've been honored with the OpenAI Agentic AI Research Grant, the SoCal NLP 2023 Best Paper Award, the Alexa Prize TaskBot Challenge 2022, and recognition as a Rising Star in Data Science 2021.

Outside of research, you'll find me exploring hiking trails, playing pickleball, or planning my next adventure in national parks. I'm also a passionate sports fan, cheering for the Buckeyes, Dodgers, Lakers, Inter Miami, and Chiefs (for reasons unrelated to tight ends).

Email  /  GitHub  /  Twitter  /  Google Scholar

profile photo
On a rooftop in Anchorage, Alaska, 2019

Research Overview

Building Trustworthy Systems that Perceive, Think, and Act: Today's most advanced AI systems, despite their impressive capabilities, remain fundamentally reactive — unable to actively explore possibilities, strategically plan actions, or safely adapt their behavior in the real world. This limitation becomes increasingly critical as AI systems are deployed in scenarios requiring sustained interaction, complex reasoning, and reliable real-world engagement.

My research establishes Foundation Agentic Systems that transform how AI engages with complex worlds across the perception-cognition-action loop while ensuring reliable governance.

  •     World Model-based Simulation and Planning: At its core, active intelligence requires the ability to simulate and reason about potential futures. My research introduces principled world-model formulations that enable AI systems to actively simulate and strategically plan. [LLM-reasoners, COLM'24; PromptAgent, ICLR'24]
  •     Structured Reasoning and Inference-compute Scaling: Structure, whether explicit in graphs or implicit in language spaces, is the key to reliable and interpretable reasoning. [RAP, EMNLP'23; ThinkSum, ACL'23; SurfCon, KDD'19]
  •     Efficient Adaptation and Real-world Interaction: Real-world deployment demands extreme computational efficiency in behavioral adaptation and interaction. My research achieves this through minimal-overhead architectures and parameter-efficient techniques. [ToolkenGPT, NeurIPS'23 Oral; MPT, ICLR'23]
  •     Scalable Methods for Safety, Alignment, and Oversight: Governance must scale with AI capabilities without unsustainable growth in computational or human resources. Our approaches pioneer algorithmic solutions that grow more effective as systems become more capable. [DRPO, EMNLP'24; Decentralized Arena]

Research Opportunities: I am always looking for highly motivated students, particularly from underrepresented groups, to join me on research projects during the school year and over the summer. If you are interested in LLM augmentation (reasoning, tool use, planning, etc.), LLM agents, or AI4Science research, please email me with a brief description of your interests.

News

Research Highlights


    Decentralized Arena: When LLMs Replace Human Judges for Democratic and Scalable LLM Leaderboard


→ Decentralized Arena introduces an automated, scalable system for evaluating LLMs across specialized domains, where the models themselves participate in assessing each other. This democratic evaluation approach achieves 95% correlation with Chatbot Arena rankings while maintaining full transparency and reproducibility.
Blog / Leaderboard
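For intuition, here is a minimal, hypothetical sketch of one way pairwise LLM-vs-LLM judgments could be aggregated into a leaderboard, using a simple Bradley-Terry-style fit; the match data and function names are illustrative, not the actual Decentralized Arena implementation.

```python
from collections import defaultdict

def bradley_terry(matches, iters=200):
    """Fit Bradley-Terry strengths from pairwise win records.

    matches: list of (winner, loser) model-name pairs, e.g. produced by LLM judges.
    Returns a dict mapping model -> normalized strength (higher is better).
    """
    models = {m for pair in matches for m in pair}
    wins = defaultdict(int)          # wins[i] = number of wins by model i
    pair_counts = defaultdict(int)   # games played between each unordered pair
    for w, l in matches:
        wins[w] += 1
        pair_counts[frozenset((w, l))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = pair_counts[frozenset((i, j))]
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())
        strength = {m: s / total for m, s in new.items()}  # normalize for identifiability
    return strength

# Toy usage: pretend these pairwise outcomes came from LLM judges.
matches = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c"),
           ("model-c", "model-b"), ("model-a", "model-b")]
for model, s in sorted(bradley_terry(matches).items(), key=lambda kv: -kv[1]):
    print(f"{model}: {s:.3f}")
```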

    Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models


→ This paper introduces DRPO, a novel tuning-free approach that enables LLMs to achieve self-alignment through dynamic rewarding and search-based optimization, eliminating the need for costly human annotations and model training. This paves the way toward cost-effective and scalable AI alignment.
[EMNLP 2024] PDF / Code / Poster

    PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization


→ Tired of manual prompt engineering? PromptAgent offers the first principled framework that formalizes API-based prompt optimization as a planning problem (state, action, reward, etc.); it is also the first to benchmark exploration efficiency and to show the transferability of optimized prompts. With expert-level prompting as the goal, there are many exciting directions ahead for PromptAgent!
[ICLR 2024] PDF / Code / Slides / Poster
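PromptAgent itself searches the prompt space with strategic planning (MCTS); below is a deliberately simplified, hypothetical hill-climbing sketch of the same outer loop (evaluate a prompt, collect errors, ask an optimizer model to revise it), with both LLM calls stubbed out.

```python
def task_model(prompt, example):
    """Placeholder for the API-based task LLM; returns a predicted answer."""
    return "stub-answer"

def optimizer_model(prompt, errors):
    """Placeholder for the optimizer LLM that rewrites a prompt given error feedback."""
    return prompt + f" (revised after {len(errors)} errors)"

def evaluate(prompt, dataset):
    """Reward = accuracy on a small batch; also return the failure cases as feedback."""
    errors = [(x, y) for x, y in dataset if task_model(prompt, x) != y]
    return 1.0 - len(errors) / len(dataset), errors

def hill_climb(init_prompt, dataset, steps=5, candidates_per_step=3):
    best_prompt, (best_reward, errors) = init_prompt, evaluate(init_prompt, dataset)
    for _ in range(steps):
        # Expand: propose revised prompts conditioned on observed errors.
        children = [optimizer_model(best_prompt, errors) for _ in range(candidates_per_step)]
        # Select greedily (the paper uses tree search instead of greedy hill climbing).
        for child in children:
            reward, child_errors = evaluate(child, dataset)
            if reward > best_reward:
                best_prompt, best_reward, errors = child, reward, child_errors
    return best_prompt, best_reward

# Toy usage; with real LLM calls the reward would actually change across steps.
print(hill_climb("Answer the question.", [("2+2?", "4"), ("3+3?", "6")]))
```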

    Reasoning with Language Model is Planning with World Model


→ LLMs lack internal world models for effective reasoning. Reasoning via Planning (RAP) reformulates LLM reasoning as a planning problem, seamlessly incorporating an external world model and principled planning. It is a new framework applicable across diverse tasks and an exciting direction for LLM augmentation research.
[EMNLP 2023] (Oral, Main) PDF / Code / Slides / Poster / Featured in State of AI Report 2023
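RAP uses the LLM as both a policy (proposing reasoning actions) and a world model (predicting the resulting state), and searches the induced tree with MCTS. As a rough illustration of that loop only, here is a much-simplified, hypothetical depth-limited greedy lookahead with all LLM calls stubbed; it is not the paper's MCTS implementation.

```python
def propose_actions(state, k=3):
    """Placeholder policy: the LLM would propose k candidate reasoning steps here."""
    return [f"{state} -> step{i}" for i in range(k)]

def predict_next_state(state, action):
    """Placeholder world model: the LLM would predict the state after taking the action."""
    return action

def reward(state):
    """Placeholder reward, e.g., the LLM's self-evaluated confidence or task heuristics."""
    return -len(state)  # toy heuristic: prefer shorter derivations

def plan(state, depth=3):
    """Greedy depth-limited lookahead over (state, action) rollouts."""
    if depth == 0:
        return [], reward(state)
    best_plan, best_value = [], float("-inf")
    for action in propose_actions(state):
        next_state = predict_next_state(state, action)
        future_plan, future_value = plan(next_state, depth - 1)
        value = reward(next_state) + future_value
        if value > best_value:
            best_plan, best_value = [action] + future_plan, value
    return best_plan, best_value

print(plan("question: 2 + 3 * 4"))
```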

    ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings


→ ToolkenGPT augments LLMs with massive tools/APIs by representing each tool as a token (a "toolken") and letting the model call tools the same way it generates regular words. ToolkenGPT is highly efficient for learning massive tool sets, since plugging in a new tool is as easy as learning one embedding.
[NeurIPS 2023] (Oral) PDF / Code / Slides / Poster / SoCal NLP 2023 Best Paper Award
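A minimal sketch of the "toolken" idea under simplifying assumptions: the decoder stays frozen, and each tool contributes one trainable output embedding appended to the LM head, so tools compete with ordinary words in a single softmax. The module and shapes below are illustrative, not the released ToolkenGPT code.

```python
import torch
import torch.nn as nn

class ToolkenHead(nn.Module):
    """Extends a frozen LM head with trainable per-tool embeddings ('toolkens')."""

    def __init__(self, lm_head: nn.Linear, num_tools: int):
        super().__init__()
        self.lm_head = lm_head
        for p in self.lm_head.parameters():       # keep the original LM head frozen
            p.requires_grad = False
        hidden = lm_head.in_features
        # Only these embeddings are trained when new tools are plugged in.
        self.tool_embeddings = nn.Parameter(torch.randn(num_tools, hidden) * 0.02)

    def forward(self, hidden_states):
        word_logits = self.lm_head(hidden_states)              # [B, T, vocab]
        tool_logits = hidden_states @ self.tool_embeddings.T   # [B, T, num_tools]
        # Tools and words share one softmax, so a tool call is "generated"
        # exactly like the next word.
        return torch.cat([word_logits, tool_logits], dim=-1)

# Toy usage: a fake frozen LM head with hidden size 16 and vocab 100, plus 4 tools.
head = ToolkenHead(nn.Linear(16, 100, bias=False), num_tools=4)
hidden = torch.randn(2, 5, 16)
logits = head(hidden)            # shape [2, 5, 104]; indices >= 100 are toolkens
print(logits.shape)
```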

    Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning


→ We propose Multitask Prompt Tuning (MPT) to exploit rich cross-task knowledge for more efficient and generalizable transfer learning. MPT learns a single transferable soft prompt via a novel combination of prompt decomposition and prompt distillation.
[ICLR 2023] PDF / Code / Slides / Poster / Huggingface PEFT PR
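A rough sketch of the prompt-decomposition idea, with assumed shapes: each task's soft prompt is the shared prompt modulated elementwise by a rank-1, task-specific matrix (an outer product of two small vectors), so per-task parameters stay tiny. Names and dimensions are illustrative, not the MPT codebase.

```python
import torch
import torch.nn as nn

class DecomposedPrompt(nn.Module):
    """Task prompt = shared prompt * (u_k v_k^T), i.e., multiplicative low-rank modulation."""

    def __init__(self, num_tasks: int, prompt_len: int, hidden: int):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)  # shared across tasks
        self.u = nn.Parameter(torch.ones(num_tasks, prompt_len))            # task-specific vectors
        self.v = nn.Parameter(torch.ones(num_tasks, hidden))

    def forward(self, task_id: int):
        # Rank-1 elementwise modulation of the shared prompt for one task.
        scale = torch.outer(self.u[task_id], self.v[task_id])    # [prompt_len, hidden]
        return self.shared * scale                                # [prompt_len, hidden]

prompts = DecomposedPrompt(num_tasks=8, prompt_len=20, hidden=16)
print(prompts(task_id=3).shape)   # torch.Size([20, 16]); prepend to the input embeddings
```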

    ThinkSum: Probabilistic Reasoning Over Sets Using Large Language Models


→ We propose ThinkSum, a two-stage probabilistic inference paradigm that improves LLMs' reasoning over multiple objects in two steps: Think (e.g., retrieval of associations) and Sum (e.g., aggregation of results). ThinkSum outperforms chain-of-thought prompting on hard BIG-bench tasks.
[ACL 2023] PDF / Code / Slides / Poster
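A toy sketch of the two stages under stated assumptions: "Think" enumerates the relevant items (here a stubbed retrieval call), and "Sum" aggregates the model's per-item answer probabilities outside the model, e.g., by averaging. The stubs and the averaging rule are hypothetical, not the paper's exact prompts or aggregation.

```python
import math

def think(query):
    """Placeholder 'Think' step: the LLM would enumerate relevant items/associations."""
    return ["item_1", "item_2", "item_3"]

def answer_logprobs(query, item, choices):
    """Placeholder per-item scoring: log p(choice | query, item) from the LLM."""
    return {c: math.log(1.0 / len(choices)) for c in choices}  # uniform stub

def think_sum(query, choices):
    """'Sum' step: aggregate per-item probabilities into one prediction."""
    items = think(query)
    totals = {c: 0.0 for c in choices}
    for item in items:
        logprobs = answer_logprobs(query, item, choices)
        for c in choices:
            totals[c] += math.exp(logprobs[c])   # average probabilities over the set
    probs = {c: totals[c] / len(items) for c in choices}
    return max(probs, key=probs.get), probs

print(think_sum("Which choice holds for all listed objects?", ["yes", "no"]))
```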

    Coherence Boosting: When Your Pretrained Language Model is Not Paying Enough Attention


→ We demonstrate that large language models insufficiently learn the effect of distant words on next-token prediction. We present Coherence Boosting, an inference procedure that increases an LM's focus on long-range context and substantially improves performance on NLG and NLU tasks.
[ACL 2022] PDF / Code / Slides / Poster (Long Paper, Oral Presentation)
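A small sketch of the inference-time idea, with assumed variable names: score the next token once with the full context and once with a truncated (recent-only) context, then extrapolate away from the short-context prediction via a log-linear combination. The exact weighting below is illustrative; the paper selects it per task.

```python
import torch

def coherence_boost(full_logits: torch.Tensor, short_logits: torch.Tensor, alpha: float = 0.5):
    """Contrast full-context and short-context next-token predictions.

    full_logits:  logits conditioned on the entire context.
    short_logits: logits from the same model conditioned only on recent tokens.
    Boosted distribution ~ p_full^(1 + alpha) / p_short^alpha (log-linear extrapolation).
    """
    log_p_full = torch.log_softmax(full_logits, dim=-1)
    log_p_short = torch.log_softmax(short_logits, dim=-1)
    return (1 + alpha) * log_p_full - alpha * log_p_short

# Toy usage over a vocabulary of 10 tokens.
full = torch.randn(10)
short = torch.randn(10)
boosted = coherence_boost(full, short)
print(int(boosted.argmax()))   # next token favored once long-range context is emphasized
```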

    Rationalizing Medical Relation Prediction from Corpus-level Statistics


→ We propose a self-interpretable framework that rationalizes neural relation prediction using corpus-level statistics. Inspired by cognitive theories of recall and recognition, the framework provides structured knowledge triplets as rationales.
[ACL 2020] PDF / Code / Slides / Poster / Video

    SurfCon: Synonym Discovery on Privacy-Aware Clinical Data


→ We propose to discover structured knowledge, namely synonyms, from a privacy-aware clinical text corpus, and present a novel framework that leverages both surface-form and context information to discover out-of-distribution synonyms.
[KDD 2019] PDF / Code / Slides / Poster (Research Track, Long Paper, Oral Presentation)
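A toy sketch of combining the two signals named above, with made-up scoring functions: a character-level surface-form similarity and a context-embedding cosine, mixed by a weight. This is purely illustrative and not the SurfCon architecture.

```python
def char_ngram_set(term, n=3):
    """Character n-grams as a crude surface-form representation."""
    padded = f"#{term}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def surface_similarity(a, b):
    """Jaccard overlap of character trigrams (surface-form signal)."""
    sa, sb = char_ngram_set(a), char_ngram_set(b)
    return len(sa & sb) / len(sa | sb)

def context_similarity(a, b, ctx_vectors):
    """Cosine similarity of (placeholder) context embeddings (context signal)."""
    va, vb = ctx_vectors[a], ctx_vectors[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = (sum(x * x for x in va) ** 0.5) * (sum(y * y for y in vb) ** 0.5)
    return dot / norm if norm else 0.0

def synonym_score(a, b, ctx_vectors, w=0.5):
    """Blend surface-form and context signals into a single ranking score."""
    return w * surface_similarity(a, b) + (1 - w) * context_similarity(a, b, ctx_vectors)

# Toy usage with fake 3-d context embeddings for two clinical terms.
ctx = {"htn": [0.9, 0.1, 0.2], "hypertension": [0.8, 0.2, 0.1]}
print(round(synonym_score("htn", "hypertension", ctx), 3))
```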


Source code from Leonid Keselman, design and inspiration from Jon Barron and Dongkuan.