Chan-Hung Yu

01

Less Tuning, Better Planning: Simplifying Offline Model-Based Planning

Co Yong*, Chan-Hung Yu*, Shao-Hua Sun

ICML 2026 Workshop DEMO Spotlight openreview

In this work, we introduce Soft Horizon AggRegation for Planning (SHARP), an offline plug-and-play planning method that eliminates the need for an online-tuned planning horizon. Instead of using a fixed horizon across all states, SHARP performs soft horizon aggregation by dynamically weighting returns according to model uncertainty estimated from an ensemble of dynamics models. We further investigate the role of the action proposer and find that stronger offline policies do not necessarily lead to better planning performance. Instead, a simple behavior cloning (BC) policy is often sufficient as an action proposer while avoiding the effort required for extensive policy extraction. Combining these insights, we propose SHARP-BC, which consistently outperforms existing baselines while reducing reliance on extensive online hyperparameter tuning.
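As a rough illustration of soft horizon aggregation, the sketch below weights per-horizon return estimates by ensemble disagreement. It is not the paper's implementation. The softmax over negative standard deviation, the `temperature` parameter, and all function names are assumptions made for this example.

```python
import math
from typing import Sequence


def ensemble_disagreement(predictions: Sequence[Sequence[float]]) -> list[float]:
    """Per-horizon standard deviation across ensemble members.

    predictions[k][h] is the return estimated by ensemble member k
    when planning with horizon index h (hypothetical layout).
    """
    n_members = len(predictions)
    n_horizons = len(predictions[0])
    stds = []
    for h in range(n_horizons):
        values = [predictions[k][h] for k in range(n_members)]
        mean = sum(values) / n_members
        var = sum((v - mean) ** 2 for v in values) / n_members
        stds.append(math.sqrt(var))
    return stds


def soft_horizon_weights(uncertainties: Sequence[float],
                         temperature: float = 1.0) -> list[float]:
    """Softmax over negative uncertainty (an assumed weighting rule):
    horizons on which the ensemble agrees receive more weight."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scores = [-u / temperature for u in uncertainties]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_horizon_return(predictions: Sequence[Sequence[float]],
                        temperature: float = 1.0) -> float:
    """Uncertainty-weighted average of the ensemble-mean return per horizon."""
    n_members = len(predictions)
    n_horizons = len(predictions[0])
    means = [sum(predictions[k][h] for k in range(n_members)) / n_members
             for h in range(n_horizons)]
    weights = soft_horizon_weights(ensemble_disagreement(predictions), temperature)
    return sum(w * m for w, m in zip(weights, means))
```

With two ensemble members that agree at the short horizon and disagree at the long one, for example `soft_horizon_return([[1.0, 5.0], [1.0, -5.0]])`, nearly all of the weight falls on the short horizon, so no single horizon has to be chosen in advance.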

02

Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors

Ren-Wei Liang*, Chin-Ting Hsu*, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Chieh-Yen Lin, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun

EACL 2026 Oral arXiv code

We propose Preference Vector, a novel framework inspired by task arithmetic. Instead of optimizing multiple preferences within a single objective, we train separate models on individual preferences, extract behavior shifts as preference vectors, and dynamically merge them at test time. This modular approach enables fine-grained, user-controllable preference adjustments and facilitates seamless integration of new preferences without retraining. Experiments show that our proposed Preference Vector framework improves helpfulness without excessive conservatism, allows smooth control over preference trade-offs, and supports scalable multi-preference alignment.
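The extract-and-merge idea can be sketched with plain task arithmetic on toy parameters. This is a minimal illustration, not the released code. It assumes each preference vector is the difference between a preference-tuned model and a shared base model, and the names `extract_vector`, `merge`, and `scales` are hypothetical.

```python
Params = dict[str, list[float]]


def extract_vector(tuned: Params, base: Params) -> Params:
    """Behavior shift of a preference-tuned model relative to the base
    (assumed definition: element-wise difference of parameters)."""
    return {name: [t - b for t, b in zip(tuned[name], base[name])]
            for name in base}


def merge(base: Params, vectors: list[Params], scales: list[float]) -> Params:
    """Add scaled preference vectors to the base parameters at test time."""
    merged = {name: list(values) for name, values in base.items()}
    for vector, scale in zip(vectors, scales):
        for name in merged:
            merged[name] = [m + scale * v
                            for m, v in zip(merged[name], vector[name])]
    return merged


# Toy example with a single two-element parameter tensor.
base = {"w": [1.0, 2.0]}
helpful = {"w": [2.0, 2.0]}   # hypothetical model tuned for helpfulness
harmless = {"w": [1.0, 4.0]}  # hypothetical model tuned for harmlessness

v_help = extract_vector(helpful, base)
v_harm = extract_vector(harmless, base)
merged = merge(base, [v_help, v_harm], scales=[1.0, 0.5])
```

Changing `scales` adjusts the trade-off between preferences without retraining, and a new preference only requires appending one more vector to the list.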

03

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Max Liu*, Chan-Hung Yu*, Wei-Hsu Lee, Cheng-Wei Hung, Yen-Chun Chen, Shao-Hua Sun

ICLR 2025 arXiv code

We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy: an LLM is instructed to first generate Python code and then convert it into DSL programs. To further optimize the LLM-generated programs, we develop a search algorithm, Scheduled Hill Climbing, designed to explore the programmatic search space efficiently and improve the programs consistently.
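The abstract does not spell out the schedule, so the following is only a generic sketch of hill climbing with a scheduled neighborhood size. It assumes the number of sampled neighbors grows linearly with the fraction of the evaluation budget already spent. The parameters `k_min` and `k_max`, and the toy integer search space, are hypothetical stand-ins for the program space.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def scheduled_hill_climbing(
    initial: T,
    score: Callable[[T], float],
    neighbor: Callable[[T, random.Random], T],
    budget: int,
    k_min: int = 2,
    k_max: int = 16,
    seed: int = 0,
) -> tuple[T, float]:
    """Hill climbing where the number of neighbors sampled per step
    increases as the evaluation budget is consumed (assumed schedule)."""
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    used = 1  # one evaluation spent on the initial candidate
    while used < budget:
        frac = used / budget
        k = max(1, round(k_min + (k_max - k_min) * frac))
        k = min(k, budget - used)  # never exceed the remaining budget
        candidates = [neighbor(best, rng) for _ in range(k)]
        scored = [(score(c), c) for c in candidates]
        used += k
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:  # move only on strict improvement
            best, best_score = top, top_score
    return best, best_score


# Toy stand-in for a program space: integers, with the optimum at 7.
def toy_score(x: int) -> float:
    return -float((x - 7) ** 2)


def toy_neighbor(x: int, rng: random.Random) -> int:
    return x + rng.choice([-1, 1])


best, best_score = scheduled_hill_climbing(0, toy_score, toy_neighbor, budget=200)
```

In the setting described above, the initial candidate would be an LLM-generated program and `neighbor` a program mutation, with `score` given by the policy's return.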

04

LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play

Li-Chun Lu*, Shou-Jen Chen*, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun

COLM 2024 arXiv code

Large language models (LLMs) have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, we propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges and ensures convergence to creative answers. Moreover, we adopt a role-playing technique by assigning distinct roles to LLMs to combat the homogeneity of LLMs.
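A role-play discussion loop of this kind might be organized as in the sketch below. The abstract does not detail the three phases, so the split into independent proposals, rounds of idea exchange, and final answers is an assumption, as are the prompts and the `llm` callable, which stands in for any text-in, text-out model.

```python
from typing import Callable

LLM = Callable[[str], str]


def llm_discussion(question: str, roles: list[str], llm: LLM,
                   rounds: int = 2) -> dict[str, str]:
    """Run a three-phase, role-played discussion and return each role's final answer."""
    transcript: list[tuple[str, str]] = []

    # Phase 1 (assumed): every role proposes ideas independently.
    for role in roles:
        reply = llm(f"You are a {role}. Propose ideas for: {question}")
        transcript.append((role, reply))

    # Phase 2 (assumed): roles read the other participants' ideas and diverge from them.
    for _ in range(rounds):
        for role in roles:
            others = "\n".join(f"{r}: {t}" for r, t in transcript if r != role)
            reply = llm(
                f"You are a {role}. Build on or challenge these ideas:\n"
                f"{others}\nQuestion: {question}"
            )
            transcript.append((role, reply))

    # Phase 3 (assumed): each role converges on a final answer.
    history = "\n".join(f"{r}: {t}" for r, t in transcript)
    return {
        role: llm(
            f"You are a {role}. Given the discussion:\n{history}\n"
            f"Give your final answer to: {question}"
        )
        for role in roles
    }
```

Assigning a distinct role to each participant is what counters homogeneous responses here: every prompt is conditioned on a different persona while all participants share the same transcript.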
