Publications

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

In submission to ACL, 2024

We introduce Intention-in-Interaction (IN3), a novel benchmark designed to probe users' implicit intentions through explicit queries. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before downstream agent execution begins (a minimal sketch of this clarify-then-execute loop follows the citation below).

Recommended citation: C. Qian, B. He, Z. Zhuang, J. Deng, Y. Qin, X. Cong, Z. Zhang, J. Zhou, Y. Lin, Z. Liu, M. Sun, et al. 2024. Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents. arXiv preprint arXiv:2402.09205. https://arxiv.org/abs/2402.09205
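
Below is a hedged sketch of the clarify-then-execute loop described above: the model keeps asking clarifying questions while the task is vague, then hands a refined goal to the downstream agent. The names `chat_with_mistral_interact`, `run_agent`, and the `[SUMMARY]` marker convention are hypothetical stand-ins, not the paper's actual interfaces.

```python
# Hedged sketch of a clarify-before-execute loop; function names and the
# "[SUMMARY]" convention are assumptions, not the paper's real API.

def chat_with_mistral_interact(messages: list[dict]) -> str:
    """Placeholder for a call to a Mistral-Interact-style model."""
    raise NotImplementedError("plug in your model backend here")

def run_agent(goal: str) -> str:
    """Placeholder for the downstream agent executor."""
    raise NotImplementedError("plug in your agent framework here")

def execute_with_intention_understanding(user_task: str, max_rounds: int = 3) -> str:
    """Ask clarifying questions while the task is vague, then hand off a refined goal."""
    messages = [{"role": "user", "content": user_task}]
    for _ in range(max_rounds):
        reply = chat_with_mistral_interact(messages)
        # Assumed convention: the model emits a summary marker once the task is clear.
        if reply.startswith("[SUMMARY]"):
            refined_goal = reply.removeprefix("[SUMMARY]").strip()
            return run_agent(refined_goal)
        # Otherwise treat the reply as a clarifying question and forward the user's answer.
        answer = input(f"{reply}\n> ")
        messages += [{"role": "assistant", "content": reply},
                     {"role": "user", "content": answer}]
    # Fall back to executing the original task if clarification rounds run out.
    return run_agent(user_task)
```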

ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback

In submission to ICML, 2024

We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset containing over 1 million GPT-4 feedback annotations for 250k user-assistant conversations, covering various aspects. Built upon UltraFeedback, we align a LLaMA-based model via best-of-n sampling and reinforcement learning, demonstrating exceptional performance on chat benchmarks (a minimal best-of-n sketch follows the citation below).

Recommended citation: G. Cui, L. Yuan, N. Ding, G. Yao, B. He, W. Zhu, Y. Ni, G. Xie, Z. Liu, et al. 2023. UltraFeedback: Boosting Language Models with Scaled AI Feedback. arXiv preprint arXiv:2310.01377. https://arxiv.org/abs/2310.01377
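
The sketch below illustrates best-of-n sampling against a reward model, the alignment recipe mentioned above. Here `generate` and `reward` are hypothetical callables standing in for a policy model and a reward model trained on UltraFeedback; this is an illustration, not the paper's actual code.

```python
# Hedged sketch of best-of-n sampling: draw n candidates, keep the one the
# reward model scores highest. `generate` and `reward` are assumed callables.
import random
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 16) -> str:
    """Sample n candidate responses and return the highest-reward one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda resp: reward(prompt, resp))

# Toy usage with dummy stand-ins (replace with real model calls):
if __name__ == "__main__":
    toy_generate = lambda p: f"answer-{random.randint(0, 9)}"
    toy_reward = lambda p, r: float(r.split("-")[1])  # prefers higher suffix
    print(best_of_n("What is RLHF?", toy_generate, toy_reward, n=4))
```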

Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection via Querying ChatGPT

Published in EMNLP Main, 2023

The paper designs a zero-shot black-box method for detecting LLM-generated text. Compared with other detection methods, our method generalizes better and is more stable across various datasets.

Recommended citation: Biru Zhu, Lifan Yuan, Ganqu Cui, Yangyi Chen, Chong Fu, Bingxiang He, Yangdong Deng, Zhiyuan Liu, Maosong Sun, and Ming Gu. 2023. Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection via Querying ChatGPT. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7470–7483, Singapore. Association for Computational Linguistics. https://aclanthology.org/2023.emnlp-main.463/

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Published in NeurIPS Datasets & Benchmarks, 2022

The paper develops an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning, and proposes a simple yet strong clustering-based defense baseline (a simplified sketch follows the citation below).

Recommended citation: G. Cui, L. Yuan, B. He, et al. 2022. A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks. Advances in Neural Information Processing Systems, 35: 5009–5023. https://arxiv.org/abs/2206.08514
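
As a rough illustration of a clustering-based defense, the sketch below clusters precomputed training-sample embeddings and drops unusually small clusters as suspected poison. It uses scikit-learn's KMeans for simplicity; the grouping strategy, thresholds, and function names are assumptions, not OpenBackdoor's exact implementation.

```python
# Simplified clustering-based defense: cluster sample embeddings and keep only
# clusters that hold a reasonable share of the data. Hedged sketch, not the
# toolkit's real defense code.
import numpy as np
from sklearn.cluster import KMeans

def filter_suspected_poison(embeddings: np.ndarray,
                            texts: list[str],
                            n_clusters: int = 2,
                            min_fraction: float = 0.2) -> list[str]:
    """Keep samples whose cluster holds at least `min_fraction` of the data."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    counts = np.bincount(labels, minlength=n_clusters)
    keep_clusters = {c for c in range(n_clusters) if counts[c] >= min_fraction * len(texts)}
    return [t for t, lab in zip(texts, labels) if lab in keep_clusters]
```

In practice the embeddings would come from a text encoder run over the (possibly poisoned) training set, and the small, anomalous cluster would be discarded before fine-tuning.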