Publications

You can also find my articles on my Google Scholar profile.

Preprints

FIRE: Flexible Integration of Data Quality Ratings for Effective Pre-Training

arXiv, Feb 2025

HelloBench: Evaluating long text generation capabilities of large language models

arXiv, Sep 2024

Knowledge-driven CoT: Exploring faithful reasoning in LLMs for knowledge-intensive question answering

arXiv, Aug 2023

Conference Papers

Enhancing LLMs via High-Knowledge Data Selection

AAAI 2025, Apr 2025

FRAMES: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy

ACL 2025 Findings, Feb 2025

Preference curriculum: LLMs should always be pretrained on their preferred data

ACL 2025 Findings, Jan 2025

D-CPT Law: Domain-specific continual pre-training scaling law for large language models

NeurIPS 2024, Dec 2024

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

EMNLP 2024 Findings, Oct 2024

Generative Spoken Language Modeling with Quantized Feature Enhancement

IJCNN 2024, Jun 2024

LLMs know what they need: Leveraging a missing information guided framework to empower retrieval-augmented generation

COLING 2025, Apr 2024