Publications

You can also find my articles on my Google Scholar profile.

Preprints


FIRE: Flexible Integration of Data Quality Ratings for Effective Pre-Training

arXiv, Feb 2025

Download Paper

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

arXiv, Sep 2024

Download Paper

Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-Intensive Question Answering

arXiv, Aug 2023

Download Paper

Conference Papers


Enhancing LLMs via High-Knowledge Data Selection

AAAI 2025, Apr 2025

Download Paper

FRAMES: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy

ACL 2025 Findings, Feb 2025

Download Paper

Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data

ACL 2025 Findings, Jan 2025

Download Paper

D-CPT Law: Domain-Specific Continual Pre-Training Scaling Law for Large Language Models

NeurIPS 2024, Dec 2024

Download Paper

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

EMNLP 2024 Findings, Oct 2024

Download Paper

Generative Spoken Language Modeling with Quantized Feature Enhancement

IJCNN 2024, Jun 2024

Download Paper

LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation

COLING 2025, Apr 2024

Download Paper