HelloBench: Evaluating long text generation capabilities of large language models

Published in arXiv, 2024