Twilight

Chaofan Lin, Shuo Yang, et al. | Feb 1, 2025

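Twilight introduces adaptive sparsity for long-context LLM decoding by bringing top-p-style (nucleus) budgeting into sparse attention: instead of keeping a fixed number of tokens for every query, it keeps the smallest set of tokens whose attention weights sum to at least p.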
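
To make the idea of top-p budgeting concrete, here is a minimal NumPy sketch. It is an illustration of the general top-p rule applied to one query's attention scores, not the authors' implementation; the function names `softmax` and `top_p_budget` and the default `p=0.9` are assumptions chosen for this example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of attention scores."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(x - x.max())
    return e / e.sum()

def top_p_budget(scores, p=0.9):
    """Hypothetical helper: pick the smallest set of tokens whose
    attention mass reaches p, so the budget adapts to each query."""
    weights = softmax(scores)
    # Sort token indices by attention weight, largest first.
    order = np.argsort(-weights, kind="stable")
    cum = np.cumsum(weights[order])
    # First position where the cumulative mass reaches p, plus one.
    k = int(np.searchsorted(cum, p, side="left")) + 1
    k = min(k, len(order))  # guard against rounding when p is close to 1
    return order[:k], weights

# A peaked distribution needs a small budget; a flat one needs a large budget.
peaked_idx, _ = top_p_budget([10.0, 0.0, 0.0, 0.0], p=0.9)
flat_idx, _ = top_p_budget([0.0, 0.0, 0.0, 0.0], p=0.9)
print(len(peaked_idx), len(flat_idx))
```

The contrast with a fixed top-k budget is the point of the sketch: the same threshold p selects one token when attention is concentrated and all four when it is uniform.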