Twilight
Chaofan Lin, Shuo Yang, et al. | Feb 1, 2025


Twilight introduces adaptive sparsity for long-context LLM decoding by bringing top-p style budgeting into sparse attention.
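
To make the idea concrete, here is a minimal sketch of what top-p style budgeting over attention weights could look like. It is not the authors' implementation: the function name `top_p_select`, the threshold `p = 0.9`, and the toy score vectors are all assumptions chosen for illustration. The sketch keeps the smallest set of keys whose softmax weights sum to at least `p`, so the number of keys kept adapts to how peaked the attention distribution is.

```python
import math


def top_p_select(scores, p=0.9):
    """Return sorted indices of the smallest set of keys whose softmax
    attention weights sum to at least p (hypothetical helper)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")

    # Numerically stable softmax over the raw attention scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Visit keys from the largest weight to the smallest and stop as
    # soon as the accumulated mass reaches the threshold p.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += weights[i]
        if mass >= p:
            break
    return sorted(kept)


# Toy inputs (assumed): one query with a peaked distribution, one with a flat one.
peaked = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
flat = [0.0] * 8

print(len(top_p_select(peaked)))  # a single key already covers the mass
print(len(top_p_select(flat)))    # every key is needed
```

With a fixed top-k budget, both queries would attend to the same number of keys. Under the cumulative-mass rule sketched here, the peaked query keeps one key and the flat query keeps all eight, which is the sense in which the budget is adaptive.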