vAttention
Aditya Desai, Shuo Yang, et al.
|
Oct 1, 2025


vAttention combines top-k and sampling-based sparse attention with explicit approximation guarantees, targeting reliable deployment of sparse decoding methods.