HashAttention

Aditya Desai, Shuo Yang, et al. | Jan 1, 2025

HashAttention frames pivotal-token identification as a recommendation-style semantic sparsity problem and accelerates attention using GPU-friendly hashing and bitwise operations.
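The bitwise mechanism can be sketched as follows. This is a minimal illustration, not the paper's method: HashAttention learns its hash functions, whereas this sketch substitutes sign-random-projection hashing (random hyperplanes) to show the core idea of scoring keys against a query with XOR and popcount in Hamming space; all names (`signatures`, `hamming_distances`, `planes`) are hypothetical.

```python
import numpy as np

def signatures(x, planes):
    # Hash vectors to bit signatures: project onto hyperplanes,
    # take the sign bit, pack bits into uint8 words.
    bits = (x @ planes > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_distances(q_sig, k_sigs):
    # XOR the signatures, then count set bits (popcount) to get
    # the Hamming distance of each key signature to the query's.
    xor = np.bitwise_xor(k_sigs, q_sig)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
d, n_keys, n_bits = 64, 128, 32
planes = rng.standard_normal((d, n_bits))   # stand-in for learned hash functions
keys = rng.standard_normal((n_keys, d))
query = rng.standard_normal(d)

k_sigs = signatures(keys, planes)           # shape (n_keys, n_bits // 8)
q_sig = signatures(query, planes)           # shape (n_bits // 8,)

# Keep only the keys whose signatures are closest to the query's;
# attention is then computed over this sparse subset.
pivotal = np.argsort(hamming_distances(q_sig, k_sigs))[:8]
```

Signature comparison reduces to integer XOR plus popcount, which is why the approach maps well onto GPU bitwise instructions and avoids full dot products over every key.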