SLA fuses sparse and linear attention into a fine-tunable mechanism that classifies attention weights into critical, marginal, and negligible categories, accelerating diffusion transformers without sacrificing generation quality.
SLA
Jintao Zhang, Haoxu Wang, Kai Jiang, Shuo Yang, Kaiwen Zheng, Haocheng Xi, Ziteng Wang, Hongzhou Zhu, Min Zhao, Ion Stoica, Joseph E. Gonzalez, Jun Zhu, Jianfei Chen
Sep 29, 2025
