[NeurIPS’23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

Large Language Models (LLMs) have ushered in a remarkable era in natural language processing, empowering machines to generate remarkably human-like content with unparalleled precision and fluency. […]