Abstract: The autoregressive nature of LLM decoding places heavy demands on memory bandwidth. Prior 2D memory solutions fail to overcome this bottleneck due to limited ...