Varlen-Attention on LLM Stories

https://wgzesg.github.io/llm_stories/tags/varlen-attention/
Recent content in Varlen-Attention on LLM Stories

How to Batch Many Requests Through One Forward Pass
https://wgzesg.github.io/llm_stories/posts/04-batching-many-requests/
Sun, 03 May 2026 00:00:00 +0000
How to batch many concurrent prefill requests through a TP-parallelized transformer. Walk a full block on a flattened multi-request tensor and watch where batching is free vs. where it isn't.