Megatron on LLM Stories

Recent content in Megatron on LLM Stories
https://wgzesg.github.io/llm_stories/tags/megatron/

Walking Tensor Parallelism Through a Full Block
Wed, 29 Apr 2026 00:00:00 +0000
https://wgzesg.github.io/llm_stories/posts/03-tp-through-a-full-block/

How to split a full transformer block across two GPUs, with concrete shapes traced through every step. Start with column-parallel everywhere, see why it costs four gathers per block, then pair it with row-parallel to land at the Megatron pattern of two all-reduces per block.
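
To make the column-then-row pairing concrete, here is a minimal sketch of the MLP half of a block split across two simulated "GPUs". It is not code from the post: the matrices, shapes, and helper names (`matmul`, `split_cols`, `split_rows`) are made up for illustration, ReLU stands in for the real activation, and the all-reduce is simulated as a plain sum of the per-shard partial outputs. The point it tries to show is that an elementwise activation can run on each column shard without any gather, so the only communication left in this half is the single sum at the end.

```python
# Pure-Python sketch; all values and helper names are illustrative assumptions.

def matmul(a, b):
    # Naive matrix product of two lists-of-rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    # Elementwise activation (ReLU used here as a stand-in).
    return [[max(0, v) for v in row] for row in m]

def add(a, b):
    # Elementwise sum; plays the role of the all-reduce across two shards.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split_cols(m, parts):
    # Column-parallel split: each shard keeps a slice of the output features.
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    # Row-parallel split: each shard keeps a slice of the input features.
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

# Toy shapes: X is (2, 4), A is (4, 6), B is (6, 4).
X = [[1, -2, 3, 0], [0, 1, -1, 2]]
A = [[1, 0, -1, 2, 0, 1],
     [0, 1, 1, -1, 2, 0],
     [2, -1, 0, 1, 1, -2],
     [1, 1, -2, 0, -1, 1]]
B = [[1, 0, 2, -1],
     [0, 1, -1, 1],
     [2, 1, 0, 0],
     [-1, 0, 1, 2],
     [1, -1, 0, 1],
     [0, 2, 1, -1]]

# Unsharded reference: Y = relu(X A) B.
reference = matmul(relu(matmul(X, A)), B)

# Shard A by columns and B by rows, one shard per simulated device.
A_shards = split_cols(A, 2)   # two (4, 3) shards
B_shards = split_rows(B, 2)   # two (3, 4) shards

# Each device computes its partial output with no communication.
partials = [matmul(relu(matmul(X, A_i)), B_i)
            for A_i, B_i in zip(A_shards, B_shards)]

# One simulated all-reduce (sum) recovers the full output.
Y = add(partials[0], partials[1])
```

Comparing `Y` with `reference` confirms the sharded result matches the unsharded one. Applying the same column-then-row pairing to the attention half would give the second all-reduce in the two-per-block count.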