Megatron

Walk article 02’s two cuts through a full transformer block, with concrete shapes on each GPU at every step. First, apply just one of the cuts to every matmul: communication explodes (four gathers per block). Then pair the two cuts as duals and watch them snap into the architecture’s widen-narrow rhythm, landing at two all-reduces per block.
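
As a rough illustration of the pairing, the NumPy sketch below simulates the MLP half of a block on a single machine. It assumes the two cuts are a column split of the first (widening) weight matrix and a row split of the second (narrowing) one, which is the usual Megatron-style arrangement. The function names, the shapes, and the `n_gpus` parameter are hypothetical, chosen only for this example, and the "all-reduce" is a plain sum over a Python list rather than a real collective.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU; it is elementwise, so each shard can apply it locally
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_reference(x, w1, w2):
    # Unsharded MLP: widen (d_model -> d_ff), nonlinearity, narrow (d_ff -> d_model)
    return gelu(x @ w1) @ w2

def mlp_paired_cuts(x, w1, w2, n_gpus):
    # Assumed column cut: each simulated GPU holds d_ff / n_gpus columns of w1.
    w1_shards = np.split(w1, n_gpus, axis=1)
    # Assumed row cut: each simulated GPU holds the matching d_ff / n_gpus rows of w2.
    w2_shards = np.split(w2, n_gpus, axis=0)

    partials = []
    for w1_k, w2_k in zip(w1_shards, w2_shards):
        h_k = gelu(x @ w1_k)         # (tokens, d_ff / n_gpus), no communication needed
        partials.append(h_k @ w2_k)  # (tokens, d_model), a partial sum of the output

    # Stand-in for the single all-reduce: sum the partial outputs across GPUs.
    return np.sum(partials, axis=0)

# Hypothetical toy sizes: 8 tokens, d_model = 16, d_ff = 64, 4 simulated GPUs.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
w1 = rng.standard_normal((16, 64))
w2 = rng.standard_normal((64, 16))

out_reference = mlp_reference(x, w1, w2)
out_sharded = mlp_paired_cuts(x, w1, w2, n_gpus=4)
matches = np.allclose(out_sharded, out_reference)
```

Because the column cut leaves each GPU with exactly the slice of the hidden activation that its rows of the second matrix consume, nothing has to be gathered between the two matmuls; the only communication is the final sum. The attention half can be arranged the same way (heads split across GPUs, output projection split by rows), which would account for the second all-reduce per block.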