Prefill and Decode Disaggregation: Two Phases on Opposite Sides of the Roofline

Article 05 left the two phases politely sharing one engine. This article shows they shouldn’t: prefill is compute-bound, decode is memory-bandwidth-bound, and long context drives the gap wider, not smaller. Once we accept the asymmetry, splitting them is the structural fix.

May 9, 2026 · 17 min · Pino