ORCA and Chunked Prefill: Evening Out the Iteration
Many requests, each finishing at a different time, and some carrying prefills 1000× the size of a decode step. Per-iteration cost swings wildly. ORCA-style iteration-level scheduling fixes one half; chunked prefill bounds the largest iteration so short work isn’t dragged behind long work.