How to Batch Many Requests Through One Forward Pass
Many users hit the model at once with different-length prompts. Walk through one transformer block on a flat multi-request tensor and see which layers batch for free, which need a real fix, and whether tensor parallelism (TP) has to change.
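As a rough preview of the distinction, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the width `d`, the two prompt lengths, the names `seg_ids` and `same_request`, and the simplification of using the raw token vectors as queries, keys, and values. It packs two requests into one flat tensor, checks that a per-token linear layer gives the same result as running each request separately, and shows that plain causal attention over the flat tensor does not. The block-diagonal mask used here is one common way to repair that, not necessarily the fix this walkthrough settles on.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                    # hypothetical model width
lengths = [3, 5]         # hypothetical prompt lengths for two requests
requests = [rng.standard_normal((n, d)) for n in lengths]

# Pack every token from every request along one axis, and record
# which request each token belongs to.
flat = np.concatenate(requests, axis=0)                  # shape (8, d)
seg_ids = np.repeat(np.arange(len(lengths)), lengths)    # [0 0 0 1 1 1 1 1]

# A per-token layer (here a plain linear projection) never looks across
# tokens, so the flat tensor can be pushed through it unchanged.
W = rng.standard_normal((d, d))
linear_matches = np.allclose(
    flat @ W,
    np.concatenate([x @ W for x in requests], axis=0),
)

def attention(q, k, v, mask):
    """Single-head softmax attention; mask[i, j] is True where i may attend to j."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def causal(n):
    """Lower-triangular boolean mask of size n x n."""
    return np.tril(np.ones((n, n), dtype=bool))

# Reference result: run attention on each request separately.
reference = np.concatenate(
    [attention(x, x, x, causal(len(x))) for x in requests], axis=0
)

total = flat.shape[0]

# Naive: one causal mask over the flat tensor. Tokens of the second
# request can now attend to tokens of the first.
naive = attention(flat, flat, flat, causal(total))

# Assumed fix: also require that query and key share a request,
# which makes the mask block-diagonal.
same_request = seg_ids[:, None] == seg_ids[None, :]
fixed = attention(flat, flat, flat, causal(total) & same_request)

naive_matches = np.allclose(naive, reference)
fixed_matches = np.allclose(fixed, reference)

print("linear layer matches per-request result:", linear_matches)
print("naive flat attention matches:", naive_matches)
print("block-diagonal attention matches:", fixed_matches)
```

The sketch builds a dense mask for readability. It says nothing about how the mask would be avoided or fused in a real kernel, and it does not touch the TP question.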