Varlen-Attention

Many users hit the model at once with prompts of different lengths. Walk through one transformer block applied to a single flat tensor that packs all requests end to end, see which layers batch for free and which need a real fix, and check whether tensor parallelism (TP) has to change.
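A minimal NumPy sketch of the setup, under stated assumptions: a single attention head, a causal (decoder-only) model, and no TP. Requests are concatenated into one flat `[total_tokens, d]` array, and a cumulative-offset vector (called `cu_seqlens` here, loosely borrowing the naming that varlen attention kernels use) records where each request starts and ends. The helpers `pack`, `linear` and `varlen_attention` are hypothetical names for illustration, not any library's API. A per-token layer like `linear` works on the flat array unchanged, while attention has to be restricted to each request's own segment.

```python
import numpy as np

def pack(requests):
    """Concatenate per-request [len_i, d] arrays into one flat [total, d] array.
    Also return cu_seqlens (hypothetical name): offsets [0, len_0, len_0+len_1, ...]."""
    lens = [r.shape[0] for r in requests]
    cu_seqlens = np.concatenate([[0], np.cumsum(lens)])
    return np.concatenate(requests, axis=0), cu_seqlens

def linear(x, w):
    # Per-token layer: every row is transformed independently,
    # so the flat multi-request array batches for free.
    return x @ w

def softmax(s):
    # Numerically stable softmax over the last axis.
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)

def varlen_attention(q, k, v, cu_seqlens):
    # Token-mixing layer: each query may only see keys from its own request,
    # otherwise one user's tokens would attend to another user's prompt.
    # Assumption: causal masking within each request.
    out = np.empty_like(v)
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        qs, ks, vs = q[start:end], k[start:end], v[start:end]
        n = end - start
        scores = qs @ ks.T / np.sqrt(q.shape[-1])
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above diagonal
        scores = np.where(future, -np.inf, scores)
        out[start:end] = softmax(scores) @ vs
    return out
```

For example, packing three requests of lengths 3, 5 and 2 gives a `[10, d]` array with `cu_seqlens == [0, 3, 8, 10]`. Running `varlen_attention` on it matches running each request on its own, whereas treating the whole flat array as one long sequence (`cu_seqlens == [0, 10]`) lets the second request's tokens attend to the first request's, which is exactly the leak the segment boundaries prevent. The Python loop is only for clarity; a real kernel would handle all segments in one launch.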