CTI-Inference-Opt/代码
OwnerSunshine530 b72e0346a9 perf: triton attention writes its output in [S,H,Dh] layout, removing the caller's permute-clone (x8 layers)
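The kernel's output strides are now configurable, so it writes directly into
[1,S,H,Dh] storage. The caller's permute(0,2,1,3) becomes a free view, and the
following reshape no longer clones. Layout-only change; numerics are unchanged.
Continues the effort to cut kernel launches and clones.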
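
A minimal sketch of the stride arithmetic behind this change, in plain Python rather than the actual Triton kernel or PyTorch code. The sizes S, H, Dh and the helpers `contiguous_strides`, `permute`, and `is_contiguous` are hypothetical, introduced only for illustration. Contiguity is used here as a simple sufficient condition for a reshape to be a view rather than a copy.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) of a freshly allocated tensor."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def permute(shape, strides, order):
    """Reorder dimensions without moving data: only shape/strides change."""
    return tuple(shape[i] for i in order), tuple(strides[i] for i in order)


def is_contiguous(shape, strides):
    """True if the view is laid out row-major (size-1 dims may have any stride)."""
    expected = contiguous_strides(shape)
    return all(d == 1 or s == e for d, s, e in zip(shape, strides, expected))


S, H, Dh = 6, 4, 8  # hypothetical sizes, not taken from the repository

# Before: the kernel fills contiguous [1, H, S, Dh] storage, so the caller's
# permute(0, 2, 1, 3) yields a non-contiguous view and reshape has to copy.
old_shape = (1, H, S, Dh)
old_view = permute(old_shape, contiguous_strides(old_shape), (0, 2, 1, 3))
old_needs_copy = not is_contiguous(*old_view)

# After: storage is allocated as [1, S, H, Dh]; the kernel is handed the
# logical [1, H, S, Dh] view of it, i.e. the same memory with permuted strides.
storage_shape = (1, S, H, Dh)
kernel_view = permute(storage_shape, contiguous_strides(storage_shape), (0, 2, 1, 3))

# The caller's permute(0, 2, 1, 3) now lands back on the contiguous layout.
new_view = permute(*kernel_view, (0, 2, 1, 3))
new_needs_copy = not is_contiguous(*new_view)

print("kernel output strides:", kernel_view[1])
print("old path needs copy:", old_needs_copy)
print("new path needs copy:", new_needs_copy)
```

Under these assumptions, the only thing the kernel has to learn is the output strides it is given; the values it writes are the same, which matches the "layout-only" claim above.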

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 20:27:28 +08:00