CTI-Inference-Opt/代码
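OwnerSunshine530 74bb95a7bd feat: F.embedding_bag fuses lookup + pooling (single kernel, no [M,512] intermediate); targets the largest block (dedup index 25% + segment 11% = 36%)
Profile of the Triton version: attention has been optimized out of the top entries; the new largest costs are embedding pooling 36% + MoE 22% + add 18%.
embedding_bag does the table lookup and the per-segment sum in one kernel. Adds an equivalence test and bench --emb-bag. Off by default, pending validation.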

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:30:47 +08:00