CTI-Inference-Opt/代码
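OwnerSunshine530 84db692f07 feat: INT8 dense MoE (torch._int_mm, W1_cat/W2_cat concatenated into 2D, top-k weights folded into GEMM2, per-tensor activation quantization)
Rewrites the dense MoE's two batched GEMMs as 2D GEMMs so they can run on the A800's INT8 tensor cores; compute is halved.
Quant/dequant is real compute and shows up locally, so a local bench is enough to decide go/no-go. Off by default; enable with bench --moe-int8.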

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 01:35:55 +08:00
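
The commit message above packs several steps into one line, so a small sketch may help show how they fit together. This is not the repository's code. It is a minimal NumPy illustration under stated assumptions: `torch._int_mm` is replaced by an INT32-accumulating NumPy matmul so the example runs without a GPU, the activation is assumed to be ReLU, the weights are assumed to use the same per-tensor scheme as the activations, and every function name (`quantize_per_tensor`, `int8_gemm`, `topk_gates`, `moe_batched_fp`, `moe_2d_int8`) is hypothetical. Only `W1_cat` and `W2_cat` are names taken from the commit message.

```python
import numpy as np

def quantize_per_tensor(a):
    """Symmetric per-tensor INT8 quantization: one scale for the whole array."""
    scale = max(float(np.abs(a).max()), 1e-8) / 127.0
    q = np.clip(np.rint(a / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_gemm(a_q, b_q):
    """2D INT8 GEMM with INT32 accumulation (NumPy stand-in for torch._int_mm)."""
    return a_q.astype(np.int32) @ b_q.astype(np.int32)

def topk_gates(logits, k):
    """Softmax over each token's top-k logits; all other experts get weight 0."""
    idx = np.argsort(logits, axis=1)[:, -k:]
    top = np.take_along_axis(logits, idx, axis=1)
    w = np.exp(top - top.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    gates = np.zeros_like(logits)
    np.put_along_axis(gates, idx, w, axis=1)
    return gates

def moe_batched_fp(x, W1, W2, gates):
    """Float reference: one GEMM pair per expert, then a gate-weighted sum."""
    h = np.maximum(np.einsum("td,edh->eth", x, W1), 0.0)  # (E, T, H), assumed ReLU
    y = np.einsum("eth,ehd->etd", h, W2)                  # (E, T, D)
    return np.einsum("te,etd->td", gates, y)              # (T, D)

def moe_2d_int8(x, W1, W2, gates):
    """Same math as two plain 2D INT8 GEMMs over concatenated expert weights."""
    E, D, H = W1.shape
    W1_cat = W1.transpose(1, 0, 2).reshape(D, E * H)  # experts side by side
    W2_cat = W2.reshape(E * H, D)                     # experts stacked on rows
    W1_q, s_w1 = quantize_per_tensor(W1_cat)          # assumption: per-tensor weights
    W2_q, s_w2 = quantize_per_tensor(W2_cat)

    x_q, s_x = quantize_per_tensor(x)                 # per-tensor activation quant
    h = np.maximum(int8_gemm(x_q, W1_q) * (s_x * s_w1), 0.0)  # GEMM1 + dequant, (T, E*H)

    # Fold the top-k gate weights into GEMM2's input: scaling each expert's
    # H-wide slice by its gate lets the second GEMM sum over experts for free.
    h = h * np.repeat(gates, H, axis=1)
    h_q, s_h = quantize_per_tensor(h)
    return int8_gemm(h_q, W2_q) * (s_h * s_w2)        # GEMM2 + dequant, (T, D)

rng = np.random.default_rng(0)
T, D, H, E, k = 16, 32, 64, 4, 2
x = rng.standard_normal((T, D))
W1 = rng.standard_normal((E, D, H)) / np.sqrt(D)
W2 = rng.standard_normal((E, H, D)) / np.sqrt(H)
gates = topk_gates(rng.standard_normal((T, E)), k)

ref = moe_batched_fp(x, W1, W2, gates)
out = moe_2d_int8(x, W1, W2, gates)
rel_err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
print(f"relative error vs float reference: {rel_err:.4f}")
```

In this sketch the quantize and dequantize steps are ordinary elementwise work around the two GEMMs, which matches the message's point that their cost is real compute and should be visible in a local benchmark.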