CTI-Inference-Opt/代码
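Owner: Sunshine530 575b32f263 feat: fused MoE: baddbmm (cutlass GEMM with fused bias) + skip moe_loss, which inference does not use, to reduce kernel count
Keep the GEMM on cutlass (a triton GEMM is unlikely to beat it) and fuse the bias into the epilogue to save a separate add kernel; moe_loss is used only in training,
so inference skips it and saves the importance/std/mean computations. This continues the kernel-reduction direction (embedding_bag/triton have already shown gains in evaluation).
Enabled by default; run bench with --no-moe-baddbmm/--no-skip-moe-loss to compare. No AUC loss.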
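
A minimal sketch of the two changes described above, not the code from this commit. It assumes the experts are batched as 3-D tensors and that `baddbmm` refers to PyTorch's `torch.baddbmm`. The function names (`expert_ffn_unfused`, `expert_ffn_fused`, `moe_aux_loss`), the tensor shapes, and the squared coefficient-of-variation form of the loss are all hypothetical, chosen only because the message mentions importance/std/mean.

```python
import torch


def expert_ffn_unfused(x, w, b):
    # x: (E, T, D_in), w: (E, D_in, D_out), b: (E, 1, D_out)
    # Two ops: a batched GEMM followed by a separate elementwise add.
    return torch.bmm(x, w) + b


def expert_ffn_fused(x, w, b):
    # One op: baddbmm computes b + x @ w per batch entry,
    # so the bias add no longer needs its own kernel launch.
    return torch.baddbmm(b, x, w)


def moe_aux_loss(gates, training):
    # gates: (T, E) routing weights per token and expert.
    # Assumed form of the load-balancing loss: squared coefficient of
    # variation of per-expert importance. It only matters for training,
    # so inference returns early and skips the sum/std/mean kernels.
    if not training:
        return None
    importance = gates.sum(dim=0)
    return (importance.std() / (importance.mean() + 1e-10)) ** 2


if __name__ == "__main__":
    torch.manual_seed(0)
    E, T, D_in, D_out = 4, 8, 16, 32  # hypothetical sizes
    x = torch.randn(E, T, D_in)
    w = torch.randn(E, D_in, D_out)
    b = torch.randn(E, 1, D_out)
    same = torch.allclose(expert_ffn_fused(x, w, b), expert_ffn_unfused(x, w, b), atol=1e-4)
    print("fused matches unfused:", same)
    gates = torch.softmax(torch.randn(T, E), dim=-1)
    print("inference loss:", moe_aux_loss(gates, training=False))
```

Both paths produce the same result up to floating-point tolerance, which is consistent with the "No AUC loss" note; the saving comes only from launching fewer kernels.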

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 14:27:59 +08:00