CTI-Inference-Opt/代码
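Serendipity 4dbee83097 feat: 2:4 unstructured sparsity, pruning only the Expert FFN (attention/gate left untouched)
- Compliance: individual weights are set to zero; matrix shapes are unchanged
- Prunes only 8 layers × 8 experts × 2 fc = 128 Expert Linear layers
- The lambda forward calls the sparse matmul directly, bypassing nn.Linear compatibility issues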
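
The 2:4 pattern described in the commit message can be illustrated with a minimal sketch: in every group of 4 consecutive weights, 2 are set to zero and the matrix shape is left as it was. The helper name `prune_2_4`, the use of plain Python lists instead of tensors, and the choice to keep the 2 largest-magnitude weights per group are all assumptions for illustration; the commit does not state which selection criterion it uses.

```python
def prune_2_4(weight):
    """Zero 2 of every 4 consecutive weights in each row; shape is unchanged.

    `weight` is a list of rows (lists of floats). Keeping the 2 largest
    magnitudes per group is an assumed criterion, not taken from the commit.
    """
    pruned = []
    for row in weight:
        if len(row) % 4 != 0:
            raise ValueError("row length must be a multiple of 4")
        new_row = list(row)
        for start in range(0, len(row), 4):
            group = range(start, start + 4)
            # Indices of the two largest-magnitude weights in this group.
            keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:2]
            for i in group:
                if i not in keep:
                    new_row[i] = 0.0
        pruned.append(new_row)
    return pruned


# Counts as stated in the commit message: 8 layers, 8 experts, 2 fc per expert.
NUM_LAYERS, NUM_EXPERTS, FC_PER_EXPERT = 8, 8, 2
num_pruned_linears = NUM_LAYERS * NUM_EXPERTS * FC_PER_EXPERT

w = [[0.1, -0.9, 0.3, 0.05, 2.0, -0.2, 0.0, 1.5]]
w_sparse = prune_2_4(w)
print(num_pruned_linears, w_sparse)
```

Each group of 4 ends up with exactly 2 zeros, and the row length stays at 8, which matches the "individual weights zeroed, shape unchanged" constraint above.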
2026-06-13 14:09:42 +08:00