Serendipity/CTI-Inference-Opt
84db692f07bf1522f574bcdf6861cf9b19256afb
CTI-Inference-Opt/代码/code
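OwnerSunshine530 84db692f07 feat: INT8 dense MoE (torch._int_mm, 2D-concatenated W1_cat/W2_cat, top-k weights folded into GEMM2, per-tensor activation quantization)
Rewrite the two batched GEMMs of the dense MoE as 2D GEMMs so they can use the A800's int8 tensor cores; compute is halved.
Quant/dequant is real compute and is visible locally, so a local benchmark is enough to decide whether this change lives or dies. Off by default; enable with bench --moe-int8.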

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 01:35:55 +08:00
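
The commit message above describes folding the top-k routing weights into the second GEMM and running it as a single 2D GEMM over concatenated expert weights. The following is a minimal pure-Python sketch of that idea, not the repository's code: `torch._int_mm` is replaced by a plain integer matmul, the names `h`, `gates`, `moe_gemm2_reference`, and `moe_gemm2_concat` are hypothetical, the tiny shapes are made up, and per-tensor symmetric quantization is applied to both operands as an assumption (the commit only states per-tensor quantization for activations).

```python
# Sketch only: GEMM2 of a dense MoE as one 2D GEMM, plus per-tensor int8 quantization.
# Assumed shapes: h[e] is T x F (expert e's hidden activations), w2[e] is F x D,
# gates[t][e] is the routing weight of token t for expert e (0 if not in its top-k).

def matmul(a, b):
    """Plain row-major matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def quantize_per_tensor(m):
    """Symmetric int8 quantization with a single scale for the whole matrix."""
    amax = max(abs(v) for row in m for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in m]
    return q, scale

def int8_gemm(a, b):
    """Quantize both operands, accumulate in integers, then dequantize.
    The integer matmul here stands in for torch._int_mm."""
    qa, sa = quantize_per_tensor(a)
    qb, sb = quantize_per_tensor(b)
    acc = matmul(qa, qb)
    return [[v * sa * sb for v in row] for row in acc]

def moe_gemm2_reference(h, w2, gates):
    """Per-expert GEMM followed by a gate-weighted sum over experts."""
    num_tokens, out_dim = len(h[0]), len(w2[0][0])
    out = [[0.0] * out_dim for _ in range(num_tokens)]
    for e in range(len(h)):
        y = matmul(h[e], w2[e])
        for t in range(num_tokens):
            for d in range(out_dim):
                out[t][d] += gates[t][e] * y[t][d]
    return out

def moe_gemm2_concat(h, w2, gates, gemm=matmul):
    """Same result via one 2D GEMM: scale each expert's activations by its gate,
    concatenate along the feature axis, and stack the expert weights row-wise."""
    num_tokens = len(h[0])
    h_cat = [[gates[t][e] * v for e in range(len(h)) for v in h[e][t]]
             for t in range(num_tokens)]                 # T x (E*F)
    w2_cat = [row for w in w2 for row in w]              # (E*F) x D
    return gemm(h_cat, w2_cat)

# Two experts, two tokens, F = D = 2 (illustrative values only).
h = [[[1.0, 2.0], [0.5, -1.0]], [[-1.0, 0.0], [2.0, 1.0]]]
w2 = [[[0.5, -1.0], [1.0, 0.25]], [[2.0, 1.0], [-0.5, 0.75]]]
gates = [[0.75, 0.25], [0.0, 1.0]]

ref = moe_gemm2_reference(h, w2, gates)
fused = moe_gemm2_concat(h, w2, gates)
fused_int8 = moe_gemm2_concat(h, w2, gates, gemm=int8_gemm)
```

Because the gate is multiplied into the activations before the GEMM, the weighted sum over experts becomes part of the GEMM's reduction axis, so no separate combine step is needed. The int8 path agrees with the float path only up to quantization error, which is why the commit treats a local benchmark as the deciding check.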
..
tests
perf: _triton_block_meta removes the last host sync (the grid uses an upper bound derived from shapes; empty blocks run as masked no-ops inside the kernel)
2026-06-19 20:51:37 +08:00
bench.py
feat: INT8 dense MoE (torch._int_mm, 2D-concatenated W1_cat/W2_cat, top-k weights folded into GEMM2, per-tensor activation quantization)
2026-06-20 01:35:55 +08:00
build_env.sh
fix: simplify build_env.sh to a clean version (avoids errors caused by CUDA warm-up)
2026-06-12 21:55:09 +08:00
EXPERIMENTS.md
docs: wrap-up: final 67.998; record the RepEncoder precomputation attempt and its conclusion
2026-06-16 13:18:48 +08:00
infer.py
feat: INT8 dense MoE (torch._int_mm, 2D-concatenated W1_cat/W2_cat, top-k weights folded into GEMM2, per-tensor activation quantization)
2026-06-20 01:35:55 +08:00
requirements.txt
revert: restore requirements.txt to the original full dependency list
2026-06-12 21:24:22 +08:00
RISKS.md
docs: notes on potential risks (compliance gray area of RepEncoder precomputation / max_feasign consistency) and a compliant fallback
2026-06-15 20:44:57 +08:00