Serendipity/CTI-Inference-Opt
6114c78354aec24e9f66854bdabf9eb665b71544
CTI-Inference-Opt/代码/code
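OwnerSunshine530 6114c78354 perf: triton wrapper drops q/k/v.contiguous() and reads the non-contiguous tensors through their actual strides (saves the 13% clone overhead)
Profiling shows that the .contiguous() calls on the triton path trigger 492 clones, accounting for 13% of the time. The kernel already takes stride parameters,
so passing q.stride() and out.stride() lets it read the non-contiguous qkv produced by split+permute directly, with no clone.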

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:44:10 +08:00
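
The stride argument in the commit message above can be illustrated with a small pure-Python model. This is only a sketch of the idea, not the repository's Triton kernel or wrapper: the packed [T, 3, H, D] layout, the [H, T, D] target view, and every function name here are assumptions made for illustration. It shows that a split+permute view needs only an offset and a set of strides, so a reader that honors strides never needs the dense copy that .contiguous() would create.

```python
from itertools import product
from typing import List, Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides (in elements) of a densely packed tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)


def read(buf: List[int], offset: int, strides: Sequence[int],
         index: Sequence[int]) -> int:
    """Fetch one element by stride arithmetic, as a stride-aware kernel would."""
    return buf[offset + sum(i * s for i, s in zip(index, strides))]


def materialize(buf, offset, shape, strides) -> List[int]:
    """Copy a view into a new dense list: the clone that .contiguous() implies."""
    return [read(buf, offset, strides, idx)
            for idx in product(*(range(n) for n in shape))]


# Hypothetical packed qkv storage with layout [T, 3, H, D].
T, H, D = 2, 2, 3
packed_strides = contiguous_strides((T, 3, H, D))
buf = list(range(T * 3 * H * D))  # stand-in for the real qkv buffer


def split_view(which: int):
    """View q (0), k (1) or v (2) as [H, T, D] without moving any data."""
    offset = which * packed_strides[1]            # "split": jump to q, k or v
    t_stride, _, h_stride, d_stride = packed_strides
    shape = (H, T, D)                             # "permute": swap T and H
    strides = (h_stride, t_stride, d_stride)
    return offset, shape, strides


k_offset, k_shape, k_strides = split_view(1)

# The view is non-contiguous: its strides differ from dense [H, T, D] strides.
print(k_strides, contiguous_strides(k_shape))

# Reading through strides gives the same values as reading a dense clone.
k_clone = materialize(buf, k_offset, k_shape, k_strides)
dense = contiguous_strides(k_shape)
same = all(read(k_clone, 0, dense, idx) == read(buf, k_offset, k_strides, idx)
           for idx in product(*(range(n) for n in k_shape)))
print(same)
```

In this model the clone costs one full pass over the data plus new storage, while the strided read costs only a different multiplier per index, which is the trade the commit describes.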
..
tests
feat: F.embedding_bag fuses lookup + pooling (single kernel, no [M,512] intermediate); targets the largest cost block (dedup index 25% + segment 11% = 36%)
2026-06-17 13:30:47 +08:00
bench.py
feat: F.embedding_bag fuses lookup + pooling (single kernel, no [M,512] intermediate); targets the largest cost block (dedup index 25% + segment 11% = 36%)
2026-06-17 13:30:47 +08:00
build_env.sh
fix: simplify build_env.sh to a clean version (avoids errors caused by CUDA warm-up)
2026-06-12 21:55:09 +08:00
EXPERIMENTS.md
docs: wrap-up; final score 67.998 / record the RepEncoder precomputation attempt and its conclusions
2026-06-16 13:18:48 +08:00
infer.py
perf: triton wrapper drops q/k/v.contiguous() and reads the non-contiguous tensors through their actual strides (saves the 13% clone overhead)
2026-06-17 13:44:10 +08:00
requirements.txt
revert: restore requirements.txt to the original full dependency list
2026-06-12 21:24:22 +08:00
RISKS.md
docs: notes on potential risks (compliance gray area of RepEncoder precomputation / max_feasign consistency) and a compliance fallback
2026-06-15 20:44:57 +08:00
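
The "fused lookup + pooling" idea in the F.embedding_bag commit listed above can be sketched in pure Python. This models the semantics only and is not the repository's code: sum pooling, the tiny table, and all function names are assumptions for illustration (the listing does not say which pooling mode is used). Offsets mark where each bag starts in the flat index list, which is how offsets work in F.embedding_bag when include_last_offset is left at its default.

```python
from typing import List, Sequence

Row = List[float]


def lookup_then_pool(table: Sequence[Row], indices: Sequence[int],
                     offsets: Sequence[int]) -> List[Row]:
    """Unfused reference: build the [M, dim] gather result, then reduce per bag."""
    gathered = [list(table[i]) for i in indices]  # the [M, dim] intermediate
    out = []
    for b, start in enumerate(offsets):
        end = offsets[b + 1] if b + 1 < len(offsets) else len(indices)
        dim = len(table[0])
        out.append([sum(row[j] for row in gathered[start:end])
                    for j in range(dim)])
    return out


def fused_bag_sum(table: Sequence[Row], indices: Sequence[int],
                  offsets: Sequence[int]) -> List[Row]:
    """Fused version: accumulate each bag directly, with no [M, dim] intermediate."""
    dim = len(table[0])
    out = []
    for b, start in enumerate(offsets):
        end = offsets[b + 1] if b + 1 < len(offsets) else len(indices)
        acc = [0.0] * dim
        for idx in indices[start:end]:
            row = table[idx]          # read the embedding row in place
            for j in range(dim):
                acc[j] += row[j]      # pool immediately
        out.append(acc)
    return out


# Hypothetical 4-row table with dim 2; two bags: rows [0, 2, 2] and rows [1, 3].
table = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
indices = [0, 2, 2, 1, 3]
offsets = [0, 3]

fused = fused_bag_sum(table, indices, offsets)
reference = lookup_then_pool(table, indices, offsets)
print(fused)
print(fused == reference)
```

Both functions return one pooled row per bag; the fused one differs only in never holding the M gathered rows at once, which is the intermediate the commit message says is avoided.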