Serendipity/CTI-Inference-Opt
0128fb8100df2ee9648bc092ba1e3b96abd2dc38
CTI-Inference-Opt/代码/code
OwnerSunshine530
0128fb8100
perf: Triton kernel: switch both dot operations to fp16 Tensor Cores (standard flash approach: fp16 matmul + fp32 accumulation), 2-4x speedup per block
...
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 00:36:25 +08:00
..
tests
feat: Triton varlen causal flash attention (block-diagonal, single kernel; eliminates per-block calls and mask-construction overhead)
2026-06-17 00:14:53 +08:00
bench.py
feat: Triton varlen causal flash attention (block-diagonal, single kernel; eliminates per-block calls and mask-construction overhead)
2026-06-17 00:14:53 +08:00
build_env.sh
fix: simplify build_env.sh to a clean version (avoids anomalies caused by CUDA warm-up)
2026-06-12 21:55:09 +08:00
EXPERIMENTS.md
docs: wrap-up: final 67.998 / record the RepEncoder precomputation attempt and its conclusion
2026-06-16 13:18:48 +08:00
infer.py
perf: Triton kernel: switch both dot operations to fp16 Tensor Cores (standard flash approach: fp16 matmul + fp32 accumulation), 2-4x speedup per block
2026-06-17 00:36:25 +08:00
requirements.txt
revert: restore requirements.txt to the original full dependency list
2026-06-12 21:24:22 +08:00
RISKS.md
docs: notes on potential risks (compliance gray area of RepEncoder precomputation / max_feasign consistency) and the compliance fallback
2026-06-15 20:44:57 +08:00