docs: update optimization roadmap and submission log (Flash Attention: score 56.98, 94.5s)

2026-06-12 21:55:58 +08:00
parent 61bab9d0e3
commit c5fee2da9b
1 changed files with 8 additions and 7 deletions
@@ -104,13 +104,13 @@ score_all     = score_latency * 70 + score_model * 30
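 Baseline: inference 229s, AUC 0.759, PCOC 1.110, score 25.85.
-1. **Interface alignment** (must be done first): confirm it runs end to end on the evaluation system (score > 0)
+1. ✅ **Interface alignment**: confirm it runs end to end on the evaluation system (score > 0)
-2. **FP16 quantization**: `model.half()`, keep Embedding in FP32; expected 229s → ~120s
+2. ✅ **FP16 quantization**: `model.half()`, keep Embedding in FP32; measured 152s
-3. **Flash Attention**: replace `scaled_dot_product` with `F.scaled_dot_product_attention`; mathematically equivalent
+3. ✅ **Flash Attention**: replace `scaled_dot_product` with `F.scaled_dot_product_attention`; measured 94.5s
-4. **torch.compile**: `mode="reduce-overhead"` → `"max-autotune"`, with warm-up in build_env.sh
+4. 🔲 **torch.compile**: `mode="reduce-overhead"`; not yet verified
-5. **Data pipeline optimization**: pre-convert to FP16 and pre-move to GPU when caching
+5. 🔲 **Data pipeline optimization**: pre-convert to FP16 and pre-move to GPU when caching
-6. **MoE optimization**: profile expert load, then merge or remove rarely used experts
+6. 🔲 **MoE optimization**: profile expert load, then merge or remove rarely used experts
-7. **INT8 quantization** (optional): higher accuracy risk; try only if the earlier steps are not enough
+7. 🔲 **INT8 quantization** (optional): higher accuracy risk; try only if the earlier steps are not enough
 CUDA Graph was evaluated and dropped (batch shapes are not fixed, so it does not apply).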
@@ -133,4 +133,5 @@ CUDA Graph was evaluated and dropped (batch shapes are not fixed, so it does not apply).
 | Date | Submissions | Score | AUC | PCOC | Time | Optimization | Notes |
 |------|----------|------|-----|------|------|----------|------|
 | 06/12 | 6 | **56.98** | 0.7526 | 1.059 | 94.5s | + Flash Attention | The build_env.sh warm-up script caused an error on submission 5 |
 | 06/12 | 3 | **43.55** | 0.7525 | 1.059 | 152s | Interface alignment + FP16 quantization | Submissions 1 and 2 errored because of a requirements.txt problem |
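
The latency figures recorded in this change (229s baseline, 152s after FP16, 94.5s after Flash Attention) can be turned into speedup ratios with a few lines of arithmetic. The following is a minimal sketch for that calculation only; the `runs` list and the `speedups` helper are illustrative names and are not part of the repository.

```python
# Latency figures copied from the roadmap and submission log in this commit.
runs = [
    ("Baseline", 229.0),
    ("Interface alignment + FP16", 152.0),
    ("+ Flash Attention", 94.5),
]


def speedups(runs):
    """Return (label, seconds, speedup vs. first entry, speedup vs. previous entry)."""
    base = runs[0][1]
    prev = base
    rows = []
    for label, secs in runs:
        rows.append((label, secs, base / secs, prev / secs))
        prev = secs
    return rows


for label, secs, vs_base, vs_prev in speedups(runs):
    print(f"{label:<28} {secs:>6.1f}s  x{vs_base:.2f} vs baseline  x{vs_prev:.2f} vs previous")
```

With these numbers, FP16 comes out at roughly 1.51x over the baseline, and Flash Attention adds a further 1.61x or so, for about 2.42x overall.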