CTI-Inference-Opt/代码/main.ipynb
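Serendipity d0bbb8f3e2 chore: initialize the CTI inference optimization project
- baseline infer.py + requirements.txt + build_env.sh
- the two core papers, GRAB and HSTU
- competition rules and submission interface documentation
- project CLAUDE.md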
2026-06-03 13:49:30 +08:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 百度2026CTI:生成式推荐广告排序推理性能优化\n",
"\n",
"> 报名链接:[https://aistudio.baidu.com/competition/detail/1461](https://aistudio.baidu.com/competition/detail/1461/0/introduction)\n",
"\n",
"## 赛道概要背景:\n",
"\n",
"传统广告排序模型已难以满足个性化推荐需求,生成式广告排序模型凭借强大的序列建模与语义理解能力成为行业趋势。该类模型依托 Transformer 架构,能深度挖掘用户点击、转化等超长行为序列中的长距离依赖关系,精准捕捉用户兴趣演化规律,从而生成更具吸引力的个性化广告内容,提升广告点击率与用户体验。但在实际应用中,存在很多挑战,如模型参数规模大、注意力计算复杂、存在超长历史序列、量化影响推理精度等。\n",
"因此,本次赛事聚焦于如何提升生成式广告排序的推理性能,我们期待参赛选手能够从框架优化、算法创新、高性能计算等多个角度出发,提出突破现有技术瓶颈的创新方案。\n",
"\n",
"本次任务提供百度商业真实的用户行为数据、广告信息,选手需要在保证模型推理效果的前提下,极致优化推理性能。\n",
"\n",
"## 数据集介绍:\n",
"\n",
"1. 用户行为数据:包括全局唯一的日志ID和用户ID、广告曝光时间、广告点击时间等信息;\n",
"2. 广告内容:包括广告的文本描述、图片信息、广告主信息等;\n",
"3. 上下文信息:包括用户的地理位置、职业、性别、设备类型等;\n",
"4. 用户统计信息:包括用户的活跃度、兴趣标签、历史点击率等统计数据。\n",
"\n",
"数据示例(按行字段,时间戳格式为Unix Timestamp):\n",
"![数据介绍](https://ai-studio-static-online.cdn.bcebos.com/aff32cdcb2c143d29fffdfa922014bf96dad0b2ddddb4f11bfc83c20a15664db)\n",
"\n",
"## 评估指标\n",
"\n",
"1. 推理效率评估:参赛者提交inference脚本后,会通过统计inference脚本的运行时间,来计算在测试集上单条样本的平均推理时间。推理效率打分采用如下公示,如平均推理时间超过定义的时间限制,则本项和最终得分为0:\n",
"\n",
"![评分指标](https://ai-studio-static-online.cdn.bcebos.com/4e60708f1a364075892de2ebdc9d52a0370c407d5eac40049aa658ba24757763)\n",
"\n",
"2. 策略效果评估:综合考虑AUC及PCOC指标,PCOC需满足[0.85, 1.15]AUC需满足[0.65, 1],方可进入榜单排序,否则本项和最终得分为0,具体规则如下:得分由pcoc和auc组合而成:\n",
"\n",
"![](https://bj.bcebos.com/v1/ai-studio-match/file/ac0c9093b664488cbb85f92694a22ef696720c9ae7994eae9e4dba760d97c998?authorization=bce-auth-v1%2FALTAKzReLNvew3ySINYJ0fuAMN%2F2026-04-08T11%3A20%3A21Z%2F-1%2F%2Fec5f8c64ba11c8a52fb1a71ef05572ac4673c74750766218c5a9b956ad48e72d)\n",
"\n",
"\n",
"* 指标说明:\n",
" * AUCROC曲线下的面积,越接近与1越好\n",
" * PCOC:预估转化率 / 真实转化率,越接近于1越好\n",
"\n",
"3. 计分规则:综合考虑推理性能和策略效果两个指标,计分规则如下所示;\n",
"\n",
"![](https://ai-studio-static-online.cdn.bcebos.com/262518fa5b3447b0b3524a3955764e502bba322aadf743b5a573637eac7e4048)\n",
"\n",
"**警告⚠️**\n",
"\n",
"> * 评估容器有整体运行时间限制,如果超出则无法计入成绩;(build_env.sh等其他部分耗时均要在20min内)\n",
"> * 任何作弊行为将会取消队伍成绩。\n",
"\n"
]
},
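{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metric definitions above can be sketched in a few lines of Python. This is not the official scorer: the scoring formulas appear in this notebook only as images, so the sketch stops at the two metrics, the leaderboard gate, and the per-sample latency. The function names `auc`, `pcoc`, `passes_gate`, and `avg_latency` are hypothetical, and PCOC is assumed here to be the mean predicted probability divided by the mean label.\n",
"\n",
"```python\n",
"def auc(labels, scores):\n",
"    '''Rank-based AUC (Mann-Whitney U); tied scores share their average rank.'''\n",
"    order = sorted(range(len(scores)), key=lambda i: scores[i])\n",
"    ranks = [0.0] * len(scores)\n",
"    i = 0\n",
"    while i < len(order):\n",
"        j = i\n",
"        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:\n",
"            j += 1\n",
"        for k in range(i, j + 1):\n",
"            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group\n",
"        i = j + 1\n",
"    n_pos = sum(labels)\n",
"    n_neg = len(labels) - n_pos\n",
"    if n_pos == 0 or n_neg == 0:\n",
"        raise ValueError('AUC needs both positive and negative labels')\n",
"    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)\n",
"    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)\n",
"\n",
"\n",
"def pcoc(labels, scores):\n",
"    '''Assumed definition: sum of predictions over sum of 0/1 labels.'''\n",
"    positives = sum(labels)\n",
"    if positives == 0:\n",
"        raise ValueError('PCOC is undefined without positive labels')\n",
"    return sum(scores) / positives\n",
"\n",
"\n",
"def passes_gate(auc_value, pcoc_value):\n",
"    '''Leaderboard gate from the rules: AUC in [0.65, 1], PCOC in [0.85, 1.15].'''\n",
"    return 0.65 <= auc_value <= 1.0 and 0.85 <= pcoc_value <= 1.15\n",
"\n",
"\n",
"def avg_latency(total_seconds, n_samples):\n",
"    '''Average inference time per sample, in seconds.'''\n",
"    return total_seconds / n_samples\n",
"```\n",
"\n",
"For example, the baseline values logged in section 3 (AUC 0.759232, PCOC 1.110063) pass the gate. If the 7774 written predictions correspond to 7774 samples, 229.1826 s works out to roughly 29.5 ms per sample."
]
},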
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1 依赖库安装\n",
"\n",
"由于平台安装依赖很慢,提供了如下两种方式:\n",
"1. 自行安装\n",
"2. tar包 【建议:方便、快】\n",
"\n",
"任选一种即可。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/opt/conda/envs/python35-paddle120-env/bin/python\r\n"
]
}
],
"source": [
"!which python"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [],
"source": [
"# 自行安装,网络好的话 也很快\n",
"# 平台上应该有显示 bug,等待下载一段时间后,页面状态不更新\n",
"# 从终端top看下进程无cpu占用,大概率已经安装完成了,点击上方的重启内核后正常下一步执行infer即可\n",
"!mkdir -p /home/aistudio/libraries\n",
"!pip install uv\n",
"!/home/aistudio/libraries/bin/uv pip install \\\n",
" -r /home/aistudio/code/requirements.txt \\\n",
" --target /home/aistudio/libraries \\\n",
" -i https://mirrors.aliyun.com/pypi/simple/\n",
"# -i https://mirror.baidu.com/pypi/simple/\n",
"# -i https://pypi.tuna.tsinghua.edu.cn/simple/"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2026-04-09 19:53:40-- https://studio-package.bj.bcebos.com/2026-shangye-python-package/external-libraries.tar\r\n",
"Resolving studio-package.bj.bcebos.com (studio-package.bj.bcebos.com)... 100.64.80.160, 100.67.184.196, 100.64.80.202\r\n",
"Connecting to studio-package.bj.bcebos.com (studio-package.bj.bcebos.com)|100.64.80.160|:443... connected.\r\n",
"HTTP request sent, awaiting response... 200 OK\r\n",
"Length: 5699010560 (5.3G) [application/octet-stream]\r\n",
"Saving to: 'libraries.tar'\r\n",
"\r\n",
"libraries.tar 100%[===================>] 5.31G 120MB/s in 45s \r\n",
"\r\n",
"2026-04-09 19:54:25 (120 MB/s) - 'libraries.tar' saved [5699010560/5699010560]\r\n",
"\r\n",
"Looking in indexes: http://mirrors.baidubce.com/pypi/simple/\r\n",
"Requirement already satisfied: uv in ./external-libraries/lib/python3.10/site-packages (0.10.7)\r\n",
"\u001b[2mUsing CPython 3.10.10 interpreter at: /opt/conda/envs/python35-paddle120-env/bin/python\u001b[0m\r\n",
"\u001b[2mAudited \u001b[1m29 packages\u001b[0m \u001b[2min 36ms\u001b[0m\u001b[0m\r\n"
]
}
],
"source": [
"# 强烈建议使用 tar 包 安装\n",
"!mkdir -p /home/aistudio/libraries /home/aistudio/external-libraries\n",
"!wget https://studio-package.bj.bcebos.com/2026-shangye-python-package/external-libraries.tar -O libraries.tar\n",
"# 解压到 libraries 目录(跳过 external-libraries 层级)\n",
"!tar -xf libraries.tar --strip-components=1 -C libraries\n",
"!pip install uv\n",
"!/home/aistudio/external-libraries/bin/uv pip install \\\n",
" -r /home/aistudio/code/requirements.txt \\\n",
" --target /home/aistudio/libraries \\\n",
" -i https://mirrors.aliyun.com/pypi/simple/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2 数据集和模型链接\n",
"\n",
"文件太大, 请耐心等待项目中数据和权重下载完成后进行 ln\n",
"\n",
"选手测试集样本的 auc 和 pcoc 仅供参考,以最终提交验证集结果为准\n",
"\n",
"\n",
"选手不可对数据集进行任何更改 【违规为0分】"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [],
"source": [
"!ln -s /home/aistudio/data/datasets/375013/2026_cti_data/dataset /home/aistudio/code/dataset"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [],
"source": [
"!cat /home/aistudio/data/models/45703/2026_cti_model/ckpt.part.0* > /home/aistudio/code/ckpt.pt"
]
},
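{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `cat ckpt.part.0* > ckpt.pt` step above reassembles a checkpoint that was split into parts. Where a size check is wanted as well, the same step might be sketched in Python as follows. The helper name `join_parts` is hypothetical, and the sketch assumes the parts sort correctly by file name, which is also what the shell glob relies on.\n",
"\n",
"```python\n",
"import shutil\n",
"from pathlib import Path\n",
"\n",
"\n",
"def join_parts(part_dir, pattern, out_path):\n",
"    '''Concatenate split files in file-name order and return the bytes written.'''\n",
"    parts = sorted(Path(part_dir).glob(pattern))\n",
"    if not parts:\n",
"        raise FileNotFoundError(f'no files match {pattern} in {part_dir}')\n",
"    with open(out_path, 'wb') as out:\n",
"        for part in parts:\n",
"            with open(part, 'rb') as src:\n",
"                shutil.copyfileobj(src, out)  # streams in chunks, so large parts are fine\n",
"    expected = sum(p.stat().st_size for p in parts)\n",
"    written = Path(out_path).stat().st_size\n",
"    if written != expected:\n",
"        raise OSError(f'size mismatch: wrote {written}, expected {expected}')\n",
"    return written\n",
"```\n",
"\n",
"A call such as `join_parts('/home/aistudio/data/models/45703/2026_cti_model', 'ckpt.part.0*', '/home/aistudio/code/ckpt.pt')` would mirror the shell command. The output path should not itself match the pattern inside `part_dir`."
]
},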
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 21488359\r\n",
"-rw-r--r-- 1 aistudio aistudio 39 Apr 9 15:53 build_env.sh\r\n",
"-rw-r--r-- 1 aistudio aistudio 10605862351 Apr 9 19:08 ckpt.pt\r\n",
"lrwxrwxrwx 1 aistudio aistudio 57 Apr 9 15:53 dataset -> /home/aistudio/data/datasets/375013/2026_cti_data/dataset\r\n",
"-rw-r--r-- 1 aistudio aistudio 5699010560 Apr 3 16:33 external-libraries.tar\r\n",
"-rw-r--r-- 1 aistudio aistudio 25338 Apr 9 15:53 infer.py\r\n",
"-rw-r--r-- 1 aistudio aistudio 5699010560 Apr 3 16:33 libraries.tar\r\n",
"-rw-r--r-- 1 aistudio aistudio 161928 Apr 9 19:36 predict.txt\r\n",
"-rw-r--r-- 1 aistudio aistudio 652 Apr 9 19:03 requirements.txt\r\n"
]
}
],
"source": [
"!ls -l code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3 推理\n",
"\n",
"选手不可对组网和相关参数进行修改。 【违规为0分】\n",
"\n",
"量化稀疏剪枝除外"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/aistudio/code\r\n",
"/home/aistudio/libraries/torch/cuda/__init__.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.\r\n",
" import pynvml # type: ignore[import]\r\n",
"[INFO] loading cached batch shards from /home/aistudio/code/dataset/cached_batches\r\n",
"[INFO] loaded 217 batches from shard_0000.pt\r\n",
"[INFO] loaded 218 batches from shard_0001.pt\r\n",
"[INFO] loaded 215 batches from shard_0002.pt\r\n",
"[INFO] loaded 226 batches from shard_0003.pt\r\n",
"[INFO] loaded 240 batches from shard_0004.pt\r\n",
"[INFO] loaded 260 batches from shard_0005.pt\r\n",
"[INFO] loaded 284 batches from shard_0006.pt\r\n",
"[INFO] loaded 335 batches from shard_0007.pt\r\n",
"[INFO] loaded 44 batches from shard_0008.pt\r\n",
"[INFO] loaded 2039 cached batches total from 9 shards\r\n",
"[INFO] data loading done\r\n",
"[INFO] Loaded checkpoint from /home/aistudio/code/ckpt.pt (epoch=1)\r\n",
"[INFO] Model ready. Device: cuda:0\r\n",
"******************** start inference ********************\r\n",
"Inference: 100%|████████████████████████████| 2039/2039 [03:57<00:00, 8.57it/s]\r\n",
"[INFO] inference time: 229.1826s\r\n",
"******************** end inference ********************\r\n",
"[INFO] predictions written to predict.txt, total: 7774\r\n",
"[INFO] AUC: 0.759232\r\n",
"[INFO] PCOC: 1.110063\r\n",
"[INFO] Latency: 229.1826s\r\n",
"[INFO] score_latency: 0.236058\r\n",
"[INFO] score_model: 0.310817\r\n",
"[INFO] score_all: 25.848547\r\n"
]
}
],
"source": [
"%cd /home/aistudio/code\n",
"!/opt/conda/envs/python35-paddle120-env/bin/python infer.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4 提交\n",
"\n",
"参赛选手需要提交一个命名为xxx.zip的压缩包,压缩包内需要包含以下内容:\n",
"\n",
"1. 额外的python包环境,选手可以通过将python环境打包放在当前工作目录\n",
"2. 优化过的模型文件,如量化后的模型等\n",
"3. 程序入口infer.py脚本\n",
"\n",
"PS: \n",
"> 打包不要包含 **eval 文件夹** 和 **dataset 文件夹** \n",
"> 权重若使用原版,无需修改权重参数且无需上传权重 \n",
"> 若修改权重,请自行完善和修改 infer.py相关逻辑 ****\n",
"> 若需要进行编译等其他复杂操作,请在 build_env.sh 中完成 "
]
},
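{
"cell_type": "markdown",
"metadata": {},
"source": [
"The packaging rules above can also be enforced in code. The sketch below builds the archive with the standard-library `zipfile` module and skips the top-level `eval` and `dataset` folders even when they are listed by mistake. The names `build_submission` and `EXCLUDED_DIRS` are hypothetical. Unlike `zip -y`, this sketch stores the contents of symlinked files instead of the links themselves.\n",
"\n",
"```python\n",
"import zipfile\n",
"from pathlib import Path\n",
"\n",
"EXCLUDED_DIRS = {'eval', 'dataset'}  # top-level folders that must stay out of the archive\n",
"\n",
"\n",
"def build_submission(src_dir, zip_path, include):\n",
"    '''Zip the listed files and folders (relative to src_dir); return the stored names.'''\n",
"    src = Path(src_dir)\n",
"    names = []\n",
"    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:\n",
"        for entry in include:\n",
"            root = src / entry\n",
"            if not root.exists():\n",
"                raise FileNotFoundError(entry)\n",
"            files = sorted(root.rglob('*')) if root.is_dir() else [root]\n",
"            for path in files:\n",
"                rel = path.relative_to(src)\n",
"                # Skip excluded top-level folders and anything that is not a regular file\n",
"                if rel.parts[0] in EXCLUDED_DIRS or not path.is_file():\n",
"                    continue\n",
"                zf.write(path, rel.as_posix())\n",
"                names.append(rel.as_posix())\n",
"    return names\n",
"```\n",
"\n",
"Calling `build_submission('/home/aistudio/code', '/home/aistudio/eval.zip', ['requirements.txt', 'build_env.sh', 'infer.py'])` would pack the same three files as the `zip` command in this notebook."
]
},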
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 其他注意事项\n",
"\n",
"详见 任务提交接口说明.md"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/aistudio/code\r\n",
"updating: requirements.txt (deflated 51%)\r\n",
"updating: build_env.sh (stored 0%)\r\n",
"updating: infer.py (deflated 71%)\r\n"
]
}
],
"source": [
"%cd /home/aistudio/code\n",
"!zip -y -r ../eval.zip requirements.txt build_env.sh infer.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**❤️ 将上述压缩包提交至比赛平台 ❤️**\n",
"\n",
"[https://aistudio.baidu.com/competition/detail/1461/0/submit-result](https://aistudio.baidu.com/competition/detail/1461/0/submit-result)\n",
"\n",
"![](https://ai-studio-static-online.cdn.bcebos.com/5709f981e35c481190f65b14e11154cbdf959039be5744fd835e331eb0a96352)\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "py35-paddle1.2.0"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
}
},
"nbformat": 4,
"nbformat_minor": 1
}