Basic Information
- Paper: Diffusion Models for Safety Validation of Autonomous Driving Systems
- Authors: a group of Stanford graduate students and faculty
- arXiv: 2506.08459v1
- Code (open-sourced): https://github.com/sisl/Diffusion-Based-AV-Safety-Validation
Abstract, Sentence by Sentence
Sentence 1: What's the problem?
“Safety validation of autonomous driving systems is extremely challenging due to the high risks and costs of real-world testing as well as the rarity and diversity of potential failures.”
Testing the safety of autonomous driving runs into three big pitfalls:
- Real-vehicle road testing is costly and high-risk; one accident and it's over
- Failure cases are rare; catching one is like winning a lottery
- Failures come in endless varieties and cannot be enumerated
Sentence 2: What's their approach?
“To address these challenges, we train a denoising diffusion model to generate potential failure cases of an autonomous vehicle given any initial traffic state.”
They train a denoising diffusion model that takes any initial traffic state as input and outputs conditions likely to cause a collision. The key phrase is "any initial state": it is not restricted to fixed scenarios.
Sentence 3: Where was it tested?
“Experiments on a four-way intersection problem show that in a variety of scenarios, the diffusion model can generate realistic failure samples while capturing a wide variety of potential failures.”
The test scenario is a four-way intersection, a reasonable choice: intersections are among the most complex situations autonomous driving has to handle.
Sentence 4: What are the advantages?
“Our model does not require any external training dataset, can perform training and inference with modest computing resources, and does not assume any prior knowledge of the system under test.”
Three advantages:
- No external dataset needed; it generates its own training data
- Runs on a GTX 1080 Ti, not a 3090 or an H100
- Black-box testing: no prior knowledge of the system under test
The third point is quite practical: the method carries over directly to a different autonomous driving system.
Introduction, Paragraph 1: The Three Curses
Opening line
“Verifying the safety of an autonomous driving system requires knowledge of the potential failure modes of the system.”
To verify safety, you need to know where the system will break. That's the starting point of the whole study.
Three difficulties in learning the failure distribution
The authors sum these up as three "curses":
Curse 1: Dimensionality
“The high-dimensional nature of the state space of the autonomous vehicle and the long time horizons over which it operates.”
The state space explodes in dimension. An autonomous vehicle has to track a lot:
- Its own position, velocity, and acceleration
- The states of surrounding vehicles
- How the scene evolves over time
- Sensor readings
Put together, the search space is practically unbounded.
Curse 2: Rarity
“Failures tend to be rare for most safety-critical autonomous driving systems, which makes the search for failures highly time-consuming.”
For a well-designed system, failures are rare by construction. It might take millions of simulation runs to catch a single bug, and the compute cost is untenable.
Curse 3: Multimodality
“The system may exhibit multiple failure modes of different nature, making it hard to capture the full range of variability in its failure distribution.”
Failure modes come in all shapes:
- Sensor faults
- Bugs in the decision algorithm
- Other vehicles driving erratically
- Combinations of all of the above
Finding one failure mode is not enough; you need full coverage.
Introduction, Paragraph 2: Pitfalls of Existing Methods
Why everyone uses simulation
“Due to the high risks and costs of deploying the system on hardware for scenario-based and real-world testing, there has been growing interest in designing effective virtual testing frameworks.”
Real-vehicle testing is too risky and too expensive, so the field has shifted to virtual testing.
How traditional methods work
“Most of these frameworks typically involve constructing a virtual simulation environment and running pre-designed simulated tests tailored to the system.”
The traditional recipe:
- Build a simulation environment (CARLA, SUMO, and the like)
- Hand-design test scenarios
- Tune parameters for the specific system
Pain points
“These testing frameworks, however, often rely on extensive prior knowledge of the SUT and require significant redesigns when being adapted to validate different driving systems.”
Two problems:
- They require deep knowledge of the system under test (SUT)
- Adapting to a different system means a major redesign, so portability is poor
Every new system means starting from scratch, which is a lot of work.
Introduction, Paragraph 3: A Shift in Thinking
The core innovation
“Instead of gathering as many individual failure cases as possible through simulation or real-world testing, directly modeling the true failure distribution of the SUT can provide more comprehensive insights into the failure modes of the system.”
This passage is the core idea.
Old approach: hunt for bugs one at a time (enumerate failure cases). New approach: model the distribution of bugs (learn the failure distribution).
Instead of enumerating failures, learn their probability distribution and then sample from it. In theory, this gives much broader coverage.
Why diffusion models?
“Recently, denoising diffusion models have demonstrated the capacity for generative modeling of complex target distributions, enabling the synthesis of high-quality texts, images, and robotic actions.”
Diffusion models are everywhere right now:
- Text generation
- Image generation (Stable Diffusion, Midjourney)
- Robot action generation
If they can generate data this complex, generating failure cases should be feasible too. Diffusion models can capture complex multimodal distributions, which matches the diversity of autonomous driving failures nicely.
Introduction, Paragraph 4: The Concrete Approach
Training objective
“This study applies diffusion models to perform safety validation for autonomous vehicles.”
They apply a diffusion model to autonomous driving safety validation.
“we employ a diffusion model to learn the true distribution over collision-causing sequences of observation errors of the SUT under different traffic situations.”
The model learns the distribution over collision-causing sequences of observation errors.
Note that what gets generated is a sequence of sensor observation errors, not vehicle trajectories. Why this design? Because driving decisions are made from sensor observations, and sensor error is the norm rather than the exception. Generating specific error sequences tests how the system behaves under adverse conditions.
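The design above can be illustrated with a minimal sketch. The additive-noise form and the `perturb_observations` helper are assumptions for illustration, not the paper's exact interface:

```python
import numpy as np

def perturb_observations(true_states, error_seq):
    """Return the corrupted observations the SUT would act on.

    true_states: (T, D) ground-truth observations over T timesteps
    error_seq:   (T, D) disturbance sequence sampled from the generative model
    """
    return true_states + error_seq

# Toy example: 5 timesteps of a 2-D observation (e.g. intruder position).
rng = np.random.default_rng(0)
true_states = np.zeros((5, 2))
error_seq = rng.normal(scale=0.5, size=(5, 2))
obs = perturb_observations(true_states, error_seq)
```

The SUT's planner would then consume `obs` instead of the ground truth, so a well-chosen error sequence can steer it toward a collision.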
Training pipeline highlights
“Our training pipeline does not require any external training dataset.”
No external dataset required, which is a big plus. Many related works need real accident records for training, and that kind of data is hard to come by.
“At each iteration, the model generates a batch of failure-causing sensor disturbance samples and trains itself on samples closer to actual collisions.”
Bootstrapped training:
- The model generates a batch of sensor disturbances
- Simulation reveals which ones lead to collisions
- The model keeps training on those samples
- Repeat
The longer it trains, the better it gets at generating samples close to actual collisions.
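The bootstrapped loop can be sketched in Python. Everything here is a hypothetical stand-in (`DummyModel`, `miss_distance`, the keep-the-closest heuristic); the actual pipeline trains a diffusion model with a denoising loss and scores samples in a traffic simulator:

```python
import numpy as np

class DummyModel:
    """Hypothetical stand-in for the denoising diffusion model."""
    def __init__(self):
        self.rng = np.random.default_rng(0)
        self.mean = 0.0
    def sample(self, init_state, n):
        # Generate n candidate disturbance sequences (flattened to 8 dims here).
        return self.mean + self.rng.normal(size=(n, 8))
    def fit(self, init_state, elite):
        # Crude stand-in for a gradient step on the kept samples.
        self.mean = float(elite.mean())

def miss_distance(init_state, disturbance):
    """Toy surrogate simulator: larger disturbances bring the vehicles
    closer together; 0.0 means the rollout ended in a collision."""
    return max(0.0, 5.0 - float(np.abs(disturbance).sum()))

def bootstrap_train(model, simulate, init_state, batch=64, iters=5, keep_frac=0.25):
    """Self-improving loop: sample, score, keep the samples nearest to a
    collision, retrain on them, repeat."""
    for _ in range(iters):
        samples = model.sample(init_state, batch)
        miss = np.array([simulate(init_state, s) for s in samples])
        k = max(1, int(batch * keep_frac))
        elite = samples[np.argsort(miss)[:k]]  # closest to an actual collision
        model.fit(init_state, elite)
    return model

model = bootstrap_train(DummyModel(), miss_distance, init_state=None)
```

The key design choice is that no external data enters the loop: the model's own samples, filtered by the simulator, are the training set.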
“During inference, the model can generate failure samples based on any given initial traffic state.”
At inference time, the model generates failure samples for any given initial state; it is not tied to fixed scenarios.
Test scenario
“we train the diffusion model to generate failure samples for a four-way intersection problem involving an autonomous ego vehicle and one intruder vehicle.”
Scenario setup:
- A four-way intersection
- One autonomous ego vehicle (the vehicle under test)
- One intruder vehicle (a potential threat)
“The simulation environment randomly initializes the positions, velocities, and routes of the vehicles.”
Initial conditions are fully randomized: positions, velocities, and routes.
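Randomized initialization might look like the following sketch. The field names, value ranges, and `random_intersection_scenario` helper are all illustrative assumptions, not the paper's actual environment code:

```python
import random

def random_intersection_scenario(seed=None):
    """Hypothetical sketch of randomized initial conditions at a four-way
    intersection: each vehicle gets a spawn arm, distance, speed, and route."""
    rng = random.Random(seed)
    arms = ["north", "south", "east", "west"]
    turns = ["left", "straight", "right"]
    def vehicle():
        return {
            "arm": rng.choice(arms),             # which approach the car spawns on
            "distance_m": rng.uniform(10.0, 50.0),  # distance to the stop line
            "speed_mps": rng.uniform(3.0, 12.0),
            "route": rng.choice(turns),
        }
    return {"ego": vehicle(), "intruder": vehicle()}

scenario = random_intersection_scenario(seed=42)
```

Conditioning the diffusion model on such a state vector is what lets it generate failures for "any given initial traffic state".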
Open-source code
The code is available at https://github.com/sisl/Diffusion-Based-AV-Safety-Validation
Worth a look if you want to reproduce the results.
Main Contributions (6 Points)
The authors list six contributions:
1. Conditional generation: failure samples can be generated from any initial state, not just fixed scenarios.
2. Realistic and diverse: generated samples match real sensor-error characteristics and cover multiple failure modes.
3. High sampling efficiency: far more efficient than Monte Carlo. Monte Carlo is brute-force random search and may need millions of runs to find one failure; this method is targeted.
4. Self-improving training: multi-stage training that gradually approaches collisions, with no external dataset.
5. Black-box testing: no knowledge of the SUT's internals required, so the method generalizes.
6. Low hardware requirements: a GTX 1080 Ti is enough; no high-end GPU needed.
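Point 3 can be made concrete with a toy sketch of why unbiased Monte Carlo is expensive when failures are rare: if each random rollout fails independently with probability p, the expected number of rollouts per failure is roughly 1/p. The `p_failure` value below is illustrative, not a number from the paper:

```python
import random

def monte_carlo_failure_search(p_failure, max_trials, seed=0):
    """Count trials until the first failure, where each trial stands in for
    one simulator rollout with unbiased random disturbances."""
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        if rng.random() < p_failure:
            return trial
    return None  # no failure found within the budget

# With p = 1e-4, the expected cost is on the order of 10,000 rollouts
# per failure found; a targeted generator avoids paying this price.
trials_needed = monte_carlo_failure_search(p_failure=1e-4, max_trials=1_000_000)
```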
Paper structure
- Section II: Related work
- Section III: Methodology (problem formulation, diffusion model architecture, training algorithm, evaluation metrics)
Original article by 智驾星闻. If reposting, please credit the source: https://www.key-iot.cn/zj/jssf/507.html
