Journal of Computer Applications 2017, Vol. 37 Issue (1): 284-288 DOI: 10.11772/j.issn.1001-9081.2017.01.0284

### Cite this article

ZHANG Jun, HU Zhenbo, ZHU Xinshan. Real-time traffic accident prediction based on AdaBoost classifier[J]. JOURNAL OF COMPUTER APPLICATIONS, 2017, 37(1): 284-288. DOI: 10.11772/j.issn.1001-9081.2017.01.0284.

### Article history

Real-time traffic accident prediction based on AdaBoost classifier
ZHANG Jun, HU Zhenbo, ZHU Xinshan
School of Electrical and Automation Engineering, Tianjin University, Tianjin 300072, China
Abstract: Traditional road traffic accident forecasting mainly uses historical data, such as the number of traffic accidents and the losses they cause, to predict future trends. However, such methods cannot reflect the relationship between traffic accidents and real-time traffic characteristics, and therefore cannot prevent accidents effectively. To solve these problems, a real-time traffic accident prediction method based on an AdaBoost classifier was proposed. Firstly, road traffic states were divided into normal conditions and dangerous conditions, and traffic flow data collected in real time were used as characteristic variables to characterize the two states, so that real-time prediction could be converted into a classification problem. Secondly, the Probability Density Functions (PDF) of the traffic flow characteristics under the two conditions at different time scales were estimated with the Parzen window nonparametric estimation method, and the estimated density functions were analyzed with a separability criterion based on probability distributions to determine the sample data with appropriate characteristic variables and time scales. Finally, an AdaBoost classifier was trained to classify the different traffic conditions. The experimental results show that the classification accuracy on the test samples obtained using the standard deviation of the traffic flow characteristics is 7.9% higher than that obtained using their average value; the standard deviation better reflects the differences between traffic states and yields better classification results.
Key words: intelligent transportation; accident prediction; classifier; traffic flow characteristic; Parzen window; separability criterion
0 Introduction

1 Principle of real-time road traffic accident prediction

 Figure 1 Classification of traffic conditions

 ${{\mathbf{x}}_{i}}={{\left\{ {{T}_{i}},{{L}_{i}},{{C}_{i}},{{W}_{i}},{{F}_{i}} \right\}}^{\mathsf{T}}};i=1,2,\ldots ,N$ (1)

 $h(\mathbf{x}_i)=\begin{cases} 0, & \mathbf{x}_i\in\omega_1 \\ 1, & \mathbf{x}_i\in\omega_2 \end{cases}$ (2)

 Figure 2 Process of real-time traffic accident prediction

2 Traffic data preparation and processing

2.1 Data collection and preprocessing

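1) Set a reasonable range and precision for each parameter, and correct data that do not satisfy them. For example, the reasonable range of the average speed is generally between 0 and 1.3 or 1.5 times the speed limit at the location, and the reasonable range of the time occupancy is 0 to 100%.

2) Check the recording time of each group of traffic flow data: valid records run from 0 to 719. Delete duplicate records, and re-sort out-of-order records.

3) Fill in missing data and other abnormal data. If only isolated records are affected, fill them with the average of their adjacent records; if the problem persists over a period of time, fill them with the average of historical data from the same period.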
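The three preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout `(t, speed, occupancy)`, the function name `preprocess`, treating out-of-range values as missing and then filling them, and the `history` dictionary of same-period averages are all assumptions.

```python
from statistics import mean

# Assumed record layout: (time_index, avg_speed, occupancy_pct),
# with time_index running from 0 to 719 as in step 2).
N_INTERVALS = 720

def preprocess(records, speed_limit, factor=1.5, history=None):
    """Hypothetical cleaner for one day of detector records (steps 1)-3)).

    history: optional dict t -> (speed, occupancy) of same-period
             historical averages, used to fill longer gaps.
    Returns a list of 720 (speed, occupancy) pairs indexed by t;
    entries that cannot be filled stay None.
    """
    # Steps 1) and 2): keep the first record per valid time index and
    # mark implausible values as missing so step 3) can correct them.
    by_t = {}
    for t, speed, occ in records:
        if not 0 <= t < N_INTERVALS or t in by_t:
            continue  # invalid time index or duplicate record
        ok = 0 <= speed <= factor * speed_limit and 0 <= occ <= 100
        by_t[t] = (speed, occ) if ok else None
    series = [by_t.get(t) for t in range(N_INTERVALS)]  # ordered by t

    # Step 3): isolated gaps get the mean of both neighbours,
    # longer gaps get the historical average for the same interval.
    filled = list(series)
    for t in range(N_INTERVALS):
        if series[t] is not None:
            continue
        prev_ok = t > 0 and series[t - 1] is not None
        next_ok = t < N_INTERVALS - 1 and series[t + 1] is not None
        if prev_ok and next_ok:
            filled[t] = tuple(mean(v) for v in zip(series[t - 1], series[t + 1]))
        elif history is not None and t in history:
            filled[t] = history[t]
    return filled
```

Building the output by iterating `t` from 0 to 719 re-sorts out-of-order input automatically, and the `t in by_t` check drops later duplicates. Whether to keep the first or last duplicate is a design choice the text leaves open.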

2.2 Traffic state feature selection

2.2.1 Principle of feature selection

 Figure 3 Misclassification probabilities

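P(ω1) and P(ω2) are the prior probabilities of the two traffic states. The probability that a sample falling in Φ1 actually belongs to the dangerous traffic state ω2, i.e., the type-I error rate, is:

 $P_2(e)=\int_{\Phi_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x}$ (3)

Similarly, the probability that a sample falling in Φ2 actually belongs to the normal traffic state ω1, i.e., the type-II error rate, is:

 $P_1(e)=\int_{\Phi_2} p(\mathbf{x}|\omega_1)\,d\mathbf{x}$ (4)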

 $\begin{aligned} P(e) &= \int_{-\infty}^{t} P(\omega_2|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x} + \int_{t}^{+\infty} P(\omega_1|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x} \\ &= P(\omega_2)P_2(e) + P(\omega_1)P_1(e) \end{aligned}$ (5)
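The decomposition in eq. (5) can be checked numerically for a scalar feature. In this sketch, samples with x < t are assigned to ω1 and samples with x > t to ω2, as the integration limits of eq. (5) imply, so P2(e) is the probability mass of ω2 below t and P1(e) is the mass of ω1 above t. The Gaussian class-conditional densities, the function names, and the parameter values are illustrative assumptions only.

```python
import math

def norm_cdf(x, mu, sigma):
    """CDF of the normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def error_rate(t, prior1, mu1, s1, prior2, mu2, s2):
    """Total error P(e) of eq. (5) for threshold t.

    Assumes Gaussian class-conditional densities (illustration only);
    x < t is decided as omega_1 and x > t as omega_2.
    """
    p2_e = norm_cdf(t, mu2, s2)          # omega_2 samples below t
    p1_e = 1.0 - norm_cdf(t, mu1, s1)    # omega_1 samples above t
    return prior2 * p2_e + prior1 * p1_e

# Equal priors, means 0 and 2, unit variances, threshold midway:
# both error terms equal Phi(-1), so P(e) is about 0.1587.
print(round(error_rate(1.0, 0.5, 0.0, 1.0, 0.5, 2.0, 1.0), 4))
```

Moving t away from the midpoint in this symmetric case increases P(e), which is why the overlap of the two densities, rather than the threshold alone, limits how well a feature can separate the states.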

 Figure 4 Feature selection

2.2.2 Parzen window density estimation

 $k(\mathbf{x},\mathbf{x}_i)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(\mathbf{x}-\mathbf{x}_i)^2}{2\sigma^2}\right\}$ (6)

 $\hat{p}(z)=\frac{1}{N}\sum\limits_{i=1}^{N}{\frac{1}{\sqrt{2\pi }\sigma }\exp \{-\frac{{{(z-z_{\Delta t}^{i})}^{2}}}{2{{\sigma }^{2}}}\}}$ (7)

2.2.3 Separability criterion based on probability distributions

 ${{J}_{D}}=\int_{z}{[p(z|{{\omega }_{1}})-p(z|{{\omega }_{2}})]\ln \frac{p(z|{{\omega }_{1}})}{p(z|{{\omega }_{2}})}}dz$ (8)
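The integral in eq. (8) can be approximated by a Riemann sum on an evenly spaced grid. J_D is the symmetric Kullback-Leibler divergence between the two class-conditional densities, so a larger value means less overlap, and the feature and time scale with the largest J_D are the natural candidates. The helper name, the grid, and the small `eps` guard against log(0) are assumptions of this sketch; in practice `p1` and `p2` would be density estimates such as the Parzen estimates of eq. (7).

```python
import math

def divergence(p1, p2, grid, eps=1e-12):
    """Riemann-sum approximation of J_D in eq. (8).

    p1, p2: callables returning p(z|omega_1) and p(z|omega_2).
    grid:   evenly spaced z values covering both densities.
    eps:    small constant that avoids log(0) where a density vanishes.
    """
    dz = grid[1] - grid[0]
    total = 0.0
    for z in grid:
        a, b = p1(z) + eps, p2(z) + eps
        total += (a - b) * math.log(a / b)
    return total * dz

def gauss(mu):
    """Unit-variance normal density, used here only as test input."""
    return lambda z: math.exp(-(z - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

grid = [-10 + 0.01 * k for k in range(2101)]
# For unit-variance Gaussians whose means differ by mu, J_D = mu**2.
print(round(divergence(gauss(0.0), gauss(1.0), grid), 3))
```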

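The AdaBoost classifier is trained as follows.

1) Initialize the weights of the training samples:

 $\omega_i=\frac{1}{N},\quad i=1,2,\ldots,N$ (9)

2) At the m-th iteration, construct a weak classifier $f_m(\mathbf{x})$ from the weighted samples, use it to classify the samples, and compute its classification error rate $e_m$. Let

 $c_m=\log_2\frac{1-e_m}{e_m}$ (10)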

3) Update the sample weights:

 $\omega_i \leftarrow \omega_i\exp\left[c_m\, l_{(y_i\ne f_m(\mathbf{x}_i))}\right]$ (11)

and renormalize them so that

 $\sum_{i=1}^{N}\omega_i=1$ (12)

where the indicator function is

 $l_{(y_i\ne f_m(\mathbf{x}_i))}=\begin{cases} 1, & y_i\ne f_m(\mathbf{x}_i) \\ 0, & y_i=f_m(\mathbf{x}_i) \end{cases}$ (13)

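4) Repeat steps 2) and 3) until m reaches the maximum number of iterations M.

5) Combine the M weak classifiers into a strong classifier $h(\mathbf{x})$. For a sample $\mathbf{x}$ to be classified, the classifier output is:

 $h(\mathbf{x})=\operatorname{sgn}\left[\sum_{m=1}^{M} c_m f_m(\mathbf{x})\right]$ (14)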
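The five training steps can be sketched as discrete AdaBoost with one-level decision stumps as weak classifiers. Several choices here are assumptions of the sketch rather than details from the text: the stump learner and all function names; coding the two traffic states as -1 and +1, which the sgn in eq. (14) needs (eq. (2) codes them as 0 and 1); using the natural logarithm for c_m, as in the common AdaBoost.M1 formulation, whereas eq. (10) is printed with base 2; and clipping e_m away from 0 and 1 to avoid division by zero.

```python
import math

def stump_predict(x, j, thr, pol):
    """Weak classifier: +pol if feature j exceeds thr, else -pol."""
    return pol if x[j] > thr else -pol

def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, M):
    """Discrete AdaBoost following steps 1)-5); labels y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                                   # step 1), eq. (9)
    model = []
    for _ in range(M):                                  # step 4)
        err, j, thr, pol = train_stump(X, y, w)         # step 2)
        err = min(max(err, 1e-10), 1 - 1e-10)           # keep log finite
        c = math.log((1 - err) / err)                   # eq. (10), natural log
        w = [wi * math.exp(c * (stump_predict(xi, j, thr, pol) != yi))
             for xi, yi, wi in zip(X, y, w)]            # step 3), eq. (11)
        s = sum(w)
        w = [wi / s for wi in w]                        # eq. (12)
        model.append((c, j, thr, pol))
    return model

def predict(model, x):
    """Strong classifier of eq. (14): sign of the weighted vote."""
    score = sum(c * stump_predict(x, j, thr, pol) for c, j, thr, pol in model)
    return 1 if score >= 0 else -1

# No single stump fits this labelling, but three boosting rounds do.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
print([predict(adaboost(X, y, 3), x) for x in X])
```

Because each stump is tried with both polarities, its weighted error never exceeds 0.5, so every c_m is non-negative and the reweighting in eq. (11) always shifts attention toward the misclassified samples.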

4 Experiments and results

4.1 Data preparation

4.2 Feature selection

 Figure 5 Divergence distance of standard deviation features at different time scales
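Figure 5 compares standard deviation features across time scales, and the abstract reports that the standard deviation separates traffic states better than the average. The sketch below shows one assumed way to compute both statistics over a window of k consecutive readings ending just before a reference index; the window convention, the helper names, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev

def window_features(series, end, k):
    """Hypothetical (mean, std) of the k readings series[end-k:end].

    series: one traffic flow variable (e.g. speed) at fixed intervals.
    k:      window length in samples; k times the interval is the time scale.
    """
    if k < 1 or end - k < 0 or end > len(series):
        raise ValueError("not enough history for this time scale")
    window = series[end - k:end]
    return mean(window), pstdev(window)

def features_by_scale(series, event_indices, scales):
    """For each window length k, collect (mean, std) before every event index."""
    return {k: [window_features(series, e, k) for e in event_indices]
            for k in scales}

# Two windows with the same mean but very different variability.
print(window_features([60, 60, 60, 60], 4, 4))
print(window_features([30, 90, 30, 90], 4, 4))
```

In this toy input both windows average 60, so the mean cannot tell them apart, while the standard deviation (0 versus 30) distinguishes steady flow from strongly fluctuating flow. The per-scale feature lists can then be scored with the separability criterion of eq. (8) to pick a time scale.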

4.3 Classification results

5 Conclusion
