Journal of Computer Applications   2017, Vol. 37 Issue (1): 138-144  DOI: 10.11772/j.issn.1001-9081.2017.01.0138

### Cite this article

MAO Yingchi, QI Hai, JIE Qing, WANG Longbao. M-TAEDA: temporal abnormal event detection algorithm for multivariate time-series data of water quality[J]. JOURNAL OF COMPUTER APPLICATIONS, 2017, 37(1): 138-144. DOI: 10.11772/j.issn.1001-9081.2017.01.0138.

### Article history


M-TAEDA: temporal abnormal event detection algorithm for multivariate time-series data of water quality
MAO Yingchi, QI Hai, JIE Qing, WANG Longbao
College of Computer and Information, Hohai University, Nanjing Jiangsu 211100, China
Abstract: Real-time time-series data of multiple water quality parameters are acquired through water sensor networks deployed in the water supply network. When pollution occurs, detecting it and issuing a warning accurately and efficiently, so that it does not spread, is one of the most important issues. To evaluate abnormal events comprehensively and reduce detection deviation, a Temporal Abnormal Event Detection Algorithm for Multivariate time-series data (M-TAEDA) was proposed. M-TAEDA first analyzes the time-series data of each parameter with a BP (Back Propagation) model to identify possible outliers. It then detects potential pollution events through Bayesian sequential analysis, which estimates the probability of an abnormal event. Finally, it makes a decision for the water supply system by fusing the event probabilities of the multiple parameters. The experimental results indicate that M-TAEDA achieves 90% accuracy with the BP model and, compared with the Single-Variate Temporal Abnormal Event Detection Algorithm (S-TAEDA), improves the rate of detection by about 40% and reduces the false alarm rate by about 45%.
Key words: Wireless Sensor Network (WSN)    abnormal event detection    Back Propagation (BP) model    multivariate water quality parameter    time-series data
0 Introduction

1 Related work

2 Problem statement

Figure 1 An example of water supply network topology

3 Main idea of M-TAEDA

Figure 2 Procedure in off-line phase
Figure 3 Procedure in on-line phase (M-TAEDA)

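1) Data analysis. A BP model is used to model the interactions among the water quality parameters.

2) Outlier identification. Residuals are computed, and each observation is classified as normal or as an outlier against a fixed threshold obtained for each water quality parameter during the training phase.

3) Single-parameter abnormal event determination. Based on the classification of the residuals, the event probability of each individual water quality parameter is determined through sequential Bayesian updating.

4) Fusion decision. Information from the multiple water quality indicators is fused to give a unified decision on whether an abnormal event has occurred at a specific node of the water supply network.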

4 M-TAEDA method

4.1 Modeling water quality parameters with the BP model

 ${{f}_{k}}(x,w)={{\varphi }_{0}}[{{w}_{0}}+\sum\limits_{j}{{{w}_{jk}}\varphi ({{w}_{0j}}+\sum\limits_{i}{{{w}_{ij}}}{{x}_{i}})}]$ (1)
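Eq. (1) can be read as a forward pass through a network with one hidden layer. The following is a minimal sketch of that evaluation for a single output unit k, assuming a sigmoid hidden activation and an identity output activation; both activation choices, the function names and the list-based weight layout are illustrative assumptions, not details given in the paper.

```python
import math


def sigmoid(z):
    """Logistic activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out,
            phi=sigmoid, phi_out=lambda z: z):
    """Evaluate Eq. (1) for one output unit k.

    x        : inputs x_i
    w_hidden : w_hidden[j][i] is the weight w_ij from input i to hidden unit j
    b_hidden : hidden biases w_0j
    w_out    : weights w_jk from hidden unit j to output k
    b_out    : output bias w_0
    """
    hidden = [
        phi(b_j + sum(w_ij * x_i for w_ij, x_i in zip(w_j, x)))
        for w_j, b_j in zip(w_hidden, b_hidden)
    ]
    return phi_out(b_out + sum(w_jk * h_j for w_jk, h_j in zip(w_out, hidden)))
```

With all hidden weights and biases set to zero, each hidden unit outputs 0.5, so the result reduces to `b_out + 0.5 * sum(w_out)`, which is a quick sanity check for the indexing.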

 ${{\hat{x}}_{i}}(t)=f({{x}_{1}}(t),...,{{x}_{i-1}}(t),{{x}_{i}}(t-1),{{x}_{i+1}}(t),...,{{x}_{n}}(t))$ (2)

 ${{\hat{x}}_{\text{free chlorine}}}(t)=f[{{x}_{\text{EC}}}(t),{{x}_{\text{pH}}}(t),{{x}_{\text{temperature}}}(t),{{x}_{\text{TOC}}}(t),{{x}_{\text{turbidity}}}(t),{{x}_{\text{free chlorine}}}(t-1)]$ (3)
Figure 4 BP network structure of free chlorine
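Eqs. (2) and (3) specify which values feed the predictor: the current readings of all other parameters plus the previous reading of the parameter being predicted. A small sketch of assembling that input vector follows; the function name, the dictionary keys and the sample readings are hypothetical and only illustrate the indexing.

```python
def build_input(current, previous, target):
    """Assemble the predictor inputs for `target` following Eq. (2).

    current  : dict of readings at time t, keyed by parameter name
    previous : dict of readings at time t-1, same keys
    target   : name of the parameter being predicted

    The target contributes its value at t-1; every other parameter
    contributes its value at t. The order follows the keys of `current`.
    """
    return [
        previous[name] if name == target else current[name]
        for name in current
    ]


# Hypothetical readings, used only to show the layout of Eq. (3).
current = {"EC": 410.0, "pH": 7.6, "temperature": 18.2,
           "TOC": 1.9, "turbidity": 0.4, "free_chlorine": 0.52}
previous = {"EC": 409.0, "pH": 7.6, "temperature": 18.1,
            "TOC": 1.9, "turbidity": 0.4, "free_chlorine": 0.55}

inputs_for_chlorine = build_input(current, previous, "free_chlorine")
```

Here the last element of `inputs_for_chlorine` is the free chlorine reading at t-1, matching the final argument of Eq. (3).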
4.2 Error evaluation and classification

 $E{{R}_{i}}(t)={{x}_{i}}(t)-f(\cdot )={{x}_{i}}(t)-{{\hat{x}}_{i}}(t)$ (4)
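The residual of Eq. (4) is then compared with the fixed per-parameter threshold obtained in training. The sketch below shows one way to do that, assuming the comparison is made on the absolute residual; the absolute-value rule and the threshold values are assumptions for illustration, since the text only states that a fixed threshold is used.

```python
def residual(observed, predicted):
    """Eq. (4): ER_i(t) = x_i(t) - x_hat_i(t)."""
    return observed - predicted


def is_outlier(res, threshold):
    """Classify a residual as an outlier when its magnitude exceeds the
    fixed threshold (assumed rule: compare the absolute value)."""
    return abs(res) > threshold


# Hypothetical values: an observed reading, its BP prediction, and a threshold.
flag = is_outlier(residual(1.2, 1.0), threshold=0.1)
```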

4.3 Sequential Bayesian update

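RD (rate of detection) and FAR (false alarm rate) are common criteria for evaluating abnormal event detection performance. RD is the ratio of detected abnormal events to the total number of abnormal events that actually occurred. FAR is the ratio of false alarms to the number of decisions made when no abnormal event actually occurred, as shown in Eq. (5), where TP, FN, FP and TN are the numbers of true positives, false negatives, false positives and true negatives: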

 $\left\{ \begin{array}{l} RD = \frac{{TP}}{{TP + FN}} \times 100\% \\ FAR = \frac{{FP}}{{TN + FP}} \times 100\% \end{array} \right.$ (5)
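As a worked instance of Eq. (5), the following sketch computes RD and FAR as percentages from confusion-matrix counts. The function name and the counts in the example are made up for illustration.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (RD, FAR) in percent, following Eq. (5).

    tp, fn : abnormal events that were detected / missed
    fp, tn : normal time steps that were flagged / not flagged
    """
    rd = 100.0 * tp / (tp + fn)
    far = 100.0 * fp / (tn + fp)
    return rd, far


# Hypothetical counts: 9 of 10 events detected, 5 false alarms in 100 normal steps.
rd, far = detection_rates(tp=9, fn=1, fp=5, tn=95)
```

With these counts, RD is 90% and FAR is 5%.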

 $P(E_t) = \begin{cases} P(E_t \mid O_t), & \text{if } \text{Residual}_t \text{ is an outlier} \\ P(E_t \mid \overline{O_t}), & \text{if } \text{Residual}_t \text{ is normal} \end{cases}$ (6)
 $\left\{ \begin{array}{l} P(E_t \mid O_t) = \dfrac{P(O \mid E) \times P(E_{t-1})}{P(O \mid E) \times P(E_{t-1}) + P(O \mid \overline{E}) \times P(\overline{E}_{t-1})} = \dfrac{RD \times P(E_{t-1})}{RD \times P(E_{t-1}) + FAR \times (1 - P(E_{t-1}))} \\ P(E_t \mid \overline{O}_t) = \dfrac{P(\overline{O} \mid E) \times P(E_{t-1})}{P(\overline{O} \mid E) \times P(E_{t-1}) + P(\overline{O} \mid \overline{E}) \times P(\overline{E}_{t-1})} = \dfrac{(1 - RD) \times P(E_{t-1})}{(1 - RD) \times P(E_{t-1}) + (1 - FAR) \times (1 - P(E_{t-1}))} \end{array} \right.$ (7)
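The update of Eqs. (6) and (7) can be sketched as a small recursive function. This is an illustration under stated assumptions, not the authors' implementation: RD and FAR are passed as fractions in (0, 1) rather than percentages, the initial probability is an arbitrary choice, and the normal-residual branch uses (1 − RD) in the numerator, as Bayes' rule requires with P(normal | event) = 1 − RD.

```python
def update_event_probability(prior, outlier, rd, far):
    """One step of the sequential Bayesian update.

    prior   : P(E_{t-1}), event probability after the previous step
    outlier : True if the residual at time t was classified as an outlier
    rd, far : rate of detection and false alarm rate as fractions in (0, 1)
    """
    if outlier:
        likelihood_event, likelihood_no_event = rd, far
    else:
        likelihood_event, likelihood_no_event = 1.0 - rd, 1.0 - far
    numerator = likelihood_event * prior
    denominator = numerator + likelihood_no_event * (1.0 - prior)
    return numerator / denominator


def run_sequence(flags, p0, rd, far):
    """Apply the update along a sequence of outlier flags, returning P(E_t) per step."""
    probabilities = []
    p = p0
    for flag in flags:
        p = update_event_probability(p, flag, rd, far)
        probabilities.append(p)
    return probabilities


# Hypothetical run: two consecutive outliers, then one normal residual.
trace = run_sequence([True, True, False], p0=0.5, rd=0.9, far=0.1)
```

Consecutive outliers push the probability up and a normal residual pulls it back down; in the run above the first step moves the probability from 0.5 to 0.9.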

4.4 Multivariate fusion decision

Figure 5 Event probability of single water quality parameter

5 Experimental validation

5.1 Experimental setup

Figure 7 Time series of multiple water quality parameters
5.2 Analysis of experimental results

5.2.1 Validation of BP model prediction

5.2.2 Comparison with S-TAEDA

Figure 8 Comparison of rate of detection between M-TAEDA and S-TAEDA
Figure 9 Comparison of false alarm rate between M-TAEDA and S-TAEDA

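The ROC curve is a more intuitive presentation of the rate of detection and the false alarm rate, visualizing the trade-off between RD and FAR. In this experiment, multiple threshold values were set, and the rate of detection and false alarm rate of S-TAEDA and M-TAEDA were computed at each of them. Figure 10 shows the ROC curves of S-TAEDA and M-TAEDA. As Figure 10 shows, the area under the ROC curve of M-TAEDA is clearly larger than that of S-TAEDA, indicating that M-TAEDA achieves higher detection accuracy and a lower false alarm rate than S-TAEDA, and that its detection performance is satisfactory.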

Figure 10 Comparison of ROC curve between M-TAEDA and S-TAEDA
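The threshold sweep behind an ROC curve can be sketched as follows: for each threshold, event probabilities at or above it are flagged, giving one (FAR, RD) point, and the area under the curve is estimated with the trapezoidal rule. The function names, the `>=` decision rule and the toy scores are assumptions for illustration and are not the paper's experimental data; the sketch also assumes both event and non-event samples are present.

```python
def roc_points(scores, labels, thresholds):
    """Return sorted (FAR, RD) points, as fractions, one per threshold.

    scores : event probabilities per time step
    labels : True where an abnormal event actually occurred
    """
    points = []
    for th in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y)
        fn = sum(1 for s, y in zip(scores, labels) if s < th and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and not y)
        tn = sum(1 for s, y in zip(scores, labels) if s < th and not y)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return sorted(points)


def auc(points):
    """Area under the piecewise-linear curve through the sorted points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data in which the scores separate events from non-events perfectly.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [True, True, False, False]
curve = roc_points(scores, labels, thresholds=[0.0, 0.5, 1.1])
area = auc(curve)
```

A larger area means the detector reaches a high RD at a low FAR across thresholds, which is the sense in which the two algorithms are compared in Figure 10.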

6 Conclusion
