Journal of Computer Applications   2017, Vol. 37 Issue (1): 138-144  DOI: 10.11772/j.issn.1001-9081.2017.01.0138

MAO Yingchi, QI Hai, JIE Qing, WANG Longbao. M-TAEDA: temporal abnormal event detection algorithm for multivariate time-series data of water quality[J]. JOURNAL OF COMPUTER APPLICATIONS, 2017, 37(1): 138-144. DOI: 10.11772/j.issn.1001-9081.2017.01.0138.

M-TAEDA: temporal abnormal event detection algorithm for multivariate time-series data of water quality
MAO Yingchi, QI Hai, JIE Qing, WANG Longbao
College of Computer and Information, Hohai University, Nanjing Jiangsu 211100, China
Abstract: Real-time time-series data of multiple water quality parameters are acquired by water sensor networks deployed in the water supply network. When pollution occurs, detecting it accurately and efficiently and issuing a warning before it spreads is one of the most important issues. To evaluate abnormal events comprehensively and reduce detection deviation, a Temporal Abnormal Event Detection Algorithm for Multivariate time-series data (M-TAEDA) was proposed. M-TAEDA first analyzes the time-series data of each parameter with a BP (Back Propagation) model to identify possible outliers. It then applies Bayesian sequential analysis to estimate the probability of an abnormal event and so detect potential pollution events. Finally, it fuses the event probabilities of the individual parameters to reach a decision for the water supply system. The experimental results indicate that M-TAEDA achieves 90% accuracy with the BP model, improves the rate of detection by about 40%, and reduces the false alarm rate by about 45% compared with the Single-Variate Temporal Abnormal Event Detection Algorithm (S-TAEDA).
Key words: Wireless Sensor Network (WSN)    abnormal event detection    Back Propagation (BP) model    multivariate water quality parameter    time-series data
0 Introduction

1 Related work

2 Problem statement

Figure 1 An example of water supply network topology

3 Main idea of M-TAEDA

Figure 2 Procedure in off-line phase
Figure 3 Procedure in on-line phase (M-TAEDA)

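1) Data analysis. The BP model is used to model the interactions among the water quality parameters.

2) Outlier identification. Residuals are computed, and each observation is classified as normal or as an outlier using the fixed threshold obtained for each water quality parameter during the training phase.

3) Single-parameter abnormal event determination. Based on the classification of the residuals, the event probability of each individual water quality parameter is determined by sequential Bayesian updating.

4) Fusion decision. Information from the multiple water quality indicators is fused to give a unified decision on whether an abnormal event has occurred at a specific node of the water supply network.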

4 M-TAEDA method
4.1 Modeling water quality parameters with the BP model

 $f_k(x,w)=\varphi_0\left[w_0+\sum\limits_j w_{jk}\,\varphi\left(w_{0j}+\sum\limits_i w_{ij}x_i\right)\right]$ (1)
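
To make Eq. (1) concrete, the following is a minimal sketch of the forward pass for a single output unit $k$ of a network with one hidden layer. The function name `forward`, the argument names, and the choice of a sigmoid for $\varphi$ and the identity for $\varphi_0$ are assumptions made for illustration; the source does not state which activation functions are used.

```python
import math


def sigmoid(z):
    """Logistic activation, assumed here for the hidden layer (phi)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_in, b_hidden, w_out, b_out, phi=sigmoid, phi0=lambda z: z):
    """Evaluate Eq. (1) for one output unit k.

    x        : input values x_i
    w_in     : w_in[j][i] is the weight w_ij from input i to hidden unit j
    b_hidden : b_hidden[j] is the hidden bias w_0j
    w_out    : w_out[j] is the weight w_jk from hidden unit j to the output
    b_out    : output bias w_0
    """
    hidden = [
        phi(b_j + sum(w_ij * x_i for w_ij, x_i in zip(w_j, x)))
        for w_j, b_j in zip(w_in, b_hidden)
    ]
    return phi0(b_out + sum(w_jk * h_j for w_jk, h_j in zip(w_out, hidden)))


# With zero inputs and zero hidden biases, every hidden unit outputs sigmoid(0) = 0.5.
y = forward([0.0, 0.0], [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], [2.0, 4.0], 1.0)
```

In this toy call the output is the bias plus the weighted hidden activations, 1 + 2(0.5) + 4(0.5); training the weights by back propagation is outside the scope of the sketch.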

 $\hat{x}_i(t)=f(x_1(t),\ldots,x_{i-1}(t),x_i(t-1),x_{i+1}(t),\ldots,x_n(t))$ (2)

 $\hat{x}_{\text{free chlorine}}(t)=f[x_{\text{EC}}(t),x_{\text{pH}}(t),x_{\text{temperature}}(t),x_{\text{TOC}}(t),x_{\text{turbidity}}(t),x_{\text{free chlorine}}(t-1)]$ (3)
Figure 4 BP network structure of free chlorine
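
Eqs. (2) and (3) predict one parameter at time $t$ from the other parameters at time $t$ and from the parameter's own value at time $t-1$. The sketch below shows one way to assemble that input vector. The helper name `build_input`, the dictionary layout, and the sample readings are hypothetical and are not taken from the paper.

```python
def build_input(series, target, t):
    """Assemble the model inputs of Eq. (2) for predicting `target` at time t.

    series : dict mapping a parameter name to its list of readings over time
    target : name of the parameter being predicted
    t      : time index (must be >= 1, since the target's value at t-1 is used)
    """
    if t < 1:
        raise ValueError("t must be >= 1 because x_target(t-1) is required")
    # All other parameters contribute their current value x_j(t).
    features = [values[t] for name, values in series.items() if name != target]
    # The target contributes only its previous value x_i(t-1).
    features.append(series[target][t - 1])
    return features


# Made-up readings for two time steps (illustration only, not data from the paper).
readings = {
    "EC": [410.0, 412.0],
    "pH": [7.2, 7.3],
    "temperature": [18.5, 18.6],
    "TOC": [1.1, 1.2],
    "turbidity": [0.30, 0.35],
    "free_chlorine": [0.52, 0.50],
}
inputs = build_input(readings, "free_chlorine", 1)
```

The resulting list follows the argument order of Eq. (3): EC, pH, temperature, TOC and turbidity at time $t$, followed by free chlorine at time $t-1$.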
4.2 Error estimation and classification

 $E{{R}_{i}}(t)={{x}_{i}}(t)-f(\cdot )={{x}_{i}}(t)-{{\hat{x}}_{i}}(t)$ (4)
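
The residual of Eq. (4) is then compared with a per-parameter threshold fixed during training. A minimal sketch of that classification follows. Comparing the absolute residual with the threshold is an assumption, since the source only says that a fixed threshold separates normal values from outliers, and the function names are hypothetical.

```python
def residual(observed, predicted):
    """Eq. (4): ER_i(t) = x_i(t) - x_hat_i(t)."""
    return observed - predicted


def is_outlier(er, threshold):
    """Classify a residual as an outlier when its magnitude exceeds the threshold.

    Using the absolute value is an assumption made for this sketch.
    """
    return abs(er) > threshold


er = residual(0.9, 0.5)        # observed 0.9, predicted 0.5
flag = is_outlier(er, 0.3)     # 0.3 is a made-up threshold
```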

4.3 Sequential Bayesian update

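The rate of detection (RD) and the false alarm rate (FAR) are the standard criteria for evaluating abnormal event detection. RD is the ratio of the number of detected abnormal events to the total number of abnormal events that actually occurred. FAR is the ratio of the number of false alarms to the number of decisions made when no abnormal event actually occurred, as given in Eq. (5):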

 $\left\{ \begin{array}{l} RD = \frac{{TP}}{{TP + FN}} \times 100\% \\ FAR = \frac{{FP}}{{TN + FP}} \times 100\% \end{array} \right.$ (5)
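
Eq. (5) can be evaluated directly from the four confusion-matrix counts. The sketch below does so; the function name `detection_rates` and the counts in the example are made up for illustration.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (RD, FAR) in percent, following Eq. (5).

    tp : abnormal events that were detected
    fn : abnormal events that were missed
    fp : normal observations flagged as abnormal
    tn : normal observations correctly left unflagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one abnormal and one normal case")
    rd = 100.0 * tp / (tp + fn)
    far = 100.0 * fp / (tn + fp)
    return rd, far


rd, far = detection_rates(tp=45, fn=5, fp=10, tn=90)
```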

 $P(E_t)=\left\{ \begin{array}{ll} P(E_t|O_t), & \text{if } \mathrm{Residual}_t \text{ is an outlier} \\ P(E_t|\overline{O}_t), & \text{if } \mathrm{Residual}_t \text{ is normal} \end{array} \right.$ (6)
 $\left\{ \begin{array}{l} P(E_t|O_t)=\dfrac{P(O|E)\times P(E_{t-1})}{P(O|E)\times P(E_{t-1})+P(O|\overline{E})\times P(\overline{E}_{t-1})}=\dfrac{RD\times P(E_{t-1})}{RD\times P(E_{t-1})+FAR\times (1-P(E_{t-1}))} \\ P(E_t|\overline{O}_t)=\dfrac{P(\overline{O}|E)\times P(E_{t-1})}{P(\overline{O}|E)\times P(E_{t-1})+P(\overline{O}|\overline{E})\times P(\overline{E}_{t-1})}=\dfrac{(1-RD)\times P(E_{t-1})}{(1-RD)\times P(E_{t-1})+(1-FAR)\times (1-P(E_{t-1}))} \end{array} \right.$ (7)
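
The sequential update can be sketched as a short loop that feeds each posterior back in as the next prior. The code applies Bayes' rule with $P(O|E)=RD$ and $P(O|\overline{E})=FAR$, so for a normal residual the likelihood under an event is $1-RD$. Here RD and FAR are fractions in (0, 1) rather than percentages. The function names and the numbers in the example are assumptions for illustration.

```python
def update_event_probability(prior, outlier, rd, far):
    """One Bayesian update of the event probability P(E_t).

    prior   : P(E_{t-1})
    outlier : True if the residual at time t was classified as an outlier
    rd      : P(O | E), rate of detection as a fraction
    far     : P(O | not E), false alarm rate as a fraction
    """
    if outlier:
        likelihood_event, likelihood_normal = rd, far
    else:
        likelihood_event, likelihood_normal = 1.0 - rd, 1.0 - far
    numerator = likelihood_event * prior
    denominator = numerator + likelihood_normal * (1.0 - prior)
    return numerator / denominator


def run_updates(flags, prior, rd, far):
    """Apply the update to a sequence of outlier flags and return every posterior."""
    probabilities = []
    p = prior
    for outlier in flags:
        p = update_event_probability(p, outlier, rd, far)
        probabilities.append(p)
    return probabilities


# Made-up example: one outlier followed by one normal residual.
history = run_updates([True, False], prior=0.5, rd=0.9, far=0.1)
```

With these symmetric rates the outlier raises the probability from 0.5 to 0.9 and the following normal residual brings it back to 0.5. A prior of exactly 0 or 1 can never move under this update, so a practical implementation would likely bound the probability away from those values.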

4.4 Multivariate fusion decision

Figure 5 Event probability of single water quality parameter

5 Experimental validation
5.1 Experimental setup

Figure 7 Time series of multiple water quality parameters
5.2 Analysis of experimental results
5.2.1 Validation of BP model prediction

5.2.2 Comparison with S-TAEDA

Figure 8 Comparison of rate of detection between M-TAEDA and S-TAEDA
Figure 9 Comparison of false alarm rate between M-TAEDA and S-TAEDA

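The ROC curve presents the rate of detection and the false alarm rate more intuitively by visualizing the trade-off between RD and FAR. In this experiment, several different critical values were set, and the corresponding rates of detection and false alarm rates of S-TAEDA and M-TAEDA were computed. Figure 10 shows the ROC curves of S-TAEDA and M-TAEDA. The area under the ROC curve of M-TAEDA is clearly larger than that of S-TAEDA, indicating that M-TAEDA achieves higher detection accuracy and a lower false alarm rate than S-TAEDA and gives satisfactory detection performance.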

Figure 10 Comparison of ROC curve between M-TAEDA and S-TAEDA
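
The area under an ROC curve built from a handful of (FAR, RD) operating points can be estimated with the trapezoidal rule. The sketch below assumes the rates are expressed as fractions in [0, 1] and that the curve is anchored at (0, 0) and (1, 1). The function name `roc_auc` and the sample points are hypothetical and are not the paper's results.

```python
def roc_auc(points):
    """Trapezoidal area under an ROC curve.

    points : iterable of (far, rd) pairs, one per critical value,
             with both rates given as fractions in [0, 1]
    """
    # Anchor the curve at its two trivial end points and sort by FAR, then RD.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up operating points for two hypothetical detectors.
auc_better = roc_auc([(0.1, 0.9)])
auc_chance = roc_auc([(0.5, 0.5)])
```

A detector whose operating points lie closer to the top-left corner yields a larger area, which is the comparison the ROC paragraph draws between the two algorithms.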

6 Conclusion

