Journal of Computer Applications   2017, Vol. 37 Issue (1): 228-232  DOI: 10.11772/j.issn.1001-9081.2017.01.0228

### Cite this article

GAO Yaodong, HOU Lingyan, YANG Dali. Automatic image annotation method using multi-label learning convolutional neural network[J]. JOURNAL OF COMPUTER APPLICATIONS, 2017, 37(1): 228-232.

### Funding

National Key Technology R&D Program of the 12th Five-Year Plan (2015BAK12B00)

### Article history

Automatic image annotation method using multi-label learning convolutional neural network
GAO Yaodong, HOU Lingyan, YANG Dali
College of Computer, Beijing Information Science and Technology University, Beijing 100101, China
Abstract: Automatic image annotation suffers from the information loss caused by manually selected features. To address this shortcoming, a convolutional neural network was used to learn features directly from the samples. Firstly, to suit the multi-label nature of automatic image annotation and to increase the recall rate of low-frequency words, the loss function of the convolutional neural network was improved and a Convolutional Neural Network of Multi-Label Learning (CNN-MLL) model was constructed. Secondly, the correlation between image annotation words was used to improve the output of the network model. In comparisons with traditional methods on the Technical Committee 12 of the International Association for Pattern Recognition (IAPR TC-12) benchmark image annotation database, the experimental results show that the Convolutional Neural Network using Mean Square Error function (CNN-MSE) method achieves an average recall rate 12.9% higher than that of the Support Vector Machine (SVM) method and an average accuracy 37.9% higher than that of the Back Propagation Neural Network (BPNN) method. The average accuracy rate and average recall rate of the CNN-MLL method with improved annotation results are 23% and 20% higher, respectively, than those of the traditional CNN. The results show that the CNN-MLL method with improved annotation results can effectively avoid the information loss caused by manually selected features and increase the recall rate of low-frequency words.
Key words: automatic image annotation    multi-label learning    Convolutional Neural Network (CNN)    loss function
0 Introduction

1 Convolutional neural network

1.1 Convolutional and pooling layers

 $x_{j}^{l}=f(\sum\limits_{i\in {{M}_{j}}}{x_{i}^{l-1}*k_{j}^{l}}+b_{j}^{l})$ (1)
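As a rough illustration of equation (1), the sketch below computes one output feature map from a set of input maps. Several things here are assumptions rather than details taken from the paper: the activation f is taken to be ReLU, the `*` operator is implemented as a "valid" sliding-window product without flipping the kernel (the convention most CNN libraries use), and a single kernel is shared across the input maps because the kernel in equation (1) carries only the output index j. The function names are hypothetical.

```python
def correlate_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + a][c + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_output_map(input_maps, kernel, bias):
    """Equation (1) for one output map j: sum over input maps, add bias, apply f.

    f is assumed to be ReLU; the paper's actual activation is not stated here.
    """
    responses = [correlate_valid(x, kernel) for x in input_maps]
    out_h, out_w = len(responses[0]), len(responses[0][0])
    return [
        [
            max(0.0, sum(resp[r][c] for resp in responses) + bias)
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


input_maps = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernel = [[-1, 0], [0, 1]]  # illustrative values only
print(conv_output_map(input_maps, kernel, bias=0.5))
```

Each 2×2 window of the first map contributes 4 and each window of the constant map contributes 0, so every output entry is 4 + 0.5 = 4.5 after the assumed ReLU.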

 Figure 1 Schematic diagram of 2×2 max pooling
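The 2×2 max pooling named in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, assuming non-overlapping windows (stride 2) and an input whose height and width are both even; neither assumption is confirmed by the text, and `max_pool_2x2` is a hypothetical name.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    pooled = []
    for r in range(0, rows, 2):
        pooled_row = []
        for c in range(0, cols, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


example = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 3],
]
print(max_pool_2x2(example))  # [[4, 5], [6, 7]]
```

The 4×4 input shrinks to 2×2, which is how pooling reduces the spatial resolution that later layers must process.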

1.2 Loss function based on multi-label learning

 $E=\frac{1}{m}\sum\limits_{i=1}^{m}{E(i)}$ (2)
 $E(i)=\frac{1}{2}\sum\limits_{k=1}^{n}{{{(d_{i}^{k}-y_{i}^{k})}^{2}}}$ (3)
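Equations (2) and (3) can be checked numerically with a short sketch. It assumes that m is the number of samples, n the number of labels, d the target value and y the network output for sample i, both indexed by the label k being summed over; these readings are assumptions, since the symbols are not defined in this section. The function names are hypothetical.

```python
def sample_error(targets, outputs):
    """Equation (3): half the squared error summed over the n labels."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(targets, outputs))


def mse_loss(all_targets, all_outputs):
    """Equation (2): mean of the per-sample errors over the m samples."""
    m = len(all_targets)
    total = sum(sample_error(d, y) for d, y in zip(all_targets, all_outputs))
    return total / m


targets = [[1, 0, 1], [0, 1, 0]]
outputs = [[0.5, 0.5, 1.0], [0.0, 1.0, 0.0]]
print(mse_loss(targets, outputs))  # 0.125
```

The first sample contributes 0.5 × (0.25 + 0.25 + 0) = 0.25 and the second contributes 0, so the mean over two samples is 0.125.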

 $E=\sum\limits_{i=1}^{m}{{{E}_{i}}}=\sum\limits_{i=1}^{m}{\frac{1}{|{{Y}_{i}}||\overline{{{Y}_{i}}}|}}\sum\limits_{(k,l)\in {{Y}_{i}}\times \overline{{{Y}_{i}}}}{\exp (-(c_{k}^{i}-c_{l}^{i}))}$ (4)

 $E=\sum\limits_{i=1}^{m}{{{E}_{i}}}=\sum\limits_{i=1}^{m}{\frac{1}{|{{Y}_{i}}||\overline{{{Y}_{i}}}|}}\sum\limits_{(k,l)\in {{Y}_{i}}\times \overline{{{Y}_{i}}}}{\exp (-({{\alpha }_{k}}c_{k}^{i}-c_{l}^{i}))}$ (5)

 ${{\alpha }_{k}}=\frac{{{L}_{k}}/n}{\max (L/n)}$ (6)
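The pairwise losses in equations (4) and (5), with the weights of equation (6), can be sketched as follows. The symbol readings are assumptions, because this section does not define them: c is read as the network output for a label on sample i, Y_i as the set of labels attached to sample i with its complement holding the remaining labels, and L_k as the number of training samples carrying label k. Equation (6) is implemented exactly as written; the common factor n cancels, so the weight reduces to L_k divided by the largest count. All function names are hypothetical.

```python
import math


def label_weights(label_counts, n):
    """Equation (6): alpha_k = (L_k / n) / max(L / n)."""
    scaled = [count / n for count in label_counts]
    largest = max(scaled)
    return [s / largest for s in scaled]


def pairwise_loss(outputs, relevant, alphas=None):
    """Per-sample term E_i: equation (4) if alphas is None, else equation (5).

    outputs  -- network outputs c, one per label
    relevant -- set of label indices in Y_i (assumed reading)
    """
    num_labels = len(outputs)
    if alphas is None:
        alphas = [1.0] * num_labels
    irrelevant = [l for l in range(num_labels) if l not in relevant]
    if not relevant or not irrelevant:
        raise ValueError("need at least one relevant and one irrelevant label")
    total = sum(
        math.exp(-(alphas[k] * outputs[k] - outputs[l]))
        for k in relevant
        for l in irrelevant
    )
    return total / (len(relevant) * len(irrelevant))


def batch_loss(all_outputs, all_relevant, alphas=None):
    """E: sum of the per-sample terms over the m samples."""
    return sum(
        pairwise_loss(c, y, alphas) for c, y in zip(all_outputs, all_relevant)
    )


alphas = label_weights([10, 5, 1], n=16)    # illustrative counts only
print(alphas)
print(pairwise_loss([1.0, 0.0, 0.0], {0}))  # unweighted, equation (4)
print(pairwise_loss([1.0, 0.0, 0.0], {1}, alphas))
```

Each term penalises a relevant label whose (weighted) output does not exceed the output of an irrelevant label, and the normalisation by the number of label pairs keeps samples with many labels from dominating the sum.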

2 Annotation improvement based on the label co-occurrence matrix

 ${{R}_{ij}}=S(i,j)/S(i)$ (7)

 $O=R*C$ (8)
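A minimal sketch of equations (7) and (8) follows. It assumes that S(i) counts the training images annotated with label i, that S(i,j) counts the images annotated with both i and j, that C is the vector of network outputs, and that `*` in equation (8) denotes a matrix-vector product; none of these readings is stated in this section. Setting a row to zero when S(i) is zero is a convention of this sketch only, and the function names are hypothetical.

```python
def cooccurrence_matrix(annotations, num_labels):
    """Equation (7): R[i][j] = S(i, j) / S(i), counted over the training annotations.

    annotations -- list of sets of label indices, one set per image
    """
    single = [0] * num_labels
    joint = [[0] * num_labels for _ in range(num_labels)]
    for labels in annotations:
        for i in labels:
            single[i] += 1
            for j in labels:
                joint[i][j] += 1
    return [
        [joint[i][j] / single[i] if single[i] else 0.0 for j in range(num_labels)]
        for i in range(num_labels)
    ]


def refine_scores(R, scores):
    """Equation (8): O = R * C, read here as a matrix-vector product."""
    return [
        sum(R[i][j] * scores[j] for j in range(len(scores)))
        for i in range(len(R))
    ]


annotations = [{0, 1}, {0, 1}, {0, 2}, {1}]  # toy data, not from the paper
R = cooccurrence_matrix(annotations, num_labels=3)
print(refine_scores(R, [0.9, 0.1, 0.0]))
```

In this toy data label 2 only ever appears together with label 0, so a high score for label 0 raises the refined score of label 2 from 0.0 to 0.9.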
3 Structure of the proposed convolutional neural network model

 Figure 2 CNN structure used in this paper
4 Experiments and result analysis

4.1 Evaluation metrics

 $P=\frac{1}{n}\sum\limits_{i=1}^{n}{\frac{A_{i}^{r}}{A_{i}^{y}}}$
 $R=\frac{1}{n}\sum\limits_{i=1}^{n}{\frac{A_{i}^{r}}{A_{i}^{d}}}$
 $F1=2PR/(P+R)$
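The three metrics above can be computed with a short sketch. The superscripts are read as follows, which is an assumption because this section does not define them: for label i, r counts the images correctly annotated with the label, y counts the images the model annotated with it, and d counts the images that carry it in the ground truth; n is the number of labels. Returning 0 when a denominator is zero is a convention of this sketch only, and `evaluate` is a hypothetical name.

```python
def evaluate(predicted, ground_truth, num_labels):
    """Label-averaged P and R, then F1 = 2PR / (P + R).

    predicted, ground_truth -- lists of sets of label indices, one set per image
    """
    precisions, recalls = [], []
    for label in range(num_labels):
        predicted_count = sum(label in p for p in predicted)
        truth_count = sum(label in g for g in ground_truth)
        correct = sum(
            label in p and label in g for p, g in zip(predicted, ground_truth)
        )
        precisions.append(correct / predicted_count if predicted_count else 0.0)
        recalls.append(correct / truth_count if truth_count else 0.0)
    p = sum(precisions) / num_labels
    r = sum(recalls) / num_labels
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


predicted = [{0, 1}, {0}, {1, 2}]    # toy annotations, not from the paper
ground_truth = [{0}, {0, 1}, {2}]
print(evaluate(predicted, ground_truth, num_labels=3))
```

Here labels 0 and 2 are annotated perfectly while label 1 is never annotated correctly, so P, R and F1 all come out at about 2/3. Because every label contributes equally to the average, a rarely used label weighs as much as a frequent one.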

4.2 Experimental results

 Figure 3 Accuracy rate, recall rate and F1 value of each label in the dataset

5 Conclusion

