Journal of Computer Applications   2017, Vol. 37 Issue (1): 228-232  DOI: 10.11772/j.issn.1001-9081.2017.01.0228

### Cite this article

GAO Yaodong, HOU Lingyan, YANG Dali. Automatic image annotation method using multi-label learning convolutional neural network[J]. JOURNAL OF COMPUTER APPLICATIONS, 2017, 37(1): 228-232. DOI: 10.11772/j.issn.1001-9081.2017.01.0228.

### Foundation item

National Key Technology R&D Program of the Twelfth Five-Year Plan (2015BAK12B00)

Automatic image annotation method using multi-label learning convolutional neural network
GAO Yaodong, HOU Lingyan, YANG Dali
College of Computer, Beijing Information Science and Technology University, Beijing 100101, China
Abstract: To address a shortcoming of automatic image annotation, namely the information loss caused by manually selected features, a convolutional neural network was used to learn features directly from samples. Firstly, to adapt to the multi-label nature of automatic image annotation and to increase the recall rate of low-frequency words, the loss function of the convolutional neural network was improved and a Convolutional Neural Network for Multi-Label Learning (CNN-MLL) model was constructed. Secondly, the correlation between annotation words was used to refine the output of the network model. The methods were compared on the International Association for Pattern Recognition Technical Committee 12 (IAPR TC-12) benchmark image annotation database. The experimental results show that the Convolutional Neural Network using the Mean Square Error function (CNN-MSE) achieves an average recall rate 12.9% higher than the Support Vector Machine (SVM) method and an average accuracy rate 37.9% higher than the Back Propagation Neural Network (BPNN) method, while the average accuracy rate and average recall rate of the annotation-refined CNN-MLL method are 23% and 20% higher than those of the traditional CNN. These results show that the annotation-refined CNN-MLL method can effectively avoid the information loss caused by manually selected features and increase the recall rate of low-frequency words.
Key words: automatic image annotation    multi-label learning    Convolutional Neural Network (CNN)    loss function
0 Introduction

1 Convolutional neural network

1.1 Convolutional layer and pooling layer

 $x_{j}^{l}=f(\sum\limits_{i\in {{M}_{j}}}{x_{i}^{l-1}*k_{j}^{l}}+b_{j}^{l})$ (1)
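Equation (1) computes output feature map $j$ of layer $l$ by convolving a selected set $M_j$ of layer-$(l-1)$ maps with kernels $k_j^l$, summing the results, adding a bias $b_j^l$, and applying an activation $f$. A minimal NumPy sketch of this forward pass (the "valid" correlation and the tanh activation are illustrative assumptions, not necessarily the paper's exact configuration):

```python
import numpy as np

def conv2d_valid(x, k):
    """2-D 'valid'-mode cross-correlation of feature map x with kernel k."""
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def conv_layer(prev_maps, kernels, bias, f=np.tanh):
    """Eq. (1): x_j^l = f( sum_{i in M_j} x_i^{l-1} * k_j^l + b_j^l )
    for a single output map j; prev_maps and kernels are the maps in M_j
    and their kernels."""
    acc = sum(conv2d_valid(x, k) for x, k in zip(prev_maps, kernels))
    return f(acc + bias)
```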

 Figure 1 Schematic diagram of 2×2 max pooling
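The 2×2 max pooling of Figure 1 keeps the maximum of each non-overlapping 2×2 block, halving both spatial dimensions. A short NumPy sketch:

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling of a single 2-D feature map,
    as illustrated in Figure 1; an odd trailing row/column is dropped."""
    H, W = x.shape
    x = x[:H - H % 2, :W - W % 2]
    h, w = x.shape[0] // 2, x.shape[1] // 2
    # group pixels into (h, 2, w, 2) blocks and take the max of each block
    return x.reshape(h, 2, w, 2).max(axis=(1, 3))
```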

1.2 Loss function based on multi-label learning

 $E=\frac{1}{m}\sum\limits_{i=1}^{m}{E(i)}$ (2)
 $E(i)=\frac{1}{2}\sum\limits_{k=1}^{n}{{{(d_{i}^{k}-y_{i}^{k})}^{2}}}$ (3)
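Equations (2)-(3) define the conventional mean square error over $m$ samples and $n$ output units, with targets $d_i^k$ and network outputs $y_i^k$. A direct NumPy sketch:

```python
import numpy as np

def mse_loss(D, Y):
    """Eqs. (2)-(3): E = (1/m) * sum_i 0.5 * sum_k (d_i^k - y_i^k)^2.
    D: (m, n) target matrix; Y: (m, n) network output matrix."""
    D, Y = np.asarray(D, float), np.asarray(Y, float)
    return np.mean(0.5 * np.sum((D - Y) ** 2, axis=1))
```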

 $E=\sum\limits_{i=1}^{m}{{{E}_{i}}}=\sum\limits_{i=1}^{m}{\frac{1}{|{{Y}_{i}}||\overline{{{Y}_{i}}}|}}\sum\limits_{(k,l)\in {{Y}_{i}}\times \overline{{{Y}_{i}}}}{\exp (-(c_{k}^{i}-c_{l}^{i}))}$ (4)

 $E=\sum\limits_{i=1}^{m}{{{E}_{i}}}=\sum\limits_{i=1}^{m}{\frac{1}{|{{Y}_{i}}||\overline{{{Y}_{i}}}|}}\sum\limits_{(k,l)\in {{Y}_{i}}\times \overline{{{Y}_{i}}}}{\exp (-({{\alpha }_{k}}c_{k}^{i}-c_{l}^{i}))}$ (5)

 ${{\alpha }_{k}}=\frac{{{L}_{k}}/n}{\max (L/n)}$ (6)
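Equations (4)-(6) replace the MSE loss with a pairwise exponential ranking loss over pairs drawn from each sample's relevant label set $Y_i$ and its complement $\overline{Y_i}$, with a per-label weight $\alpha_k$ that gives low-frequency words a stronger gradient. A hypothetical NumPy sketch, assuming $L_k$ in Eq. (6) is the training-set frequency of label $k$ and $c_k^i$ is the network output for label $k$ on sample $i$:

```python
import numpy as np

def label_weights(counts, n):
    """Eq. (6): alpha_k = (L_k / n) / max(L / n), with counts = (L_1..L_n)
    assumed to be per-label training frequencies and n the label count."""
    freq = np.asarray(counts, float) / n
    return freq / np.max(freq)

def cnn_mll_loss(C, Y, alpha):
    """Eq. (5): E = sum_i (1/(|Y_i||~Y_i|)) *
                 sum_{(k,l) in Y_i x ~Y_i} exp(-(alpha_k c_k^i - c_l^i)).
    C: (m, n) outputs; Y: (m, n) binary relevance matrix."""
    total = 0.0
    for c, y in zip(np.asarray(C, float), np.asarray(Y) > 0.5):
        pos, neg = np.where(y)[0], np.where(~y)[0]
        if len(pos) == 0 or len(neg) == 0:
            continue  # E_i is undefined when Y_i or its complement is empty
        s = sum(np.exp(-(alpha[k] * c[k] - c[l])) for k in pos for l in neg)
        total += s / (len(pos) * len(neg))
    return total
```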

2 Annotation refinement based on the annotation-word co-occurrence matrix

 ${{R}_{ij}}=S(i,j)/S(i)$ (7)

 $O=R*C$ (8)
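Equations (7)-(8) can be sketched in NumPy, assuming $S(i,j)$ counts training images annotated with both words $i$ and $j$, $S(i)$ counts images annotated with word $i$, and $C$ is the network's output score vector over the vocabulary:

```python
import numpy as np

def cooccurrence_ratio(labels):
    """Eq. (7): R[i, j] = S(i, j) / S(i).
    labels: (m, n) binary annotation matrix (assumed layout),
    one row per training image, one column per annotation word."""
    Y = np.asarray(labels, float)
    S = Y.T @ Y                            # S[i, j]; the diagonal holds S(i)
    counts = np.maximum(np.diag(S), 1.0)   # guard against division by zero
    return S / counts[:, None]

def refine(R, c):
    """Eq. (8): O = R * C, reweighting network outputs by word correlation."""
    return R @ np.asarray(c, float)
```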
3 Structure of the proposed convolutional neural network model

 Figure 2 CNN structure used in this paper
4 Experiments and result analysis

4.1 Evaluation metrics

 $P=\frac{1}{n}\sum\limits_{i=1}^{n}{\frac{A_{i}^{r}}{A_{i}^{y}}}$
 $R=\frac{1}{n}\sum\limits_{i=1}^{n}{\frac{A_{i}^{r}}{A_{i}^{d}}}$
 $F1=2PR/(P+R)$
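These metrics average per-image precision and recall over the $n$ test images, where $A_i^r$ is the number of correctly predicted words for image $i$, $A_i^y$ the number of predicted words, and $A_i^d$ the number of ground-truth words. A small sketch using Python sets (a hypothetical helper, not the paper's evaluation script):

```python
import numpy as np

def prf1(pred_sets, true_sets):
    """Average precision P, recall R, and F1 = 2PR/(P+R) over n images.
    pred_sets / true_sets: lists of sets of annotation words per image."""
    P = np.mean([len(p & t) / len(p) for p, t in zip(pred_sets, true_sets)])
    R = np.mean([len(p & t) / len(t) for p, t in zip(pred_sets, true_sets)])
    F1 = 2 * P * R / (P + R) if P + R > 0 else 0.0
    return P, R, F1
```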

4.2 Experimental results

 Figure 3 Precision, recall and F1 value of each label in the dataset

5 Conclusion
