ZHANG Yong, YANG Hao. Image classification method based on optimized bag-of-visual words model[J]. Journal of Computer Applications, 2017, 37(8): 2244-2247, 2252. DOI: 10.11772/j.issn.1001-9081.2017.08.2244.

Image classification method based on optimized bag-of-visual words model
ZHANG Yong, YANG Hao
School of Computer and Communication, Lanzhou University of Technology, Lanzhou Gansu 730050, China
Abstract: In the Bag-Of-Visual words (BOV) model, an overly large visual dictionary increases the time cost of image classification. To address this problem, a Weighted-Maximal Relevance-Minimal Semantic similarity (W-MR-MS) criterion was proposed to optimize the visual dictionary. Firstly, the Scale Invariant Feature Transform (SIFT) features of images were extracted, and the K-Means algorithm was used to generate an original visual dictionary. Secondly, the correlation between visual words and image categories and the semantic similarity among visual words were calculated, and a weighting parameter was introduced to balance the importance of the correlation and the semantic similarity in image classification. Finally, based on the weighted result, visual words that were weakly correlated with image categories and highly similar to other visual words were removed, which optimized the visual dictionary. The experimental results show that, at the same visual dictionary size, the classification precision of the proposed method is 5.30% higher than that of the traditional K-Means algorithm; at the same classification precision, the time cost of the proposed method is 32.18% lower than that of the traditional K-Means algorithm. Therefore, the proposed method achieves high classification efficiency and is suitable for image classification.
Key words: image classification    Bag-Of-Visual words (BOV) model    feature extraction    visual dictionary
0 Introduction

1 Image representation in the bag-of-visual-words model

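In the BOV model, the size of the visual dictionary has a strong influence on image classification performance. Representing an image with the BOV model means quantizing each of its features onto the visual dictionary, i.e., describing the image by a histogram of visual word frequencies, which is referred to here as the visual vocabulary histogram. Figure 1 illustrates image representation in the BOV model. As Figure 1 shows, the quality of the visual vocabulary histogram determines the accuracy of image classification, and the dimensionality of the histogram directly determines the time complexity of classification. An appropriately sized visual dictionary can therefore improve image classification performance.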
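To make the quantization step concrete, here is a minimal Python sketch of how the local features of one image (for example SIFT descriptors) could be mapped to a visual vocabulary histogram. The function names, the hard nearest-centre assignment and the optional L1 normalization are illustrative assumptions, not details taken from the paper. Because the histogram has one bin per visual word, its length equals the dictionary size, which is why shrinking the dictionary lowers the cost of classification.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(feature: Vector, dictionary: Sequence[Vector]) -> int:
    """Return the index of the visual word (cluster centre) closest to `feature`."""
    return min(range(len(dictionary)), key=lambda k: math.dist(feature, dictionary[k]))


def bov_histogram(features: Sequence[Vector], dictionary: Sequence[Vector],
                  normalize: bool = True) -> List[float]:
    """Quantize each local feature to its nearest visual word and count word frequencies.

    The histogram has one bin per visual word, so its length equals the dictionary size.
    """
    hist = [0.0] * len(dictionary)
    for f in features:
        hist[nearest_word(f, dictionary)] += 1.0
    if normalize and len(features) > 0:
        total = sum(hist)
        hist = [h / total for h in hist]  # relative word frequencies (assumed L1 normalization)
    return hist


# Toy 2-D example: two visual words, three local features.
words = [[0.0, 0.0], [10.0, 10.0]]
feats = [[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]]
print(bov_histogram(feats, words, normalize=False))  # [2.0, 1.0]
```

In practice the dictionary would be the K-Means centres and each feature a 128-D SIFT descriptor; the toy 2-D vectors only keep the example readable.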

Figure 1 Image representation in the BOV model
2 Definition of the W-MR-MS criterion
2.1 Correlation between visual words and image categories

 $\mathit{AvI}(d_m, c) = \frac{1}{|C|}\sum_{c \in \{1, 2, \ldots, C\}} I(d_m, c)$ (1)
 $I(d_m, c) = \sum_{d_m \in \{0, 1\}} p(d_m, c)\log_2\frac{p(d_m, c)}{p(d_m)\,p(c)}$ (2)
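Equations (1) and (2) can be sketched in Python as follows. This assumes that $d_m$ is a binary variable indicating whether visual word $m$ occurs in an image and that all probabilities are estimated from counts over the training images; the extracted text does not state either point, so both are assumptions, and the function names are hypothetical. The `lb` in Eq. (2) is the base-2 logarithm, hence `math.log2`.

```python
import math
from typing import Hashable, Sequence


def word_class_relevance(presence: Sequence[int], labels: Sequence[Hashable], c: Hashable) -> float:
    """Eq. (2): sum over d_m in {0, 1} of p(d_m, c) * log2(p(d_m, c) / (p(d_m) p(c))).

    presence[i] is 1 if the visual word occurs in image i, else 0 (assumed binarization);
    probabilities are empirical frequencies over the training images (assumption).
    """
    n = len(labels)
    p_c = sum(1 for y in labels if y == c) / n
    total = 0.0
    for v in (0, 1):
        p_v = sum(1 for x in presence if x == v) / n
        p_vc = sum(1 for x, y in zip(presence, labels) if x == v and y == c) / n
        if p_vc > 0:  # terms with p(d_m, c) = 0 contribute 0 (0 * log 0 := 0)
            total += p_vc * math.log2(p_vc / (p_v * p_c))
    return total


def average_relevance(presence: Sequence[int], labels: Sequence[Hashable]) -> float:
    """Eq. (1): average of Eq. (2) over all image categories."""
    classes = set(labels)
    return sum(word_class_relevance(presence, labels, c) for c in classes) / len(classes)


# A word that appears exactly in the class-"a" images is maximally relevant here.
print(average_relevance([1, 1, 0, 0], ["a", "a", "b", "b"]))  # 0.5
```

A word whose occurrence is independent of the category (for example `[1, 0, 1, 0]` with the same labels) scores 0, so it would be an early candidate for removal under the relevance term alone.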
2.2 Semantic similarity between visual words

 ${\mathit{d}_{\mathit{ij}}}{\rm{ = exp( - }}\mathit{h}_{\mathit{ij}}^2{\rm{/2)}}$ (3)

 ${\mathit{h}_{\mathit{ij}}}{\rm{ = }}{\left\| {{\mathit{\boldsymbol{u}}_\mathit{i}}{\rm{ - }}{\mathit{\boldsymbol{u}}_\mathit{j}}} \right\|_{\rm{2}}}{\rm{/(}}{\mathit{r}_\mathit{c}}{\rm{ \times }}{\mathit{s}_\mathit{i}}{\rm{)}}$ (4)
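Equations (3) and (4) define a Gaussian weight between two keypoints. The sketch below reads $\boldsymbol{u}_i$ as the image coordinates of keypoint $p_i$, $s_i$ as its SIFT scale and $r_c$ as a radius factor; the extracted text does not define these symbols, so this interpretation (and the function name) is an assumption.

```python
import math
from typing import Sequence


def context_weight(u_i: Sequence[float], u_j: Sequence[float], s_i: float, r_c: float) -> float:
    """Eqs. (3)-(4): d_ij = exp(-h_ij**2 / 2) with h_ij = ||u_i - u_j||_2 / (r_c * s_i).

    Assumed meaning: u_i, u_j are keypoint image coordinates, s_i is the SIFT scale of p_i,
    and r_c is a radius factor; these readings are not confirmed by the extracted text.
    """
    h = math.dist(u_i, u_j) / (r_c * s_i)
    return math.exp(-h * h / 2.0)


print(context_weight((0.0, 0.0), (0.0, 0.0), s_i=2.0, r_c=3.0))  # 1.0
print(context_weight((0.0, 0.0), (3.0, 4.0), s_i=1.0, r_c=5.0))  # exp(-0.5), about 0.6065
```

Dividing the distance by $r_c \times s_i$ makes the weight scale-aware: a neighbour at a fixed pixel distance counts more for a large-scale keypoint than for a small one.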

 $\mathit{c}{\mathit{d}_\mathit{k}}{\rm{(}}{\mathit{p}_\mathit{i}}{\rm{) = }}\sum\limits_{{\mathit{p}_\mathit{j}} \in \mathit{SC}{\mathit{R}_{{\mathit{p}_\mathit{i}}}}{\rm{, }}{\mathit{p}_\mathit{j}} \in {\mathit{H}_\mathit{k}}} {{\mathit{d}_{\mathit{ij}}}}$ (5)

 $\mathit{SC}{\rm{(}}{\mathit{p}_\mathit{i}}{\rm{) = [}}\mathit{c}{\mathit{d}_{\rm{1}}}{\rm{(}}{\mathit{p}_\mathit{i}}{\rm{), }}\mathit{c}{\mathit{d}_{\rm{2}}}{\rm{(}}{\mathit{p}_\mathit{i}}{\rm{), \ldots, }}\mathit{c}{\mathit{d}_\mathit{n}}{\rm{(}}{\mathit{p}_\mathit{i}}{\rm{), \ldots, }}\mathit{c}{\mathit{d}_\mathit{N}}{\rm{(}}{\mathit{p}_\mathit{i}}{\rm{)]}}$ (6)

 $\mathit{SC}(d_m) = \frac{1}{|R_m|}\sum_{p_i \in R_m} \mathit{SC}(p_i)$ (7)
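Equations (5) to (7) build a spatial-context vector for each keypoint and average it per visual word. In this hedged sketch the spatial context region $SCR_{p_i}$ is supplied as a precomputed neighbour list, because how the region is delimited cannot be recovered from the extracted text; $H_k$ and $R_m$ are both read as "the keypoints quantized to word $k$ (or $m$)", and the normalizer of Eq. (7) is taken as $|R_m|$ so that the result is a mean over those keypoints. The `weight` callable stands in for $d_{ij}$ from Eqs. (3)-(4). All names are hypothetical.

```python
from typing import Callable, List, Sequence

Weight = Callable[[int, int], float]  # returns d_ij for keypoints p_i, p_j (Eqs. (3)-(4))


def point_context(i: int, context: Sequence[Sequence[int]], word_of: Sequence[int],
                  weight: Weight, n_words: int) -> List[float]:
    """Eqs. (5)-(6): entry k of SC(p_i) sums d_ij over neighbours p_j in SCR(p_i) assigned to word k."""
    sc = [0.0] * n_words
    for j in context[i]:  # context[i] lists the indices of the points in SCR(p_i)
        sc[word_of[j]] += weight(i, j)
    return sc


def word_context(word_of: Sequence[int], context: Sequence[Sequence[int]],
                 weight: Weight, n_words: int) -> List[List[float]]:
    """Eq. (7): SC(d_m) is the mean of SC(p_i) over R_m, the keypoints assigned to word m."""
    sums = [[0.0] * n_words for _ in range(n_words)]
    counts = [0] * n_words
    for i, m in enumerate(word_of):
        for k, v in enumerate(point_context(i, context, word_of, weight, n_words)):
            sums[m][k] += v
        counts[m] += 1
    # Words with no assigned keypoints keep an all-zero vector.
    return [[v / counts[m] for v in row] if counts[m] else row
            for m, row in enumerate(sums)]


# Three keypoints: p_0 -> word 0, p_1 and p_2 -> word 1; unit weights for readability.
word_of = [0, 1, 1]
context = [[1, 2], [0], [0, 1]]
print(word_context(word_of, context, lambda i, j: 1.0, n_words=2))  # [[0.0, 2.0], [1.0, 0.5]]
```

Each $\mathit{SC}(d_m)$ has one entry per visual word, so it describes which other words tend to appear around word $m$.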

 $\mathit{sim}(d_m, d_n) = \cos(\mathit{SC}(d_m), \mathit{SC}(d_n)) = \frac{\mathit{SC}(d_m)}{\|\mathit{SC}(d_m)\|_2} \cdot \frac{\mathit{SC}(d_n)}{\|\mathit{SC}(d_n)\|_2}$ (8)

 $\mathit{I'}{\rm{(}}{\mathit{d}_\mathit{m}}{\rm{) = }}\sum\limits_{\mathit{n}{\rm{ = 1, }}\mathit{n} \ne \mathit{m}}^\mathit{N} {\mathit{sim}{\rm{(}}{\mathit{d}_\mathit{m}}{\rm{, }}{\mathit{d}_\mathit{n}}{\rm{)}}}$ (9)
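Equations (8) and (9) compare the per-word context vectors by cosine similarity and sum each word's similarity to all other words. A minimal sketch follows; treating an all-zero vector as having similarity 0 is an assumed convention that the paper does not specify, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (8): cosine similarity of two spatial-context vectors.

    Returning 0.0 when either vector is all zeros is an assumed convention.
    """
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def redundancy(sc_words: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (9): I'(d_m) = sum of sim(d_m, d_n) over all other words n != m."""
    n = len(sc_words)
    return [sum(cosine(sc_words[m], sc_words[k]) for k in range(n) if k != m)
            for m in range(n)]


print(redundancy([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]))  # [1.0, 1.0, 0.0]
```

The first two words share the same context direction, so each is fully redundant with the other, while the third word is not similar to either. Computing all pairs is $O(N^2)$ in the dictionary size $N$.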

2.3 W-MR-MS criterion

 $\mathit{I}{\rm{(}}{\mathit{d}_\mathit{m}}{\rm{) = }}\left( {{\rm{1 - }}\mathit{\alpha }} \right){\rm{ \times }}\mathit{AvI}{\rm{(}}{\mathit{d}_\mathit{m}}{\rm{, }}\mathit{c}{\rm{) - }}\mathit{\alpha }{\rm{ \times }}\mathit{I'}{\rm{(}}{\mathit{d}_\mathit{m}}{\rm{)}}$ (10)
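Equation (10) trades relevance against redundancy with the weight $\alpha$: words with low $\mathit{AvI}$ and high $I'$ receive the smallest scores. The sketch below computes the score from precomputed per-word values and picks the lowest-scoring words. It assumes $\alpha$ is chosen in $[0, 1]$ so that both terms keep a non-negative weight, which the formula suggests but the extracted text does not state; the function names are hypothetical.

```python
from typing import List, Sequence


def wmrms_scores(avi: Sequence[float], redundancy: Sequence[float], alpha: float) -> List[float]:
    """Eq. (10): I(d_m) = (1 - alpha) * AvI(d_m, c) - alpha * I'(d_m) for every word m.

    alpha is assumed to lie in [0, 1]; the code does not enforce this.
    """
    return [(1.0 - alpha) * a - alpha * r for a, r in zip(avi, redundancy)]


def lowest_scoring(scores: Sequence[float], t: int) -> List[int]:
    """Indices of the t words with the smallest I(d_m): weakly relevant and/or highly redundant."""
    return sorted(range(len(scores)), key=lambda m: scores[m])[:t]


scores = wmrms_scores(avi=[0.5, 0.1, 0.4], redundancy=[0.2, 1.0, 0.0], alpha=0.5)
print(lowest_scoring(scores, t=2))  # [1, 0]
```

With $\alpha = 0$ the criterion reduces to pure relevance ranking, and larger $\alpha$ penalizes redundant words more heavily.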

3 BOV model based on the W-MR-MS criterion

Figure 2 System diagram of image classification based on the W-MR-MS criterion

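1) Cluster the local features with the K-Means algorithm to generate a visual dictionary D of size K; in this paper, K = 1200.

2) Use the visual dictionary D to represent and classify the training images, obtaining a classification precision P.

3) Use Eq. (10) to select the T visual words with the smallest $I(d_m)$ values and remove them from the visual dictionary, which yields a visual dictionary D of size K-T. If K-T is greater than the threshold H, return to step 2); otherwise, stop. In this paper, T = 10 and H = 400.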
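The loop in steps 2) and 3) can be sketched as below. `evaluate` and `score` are hypothetical callbacks: the first stands in for representing and classifying the training images with the current words (returning precision P), the second for computing $I(d_m)$ from Eq. (10); visual words are represented by integer ids, which is also an assumption. Read literally, the stopping rule evaluates the dictionary and then prunes it, so the dictionary that first falls to size H or below is returned without a further evaluation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def optimize_dictionary(
    words: Sequence[int],
    evaluate: Callable[[List[int]], float],
    score: Callable[[List[int]], Dict[int, float]],
    t: int = 10,
    h: int = 400,
) -> Tuple[List[int], List[Tuple[int, float]]]:
    """Iterate steps 2)-3): evaluate, drop the t words with the smallest I(d_m), stop once size <= h.

    `evaluate` and `score` are hypothetical callbacks: the first should return the classification
    precision P obtained with the given words, the second I(d_m) (Eq. (10)) for each current word.
    """
    if t <= 0:
        raise ValueError("t must be positive, otherwise the dictionary never shrinks")
    current = list(words)
    history: List[Tuple[int, float]] = []  # (dictionary size, precision P) per iteration
    while True:
        history.append((len(current), evaluate(current)))        # step 2)
        scores = score(current)
        drop = set(sorted(current, key=lambda w: scores[w])[:t])  # step 3): t smallest I(d_m)
        current = [w for w in current if w not in drop]
        if len(current) <= h:                                      # continue only while K - T > H
            break
    return current, history


# Toy run with 10 word ids, T = 2, H = 4; lower ids get lower scores and are pruned first.
final, history = optimize_dictionary(
    range(10), evaluate=lambda ws: len(ws) / 10, score=lambda ws: {w: float(w) for w in ws}, t=2, h=4)
print(final)                          # [6, 7, 8, 9]
print([size for size, _ in history])  # [10, 8, 6]
```

With the paper's settings (K = 1200, T = 10, H = 400) the same loop would record one precision value per dictionary size from 1200 down to 410, which is the kind of size/precision trade-off curve a dictionary-scale study needs.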

4 Experimental results and analysis
4.1 Experimental setup

4.2 Experimental results
4.2.1 Influence of parameter α on image classification performance

Figure 3 Influence of α on image classification
4.2.2 Influence of visual dictionary size on image classification performance

Figure 4 Influence of the size of the optimized visual dictionary on image classification
4.2.3 Comparison of the proposed method with the K-Means algorithm

Figure 5 Confusion matrix of the proposed method on the COREL dataset
5 Conclusion

