Cross-media retrieval based on latent semantic topic reinforcement
HUANG Yu$^{1,2}$, ZHANG Hong$^{1,2}$
1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan Hubei 430065, China
Abstract: Cross-media retrieval is an important and challenging problem in the multimedia area. The same semantic topic is expressed differently across modalities, yet traditional cross-media retrieval methods usually neglect to explore the intrinsic semantic information of the different modalities collaboratively. To address this problem, a Latent Semantic Topic Reinforce cross-media retrieval (LSTR) method was proposed. First, text semantics were represented with Latent Dirichlet Allocation (LDA), and the corresponding images were represented with the Bag of Words (BoW) model. Second, multiclass logistic regression was used to classify both texts and images, and the posterior probabilities under the learned classifiers were used to represent the latent semantic topics of images and texts. Finally, the posterior probabilities learned for texts were used to regularize those of their image counterparts, reinforcing the image semantic topics and greatly improving the semantic similarity between the two modalities. On the Wikipedia data set, the mean Average Precision (mAP) over retrieving text with image and retrieving image with text is 57.0%, which is 35.1%, 34.8% and 32.1% higher than that of the Canonical Correlation Analysis (CCA), Semantic Matching (SM) and Semantic Correlation Matching (SCM) methods, respectively. Experimental results show that the proposed method effectively improves the average precision of cross-media retrieval.
Key words: cross-media retrieval; latent semantic topic; multiclass logistic regression; posterior probability; regularization
0 Introduction

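1) Multiclass logistic regression is used to classify images and texts and obtain classification models. These models are then used to compute the multiclass posterior probabilities of each image and text, and the resulting posterior probability vectors represent the latent semantic topics of the images and texts.

2) The latent semantic topics of texts are clearer than those of images. The text latent semantic topics are therefore used to regularize the image latent semantic topics, which maximizes the correlation between the two and brings the image and text latent semantic topics into agreement.

3) The Pearson correlation coefficient is used to measure the similarity between text and image vectors, enabling retrieval in both directions between images and texts.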

1 Extracting the latent semantic topics of images and texts

 $\begin{array}{l} {\boldsymbol{h}_{\boldsymbol{\theta}}}\left( {\boldsymbol{x}^{(i)}} \right) = \left[ \begin{array}{c} p\left( \boldsymbol{y}^{(i)} = 1 \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta} \right)\\ p\left( \boldsymbol{y}^{(i)} = 2 \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta} \right)\\ \vdots \\ p\left( \boldsymbol{y}^{(i)} = k \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta} \right) \end{array} \right] = \\ \;\;\;\;\;\;\;\;\;\;\;\;\; \frac{1}{\sum\limits_{j = 1}^k \exp\left( \boldsymbol{\theta}_j^{\rm T} \boldsymbol{x}^{(i)} \right)} \left[ \begin{array}{c} \exp\left( \boldsymbol{\theta}_1^{\rm T} \boldsymbol{x}^{(i)} \right)\\ \exp\left( \boldsymbol{\theta}_2^{\rm T} \boldsymbol{x}^{(i)} \right)\\ \vdots \\ \exp\left( \boldsymbol{\theta}_k^{\rm T} \boldsymbol{x}^{(i)} \right) \end{array} \right] \end{array}$ (1)
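
Eq. (1) is the softmax function. The posterior vector it produces serves as the latent semantic topic of an image or text, and the minimal NumPy sketch below computes it for one sample. The function name and the (k, n) layout of `theta` (one row per θ_j) are illustrative assumptions. Subtracting the maximum score is a standard numerical-stability step that leaves the result unchanged; the paper does not specify it.

```python
import numpy as np


def softmax_posterior(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Evaluate h_theta(x) from Eq. (1) for one sample.

    theta: (k, n) array; row j holds the class weights theta_j (hypothetical layout).
    x:     (n,) feature vector, e.g. an LDA topic vector (text) or a BoW histogram (image).
    Returns the k posterior probabilities p(y = j | x; theta), which sum to 1.
    """
    scores = theta @ x               # theta_j^T x for j = 1..k
    scores = scores - scores.max()   # softmax is shift-invariant; this avoids overflow
    e = np.exp(scores)
    return e / e.sum()


# Hypothetical 3-class model on 2-dimensional features
theta = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
x = np.array([2.0, 1.0])
p = softmax_posterior(theta, x)   # latent semantic topic vector of x
print(p)
```
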

 $\begin{array}{l} J\left( \mathit{\boldsymbol{\theta }} \right) =- \frac{1}{m}\left[{\sum\limits_{i = 1}^m {\sum\limits_{j = 1}^k {1\left\{ {{\mathit{\boldsymbol{y}}^{\left( i \right)}} = j} \right\}{\rm{log}}\frac{{{\rm{exp}}\left( {\mathit{\boldsymbol{\theta }}_j^{\rm{T}}{\mathit{\boldsymbol{x}}^{\left( i \right)}}} \right)}}{{\sum\limits_{l = 1}^k {{\rm{exp}}\left( {\mathit{\boldsymbol{\theta }}_l^{\rm{T}}{\mathit{\boldsymbol{x}}^{\left( i \right)}}} \right)} }}} } } \right]{\rm{ + }}\\ \;\;\;\;\;\;\;\;\;\;\frac{\lambda }{2}\sum\limits_{i = 1}^k {\sum\limits_{j = 0}^n {\mathit{\boldsymbol{\theta }}_{ij}^2} } \end{array}$ (2)
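
Eq. (2) is the softmax cross-entropy plus an L2 weight-decay term. Its gradient with respect to θ_j is -(1/m) Σ_i x^(i) (1{y^(i) = j} - p(y^(i) = j | x^(i); θ)) + λθ_j. The sketch below evaluates J(θ) and this gradient, then fits θ with plain batch gradient descent. The optimizer, learning rate, iteration count and toy data are assumptions for illustration, since this section does not say how Eq. (2) is minimized. Labels are 0-based in the code, and a constant feature x_0 = 1 plays the role of the j = 0 term in the regularizer.

```python
import numpy as np


def softmax_cost_grad(theta, X, y, lam):
    """J(theta) of Eq. (2) and its gradient.

    theta: (k, n) weights; X: (m, n) samples, one x^(i) per row;
    y: (m,) integer labels in 0..k-1 (0-based, unlike the 1..k of the paper);
    lam: weight-decay coefficient lambda.
    """
    m, k = X.shape[0], theta.shape[0]
    scores = X @ theta.T                                  # (m, k): theta_j^T x^(i)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_p = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    onehot = np.eye(k)[y]                                 # indicator 1{y^(i) = j}
    cost = -(onehot * log_p).sum() / m + 0.5 * lam * (theta ** 2).sum()
    grad = -(onehot - np.exp(log_p)).T @ X / m + lam * theta
    return cost, grad


def train_softmax(X, y, k, lam=1e-3, lr=0.1, iters=500):
    """Minimise J(theta) with batch gradient descent (an assumed optimiser)."""
    theta = np.zeros((k, X.shape[1]))
    for _ in range(iters):
        _, grad = softmax_cost_grad(theta, X, y, lam)
        theta -= lr * grad
    return theta


# Hypothetical toy data: 3 well-separated Gaussian classes in 2-D, plus x_0 = 1
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
centers = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
X = np.hstack([np.ones((90, 1)), centers[y] + rng.normal(size=(90, 2))])
theta = train_softmax(X, y, k=3)
accuracy = float((np.argmax(X @ theta.T, axis=1) == y).mean())
print(f"training accuracy: {accuracy:.2f}")
```
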

2 Latent semantic topic reinforcement based on regularization

 Figure 1 Cross-media retrieval algorithm based on latent semantic topic reinforcement

 $\boldsymbol{H}:{\boldsymbol{x}_i} \to {\boldsymbol{t}_i}$ (3)

$\boldsymbol{H}$ is a linear transformation matrix:

 $\boldsymbol{T} = \boldsymbol{XH}$ (4)

 $\left( \begin{array}{c} \boldsymbol{t}_1^{\rm T}\\ \boldsymbol{t}_2^{\rm T}\\ \vdots \\ \boldsymbol{t}_N^{\rm T} \end{array} \right) = \left( \begin{array}{c} \boldsymbol{x}_1^{\rm T}\\ \boldsymbol{x}_2^{\rm T}\\ \vdots \\ \boldsymbol{x}_N^{\rm T} \end{array} \right)\left[ \boldsymbol{h}_1, \boldsymbol{h}_2, \ldots, \boldsymbol{h}_K \right]$ (5)

 ${\boldsymbol{x}_i}^T{\boldsymbol{h}_k} \ge 0, \forall i = 1, 2, \ldots, N{\rm{;}}\forall k = 1, 2, \ldots, K$ (6)
 $\sum\limits_{k = 1}^K {\boldsymbol{x}_i^{\rm{T}}{\boldsymbol{h}_k}} = 1, \forall i = 1, 2, \ldots, N$ (7)

 $\boldsymbol{b} = \boldsymbol{Mx}$ (8)

 $\left[{\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{t}}_1}}\\ {{\mathit{\boldsymbol{t}}_2}}\\ \vdots \\ {{\mathit{\boldsymbol{t}}_N}} \end{array}} \right] = \left[{\begin{array}{*{20}{c}} {\mathit{\boldsymbol{x}}_1^{\rm{T}}}&0& \cdots &0\\ 0&{\mathit{\boldsymbol{x}}_1^{\rm{T}}}& \cdots &0\\ \vdots &{\; \vdots }&{}&{\; \vdots }\\ 0&0& \cdots &{\mathit{\boldsymbol{x}}_1^{\rm{T}}}\\ {\mathit{\boldsymbol{x}}_2^{\rm{T}}}&0& \cdots &0\\ {\; \vdots }&{\; \vdots }&{}&{\; \vdots }\\ 0&0& \cdots &{\mathit{\boldsymbol{x}}_N^{\rm{T}}} \end{array}} \right]\left[{\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{h}}_1}}\\ {{\mathit{\boldsymbol{h}}_2}}\\ \vdots \\ {{\mathit{\boldsymbol{h}}_L}} \end{array}} \right]$ (9)

 $\mathit{\boldsymbol{S}} = \left[{\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{x}}_1}^T}&{{\mathit{\boldsymbol{x}}_1}^T}& \cdots &{{\mathit{\boldsymbol{x}}_1}^T}\\ {{\mathit{\boldsymbol{x}}_2}^T}&{{\mathit{\boldsymbol{x}}_2}^T}& \cdots &{{\mathit{\boldsymbol{x}}_2}^T}\\ \vdots & \vdots &{}& \vdots \\ {{\mathit{\boldsymbol{x}}_N}^T}&{{\mathit{\boldsymbol{x}}_N}^T}& \cdots &{{\mathit{\boldsymbol{x}}_N}^T} \end{array}} \right]$ (10)
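
Eqs. (8) to (10) rewrite T = XH as a linear system in a stacked unknown. Overloading the symbol, x in Eq. (8) is read here as the vector [h_1; h_2; …; h_K] of the columns of H, and b = [t_1; t_2; …; t_N] stacks the text topic vectors. Under that stacking assumption, a minimal NumPy sketch of how M and S might be assembled follows. The function name `build_M_S` and the toy sizes are hypothetical.

```python
import numpy as np


def build_M_S(X: np.ndarray, K: int):
    """Assemble M (Eq. 9) and S (Eq. 10) from image features X of shape (N, d).

    Assumed ordering: the unknown vector is x = [h_1; ...; h_K] (columns of H),
    and b = [t_1; ...; t_N] (rows of T), so that b = M x reproduces T = X H.
    """
    N, d = X.shape
    # Row (i, k) of M holds x_i^T in column block k and zeros elsewhere.
    M = np.einsum('kl,ij->iklj', np.eye(K), X).reshape(N * K, K * d)
    # Row i of S repeats x_i^T once per topic, so (S x)_i = sum_k x_i^T h_k.
    S = np.tile(X, (1, K))
    return M, S


# Hypothetical sizes: N images, d-dimensional features, K topics
rng = np.random.default_rng(1)
N, d, K = 4, 3, 2
X = rng.random((N, d))
H = rng.random((d, K))
M, S = build_M_S(X, K)
x_vec = H.flatten(order='F')      # [h_1; h_2; ...; h_K]
b = (X @ H).reshape(-1)           # [t_1; t_2; ...; t_N] with T = X H
print(np.allclose(M @ x_vec, b))  # checks Eq. (8) for this stacking
```
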

 ${\mathit{\boldsymbol{x}}^*} = \mathop {{\rm{arg}}\;{\rm{min}}}\limits_\mathit{\boldsymbol{x}} {\left\| {\mathit{\boldsymbol{Mx}}-\mathit{\boldsymbol{b}}} \right\|_2}^2$ (11)
 ${\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;\;\;\mathit{\boldsymbol{Mx}} \ge \mathit{\boldsymbol{0}}{\rm{ ; }}\mathit{\boldsymbol{Sx}} = \mathit{\boldsymbol{1}}$
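
Eq. (11) is a least-squares problem with linear constraints, i.e. a convex quadratic program. One way to prototype it is SciPy's SLSQP method, with Mx ≥ 0 passed as an inequality and Sx = 1 as an equality; this is a sketch, not the authors' solver. S has N rows but rank at most d, so the sketch replaces Sx = 1 with an equivalent full-row-rank system derived from an SVD of S. That substitution is valid only when Sx = 1 is consistent, so the toy features include a constant column to guarantee that a feasible H exists. The function name `solve_H`, the starting point and the toy data are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def build_M_S(X, K):
    """M (Eq. 9) and S (Eq. 10), assuming x = [h_1; ...; h_K] and b = [t_1; ...; t_N]."""
    N, d = X.shape
    M = np.einsum('kl,ij->iklj', np.eye(K), X).reshape(N * K, K * d)
    S = np.tile(X, (1, K))
    return M, S


def solve_H(X, T, x0):
    """Approximately solve Eq. (11) and return H of shape (d, K).

    X: (N, d) image features; T: (N, K) text topic vectors; x0: feasible start.
    """
    N, d = X.shape
    K = T.shape[1]
    M, S = build_M_S(X, K)
    b = T.reshape(-1)
    # Equivalent full-row-rank form of S x = 1 (valid only if the system is consistent).
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))
    A_eq = Vt[:r]
    c_eq = (U[:, :r].T @ np.ones(N)) / s[:r]
    constraints = [
        {'type': 'ineq', 'fun': lambda x: M @ x, 'jac': lambda x: M},             # M x >= 0
        {'type': 'eq', 'fun': lambda x: A_eq @ x - c_eq, 'jac': lambda x: A_eq},  # S x = 1
    ]
    res = minimize(lambda x: np.sum((M @ x - b) ** 2), x0,
                   jac=lambda x: 2.0 * M.T @ (M @ x - b),
                   method='SLSQP', constraints=constraints)
    return res.x.reshape(K, d).T, res


# Hypothetical toy problem: the constant first feature makes H0 below feasible
rng = np.random.default_rng(2)
N, d, K = 30, 4, 3
X = np.hstack([np.ones((N, 1)), rng.random((N, d - 1))])
T = rng.dirichlet(np.ones(K), size=N)     # stand-in text topic vectors (rows sum to 1)
H0 = np.zeros((d, K))
H0[0, :] = 1.0 / K                        # x_i^T h_k = 1/K for every i, k
H, res = solve_H(X, T, H0.flatten(order='F'))
T_img = X @ H                             # reinforced image topic vectors
print(res.message, np.abs(T_img.sum(axis=1) - 1).max())
```

Applying the learned H to an image feature vector, x^T H, then gives that image's reinforced topic vector, as in Eq. (3).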

1) Solve Eqs. (1) and (2) to obtain the latent semantic topics of images and texts.

2) For each class (i = 1, 2, …, L), solve:

${\mathit{\boldsymbol{x}}^*}=\mathop {{\rm{argmin}}}\limits_\mathit{\boldsymbol{x}} {\left\| {\mathit{\boldsymbol{Mx}}-\mathit{\boldsymbol{b}}} \right\|_2}^2$

${\rm{s}}{\rm{.t}}{\rm{.}}\; \; \; \mathit{\boldsymbol{Mx}} \ge \mathit{\boldsymbol{0}}{\rm{; }}\mathit{\boldsymbol{Sx}}=\mathit{\boldsymbol{1}}$

3 Experimental analysis

3.1 Experimental data set and data representation

3.2 Metric

 $\begin{array}{l} \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \\ \;\;\;\;\;\;\;\;\; \frac{E\left[ (X - \mu_X)(Y - \mu_Y) \right]}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}} \end{array}$
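
The coefficient above is the covariance normalized by both standard deviations. The following minimal NumPy sketch uses it to rank a gallery of topic vectors against a query, for example an image's topic vector against text topic vectors. The function names and example vectors are hypothetical.

```python
import numpy as np


def pearson(u: np.ndarray, v: np.ndarray) -> float:
    """Pearson correlation of two equal-length vectors, as in the formula above."""
    du, dv = u - u.mean(), v - v.mean()
    return float(du @ dv / (np.sqrt(du @ du) * np.sqrt(dv @ dv)))


def rank_by_pearson(query: np.ndarray, gallery: np.ndarray):
    """Return gallery indices sorted by decreasing correlation with the query, and the scores."""
    scores = np.array([pearson(query, g) for g in gallery])
    return np.argsort(-scores, kind='stable'), scores


# Hypothetical 3-topic vectors: one image query against three texts
query = np.array([0.7, 0.2, 0.1])
texts = np.array([[0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])
order, scores = rank_by_pearson(query, texts)
print(order, scores.round(3))
```
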
3.3 Evaluation of experimental results

 $AP = \frac{1}{L}\sum\limits_{r = 1}^R {prec\left( r \right)\delta \left( r \right)}$
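
Here prec(r) is the precision of the top r results, and δ(r) is 1 when the item at rank r is relevant and 0 otherwise; relevance is assumed here to mean sharing the query's class. Assuming, as is common, that L is the number of relevant items among the top R results, AP and mAP can be sketched as follows. The function names are hypothetical.

```python
from typing import Optional, Sequence


def average_precision(relevant: Sequence[bool], R: Optional[int] = None) -> float:
    """AP = (1/L) * sum_{r=1}^{R} prec(r) * delta(r).

    relevant[r - 1] plays the role of delta(r) for the ranked result list.
    R truncates the list (None = use every result); L is taken to be the number
    of relevant items among the top R (an assumption), and AP is 0 when there are none.
    """
    hits, total = 0, 0.0
    for r, is_rel in enumerate(list(relevant)[:R], start=1):
        if is_rel:
            hits += 1
            total += hits / r          # prec(r): fraction of the top r that is relevant
    return total / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """mAP: the mean of AP over all queries."""
    return sum(average_precision(run) for run in runs) / len(runs)


ap = average_precision([True, False, True, False, False])   # (1/1 + 2/3) / 2
m_ap = mean_average_precision([[True, False, True, False, False], [False, True]])
print(ap, m_ap)
```
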

3.4 Experimental results and analysis

 Figure 2 mAP for different classes (retrieving text with image)
 Figure 3 mAP for different classes (retrieving image with text)
 Figure 4 mAP for different classes

 Figure 5 Precision-recall curves of retrieving text with image
 Figure 6 Precision-recall curves of retrieving image with text

4 Conclusion

