Correlation analysis of genetic site and disease information based on neural networks and particle swarm optimization
 Journal of Beijing University of Chemical Technology (Natural Science)  2018, Vol. 45 Issue (1): 97-102  DOI: 10.13543/j.bhxbzr.2018.01.016

Cite this article

LI Jie, LI ZhiQiang, LIU Xiao, YAN BaiLu. Correlation analysis of genetic site and disease information based on neural networks and particle swarm optimization[J]. Journal of Beijing University of Chemical Technology (Natural Science), 2018, 45(1): 97-102. DOI: 10.13543/j.bhxbzr.2018.01.016.

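1. School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China;
2. Faculty of Science, Beijing University of Chemical Technology, Beijing 100029, China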

Correlation analysis of genetic site and disease information based on neural networks and particle swarm optimization
LI Jie 1, LI ZhiQiang 2, LIU Xiao 1, YAN BaiLu 2
1. School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China;
2. Faculty of Science, Beijing University of Chemical Technology, Beijing 100029, China
Abstract: This paper studies how to screen for the most predictive combinations of loci, taking the interactions between loci into account, when a genetic disease is associated with those loci. The prediction accuracy of a neural network is used as the evaluation criterion, and the particle swarm algorithm searches for the optimal combination of loci by iterative approximation. Compared with the weight analysis method, this method achieves higher accuracy and recognizes the disease well, so it can provide a reference for disease diagnosis.
Key words: genetic locus    interaction    particle swarm optimization (PSO)    neural network

1 Selection of loci combinations based on particle swarm optimization

1.1 Iterative computation based on particle swarm optimization

 $v_{ij}^{k + 1} = \omega v_{ij}^k + {c_1}{r_1}\left( {P_{ij}^k-s_{ij}^k} \right) + {c_2}{r_2}\left( {P_{gj}^k-s_{ij}^k} \right)$ (1)

 ${s_{ij}} = \left\{ \begin{array}{ll} 1, & R < f\left( {{v_{ij}}} \right)\\ 0, & {\rm{otherwise}} \end{array} \right.$ (2)
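The update in Eqs. (1) and (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: f is taken to be the sigmoid function (the usual choice in binary PSO), R is drawn uniformly from [0, 1), and the function name `update_particle` and the values of ω, c1 and c2 are illustrative.

```python
import math
import random


def sigmoid(v):
    """Assumed form of f in Eq. (2); written to stay stable for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def update_particle(s, v, p_best, g_best, omega=0.7, c1=2.0, c2=2.0, rng=random):
    """Apply Eq. (1) and Eq. (2) to one particle.

    s, v    -- current binary position and velocity (lists of equal length)
    p_best  -- individual best position of this particle
    g_best  -- global best position
    omega, c1, c2 are illustrative values, not the paper's settings.
    """
    new_s, new_v = [], []
    for j in range(len(s)):
        r1, r2 = rng.random(), rng.random()
        # Eq. (1): inertia + pull towards individual best + pull towards global best
        vj = omega * v[j] + c1 * r1 * (p_best[j] - s[j]) + c2 * r2 * (g_best[j] - s[j])
        # Eq. (2): the bit becomes 1 when a uniform random number R is below f(v)
        sj = 1 if rng.random() < sigmoid(vj) else 0
        new_v.append(vj)
        new_s.append(sj)
    return new_s, new_v
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.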

1.2 Fitness computation based on a neural network

 ${\mathit{\boldsymbol{X}}_p} = {\left( {x_1^p, x_2^p, \cdots, x_l^p} \right)^{\rm{T}}}$

 $h_j^p = f\left( {{a_{0j}} + \sum\limits_{i = 1}^l {{\omega _{ij}}x_i^p} } \right), j = 1, 2, \cdots, L$

For the p-th sample, the value of the k-th node in the output layer of the BP neural network is

 $y_k^p = \sum\limits_{j = 1}^L {{\omega _{jk}}h_j^p}, \;\; k = 1, 2, \cdots, N$ (3)
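The hidden-layer formula and Eq. (3) amount to one forward pass through a single-hidden-layer network. The sketch below computes it for one sample. It assumes that the activation f is the sigmoid unless another function is supplied; the function name `forward` and the nested-list weight layout are illustrative choices, not taken from the paper.

```python
import math


def forward(x, w_in, a0, w_out, f=None):
    """Compute the hidden and output values for one sample x^p.

    x     -- list of l inputs
    w_in  -- l x L nested list, w_in[i][j] = omega_ij
    a0    -- list of L hidden-layer offsets a_0j
    w_out -- L x N nested list, w_out[j][k] = omega_jk
    f     -- hidden activation; the sigmoid is assumed when none is given
    """
    if f is None:
        f = lambda z: 1.0 / (1.0 + math.exp(-z))
    n_hidden, n_out = len(a0), len(w_out[0])
    # h_j = f(a_0j + sum_i omega_ij * x_i)
    h = [f(a0[j] + sum(w_in[i][j] * x[i] for i in range(len(x))))
         for j in range(n_hidden)]
    # Eq. (3): y_k = sum_j omega_jk * h_j (linear output, no offset term)
    y = [sum(w_out[j][k] * h[j] for j in range(n_hidden)) for k in range(n_out)]
    return h, y
```

With all-zero input weights and offsets, every hidden node evaluates to f(0) = 0.5, which gives a quick sanity check.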

1.3 Algorithm construction

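(1) Generate every component of the n particles from the two-point distribution B(1, 0.5) and build a BP neural network. For each particle, use the loci whose positions equal 1 as the network inputs, and evaluate the particle with the accuracy on the prediction samples as the fitness function. Record the individual best of the i-th particle, which at this stage is the particle's current fitness; the global best is the particle with the largest fitness value.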

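(2) Update the velocities and positions of the n particles with PSO equations (1) and (2) to produce a new set of particles, and build a BP neural network. For each particle, use the loci whose positions equal 1 as the network inputs, and evaluate the particle with the accuracy on the prediction samples as the fitness function. Compare the fitness of the current i-th particle with that of its individual best, and keep whichever has the larger fitness as the individual best of the i-th particle. Compare the fitness of all particles with that of the global best, and keep the particle with the largest fitness as the global best.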

(3) Check whether the stopping criterion is satisfied. If it is, output the global best and stop; otherwise, return to step (2).

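(4) Repeat steps (1)-(3) to obtain multiple loci combinations. The loci that appear most often across these combinations are taken as the loci combination with good predictive performance.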
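Steps (1)-(4) can be sketched as the loop below. It is a rough outline under several assumptions, not the authors' code: the fitness is passed in as a function, so a trained BP network's accuracy on the prediction samples would be plugged in there; `toy_fitness` is a purely hypothetical stand-in; a fixed iteration budget replaces the unspecified stopping criterion; f is assumed to be the sigmoid; and the velocity clamp and all parameter values are illustrative additions.

```python
import math
import random
from collections import Counter


def binary_pso(fitness, n_loci, n_particles=20, n_iter=30,
               omega=0.7, c1=2.0, c2=2.0, seed=None):
    """Steps (1)-(3): return the global best bit vector and its fitness."""
    rng = random.Random(seed)
    # Step (1): every component is drawn from B(1, 0.5)
    pos = [[rng.randint(0, 1) for _ in range(n_loci)] for _ in range(n_particles)]
    vel = [[0.0] * n_loci for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    # Step (3): a fixed iteration budget stands in for the stopping criterion
    for _ in range(n_iter):
        for i in range(n_particles):
            # Step (2): update velocity and position with Eqs. (1) and (2)
            for j in range(n_loci):
                v = (omega * vel[i][j]
                     + c1 * rng.random() * (p_best[i][j] - pos[i][j])
                     + c2 * rng.random() * (g_best[j] - pos[i][j]))
                v = max(-6.0, min(6.0, v))  # clamp: an added safeguard
                vel[i][j] = v
                prob = 1.0 / (1.0 + math.exp(-v))  # f assumed to be the sigmoid
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > p_fit[i]:  # keep the better of current and individual best
                p_best[i], p_fit[i] = pos[i][:], fit
        # Compare all individual bests with the global best
        g = max(range(n_particles), key=p_fit.__getitem__)
        if p_fit[g] > g_fit:
            g_best, g_fit = p_best[g][:], p_fit[g]
    return g_best, g_fit


def most_frequent_loci(fitness, n_loci, n_runs=10, top=3, **kwargs):
    """Step (4): repeat the search and keep the loci selected most often."""
    counts = Counter()
    for run in range(n_runs):
        best, _ = binary_pso(fitness, n_loci, seed=run, **kwargs)
        counts.update(j for j, bit in enumerate(best) if bit == 1)
    return [j for j, _ in counts.most_common(top)]


def toy_fitness(bits, informative=(1, 4, 7)):
    """Hypothetical stand-in for prediction accuracy: rewards loci 1, 4, 7."""
    hits = sum(bits[j] for j in informative)
    return hits - 0.5 * (sum(bits) - hits)


if __name__ == "__main__":
    print(most_frequent_loci(toy_fitness, n_loci=10, n_runs=5, top=3))
```

Keeping the fitness as a parameter separates the search from the model, so the same loop works whether the score comes from a neural network or from a cheaper surrogate used for testing.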

2 Experiments and analysis of results

2.1 Data source

2.2 Preliminary screening steps

2.2.1 Pearson chi-square test

 ${\chi ^2} = \sum\limits_{i, j} {\frac{{{{\left( {{t_{ij}}-{f_i}{q_j}/1000} \right)}^2}}}{{{f_i}{q_j}/1000}}}$ (4)
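As a worked illustration of Eq. (4), the sketch below computes the statistic for a contingency table of counts. It assumes the standard reading of the symbols: t_ij are the observed counts, f_i and q_j are the row and column totals, and the constant 1000 is the total number of samples. The function name and the example table are hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic of a contingency table of counts.

    table[i][j] plays the role of t_ij; the row and column totals play the
    roles of f_i and q_j, and the grand total replaces the constant 1000.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, t in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # f_i * q_j / n
            chi2 += (t - expected) ** 2 / expected
    return chi2


# Hypothetical genotype-by-status counts for one locus (1000 samples in total):
# rows are the three genotypes, columns are cases and controls.
example = [[150, 100],
           [250, 250],
           [100, 150]]
print(pearson_chi_square(example))  # 20.0
```

In the example the expected counts are 125, 250 and 125 per column, so four cells each contribute 25²/125 = 5 and the two heterozygote cells contribute 0.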

2.2.2 Logistic regression

 ${x_{2i-1}} = \left\{ \begin{array}{l} 1\;\;\;AA\\ 0\;\;\;{\rm{other}} \end{array} \right.$
 ${x_{2i}} = \left\{ \begin{array}{l} 1\;\;\;\;GG\\ 0\;\;\;\;{\rm{other}} \end{array} \right.$

 $p = \frac{{\exp \left( {{\beta _0} + {\beta _1}{x_1} + {\beta _2}{x_2} + \cdots + {\beta _{2m}}{x_{2m}}} \right)}}{{1 + \exp \left( {{\beta _0} + {\beta _1}{x_1} + {\beta _2}{x_2} + \cdots + {\beta _{2m}}{x_{2m}}} \right)}}$ (5)
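The indicator coding and Eq. (5) can be illustrated with the short sketch below. It follows the two indicator definitions literally, so a genotype that is neither AA nor GG maps to (0, 0). The function names are illustrative, and interpreting p as the probability of being a case is an assumption.

```python
import math


def encode_genotypes(genotypes):
    """Dummy-code m loci into 2m indicators: x_{2i-1} flags AA, x_{2i} flags GG."""
    x = []
    for g in genotypes:
        x.append(1 if g == "AA" else 0)
        x.append(1 if g == "GG" else 0)
    return x


def logistic_probability(x, beta):
    """Eq. (5): beta[0] is the intercept beta_0, beta[1:] pair up with x."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # 1 / (1 + exp(-z)) is algebraically equal to exp(z) / (1 + exp(z))
    return 1.0 / (1.0 + math.exp(-z))


x = encode_genotypes(["AA", "AG", "GG"])  # [1, 0, 0, 0, 0, 1]
```

Each locus contributes two coefficients, which is why Eq. (5) runs up to β_{2m} for m loci.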

 $T = \frac{{{{\hat \beta }_i}-{\beta _i}}}{{{\sigma _{{{\hat \beta }_i}}}}} \sim N\left( {0, 1} \right)$ (6)

 $p = P\left\{ {\left| T \right| > t} \right\}$
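The p-value defined by Eq. (6) and the formula above can be computed with the standard library alone. The sketch assumes that the hypothesis being tested is β_i = 0 and that t is the absolute value of the observed statistic; the function name and arguments are illustrative.

```python
import math


def wald_p_value(beta_hat, std_err, beta_null=0.0):
    """Two-sided p-value P{|T| > t} for T ~ N(0, 1), as in Eq. (6)."""
    t = abs((beta_hat - beta_null) / std_err)
    # For a standard normal variable, P{|T| > t} = erfc(t / sqrt(2))
    return math.erfc(t / math.sqrt(2.0))
```

A coefficient whose estimate is about 1.96 standard errors away from zero gives a p-value of roughly 0.05.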

2.3 Solving for the optimal loci combination

2.4 Comparison and analysis of results

 Fig. 1 Comparison of the true and predicted values for the prediction samples

3 Conclusions

