Correlation analysis of genetic site and disease information based on neural networks and particle swarm optimization
Journal of Beijing University of Chemical Technology (Natural Science), 2018, Vol. 45, Issue 1: 97-102. DOI: 10.13543/j.bhxbzr.2018.01.016

### Cite this article

LI Jie, LI ZhiQiang, LIU Xiao, YAN BaiLu. Correlation analysis of genetic site and disease information based on neural networks and particle swarm optimization[J]. Journal of Beijing University of Chemical Technology (Natural Science), 2018, 45(1): 97-102. DOI: 10.13543/j.bhxbzr.2018.01.016.

LI Jie 1, LI ZhiQiang 2, LIU Xiao 1, YAN BaiLu 2
1. School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China;
2. Faculty of Science, Beijing University of Chemical Technology, Beijing 100029, China
Abstract: When a genetic disease is associated with multiple genetic loci, the interactions between those loci must be taken into account. This paper studies how to screen out the locus combination with the strongest predictive power. The prediction accuracy of a neural network is used as the evaluation criterion, and the particle swarm optimization algorithm searches for the optimal locus combination by iterative approximation. Compared with the weight analysis method, this method achieves higher accuracy and identifies the disease well, so it can provide a reference for disease diagnosis.
Key words: genetic locus; interaction; particle swarm optimization (PSO); neural network

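1 Selection of locus combinations based on the particle swarm algorithm

1.1 Iterative computation based on the particle swarm algorithm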

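 $v_{ij}^{k+1} = \omega v_{ij}^{k} + c_1 r_1 \left(P_{ij}^{k} - s_{ij}^{k}\right) + c_2 r_2 \left(P_{gj}^{k} - s_{ij}^{k}\right)$ (1)

where $v_{ij}^{k}$ and $s_{ij}^{k}$ are the velocity and position of component j of particle i at iteration k, $P_{ij}^{k}$ is the personal best of particle i, $P_{gj}^{k}$ is the global best, $\omega$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $r_1$, $r_2$ are random numbers uniformly distributed in [0, 1].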
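A minimal Python sketch of Eq. (1) for a single particle is given below. The function name, the default values ω = 0.7 and c₁ = c₂ = 2.0, and the choice to draw fresh r₁, r₂ for every component j are illustrative assumptions, not settings reported in the paper.

```python
import random


def update_velocity(v, s, p_best, g_best, omega=0.7, c1=2.0, c2=2.0, rng=random):
    """Return v^{k+1} for one particle according to Eq. (1).

    v, s:     current velocity v_i^k and binary position s_i^k (equal-length lists)
    p_best:   the particle's personal best P_i^k
    g_best:   the swarm's global best P_g^k
    omega, c1, c2: inertia weight and acceleration coefficients (assumed defaults)
    """
    new_v = []
    for j in range(len(v)):
        r1, r2 = rng.random(), rng.random()  # uniform draws in [0, 1)
        new_v.append(omega * v[j]
                     + c1 * r1 * (p_best[j] - s[j])   # pull toward personal best
                     + c2 * r2 * (g_best[j] - s[j]))  # pull toward global best
    return new_v
```

When a particle already sits on both its personal and global best, only the inertia term survives: `update_velocity([1.0, -2.0], [1, 0], [1, 0], [1, 0], omega=0.5)` returns `[0.5, -1.0]`.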

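 $s_{ij}^{k+1} = \begin{cases} 1, & R < f\left(v_{ij}^{k+1}\right) \\ 0, & \text{otherwise} \end{cases}$ (2)

where R is a random number uniformly distributed in [0, 1] and f maps the velocity to a probability in [0, 1] (in the standard binary PSO this is the sigmoid $f(v) = 1/(1 + e^{-v})$).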
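Eq. (2) can be sketched as follows, assuming f is the logistic sigmoid of the standard binary PSO (this excerpt does not state the paper's choice of f). The sigmoid is written in a numerically stable form so that a large |v| does not overflow `math.exp`; the function names are illustrative.

```python
import math
import random


def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x}), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def update_position(v, rng=random):
    """Eq. (2): component j becomes 1 when a uniform draw R is below f(v_j)."""
    return [1 if rng.random() < sigmoid(vj) else 0 for vj in v]
```

A strongly positive velocity makes a bit almost certainly 1 and a strongly negative one almost certainly 0, so the velocity acts as a bias toward selecting or dropping a locus.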

1.2 Fitness computation based on the neural network

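For the p-th sample, the input vector is

 $\boldsymbol{X}_p = \left(x_1^p, x_2^p, \cdots, x_l^p\right)^{\rm T}$

and the value of the j-th hidden-layer node is

 $h_j^p = f\left(a_{0j} + \sum_{i=1}^{l} \omega_{ij} x_i^p\right), \quad j = 1, 2, \cdots, L$

where f is the activation function, $a_{0j}$ is the bias of hidden node j, and L is the number of hidden nodes.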

The value of the k-th output-layer node of the BP neural network for the p-th sample is

 $y_k^p = \sum_{j=1}^{L} \omega_{jk} h_j^p, \quad k = 1, 2, \cdots, N$ (3)
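The forward pass for one sample (hidden nodes $h_j^p$, then outputs $y_k^p$ by Eq. (3)) can be sketched in plain Python. The default sigmoid activation is an assumption, since the excerpt does not specify f; the output layer is linear with no bias, as in Eq. (3). The function name is illustrative, and training the weights by back-propagation is not shown.

```python
import math


def forward(x, w_in, a0, w_out, f=None):
    """Compute hidden values h_j^p and outputs y_k^p for one sample x = X_p.

    w_in[i][j]:  weight omega_ij from input i to hidden node j
    a0[j]:       bias a_0j of hidden node j (L = len(a0) hidden nodes)
    w_out[j][k]: weight omega_jk from hidden node j to output node k
    f:           hidden-layer activation; defaults to the sigmoid (assumption)
    """
    if f is None:
        def f(z):
            return 1.0 / (1.0 + math.exp(-z))
    L, N = len(a0), len(w_out[0])
    h = [f(a0[j] + sum(w_in[i][j] * x[i] for i in range(len(x)))) for j in range(L)]
    y = [sum(w_out[j][k] * h[j] for j in range(L)) for k in range(N)]  # Eq. (3)
    return h, y
```

With an identity activation, `forward([1.0, 2.0], [[1.0], [1.0]], [0.5], [[2.0, -1.0]], f=lambda z: z)` gives `h = [3.5]` and `y = [7.0, -3.5]`.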

1.3 Algorithm construction

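(1) Generate every component of each of the n particles from the two-point distribution B(1, 0.5) and build a BP neural network. For each particle, use the loci whose positions equal 1 as the network inputs, and evaluate the particle with the prediction accuracy on the prediction samples as the fitness function. Record each particle i as its own personal best, with its fitness as the current personal-best value; the global best is the particle with the largest fitness.

(2) Update the velocities and positions of the n particles by Eqs. (1) and (2) of the PSO algorithm to produce a new set of particles, and build a BP neural network. For each particle, use the loci whose positions equal 1 as the inputs and evaluate the particle with the prediction accuracy on the prediction samples as the fitness function. Compare the fitness of the current i-th particle with that of its personal best, and keep the one with the larger fitness as the personal best of particle i; compare the fitness of all particles with that of the global best, and keep the particle with the largest fitness as the global best.

(3) Check whether the stopping criterion is met. If it is, output the global best and stop; otherwise, return to step (2).

(4) Repeat steps (1) to (3) to obtain multiple locus combinations. The loci that appear most often across these combinations are taken as the locus combination with good predictive performance.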
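Steps (1) to (4) can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the names `binary_pso` and `frequent_loci`, the swarm size, iteration count and coefficients, the zero initial velocities, the fixed iteration budget used as the stopping criterion, and the sigmoid in Eq. (2). The `fitness` argument stands in for the paper's procedure of training a BP network on the selected loci and scoring its accuracy on the prediction samples; any callable that maps a 0/1 list to a number can be plugged in.

```python
import math
import random
from collections import Counter


def _sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_pso(fitness, dim, n_particles=20, n_iter=50,
               omega=0.7, c1=2.0, c2=2.0, rng=None):
    """Steps (1)-(3): return (global-best bit vector, its fitness).

    fitness(bits) stands in for "train a BP network on the loci whose bit is 1
    and return its accuracy on the prediction samples".
    """
    if rng is None:
        rng = random.Random()
    # Step (1): each component drawn from B(1, 0.5); zero initial velocity (assumed).
    pos = [[int(rng.random() < 0.5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]

    for _it in range(n_iter):  # Step (3): fixed iteration budget (assumed criterion)
        # Step (2): move every particle with Eqs. (1) and (2), then re-evaluate it.
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (omega * vel[i][j]
                             + c1 * r1 * (p_best[i][j] - pos[i][j])
                             + c2 * r2 * (g_best[j] - pos[i][j]))
                pos[i][j] = int(rng.random() < _sigmoid(vel[i][j]))
            f_i = fitness(pos[i])
            if f_i > p_fit[i]:  # keep the better of current position and personal best
                p_best[i], p_fit[i] = pos[i][:], f_i
        g = max(range(n_particles), key=p_fit.__getitem__)
        if p_fit[g] > g_fit:  # global best = best personal best found so far
            g_best, g_fit = p_best[g][:], p_fit[g]
    return g_best, g_fit


def frequent_loci(fitness, dim, n_runs=10, top=5, rng=None, **pso_kwargs):
    """Step (4): repeat the search and return the loci selected most often."""
    if rng is None:
        rng = random.Random()
    counts = Counter()
    for _run in range(n_runs):
        best, _fit = binary_pso(fitness, dim, rng=rng, **pso_kwargs)
        counts.update(j for j, bit in enumerate(best) if bit)
    return [j for j, _count in counts.most_common(top)]
```

For example, `frequent_loci(my_fitness, dim=100, n_runs=10, top=5)` would run ten independent searches over 100 candidate loci and return the indices of the five loci chosen most often, where `my_fitness` is a hypothetical wrapper around network training and evaluation. Because each fitness call trains a network, the swarm size and iteration budget dominate the running time.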

2 Experiments and result analysis

2.1 Data source

2.2 Preliminary screening procedure

2.2.1 Pearson's chi-square test

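 $\chi^2 = \sum_{i,j} \frac{\left(t_{ij} - f_i q_j/1000\right)^2}{f_i q_j/1000}$ (4)

where $t_{ij}$ is the observed count in cell (i, j) of the contingency table, $f_i$ and $q_j$ are the corresponding row and column totals, and 1000 is the total sample size, so that $f_i q_j/1000$ is the expected count under independence.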
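Eq. (4) is the usual Pearson statistic with expected counts $f_i q_j / n$. The sketch below computes it for any contingency table of counts, taking n as the grand total rather than hard-coding 1000. Which dimension holds genotypes and which holds disease status is an assumption about the table layout; the statistic is symmetric in the two. The function name is illustrative.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic of Eq. (4) for a table of counts t_ij."""
    f = [sum(row) for row in table]         # row totals f_i
    q = [sum(col) for col in zip(*table)]   # column totals q_j
    n = sum(f)                              # grand total (1000 in Eq. (4))
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, t in enumerate(row):
            e = f[i] * q[j] / n             # expected count under independence
            chi2 += (t - e) ** 2 / e
    return chi2
```

A table whose rows are proportional, such as `[[10, 20], [20, 40]]`, gives 0, while `[[10, 20], [20, 10]]` gives 20/3 ≈ 6.67. A p-value then follows from the chi-square distribution with (rows − 1)(columns − 1) degrees of freedom.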

2.2.2 Logistic regression

For the i-th locus (i = 1, 2, ⋯, m), the genotype is encoded by two indicator variables:

 $x_{2i-1} = \begin{cases} 1, & \text{AA} \\ 0, & \text{otherwise} \end{cases}, \qquad x_{2i} = \begin{cases} 1, & \text{GG} \\ 0, & \text{otherwise} \end{cases}$
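The two indicator equations amount to the dummy coding below. It is a sketch under the assumption that each genotype is given as a two-letter string and that, as in the equations, the two homozygotes are labelled AA and GG; for real SNPs the allele letters differ from site to site. Any other genotype (the heterozygote) sets both indicators to 0 and so serves as the reference category. The function name is illustrative.

```python
def encode_genotypes(genotypes):
    """Map m genotypes to the 2m indicators x_1, ..., x_2m.

    Locus i (1-based) contributes x_{2i-1} = 1 if its genotype is "AA" and
    x_{2i} = 1 if it is "GG"; every other genotype gives 0 for both.
    """
    x = []
    for g in genotypes:
        x.append(1 if g == "AA" else 0)  # x_{2i-1}
        x.append(1 if g == "GG" else 0)  # x_{2i}
    return x
```

For three loci with genotypes AA, AG and GG, `encode_genotypes(["AA", "AG", "GG"])` returns `[1, 0, 0, 0, 0, 1]`.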

The probability of disease is then modeled as

 $p = \frac{\exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{2m} x_{2m}\right)}{1 + \exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{2m} x_{2m}\right)}$ (5)
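Evaluating Eq. (5) for a given coefficient vector is straightforward; the sketch below switches to the algebraically equivalent form 1/(1 + e^{-η}) when η ≥ 0 so that large linear predictors do not overflow. Fitting the β coefficients (for example by maximum likelihood) is not shown, and the function name is illustrative.

```python
import math


def logistic_prob(beta, x):
    """Eq. (5): p for beta = [beta_0, beta_1, ..., beta_2m] and x = [x_1, ..., x_2m]."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta must have one more entry than x (the intercept)")
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))  # linear predictor
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # exp(eta) / (1 + exp(eta)) without overflow
    return z / (1.0 + z)
```

With all indicators at 0 and a zero intercept, the predicted probability is exactly 0.5.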

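The significance of each coefficient is tested with the statistic

 $T = \frac{\hat\beta_i - \beta_i}{\sigma_{\hat\beta_i}} \sim N(0, 1)$ (6)

and the corresponding p-value is

 $p = P\left\{\left|T\right| > t\right\}$

where t is the absolute value of the statistic computed under the null hypothesis $\beta_i = 0$.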
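The two-sided p-value of Eq. (6) can be computed from the standard library alone, because for a standard normal T, P{|T| > t} = erfc(t/√2). The sketch assumes the coefficient estimate and its standard error are already available and tests against β_i = 0 by default; the function name is illustrative.

```python
import math


def wald_p_value(beta_hat, se, beta0=0.0):
    """Two-sided p-value for T = (beta_hat - beta0) / se, with T ~ N(0, 1) under H0."""
    t = abs(beta_hat - beta0) / se
    # P(|T| > t) for a standard normal T equals erfc(t / sqrt(2))
    return math.erfc(t / math.sqrt(2.0))
```

An estimate 1.96 standard errors away from zero gives p ≈ 0.05, the usual screening threshold.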

2.3 Solving for the optimal locus combination

2.4 Comparison and analysis of results

Fig. 1 Comparison of the true and predicted values for the prediction samples

3 Conclusion
