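Breast cancer is one of the most common malignant tumors in women, and its diagnosis, treatment strategy, and prognosis are usually based on its pathological classification. Recent studies have shown that traditional pathological classification gives little effective guidance for newly developed therapies. Meanwhile, extensive data on the molecular-biological features of breast cancer indicate that it can be divided into luminal epithelial type A (Luminal A), luminal epithelial type B (Luminal B, comprising Luminal B1 and Luminal B2), HER2-overexpressing, and basal-like (triple-negative and normal breast-like) subtypes [1]. Prognosis and the response to adjuvant therapy differ markedly between these subtypes, so identifying the subtype of each breast cancer is particularly important.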
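Recent experiments have shown that breast cancer patients can be assigned to subtypes by using immunohistochemistry to detect the expression of the biomarkers ER, PR, Ki67, and HER2. However, the assay is expensive and difficult to perform, so it remains largely confined to the laboratory and is hard to adopt widely in clinical practice. As computing technology has advanced, it has been increasingly applied in bioinformatics and medicine with notable success; for example, the fast matrix detection technique proposed by Osamu Gotoh et al. detects genes fairly effectively [2]. This paper applies data mining to breast cancer subtype identification: by building three classifiers (random forest, support vector machine, and k-nearest neighbor), the different breast cancer subtypes can be identified quickly. In addition, the random forest can reveal high-risk genes for the different subtypes, which could support targeted treatment of patients with each subtype.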
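Of the three classifiers named above, k-nearest neighbor is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: it assumes each sample is a numeric vector of selected gene-expression values, uses Euclidean distance and k = 3 as defaults (the source states neither the metric nor k), and the function names `knn_predict` and `accuracy` are hypothetical. Ties in the vote go to the tied label whose member lies closest to the query.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Predict the subtype of one expression profile by majority vote of its k nearest training samples."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by Euclidean distance to the query profile (assumed metric).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # On a tied vote, return the tied label whose member is closest to the query.
    for i in nearest:
        if votes[train_y[i]] == top:
            return train_y[i]


def accuracy(train_X, train_y, test_X, test_y, k: int = 3) -> float:
    """Fraction of test profiles whose predicted subtype matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)
```

With toy two-gene profiles such as `train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]` labelled three `"LuminalA"` then three `"Basal"`, the query `[0.5, 0.5]` is assigned `"LuminalA"`. The test-set accuracy reported in the abstract corresponds to what `accuracy` computes on held-out samples.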
An Identification Method for Breast Cancer Subtypes Based on Data Mining Technology
Abstract: The random forest algorithm can rank features by importance and can improve both running efficiency and classification accuracy. In this study, analysis of variance (ANOVA) and the random forest algorithm were used to screen breast cancer genes. With the selected genes, the random forest, SVM (support vector machine), and KNN (k-nearest neighbor) algorithms reached test-set accuracies of 95.6%, 92.9%, and 92.7%, respectively. The two most important genes for distinguishing the breast cancer subtypes, GATA3 and ESR1, were also identified.
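The abstract describes screening genes with analysis of variance before classification. As an illustrative sketch (not the authors' code), the ANOVA step can be expressed as a per-gene one-way F statistic across subtype labels, keeping the genes with the largest F. The data layout (a dict from gene name to per-sample expression values), the function names `anova_f` and `rank_genes`, and the optional top-N cutoff are assumptions; the source does not state its cutoff, and the random-forest importance step is not shown.

```python
from statistics import mean
from typing import Dict, Hashable, List, Optional, Sequence


def anova_f(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """One-way ANOVA F statistic of one gene's expression across subtype groups."""
    groups: Dict[Hashable, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    # Between-group sum of squares: how far each subtype mean sits from the grand mean.
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    # Within-group sum of squares: spread of samples around their own subtype mean.
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-group spread: infinitely separating if the means differ, else uninformative.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expr: Dict[str, Sequence[float]], labels: Sequence[Hashable],
               top: Optional[int] = None) -> List[str]:
    """Order genes by decreasing F statistic; keep only the first `top` if given."""
    ranked = sorted(expr, key=lambda gene: anova_f(expr[gene], labels), reverse=True)
    return ranked if top is None else ranked[:top]
```

A gene whose mean expression differs strongly between subtypes relative to its spread within each subtype gets a large F and is kept; a gene with identical subtype means scores 0.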
Key words:
- data mining
- microarray
- breast cancer
- classification
[1] SUN S S, WANG Y X, LIANG P. Molecular Subtype Classification of Breast Cancer and Its Relationship with Neoadjuvant Therapy[J]. Chinese Journal of Surgical Oncology, 2011(3): 369-371 (in Chinese). URL: http://mall.cnki.net/magazine/Article/ZLWK201106016.htm
[2] BURSTEIN H J, ELIAS A D, RUGO H S, et al. Phase II Study of Sunitinib Malate, an Oral Multitargeted Tyrosine Kinase Inhibitor, in Patients with Metastatic Breast Cancer Previously Treated with an Anthracycline and a Taxane[J]. J Clin Oncol, 2008, 26(11): 1810-1816. doi: 10.1200/JCO.2007.14.5375
[3] GONZALEZ-ROIBON N, FARAJ S F, MUNARI E, et al. Comprehensive Profile of GATA Binding Protein 3 Immunohistochemical Expression in Primary and Metastatic Renal Neoplasms[J]. Hum Pathol, 2014, 45(2): 244-248. doi: 10.1016/j.humpath.2013.08.020
[4] LI Y, ISHIGURO H, KAWAHARA T, et al. Loss of GATA3 in Bladder Cancer Promotes Cell Migration and Invasion[J]. Cancer Biol Ther, 2014, 15(4): 428-435.
[5] MIETTINEN M, MCCUE P A, SARLOMO-RIKALA M, et al. GATA3: a Multispecific but Potentially Useful Marker in Surgical Pathology: a Systematic Analysis of 2500 Epithelial and Nonepithelial Tumors[J]. Am J Surg Pathol, 2014, 38(1): 13-22. doi: 10.1097/PAS.0b013e3182a0218f
[6] ORDÓÑEZ N G. Value of GATA3 Immunostaining in Tumor Diagnosis: a Review[J]. Adv Anat Pathol, 2013, 20(5): 352-360. doi: 10.1097/PAP.0b013e3182a28a68
[7] WANG D Q, LI Y F, LUO Y F, et al. MR Study of Morphological and Functional Changes in the Amygdala and Hippocampus of Patients with Depression[J]. Chinese Journal of Radiology, 2011, 45(7): 623-627 (in Chinese). URL: http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=zhfsx201107003
[8] DOWLATI Y, HERRMANN N, SWARDFAGER W, et al. A Meta-Analysis of Cytokines in Major Depression[J]. Biol Psychiatry, 2010, 67(5): 446-457. doi: 10.1016/j.biopsych.2009.09.033
[9] SEDLACIK J, HELM K, RAUSCHER A, et al. Investigations on the Effect of Caffeine on Cerebral Venous Vessel Contrast by Using Susceptibility-Weighted Imaging (SWI) at 1.5, 3 and 7 T[J]. Neuroimage, 2008, 40(1): 11-18. doi: 10.1016/j.neuroimage.2007.11.046
[10] CLARK B Z, BERIWAL S, DABBS D J, et al. Semiquantitative GATA3 Immunoreactivity in Breast, Bladder, Gynecologic Tract, and Other Cytokeratin 7-Positive Carcinomas[J]. Am J Clin Pathol, 2014, 142(1): 64-71. doi: 10.1309/AJCP8H2VBDSCIOBF
[11] CHENG K, ZHOU X D, YU B, et al. Expression and Clinical Significance of GATA3 in Breast Tumor Tissue[J]. Journal of Clinical and Experimental Pathology, 2015, 31(7): 725-728 (in Chinese). URL: http://med.wanfangdata.com.cn/Paper/Detail/PeriodicalPaper_lcysyblxzz201507003
[12] ZHANG F C, XU Y C, WANG H X, et al. Relationship Between Polymorphisms of the Estrogen Receptor Gene ESR1 and Breast Cancer Susceptibility[J]. Journal of Modern Oncology, 2011, 19(9): 1706-1708 (in Chinese). URL: http://med.wanfangdata.com.cn/Paper/Detail/PeriodicalPaper_bqeykdxxb200406021
[13] DING S L, YU J C, CHEN S T. Diverse Associations Between ESR1 Polymorphism and Breast Cancer Development and Progression[J]. Clin Cancer Res, 2010, 16(13): 3473-3484. doi: 10.1158/1078-0432.CCR-09-3092
[14] FARMER P, BONNEFOI H, BECETTE V, et al. Identification of Molecular Apocrine Breast Tumours by Microarray Analysis[J]. Oncogene, 2005, 24(29): 4660-4671. doi: 10.1038/sj.onc.1208561
[15] HE R Y. Expression of Genes in Different Human Tumor Cells and the Inducing Effect of Radiation on Them[D]. Suzhou: Soochow University, 2012 (in Chinese).