Analysis and Comparison of Various Normalization Methods on Microarray Data of MiRNA

Li-yun HOU; Xu ZHANG; Zhen WU

doi:10.13718/j.cnki.xsxb.2020.05.016


Abstract: Detecting miRNA levels in cells with microarrays has become a widely used technology. Many normalization methods exist for miRNA microarrays, and different methods affect the data differently. This paper studies six normalization methods for microarray data from the Agilent platform: global normalization, locally weighted regression, quantile normalization, trimmed mean normalization, variance stabilizing normalization and scale normalization. MA plots and box plots are drawn to present and compare the distribution of the miRNA microarray data before and after normalization. The six methods are also evaluated with the Kolmogorov-Smirnov statistic and the mean square error. The results show that locally weighted regression and quantile normalization outperform the other methods for miRNA microarray data, with locally weighted regression performing best.

Corresponding author: Xu ZHANG

Received Date: 14/03/2019

Available Online: 20/05/2020




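miRNAs, which are widely present in eukaryotic cells, are non-coding single-stranded RNA molecules about 18-25 nucleotides long that play important roles in regulating gene expression, the cell cycle and organism development^[1-2]. To better study the relationship between miRNA and cancer, miRNA microarray data must be analyzed statistically, and normalization is a necessary step in that analysis. Because miRNA microarray data contain systematic error, the purpose of normalization is to reduce that error. This paper compares six normalization methods on miRNA microarray data related to gastric cancer. MA plots and box plots are drawn to compare the effect of each method on the data distribution, and the K-S test and mean square error are used together to assess each method.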

1. Materials and Methods

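The data were taken from the GEO database of NCBI (National Center for Biotechnology Information: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28700)^[3] and relate to gastric cancer; they comprise 22 normal samples and 22 gastric cancer samples. The gastric cancer samples form the experimental group and the normal samples form the control group. Let $R_i$ ($i = 1, 2, 3, \ldots, 556$) denote the miRNA expression level in the experimental group and $G_i$ the expression level in the control group. The log intensity ratio, $\log_2(R_i/G_i)$, is denoted $M_i$; the average log intensity, $\log_2\sqrt{R_i G_i}$, is denoted $A_i$. All operations were carried out in RStudio.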
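The definitions of $M_i$ and $A_i$ above can be sketched in a few lines. The paper worked in RStudio; the Python below is only an illustrative stand-in, and the function name `ma_values` and the sample intensities are hypothetical (they are not taken from GSE28700).

```python
import math

def ma_values(r, g):
    """Return (M, A) for paired intensities.

    r: experimental-group intensities R_i
    g: control-group intensities G_i
    M_i = log2(R_i / G_i), A_i = log2(sqrt(R_i * G_i)) = 0.5 * log2(R_i * G_i)
    """
    if len(r) != len(g):
        raise ValueError("r and g must have the same length")
    m = [math.log2(ri / gi) for ri, gi in zip(r, g)]
    a = [0.5 * math.log2(ri * gi) for ri, gi in zip(r, g)]
    return m, a

# Hypothetical intensities for three miRNAs (illustration only).
R = [8.0, 2.0, 16.0]
G = [2.0, 2.0, 4.0]
M, A = ma_values(R, G)
print(M)  # [2.0, 0.0, 2.0]
print(A)  # [2.0, 1.0, 3.0]
```

Plotting `M` against `A` gives the MA plot used in the comparison; intensities must be strictly positive for the logarithms to be defined.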

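We compare the effects of global normalization^[4], locally weighted regression^[5], quantile normalization^[6], trimmed mean normalization^[7], variance stabilizing normalization^[8] and scale normalization^[9] on miRNA microarray data. MA plots are used to compare the effect of the six methods on the data distribution. An MA plot shows the size of the systematic error clearly: if the M values on the vertical axis cluster around M = 0, the differences between the data are small^[10]. A box plot describes data with five statistics: the minimum, first quartile, median, third quartile and maximum. From a box plot we can roughly see whether the data are symmetric and how concentrated or dispersed the distribution is.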
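Two of the simpler methods named above, and the five statistics behind a box plot, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: global normalization is assumed here to mean subtracting the median of M, quantile normalization uses the rank-wise mean of the sorted arrays as the common reference distribution, and all function names are hypothetical.

```python
import statistics

def global_normalize(m):
    """Center log ratios on zero by subtracting their median
    (one common form of global normalization; an assumption here)."""
    center = statistics.median(m)
    return [x - center for x in m]

def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    arrays: list of equal-length lists, one list per sample.
    The reference distribution is the mean of the sorted arrays at each rank.
    Ties are broken by position rather than averaged.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    reference = [statistics.fmean(col) for col in zip(*(sorted(a) for a in arrays))]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result

def five_number_summary(x):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return min(x), q1, q2, q3, max(x)

print(global_normalize([1.0, 2.0, 4.0]))                         # [-1.0, 0.0, 2.0]
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))    # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(five_number_summary([1.0, 2.0, 3.0, 4.0, 5.0]))            # (1.0, 2.0, 3.0, 4.0, 5.0)
```

After quantile normalization every sample has identical box-plot statistics, which is why box plots are a quick visual check of this method.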

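The K-S test and mean square error are used to assess the quality of the six normalization methods. The K-S test is a goodness-of-fit test; the smaller the K-S statistic, the better the normalization^[11-12]. The mean square error is the sum of the squared bias and the variance. Smaller variance and bias indicate better normalization, so a smaller mean square error indicates a better normalization method^[13].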
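Both evaluation measures can be sketched briefly. The text does not state which two distributions the K-S statistic compares or which target the bias is measured against, so the sketch below makes assumptions: `ks_statistic` is the generic two-sample form (largest gap between two empirical distribution functions), and `mse_decomposition` measures bias against a target of M = 0. The function names are hypothetical.

```python
import statistics

def ks_statistic(x, y):
    """Two-sample K-S statistic: the maximum absolute difference
    between the empirical distribution functions of x and y."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < nx and j < ny:
        t = min(xs[i], ys[j])
        # Advance both samples past every value <= t, then compare the two ECDFs at t.
        while i < nx and xs[i] <= t:
            i += 1
        while j < ny and ys[j] <= t:
            j += 1
        d = max(d, abs(i / nx - j / ny))
    return d

def mse_decomposition(m, target=0.0):
    """Return (squared bias, variance, mean square error) of the M values,
    with bias taken relative to `target` (0 is an assumption)."""
    bias = statistics.fmean(m) - target
    variance = statistics.pvariance(m)  # population variance
    return bias ** 2, variance, bias ** 2 + variance

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))    # 0.5
print(mse_decomposition([1.0, -1.0, 2.0, 0.0]))    # (0.25, 1.25, 1.5)
```

With a target of zero, the mean square error equals the mean of the squared M values, so a method that pulls M toward 0 with little spread scores low on this measure.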

3. Discussion

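Normalization is a key step in the analysis of miRNA microarray data, and which normalization method suits such data may depend on the characteristics of miRNA expression. To find a normalization method suited to miRNA microarray data, we took the gastric cancer dataset GSE28700 as an example and compared the effects of six normalization methods on the data. MA plots and box plots were used to compare the data distributions before and after normalization, and the K-S test and mean square error were used to assess the quality of the different methods. Comparing the K-S statistics and mean square errors of the six methods shows that, for miRNA microarray data, locally weighted regression normalizes best, followed by quantile normalization.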
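Since locally weighted regression comes out best, a rough sketch of how it is typically applied to MA data may help: fit a smooth curve of M against A and subtract it, so that intensity-dependent bias is removed. The code below is a simplified illustration, not the paper's procedure. It uses local linear fits with tricube weights and omits the robustness iterations of the full method in [5]; the function names and the `span` default are assumptions.

```python
import math

def loess_fit(a, m, span=0.5):
    """Locally weighted linear fit of m on a, evaluated at each a[i].

    For each point, the nearest ceil(span * n) points define the bandwidth,
    and a tricube-weighted straight line is fitted inside that window.
    """
    n = len(a)
    k = min(n, max(2, math.ceil(span * n)))
    fitted = []
    for x0 in a:
        dists = sorted(abs(x - x0) for x in a)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(a, m):
            dx = x - x0  # center on x0 so the intercept is the fitted value
            u = abs(dx) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3  # tricube weight
            sw += w
            swx += w * dx
            swy += w * y
            swxx += w * dx * dx
            swxy += w * dx * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # fall back to a weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw)
    return fitted

def loess_normalize(a, m, span=0.5):
    """Subtract the fitted trend from M so that M is centered on 0 at every A."""
    return [mi - fi for mi, fi in zip(m, loess_fit(a, m, span))]

# Hypothetical data with a purely intensity-dependent bias: M = 2A + 1.
A_vals = [float(x) for x in range(1, 9)]
M_vals = [2.0 * x + 1.0 for x in A_vals]
residual = loess_normalize(A_vals, M_vals)
print(max(abs(v) for v in residual))  # close to 0: the linear trend is removed
```

In practice the span controls how local the fit is: a smaller span follows the data more closely, while a larger one gives a smoother trend.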

References


[1]	PRADERVAND S, WEBER J, THOMAS J, et al. Impact of Normalization on miRNA Microarray Expression Profiling[J]. RNA, 2009, 15(3): 493-501.
[2]	HU J G, HE J L, LI G, et al. Expression of miRNAs in Embryonic Villus Tissue from Unexplained Recurrent Spontaneous Abortion[J]. Journal of Southwest University (Natural Science Edition), 2010, 32(12): 56-62.
[3]	CHEN C, JUAN H. The National Center for Biotechnology Information[DB/OL]. (2018-06-12)[2019-02-20]. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28700.
[4]	SMYTH G K, YANG Y H, SPEED T. Statistical Issues in cDNA Microarray Data Analysis[M]//Functional Genomics. New Jersey: Humana Press: 111-136.
[5]	CLEVELAND W S. Robust Locally Weighted Regression and Smoothing Scatterplots[J]. Journal of the American Statistical Association, 1979, 74(368): 829-836.
[6]	HUBER W, VON HEYDEBRECK A, SULTMANN H, et al. Variance Stabilization Applied to Microarray Data Calibration and to the Quantification of Differential Expression[J]. Bioinformatics, 2002, 18(Suppl 1): S96-S104.
[7]	PEREIRA M B, WALLROTH M, JONSSON V, et al. Comparison of Normalization Methods for the Analysis of Metagenomic Gene Abundance Data[J]. BMC Genomics, 2018, 19: 274.
[8]	HUBER W, VON HEYDEBRECK A, SUELTMANN H, et al. Parameter Estimation for the Calibration and Variance Stabilization of Microarray Data[J]. Statistical Applications in Genetics and Molecular Biology, 2003, 2(1): 1-22.
[9]	SMYTH G K, SPEED T. Normalization of cDNA Microarray Data[J]. Methods, 2003, 31(4): 265-273.
[10]	QUACKENBUSH J. Microarray Data Normalization and Transformation[J]. Nature Genetics, 2002, 32(S4): 496-501.
[11]	WILSON D L, BUCKLEY M J, HELLIWELL C A, et al. New Normalization Methods for cDNA Microarray Data[J]. Bioinformatics, 2003, 19(11): 1325-1332.
[12]	ZHAO Y D, WANG E N, LIU H, et al. Evaluation of Normalization Methods for Two-Channel microRNA Microarrays[J]. Journal of Translational Medicine, 2010, 8(1): 69.
[13]	XIONG H L, ZHANG D P, MARTYNIUK C J, et al. Using Generalized Procrustes Analysis (GPA) for Normalization of cDNA Microarray Data[J]. BMC Bioinformatics, 2008, 9(1): 25.


