The Study of the Adaptive Elastic Net Method in the Variable Selection of the Cox Model

Xin-xing WEI; Chun-hong LI; Hong-shuai DAI

doi:10.13718/j.cnki.xdzk.2017.09.013

In this paper, we study the adaptive elastic net method for variable selection in the Cox model. We prove that, under certain conditions, its estimator has the grouping effect property. We then illustrate this property with a numerical simulation and a real-data example, which indicate that, for the Cox model, the adaptive elastic net method performs better than the Lasso, the adaptive Lasso, and the elastic net.


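The Cox model^[1] is a classical method for analyzing survival data and is widely used in medicine, biology, economics, insurance, and many other fields. Although the Cox model is the most useful survival analysis method to date, it requires the covariates to be mutually independent, or at least not strongly correlated. It also requires data of the large-n, small-p type. The classical Cox model is therefore no longer applicable to problems with strong correlation or with large p and small n.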

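The Lasso offers a new approach to such problems. In 1997 Tibshirani successfully applied it to the Cox model^[2], further confirming its practicality. To address the inconsistency of the Lasso estimator in certain cases, Zou proposed in 2006 the Adaptive Lasso^[3-5], which has the oracle property and overcomes this shortcoming of the Lasso. For data with a grouping structure, Zou and Hastie proposed the Elastic Net^[6-8] on the basis of the Lasso; it keeps the model from becoming overly sparse and handles the large-p, small-n problem effectively. The Elastic Net estimator, however, likewise lacks the oracle property^[9-10], so Zou and Zhang weighted the l₁ penalty of the Elastic Net and proposed the Adaptive Elastic Net^[11], which has the oracle property.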

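Among the many variable selection methods for the Cox model^{[7, 12]}, the Elastic Net fits better and predicts more accurately than the Lasso when variables are strongly correlated, and it selects or drops strongly correlated variables as a group. Its weakness lies in model accuracy: its estimates for zero variables (variables whose true coefficients are zero) are less accurate than those of the Adaptive Elastic Net. It is therefore worthwhile to apply the Adaptive Elastic Net to variable selection in the Cox model and to study its properties under that model.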

1. Definition of the AEN Estimator for the Cox Model

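For the i-th individual, the Cox model is:

$h(t_i \mid X_i) = h_0(t_i)\exp(\beta^T X_i)$

where n is the sample size, p is the number of predictors, X=(X₁, X₂, …, X_n) is the covariate matrix, X_i=(x_i1, x_i2, …, x_ip)^T is the vector of p covariates of the i-th individual, β=(β₁, β₂, …, β_p)^T is the regression vector, and h₀(t_i) is the baseline hazard for the i-th individual, i=1, 2, …, n.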

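Denote the observed data by (Z_i, δ_i, X_i), where Z_i is the study time of the i-th individual. Taking h₀(t) to be constant, the likelihood function^[13] is:

where δ_i is an indicator, with δ_i=0 if the event is censored and δ_i=1 if the event occurs; R_i is the risk set of individuals at time t_i; and k=1, 2, …, p.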
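As a minimal sketch, assume the likelihood above is the standard Cox partial likelihood, $L(\beta) = \prod_{i=1}^{n} \left[ \exp(\beta^T X_i) / \sum_{j \in R_i} \exp(\beta^T X_j) \right]^{\delta_i}$, with risk set R_i = {j : Z_j ≥ Z_i} and no tied event times. Under those assumptions, its negative logarithm can be evaluated as follows. The function name and argument names are illustrative and are not taken from the paper.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, delta):
    """Negative Cox log partial likelihood.

    Assumptions (not from the paper): no tied times, and the risk set of
    individual i is {j : time[j] >= time[i]}.
    """
    eta = X @ beta                       # linear predictors beta^T X_i
    order = np.argsort(-time)            # sort individuals by decreasing time
    eta_s = eta[order]
    delta_s = np.asarray(delta)[order]
    # Running log-sum-exp over the sorted predictors: entry i equals
    # log(sum of exp(eta_j) over the risk set of the i-th sorted individual).
    log_risk = np.logaddexp.accumulate(eta_s)
    # Only uncensored individuals (delta = 1) contribute a term.
    return -np.sum(delta_s * (eta_s - log_risk))
```

With β = 0 and no censoring, every individual contributes the log of the size of its risk set, so the value reduces to log(n!), which gives a quick sanity check.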

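Following the ideas of Tibshirani^[2] and Fan^[10], the Elastic Net estimator for the Cox model can then be defined by minimizing the negative log partial likelihood plus a suitable penalty term:

Furthermore, from

we obtain

where λ₁ and λ₂ are tuning parameters satisfying λ₁≥0 and λ₂≥0.

Further, following the definition of the Adaptive Elastic Net estimator in the ordinary linear model^{[7, 11]}, we weight the l₁ penalty in (1) to define the Adaptive Elastic Net estimator for the Cox model:

where ${\hat \omega _k} = {(\left| {{{\hat \beta }_{(EN)k}}} \right|)^{ - \gamma }}$ and γ is a positive constant.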
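The weighting step can be sketched in code. The sketch assumes the AEN penalty has the form $\lambda_1^* \sum_k \hat\omega_k |\beta_k| + \lambda_2 \sum_k \beta_k^2$, added to the negative log partial likelihood; any extra scaling constant the authors may use is not modeled. The function names are hypothetical. A zero Elastic Net coefficient yields an infinite weight, which forces the corresponding AEN coefficient to zero.

```python
import numpy as np

def adaptive_weights(beta_en, gamma):
    """Weights omega_k = |beta_en_k|^(-gamma); a zero coefficient gets +inf."""
    abs_b = np.abs(np.asarray(beta_en, dtype=float))
    with np.errstate(divide="ignore"):   # 0 ** (-gamma) evaluates to inf
        return abs_b ** (-gamma)

def aen_penalty(beta, beta_en, lam1_star, lam2, gamma):
    """Assumed AEN penalty: weighted l1 term plus ridge (squared l2) term."""
    beta = np.asarray(beta, dtype=float)
    w = adaptive_weights(beta_en, gamma)
    active = beta != 0                   # skip zeros to avoid inf * 0 = nan
    l1_term = np.sum(w[active] * np.abs(beta[active]))
    return lam1_star * l1_term + lam2 * np.sum(beta ** 2)
```

For example, with Elastic Net estimates (2, 0.5, 0) and γ = 1 the weights are (0.5, 2, ∞), so a candidate β whose third entry is nonzero receives an infinite penalty.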

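2. Properties of the AEN Estimator for the Cox Model

We now study the grouping effect property of the Adaptive Elastic Net estimator for the Cox model.

Theorem 1 For the Cox model, let the data (Z_i, δ_i, X_i) and the parameters (λ₁^*, λ₂) be given, with the response centered and the covariates standardized. Let x_a=(x_1a, x_2a, …, x_na) denote the a-th covariate of the n individuals and x_b=(x_1b, x_2b, …, x_nb) the b-th covariate of the n individuals, a, b=1, 2, …, p. Let $\hat \beta (\lambda_1^*,\lambda_2)$ denote the AEN estimator, where $\hat \beta_a (\lambda_1^*,\lambda_2)$ and $\hat \beta_b (\lambda_1^*,\lambda_2)$ are the coefficients of an arbitrary pair of strongly correlated variables x_a and x_b. Suppose that $\hat \beta_a (\lambda_1^*,\lambda_2)\hat \beta_b (\lambda_1^*,\lambda_2) > 0$ .

Define

Then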

Proof. Since $\hat \beta_a (\lambda_1^*,\lambda_2)\hat \beta_b (\lambda_1^*,\lambda_2) > 0$ , the sign functions satisfy $\mathop{\rm sgn} \{\hat \beta_a (\lambda_1^*,\lambda_2)\} = \mathop{\rm sgn} \{\hat \beta_b (\lambda_1^*,\lambda_2)\}$ , and $\hat \beta_a (\lambda_1^*,\lambda_2) \ne 0$ , $\hat \beta_b (\lambda_1^*,\lambda_2) \ne 0$ .

Now let $\hat \beta_m (\lambda_1^*,\lambda_2) \ne 0$ ; then $\hat \beta (\lambda_1^*,\lambda_2) $ satisfies

where

Then, since $\hat \beta_a (\lambda_1^*,\lambda_2) \ne 0$ , the relation

holds.

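That is,

Hence

Similarly,

Taking the difference of (3) and (4) gives

Since the partial residual^[14-15] of the Cox model is

where r=1, 2, …, p, equation (5) can be rewritten as

and thus

For the strongly correlated variables x_a and x_b, that is, E[x_ax_b^T]→1, we therefore have, for the i-th individual,

and hence

From (6) and (8), we have

so that

From

it follows that

Hence

From (7), (9), and (10), we obtain

that is,

This completes the proof.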

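D_{λ₁^*, λ₂}(a, b) measures the gap between the coefficient estimates of the two variables. The theorem shows that if x_a and x_b are highly correlated, the gap between their coefficient estimates tends to 0. In other words, the AEN estimator for the Cox model has the grouping effect property: strongly correlated variables receive approximately equal coefficient estimates.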

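3. Numerical Simulation

The previous section established the grouping effect property of the Adaptive Elastic Net estimator for the Cox model in theory. We now verify it by numerical simulation.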

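Let x_i~N(0, 1), i=1, 2, …, 10, with x₃=x₂, x₇=x₆, and $x_4 = 2x_1 +\frac{1}{3}x_2 +\frac{1}{3}x_3$ . Then x₃ is strongly correlated with x₂, x₇ is strongly correlated with x₆, and x₄ is collinear with x₁, x₂, and x₃. Consider the Cox model $h(t) = h_0(t)\exp(\sum\limits_{i=1}^{10} \beta_i x_i)$ with t~U[0, 1] and true parameter vector (-1, 3, 3, 0, $\frac{1}{2}$ , 2, 2, 0, 0, 0)^T. The model is simulated 1 000 times, giving sample data with n=1 000 and p=10.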
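The design above can be sketched in code. The covariate structure and the true coefficients follow the text. The way survival times are drawn is an assumption: the sketch uses a constant baseline hazard h0 and inverse-transform sampling from a uniform variate, with no censoring, which is one possible reading of "t~U[0, 1]" and is not confirmed by the paper. The function name and the seed are illustrative.

```python
import numpy as np

def simulate_cox_data(n=1000, h0=1.0, seed=0):
    """Simulate covariates and survival times for the design in the text."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 10))
    X[:, 2] = X[:, 1]                                   # x3 = x2
    X[:, 6] = X[:, 5]                                   # x7 = x6
    X[:, 3] = 2 * X[:, 0] + X[:, 1] / 3 + X[:, 2] / 3   # x4 collinear with x1, x2, x3
    beta = np.array([-1, 3, 3, 0, 0.5, 2, 2, 0, 0, 0], dtype=float)
    # Assumption: constant baseline hazard, so S(t | x) = exp(-h0 * t * exp(x'beta)).
    # Solving S(t | x) = u for t gives the inverse-transform draw below.
    u = 1.0 - rng.uniform(size=n)                       # values in (0, 1], avoids log(0)
    time = -np.log(u) / (h0 * np.exp(X @ beta))
    delta = np.ones(n, dtype=int)                       # assumption: no censoring
    return X, time, delta, beta
```

Calling `X, time, delta, beta = simulate_cox_data()` returns one data set with n=1 000 and p=10 in which columns 2 and 3 are identical, as are columns 6 and 7.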

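The Lasso, Adaptive Lasso (ALasso), Elastic Net (EN), and Adaptive Elastic Net (AEN) methods are each applied to these data for variable selection^[16-18]. For the latter three methods, the coefficient estimates can be obtained by first recasting the problem in Lasso form and then applying the LARS algorithm^[19]. We take $\lambda_2 = \frac{1}{3}$ and γ=3; the other parameters are chosen by cross-validation^[20]. The computation is repeated 50 times and the coefficient estimates are averaged. The resulting estimates are given in Table 1.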
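The recasting of an Elastic Net problem in Lasso form can be illustrated for the least-squares case, where it is the data augmentation of Zou and Hastie^[6]. This is a sketch for the linear model only; how the paper carries the transformation over to the Cox partial likelihood is not shown in this copy, and the function names are hypothetical.

```python
import numpy as np

def en_to_lasso(X, y, lam1, lam2):
    """Augment (X, y) so that a Lasso on the result solves the naive Elastic Net."""
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1.0 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    lam_aug = lam1 / np.sqrt(1.0 + lam2)
    return X_aug, y_aug, lam_aug

def en_objective(beta, X, y, lam1, lam2):
    """Naive Elastic Net criterion: RSS + lam2 * ||beta||_2^2 + lam1 * ||beta||_1."""
    r = y - X @ beta
    return r @ r + lam2 * (beta @ beta) + lam1 * np.abs(beta).sum()

def lasso_objective(b, X, y, lam):
    """Lasso criterion: RSS + lam * ||b||_1."""
    r = y - X @ b
    return r @ r + lam * np.abs(b).sum()
```

For any β, the Elastic Net criterion on (X, y) equals the Lasso criterion on the augmented data evaluated at $\sqrt{1+\lambda_2}\,\beta$, so a Lasso solver such as LARS applied to the augmented problem yields the Elastic Net solution after rescaling by $1/\sqrt{1+\lambda_2}$.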

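Table 1 shows the following:

1) None of the four methods selects x₄, which is collinear with x₁, x₂, and x₃. This indicates that the Lasso, Adaptive Lasso, Elastic Net, and Adaptive Elastic Net methods can all handle collinearity.

2) Lasso versus ALasso: for the three zero variables x₈, x₉, and x₁₀, ALasso is more accurate than Lasso. This shows that the Adaptive Lasso handles zero variables better than the Lasso.

3) EN versus AEN: for the three zero variables x₈, x₉, and x₁₀, AEN is more accurate than EN. This shows that the Adaptive Elastic Net handles zero variables better than the Elastic Net.

4) ALasso versus AEN: for the two groups of strongly correlated variables, x₂ with x₃ and x₆ with x₇, AEN selects every variable in both groups and gives equal coefficient estimates within each group, whereas ALasso selects only one variable from each group. This shows that AEN has the grouping effect property.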

4. Real Data Analysis

Next, we use real data on telecom customers to verify the advantages of the Adaptive Elastic Net estimator for the Cox model.

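The example comes from a survey of mobile phone SIM card usage among students enrolled at a university. The ten variables x₁, x₂, …, x₁₀ denote gender, year of study, whether the student is a student cadre, whether the student belongs to an ethnic minority, whether the student holds an agricultural household registration, whether the student studies in his or her home region, whether the student is a China Mobile subscriber, average monthly phone bill, after-sales service quality, and average monthly living expenses. The survey ran from January 2007 to January 2014 and yielded 380 valid questionnaires.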

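A simple statistical analysis of the data shows that most of the variables are highly correlated, so the classical Cox model is no longer applicable. We therefore apply the Lasso, Adaptive Lasso (ALasso), Elastic Net (EN), and Adaptive Elastic Net (AEN) methods to the Cox model in turn. The variable selection results are given in Table 2.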

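Table 2 shows the following:

1) None of the four methods selects x₁ or x₇, which indicates that applying the Adaptive Elastic Net to the Cox model is feasible.

2) For the fairly strongly correlated pair x₈ and x₁₀, Lasso and ALasso select only x₁₀, whereas EN and AEN select both variables. This shows that the Elastic Net and Adaptive Elastic Net methods for the Cox model can select every variable in a strongly correlated group. Moreover, among all the coefficient estimates, the gap between the coefficients of these two variables is the smallest. The Elastic Net and Adaptive Elastic Net methods thus reflect the correlation between variables: the larger the correlation coefficient, the smaller the gap between the coefficient estimates. This is the grouping effect property of the two methods.

3) For x₃ and x₄, which have no effect on SIM card churn, AEN handles these two zero variables far more accurately than EN. This shows that the Adaptive Elastic Net estimator for the Cox model handles zero variables better than the Elastic Net.

In summary, the Adaptive Elastic Net estimator for the Cox model outperforms the other three estimators.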

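5. Conclusion

For survival data with strong correlation, the Elastic Net method for the Cox model outperforms the Lasso method for the Cox model. In terms of model accuracy, however, its estimates for zero variables are not satisfactory.

To overcome this drawback, this paper applies the Adaptive Elastic Net method to variable selection in the Cox model and proves that, under certain conditions, the Adaptive Elastic Net estimator for the Cox model has the grouping effect property, that is, the method can select all strongly correlated variables into the Cox model. In addition, the numerical simulation and the real-data example both confirm the grouping effect property and show that the Adaptive Elastic Net estimator for the Cox model handles zero variables more accurately. This indicates that the Adaptive Elastic Net method for the Cox model outperforms the other three methods.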

References


[1]	COX D R. Regression Models and Life Tables [J]. Journal of the Royal Statistical Society, Series B, 1972, 34(2): 187-220.
[2]	TIBSHIRANI R. The Lasso Method for Variable Selection in the Cox Model [J]. Statistics in Medicine, 1997, 16(4): 385-395. doi: 10.1002/(ISSN)1097-0258
[3]	ZHANG H H, LU W. Adaptive Lasso for Cox's Proportional Hazards Model [J]. Biometrika, 2007, 94(3): 691-703. doi: 10.1093/biomet/asm037
[4]	ZOU H. The Adaptive Lasso and Its Oracle Properties [J]. Journal of the American Statistical Association, 2006, 101(476): 1418-1429. doi: 10.1198/016214506000000735
[5]	HUANG J, MA S G, ZHANG C H. Adaptive Lasso for Sparse High-Dimensional Regression Models [J]. Statistica Sinica, 2008, 18(4): 1603-1618.
[6]	ZOU H, HASTIE T. Regularization and Variable Selection via the Elastic Net [J]. Journal of the Royal Statistical Society, Series B, 2005, 67(2): 301-320.
[7]	LU Ying. Variable Selection Methods Based on the Elastic Net for Generalized Linear Models [D]. Beijing: Beijing Jiaotong University, 2011 (in Chinese). http://cdmd.cnki.com.cn/Article/CDMD-10004-1011198769.htm
[8]	YAN Lina. Application of the Penalized Cox Model and the Elastic Net in Survival Analysis of High-Dimensional Data [D]. Taiyuan: Shanxi Medical University, 2011 (in Chinese). http://cdmd.cnki.com.cn/Article/CDMD-10114-1011092484.htm
[9]	FAN J, LI R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties [J]. Journal of the American Statistical Association, 2001, 96(456): 1348-1360. doi: 10.1198/016214501753382273
[10]	FAN J, LI R. Variable Selection for Cox's Proportional Hazards Model and Frailty Model [J]. Annals of Statistics, 2002, 30(1): 74-99. doi: 10.1214/aos/1015362185
[11]	ZOU H, ZHANG H H. On the Adaptive Elastic Net with a Diverging Number of Parameters [J]. Annals of Statistics, 2009, 37(4): 1733-1751. doi: 10.1214/08-AOS625
[12]	BI Bozhu. Variable Selection for High-Dimensional Multicollinear Data [D]. Jinan: Shandong University, 2011 (in Chinese). http://cdmd.cnki.com.cn/Article/CDMD-10422-1011225830.htm
[13]	WANG Qihua. Statistical Analysis of Survival Data [M]. Beijing: Science Press, 2006: 232-237 (in Chinese).
[14]	GAO Yanhui, HE Dawei. Residual Analysis and Influence Diagnostics for the Cox Model [J]. Modern Preventive Medicine, 2000, 27(1): 48-50 (in Chinese).
[15]	SCHOENFELD D. Partial Residuals for the Proportional Hazards Regression Model [J]. Biometrika, 1982, 69(1): 239-241. doi: 10.1093/biomet/69.1.239
[16]	WU Xizhi. Statistical Methods for Complex Data: Applications Based on R [M]. Beijing: China Renmin University Press, 2012 (in Chinese).
[17]	WANG Binhui. Multivariate Statistical Analysis and Modeling with R [M]. Guangzhou: Jinan University Press, 2010 (in Chinese).
[18]	DONG Ying, HUANG Pinxian. Implementation of the Cox Model and Prediction Nomograms in R [J]. Journal of Mathematical Medicine, 2012, 25(6): 711-713 (in Chinese).
[19]	EFRON B, HASTIE T, JOHNSTONE I, et al. Least Angle Regression [J]. Annals of Statistics, 2004, 32(2): 407-451.
[20]	VERWEIJ P J. Cross-Validation in Survival Analysis [J]. Statistics in Medicine, 1993, 12(24): 2305-2314. doi: 10.1002/(ISSN)1097-0258
