
Journal of University of Chinese Academy of Sciences


A deep partially linear Cox model with MCP penalty and its application in ovarian cancer prognosis*

WU Weiyan, ZHANG Sanguo

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2026-02-02 Revised: 2026-05-08 Published: 2026-05-09
  • Corresponding author: E-mail: sgzhang@ucas.ac.cn
  • Funding:
    *Supported by the National Natural Science Foundation of China (Grant No. 12571298), the Fundamental Research Funds for the Central Universities, and the Ministry of Education Discipline Pioneering Breakthrough Project (JYB2025XDXM612)

Abstract: High-dimensional survival data often combine linear effects, nonlinear effects, and complex interactions. The traditional linear Cox proportional hazards model cannot capture these nonlinear relationships, while purely neural network-based methods are prone to overfitting and lack interpretability in high-dimensional settings. This paper proposes a deep partially linear Cox model with Minimax Concave Penalty (MCP) regularization, termed DMCOX. While preserving the interpretability of the classic Cox model, the method embeds a deep neural network in the partially linear Cox model (PLCM) framework. It uses the universal approximation capability of neural networks to capture the nonlinear effects of low-dimensional covariates, and introduces MCP regularization to achieve unbiased estimation and precise feature selection for high-dimensional linear covariates. A hybrid optimization objective combining MCP regularization with neural network approximation is constructed, and an alternating optimization algorithm based on coordinate descent and gradient-based updates is designed to solve it. Extensive simulation experiments show that, across different censoring rates and levels of nonlinear complexity, DMCOX achieves higher predictive accuracy (C-index) and better variable selection performance (Recall, F1-score) than the traditional Cox model, models that simply incorporate deep learning, and methods based on Lasso, SCAD, or L0 penalties, effectively overcoming both overfitting and underfitting. Applied to real high-grade serous ovarian cancer (HGSOC) data and combined with SP-LIME for feature screening and interpretability analysis, the model identified key prognostic gene features such as TAP1, CXCL9, and COL11A1 and achieved better predictive performance than existing methods, supporting its effectiveness and clinical potential in precision medicine and biomarker discovery.
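
The two ingredients named in the abstract have widely used textbook forms, which may help in reading it. The block below is a sketch under assumptions, not the paper's exact objective: the symbols (x for the high-dimensional linear covariates, z for the low-dimensional covariates, g for the network-approximated function, δ for the event indicator, T for the observed time, and the tuning parameters λ and γ) are chosen here for illustration, and the abstract does not state the precise loss or scaling the authors use.

```latex
% MCP, assuming tuning parameters \lambda \ge 0 and \gamma > 1
\[
p_{\lambda,\gamma}(t) =
\begin{cases}
\lambda |t| - \dfrac{t^{2}}{2\gamma}, & |t| \le \gamma\lambda,\\[6pt]
\dfrac{\gamma\lambda^{2}}{2}, & |t| > \gamma\lambda.
\end{cases}
\]

% Partially linear Cox hazard: linear part in x, nonparametric part g(z)
\[
h(t \mid x, z) = h_{0}(t)\,\exp\{x^{\top}\beta + g(z)\}
\]

% One plausible penalized objective: negative log partial likelihood plus MCP on \beta
\[
\ell(\beta, g) = -\frac{1}{n}\sum_{i=1}^{n}\delta_{i}
\Big[x_{i}^{\top}\beta + g(z_{i})
 - \log\!\!\sum_{j:\,T_{j}\ge T_{i}}\!\!\exp\{x_{j}^{\top}\beta + g(z_{j})\}\Big]
 + \sum_{k=1}^{p} p_{\lambda,\gamma}(|\beta_{k}|)
\]
```

In this form the penalty grows like the Lasso near zero and flattens to a constant beyond γλ, which is why large coefficients are not shrunk further.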

Key words: high-dimensional survival analysis, deep neural networks, partially linear Cox model, MCP regularization
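
As a rough illustration of the building blocks the abstract mentions, the following Python sketch implements the MCP value, the closed-form MCP thresholding step commonly used inside coordinate descent for a standardized coordinate, and Harrell's C-index. It is not the authors' implementation: the function names `mcp_penalty`, `mcp_threshold`, and `concordance_index` are hypothetical, the thresholding rule assumes a unit-curvature one-dimensional subproblem with γ > 1, and the neural-network component of DMCOX is omitted entirely.

```python
import numpy as np


def mcp_penalty(beta, lam, gamma):
    """Elementwise MCP value; assumes lam >= 0 and gamma > 1."""
    b = np.abs(np.asarray(beta, dtype=float))
    inner = lam * b - b ** 2 / (2.0 * gamma)  # concave region |b| <= gamma*lam
    outer = 0.5 * gamma * lam ** 2            # flat region |b| > gamma*lam
    return np.where(b <= gamma * lam, inner, outer)


def mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5*(z - b)**2 + MCP(b); assumes gamma > 1."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # Lasso soft-threshold
    # Rescale inside the concave region, leave large inputs unshrunk.
    return np.where(np.abs(z) <= gamma * lam, soft / (1.0 - 1.0 / gamma), z)


def concordance_index(time, event, risk):
    """Harrell's C-index: a higher risk score should pair with shorter survival."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    # Small inputs are zeroed, mid-range inputs are rescaled, large ones pass through.
    print(mcp_threshold(np.array([0.5, 2.0, 5.0]), lam=1.0, gamma=3.0))
    print(concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0]))
```

The thresholding function shows why MCP avoids the bias of the Lasso for large coefficients: once the input exceeds γλ in absolute value, it is returned unchanged.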
