
Journal of University of Chinese Academy of Sciences


MSM-MIL: A Multi-Stage Masked Multi-Instance Learning Framework for Pathology Image Classification*

WANG Wei1, ZHANG Qiuli2, JIANG Haiyong1, LU Zhengda1, BAO Yingqiu2, FU Yu2, XIAO Jun1,†

  1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China;
  2. Beijing Hospital (National Center for Gerontology), Department of Dermatology; National Clinical Research Center for Geriatric Diseases; Key Laboratory of Geriatric Medicine, National Health Commission; Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing 100730, China
  • Received: 2026-02-04 Revised: 2026-04-14 Published: 2026-04-21
  • Corresponding author: E-mail: xiaojun@ucas.ac.cn, fuyu3116@bjhmoh.cn
  • Funding:
    *Supported by the National Natural Science Foundation of China (82574012) and the National High Level Hospital Clinical Research Funding (BJ-2024-090)

Abstract: Multi-instance learning (MIL) has become the mainstream approach in digital pathology for handling ultra-high-resolution whole slide images. Current MIL-based methods use various attention mechanisms to aggregate instance features. However, the attention scores tend to concentrate on a small number of instances, so the model attends to a narrow set of regions and fails to fully recognize lesion areas with diverse structures and the multiple pathological patterns present in a pathology image. To address this problem, we propose MSM-MIL, a multi-stage masked multi-instance learning framework for pathology image classification. It first uses a gated attention model to derive an initial bag-level embedding and an initial mask. It then mines the diverse pathological pattern features within the bag stage by stage through mask stacking, which overcomes the limitation that a single stage attends only to a single salient feature. Finally, it aggregates the bag-level embeddings from all stages with an attention mechanism for the final classification. Experimental results on two datasets show that the proposed framework outperforms existing state-of-the-art methods.

Key words: Pathology Image Classification, Multi-Instance Learning, Attention Mechanism, Masking
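
The pipeline outlined in the abstract (gated attention, stage-wise mask stacking, attention over stage embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how each mask is derived, so the sketch assumes the top_k most-attended visible instances are masked after every stage. It also assumes that attention weights are shared across stages and that stage embeddings are scored by a dot product with a vector q. The function names, dimensions, and random weights are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vec)) for row in mat]


def gated_attention_scores(bag, V, U, w):
    """Unnormalised gated-attention score per instance:
    w . (tanh(V h) * sigmoid(U h)), the standard gated-attention form."""
    scores = []
    for h in bag:
        t = [math.tanh(z) for z in matvec(V, h)]
        g = [1.0 / (1.0 + math.exp(-z)) for z in matvec(U, h)]
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))
    return scores


def weighted_sum(weights, vectors):
    """Sum of vectors scaled by their weights."""
    dim = len(vectors[0])
    return [sum(a * v[d] for a, v in zip(weights, vectors)) for d in range(dim)]


def multi_stage_masked_pooling(bag, V, U, w, q, num_stages=3, top_k=2):
    """Return (final embedding, cumulative mask per stage, stage attention).

    Requires num_stages * top_k < len(bag) so every stage sees some instance.
    """
    raw = gated_attention_scores(bag, V, U, w)  # ASSUMPTION: shared across stages
    masked = set()                              # stacked mask of instance indices
    stage_embeddings, stage_masks = [], []
    for _ in range(num_stages):
        # Masked instances get -inf so their attention weight is exactly 0.
        scores = [float("-inf") if i in masked else s for i, s in enumerate(raw)]
        attn = softmax(scores)
        stage_embeddings.append(weighted_sum(attn, bag))
        # ASSUMPTION: hide the top_k most-attended visible instances next stage.
        visible = [i for i in range(len(bag)) if i not in masked]
        top = sorted(visible, key=lambda i: attn[i], reverse=True)[:top_k]
        masked.update(top)
        stage_masks.append(sorted(masked))
    # ASSUMPTION: stage-level attention is a softmax over q . embedding.
    stage_scores = [sum(qi * ei for qi, ei in zip(q, e)) for e in stage_embeddings]
    stage_attn = softmax(stage_scores)
    return weighted_sum(stage_attn, stage_embeddings), stage_masks, stage_attn


# Toy usage: a bag of 8 instances with 4-dimensional features, random weights.
rng = random.Random(0)
n_inst, feat_dim, hid_dim = 8, 4, 3
toy_bag = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(n_inst)]
toy_V = [[rng.gauss(0, 0.5) for _ in range(feat_dim)] for _ in range(hid_dim)]
toy_U = [[rng.gauss(0, 0.5) for _ in range(feat_dim)] for _ in range(hid_dim)]
toy_w = [rng.gauss(0, 0.5) for _ in range(hid_dim)]
toy_q = [rng.gauss(0, 0.5) for _ in range(feat_dim)]
embedding, masks, stage_attn = multi_stage_masked_pooling(
    toy_bag, toy_V, toy_U, toy_w, toy_q)
print("cumulative masks per stage:", masks)
print("stage attention:", stage_attn)
```

In this sketch each later stage is forced to pool over instances the earlier stages did not rank highest, which is one simple way to make the stage embeddings cover different regions of the bag. A trained model would learn V, U, w, and q and feed the final embedding to a classifier.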

CLC number: