MSM-MIL: A Multi-Stage Masked Multi-Instance Learning Framework for Pathology Image Classification

doi:10.7523/j.ucas.2026.020

Abstract

Multi-instance learning (MIL) has become the mainstream paradigm for processing ultra-high-resolution whole slide images (WSIs) in digital pathology. Current MIL-based methods aggregate instance features with various attention mechanisms, but the attention scores tend to concentrate on a small subset of instances. The model therefore attends to only a narrow set of regions and fails to adequately recognize lesion areas with diverse structures and the multiple pathological patterns present in a histopathology image. To address this limitation, we propose MSM-MIL, a multi-stage masked multi-instance learning framework for pathology image classification. MSM-MIL first uses a gated attention model to derive an initial bag-level embedding and an initial mask. It then mines diverse pathological pattern features within the bag stage by stage through mask stacking, overcoming the limitation of a single stage that focuses only on the single most salient feature. Finally, it aggregates the bag-level embeddings from all stages with an attention mechanism for the final classification. Experimental results on two datasets show that the proposed framework outperforms existing mainstream methods.
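
The abstract describes the pipeline only at a high level. The NumPy sketch below is an illustration of one forward pass under stated assumptions, not the authors' implementation. It assumes the gated attention pooling of Ilse et al. (2018), one attention scorer shared across all stages, a per-stage mask that hides the k most-attended still-visible instances (k set by a hypothetical `mask_ratio`) and is stacked onto the masks of earlier stages, and a plain attention layer that fuses the stage embeddings before a linear classifier. All function, parameter, and weight names are hypothetical, and the weights are random, so the output only demonstrates shapes and masking behavior.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def masked_softmax(scores, mask):
    """Softmax over visible instances only; masked (True) instances get weight 0."""
    a = np.zeros_like(scores)
    a[~mask] = softmax(scores[~mask])
    return a


def gated_attention_scores(H, V, U, w):
    """Gated attention logits (Ilse et al., 2018): w^T (tanh(V h) * sigmoid(U h))."""
    gate = 1.0 / (1.0 + np.exp(-(H @ U.T)))      # sigmoid gate, (N, L)
    return (np.tanh(H @ V.T) * gate) @ w          # one logit per instance, (N,)


def init_params(D, L, num_classes, rng):
    """Random (untrained) weights for the hypothetical modules in this sketch."""
    s, t = 1.0 / np.sqrt(D), 1.0 / np.sqrt(L)
    return {
        "V": rng.normal(0, s, (L, D)), "U": rng.normal(0, s, (L, D)),
        "w": rng.normal(0, t, L),
        "Vs": rng.normal(0, s, (L, D)), "ws": rng.normal(0, t, L),
        "W_cls": rng.normal(0, s, (num_classes, D)), "b_cls": np.zeros(num_classes),
    }


def msm_forward(H, params, num_stages=3, mask_ratio=0.1):
    """Multi-stage masked MIL forward pass for one bag H of shape (N, D) (assumed design)."""
    N = H.shape[0]
    k = max(1, int(round(mask_ratio * N)))        # instances hidden per stage (assumption)
    scores = gated_attention_scores(H, params["V"], params["U"], params["w"])
    mask = np.zeros(N, dtype=bool)                 # cumulative stacked mask; True = hidden
    stage_embeddings, stage_attn = [], []
    for stage in range(num_stages):
        a = masked_softmax(scores, mask)           # attention over still-visible instances
        stage_attn.append(a)
        stage_embeddings.append(a @ H)             # bag-level embedding of this stage, (D,)
        if stage == num_stages - 1:
            break
        visible = np.flatnonzero(~mask)
        if visible.size <= k:                      # keep at least one instance visible
            break
        top = visible[np.argsort(-a[visible])[:k]]
        mask[top] = True                           # stack: earlier masks stay in place
    Z = np.stack(stage_embeddings)                 # (S, D), one row per stage
    b = softmax(np.tanh(Z @ params["Vs"].T) @ params["ws"])  # attention weights over stages
    logits = params["W_cls"] @ (b @ Z) + params["b_cls"]
    return logits, stage_attn, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(200, 512))                # stand-in for 200 patch feature vectors
    params = init_params(D=512, L=128, num_classes=2, rng=rng)
    logits, attn, mask = msm_forward(H, params, num_stages=3, mask_ratio=0.1)
    print(logits.shape, int(mask.sum()))           # (2,) 40
```

Because the scorer is shared here, each later stage is a renormalized view of the instances that remain after masking; a separate learned scorer per stage is an equally plausible reading of the abstract and would change only `gated_attention_scores` being called inside the loop.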

Key words: Pathology Image Classification, Multi-Instance Learning, Attention Mechanism, Masking

CLC number: TP391

WANG Wei, ZHANG Qiuli, JIANG Haiyong, LU Zhengda, BAO Yingqiu, FU Yu, XIAO Jun. MSM-MIL: A Multi-Stage Masked Multi-Instance Learning Framework for Pathology Image Classification[J]. Journal of University of Chinese Academy of Sciences, DOI: 10.7523/j.ucas.2026.020.
