IRIS UniCatt

Montani, G., Cappozzo, A., Stacking Model‐Based Classifiers for Dealing With Multiple Sets of Noisy Labels, Biometrical Journal, 2025; 67 (2). [doi:10.1002/bimj.70042] [https://hdl.handle.net/10807/309618]

Stacking Model‐Based Classifiers for Dealing With Multiple Sets of Noisy Labels

Montani, Giulia;Cappozzo, Andrea

2025

Abstract

Supervised learning in the presence of multiple sets of noisy labels is a challenging task that is receiving increasing interest in the ever-evolving landscape of healthcare analytics. The issue arises when multiple annotators are tasked with manually labeling the same training samples, which can lead to discrepancies between the supplied class labels and the ground truth. Commonly, the labeling process is entrusted to a small group of domain experts, and different levels of experience and subjectivity may result in noisy training labels. To solve the classification task while leveraging the availability of multiple data annotators, we introduce a novel ensemble methodology constructed by combining model-based classifiers separately trained on single sets of noisy labels. Eigenvalue Decomposition Discriminant Analysis is employed to define the base learners, and six distinct averaging strategies are proposed to combine them. Two solutions require a priori information, such as partial knowledge of the ground truth labels or the annotators' level of expertise. In contrast, the remaining four approaches are entirely data-driven. A simulation study and an application to real data showcase the improved predictive performance of our proposal, while also demonstrating its ability to automatically infer the annotators' expertise level as a by-product of the learning process.
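As a rough illustration of the general idea described in the abstract (one classifier per annotator, then a weighted average of their outputs), the sketch below may help. It is not the authors' method: a univariate Gaussian classifier stands in for Eigenvalue Decomposition Discriminant Analysis, the toy data, the annotator label sets and the expertise weights are all hypothetical, and fixed a priori weights are only one possible averaging scheme. The six strategies proposed in the paper are not reproduced here.

```python
import math


def fit_gaussian_classifier(xs, labels):
    """Fit a univariate Gaussian classifier: (prior, mean, variance) per class.

    Assumption: a simplified stand-in for a model-based base learner.
    """
    model = {}
    n = len(xs)
    for c in sorted(set(labels)):
        pts = [x for x, y in zip(xs, labels) if y == c]
        mean = sum(pts) / len(pts)
        # Maximum-likelihood variance, floored to avoid division by zero.
        var = max(sum((p - mean) ** 2 for p in pts) / len(pts), 1e-6)
        model[c] = (len(pts) / n, mean, var)
    return model


def posterior(model, x):
    """Class posterior probabilities for one observation via Bayes' rule."""
    joint = {
        c: prior * math.exp(-((x - mean) ** 2) / (2 * var))
        / math.sqrt(2 * math.pi * var)
        for c, (prior, mean, var) in model.items()
    }
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}


def stacked_posterior(models, weights, x):
    """Weighted average of the base learners' posteriors (weights normalized)."""
    wsum = sum(weights)
    classes = sorted({c for m in models for c in m})
    avg = {c: 0.0 for c in classes}
    for m, w in zip(models, weights):
        for c, p in posterior(m, x).items():
            avg[c] += (w / wsum) * p
    return avg


def predict(models, weights, x):
    """Return the class with the largest averaged posterior."""
    probs = stacked_posterior(models, weights, x)
    return max(probs, key=probs.get)


# Hypothetical toy data: one feature, three annotators labeling the same samples.
xs = [0.0, 0.5, 1.0, 1.5, 4.0, 4.5, 5.0, 5.5]
annotator_labels = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # careful annotator
    [0, 0, 0, 1, 1, 1, 1, 1],  # one disagreement
    [1, 0, 1, 0, 0, 1, 0, 1],  # very noisy annotator
]
# Hypothetical a priori expertise scores, used directly as averaging weights.
weights = [0.6, 0.3, 0.1]

# One base learner per annotator, each trained on that annotator's labels only.
models = [fit_gaussian_classifier(xs, labels) for labels in annotator_labels]

print(predict(models, weights, 0.8), predict(models, weights, 4.8))
```

In this toy setting the noisy third annotator contributes nearly uninformative posteriors, and its low weight keeps it from dominating the averaged prediction.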

Publication year	2025
Content language	English
Journal name	BIOMETRICAL JOURNAL
DOI	https://dx.doi.org/10.1002/bimj.70042
Appears in publication types	Journal article, Case note

Files in this item:

File	Size	Format
Biometrical J - 2025 - Montani - Stacking Model‐Based Classifiers for Dealing With Multiple Sets of Noisy Labels.pdf (open access; file type: publisher's version (PDF); license: Creative Commons)	2.11 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/309618
