IRIS UniCatt

A Methodology for Controlling Bias and Fairness in Synthetic Data Generation

Barbierato, Enrico; Della Vedova, Marco Luigi; Tessera, Daniele; Toti, Daniele; Vanoli, Nicola

2022

Abstract

The development of algorithms, based on machine learning techniques, supporting (or even replacing) human judgment must take into account concepts such as data bias and fairness. Though scientific literature proposes numerous techniques to detect and evaluate these problems, less attention has been dedicated to methods generating intentionally biased datasets, which could be used by data scientists to develop and validate unbiased and fair decision-making algorithms. To this end, this paper presents a novel method to generate a synthetic dataset, where bias can be modeled by using a probabilistic network exploiting structural equation modeling. The proposed methodology has been validated on a simple dataset to highlight the impact of tuning parameters on bias and fairness, as well as on a more realistic example based on a loan approval status dataset. In particular, this methodology requires a limited number of parameters compared to other techniques for generating datasets with a controlled amount of bias and fairness.
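
As a rough illustration of the idea summarized in the abstract, the sketch below generates a small synthetic dataset in which a single weight controls how strongly a protected attribute influences the outcome. It is not the authors' method: the structural equation, the function names (`generate_biased_dataset`, `demographic_parity_difference`), the parameter `bias_weight`, and the noise scales are all assumptions chosen for illustration, and demographic parity difference is used here only as one common fairness measure.

```python
import numpy as np


def generate_biased_dataset(n, bias_weight, seed=0):
    """Draw n samples from a toy linear structural equation model.

    Hypothetical model (an assumption, not taken from the paper):
        a     ~ Bernoulli(0.5)          protected attribute
        x     ~ Normal(0, 1)            legitimate feature, independent of a
        score = x + bias_weight * a + noise
        y     = 1 if score > 0 else 0   binary decision
    """
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n)
    x = rng.normal(0.0, 1.0, size=n)
    noise = rng.normal(0.0, 0.5, size=n)
    score = x + bias_weight * a + noise
    y = (score > 0.0).astype(int)
    return a, x, y


def demographic_parity_difference(a, y):
    """Difference in positive-outcome rates between the groups a=1 and a=0."""
    return y[a == 1].mean() - y[a == 0].mean()


if __name__ == "__main__":
    # Sweep the single bias parameter and report the resulting disparity.
    for w in (0.0, 0.5, 1.0, 2.0):
        a, x, y = generate_biased_dataset(20_000, bias_weight=w, seed=42)
        print(f"bias_weight={w:.1f}  parity difference={demographic_parity_difference(a, y):+.3f}")
```

With `bias_weight` set to zero the outcome does not depend on the protected attribute, so the measured disparity should stay close to zero; increasing the weight injects a controlled amount of bias that a fairness-aware algorithm could then be tested against.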

Publication year	2022
Content language	English
Journal name	APPLIED SCIENCES
DOI	https://dx.doi.org/10.3390/app12094619
Citation	Barbierato, E., Della Vedova, M. L., Tessera, D., Toti, D., Vanoli, N., A Methodology for Controlling Bias and Fairness in Synthetic Data Generation, <<APPLIED SCIENCES>>, 2022; 12 (9): N/A-N/A. [doi:10.3390/app12094619] [https://hdl.handle.net/10807/208822]
Appears in types	Journal article, Case note

Files in this item:

File	Access	File type	License	Size	Format
2022_XAI_Applied_Sciences.pdf	Open access	Publisher's version (PDF)	Creative Commons	7.48 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/208822
