IRIS UniCatt

Controlling Bias Between Categorical Attributes in Datasets: A Two-Step Optimization Algorithm Leveraging Structural Equation Modeling

Barbierato, Enrico; Pozzi, Andrea; Tessera, Daniele

2023

Abstract

In the realm of data-driven systems, understanding and controlling biases in datasets emerges as a critical challenge. These biases, defined in this study as systematic discrepancies, have the potential to skew algorithmic outcomes and even compromise data privacy. Mutual information serves as a key tool in the analysis, discerning both direct and indirect relationships between variables. Utilizing structural equation modeling, this paper introduces a synthetic dataset generation method founded on a two-step optimization algorithm that aims to fine-tune variable relationships and achieve targeted mutual information levels between attribute pairs. The algorithm's first phase utilizes gradient-less optimization, focusing on individual variables. The subsequent phase harnesses gradient-based methods to unravel deeper variable interdependencies. The approach is dual-purpose: it refines existing datasets for bias mitigation and creates synthetic datasets with defined bias levels, addressing a crucial research gap. Two case studies showcase the methodology. One emphasizes the finesse of network parameter adjustments in a simulated setting. The other applies the methodology to a realistic job hiring dataset, effectively reducing bias while safeguarding key variable relationships. In summary, this paper offers a novel method for bias management, presents tools for quantitative bias adjustments, and provides evidence of the method's broad applicability through varied use cases.
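
As a rough illustration of the quantity the abstract targets, the sketch below estimates the mutual information between two categorical attributes from their empirical frequencies. It is not the paper's two-step optimization algorithm, only the measurement such an algorithm would try to steer; the function name `mutual_information` and the toy attribute values are assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two categorical sequences.

    Illustrative helper (hypothetical name, not taken from the paper).
    """
    if len(xs) != len(ys):
        raise ValueError("both attributes must have the same number of samples")
    n = len(xs)
    if n == 0:
        raise ValueError("at least one sample is required")

    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Toy data (made up for illustration): one attribute fully determines the other.
group = ["a", "b", "a", "b"]
outcome = ["yes", "no", "yes", "no"]
dependent_mi = mutual_information(group, outcome)

# Toy data where the two attributes are statistically independent.
independent_mi = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

With these toy inputs, `dependent_mi` is 1 bit (each attribute is a relabeling of the other) and `independent_mi` is 0, which are the two extremes between which a target bias level could be set.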

Publication year: 2023
Content language: English
Journal name: IEEE ACCESS
DOI: https://dx.doi.org/10.1109/ACCESS.2023.3325235
Alternative URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10286840
Citation: Barbierato, E., Pozzi, A., Tessera, D., Controlling Bias Between Categorical Attributes in Datasets: A Two-Step Optimization Algorithm Leveraging Structural Equation Modeling, IEEE ACCESS, 2023; 11: 115493-115510. [doi:10.1109/ACCESS.2023.3325235] [https://hdl.handle.net/10807/271838]
Appears in types: Journal article, Case note

Files in this item:

File	Size	Format
Controlling_Bias_Between_Categorical_Attributes_in_Datasets_A_Two-Step_Optimization_Algorithm_Leveraging_Structural_Equation_Modeling.pdf (open access; file type: publisher's version (PDF); license: Creative Commons)	2.3 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/271838
