IRIS UniCatt


Deldossi, L., Pesce, E., Tommasi, C., Optimal subset selection without outliers, in Programme and Abstracts, 22nd Annual ENBIS Conference, Trondheim, 26-30 June 2022, Bergquist, B.; Tyssedal, J.; Ruckert, J., Trondheim 2022, pp. 34-35 [http://hdl.handle.net/10807/214433]

Optimal subset selection without outliers

Deldossi, Laura; Pesce, E.; Tommasi, C.

2022

Abstract

With the advent of ‘Big Data’, massive data sets are becoming increasingly prevalent. Several subdata selection methods have been proposed in the last few years, both to reduce the computational burden and to improve cost effectiveness and learning of the phenomenon. Some of these proposals (Drovandi et al., 2017; Wang et al., 2019; Deldossi and Tommasi, 2021, among others) are inspired by Optimal Experimental Design (OED). However, unlike in the OED context, where researchers typically have complete control over the predictors, in subsampling methods the predictors, as well as the responses, are passively observed. Thus, if outliers are present in the ‘Big Data’, they are likely to be included in the sample selected by applying the D-criterion, since D-optimal design points lie on the boundary of the design space. In regression analysis, outliers, and more generally influential points, can have a large impact on the estimates; identifying and excluding them in advance, especially in large datasets, is generally not an easy task. In this study, we propose an exchange procedure to select a compromise-optimal subset which is informative for the inferential goal and avoids outliers and ‘bad’ influential points.


Publication year: 2022
Content language: English
Title of the proceedings volume: Programme and Abstracts, 22nd Annual ENBIS Conference, Trondheim, 26-30 June 2022
Event name: ENBIS Conference
Event location: Trondheim
Event start date: 26 June 2022
Event end date: 30 June 2022
ISBN of the volume: 9788230354704
Publisher: Bergquist, B.; Tyssedal, J.; Ruckert, J.
Citation: Deldossi, L., Pesce, E., Tommasi, C., Optimal subset selection without outliers, in Programme and Abstracts, 22nd Annual ENBIS Conference, Trondheim, 26-30 June 2022, Bergquist, B.; Tyssedal, J.; Ruckert, J., Trondheim 2022, pp. 34-35 [http://hdl.handle.net/10807/214433]
Appears in types: Proceedings of conferences, congresses, study days, etc., workshops (in volume)


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/214433
