IRIS UniCatt

Deldossi, L., Pesce, E., Tommasi, C., Accounting for outliers in optimal subsampling methods, «STATISTICAL PAPERS», 2023; (64): 1119-1135. [doi:10.1007/s00362-023-01422-3] [https://hdl.handle.net/10807/233890]

Accounting for outliers in optimal subsampling methods

Deldossi, Laura (first author); Pesce, Elena (second author); Tommasi, Chiara (last author)

2023

Abstract

Nowadays, in many different fields, massive data are available and for several reasons, it might be convenient to analyze just a subset of the data. The application of the D-optimality criterion can be helpful to optimally select a subsample of observations. However, it is well known that D-optimal support points lie on the boundary of the design space and if they go hand in hand with extreme response values, they can have a severe influence on the estimated linear model (leverage points with high influence). To overcome this problem, firstly, we propose a non-informative “exchange” procedure that enables us to select a “nearly” D-optimal subset of observations without high leverage values. Then, we provide an informative version of this exchange procedure, where besides high leverage points also the outliers in the responses (that are not necessarily associated to high leverage points) are avoided. This is possible because, unlike other design situations, in subsampling from big datasets the response values may be available. Finally, both the non-informative and informative selection procedures are adapted to I-optimality, with the goal of getting accurate predictions.
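The tension described in the abstract, between a large D-criterion and high leverage, can be illustrated with a toy sketch for the simple linear model y = b0 + b1*x. This is not the authors' exchange procedure: the function names, the made-up covariate values, the "farthest from the mean" selection rule, and the 2p/n leverage cutoff (a common rule of thumb) are all assumptions chosen for illustration.

```python
from statistics import mean


def d_criterion(xs):
    """det(X'X) for the model y = b0 + b1*x, with X = [1, x]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return n * sxx - sx * sx


def leverages(xs):
    """Diagonal of the hat matrix for the model y = b0 + b1*x."""
    n = len(xs)
    xbar = mean(xs)
    ssx = sum((x - xbar) ** 2 for x in xs)
    return [1.0 / n + (x - xbar) ** 2 / ssx for x in xs]


def pick_subsample(xs, k, max_leverage=None):
    """Return indices of the k points farthest from the mean.

    If max_leverage is given, points whose full-data leverage exceeds it
    are skipped (a crude stand-in for avoiding high leverage points).
    """
    xbar = mean(xs)
    h = leverages(xs)
    candidates = [
        i for i in range(len(xs))
        if max_leverage is None or h[i] <= max_leverage
    ]
    candidates.sort(key=lambda i: abs(xs[i] - xbar), reverse=True)
    return sorted(candidates[:k])


# Made-up covariate values: a bulk of points plus two extreme ones.
xs = [-1.0, -0.8, -0.5, -0.2, 0.0, 0.1, 0.4, 0.7, 0.9, 1.0, 6.0, -5.5]

p = 2                      # number of model parameters (b0, b1)
cutoff = 2 * p / len(xs)   # assumed rule-of-thumb leverage cutoff

plain = pick_subsample(xs, k=4)                         # boundary points only
guarded = pick_subsample(xs, k=4, max_leverage=cutoff)  # skips high leverage

d_plain = d_criterion([xs[i] for i in plain])
d_guarded = d_criterion([xs[i] for i in guarded])
print(plain, d_plain)
print(guarded, d_guarded)
```

In this toy setting the unguarded selection picks the two extreme points and attains a much larger determinant, while the guarded selection gives up some of the D-criterion in exchange for excluding the high leverage points.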

Publication year: 2023
Content language: English
Journal: STATISTICAL PAPERS
DOI: https://dx.doi.org/10.1007/s00362-023-01422-3
Appears in types: Journal article, Case note

Files in this record:

File	Size	Format
unpaywall-bitstream-1504258845.pdf (open access; file type: publisher's version (PDF); license: Creative Commons)	668.52 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/233890
