In the era of Big Data, analyzing entire datasets is often infeasible due to high computational costs and the risks of overfitting. Subsampling techniques have emerged as a cost-effective alternative, enabling researchers to extract informative subsets of data. This study investigates the performance of design-based subsampling methods under conditions of model misspecification and the presence of outliers. A comparative analysis of model-based approaches, such as D-optimal and LowCon, and model-free techniques, including MRSS, Twinning, and PDDS, is conducted. Simulation studies offer insights into the trade-offs between inferential robustness and computational efficiency, guiding the selection of optimal subsampling strategies for various applications.

Deldossi, L., Tommasi, C., Optimal subsampling from Big Datasets in presence of misspecification, in A. Pollice, P. M. (ed.), Methodological and Applied Statistics and Demography II: SIS 2024 Italian Statistical Society Series on Advances in Statistics, Springer, Cham 2025: 458-464. https://doi.org/10.1007/978-3-031-64350-7_77 [https://hdl.handle.net/10807/311575]

Optimal subsampling from Big Datasets in presence of misspecification

Deldossi, Laura (first author); Tommasi, C.
2025

English
Methodological and Applied Statistics and Demography II: SIS 2024 Italian Statistical Society Series on Advances in Statistics
978-3-031-64350-7
Springer