Deldossi, L., Tommasi, C., Optimal subsampling from Big Datasets in presence of misspecification, in A. Pollice, P. M. (ed.), Methodological and Applied Statistics and Demography II: SIS 2024 Italian Statistical Society Series on Advances in Statistics, Springer, Cham 2025: 458-464. https://doi.org/10.1007/978-3-031-64350-7_77 [https://hdl.handle.net/10807/311575]
Optimal subsampling from Big Datasets in presence of misspecification
Deldossi, Laura (first author)
2025
Abstract
In the era of Big Data, analyzing entire datasets is often infeasible due to high computational costs and the risks of overfitting. Subsampling techniques have emerged as a cost-effective alternative, enabling researchers to extract informative subsets of data. This study investigates the performance of design-based subsampling methods under conditions of model misspecification and the presence of outliers. A comparative analysis of model-based approaches, such as D-optimal and LowCon, and model-free techniques, including MRSS, Twinning, and PDDS, is conducted. Simulation studies offer insights into the trade-offs between inferential robustness and computational efficiency, guiding the selection of optimal subsampling strategies for various applications.