The increasing sophistication of cyberattacks necessitates the development of advanced detection systems capable of accurately identifying and mitigating potential threats. This research addresses the critical challenge of cyberattack detection by employing a comprehensive approach that includes generating a realistic yet imbalanced dataset simulating various types of cyberattacks. Recognizing the inherent limitations posed by imbalanced data, we explored multiple data augmentation techniques to enhance the model’s learning effectiveness and ensure robust performance across different attack scenarios. Firstly, we constructed a detailed dataset reflecting real-world conditions of network intrusions by simulating a range of cyberattack types, ensuring it embodies the typical imbalances observed in genuine cybersecurity threats. Subsequently, we applied several data augmentation techniques, including SMOTE and ADASYN, to address the skew in class distribution, thereby providing a more balanced dataset for training supervised machine learning models. Our evaluation of these techniques across various models, such as Random Forests and Neural Networks, demonstrates significant improvements in detection capabilities. Moreover, the analysis also extends to the investigation of feature importance, providing critical insights into which attributes most significantly influence the predictive outcomes of the models. This not only enhances the interpretability of the models but also aids in refining feature engineering and selection processes to optimize performance.

Tosi T., M. K., Barbierato, E., Gatti, A., Balancing the Scale: Data Augmentation Techniques for Improved Supervised Learning in Cyberattack Detection, <<ENG>>, 2024; 5 (3): 2170-2205. [doi:10.3390/eng5030114] [https://hdl.handle.net/10807/297137]

Balancing the Scale: Data Augmentation Techniques for Improved Supervised Learning in Cyberattack Detection

Barbierato, Enrico
Methodology
;
2024

Abstract

The increasing sophistication of cyberattacks necessitates the development of advanced detection systems capable of accurately identifying and mitigating potential threats. This research addresses the critical challenge of cyberattack detection by employing a comprehensive approach that includes generating a realistic yet imbalanced dataset simulating various types of cyberattacks. Recognizing the inherent limitations posed by imbalanced data, we explored multiple data augmentation techniques to enhance the model’s learning effectiveness and ensure robust performance across different attack scenarios. Firstly, we constructed a detailed dataset reflecting real-world conditions of network intrusions by simulating a range of cyberattack types, ensuring it embodies the typical imbalances observed in genuine cybersecurity threats. Subsequently, we applied several data augmentation techniques, including SMOTE and ADASYN, to address the skew in class distribution, thereby providing a more balanced dataset for training supervised machine learning models. Our evaluation of these techniques across various models, such as Random Forests and Neural Networks, demonstrates significant improvements in detection capabilities. Moreover, the analysis also extends to the investigation of feature importance, providing critical insights into which attributes most significantly influence the predictive outcomes of the models. This not only enhances the interpretability of the models but also aids in refining feature engineering and selection processes to optimize performance.
2024
Inglese
ENG
Tosi T., M. K., Barbierato, E., Gatti, A., Balancing the Scale: Data Augmentation Techniques for Improved Supervised Learning in Cyberattack Detection, <<ENG>>, 2024; 5 (3): 2170-2205. [doi:10.3390/eng5030114] [https://hdl.handle.net/10807/297137]
File in questo prodotto:
File Dimensione Formato  
eng-05-00114.pdf

accesso aperto

Licenza: Creative commons
Dimensione 442.83 kB
Formato Adobe PDF
442.83 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10807/297137
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact