Robust Synthetic Data Generation for Sequential Financial Models Using Hybrid Variational Autoencoder–Markov Chain Monte Carlo Architectures

Barbierato, Enrico (second author; Writing – Review & Editing)
2025

Abstract

Generating high-quality synthetic data is essential for advancing machine learning applications in financial time series, where data scarcity and privacy concerns often pose significant challenges. This study proposes a novel hybrid architecture that combines variational autoencoders (VAEs) with Markov Chain Monte Carlo (MCMC) sampling to enhance the generation of robust synthetic sequential data. The model leverages Gated Recurrent Unit (GRU) layers for capturing long-term temporal dependencies and MCMC sampling for effective latent space exploration, ensuring high variability and accuracy. Experimental evaluations on datasets of Google, Tesla, and Nestlé stock prices demonstrate the model’s superior performance in preserving statistical and temporal patterns, as validated by quantitative metrics (discriminative and predictive scores), statistical tests (Kolmogorov–Smirnov), and t-Distributed Stochastic Neighbour Embedding (t-SNE) visualisations. The experiments reveal the model’s scalability, maintaining high fidelity even under augmented dataset sizes and missing data scenarios. These findings position the proposed framework as a computationally efficient and structurally simple alternative to Generative Adversarial Network (GAN)-based methods, suitable for real-world applications in data-driven financial modelling.
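The abstract names the Kolmogorov–Smirnov test as one of the checks applied to the synthetic series. As a minimal sketch of what such a check measures, and not the authors' evaluation code, the two-sample KS statistic is the largest gap between the empirical distribution functions of two samples. The function name `ks_two_sample` and the Gaussian "returns" below are hypothetical stand-ins, not data or identifiers from the paper.

```python
import numpy as np


def ks_two_sample(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The gap between two step functions can only peak at an observed point.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


rng = np.random.default_rng(0)
# Assumption: toy Gaussian samples standing in for real and synthetic returns.
real = rng.normal(0.0, 0.02, size=1000)
close_synth = rng.normal(0.0, 0.02, size=1000)  # same distribution as `real`
poor_synth = rng.normal(0.01, 0.05, size=1000)  # shifted and more volatile

d_close = ks_two_sample(real, close_synth)
d_poor = ks_two_sample(real, poor_synth)
print(f"close: {d_close:.3f}  poor: {d_poor:.3f}")
```

A statistic near 0 means the two marginal distributions are hard to tell apart, while values near 1 mean they barely overlap. SciPy's `scipy.stats.ks_2samp` returns the same statistic together with a p-value. The test compares marginal distributions only, so it says nothing about temporal ordering.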
2025
English
Bruni Prenestino, F., Barbierato, E., Gatti, A., Robust Synthetic Data Generation for Sequential Financial Models Using Hybrid Variational Autoencoder–Markov Chain Monte Carlo Architectures, Future Internet, 2025; 17 (2). [doi:10.3390/fi17020095] [https://hdl.handle.net/10807/326624]
Files in this item:
File: futureinternet-17-00095-v2.pdf
Access: open access
Description: Robust Synthetic Data Generation for Sequential Financial Models Using Hybrid Variational Autoencoder–Markov Chain Monte Carlo Architectures
File type: Publisher's version (PDF)
License: Creative Commons
Size: 24.55 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/326624
Citations
  • PMC: not available
  • Scopus: 3
  • ISI: 3