This study explores the application of topic modeling techniques for auditing purposes in the banking sector, focusing on the analysis of reviews of anti-money laundering alerts. We compare three topic modeling algorithms: Latent Dirichlet Allocation (LDA), Embedded Topic Model (ETM), and Product of Experts LDA (ProdLDA), using a dataset of 35,000 suspicious activity reports from an Italian bank. The models were evaluated using the coherence score, NPMI coherence, and topic diversity metrics. Our results show that ProdLDA consistently outperformed LDA and ETM, with the best performance achieved using 1-gram word embeddings. The study reveals distinct topics related to specific client activities, cross-border transactions, and high-risk business sectors, like gambling. These results demonstrate the potential of advanced topic modeling techniques in enhancing the efficiency and effectiveness of auditing processes in the banking sector, particularly in the analysis of activities that could be tied to money laundering and terrorism.
Giaconia, A., Chiariello, V., Passarotti, M. C., Topic Modeling for Auditing Purposes in the Banking Sector, in Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), (Pisa, 04-06 December 2024), CEUR Workshop Proceedings, Pisa 2024: 1030-1035 [https://hdl.handle.net/10807/308720]
Topic Modeling for Auditing Purposes in the Banking Sector
Passarotti, Marco Carlo
2024
Abstract
This study explores the application of topic modeling techniques for auditing purposes in the banking sector, focusing on the analysis of reviews of anti-money laundering alerts. We compare three topic modeling algorithms: Latent Dirichlet Allocation (LDA), Embedded Topic Model (ETM), and Product of Experts LDA (ProdLDA), using a dataset of 35,000 suspicious activity reports from an Italian bank. The models were evaluated using the coherence score, NPMI coherence, and topic diversity metrics. Our results show that ProdLDA consistently outperformed LDA and ETM, with the best performance achieved using 1-gram word embeddings. The study reveals distinct topics related to specific client activities, cross-border transactions, and high-risk business sectors, like gambling. These results demonstrate the potential of advanced topic modeling techniques in enhancing the efficiency and effectiveness of auditing processes in the banking sector, particularly in the analysis of activities that could be tied to money laundering and terrorism.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.