IRIS UniCatt

HEALER: A Data Lake Architecture for Healthcare

Manco C.; Dolci T.; Azzalini F.; Barbierato, Enrico (First; Writing – Review & Editing); Gribaudo M.; Tanca L.

2023

Abstract

With the growth of the Internet of Things and the rapid progress of social networks, everything appears to generate data. The ever-increasing number of connected devices is accompanied by growth in the volume of data, produced at an ever-increasing rate, and this massive flow includes data types that are difficult to process using standard database techniques. One of the most critical scenarios is healthcare, whose activities need to store and manage a variety of data types (reports written in natural language, medical images, genomic data and waveforms of vital signs) that do not have a well-defined structure. To benefit from this large amount of complex data, Data Lakes have recently emerged as a solution that provides central storage and flexible analysis for all types of data. However, no Data Lake architecture fits all possible scenarios, since the architecture depends heavily on the application domain and, so far, no Data Lake architecture supports the specific needs of the healthcare domain. This work proposes HEALER, a Data Lake architecture that effectively performs data ingestion, data storage, and data access with the aim of providing a single central repository for efficient storage of different types of healthcare data. The architecture also enables the analysis and querying of the data, which can be loaded into the Data Lake regardless of their format and type. To verify the effectiveness of the architecture, a proof-of-concept of HEALER has been developed that allows ingestion of various data, processes waveforms to make them more interpretable to researchers and analysts, grants access to the saved data and allows the analysis of natural language reports. Finally, we studied the performance of the system in each of its main phases: ingestion, processing, data access and analysis. The results lead us to some important considerations to be taken into account when using and configuring the system components.

Publication year: 2023
Content language: English
Title of the proceedings volume: 2023 Workshops of the EDBT/ICDT Joint Conference, EDBT/ICDT-WS 2023
Event name: Performance evaluation
Event location: Greece
Contribution type: Paper
Event start date: 28 Mar 2023
Event end date: 28 Mar 2023
Publication ISBN: 978-3-89318-092-9
Publisher: CEUR-WS
Citation: Manco, C., Dolci, T., Azzalini, F., Barbierato, E., Gribaudo, M., Tanca, L., HEALER: A Data Lake Architecture for Healthcare, Paper, in 2023 Workshops of the EDBT/ICDT Joint Conference, EDBT/ICDT-WS 2023, (Greece, 28-28 March 2023), CEUR-WS, Berlin 2023:3379 N/A-N/A [https://hdl.handle.net/10807/301237]
Appears in types: Paper, Selected paper, Contributed paper, Working paper, Poster, Poster paper, Communication, Report (in volume)

Files in this item:

File	Size	Format
DataPlat_2023_602.pdf (open access; file type: publisher's version (PDF); license: Creative Commons)	1.39 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/301237
