Ciapetti, A., Di Florio, R., Lomasto, L., Miscione, G., Ruggiero, G., Toti, D., NETHIC: A system for automatic text classification using neural networks and hierarchical taxonomies, in ICEIS 2019 - Proceedings of the 21st International Conference on Enterprise Information Systems (Greece, 03-05 May 2019), SciTePress, Setúbal 2019, vol. 1, pp. 284-294. DOI: 10.5220/0007709702960306 [http://hdl.handle.net/10807/165868]
NETHIC: A system for automatic text classification using neural networks and hierarchical taxonomies
Toti, D.
2019
Abstract
This paper presents NETHIC, a software system for the automatic classification of textual documents based on hierarchical taxonomies and artificial neural networks. The approach combines the advantages of highly structured hierarchies of textual labels with the versatility and scalability of neural networks, yielding a text classifier that performs well in terms of both effectiveness and efficiency. The system was first tested as a general-purpose classifier on a generic document corpus and then applied to the specific domain tackled by DANTE, a European project that addresses criminal and terrorist-related online content, showing consistent results across both application domains.