This paper presents the first core component of LinkEn, a knowledge base of interoperable language resources for English adhering to Linked Open Data principles. With this initial step towards a broader infrastructure, we focus on the development of a lemma-centered hub designed to enable interoperability between distributed lexical resources, corpora, and linguistic annotations. The modeling is inspired by the LiLa Knowledge Base for Latin and the OntoLex-Lemon model, ensuring compatibility with existing lemma-centric knowledge graphs and enabling future cross-linguistic interoperability. Rather than relying solely on manual knowledge graph construction and significant human effort, the lemma bank has been developed through a hybrid neuro-symbolic pipeline that integrates large language models into the generation of RDF data under explicit ontological constraints. This approach combines automated generation with ontology-driven supervision and evaluation, enabling scalable yet controlled construction of structured lexical knowledge. By presenting the first steps towards the LinkEn Knowledge Base, this paper contributes both a new lemma bank for English and an experimental methodology for the semi-automatic creation of Linked Data-based knowledge graphs.
Augello, L., Passarotti, M. C., Towards the LinkEn Knowledge Base. A Neuro-Symbolic approach to build a Linked Data hub for English lemmas with Large Language Models, in Proceedings of the 10th Workshop on Linked Data in Linguistics (LDL-2026) @LREC 2026, (Palma De Mallorca, 12-12 May 2026), European Language Resources Association (ELRA), Palma De Mallorca 2026: 13-21 [https://hdl.handle.net/10807/335481]
Towards the LinkEn Knowledge Base. A Neuro-Symbolic approach to build a Linked Data hub for English lemmas with Large Language Models
Passarotti, Marco Carlo
2026



