IRIS UniCatt

From Lemmas to Links: A Lemma Bank for Ancient Greek

Swaelens, Colin; Mambrini, Francesco; Passarotti, Marco Carlo

2026

Abstract

This paper presents the initial construction of a lemma bank for Ancient Greek, developed according to Linked Data principles. The increasing availability of digital linguistic resources has highlighted the need for linguistic infrastructures that remain interoperable across historical variation, divergent annotation practices, and resource-specific lemmatisation conventions. The lemma bank is developed as the core component of the Linking Greek knowledge base and is inspired by the architecture of the LiLa project for Latin. It adopts a descriptive, lemma-centric approach that preserves alternative canonical forms and dialectal variation while enabling consistent linking across lexical and semantic resources. Its population combines data extracted from the Ancient Greek WordNet and the Liddell–Scott–Jones lexicon, integrating semantic structure, part-of-speech information, and lexicographically encoded gender assignment. Additional normalisation steps, including the rule-based correction of closed-class part-of-speech categories and harmonisation to the Universal Dependencies tagset, were applied to improve consistency and computational usability. The resulting dataset provides a foundation for interlinking corpora, lexica, and NLP tools for Ancient Greek within a Linked Open Data framework.
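The normalisation steps named in the abstract (rule-based correction of closed-class categories, followed by harmonisation to the Universal Dependencies tagset) can be illustrated with a minimal sketch. This is not the authors' implementation: the source tag labels, the override table, the sample entries, and the function name `harmonise_pos` are all hypothetical and chosen only for illustration. Only the UPOS tag names come from Universal Dependencies.

```python
import unicodedata

# Hypothetical mapping from resource-specific POS labels to UD UPOS tags.
# The single-letter source labels are an assumption, not taken from the paper.
SOURCE_TO_UPOS = {
    "n": "NOUN",
    "v": "VERB",
    "a": "ADJ",
    "r": "ADV",
}

# Hypothetical rule table for closed-class lemmas. A lemma listed here
# overrides whatever tag the source resource assigned to it.
_CLOSED_CLASS_RULES = {
    "καί": "CCONJ",
    "ἐν": "ADP",
    "ὁ": "DET",
}

# Normalise keys to NFC so that precomposed and decomposed spellings
# of the same polytonic Greek lemma compare equal.
CLOSED_CLASS_UPOS = {
    unicodedata.normalize("NFC", lemma): tag
    for lemma, tag in _CLOSED_CLASS_RULES.items()
}


def harmonise_pos(lemma: str, source_pos: str) -> str:
    """Return a UPOS tag for a lemma, applying closed-class rules first."""
    key = unicodedata.normalize("NFC", lemma)
    if key in CLOSED_CLASS_UPOS:
        return CLOSED_CLASS_UPOS[key]
    # Fall back to UD's "X" (other) when the source label is unknown.
    return SOURCE_TO_UPOS.get(source_pos, "X")


# Invented sample entries: two closed-class lemmas carry a wrong source tag.
entries = [("λόγος", "n"), ("καί", "r"), ("ἐν", "r"), ("λύω", "v")]
harmonised = [(lemma, harmonise_pos(lemma, pos)) for lemma, pos in entries]
```

In this sketch the closed-class rules take precedence over the source label, so a function word that a resource tagged as an adverb is corrected before the general mapping is consulted.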

Publication year: 2026
Content language: English
Title of the proceedings volume: Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @LREC 2026
Event name: Fourth Workshop on Language Technologies for Historical and Ancient Languages
Event location: Palma de Mallorca
Event start date: 11 May 2026
Event end date: 11 May 2026
Volume ISBN: 978-2-493814-58-6
Publisher: European Language Resources Association (ELRA)
Alternative URL: http://lrec-conf.org/proceedings/lrec2026/workshops/lt4hala/2026.lt4hala-1.0.pdf
Citation: Swaelens, C., Mambrini, F., Passarotti, M. C., From Lemmas to Links: A Lemma Bank for Ancient Greek, in Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @LREC 2026, (Palma de Mallorca, 11 May 2026), European Language Resources Association (ELRA), Palma de Mallorca 2026: 106-111 [https://hdl.handle.net/10807/335482]
Publication type: Proceedings of a conference, congress, study day, etc., workshop (in volume)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/335482