Anderson, C., Fransen, T., Beniamine, S., Developing an inflectional lexicon for Old Irish, abstract from «International Congress of Celtic Studies XVII Utrecht» (Utrecht, 24-28 July 2023), Utrecht 2023: 25-26 [https://hdl.handle.net/10807/270192]

Developing an inflectional lexicon for Old Irish

Fransen, Theodorus
2023

Abstract

While Old Irish (c. 600–900 A.D.) is extensively documented, it remains digitally under-resourced, lacking the range of digital resources available for other older Indo-European languages (e.g., Latin, see Pellegrini and Passarotti, 2018). We report on the development of a fully inflected lexicon of Old Irish nouns, provided in both phonemic and orthographic notation. This involved building a computer-assisted, systematic, and reproducible grapheme-to-phoneme conversion pipeline and generating morphological forms through a finite-state transducer. The resulting inflected lexicon will better enable computational studies of Old Irish morphology, support further research into diachronic developments, and serve a wide range of Natural Language Processing (NLP) applications.

We began by extracting noun lemmata from the Old Irish Würzburg glosses (Kavanagh, 2001) and the Corpus PalaeoHibernicum (CorPH) ‘Old Irish Corpus’ (Stifter et al., 2021). We then devised a set of rules for orthography-to-phonology conversion, subsequently implemented using the Python package Epitran (Mortensen, Dalmia, and Littell, 2018). The resulting transcriptions serve as the input for a finite-state transducer (FST) adapted from Fransen (2019), allowing us to generate inflected forms of Old Irish nouns. Finally, we derived orthographic forms (and their variants) by applying conversion rules to the generated forms.

Old Irish presents considerable challenges for the development of a resource of this nature, given its opaque and inconsistent orthography, complex phonology, elaborate system of morphophonological alternations, and intricate patterns of morphological inflection (Anderson, 2016; Stifter, 2009; Thurneysen, 1946; Pedersen, 1909–1913). We report on how we dealt with these problems in the development of the inflectional lexicon.
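As an illustration of what a context-sensitive orthography-to-phonology rule looks like, the following is a minimal sketch in plain Python. It is not the rule set used in this work and does not use Epitran: the function name `to_phonemic` and the three small rule tables are hypothetical, and they encode only a simplified textbook generalisation (the digraphs ch, th, ph spell fricatives; c, t, p directly after a vowel commonly spell voiced stops). Palatalisation, vowel length, and many other contexts are deliberately ignored.

```python
# Toy grapheme-to-phoneme sketch; all rule tables are illustrative assumptions,
# not the authors' rules.
VOWELS = set("aeiouáéíóú")
DIGRAPHS = {"ch": "x", "th": "θ", "ph": "f"}   # fricative digraphs
POSTVOCALIC = {"c": "g", "t": "d", "p": "b"}   # voiced stops after a vowel
DEFAULT = {"c": "k"}                           # fallback for remaining letters


def to_phonemic(word: str) -> str:
    """Scan left to right, preferring digraphs, then context rules."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        letter = word[i]
        if pair in DIGRAPHS:
            # Consume both letters of the digraph at once.
            out.append(DIGRAPHS[pair])
            i += 2
        elif letter in POSTVOCALIC and i > 0 and word[i - 1] in VOWELS:
            # The context is checked on the written form, so rules cannot
            # feed one another.
            out.append(POSTVOCALIC[letter])
            i += 1
        else:
            out.append(DEFAULT.get(letter, letter))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for lemma in ["cath", "ech", "fot"]:
        print(lemma, "->", to_phonemic(lemma))
    # cath -> kaθ
    # ech -> ex
    # fot -> fod
```

A single left-to-right pass is used here rather than a cascade of global substitutions, so that the output of one rule is never re-read as input to another; a production rule set would need many more contexts and an explicit treatment of ambiguous spellings.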
While this study focused on the Old Irish nouns in the Würzburg glosses, we intend to extend the lexicon by applying this pipeline to further corpora and other parts of speech. This inflected lexicon makes possible systematic studies in data-driven morphology and typology (Pellegrini, 2020; Beniamine, Bonami, and Luís, 2021; Beniamine, 2021), and facilitates future research into diachronic and diatopic variation in Irish and the development of further NLP applications for the language.

References

Anderson, Cormac (2016). “Consonant colour and vocalism in the history of Irish”. PhD thesis. Uniwersytet im. Adama Mickiewicza w Poznaniu. URL: https://hdl.handle.net/10593/14780.

Beniamine, Sacha (2021). “One lexeme, many classes: inflection class systems as lattices”. In: One-to-Many Relations. Ed. by Berthold Crysmann and Manfred Sailer. Berlin: Language Science Press.

Beniamine, Sacha, Olivier Bonami, and Ana R. Luís (2021). “The fine implicative structure of European Portuguese conjugation”. In: Isogloss 7.9, pp. 1–35. DOI: 10.5565/rev/isogloss.109.

Fransen, Theodorus (2019). “Past, present and future: Computational approaches to mapping historical Irish cognate verb forms”. PhD thesis. Trinity College Dublin, The University of Dublin. URL: https://github.com/ThFransen84/OIfst.

Kavanagh, Séamus (2001). A Lexicon of the Old Irish Glosses in the Würzburg Manuscript of the Epistles of St. Paul. Ed. by Dagmar S. Wodtko. Mitteilungen der Prähistorischen Kommission 45. + 1 CD-ROM. Wien: Verlag der Österreichischen Akademie der Wissenschaften. DOI: 10.1553/0x0001fb6e.

Mortensen, David R., Siddharth Dalmia, and Patrick Littell (May 2018). “Epitran: Precision G2P for Many Languages”. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Ed. by Nicoletta Calzolari (Conference chair) et al. Miyazaki, Japan: European Language Resources Association (ELRA).

Pedersen, Holger (1909–1913). Vergleichende Grammatik der keltischen Sprachen. 2 Vols. Göttingen: Vandenhoeck & Ruprecht.

Pellegrini, Matteo (2020). “Using LatInfLexi for an Entropy-Based Assessment of Predictability in Latin Inflection”. In: Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages. Marseille, France: European Language Resources Association (ELRA), pp. 37–46. URL: https://aclanthology.org/2020.lt4hala-1.6.

Pellegrini, Matteo and Marco Passarotti (2018). “LatInfLexi: an Inflected Lexicon of Latin Verbs”. In: Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018) (Turin, Italy, Dec. 10, 2018). Ed. by Elena Cabrio, Alessandro Mazzei, and Fabio Tamburini. Vol. 2253. CEUR Workshop Proceedings. Aachen. URL: http://ceur-ws.org/Vol-2253/paper23.pdf.

Stifter, David (2009). “Early Irish”. In: The Celtic Languages. Ed. by Martin Ball and Nicole Müller. Hoboken: Routledge.

Stifter, David et al. (2021). Corpus PalaeoHibernicum (CorPH) v1.0. URL: http://chronhib.maynoothuniversity.ie.

Thurneysen, Rudolf (1946). A Grammar of Old Irish. Trans. by Daniel A. Binchy and Osborn Bergin. Revised and enlarged edition. Dublin: Dublin Institute for Advanced Studies. Repr. 1993, with supplement.
Year: 2023
Language: English
Conference: International Congress of Celtic Studies XVII Utrecht
Location: Utrecht
Dates: 24 July 2023 to 28 July 2023