In “Automatic morphological analysis and interlinking of historical Irish cognate verb forms”, Theodorus Fransen describes a computational approach to understanding how the Irish verbal system develops diachronically. The author’s major contribution is a morphological analyser for Old Irish verbs, together with a discussion of how this analyser can be incorporated into a framework of computational resources for various stages of Irish. The proposal dovetails with Jøhndal’s and Meelen’s chapters in addressing how to expand the current computational toolset for a historical language (specifically, historical stages of Irish) and in its concern with scalability. This concern is reflected in his detailed investigation of the challenges that arise when finite-state morphology is applied to Old Irish. The challenges he details are twofold. The first concerns word and morpheme division as encountered in “real” text, i.e. editions or manuscript transcriptions. In many cases, multiple morphemes are written as a single concatenated string, so the analyser must encode which combinations of morphemes are licit. This is a so-called generation problem, where generation means the analyser’s ability to generate all and only the licit inflected forms of a given stem. In other cases, whitespace separates morphemes, leading to potential parsing ambiguities because the analyser is word-based (a word being understood as a string delimited by whitespace). This is a so-called analysis problem, which may result in the wrong morphological tag being assigned to a given string. The second challenge concerns the complex interaction between phonology (especially stress) and morphology in Old Irish, since stress alternations can result in syncope and affect whether stem-final and ending-initial consonants are palatalised.
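The generation and analysis problems can be illustrated with a toy sketch. This is not Fransen's analyser and uses no finite-state library; the morpheme strings, tags and the single combinatorial constraint below are simplified, hypothetical placeholders loosely inspired by Old Irish, not real paradigm data. Generation enumerates slot combinations and keeps only the licit ones; analysis, being word-based, looks up each whitespace-delimited token on its own.

```python
from itertools import product

# Hypothetical morpheme slots (simplified placeholders, not real Old Irish data).
# Each slot maps a surface string to a tag; a None tag marks an empty slot.
SLOTS = [
    {"": None, "ni": "NEG"},       # optional negative particle
    {"": None, "m": "INF.1SG"},    # optional infixed pronoun
    {"beir": "STEM"},              # obligatory stem
    {"": "3SG", "it": "3PL"},      # person/number ending
]

def licit(tags):
    """Hypothetical constraint: an infixed pronoun needs a preceding particle."""
    return "INF.1SG" not in tags or "NEG" in tags

def generate():
    """Generation: build every concatenation of slots and keep only licit ones."""
    lexicon = {}
    for combo in product(*(slot.items() for slot in SLOTS)):
        surface = "".join(form for form, _ in combo)
        tags = tuple(tag for _, tag in combo if tag is not None)
        if licit(tags):
            lexicon.setdefault(surface, []).append(tags)
    return lexicon

def analyse(text, lexicon):
    """Analysis: a word-based analyser looks up each whitespace token separately."""
    return [(token, lexicon.get(token)) for token in text.split()]
```

With `lex = generate()`, the concatenated string `"nimbeir"` receives the full analysis `("NEG", "INF.1SG", "STEM", "3SG")`, whereas `analyse("ni beir", lex)` returns `[("ni", None), ("beir", [("STEM", "3SG")])]`: once whitespace splits the particle from the verb, the verb is tagged as if it were affirmative, which is the kind of wrong tag the analysis problem describes.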
These challenges bear on the choices made in implementing the finite-state transducer. For instance, does one rely on a strictly rule-based approach, using “flag” morphemes or upper-level filters to specify licit combinations (the generation problem) and to handle stem variants induced by stress alternations? Or does one hard-code (i.e. list) such stem variation or parts of paradigms? Fransen carefully weighs the advantages of the different approaches in order to ensure the applicability of his analyser. He also envisions a fully functioning POS-tagger for both Old and Middle Irish, and makes suggestions for enabling interoperability of resources, especially between his morphological analyser and Dereza’s (2018) Old Irish lemmatiser.
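The contrast between rule-based constraints and listing can be sketched in plain Python. The flag check below is loosely modelled on Xerox/foma-style flag diacritics (P sets a feature, R requires a value, D disallows it) but is a simplified simulation, not a real transducer API; the feature name `PART`, the morpheme tuples and the `"do-beir"` lookup key are hypothetical. The listed alternative uses the textbook deuterotonic/prototonic pair do·beir / ·tabair 'gives', and does not claim to reproduce Fransen's actual lexicon.

```python
# Simplified simulation of flag-diacritic-style constraints (hypothetical features).
# Each morpheme is (surface, flags); each flag is (operation, feature, value).

def flags_ok(path):
    """Return True if the flags along a morpheme path are mutually consistent."""
    state = {}
    for _surface, flags in path:
        for op, feat, val in flags:
            if op == "P":                                  # set feature
                state[feat] = val
            elif op == "R" and state.get(feat) != val:     # require value
                return False
            elif op == "D" and state.get(feat) == val:     # disallow value
                return False
    return True

NEG = ("ni", [("P", "PART", "ON")])   # particle sets PART=ON
INF = ("m", [("R", "PART", "ON")])    # infixed pronoun requires PART=ON
STEM = ("beir", [])

# Alternative: hard-code (list) stress-conditioned stem variants instead of
# deriving them by rule. The key format ("do-beir", stress) is hypothetical.
LISTED_VARIANTS = {
    ("do-beir", "deuterotonic"): "do·beir",
    ("do-beir", "prototonic"): "·tabair",
}

def stem_for(lemma, stress):
    """Look up a listed variant; a KeyError means the form was never listed."""
    return LISTED_VARIANTS[(lemma, stress)]
```

Here `flags_ok([NEG, INF, STEM])` accepts the path while `flags_ok([INF, STEM])` rejects it, so one constraint covers every paradigm cell, whereas `stem_for` succeeds only for variants someone has explicitly listed. That trade-off between compact rules and transparent, easily corrected lists is the kind of choice the paragraph above describes.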

Fransen, T., “Automatic morphological analysis and interlinking of historical Irish cognate verb forms” (chapter 3), in Lash, E., Qiu, F., Stifter, D. (eds.), Morphosyntactic Variation in Medieval Celtic Languages: Corpus-Based Approaches, De Gruyter Mouton, Berlin 2020: 49-84. doi:10.1515/9783110680744-004 [https://hdl.handle.net/10807/270187]

3 Automatic morphological analysis and interlinking of historical Irish cognate verb forms

Fransen, Theodorus
2020

Language: English
Published in: Morphosyntactic Variation in Medieval Celtic Languages: Corpus-Based Approaches
Publisher: De Gruyter Mouton


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/270187