Bonami, O., Pellegrini, M., Derivation predicting inflection: A quantitative study of the relation between derivational history and inflectional behavior in Latin, Studies in Language, 2022; 46 (4): 753-792. [doi:10.1075/sl.21002.bon] [https://hdl.handle.net/10807/193867]
Derivation predicting inflection: A quantitative study of the relation between derivational history and inflectional behavior in Latin
Pellegrini, Matteo
2022
Abstract
In this paper, we investigate the value of derivational information in predicting the inflectional behavior of lexemes. We focus on Latin, for which large-scale data on both inflection and derivation are easily available. We train boosting tree classifiers to predict the inflection class of verbs and nouns with and without different pieces of derivational information. For verbs, we also model inflectional behavior in a word-based fashion, training the same type of classifier to predict wordforms given knowledge of other wordforms of the same lexemes. We find that derivational information is indeed helpful, and document an asymmetry between the beginning and the end of words, in that the final element in a word is highly predictive, while prefixes prove to be uninformative. The results obtained with the word-based methodology also allow for a finer-grained description of the behavior of different pairs of cells.