This paper describes the creation of a workbench tool designed to make technologies developed throughout the lifespan of the Cardamom project easily accessible to researchers who could most benefit from them, but who may not have the technical expertise to apply cutting-edge technologies to their own datasets. The workbench provides an intuitive graphical user interface (GUI) and workflow that abstract users away from underlying technical tasks, while providing them with a suite of powerful NLP tools developed by the Cardamom team. These include tokenisers, POS-taggers, various annotation tools, and ML models. The performance of workbench tools can be improved as text and annotations are added by users. It is envisioned that this workbench will provide a simple route to digital publication for academics in the humanities, or more specifically, for linguists working with under-resourced or historical languages, who have collected text data but are unable to make it available online as a result of financial or technical constraints. This has the added benefit of increasing the availability of high-quality annotated text data to NLP researchers, thereby providing value to both communities of researchers.

Doyle, A., Fransen, T., Stearns, B., McCrae, J. P., Dereza, O., Rani, P., The Cardamom Workbench for Historical and Under-Resourced Languages, in Proceedings of the 4th Conference on Language, Data and Knowledge, (Vienna, Austria, 12-15 September 2023), NOVA CLUNL, Portugal, Lisbon 2023: 109-120 [https://hdl.handle.net/10807/270174]

The Cardamom Workbench for Historical and Under-Resourced Languages

Fransen, Theodorus
2023

English
Proceedings of the 4th Conference on Language, Data and Knowledge
4th Conference on Language, Data and Knowledge
Vienna, Austria
12 September 2023
15 September 2023
978-989-54081-5-3
NOVA CLUNL, Portugal

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/270174