This article provides an overview of the history of linguistic corpora, i.e., collections of written or spoken texts used for linguistic research. Starting from the earliest concordances and lexicographical efforts of the pre-electronic era, the article details the birth of modern linguistic corpora in the mid-20th century and the subsequent creation of large-scale corpora made possible by technological advancements. It also describes how supervised machine learning algorithms exploit annotated corpora for natural language processing, and how huge raw corpora are used to build Large Language Models in an unsupervised fashion.

Passarotti, M. C., Entry "Linguistic Corpora, History of", in International Encyclopedia of Language and Linguistics, Elsevier, London (UK) 2026:1 692-696. https://dx.doi.org/10.1016/B978-0-323-95504-1.00333-1 [https://hdl.handle.net/10807/339730]

Linguistic Corpora, History of

Passarotti, Marco Carlo
2026

English
International Encyclopedia of Language and Linguistics
9780323955041
Elsevier


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/339730