Tekstaro de Esperanto
The Tekstaro de Esperanto (Corpus of Esperanto) is a text corpus of the Esperanto language: a large and varied collection of texts intended for linguistic research on Esperanto. As of January 2019, the corpus contained 5,177,208 words.[1] It can be searched with regular expressions, including custom search terms that are lexical (e.g., sequences of Esperanto letters) or grammatical (e.g., active participial suffixes, passive participial suffixes, and adjectival suffixes).[2]
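As a rough illustration of what a grammatical search term of this kind might stand for, the following Python sketch matches Esperanto participles by surface form, using only the standard re module. It is not the Tekstaro's own query syntax or implementation; the names LETTER, ACTIVE, PASSIVE and find_participles are hypothetical and introduced here for illustration only. The patterns rely on the regular participial suffixes of Esperanto (active -ant-, -int-, -ont-; passive -at-, -it-, -ot-) followed by a word ending.

```python
import re

# Assumption: lowercase Esperanto letters, including the six with diacritics.
LETTER = r"[a-zĉĝĥĵŝŭ]"

# Active participles: stem + -ant-/-int-/-ont- + ending
# (-a or -o with optional plural -j and accusative -n, or adverbial -e).
ACTIVE = re.compile(rf"\b{LETTER}+[aio]nt(?:[ao]j?n?|e)\b", re.IGNORECASE)

# Passive participles: stem + -at-/-it-/-ot- + the same endings.
PASSIVE = re.compile(rf"\b{LETTER}+[aio]t(?:[ao]j?n?|e)\b", re.IGNORECASE)


def find_participles(text):
    """Return (active, passive) lists of surface matches found in text."""
    return ACTIVE.findall(text), PASSIVE.findall(text)


sample = "La leganta knabo vidis la skribitan leteron."
active, passive = find_participles(sample)
print(active)   # ['leganta']
print(passive)  # ['skribitan']
```

Because the match is purely on letter sequences, ordinary words that happen to end the same way (for example the nouns elefanto or rato) are matched as well; a real corpus search would need additional filtering or morphological information to exclude them.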
History
In 2002 the Esperantic Studies Foundation (ESF) started the project to support linguistic study of Esperanto. ESF hired Bertilo Wennergren to plan and create the first phase of the project, which finished at the end of April 2003. Wennergren was aided by Ilona Koutny, Jouko Lindstedt, Carlo Minnaja, Christopher Gledhill, and Mauro La Torre.
In 2006, planning began for the Parola tekstaro de Esperanto (Speech corpus of Esperanto).
References
External links
- Official website (in Esperanto)
- Interview with Bertilo Wennergren about the Tekstaro de Esperanto in Libera Folio (in Esperanto)
English
- American National Corpus
- Bank of English
- Bergen Corpus of London Teenage Language
- British National Corpus
- Brown Corpus
- Buckeye Corpus
- Cambridge English Corpus
- Corpus of Contemporary American English
- Enron Corpus
- EnTenTen
- International Corpus of English
- Lancaster-Oslo-Bergen Corpus
- Oxford English Corpus
- PropBank
- Spoken English Corpus
- Switchboard Telephone Speech Corpus
- TIMIT
- VerbNet
- Wellington Corpus of Spoken New Zealand English
non-English
- Bijankhan Corpus
- CHILDES
- CorCenCC National Corpus of Contemporary Welsh
- Croatian Language Corpus
- Croatian National Corpus
- Czech National Corpus
- Europarl Corpus
- German Reference Corpus
- Hamshahri Corpus
- National Corpus of Polish
- Neo-Assyrian Text Corpus Project
- Persian Speech Corpus
- Quranic Arabic Corpus
- Russian National Corpus
- Scottish Corpus of Texts and Speech
- Slovenian National Corpus
- TalkBank
- Tatoeba
- Tehran Monolingual Corpus
- Tekstaro de Esperanto
- TenTen Corpus Family
- Thesaurus Linguae Graecae