INEL Nenets corpus

The annotated corpus of Nenets (< Samoyedic < Uralic) is available for online search or download under a CC-BY-NC-SA license. Corpus size in words: 45,624. You will find full documentation here.


About

The INEL Nenets Corpus has been created within the long-term INEL project headed by Prof. Dr. Beáta Wagner-Nagy, scheduled for 2016–2033.

The corpus enables typologically informed, corpus-based grammatical research on the Nenets language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia.

The corpus includes texts recorded between 1940 and 2011 in both Nenets lects, Forest Nenets and Tundra Nenets. The majority of texts in this corpus originate from published works, which are cited in the relevant sections of the metadata. In particular, the following publications were used; full bibliographic information can be found in the reference section of the documentation:

Barmich 2018
Burkova 2008
Burkova 2012
Burkova et al. 2003
Hajdú 1968
Koshkareva et al. 2007
Labanauskas 2001
Logany & Logany 2016
Lyubinskaya 2022
Pusztay 1976
Tereshchenko 1956
Tereshchenko 1990
Turutina 2003
Yangasova 2018

Svetlana Burkova kindly shared a collection of her Forest Nenets data, including an original sound recording (Agan dialect), transcripts and glosses as Toolbox files and Word documents (Agan and Pur dialects), as well as published texts in the Pur (Turutina 2003) and Numto (Logany & Logany 2016) dialects.

All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English, German and Russian. An audio recording is also provided for one text.

Corpus size

Forest Nenets: 80 texts, 3,709 sentences, 23,597 tokens
Tundra Nenets: 56 texts, 6,545 sentences, 37,681 tokens
Total: 136 texts, 10,254 sentences, 61,278 tokens
Total duration of audio: 44 minutes 45 seconds

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Search

The Tsakorpus search system is used for the online search. You can search by lemma (root), word form, glosses and grammatical tags. To build an advanced search query, you can combine several parameters or specify a distance between search terms. You can also narrow down your search to a subcorpus. For more information, use the ❔ button at the top of the search page.

For offline search, you can download the corpus from the ZFDM Repository. A downloaded corpus can be browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Remote search with EXMARaLDA is also possible without downloading all the files (see here).
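If you prefer scripting to a graphical tool, a downloaded transcription can also be queried directly, since EXMARaLDA basic transcriptions (.exb) are XML files made up of tiers of time-aligned events. The following is only a minimal sketch under stated assumptions: the tier categories `tx` (word forms) and `ge` (English glosses), the function name `find_by_gloss` and the inline sample are all hypothetical, and the placeholder word forms are not real corpus data. Check the corpus documentation for the actual tier inventory before adapting it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature of an EXMARaLDA basic transcription (.exb).
# Tier categories "tx" and "ge" are assumptions; the content is invented.
SAMPLE_EXB = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0"/><tli id="T1"/><tli id="T2"/>
    </common-timeline>
    <tier id="tx" category="tx" type="t">
      <event start="T0" end="T1">word1</event>
      <event start="T1" end="T2">word2</event>
    </tier>
    <tier id="ge" category="ge" type="a">
      <event start="T0" end="T1">reindeer-PL</event>
      <event start="T1" end="T2">go-PST.3SG</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def find_by_gloss(exb_xml, gloss_tag, word_tier="tx", gloss_tier="ge"):
    """Return (word form, gloss) pairs whose gloss contains gloss_tag.

    Events on the two tiers are matched by their (start, end) timeline span.
    """
    root = ET.fromstring(exb_xml)

    def events(category):
        # Map each event's timeline span to its text for one tier.
        tier = root.find(f".//tier[@category='{category}']")
        if tier is None:
            return {}
        return {
            (e.get("start"), e.get("end")): (e.text or "").strip()
            for e in tier.iter("event")
        }

    words = events(word_tier)
    glosses = events(gloss_tier)

    hits = []
    for span, gloss in glosses.items():
        # Split the gloss on common morpheme and category separators.
        if gloss_tag in re.split(r"[-.=]", gloss):
            hits.append((words.get(span, ""), gloss))
    return hits
```

To use it on a real file, read the file's contents into a string and pass that string as `exb_xml`; the tier names will most likely need adjusting to those used in the downloaded corpus.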
