INEL Kalmyk corpus

The annotated corpus of Kalmyk (< Mongolic) is available for online search or download under a CC-BY-NC-SA license. Corpus size: 19,742 words. You will find full documentation here.


About

The INEL Kalmyk Corpus has been created within the long-term INEL project headed by Prof. Dr. Beáta Wagner-Nagy, scheduled for 2016–2033.

The corpus enables typologically informed, corpus-based grammatical research on the Kalmyk language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia.

The corpus consists of transcribed audio recordings collected between 2007 and 2018 in the Ketchenerovsky District of the Republic of Kalmykia (Derbet and Torgut dialects).

All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English and Russian. All texts for which audio recordings were accessible are time-aligned with them.

Corpus Size

The corpus contains 55 texts, 2,076 sentences, and 19,742 tokens. The total duration of the audio recordings is 4 hours and 23 minutes.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions / Acknowledgements

Native speakers generously shared their knowledge of Kalmyk, making the creation of this corpus possible. Zamira Xejchieva and Galina Cabdy`rova assisted with oral transcription and the Russian translation of the audio materials.

Some of the materials were recorded during joint expeditions of St. Petersburg University and the Institute for Linguistic Studies of the Russian Academy of Sciences in 2007–2008, under the direction of Elena Perekhvalskaya and Sergey Say.

This corpus primarily follows the transcription system and partially adopts the glossing conventions developed by a research team led by Sergey Say, with input from other expedition participants.

Search

The Tsakorpus search system is used for the online search. You can search by lemma (root), word form, glosses and grammatical tags. To build an advanced search query, you can combine several parameters or specify a distance between search terms. You can also narrow down your search to a subcorpus. For more information, use the ❔ button at the top of the search page.

For offline search, you can download the corpus from the ZFDM Repository. A downloaded corpus can be browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Remote search with EXMARaLDA is also possible without downloading all the files (see here).
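For users who prefer scripting over a graphical tool, a downloaded transcription can also be inspected programmatically. The following is a minimal sketch, not part of the corpus documentation. It assumes that the files follow the general layout of an EXMARaLDA basic transcription (a common timeline of `tli` elements plus `tier` elements containing `event` elements). The embedded sample, its placeholder text, the tier category names `tx` and `fe`, and the function names `load_timeline` and `search_tier` are all hypothetical and chosen for illustration only; check the corpus documentation for the actual tier names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a downloaded transcription file.
# The text is placeholder material, not real corpus data.
SAMPLE_EXB = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="1.5"/>
      <tli id="T2" time="3.0"/>
    </common-timeline>
    <tier id="tx" category="tx" type="t">
      <event start="T0" end="T1">alpha beta</event>
      <event start="T1" end="T2">gamma</event>
    </tier>
    <tier id="fe" category="fe" type="a">
      <event start="T0" end="T1">First placeholder translation.</event>
      <event start="T1" end="T2">Second placeholder translation.</event>
    </tier>
  </basic-body>
</basic-transcription>
"""


def load_timeline(root):
    """Map each timeline item id to its time offset in seconds."""
    return {
        tli.get("id"): float(tli.get("time"))
        for tli in root.iter("tli")
        if tli.get("time") is not None
    }


def search_tier(root, category, query):
    """Return (start, end, text) for events in the given tier category
    that contain `query` as a whitespace-separated token (case-insensitive)."""
    times = load_timeline(root)
    hits = []
    for tier in root.iter("tier"):
        if tier.get("category") != category:
            continue
        for event in tier.iter("event"):
            text = "".join(event.itertext()).strip()
            if query.lower() in text.lower().split():
                start = times.get(event.get("start"))
                end = times.get(event.get("end"))
                hits.append((start, end, text))
    return hits


root = ET.fromstring(SAMPLE_EXB)
for start, end, text in search_tier(root, "tx", "beta"):
    print(start, end, text)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE_EXB)`. The sketch matches whole tokens only; searching by gloss or grammatical tag would require reading the corresponding annotation tiers instead.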