INEL Kamas corpus

The annotated corpus of Kamas (< Samoyedic < Uralic) is available for online search or download under a CC-BY-NC-SA license. Corpus size in words: 63806. You will find full documentation here.


About

The INEL Kamas Corpus has been created within the long-term INEL project headed by Prof. Dr. Beáta Wagner-Nagy, scheduled for 2016–2033.

The corpus enables typologically informed, corpus-based grammatical research on the Kamas language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia.

The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970.

Each text in the corpus is provided with morphological glossing, translations into English, Russian and German, and annotations of syntactic functions, semantic roles, Russian borrowings and code-switching. Some texts are also annotated for information status.

New in release 2.0

- In texts from Donner’s collection, phonetic transcription according to Klumpp’s edition of Donner’s manuscripts has been added (as stl tier)
- Five texts that were originally split across different tapes have been merged, along with the corresponding parts of the recordings. Sentences in each resulting text are numbered consecutively:
  - PKZ_196X_Alenushka_flk + PKZ_196X_Alenushka_continuation_flk > PKZ_196X_Alenushka_flk
  - End of PKZ_196X_SU0226 starting from PKZ_196X_SU0226.203 (210) + PKZ_196X_Alenushka2_continuation_flk > PKZ_196X_Alenushka2_flk
  - PKZ_196X_BlacksmithAndMerchant_flk + PKZ_196X_BlacksmithAndMerchant_cont_flk > PKZ_196X_BlacksmithAndMerchant_flk
  - PKZ_196X_Finist_flk + PKZ_196X_Finist_continuation_flk > PKZ_196X_Finist_flk
  - PKZ_196X_StupidWolf_flk + PKZ_196X_StupidWolf_continuation_flk > PKZ_196X_StupidWolf_flk
- Some of the texts are now annotated for existential, locative and possessive predication (ExLocPoss tier, by C.L. Däbritz)
- Numerous corrections in glosses, other annotations and transcriptions, including:
  - Fuller and more consistent transcription, glossing and annotations of borrowings
  - Vowel length is now marked in the mp tier in baːzoʔ ‘again’, büːzʼe ‘man’ and saːgər ‘black’
  - Corrections in disambiguation of polysemous or homonymous morphemes: -ziʔ "INS"/"COM", -də "LAT"/"3SG", mo- "can/become/want | мочь/стать/хотеть"
  - Possessive suffix unmarked for case: "NOM/GEN/ACC" > "POSS"
  - Glosses for personal pronouns were changed to uniform labels: "I | я" > "PRO1SG", "we | мы" > "PRO1PL", "you | ты" > "PRO2SG", "you.PL | вы" > "PRO2PL"
  - Fuller annotations of code-switching and calques (CS tier)
- ELAN *.eaf has been added as a supplementary end-user file format for all transcripts
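
For users who still work with data or scripts based on the earlier pronoun glosses, the relabeling listed above can be sketched as a simple lookup table. This is only an illustration: the function names are hypothetical, and it assumes that each gloss is available as a separate string exactly as quoted in the list; it does not describe how the corpus itself was converted.

```python
# Old bilingual pronoun glosses and their uniform labels, as listed
# in the release notes. Everything else in this sketch is hypothetical.
PRONOUN_GLOSSES = {
    "I | я": "PRO1SG",
    "we | мы": "PRO1PL",
    "you | ты": "PRO2SG",
    "you.PL | вы": "PRO2PL",
}


def normalize_gloss(gloss: str) -> str:
    """Return the uniform label for an old pronoun gloss.

    Glosses that are not in the table are returned unchanged.
    Surrounding whitespace is ignored when looking up the gloss.
    """
    return PRONOUN_GLOSSES.get(gloss.strip(), gloss)


def normalize_glosses(glosses: list[str]) -> list[str]:
    """Apply normalize_gloss to every gloss in a sequence."""
    return [normalize_gloss(g) for g in glosses]
```

For example, `normalize_glosses(["we | мы", "come-PST"])` replaces only the pronoun gloss and leaves the second item as it is.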

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements

Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu were provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA).
Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 were provided by the archive of the Institute for the Languages of Finland, Helsinki (KOTUS).
Scanned pages from Kai Donner’s Kamassisches Wörterbuch (Joki 1944) containing texts collected by Kai Donner are published online courtesy of the Finno-Ugrian Society.

Search

The online search uses the Tsakorpus search system. You can search by lemma (root), word form, glosses and grammatical tags. For an advanced query, you can combine several parameters or specify a distance between search terms. You can also narrow down your search to a subcorpus. For more information, use the ❔ button at the top of the search page.

For offline search, you can download the corpus from the ZFDM Repository. A downloaded corpus can be browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Remote search with EXMARaLDA is also possible without downloading all the files (see here).
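
Besides EXMARaLDA and ELAN, the downloaded *.eaf files can also be inspected with a short script, since ELAN files are XML. The following is a minimal sketch using only the Python standard library. It relies on the general EAF layout (TIER elements with a TIER_ID attribute that contain ANNOTATION_VALUE elements). The function names, the sample document and the assumption that a tier is literally named "mp" in the files are illustrative; check the tier identifiers in the actual files before use.

```python
import xml.etree.ElementTree as ET

# Tiny hand-made sample in EAF layout, for illustration only.
# It is NOT taken from the corpus files.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="mp" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>baːzoʔ</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>saːgər</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def load_eaf(path):
    """Parse an .eaf file from disk and return its root element."""
    return ET.parse(path).getroot()


def tier_values(root, tier_id):
    """Return all annotation values of the tier with the given TIER_ID."""
    values = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for value in tier.iter("ANNOTATION_VALUE"):
            values.append(value.text or "")  # empty annotations have no text
    return values


def find_in_tier(root, tier_id, needle):
    """Return the annotation values of a tier that contain a substring."""
    return [v for v in tier_values(root, tier_id) if needle in v]
```

With a real file, `tier_values(load_eaf("some_text.eaf"), "mp")` would list the annotations of that tier, where `some_text.eaf` stands for any downloaded transcript.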