INEL Resources Portal

Cite as: Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2023. “INEL Kamas Corpus.” Version 2.0. Publication date 2023-12-31. http://hdl.handle.net/11022/0000-0007-FC25-4. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1.

Kamas

The corpus comprises audio recordings in Kamas together with their time-aligned, annotated, and glossed transcripts. Version 2.0 is available under open-access conditions. The corpus was created as part of the INEL sub-project Kamas.
The INEL Kamas Corpus consists of two parts: folklore texts collected by Kai Donner between 1912 and 1914, and transcribed audio recordings of the last Kamas speaker, Klavdiya Plotnikova, made between 1964 and 1970. Each text in the corpus has morphological glossing, translations into English, Russian, and German, and annotations of syntactic functions, semantic roles, Russian borrowings, and code-switching. Some texts are also annotated for information status.

Access to the corpus via the Center for Sustainable Research Data Management of Universität Hamburg:

Download the entire corpus as a ZIP archive with WAV and MP3 audio (4.1 GB), with MP3 audio only (492.7 MB), or without audio (84.9 MB)
Online search using the Tsakorpus platform
Complete overview of corpus contents (or see below)
Documentation
Remote search using the concordance tool EXAKT (EXMARALDA Analysis and Concordance Program)

New in Version 2.0:

In texts from Donner’s collection, a phonetic transcription following Klumpp’s edition of Donner’s manuscripts has been added (as the stl tier)
The following five texts, originally split across different tapes, have been merged, along with the corresponding parts of the recordings; sentences in each resulting text are numbered continuously:
- PKZ_196X_Alenushka_flk
- PKZ_196X_Alenushka2_flk
- PKZ_196X_BlacksmithAndMerchant_flk
- PKZ_196X_Finist_flk
- PKZ_196X_StupidWolf_flk
Some of the texts are now annotated for existential, locative, and possessive predication (ExLocPoss tier, by C.L. Däbritz)
Numerous corrections in glosses, other annotations, and transcriptions, including:
- Fuller and more consistent transcription, glossing, and annotation of borrowings
- Vowel length is now marked in the mp tier in baːzoʔ ‘again’, büːzʼe ‘man’, and saːgər ‘black’
- Corrections in disambiguating polysemous or homonymous morphemes: -ziʔ "INS"/"COM", -də "LAT"/"3SG", mo- "can/become/want | мочь/стать/хотеть"
- Possessive suffix unmarked for case: "NOM/GEN/ACC" > "POSS"
- Glosses for personal pronouns were changed to uniform labels: "I | я" > "PRO1SG", "we | мы" > "PRO1PL", "you | ты" > "PRO2SG", "you.PL | вы" > "PRO2PL"
- Fuller annotations of code-switching and calques (CS tier)
Added ELAN *.eaf as a supplementary end-user file format for all transcripts
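
For users who still work with data from an earlier version, the gloss relabelling listed above can be sketched as a simple lookup. This is a minimal illustration, not the procedure used by the corpus team: the names RELABELLED, relabel, and relabel_all are hypothetical, and the sketch assumes each gloss is available as a standalone string that matches one of the old labels exactly.

```python
# Old gloss labels and their version 2.0 replacements, as listed in the
# release notes. Matching on the whole label is an assumption of this sketch.
RELABELLED = {
    "I | я": "PRO1SG",
    "we | мы": "PRO1PL",
    "you | ты": "PRO2SG",
    "you.PL | вы": "PRO2PL",
    "NOM/GEN/ACC": "POSS",
}


def relabel(gloss: str) -> str:
    """Return the version 2.0 label for an old gloss, or the gloss unchanged."""
    return RELABELLED.get(gloss.strip(), gloss)


def relabel_all(glosses: list[str]) -> list[str]:
    """Apply relabel() to every gloss in a sequence, preserving order."""
    return [relabel(g) for g in glosses]
```

Glosses that are not in the table pass through unchanged, so the function can be applied to a whole gloss tier without affecting other labels.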
