INEL Dolgan Corpus released
Dolgan is an endangered Turkic language of Northern Siberia. It is spoken by approximately 1,000 people on the Taymyr peninsula and in adjacent areas. Dolgan is closely related to Yakut (Sakha) but nevertheless differs from it in many respects. It is in close contact with the neighboring languages Nganasan, Enets and Evenki as well as with Russian.
The corpus contains folklore and narrative texts as well as spontaneous conversations. All material is interlinearly glossed; parts of it are additionally annotated for Semantic Roles, Syntactic Functions, Information Status and Structure as well as Borrowing and Code-Switching. Roughly half of the material is aligned with the corresponding sound files, amounting to ca. 10 hours of Dolgan speech in total.
The INEL Dolgan corpus is composed of texts from different sources:
1. Published folklore texts from an edited volume (“Fol’klor Dolgan”, P.E. Efremov 2000),
2. Transcripts of recordings provided by the Taymyr House of Folk Art (TDNT) in Dudinka (1970s-2000s),
3. Transcripts from the collection of Dr. Eugénie Stapert recorded on several fieldwork trips in 2007-2010,
4. Transcripts of recordings made on a fieldwork trip in 2017.
Accessing the corpus
The data in the corpus (annotated texts as well as the corresponding metadata) are stored in the XML formats of the freely distributed EXMARaLDA suite (http://exmaralda.org/en/).
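Because the files are plain XML, they can also be read with generic tools outside EXMARaLDA. The following is a minimal Python sketch, not an official tool: it assumes the usual layout of an EXMARaLDA basic transcription (a common timeline of `tli` elements and `tier` elements holding `event` elements). The tier ids `tx` and `ge`, the speaker code and the event texts in the sample are hypothetical placeholders, not data from the corpus; the actual tier names are described in the user documentation.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an EXB file. Tier ids, speaker code and event
# texts are placeholders (assumptions), not material from the corpus.
SAMPLE_EXB = """<basic-transcription>
  <head/>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="1.5"/>
      <tli id="T2" time="3.0"/>
    </common-timeline>
    <tier id="tx" category="tx" type="t" speaker="SPK0">
      <event start="T0" end="T1">word1</event>
      <event start="T1" end="T2">word2</event>
    </tier>
    <tier id="ge" category="ge" type="a" speaker="SPK0">
      <event start="T0" end="T1">gloss1</event>
      <event start="T1" end="T2">gloss2</event>
    </tier>
  </basic-body>
</basic-transcription>
"""


def read_tiers(exb_xml):
    """Return (timeline, tiers) parsed from the text of an EXB document.

    timeline maps timeline ids to offsets in seconds (where given);
    tiers maps each tier id to a list of (start_id, end_id, text) tuples.
    """
    root = ET.fromstring(exb_xml)
    timeline = {
        tli.get("id"): float(tli.get("time"))
        for tli in root.iter("tli")
        if tli.get("time") is not None
    }
    tiers = {}
    for tier in root.iter("tier"):
        tiers[tier.get("id")] = [
            (ev.get("start"), ev.get("end"), (ev.text or "").strip())
            for ev in tier.iter("event")
        ]
    return timeline, tiers


def search(tiers, tier_id, needle):
    """Return all events on one tier whose text contains the substring needle."""
    return [ev for ev in tiers.get(tier_id, []) if needle in ev[2]]


timeline, tiers = read_tiers(SAMPLE_EXB)
hits = search(tiers, "ge", "gloss2")
```

For a downloaded file, the same function could be applied to the file's contents read as text. This covers only simple substring lookups; EXAKT remains the intended tool for real queries over the corpus.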
User documentation (in English) is available here: INEL_Dolgan_Corpus.pdf
For browsing (and playback) of individual texts, use the «Sessions» tab on the main corpus page. Each text can be viewed in one of three online formats (e.g. Visualizations: Score) and downloaded in EXB (an EXMARaLDA format). The sources of the texts, i.e. scanned pages (PDF) or sound files (WAV, MP3), can also be viewed or downloaded.
For searching across the whole corpus, the complete archive of the corpus files can be downloaded and searched with the EXAKT program of the EXMARaLDA suite.
Furthermore, an online search interface based on the Tsakonian Corpus Platform (Tsakorpus) will be launched in the next few weeks.
Please send your comments and suggestions to: firstname.lastname@example.org.