Further resources

The following compilation of relevant digital language resources is continuously being expanded.

Is something missing? Please let us know (inel@uni-hamburg.de) which reference we should add.

Author Title Year Abstract DOI/URL BibTeX
Arkhangelskiy, T. Komi-Zyrian corpora 2019
Currently, two corpora are available: the corpus of contemporary written literary Komi-Zyrian (“the Main corpus”) and the corpus of social media in Komi-Zyrian.
      author = {Arkhangelskiy, Timofey},
      title = {Komi-Zyrian corpora},
      year = {2019},
      url = {http://komi-zyrian.web-corpora.net/index_en.html}


Arkhangelskiy, T. Meadow Mari corpora 2019
      author = {Timofey Arkhangelskiy},
      title = {Meadow Mari corpora},
      year = {2019},
      url = {http://meadow-mari.web-corpora.net/index_en.html}


Arkhangelskiy, T. Erzya corpora 2018
      author = {Timofey Arkhangelskiy},
      title = {Erzya corpora},
      year = {2018},
      url = {http://erzya.web-corpora.net/index_en.html}


Arkhangelskiy, T. Moksha corpora 2018
Currently, two corpora are available: the corpus of contemporary written literary Moksha (“the Main corpus”) and the corpus of Moksha-language social media and forums. They differ in what kind of texts they contain, but have mostly identical annotation and search capabilities.
      author = {Timofey Arkhangelskiy},
      title = {Moksha corpora},
      year = {2018},
      url = {http://moksha.web-corpora.net/index_en.html}


Bergmann, M., Lublinskaya, M. and Sherstinova, T. Nenets Multimedia Phrasebook 2003
      author = {Markus Bergmann and Marina Lublinskaya and Tatiana Sherstinova},
      title = {Nenets Multimedia Phrasebook},
      year = {2003},
      url = {https://iling.spb.ru/nord/materia/nenec_phrasebook/introduction.html}


Blokland, R., Chuprov, V., Fedina, M., Levchenko, D., Partanen, N. and Rießler, M. Spoken Komi Corpus. The Language Bank of Finland, forthcoming version 0.1. 2021
      author = {Blokland, Rogier and Chuprov, Vasily and Fedina, Marina and Levchenko, Dmitry and Partanen, Niko and Rießler, Michael},
      title = {Spoken Komi Corpus. The Language Bank of Finland, forthcoming version 0.1.},
      year = {2021},
      url = {http://urn.fi/urn:nbn:fi:lb-2019121603}


Blokland, R., Chuprov, V., Levchenko, D., Fedina, M., Partanen, N. and Rießler, M. Komi media collection 2016
      author = {Rogier Blokland and Chuprov, Vasilij and Levchenko, Dmitrij and Fedina, Maria and Partanen, Niko and Rießler, Michael},
      title = {Komi media collection},
      year = {2016},
      url = {http://videocorpora.ru}


Blokland, R., Partanen, N. and Rießler, M. Spoken Komi Corpus: Erik Vászolyi. Zenodo data repository 2021
      author = {Blokland, Rogier and Partanen, Niko and Rießler, Michael},
      title = {Spoken Komi Corpus: Erik Vászolyi. Zenodo data repository},
      year = {2021},
      doi = {https://doi.org/10.5281/zenodo.4591281}


Bradley, J. Hill Mari Dictionary 2021
      author = {Bradley, Jeremy},
      title = {Hill Mari Dictionary},
      year = {2021},
      url = {https://www.univie.ac.at/maridict/site-2014/hill-dict-prop.php}


Bradley, J. Mari textbook 2021
This Mari textbook is an English-language version of the 1990/1991 Russian-language textbook Марийский язык для всех, Volumes I & II, extensively adapted to suit the needs of self-learners. Accompanying audio materials and other supplementary resources are provided as well.
      author = {Bradley, Jeremy},
      title = {Mari textbook},
      year = {2021},
      url = {https://www.univie.ac.at/maridict/site-2014/book.php?int=0}


Bradley, J. Mari-English dictionary 2021
A Mari-English dictionary with 42,560 headwords and 82,740 subentries, including 10,750 set phrases.
      author = {Bradley, Jeremy},
      title = {Mari-English dictionary},
      year = {2021},
      url = {https://www.univie.ac.at/maridict/site-2014/dict.php?int=0}


Bradley, J. Corpus of Literary Mari 2020
The Mari corpus project was initiated by scholars from Ghent (Alexandra Simonenko), Helsinki (Jack Rueter, Niko Partanen), Moscow (Anna Volkova), Munich/Vienna (Jeremy Bradley), Tromsø (Trond Trosterud), Turku (Jorma Luutonen), and Yoshkar-Ola (Andrey Chemyshev, Gennadiy Sabantsev, Nadezhda Timofeeva). It represents an effort to create a morphologically annotated corpus of literary Mari (both Meadow Mari and Hill Mari) searchable in myriad ways (by lexeme, by morphological pattern, by syntactic pattern). The first working version of this corpus was released on 23 December 2020 and contains 57.38 million tokens of Meadow Mari texts and 6.25 million tokens of Hill Mari text. Texts represent different genres (fiction, non-fiction, law, news, science) and represent over a century of Mari literacy.
      author = {Bradley, Jeremy},
      title = {Corpus of Literary Mari},
      year = {2020},
      url = {https://www.univie.ac.at/maridict/site-2014/corpus-desc.php?int=0}


Brykina, M., Gusev, V., Szeverényi, S. and Wagner-Nagy, B. Nganasan Spoken Language Corpus (NSLC) 2018
This second version 0.2 of the corpus is a subcorpus that comprises 177 communications, 136 of which contain an aligned audio recording, with glossed (Toolbox/FLEx) and annotated (EXMARaLDA) transcripts from 57 speakers. All texts have been translated into Russian and English, some also into German. The corpus also contains rich metadata on the communications and speakers.
      author = {Maria Brykina and Valentin Gusev and Sándor Szeverényi and Beáta Wagner-Nagy},
      title = {Nganasan Spoken Language Corpus (NSLC)},
      year = {2018},
      note = {Archived in Hamburger Zentrum für Sprachkorpora. Version 0.2. Publication date 2018-06-12.},
      url = {http://hdl.handle.net/11022/0000-0007-C6F2-8}


Brykina, M., Gusev, V., Szeverényi, S. and Wagner-Nagy, B. Nganasan Spoken Language Corpus (NSLC) 2016
      author = {Maria Brykina and Valentin Gusev and Sándor Szeverényi and Beáta Wagner-Nagy},
      title = {Nganasan Spoken Language Corpus (NSLC)},
      year = {2016},
      note = {Archived in Hamburger Zentrum für Sprachkorpora. Version 0.1. Publication date 2016-12-23.},
      url = {http://hdl.handle.net/11022/0000-0001-B36C-C}


Budzisch, J., Harder, A. and Wagner-Nagy, B. Selkup Language Corpus (SLC) 2019
      author = {Josefina Budzisch and Anja Harder and Beáta Wagner-Nagy},
      title = {Selkup Language Corpus (SLC)},
      year = {2019},
      note = {Archived in Hamburger Zentrum für Sprachkorpora. Version 1.0.0. Publication date 2019-02-08.},
      url = {http://hdl.handle.net/11022/0000-0007-D009-4}


Fu-Lab_team Корпус Коми Языка 2020
      author = {Fu-Lab_team},
      title = {Корпус Коми Языка},
      year = {2020},
      url = {http://komicorpora.ru/}


Goussev, V. LangueDOC: Nganasan XX
      author = {Valentin Goussev},
      title = {LangueDOC: Nganasan},
      year = {XX},
      url = {http://www.philol.msu.ru/languedoc/eng/ngan/index.php}


Kazakevich, O. Minority languages of Siberia as our cultural heritage: Ket XX
      author = {Kazakevich, O.},
      title = {Minority languages of Siberia as our cultural heritage: Ket},
      year = {XX},
      url = {http://siberian-lang.srcc.msu.ru/en/textspage?field_word_lang_tid%5B%5D=44&field_text_type_tid=All&field_term_place_tid=All&field_informant_nid=All}


Kazakevich, O. Minority languages of Siberia as our cultural heritage: Selkup XX
      author = {Kazakevich, Olga},
      title = {Minority languages of Siberia as our cultural heritage: Selkup},
      year = {XX},
      url = {http://siberian-lang.srcc.msu.ru/en/textspage?field_word_lang_tid%5B%5D=43&field_text_type_tid=All&field_term_place_tid=All&field_informant_nid=All}


Kazakevich, O. Minority languages of Siberia as our cultural heritage: Evenki XX
      author = {Kazakevich, O.},
      title = {Minority languages of Siberia as our cultural heritage: Evenki},
      year = {XX},
      url = {http://siberian-lang.srcc.msu.ru/en/textspage?field_word_lang_tid%5B%5D=42&field_text_type_tid=All&field_term_place_tid=All&field_informant_nid=All}


Khanina, O. and Shluinsky, A. Endangered Languages and Cultures of Siberia: Forest Enets XX
      author = {Olesya Khanina and Andrey Shluinsky},
      title = {Endangered Languages and Cultures of Siberia: Forest Enets},
      year = {XX},
      url = {http://www.siberianlanguages.surrey.ac.uk/summary/forest-enets/}


Khomchenkova, I., Pleshak, P. and Stoynova, N. The corpus of contact-influenced Russian of Northern Siberia and the Russian Far East XX
      author = {Irina Khomchenkova and Polina Pleshak and Natalia Stoynova},
      title = {The corpus of contact-influenced Russian of Northern Siberia and the Russian Far East},
      year = {XX},
      url = {http://web-corpora.net/wsgi3/ruscontact/search}


Kokkonen, P. Komi Zyrian Corpus 2014
1. Jesus Friend of Children.
ISBN 91-88394-64-6, ISBN 952-9790-13-9.
Institute for Bible Translation.
Stockholm & Helsinki 1994. 65 pp.
Document type: running text.
Size of the corpus: 7,338 words, 48,883 characters.
Character encoding: Cyrillic alphabet converted to the ISO 8859-1 (Latin-1).
2. Gospel of Mark in Komi-Zyrian language.
(Preliminary edition.)
ISBN 91-88394-79-4, ISBN 952-9790-20-1. 71 pp.
Institute for Bible Translation.
Stockholm & Helsinki 1995.
Document type: running text.
Size of the corpus: 11,932 words, 86,108 characters.
Character encoding: Cyrillic alphabet converted to the ISO 8859-1 (Latin-1).
3. Gospel of Luke in Komi-Zyrian language.
(Trial edition.)
ISBN 91-88794-32-6, 952-9790-32-5. 137 pp.
Institute for Bible Translation.
Stockholm & Helsinki 1996.
Document type: running text.
Size of the corpus: 14,677 words, 101,908 characters.
Character encoding: Cyrillic alphabet converted to the ISO 8859-1 (Latin-1).
4. Gospel of John in Komi-Zyrian language.
(Trial edition.)
ISBN 91-88794-88-1, 952-9790-44-9. 97 pp.
Institute for Bible Translation.
Stockholm & Helsinki 1997.
Document type: running text.
Size of the corpus: 14,769 words, 102,504 characters.
Character encoding: Cyrillic alphabet converted to the ISO 8859-1 (Latin-1).

5. The New Testament in the Komi-Zyrian language
ISBN 10: 978-952-5634-16-7
Institute for Bible Translation
Helsinki/Izhevsk 2008
The document types: htm, rtf, txt, xml
The size of the corpus: 6,65 Mt (calculated from the rtf-file)

6. Morphologically encoded Komi texts
The Komi texts are in three formats: running texts, in the sentence-per-line format, and morphologically encoded format. They include the following:
(1) two short stories by a Komi writer, (2) the text of a booklet for children, (3) an article from a Komi newspaper, (4) a scientific text (in two parts) from a Komi periodical and (5) religious texts:
(1) N'ina Kuratova (1983). Bobön'an' kör, Povest'jas, vis'tjas.
Komi kn'izhnöj izdatel'stvo, Syktyvkar.
(2) Rots'ev, Jegor (1987). Mitruk petö tundrays', 3 - 65.
Komi knizhnoj izdatelstvo, Syktyvkar.
(3) P. Stolpovskij, SSSR-ys' pisat'el'jas sojuzsa ts'l'en. Komi mu 1991: 4.
(4) Tsypanov, Jevgenij (1989). VK: 6, 49 - 55.

      author = {Kokkonen, Paula},
      title = {Komi Zyrian Corpus},
      year = {2014},
      url = {http://urn.fi/urn:nbn:fi:lb-2014032615}


Leisiö, L. Nganasan Speech Corpus 2020
The corpus contains video and audio recordings from 1986-2013 of fairy tales, songs, biographies, recollections and stories, as well as discussions on everyday issues in Nganasan and their linguistic transcripts. The corpus also contains photographs.
      author = {Larisa Leisiö},
      title = {Nganasan Speech Corpus},
      year = {2020},
      url = {http://urn.fi/urn:nbn:fi:lb-2014100302}


Lublinskaya, M., Goussev, V. and Sherstinova, T. Nganasan Multimedia Dictionary 2000
The dictionary contains more than 3600 words. It reflects traditional Nganasan culture and way of life rather well, both of which are now almost unknown to the younger generation. The dictionary does not contain recent Russian loanwords, as they were not influenced by the phonological system of Nganasan and preserve the Russian pronunciation. The dictionary includes all known Nganasan idioms. Some of the entries are accompanied by phrase or context examples. All words, idioms and phrases are provided with audio examples.
      author = {Marina Lublinskaya and Valentine Goussev and Tatiana Sherstinova},
      title = {Nganasan Multimedia Dictionary},
      year = {2000},
      url = {https://iling.spb.ru/nord/materia/nganasan/introduction.html}


Medvedeva, M. and Arkhangelskiy, T. Udmurt corpora 2018
Currently, three corpora are available: the corpus of contemporary written literary Udmurt (“the Main corpus”), the corpus of Udmurt-language social media and the Sound-aligned corpus of Udmurt dialects. They differ in what kind of texts they contain, but have mostly identical annotation and search capabilities.
      author = {Medvedeva, Maria and Timofey Arkhangelskiy},
      title = {Udmurt corpora},
      year = {2018},
      url = {http://udmurt.web-corpora.net/index_en.html}


Mus, N. Tundra Nenets monolingual corpus XX
The Tundra Nenets Monolingual Corpus is a collection of full texts and text excerpts (dominantly) representing the written version of Tundra Nenets. The current size of TNMC (as of January 2023) is 467,212 tokens.
      author = {Nikolett Mus},
      title = {Tundra Nenets monolingual corpus},
      year = {XX},
      url = {https://tundranenetsdata.nytud.hu/bonito/run.cgi/first_form}


Nikolaeva, I. Endangered Languages and Cultures of Siberia: Tundra Nenets 2014
      author = {Irina Nikolaeva},
      title = {Endangered Languages and Cultures of Siberia: Tundra Nenets},
      year = {2014},
      url = {http://www.siberianlanguages.surrey.ac.uk/summary/tundra-nenets/}


Nikolaeva, I. Tundra Nenets texts 2010
      author = {Nikolaeva, Irina},
      title = {Tundra Nenets texts},
      year = {2010},
      doi = {http://hdl.handle.net/2196/00-0000-0000-0001-D85D-3}


Nikolaeva, I. Endangered Languages and Cultures of Siberia: Northern Khanty XX
      author = {Irina Nikolaeva},
      title = {Endangered Languages and Cultures of Siberia: Northern Khanty},
      year = {XX},
      url = {http://www.siberianlanguages.surrey.ac.uk/summary/northern-khanty/}


Paperno, D., Leontyev, A., Vostrikova, N., Serdobolskaya, N., Adaskina, Y., Prozorova, E., Volkova, A. and Usacheva, M. Komi Pechora Corpus 2002-2003
The texts were recorded and transcribed by D. Paperno, A. Leontyev, N. Vostrikova, N. Serdobolskaya, Yu. Adaskina, E. Prozorova, A. Volkova during fieldwork with speakers residing in Yeremeevo village in July 2002 and February 2003. All texts were proofread and annotated by M. Usacheva.

All corpus texts are also available for full download in PDF as interlinear texts; some of them are also accompanied by a sound file. Each PDF file contains one transcribed text split into sentences/clauses. The annotation includes glossing and a free translation into Russian, each on a separate line.

      author = {D. Paperno and A. Leontyev and N. Vostrikova and N. Serdobolskaya and Yu. Adaskina and E. Prozorova and A. Volkova and M. Usacheva},
      title = {Komi Pechora Corpus},
      year = {2002-2003},
      url = {http://web-corpora.net/KomiPechoraCorpus/search/common-search-start-en.html}


Ruttkay-Miklián, E. Synya Khanty Dictionary
The Synya Khanty Dictionary, based on R. Makarovna’s dialect, is part of a comprehensive project that aims to document the living Khanty dialects.

Its plan was elaborated and published by Éva Schmidt (2001: 280–288). Essentially, this project aims at documenting the dialects and sub-dialects of the Ob-Ugric languages by recording the performance or language knowledge of the last speakers. She recommended a method similar to using questionnaires, in which the entries of the well-known dialect dictionaries of the Ob-Ugric languages are taken as a starting point. It is the conversations triggered by the words to be explained that should be recorded, as the explanations are determined by free associations. Evidently, this material will inform us on the characteristic features of the mental operations of speech.

The material of the Synya Khanty dictionary (50 + 20 hours) was recorded in 1999 and 2002. This article presents the theoretical and practical problems of the collecting and elaborating process.

      author = {Ruttkay-Miklián, Eszter},
      title = {Synya Khanty Dictionary},
      url = {http://hantisirn.nytud.hu/bundle/synya-khanty-dictionary}


Urmanchieva, A. LangueDOC: Enets XX
      author = {Anna Urmanchieva},
      title = {LangueDOC: Enets},
      year = {XX},
      url = {http://www.philol.msu.ru/languedoc/eng/enets/index.php}


XX Fenno-Ugrica 2016
Fenno-Ugrica of the National Library of Finland is a digital collection of publications in Uralic languages. The Fenno-Ugrica collection includes more than 1500 monographs and over 110 newspaper and journal titles in 20 languages.

In addition to the prints in Uralic languages, Fenno-Ugrica contains six special collections, Lapponica for Sami languages, Zingarica for Romani languages, Hebraica for Yiddish, Institute of Estonian Language for Livonian and Komi National Library for Komi languages. Moreover, the sub-collection Learned Societies contains digitized publications of the Finnish Antiquarian Society and the Finno-Ugrian Society.

The material of Fenno-Ugrica was produced by the National Library of Finland in the Digitization Project of Kindred Languages in 2012-2015 and the Minority Languages Project in 2016. Both projects were funded by the Kone Foundation.

      author = {XX},
      title = {Fenno-Ugrica},
      year = {2016},
      url = {https://fennougrica.kansalliskirjasto.fi/}


XX Mansi Dictionary of Munkácsi and Kálmán 2012
This dictionary is the digitized version of the Wogulisches Wörterbuch (Munkácsi, Bernát – Kálmán, Béla 1986, Budapest: Akadémiai Kiadó), which is the first dictionary of Mansi. The language data presented were collected by Bernát Munkácsi in 1888-1889.
In addition to a large amount of grammatical, folklore and text material, Munkácsi was also able to collect a great deal of lexicographic data among the Mansi. From the Mansi sentences he compiled a card index, which he arranged largely by dialect and in alphabetical order.
After Munkácsi's death, Béla Kálmán took over his work: he edited and published Munkácsi's data. Kálmán also reorganized the card index, grouping the words by their (shared) etymology. He and his colleagues then completed the card index with the Mansi words from Munkácsi's texts.
In the dictionary entries, nouns are given in their base form and verbs in the third person singular. In addition, the form of the word used today in the Sosva dialect is given in square brackets, as found in Kálmán's (or sometimes Rombandeeva's) records. Translations are given in Hungarian and sometimes in Russian (as in Munkácsi) as well as in German. The German translations were added by Béla Kálmán. The meaning of the words is also evident from the example sentences. Derivations and compounds containing the word can also be found there.
      author = {XX},
      title = {Mansi Dictionary of Munkácsi and Kálmán},
      year = {2012}

YY Диалектологический атлас XX
      author = {YY},
      title = {Диалектологический атлас},
      year = {XX},
      note = {Dialectal atlas},
      url = {http://atlas.philology.nsc.ru/}


Бергманн, М., Люблинская, М. and Шерстинова, Т. Русско-ненецкий озвученный разговорник 2002
      author = {Маркус Бергманн and Марина Люблинская and Татьяна Шерстинова},
      title = {Русско-ненецкий озвученный разговорник},
      year = {2002},
      url = {https://iling.spb.ru/nord/materia/nenec_phrasebook/intro-rus.html}


Люблинская, М., Гусев, В. and Шерстинова, Т. Нганасанский мультимедийный словарь 2000
The dictionary is based on the Avam dialect of Nganasan. It is spoken by 75% of speakers: most Nganasans in the settlements of Ust-Avam and Volochanka (Dudinka District, Taymyr Autonomous Okrug), as well as some of the Nganasans in the settlement of Novaya (Khatanga District, Taymyr Autonomous Okrug). Before the transition to a settled way of life (before the early 1970s), its speakers were nomadic in the basins of the Pyasina (the Dudypta and its tributaries) and the Upper and Lower Taymyra rivers, their ancestral ethnic territories. The transition to settled life led to a substantial reduction in the functions of the Nganasan language and a general decline of the ethnic culture. The situation was worsened by the difficult economic position of the settlements, which is typical of the Far North of Siberia.
      author = {Марина Люблинская and Валентин Гусев and Татьяна Шерстинова},
      title = {Нганасанский мультимедийный словарь},
      year = {2000},
      url = {https://iling.spb.ru/nord/materia/nganasan/introrus.html}


(Kazym) Khanty newspaper Khanti Yaseng XX
      title = {(Kazym) Khanty newspaper Khanti Yaseng},
      year = {XX},
      url = {https://khanty-yasang.ru/khanty-yasang/archive}


Mansi newspaper Luima Seripos XX
      title = {Mansi newspaper Luima Seripos},
      year = {XX},
      url = {https://khanty-yasang.ru/luima-seripos/archive}

