A dataset of sentences from public domain sources, digitized in Fenno-Ugrica and proofread in FU-Lab. Ivan Belykh's works were made available by the author. The Four Battles book was proofread by Niko Partanen. The current version contains only data comparable with other languages. The longer-term idea is to include all available Komi-Zyrian data here, with the sentence_id indicating which sentences match.
kpv
Data frame with Komi-Zyrian data
name of the text in the original corpus
sentence id, unique within a text
sentence text
https://github.com/langdoc/kpv-lit https://fennougrica.kansalliskirjasto.fi/ http://komikyv.org/