Our aim is to build on the Iźva-Komi corpus created in our current project, financed by the Kone Foundation, and to further advance the grammatical description of Komi. Special attention will be paid to open research practices. The project will consist of two parts, which approach the grammatical description of Komi from different directions. The first part will result in a formalized morphosyntactic description to be implemented in a syntactic parser. The second part will be a descriptive and comparative syntax of Komi.
Grammatical description is the logical continuation of corpus-building in endangered language documentation and description. By December 2016 we will have reached the goal and the end of the current project, in which we have built an annotated speech corpus of significant quality and size. The corpus will be made easily accessible to the research community and to the local Komi community, which we consider ethically imperative for language documentation projects. We nevertheless believe that it is too early to promise a full-fledged corpus-based grammar, as we are continually receiving new corpus data from digitized collections and from fieldwork carried out by others. Furthermore, a Komi National Corpus is currently being constructed in Syktyvkar, which will eventually contain all texts that have been published in the Komi-Zyrian language. Work on a comprehensive corpus-based grammar would therefore make more sense at a later stage, when the use of this data (more than 30 million word tokens for written varieties and up to 1 million word tokens for spoken varieties) is guaranteed. We also understand “corpus-based” as a truly quantitative approach. A corpus can be used to find specific examples, but basing one’s analysis and description on the corpus as a whole, and grounding analyses in statistically significant generalizations, is a very different task that requires a different focus.