I was teaching in Syktyvkar from 1 to 9 June 2017, at a practice seminar for the second-year students of Komi and English philology at Syktyvkar State University. This practice has been arranged regularly for many years, and I have also participated in it earlier in different roles. The idea is that the students from the Komi department of Syktyvkar State University either go to do fieldwork practice, or they work on already existing recordings at the university. This year the latter method was chosen, and I was invited to teach how to make transcriptions in ELAN. I was very glad to have such an opportunity, and I’m hopeful we will have more possibilities in the future to continue this kind of work.
The students worked either alone or in pairs. The idea was that each group transcribes around 45 minutes, and those working alone a bit less. In practice this resulted in around 2000 transcribed tokens per student. This was all a bit experimental, and we have still not totally finished the coursework, but it seems we ended up with around 20,000 transcribed tokens, which is a very good result. Of course, it is most crucial that the students get accustomed to working with ELAN, but having something concrete in hand is also important.
The course had a few distinct parts: first the segmentation and annotation part, after which we moved on to the analysis section. This was a relatively short and shallow part, but we explored different ways to make multi-layered searches in ELAN and went through the basics of regular expressions. We also fixed some regularly occurring mistakes automatically with regular expression search and replace, which perhaps helped to show what the point of all this is. Similarly, we used an additional tier for notes and encouraged everyone to use it, which also gives one model for later annotating features on their own tiers as part of research practice with an ELAN corpus.
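To give a flavour of this kind of automatic fix, here is a minimal Python sketch. The two rules are assumptions chosen purely for illustration, not the actual mistakes we corrected in class: they replace a straight apostrophe after a letter with ’ and collapse repeated spaces. Python is used only to keep the example self-contained; the same patterns could in principle be typed into a regular expression search and replace dialog.

```python
import re

# Hypothetical clean-up rules, for illustration only.
RULES = [
    (re.compile(r"(?<=\w)'"), "’"),  # straight apostrophe after a letter -> ’
    (re.compile(r" {2,}"), " "),     # runs of spaces -> single space
]

def clean(annotation: str) -> str:
    """Apply each rule in order and trim surrounding whitespace."""
    for pattern, replacement in RULES:
        annotation = pattern.sub(replacement, annotation)
    return annotation.strip()

print(clean("сес'с'а  вӧли"))
```

Keeping the rules in one list makes it easy to review them with the students before anything is changed in the real files.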
We were lucky that most of the students had their own laptops, and I had some headphones and similar equipment with me as well. So every group was well equipped for their task.
In order to make the learning materials easily accessible, I put everything online on a course website. Everything was also shown on the projector in class with ELAN open, but I thought that having screenshots for all the most important tasks could be one way to ensure that everyone can follow.
The transcription system we used is called Scientific Komi Transcription (Коми научнӧй транскрипчия). It is a transcription system commonly used with Permic languages, and it is normally used on this course, so it was logical for us to use it this time as well. One of its advantages is that the students were able to use the Cyrillic keyboard they were used to, with just a few additional characters, most of which are already on the official Komi keyboard.
So with the help of Jeremy Bradley we added those characters to the Russian keyboard after discussing with the students which locations they would find most useful. Besides these we still had to add ', as it doesn’t exist on the Russian keyboard.
Before the course I spent some time digitizing the C-cassettes on which we had some older recordings. This was done in a somewhat primitive manner, and it is very likely that this work has to be repeated later with proper equipment. However, I think it was worth doing anyway, as we were now able to select recordings from a somewhat wider area where Komi is spoken. We also had tapes from other regions among the northern dialects, but when the students were grouped it just turned out that none of those was selected for now. Some recordings were also discarded from this practice because the quality was lower than what we wanted.
The course itself started with segmenting the files, which were assigned to and matched with groups on a Google Spreadsheet where we generally followed the status of different tasks and files. I would have liked to arrange a system where each student fills in the metadata they discover from the recordings on the spreadsheet, but there was not really enough time to introduce this properly. It isn’t too difficult to pick the metadata from the tapes now either, but this could have been a good time to go through metadata-related practices and conventions.
After segmenting we moved on to transcription, with the result that different groups were in different phases at different times, which was not really a problem. The actual course plan for receiving a grade is as follows:
So now we are at the point where everyone has returned their file, and the director of the Komi department, Rimma Pavlovna Popova, and I will continue to mark the parts which still need some supervision.
One of the most important results is that we have a group of young Komi students who are familiar with transcribing and ELAN, but the data we received will also be very useful. We are still discussing what the best way to distribute this data is, and it will take some time to make corrections to some of the files, but we can already observe the preliminary results.
We can first examine the most common tokens:
| token | n |
|---|---|
| да | 695 |
| а | 418 |
| вӧли | 356 |
| и | 344 |
| сиjӧ | 305 |
| ме | 234 |
| вот | 226 |
| сес’с’а | 178 |
| но | 170 |
| оз | 151 |
This looks like the list of most common words in any other Komi corpus, I would say. All in all there are 19,370 transcribed word tokens.
Then what is the geographical distribution? We were transcribing files from Užga, Koygorodok, Myrponaib, Obyachevo, Shoshka, Ust’-Nem and Körtkerös.
So we have covered the Southern Komi-Zyrian dialects nicely: some varieties are missing, but the coverage of different dialects is still quite good.
Most of the recordings are from old C-cassettes, so the quality is not as high as one could hope. The exception is the Upper-Sysola recordings from 2012, which are generally of good quality. This is very good, since Upper-Sysola has several phonological developments which deserve further attention.
I have not had that much teaching experience, so I was positively surprised by how well everything eventually turned out. What I would like to change or improve are perhaps the following things:
Having learned very much from this experience myself, I’m always happy to return to Syktyvkar State University for any period. I studied there for five months in 2011, and I consider it one of my “home” universities. It is also one of the foremost centers of Komi research, and native Komi speakers study there primarily in Komi. Institutions of this kind are very valuable, and globally speaking very rare.
I’m personally amazed every time I visit Syktyvkar when I see the scale of the different Komi language related work being done there. Deep ties to institutions and researchers in the areas where different Uralic languages are spoken are also the backbone of all the research we can conduct in the western universities. I’m very thankful to the LATTICE laboratory for supporting my work during the two weeks in Komi.
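A frequency table like the one above can be produced in a few lines. The following is only a sketch in Python, not necessarily how the table here was made. It assumes the annotations have already been exported from ELAN as plain strings, and the tokenization rule (letters plus the apostrophes ’ and ' kept inside a token) is my own assumption.

```python
import re
from collections import Counter

def token_counts(annotations):
    """Count lowercased word tokens across a list of annotation strings."""
    counts = Counter()
    for text in annotations:
        # Keep ’ and ' inside tokens so that forms like сес’с’а stay whole.
        counts.update(re.findall(r"[\w’']+", text.lower()))
    return counts

# Toy input, not real course data.
sample = ["да вӧли", "сес’с’а да", "А ме"]
counts = token_counts(sample)
for token, n in counts.most_common(3):
    print(token, n)
```

Lowercasing before counting merges sentence-initial forms with the rest, which is usually what one wants for a quick overview like this.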