The research is focused on the pronunciation of the word “что” in Russian dialects.

The data was taken from the Dialect Corpus of the Russian National Corpus.

In total, there were 99 observations.

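The data set contains the following columns: variant, region, gender, age, birth year, position, yat before hard consonants, yat before soft consonants, type of vocalism, and the realisation of /г/, /ц/ and /ч/. Several of these variables are also stored in numeric form (the `_num` columns in the summary below).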

We hypothesize that the pronunciation of “что” depends on all of the variables listed above.

Sys.setlocale("LC_ALL", "ru_RU.UTF-8")
## [1] "ru_RU.UTF-8/ru_RU.UTF-8/ru_RU.UTF-8/C/ru_RU.UTF-8/C"
library(tidyverse)
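# stringsAsFactors = TRUE keeps text columns as factors (no longer the default since R 4.0), as the summary below assumes
cho <- read.csv(url("http://goo.gl/m38tRA"), sep = ";", stringsAsFactors = TRUE)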
summary(cho)
##  variant          region   gender      age         birth_year   position 
##  чё :50   Тамбов     :41   ж:93   Min.   :32.0   Min.   :1870   conj:32  
##  шо : 5   Волгоград  :36   м: 6   1st Qu.:71.5   1st Qu.:1918   part:11  
##  што:44   Тверь      : 8          Median :76.0   Median :1924   pron:56  
##           Забайкалье : 5          Mean   :75.7   Mean   :1923            
##           Архангельск: 3          3rd Qu.:82.0   3rd Qu.:1931            
##           Самара     : 3          Max.   :96.0   Max.   :1979            
##           (Other)    : 3                                                 
##                   yat_hard                   yat_soft            vocalism 
##  не [и]               :97   не [и]               :88   аканье сильное:94  
##  непоследовательно [и]: 2   непоследовательно [и]:11   оканье полное : 5  
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##         g                       ts                      ch    
##  взрывной: 6   [ц'] мягк         : 3   [ч'] мягкий       :74  
##  щелевой :93   [ц] твёрд.        :90   утрата затвора [ш]:25  
##                утрата затвора [с]: 6                          
##                                                               
##                                                               
##                                                               
##                                                               
##   variant_num      region_num      gender_num      yat_hard_num   
##  Min.   :1.000   Min.   :1.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:1.0000   1st Qu.:0.0000  
##  Median :1.000   Median :6.000   Median :1.0000   Median :0.0000  
##  Mean   :1.939   Mean   :4.768   Mean   :0.9394   Mean   :0.0202  
##  3rd Qu.:3.000   3rd Qu.:7.000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :3.000   Max.   :8.000   Max.   :1.0000   Max.   :1.0000  
##                                                                   
##   yat_soft_num     vocalism_num         g_num             ch_num      
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.00000   Median :0.00000   Median :0.0000  
##  Mean   :0.1111   Mean   :0.05051   Mean   :0.06061   Mean   :0.2525  
##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.5000  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000  
## 

Descriptive statistics

cho %>%
  ggplot(aes(variant, fill = variant)) +
  geom_bar()

This graph shows the distribution of the three attested pronunciations of “что” in the dialect corpus: “чё” (50 cases), “што” (44 cases) and “шо” (5 cases).
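The counts behind the bar chart can also be tabulated directly, together with each variant's share of all observations. The following is a minimal sketch, assuming a data frame with a `variant` column such as `cho` loaded above; the helper name `variant_shares` is hypothetical and not part of the original analysis.

```r
library(dplyr)

# Count each pronunciation variant and add its share of all observations,
# most frequent variant first.
variant_shares <- function(data) {
  data %>%
    count(variant, name = "n") %>%
    mutate(share = n / sum(n)) %>%
    arrange(desc(n))
}

# With the data frame loaded above:
# variant_shares(cho)
```

With the counts reported in the summary (50, 44 and 5 out of 99), the shares would be roughly 51%, 44% and 5%.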

cho %>%
  ggplot(aes(region, fill = variant)) +
  geom_bar()

This graph shows which regions are represented in the dialect corpus and which pronunciations of “что” occur in each of them. Although the dialect variant “чё” is the most frequent in the corpus as a whole (50 cases), the non-dialect variant “што” predominates in 5 of the 8 regions.
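Which variant predominates in each region can be checked in a table instead of being read off the stacked bars. The sketch below assumes a data frame with `region` and `variant` columns such as `cho`; the helper name `top_variant_by_region` is hypothetical. Ties are kept, so a region where two variants are equally frequent returns two rows.

```r
library(dplyr)

# For each region, return the most frequent variant, the region's total
# number of observations, and the variant's share within the region.
top_variant_by_region <- function(data) {
  data %>%
    count(region, variant, name = "n") %>%
    group_by(region) %>%
    mutate(region_total = sum(n), share = n / region_total) %>%
    slice_max(n, n = 1, with_ties = TRUE) %>%  # keep ties, do not pick arbitrarily
    ungroup()
}

# With the data frame loaded above:
# top_variant_by_region(cho)
```

Several regions contribute only a few observations (for example, 3 each for Архангельск and Самара in the summary), so the `region_total` column is worth reading alongside the winning variant.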

cho %>%
  ggplot(aes(age, variant, colour = gender)) +
  geom_point(size = 2)

This graph shows the gender and age of the informants. Of the 99 observations, 93 (94%) come from female informants. Ages range from 32 to 96, with a median of 76.
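Because the scatter plot places many points on top of each other, a per-variant age summary can make the comparison easier to read. This is a minimal sketch, assuming a data frame with `variant` and `age` columns such as `cho`; the helper name `age_by_variant` is hypothetical.

```r
library(dplyr)

# Summarise the age of the informants separately for each variant.
age_by_variant <- function(data) {
  data %>%
    group_by(variant) %>%
    summarise(
      observations = n(),
      min_age = min(age),
      median_age = median(age),
      max_age = max(age),
      .groups = "drop"
    )
}

# With the data frame loaded above:
# age_by_variant(cho)
```

Each row of the data is an observation, not a person, so an informant who contributes several observations is counted several times in these figures.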

cho %>%
  ggplot(aes(birth_year, variant, colour = gender)) +
  geom_point(size = 1)

This graph shows the distribution of the informants by year of birth. The median year of birth is 1924; the earliest is 1870 and the latest is 1979.
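Age and year of birth carry different information only if the recordings were made in different years. One way to check this is to add the two columns. The sketch below rests on the assumption that `age` is the informant's age at the time of recording, which the source does not state; the helper name `add_implied_year` and the column `implied_year` are hypothetical.

```r
library(dplyr)

# Add the year implied by birth year plus age.
# Assumption: `age` is the informant's age at the time of recording.
add_implied_year <- function(data) {
  data %>%
    mutate(implied_year = birth_year + age)
}

# With the data frame loaded above:
# cho %>% add_implied_year() %>% count(implied_year)
```

If the implied years are spread over several decades, age and birth year are not interchangeable predictors and both plots are informative.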

cho %>%
  ggplot(aes(position, fill = variant)) +
  geom_bar()