UK dialect maps

Rapporter Team

2016/06/14 08:46:43 AM

Data source

Bert Vaux and Marius L. Jøhndal (University of Cambridge, United Kingdom) have recently published some exciting results of the Cambridge Online Survey of World Englishes, which we try to analyse a bit further below.


Please note that the report below is generated automatically from a statistical report template, and that the results, map, tables and all of this text are generated in real time or served from cache. This means that you are now reading a quick report written by computers that has not been proofread.


First, let us plot the raw results for the question "Pop or soda?" gathered in the United Kingdom on a terrain map borrowed from Google:

Pop or soda?


You can see the raw results, geocoded by the Zip code of the respondents, in the above map, marked by coloured stars for the six categories offered in the survey. See the legend in the top right corner for details, where the number of cases for each category is shown in square brackets after the labels.

K-nearest neighbours

Besides the 152 answers, 192 subdivisions of the United Kingdom are also shown in similar (a bit dimmer and transparent) colours assigned by the k-nearest neighbour algorithm with k = 3. This classification method uses the survey data to determine the most likely category for a given subdivision based on its k nearest neighbours.

This means that setting k to 1 would find the nearest point to each subdivision's centre and colour the polygons accordingly, while using a higher number for k would return a more smoothed map of colours.
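The classification step described above can be sketched roughly as follows. This is only an illustration: the coordinates are hypothetical and not taken from the survey, the function name knn_classify is made up for this example, and plain Euclidean distance is assumed because the report does not state which distance measure is used.

```python
from collections import Counter
from math import hypot


def knn_classify(centre, points, k=3):
    """Return the majority answer among the k survey points nearest to centre."""
    # Sort the survey points by straight-line distance to the centre
    # and keep the k closest ones.
    nearest = sorted(
        points, key=lambda p: hypot(p[0] - centre[0], p[1] - centre[1])
    )[:k]
    # Majority vote; on a tie, Counter keeps the answer seen first,
    # which is the one belonging to the nearer point.
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (x, y, answer) points, NOT real survey data.
answers = [
    (0.0, 0.0, "pop"),
    (1.0, 0.0, "pop"),
    (0.0, 1.0, "coke"),
    (5.0, 5.0, "other"),
    (6.0, 5.0, "other"),
    (5.0, 6.0, "soda"),
]

print(knn_classify((0.5, 0.5), answers, k=3))  # two "pop" votes against one "coke"
print(knn_classify((5.2, 5.2), answers, k=1))  # answer of the single nearest point
```

With k = 1 each centre simply inherits the answer of its closest respondent, while k = 3 lets two agreeing neighbours outvote a single different one, which is what smooths the map.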

Language usage across the UK

Although the characteristics of the four countries addressed in this report may be seen in the above map, some more detailed descriptive statistics are also worth noting.


             England   Northern Ireland   Scotland   Wales
pop               53                  0          6       3
other             28                  1         14       0
coke              22                  1          4       0
soft drink         5                  2          2       1
soda               8                  0          0       0
brand name         2                  0          0       0

The above table shows the number of geocoded cases for each category in each country, which is not very informative on its own. A table of within-country percentages, with the marginal distribution added and cells emphasised based on the computed Pearson residuals, might be a lot easier to read.


Residuals higher than 2 or lower than -2 are highlighted in bold
             England   Northern Ireland   Scotland   Wales   Sum
pop           44.92%                 0%     23.08%     75%   40.79%
other         23.73%                25%     53.85%      0%   28.29%
coke          18.64%                25%     15.38%      0%   17.76%
soft drink     4.24%                50%      7.69%     25%    6.58%
soda           6.78%                 0%         0%      0%    5.26%
brand name     1.69%                 0%         0%      0%    1.32%

The last column of the above table shows the overall distribution of the answers to "Pop or soda?", which is worth comparing to the country-specific values. The five most interesting values are highlighted based on their residuals.
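As a rough illustration of how the percentages in the table above follow from the raw counts, the sketch below divides each count by its country total and each category total by the overall number of cases. The helper names share and overall_share are made up for this example; the counts are the ones reported earlier.

```python
# Counts of geocoded answers per category and country, as reported above.
counts = {
    "pop":        {"England": 53, "Northern Ireland": 0, "Scotland": 6,  "Wales": 3},
    "other":      {"England": 28, "Northern Ireland": 1, "Scotland": 14, "Wales": 0},
    "coke":       {"England": 22, "Northern Ireland": 1, "Scotland": 4,  "Wales": 0},
    "soft drink": {"England": 5,  "Northern Ireland": 2, "Scotland": 2,  "Wales": 1},
    "soda":       {"England": 8,  "Northern Ireland": 0, "Scotland": 0,  "Wales": 0},
    "brand name": {"England": 2,  "Northern Ireland": 0, "Scotland": 0,  "Wales": 0},
}
countries = ["England", "Northern Ireland", "Scotland", "Wales"]

# Number of cases per country and overall.
country_totals = {c: sum(row[c] for row in counts.values()) for c in countries}
grand_total = sum(country_totals.values())


def share(answer, country):
    """Percentage of a country's respondents who chose the given answer."""
    return round(100 * counts[answer][country] / country_totals[country], 2)


def overall_share(answer):
    """Percentage of all respondents who chose the given answer (the Sum column)."""
    return round(100 * sum(counts[answer].values()) / grand_total, 2)


for answer in counts:
    cells = [f"{share(answer, c):.2f}%" for c in countries]
    print(answer, *cells, f"{overall_share(answer):.2f}%")
```

Each country column adds up to 100%, so a cell such as 53 of England's 118 cases becomes 44.92%.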

Statistical tests

It seems that a real association can be pointed out between the question and the country (χ² = 31.7 with 15 degrees of freedom, p = 0.00709). This means that there is a significant difference in how people answered "Pop or soda?" in the four analysed countries. This association seems to be weak based on Cramér's V (0.228).
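For readers who want to retrace the test statistic, here is a minimal sketch of Pearson's chi-squared computation on the count table shown earlier. It is not the code of the report template, only a plain reimplementation of the textbook formula: expected counts are row total times column total divided by the number of cases, and each Pearson residual is (observed - expected) / sqrt(expected).

```python
from math import sqrt

categories = ["pop", "other", "coke", "soft drink", "soda", "brand name"]
countries = ["England", "Northern Ireland", "Scotland", "Wales"]

# Observed counts (rows: categories, columns: countries), as reported above.
observed = [
    [53, 0, 6, 3],
    [28, 1, 14, 0],
    [22, 1, 4, 0],
    [5, 2, 2, 1],
    [8, 0, 0, 0],
    [2, 0, 0, 0],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson residual of each cell: (observed - expected) / sqrt(expected),
# where expected = row total * column total / n.
residuals = [
    [(obs - r * c / n) / sqrt(r * c / n) for obs, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

# The chi-squared statistic is the sum of the squared residuals.
chi2 = sum(res ** 2 for row in residuals for res in row)
dof = (len(observed) - 1) * (len(col_totals) - 1)

# Cell that deviates most from what independence would predict.
largest = max(
    (abs(res), categories[i], countries[j])
    for i, row in enumerate(residuals)
    for j, res in enumerate(row)
)

print(f"chi-squared = {chi2:.1f}, degrees of freedom = {dof}")
print("largest residual:", largest[1], "in", largest[2])
```

The degrees of freedom follow from the table shape: (6 - 1) categories times (4 - 1) countries.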


The most popular category in the United Kingdom for "Pop or soda?" was "pop", chosen by four tenths of the respondents.

And the most important differences between the countries can be summarised as: