Python: Geospatial Analysis of the Social and Economic Reasons for the Scottish Independence Referendum 2014 Result, at Local Authority Level

1 Introduction

On 18th September 2014, Scotland held a referendum on independence from the rest of the UK. Voters were asked “Should Scotland be an independent country?”, to be answered “Yes” or “No”. The “No” side won with 55.3% of the vote, on a very high turnout of 84.6%. Of the 32 local authorities, there were only 4 where the “Yes” voters won. Many studies have looked at the statistics of different groups (such as age, gender and prosperity) drawn from the whole population, and concluded that the “No” vote was won by an alliance of groups including Protestants, the very young, women and average earners.

With only 4 local authorities (LAs) voting “Yes”, the map of the results looks very one-sided. This is misleading, however, because it mainly reflects Scotland’s lopsided population distribution: most Scots live in the central belt between the Forth and the Clyde, with Glasgow accounting for 11% of the total population. These 4 LAs have been described as areas where heavy industry and mining were the main occupations, and as areas of higher deprivation [1].

This analysis uses data aggregated by Scottish LA, so this paper investigates the results at LA level. The data set has one row for each of the 32 LAs, together with 72 variables aggregated to LA level from the 2011 Census, which provides a rich source of social and economic statistics for each population. To normalize the data set, the proportion of “No” votes was calculated and used for the analysis, and each census variable was divided by its appropriate total to produce percentage figures for easy comparison.
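The normalization step can be sketched as follows. This is a minimal illustration, not the code used for the paper: the authority names, the counts and the `aged_16_24` variable are all made up, and the sketch assumes the “No” share is taken relative to the combined “Yes” and “No” votes.

```python
# Hypothetical raw counts for two local authorities (illustrative numbers only).
raw = {
    "Authority A": {"yes": 40_000, "no": 60_000, "population": 150_000, "aged_16_24": 18_000},
    "Authority B": {"yes": 55_000, "no": 45_000, "population": 200_000, "aged_16_24": 30_000},
}


def normalise(row):
    """Convert raw counts into percentages of the appropriate total."""
    valid_votes = row["yes"] + row["no"]
    return {
        # The "No" share is expressed relative to the Yes + No votes (an assumption).
        "no_pct": 100.0 * row["no"] / valid_votes,
        # A census count is expressed relative to the total it was drawn from.
        "aged_16_24_pct": 100.0 * row["aged_16_24"] / row["population"],
    }


normalised = {name: normalise(row) for name, row in raw.items()}
for name, values in normalised.items():
    print(name, values)
```

Each census variable needs its own denominator (all residents, all households, all people of working age, and so on), which is why the text speaks of “appropriate totals” rather than a single one.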

The aim of this paper is to find the attributes of the census data that best discriminate between LAs in their “No” voting outcomes, and then to interpret the results in terms of current political, social and economic sciences and their implications for future referendums. The key analytical research questions are:

What attributes are significant in the geospatial differences in voting behaviour?

Are these attributes consistent with the current geospatial differences in Scotland?

2 Tasks and Approach

The object of this paper is to explain the geospatial differences in voting preference at LA level. I begin by using choropleth maps with 2-colour (diverging) schemes to visualize these differences. LAs have neither a fixed geographical size nor a fixed population, so a conventional map is hard to interpret: the area an LA occupies on the map says nothing about how many people live there. To overcome this issue, I will use population-weighted maps in which each LA’s size is based on the number of votes cast.
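As a rough illustration of what a 2-colour diverging scheme does, the sketch below maps a “No” percentage to an RGB colour that is white at 50% and moves towards one hue below that point and another above it. The function name, the midpoint and the two endpoint colours are assumptions chosen for the example; they are not taken from the paper's maps.

```python
def diverging_colour(no_pct, midpoint=50.0, low=(178, 24, 43), high=(33, 102, 172)):
    """Map a "No" percentage to an RGB colour on a two-colour diverging scale.

    Values at the midpoint map to white; values move towards `low` below it
    and towards `high` above it. Both endpoint colours are arbitrary choices.
    """
    white = (255, 255, 255)
    # Distance from the midpoint, scaled to [0, 1] on whichever side we are on.
    if no_pct >= midpoint:
        t = min((no_pct - midpoint) / (100.0 - midpoint), 1.0)
        end = high
    else:
        t = min((midpoint - no_pct) / midpoint, 1.0)
        end = low
    # Linear interpolation between white and the chosen endpoint colour.
    return tuple(round(w + t * (e - w)) for w, e in zip(white, end))


for share in (35.0, 50.0, 65.0):
    print(share, diverging_colour(share))
```

In practice the scale would probably be stretched over the observed range of LA results rather than 0 to 100, since otherwise every authority is drawn in a pale shade close to white.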

I need to find attributes in the census data set that identify voting preference. Academics such as Tom Mullen [2] suggest that the main social and economic factors behind voting preference are affluence, gender and age, but that their effects were not uniform. Attention is drawn to the geographical pattern, in which the highest “No” percentages were returned in the areas nearest England and the lowest in the most deprived areas.

I use Pearson’s correlation coefficient to assess each attribute’s correlation with voting preference and to identify the significantly correlated attributes. Because Pearson’s coefficient is known to be sensitive to outliers, I also investigate the fit of each attribute using scatter plots and regression lines, which lets me assess the assumptions of normality of residuals and independence as well. This should lead to a short list of attributes to model geospatially.
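The screening step can be sketched in plain Python as below. The data are hypothetical (six made-up authorities, the last of which is a deliberate outlier) and the variable names are illustrative only; the point is to show how a single outlying authority can change Pearson's coefficient, and how the residuals from a fitted line can be inspected.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept of the regression line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical attribute percentages and "No" shares; the last pair is an outlier.
attribute_pct = [12.0, 15.5, 18.0, 22.5, 30.0, 55.0]
no_pct = [62.0, 60.5, 58.0, 55.5, 52.0, 57.0]

r_all = pearson_r(attribute_pct, no_pct)
r_trimmed = pearson_r(attribute_pct[:-1], no_pct[:-1])  # outlier left out
slope, intercept = fit_line(attribute_pct, no_pct)
residuals = [y - (intercept + slope * x) for x, y in zip(attribute_pct, no_pct)]

print("r with outlier:", round(r_all, 3))
print("r without outlier:", round(r_trimmed, 3))
print("residuals:", [round(e, 2) for e in residuals])
```

A large gap between the two coefficients is a signal to look at the scatter plot before trusting the correlation, which is the reason the text pairs the coefficient with a visual check.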

Regression has long been used in political, social and economic sciences [5,6] to analyse population statistics using data that has been aggregated to some degree. As a parametric technique, regression relies on the chosen explanatory variables to interpret the response variable (vote share in this case), so I can use the exploratory analysis above to select the most appropriate attributes to include in the modelling. I will use multivariate regression techniques to explore potential models that fit the geographic variations. I will use the residual sum of squares (RSS) to assess numerically how well each model fits the data, and choropleth maps of the model residuals, with a 2-colour (diverging) scheme, to assess the geographic variations visually.

With many attributes, a more automated way of approaching this investigation would be a stepwise regression model, starting either from the full model and reducing the number of attributes or from the null model and increasing the number of attributes. Such an automated regression analysis can be done using a variety of methods [7]; however, I will proceed with an exhaustive search using the leaps package. This package performs an exhaustive search for the best subsets of a given set of potential regressors, using a branch-and-bound algorithm.
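The idea of a best-subset search ranked by RSS can be sketched as follows. This is not the leaps package and it does not use branch-and-bound: it simply fits every subset by ordinary least squares and keeps the lowest-RSS subset of each size, which gives the same winners at a higher computational cost. The attribute names and data are synthetic, and the response is constructed from `attr_a` and `attr_c` so that the expected winner is known in advance.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss_of_fit(columns, y):
    """Fit y on the given columns plus an intercept; return the residual sum of squares."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def best_subsets(data, y, max_size):
    """Return {size: (rss, names)} holding the lowest-RSS subset of each size."""
    best = {}
    for size in range(1, max_size + 1):
        for names in combinations(sorted(data), size):
            rss = rss_of_fit([data[name] for name in names], y)
            if size not in best or rss < best[size][0]:
                best[size] = (rss, names)
    return best


# Synthetic attributes; the response depends on attr_a and attr_c only.
data = {
    "attr_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "attr_b": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "attr_c": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
}
y = [1.0 + 2.0 * a + 3.0 * c for a, c in zip(data["attr_a"], data["attr_c"])]

best = best_subsets(data, y, max_size=3)
for size, (rss, names) in best.items():
    print(size, names, rss)
```

RSS never increases when an attribute is added, so it is only useful for comparing subsets of the same size; choosing between sizes needs a separate criterion. Solving the normal equations directly is adequate for a toy example but is numerically fragile when attributes are strongly correlated, as percentage variables drawn from the same census often are.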

The goal of this investigation is to explain the geospatial differences using social variables from the 2011 Census and SIMD (Scottish Index of Multiple Deprivation) attributes at LA level. It seems reasonable to evaluate the validity of the global model at each LA, as I am concerned with broad societal issues rather than very local issues or individuals. Evaluation is achieved by analysing the global model residuals geospatially using choropleth maps and summary statistics (p-value and RSS) [4].

For more please click the link for the full paper - Link

Tuesday, 7 March 2017
