Tuesday, 7 March 2017

Feature Selection and Modelling of Well-Being Using the Human Development Index




1 Introduction

In recent times, the dramatic rise in the global population has put a strain on the resources available in countries around the world. Governments in the developed world are tackling this growing strain by pouring resources into health, education, income and infrastructure. These vital facets of quality of life are used to define a nation’s human development, and a high level of human development is what we mean by a developed country. There is a consensus on the idea of developed, developing and underdeveloped countries. Such terms are used to direct government spending on international aid and in economic forecasting for global growth and trade. But what is human development? Human development can be defined as the process of increasing a population’s freedoms, opportunities and well-being. It is about giving people the opportunity to define their own lives in all aspects, such as social group, political affiliation and religious practice.
The human development concept was first proposed by Mahbub ul Haq during his time at the World Bank in the 1970s, where he argued that the measures then in use did not account for the true purpose of human development: to improve people’s lives. This brought about a change in the way human development was recorded. The traditional approach looked solely at the economy, usually using Gross Domestic Product (GDP), stock prices, consumer spending, and national trade and debt, which gives only a partial view of how the population is actually doing. The new approach, using the Human Development Index (HDI), asks a fundamentally different question: how are people doing? The HDI is reported as a composite index between 0 and 1 and is used across the world as a measure of a population’s well-being. The HDI looks at health, education and income, the three basic requirements of opportunity and well-being. People put most emphasis on health, by which I mean avoiding premature death due to infection, disease or injury, maintaining a healthy lifestyle and receiving adequate and timely medical care. Education is important because access to knowledge is crucial to the life chances of people within a society. Income allows people to meet their basic needs of maintaining a home and providing food and clothing, and so contributes to development through the standard of living.

This information is collected by the United Nations Development Programme (UNDP). It has many users, such as economists, development specialists, health care professionals and governments, who bring a plethora of analytical questions to the data sets collected. For example, over the past 25 years the Human Development Reports have studied questions such as financing human development, security, economic growth, consumption, deepening democracy, cultural liberty, sustainability, equity and human mobility. Under these broad subject headings, each report proceeds with an in-depth analysis tackling a large range of questions. The most recent report, ‘Work for Human Development’ [1], considers the workplace and how work can contribute to the well-being and richness of a person’s life and, in some cases, lead to a worse life and reduce overall human development (such as reduced life expectancy from working with dangerous chemicals). The report reviews work not just as jobs in the economy but also as voluntary and caring work, then looks at work at different stages of the lifecycle and how work enhances human development. It goes on to examine the globalization of the workforce and how the workforce has modernized in the post-industrial age. It then looks at the imbalances between paid and unpaid work, concluding that both have social value. Further issues, such as sustainable work and strategies for enhancing human development through work, are then discussed, leading to actionable recommendations such as youth employment strategies and reducing gender inequality. This shows the broad range of analytical questions that are asked of this data set, and how in-depth analysis is used to give context to the current world position on a given topic and to provide guidance and recommendations for the world’s people and governments.

The current index is based on a combination of just four factors: life expectancy at birth, expected years of schooling, mean years of schooling and gross national income (GNI) per capita. These four factors are combined to form an index value between 0 and 1, and that value places each country into one of four groups (see Fig1).

Fig1 Human Development Index grouping boundaries
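
The way the four factors combine into a single value can be sketched in a few lines of Python. This is a minimal illustration and not the UNDP's own code: the goalposts, the log transform of income, the geometric mean and the group cut-offs below follow the UNDP technical notes as I understand them and should be read as assumptions, since this paper only refers to Fig1 for the boundaries. The function names are hypothetical.

```python
import math

# (minimum, maximum) goalposts for each indicator.
# ASSUMPTION: values as published in the UNDP technical notes, not in this paper.
GOALPOSTS = {
    "life_expectancy": (20.0, 85.0),       # years
    "expected_schooling": (0.0, 18.0),     # years
    "mean_schooling": (0.0, 15.0),         # years
    "gni_per_capita": (100.0, 75000.0),    # PPP dollars
}


def dimension_index(value, low, high):
    """Rescale a value to the range 0-1 between its goalposts, clipping at both ends."""
    return min(max((value - low) / (high - low), 0.0), 1.0)


def hdi(life_expectancy, expected_schooling, mean_schooling, gni_per_capita):
    """Combine the four indicators into one index value between 0 and 1."""
    health = dimension_index(life_expectancy, *GOALPOSTS["life_expectancy"])
    # Education is the arithmetic mean of the two schooling indices.
    education = (
        dimension_index(expected_schooling, *GOALPOSTS["expected_schooling"])
        + dimension_index(mean_schooling, *GOALPOSTS["mean_schooling"])
    ) / 2
    # Income enters on a log scale, so extra income matters less at high levels.
    low, high = GOALPOSTS["gni_per_capita"]
    income = dimension_index(math.log(gni_per_capita), math.log(low), math.log(high))
    # ASSUMPTION: the three dimension indices are combined by a geometric mean.
    return (health * education * income) ** (1 / 3)


def hdi_group(value):
    """Assign one of the four groups. ASSUMPTION: cut-offs 0.550, 0.700 and 0.800."""
    if value >= 0.800:
        return "very high"
    if value >= 0.700:
        return "high"
    if value >= 0.550:
        return "medium"
    return "low"


# Illustrative inputs only, not figures taken from the paper's data set.
example = hdi(life_expectancy=81.6, expected_schooling=17.5,
              mean_schooling=12.6, gni_per_capita=64992)
print(round(example, 3), hdi_group(example))
```

Because the dimensions are multiplied rather than added, a very low score in one dimension cannot be fully offset by a high score in another, which is relevant when asking which attributes hold a country back.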

2 Questions

It is well known that the northern hemisphere has the largest proportion of very high human development and that the southern hemisphere has the largest proportion of low human development. The challenge since the mid-1900s has been to accelerate the growth of human development in the southern hemisphere through various economic, social and governmental changes (such as the creation of the World Trade Organisation and the United Nations), in an attempt to increase world trade and provide stability and social cohesion. This globalization of trade and commerce has been further accelerated by the internet, which has allowed relatively cheap communication and commerce platforms to become widespread among all countries and groups. However, there still appears to be little change in the HDI of the low human development countries over the past 20 years, which prompts the questions: why? What is holding back development? And how can it be accelerated? Because there are many facets to human development, investigating these questions requires information about many parts of a country’s economy (such as GDP, GNI and natural resources), about the society’s health and education (such as life expectancy, schooling and deaths from disease) and about the infrastructure available in the country (such as electricity and internet availability).

To investigate these intriguing questions, this paper limits its scope to the differences between the four development groups of countries, with a view to investigating the following analytical questions:

1) What are the most important attributes for a country’s development?
2) What are the trends in the HDI for each group over the past 35 years?
3) Is the HDI a suitable measure of well-being?
4) Can the development of undeveloped countries be accelerated by focusing on the most important attributes?

For more please click the link for the full paper - Link

Geospatial Analysis of the Social and Economic Reasons for the Scottish Independence Referendum 2014 Result, at Local Authority Level




1 Introduction
On 18th September 2014 a referendum on Scottish independence from the rest of the UK took place in Scotland. The voters were asked “Should Scotland be an independent country?”, which could be answered with “Yes” or “No”. The result was a victory for the “No” side with 55.3% of the vote, on a very large turnout of 84.6%. Of the 32 local authorities, there were only 4 where the “Yes” voters won. Many studies have looked at the statistics of different groups (such as age, gender and prosperity) drawn from the whole population, and the conclusion was drawn that the “No” victory was built on an alliance of groups including Protestants, the very young, women and average earners.
With only 4 local authorities (LAs) voting “Yes”, the map of the results looks very ‘one-sided’. However, this is misleading because it mainly reflects Scotland’s lopsided population distribution: most Scots live in the central belt between the Forth and the Clyde, with Glasgow accounting for 11% of the total population. These 4 LAs have been described as areas where heavy industry and mining were the main occupations, and as areas of increased deprivation [1].
This analysis uses data aggregated by Scottish LA, and so this paper investigates the results at the LA level. The data set consists of a row for each of the 32 LAs together with 72 census variables aggregated to LA level from the 2011 Census data set, which provides a rich source of social and economic statistics for each population. To normalize the data set, the proportion of “No” votes was calculated and used for the analysis, and each census variable was divided by its appropriate total to produce percentage figures for easy comparison.
The aim of this paper is to find the discriminating attributes of the census data that appear to explain the spatial differences in the “No” voting outcomes. It then interprets the results in terms of current political, social and economic science and considers the implications for future referendums. The key analytical research questions are:
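
The normalization step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the variable names and all of the counts are hypothetical and are not taken from the referendum or census data.

```python
def no_share(yes_votes, no_votes):
    """Proportion of valid votes cast for "No" in one local authority."""
    return no_votes / (yes_votes + no_votes)


def to_percentages(counts, total):
    """Express each raw count as a percentage of the appropriate total."""
    return {name: 100.0 * count / total for name, count in counts.items()}


# Hypothetical counts for a single LA, for illustration only.
share = no_share(yes_votes=40000, no_votes=60000)
age_bands = to_percentages({"age_16_24": 12000, "age_65_plus": 18000}, total=100000)
print(share, age_bands)
```

The choice of total matters: an age band is naturally divided by the resident population, whereas a tenure variable would be divided by the number of households, so each census variable needs its own denominator.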
What attributes are significant in the geospatial differences in voting behaviour?
Are these attributes consistent with the current geospatial differences in Scotland?

2 Tasks and Approach
The object of this paper is to explain the geospatial differences in voting preference at LA level. I begin by using choropleth maps with 2-colour (diverging) schemes to visualize these differences. Because LAs vary in both geographical area and population, the maps are hard to interpret: the area an LA occupies on the map bears no relation to the number of people who live there. To overcome this issue, I will use population-weighted maps in which LA sizes are based on the number of votes cast.
I need to find attributes in the census data set that identify the voting preference. Academics such as Tom Mullen [2] suggest that the main social and economic factors behind voting preference are affluence, gender and age, but that these were not uniform. Attention is drawn to the geographical pattern, where the highest “No” percentages were returned in the areas nearest England and the lowest in the most deprived areas.
I use Pearson’s correlation coefficient to assess each attribute’s correlation with voting preference and to identify the significantly correlated attributes. I then investigate the fit of these attributes using scatter plots and regression lines, both because Pearson’s coefficient is known to be sensitive to outliers and in order to assess the assumptions of normality of residuals and independence. This should lead to a short list of attributes to model geospatially.
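
The correlation screen and its sensitivity to outliers can be illustrated with a short Python sketch. The function names are hypothetical, the numbers are toy values rather than census data, and the 5% two-sided significance level (critical t of 2.042 for 30 degrees of freedom, matching 32 LAs) is an assumption, since the level used is not stated here.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, t_critical=2.042):
    """Test r against zero using t = r * sqrt((n - 2) / (1 - r^2)).

    ASSUMPTION: t_critical defaults to the two-sided 5% value for 30 degrees
    of freedom, which corresponds to n = 32 local authorities.
    """
    if abs(r) >= 1.0:
        return True
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return abs(t) > t_critical


# Toy data: four points with no linear relationship at all ...
x, y = [1, 2, 3, 4], [1, 2, 2, 1]
r_clean = pearson_r(x, y)
# ... and the same points plus one extreme observation.
r_outlier = pearson_r(x + [20], y + [20])
print(r_clean, r_outlier)
```

A single extreme point moves the coefficient from zero to nearly one in this toy case, which is why a scatter plot is worth inspecting for each attribute that passes the screen.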
Regression has long been used in political, social and economic sciences [5,6] to analyse population statistics using data that has been aggregated to some degree. As a parametric technique, regression relies on the chosen explanatory variables to explain the response variable (vote share in this case), so I can use the exploratory analysis above to select the most appropriate attributes to include in the modelling. I will use multivariate regression techniques to explore potential models that fit the geographic variations. I will use the residual sum of squares (RSS) to assess the model fit to the data numerically, and choropleth maps of the model residuals with a 2-colour (diverging) scheme to assess the geographic variations visually.
With many attributes, a more automated way of approaching this investigation is a stepwise regression, starting either from the full model and reducing the number of attributes or from the null model and increasing the number of attributes. Such an automated regression analysis can be done using a variety of methods [7]; however, I will proceed with an exhaustive search using the leaps package. This package performs an exhaustive search for the best subsets of a given set of potential regressors, using a branch-and-bound algorithm.
The goal of this investigation is to explain the geospatial differences using social variables from the 2011 Census and Scottish Index of Multiple Deprivation (SIMD) attributes at LA level. It seems reasonable to evaluate the validity of the global model at each LA, as I am concerned with broad societal issues rather than very local issues or individuals. Evaluation is achieved by analysing the global model residuals geospatially using choropleth maps and summary statistics (p value and RSS) [4].
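
As a minimal sketch of how the RSS and the residuals arise, the Python below fits a least-squares line with a single explanatory variable. This is a simplification of the multivariate models used in the paper; the function names and the toy numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted value for each observation (one per LA on a map)."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def rss(res):
    """Residual sum of squares: smaller values mean a closer fit."""
    return sum(e * e for e in res)


# Toy data, for illustration only.
x = [0, 1, 2, 3]
y = [1, 3, 5, 8]
intercept, slope = fit_line(x, y)
res = residuals(x, y, intercept, slope)
print(intercept, slope, rss(res))
```

The sign of each residual is what a diverging colour scheme displays: positive values mark LAs where the model under-predicts the vote share and negative values where it over-predicts.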
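
The idea of a best-subsets search can be sketched in Python as below. This is a stand-in for illustration and not the leaps package: leaps is an R package that uses branch-and-bound to avoid fitting every subset, whereas this sketch simply fits them all by brute force, which is only practical for a handful of candidate attributes. All function names and data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(columns, y):
    """RSS of a least-squares fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def best_subsets(predictors, y, max_size):
    """For each subset size, return (RSS, names) of the lowest-RSS subset."""
    best = {}
    for size in range(1, max_size + 1):
        best[size] = min(
            (ols_rss([predictors[name] for name in names], y), names)
            for names in combinations(sorted(predictors), size)
        )
    return best


# Toy data: y is exactly 2 + 3a - b, so the best pair should be (a, b).
predictors = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 4, 3, 6, 5],
    "c": [1, 0, 1, 0, 0, 1],
}
y = [2 + 3 * a - b for a, b in zip(predictors["a"], predictors["b"])]
result = best_subsets(predictors, y, max_size=2)
print(result)
```

The number of subsets grows very quickly with the number of candidates, so with 72 census variables a brute-force search of this kind is not feasible, and with only 32 observations a subset cannot sensibly hold more than a few attributes. Both points favour a short list from the exploratory analysis and a bounded search.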

For more please click the link for the full paper - Link