Analysing Municipal Population Statistics with R

Last updated on: August 12, 2015 by Musa Kurhula Baloyi


This article was supposed to be entitled “How Thulamela Underdeveloped Malamulele”, inspired by the book “How Europe Underdeveloped Africa” written by Walter Rodney. I came up with this similar title because, one, I am from Malamulele, and two, I have witnessed first-hand how an uncaring government can drive a community to extreme levels of poverty and, with that, self-destruction. I was hoping to compare all the projects that were done in Malamulele with the ones that were done in Thohoyandou, the dual seat of Thulamela Local Municipality and Vhembe District Municipality. I would look at the cost of each project as well as its status of completion. This would highlight the subtleties of the municipality's spending patterns.

After struggling to find meaningful data for this analysis, I learnt that relevant data could be found in a number of places: Statistics South Africa (StatsSA), the Municipal Demarcation Board of South Africa (MDB), municipal websites, the National Treasury, the Independent Electoral Commission of South Africa (IEC), as well as third parties such as Adrian Frith's website. Thulamela Municipality's own website was not one of the useful ones, and Vhembe District Municipality's had little of use. Most links to StatsSA's data products are dead, except for SuperWeb, which proved immensely helpful. The data I now had in my hands led to a new title: “The Thulamela Years (2001 – 2015): A Case for Unequal Service Delivery”. I was really excited about the data I had found.

Since the data came from disparate sources, it took work to put it into a format that my code could consume. I learnt that looking at the data from an electoral ward or voting district point of view simplified my life. I did not have to care which town or village fell under which ward. All I had to do was figure out whether service delivery differs between wards: a simple ANOVA test! This black box approach also saved me from myself, as it insulated me from looking only for data that vilified the municipal administration.
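To make the ANOVA idea concrete, here is a minimal sketch in R. It is not the analysis from this article: the data are synthetic, and the ward labels and the households_with_water measure are hypothetical names made up for the illustration. It also assumes several observations per ward (for example, one per voting district), since a one-way ANOVA needs within-ward variation to compare against.

```r
# Synthetic, illustrative data only. The ward labels and the
# "households_with_water" measure are hypothetical.
set.seed(42)
services <- data.frame(
  ward = factor(rep(c("Ward 1", "Ward 2", "Ward 3"), each = 10)),
  households_with_water = c(
    rnorm(10, mean = 60, sd = 5),
    rnorm(10, mean = 62, sd = 5),
    rnorm(10, mean = 45, sd = 5)
  )
)

# One-way ANOVA: does the mean of the measure differ between wards?
fit <- aov(households_with_water ~ ward, data = services)
anova_table <- summary(fit)[[1]]
print(anova_table)

# The p-value for the ward effect is in the first row of the table.
p_value <- anova_table[["Pr(>F)"]][1]
```

A small p_value would suggest that at least one ward differs from the others on this measure; it does not say which ward, or why.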

When I could not find any ward population statistics for 2006, I inferred them by drawing from a random normal distribution. A community census was conducted in 2006/7, which gave me the total population but not a ward breakdown. My simulated ward values had to add up to this total, so when their sum fell short I shifted every value up by the same amount.
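The idea can be sketched roughly as follows. The numbers here are placeholders, not the real figures: n_wards, total_population and the spread sd_size are all assumptions chosen for the illustration, and the handling of the leftover remainder is my own addition so that the sum matches exactly.

```r
set.seed(2006)

# Hypothetical inputs, not the actual census figures.
n_wards <- 38
total_population <- 580000
mean_size <- total_population / n_wards
sd_size <- 0.1 * mean_size  # assumed spread between wards

# Draw one simulated population per ward.
draws <- round(rnorm(n_wards, mean = mean_size, sd = sd_size))

# Shift every ward by the same whole number so the sum moves
# towards the known total (the shift is negative on an overshoot).
shortfall <- total_population - sum(draws)
shift <- shortfall %/% n_wards
remainder <- shortfall %% n_wards
ward_population <- draws + shift

# Hand out the few people left over, one each to the first wards.
idx <- seq_len(remainder)
ward_population[idx] <- ward_population[idx] + 1

sum(ward_population) == total_population
```

Shifting by a constant keeps the differences between wards exactly as they were drawn; only the overall level changes.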

R factors were not easy to deal with: with my strings converted to factors, I could not get at my data to do the calculations. So I told R to read in my CSV file without converting strings into factors. I was then able to draw different graphs that offer insight into how the population of the municipality as a whole, and of its individual wards, has been changing from 1996 (when Malamulele was still in Levubu-Shingwedzi Transitional Local Municipality) until the last ward delimitation in 2011. These statistics are shaped not only by Thulamela's doing, but also by policies of the IEC and the MDB.
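For readers who hit the same wall, this is roughly what reading a CSV without factors looks like. The file contents below are made-up sample rows, and the column names ward, year and population are assumptions for the sketch, not the layout of my actual data file. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE; in older versions of R it had to be set explicitly.

```r
# Write a tiny, made-up CSV to a temporary file so the example is
# self-contained. These rows are placeholders, not real statistics.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ward,year,population",
  "Ward 1,2001,15000",
  "Ward 1,2011,16200",
  "Ward 2,2001,14100",
  "Ward 2,2011,13800"
), csv_path)

# Read the file, keeping strings as plain character vectors.
wards <- read.csv(csv_path, stringsAsFactors = FALSE)
str(wards)

# Total population per year across all wards.
totals <- aggregate(population ~ year, data = wards, FUN = sum)
print(totals)
```

From here, something like barplot(totals$population, names.arg = totals$year) would give a first graph of how the total changes over time.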

At this point, I decided to forget about population densities, water, sanitation, drainage, electricity, health, parks, libraries, and other services that the municipality is mandated to render. That will be a topic for another day. Given how difficult municipal data collection is, it would have been a year before I could publish this article! For now the title will stay as is. The delivery of municipal services hinges on population sizes, so this is a good start.

All the code is accessible from here. The data can be found here.