The State Capture Report has been all over the news since its release last week, which followed last-ditch attempts by members of our own government (including President Jacob Zuma and ministers Des van Rooyen and Mosebenzi Zwane) to prevent it from seeing the light of day. However, justice ran its course in the end and the report unexpectedly landed in South Africans’ inboxes and newsfeeds on the afternoon of Tuesday 1 November 2016.

Thembani Phaweni (a Master’s statistician at UCT/AIMS) and I have collaborated regularly on text mining tasks in the past, so we decided to look at the report from a different angle to see if we could unearth any stories that had not already been emphasised in the media. To do this, we treated the report as a data analysis and visualisation exercise. We were particularly interested in uncovering the relationships between the various players in the report to see if there were any that had not come to light before.

Most of the news stories derived from the report detail suspicious dealings around mining resources and government contracts. The key players in those stories include President Zuma, his son, Duduzane Zuma, the Gupta brothers and their various mining operations, and state-owned energy supplier, Eskom, along with its CEO, Brian Molefe. Our network analysis of the report shows that the story is much bigger than that, though. It is one of wholesale capture of many of South Africa’s parastatals (or former parastatals), including Eskom, Transnet, SAA and Denel. In addition, the data shows that President Zuma is really just one of many ministers and businesspeople caught up in the Gupta web; a web whose tendrils reach deep into numerous parts of our government and economy. The scale of the Gupta state capture apparatus is truly breathtaking and must have taken incredible perseverance and strategic wit to build. In other circumstances, we’d be thoroughly impressed.

To uncover the relationships between people and organisations (referred to as “entities”), we first used text mining tools to extract a list of the various entities in the report (no automated tool extracts every entity with 100% accuracy, but we did manage to capture most). We then drew a line between any two entities that appeared within five words of each other in the text. The result was a network of the people, places and organisations mentioned in the report.
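For readers who want to try something similar, the linking step can be sketched roughly as follows. This is a minimal illustration rather than the code we actually ran: it assumes the entity list has already been extracted, it measures "within five words" between the first words of two mentions, and the function name `cooccurrence_edges`, the sample sentence and the entity names in it are all made up for the example.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, entities, window=5):
    """Count pairs of distinct entities mentioned within `window` words of each other.

    Assumption: distance is measured between the first word of each mention.
    """
    tokens = re.findall(r"\w+", text.lower())

    # Record the token position at which every entity mention starts.
    mentions = []  # list of (position, entity)
    for entity in entities:
        words = re.findall(r"\w+", entity.lower())
        if not words:
            continue  # skip empty entity strings
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                mentions.append((i, entity))

    # Draw a (weighted) line between two different entities that sit close together.
    edges = Counter()
    for (pos_a, a), (pos_b, b) in combinations(mentions, 2):
        if a != b and abs(pos_a - pos_b) <= window:
            edges[tuple(sorted((a, b)))] += 1
    return edges


# Hypothetical sample text and entities, purely for illustration.
sample = (
    "Alpha Corp paid Beta Ltd last year. Much later in a completely "
    "different section of the text Gamma appears alone."
)
edges = cooccurrence_edges(sample, ["Alpha Corp", "Beta Ltd", "Gamma"])
for (a, b), weight in edges.items():
    print(a, "<->", b, "weight", weight)
```

Each pair in `edges` becomes a line in the network, and the count can be used as the line's weight, so entities that are mentioned together often end up more strongly connected.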