How big data analytics helped decode the Panama Papers

ICIJ turned to Nuix to help unravel the secrets of the offshore banking scandal

At 2.6TB, the Panama Papers is by far the biggest data leak to happen this decade and, it is claimed, the biggest cache of data ever handed over to journalists.

The 11.5 million documents handed over to Süddeutsche Zeitung and the International Consortium of Investigative Journalists (ICIJ) comprised nearly five million emails, three million database files, two million PDFs, one million images, 320,166 text documents and 2,242 other unclassified files.


So how exactly did journalists sift through this huge amount of data?

The sheer variety and volume of this data made going through it all manually impractical, if not impossible, and the same was true of traditional analytics and search software.

Consequently, the ICIJ called in big data analytics firm Nuix to help make sense of the vast amount of information it had received.

"Nuix is a large-scale investigation platform, which allows you to index vast amounts of data very, very quickly," Carl Barron, Nuix's senior solutions consultant who worked with the ICIJ to sort through the files, told IT Pro.

"We take in lots of different sorts of information - it could be email, databases, images, PDFs, all those different file formats - and extract all of the text and all of the metadata (information about the file itself). Once it has been indexed we can do some great analytical things, such as bringing out people's names or credit card information if necessary. We can also see who is connected to whom, so if we find a person's name in an email, maybe we want to find that person's name in other documentation and other things like that as well," Barron explained.
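The workflow Barron describes (extract text, index it, then pull out names or card numbers and see who appears alongside whom) can be illustrated with a very small sketch. This is not Nuix's software or its API; the sample documents, the regular expressions and the function names below are all hypothetical, and real-world entity extraction is far more involved than two simple patterns.

```python
import re
from collections import defaultdict

# Hypothetical sample text standing in for extracted document content.
# None of this comes from the leak itself.
DOCS = {
    "email-001": "Alice Smith asked Bob Jones to open the account.",
    "memo-002": "Payment approved by Bob Jones, card 4111 1111 1111 1111.",
    "note-003": "Alice Smith met Carol White in Panama.",
}

# Deliberately naive patterns (assumptions for illustration only):
# a "name" is two capitalised words, a "card" is 16 digits in groups of four.
NAME_RE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def build_index(docs):
    """Map every lower-cased token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def connections(docs):
    """Link each name to the other names found in the same document."""
    links = defaultdict(set)
    for text in docs.values():
        names = set(NAME_RE.findall(text))
        for name in names:
            links[name] |= names - {name}
    return links


if __name__ == "__main__":
    index = build_index(DOCS)
    print(sorted(search(index, "Bob Jones")))
    print(sorted(connections(DOCS)["Alice Smith"]))
    print(CARD_RE.findall(DOCS["memo-002"]))
```

Building the index once and querying it many times is what makes searching a large collection quick; the co-occurrence map is a crude stand-in for the "who is connected to whom" analysis mentioned in the quote.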


Nuix and the ICIJ have been working together for more than four years, but this is the largest analysis of leaked information ever carried out - 10 times the volume seen in the Offshore Leaks documents in 2013.

For Nuix's software, however, this wasn't an extraordinary volume of data; Barron described it as "quite routine".

The information released in the Panama Papers has exposed alleged cases of money laundering, involvement with organised crime, bribery and corruption at the highest levels of global government.

While Süddeutsche Zeitung has only gone so far as to say it received the information from an anonymous source, Mossack Fonseca, the subject of the leak, has claimed the data was extracted through a hack on its email servers.
