GitGuardian, the security startup hunting down online secrets to keep companies safe from hackers

More than 3,000 company credentials unwittingly end up online everyday. GitGuardian helps firms plug these leaks

When the login details of an Uber engineer were exposed in 2016 – signalling one of the most high-profile breaches of recent years – the names and addresses of 57 million riders and drivers were left at the mercy of hackers. 

None of Uber’s corporate systems had been directly breached, though. Its security infrastructure was working as it should. Instead, the credentials were found buried within the code of an Uber developer’s personal GitHub account. This account and its repositories were hacked, reportedly due to poor password hygiene and the stolen credentials used to access Uber’s vast datastore. This breach, which Uber sat on for a year, resulted in a then-record-breaking $148 million fine.

Yet despite this public lesson in how not to handle private credentials, so-called company secret leakage is an everyday occurrence

The rise of secret leakage

Research from North Carolina State University found that in just six months between October 2017 and April 2018, more than half a million secrets were uploaded to GitHub repositories, including sensitive login details, access keys, auth tokens and private files. A 2019 SANS Institute survey found that half of company data breaches in the past 12 months were a result of credential hacking – higher than any other attack method among firms using cloud-based services. 

Advertisement
Advertisement - Article continues below
Advertisement - Article continues below

This is where GitGuardian comes in.

Founded in 2017 by Jérémy Thomas and Eric Fourrier – a pair of applied mathematics graduates and software engineers specialising in data science, machine learning and AI – the Paris-based cybersecurity startup uses a combination of algorithms, including pattern matching and machine learning, to hunt for signs of company secrets in online code. According to the company’s figures, more than a staggering 3,000 secrets make their way online every day.

“The idea for GitGuardian came when Eric and I spotted a vulnerability buried in a GitHub repository,” CEO and co-founder Thomas tells IT Pro. “This vulnerability involved sensitive credentials relating to a major company being leaked online that had the potential to cost the firm tens of millions of dollars if they had got into the wrong hands. We alerted the company to the vulnerability and it was able to nullify it in less than a week.” 

“We then built an algorithm and real-time monitoring platform that automated and significantly built-upon the manual steps we took when we made that initial detection, and this platform attracted interest from GitHub’s own Scott Chacon as well as Solomon Hykes from Docker and Renaud Visage from EventBrite.” 

How the cloud is fuelling secret leakage

The problem of sensitive data leakage stems in part from the increasing reliance of software developers on third-party services. To integrate such services, developers often juggle hundreds of credentials with varying sensitivity, from API keys used to provide mapping features on websites to Amazon Web Services login details, and private cryptographic keys for servers. Not to mention the many secrets designed to protect data, surrounding payment systems, intellectual property and more. 

In the process of handling these integrations, more than 40 million developers and almost 3 million businesses and organisations globally use GitHub, the public platform that lets developers share code and collaboratively work on projects. Either by accident (in the majority of cases), or occasionally knowingly, these uploads have company secrets buried within them alongside the code that’s being developed. As was seen with the Uber breach, hackers can theoretically scour this code, steal credentials and hack company accounts all without the developer and their employer being any the wiser.

How GitGuardian plugs these leaks

GitGuardian’s technology works by first linking developers registered on GitHub to their respective companies. This already gives the company greater insight over who their developers are on GitHub and the levels of public activity they’re involved in. This is especially important for developers’ personal repositories because they’re completely out of their companies’ control, yet too often contain corporate credentials. 

Once linked, GitGuardian’s algorithms scrutinise any and all code changes, known as commits, made by these developers in real-time, looking for signs of company secrets. Such signs within these commits range from code patterns to file types that have previously been found to contain credentials.  

“Our algorithms scan the content of more than 2.5 million commits a day, covering over 300 types of secrets from keys to database connection strings, SSL certificates, usernames and passwords,” Thomas continues.

Once a leak occurs, it takes four seconds for GitGuardian to detect it and send an alert to the developer and their security team. On average, the information is removed within 25 minutes and the credential is revoked within the hour. For every alert, GitGuardian seeks feedback from its developers and security teams who rate the accuracy of the detection: were company secrets actually exposed or was it a false positive? Consequently, the algorithm is constantly evolving in response to new secrets and how they are leaked.

Advertisement
Advertisement - Article continues below

This seems like a simple premise, even if the technology behind it is far from simple. But what’s to stop a hacker building a similar algorithm to intercept the secrets before GitGuardian’s platform spots it? 

“GitGuardian is indeed competing with individual black hat hackers, as well as organised criminal groups,” Thomas explains. “We constantly improve our algorithms to be quicker and smarter than they are, and to be able to detect a wider scope of vulnerabilities, which requires a dedicated, highly skilled team.

Advertisement - Article continues below

“We're helped in this by our users and customers who give us feedback – at scale – that we reinject into our algorithms. Our white hat approach allows us to collect feedback and this gives us a tremendous edge over black hats. You can see this as the unfair advantage you get by doing good.”

GitGuardian has already supported global government organisations, more than 100 Fortune 500 companies and 400,000 individual developers. It’s now setting its sights on adding even more developers and companies to its platform to further improve its algorithm, and extend this technology for use on private sites. 

“We started GitGuardian by tackling secrets in source code and private sites,” concludes Thomas. “Our ambition really is to be developers’ and cybersecurity professionals’ best friend when it comes to securing the vulnerability area that is emerging due to modern software development techniques [and] we’re on the road to doing this.”

Featured Resources

Digitally perfecting the supply chain

How new technologies are being leveraged to transform the manufacturing supply chain

Download now

Three keys to maximise application migration and modernisation success

Harness the benefits that modernised applications can offer

Download now

Your enterprise cloud solutions guide

Infrastructure designed to meet your company's IT needs for next-generation cloud applications

Download now

The 3 approaches of Breach and Attack Simulation technologies

A guide to the nuances of BAS, helping you stay one step ahead of cyber criminals

Download now
Advertisement

Recommended

Visit/safe-harbour/33040/github-build-out-its-bug-bounty-programme-and-increases-reward-for-spotted-flaws
safe harbour

GitHub cuts maximum payment limits on bug bounty programme

20 Feb 2019
Visit/development/33712/github-launches-patreon-for-coders
Development

Github launches ‘Patreon for coders’

24 May 2019

Most Popular

Visit/operating-systems/25802/17-windows-10-problems-and-how-to-fix-them
operating systems

17 Windows 10 problems - and how to fix them

13 Jan 2020
Visit/microsoft-windows/32066/what-to-do-if-youre-still-running-windows-7
Microsoft Windows

What to do if you're still running Windows 7

14 Jan 2020
Visit/web-browser/30394/what-is-http-error-503-and-how-do-you-fix-it
web browser

What is HTTP error 503 and how do you fix it?

7 Jan 2020
Visit/policy-legislation/general-data-protection-regulation-gdpr/354577/data-protection-fines-hit-ps100m
General Data Protection Regulation (GDPR)

Data protection fines hit £100m during first 18 months of GDPR

20 Jan 2020