“Trojan Source” hides flaws in source code from humans

Organizations urged to take action to combat the new threat that could result in SolarWinds-style attacks

Security researchers have revealed a flaw in how compilers and interpreters handle Unicode text that could be used to slip vulnerabilities into open source projects. The researchers said the attack, dubbed Trojan Source, is particularly potent in the context of software supply chains, where compromises such as the SolarWinds attack have shown how far a tainted component can spread.

“If an adversary successfully commits targeted vulnerabilities into open-source code by deceiving human reviewers, downstream software will likely inherit the vulnerability,” said researchers.

Researchers said the attack exploits subtleties in text-encoding standards, such as Unicode, to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed. A reviewer therefore reads one program while the compiler processes another.

“These visually reordered tokens can be used to display logic that, while semantically correct, diverges from the logic presented by the logical ordering of source code tokens,” said researchers. 

They added that compilers and interpreters adhere to the logical ordering of source code, not the visual order.

Attackers can use several techniques to exploit the visual reordering of source code tokens, according to researchers.

The first technique is called “Early Returns.” This causes a function to short circuit by executing a return statement that visually appears to be within a comment.
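As a rough illustration of an early return, the Python sketch below builds a small function from a string and runs it. The function name, the `bank` dictionary, and the amounts are hypothetical, chosen only for the demonstration. The docstring ends with a Right-to-Left Isolate character (U+2067), which in many bidirectional-aware editors makes the following `;return` look like part of the docstring text, although the interpreter treats it as a real statement.

```python
# Hypothetical victim function, written as an escaped string so the hidden
# control character (U+2067, Right-to-Left Isolate) is visible to the reader.
SOURCE = (
    "def subtract_funds(bank, account, amount):\n"
    "    ''' Subtract funds from bank account then \u2067''' ;return\n"
    "    bank[account] -= amount\n"
)

namespace = {}
exec(SOURCE, namespace)  # compile the source exactly as an interpreter would

bank = {"alice": 100}
namespace["subtract_funds"](bank, "alice", 30)

# The function returned right after the docstring, so no funds were subtracted.
print(bank["alice"])  # prints 100
```

How the line is actually drawn depends on the editor or review tool, so the visual effect described here is an assumption about typical rendering rather than a guarantee.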

The second is “Commenting-Out.” This makes a comment visually appear to be code, so the reviewer sees logic that is never actually executed.

Lastly, there are “Stretched Strings.” These cause portions of string literals to visually appear as code, which has the same effect as commenting-out and causes string comparisons to fail.
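A stretched string can be sketched in a few lines of Python. The variable names and the comment text inside the literal are hypothetical. The literal looks like `"user"` followed by a comment in a renderer that honors the control characters, but the interpreter sees everything up to the closing quote as string content, so the comparison fails.

```python
# What a reviewer believes is being compared against.
access_level = "user"

# What the source file actually contains: the word "user" followed by a
# Right-to-Left Override (U+202E) and isolate characters (U+2066, U+2069)
# wrapped around text that is displayed as if it were a trailing comment.
literal_in_source = "user\u202e \u2066# Check if admin\u2069 \u2066"

# The two strings differ, so a check that appears to match "user" never does.
is_user = access_level == literal_in_source
print(is_user)  # prints False
```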

There is also a variant that uses homoglyphs, which are characters that appear nearly identical to other characters, such as a Cyrillic letter that looks like a Latin one.

“An attacker can define such homoglyph functions in an upstream package imported into the global namespace of the target, which they then call from the victim code,” said researchers. 

This attack variant is tracked as CVE-2021-42694, while the bidirectional reordering attack is tracked as CVE-2021-42574.
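The homoglyph variant can be illustrated with a small Python sketch. The identifier names are hypothetical, and the script check is a crude heuristic that reads the first word of each character's Unicode name; it is not the confusable detection defined by Unicode and is offered only to show the idea.

```python
import unicodedata

# Two identifiers that render almost identically. The second one replaces
# the Latin "H" with U+041D, CYRILLIC CAPITAL LETTER EN.
genuine = "sayHello"
spoofed = "say\u041dello"


def scripts_used(identifier):
    """Return a rough set of scripts, taken from each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def looks_mixed_script(identifier):
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_used(identifier)) > 1


print(genuine == spoofed)           # prints False
print(looks_mixed_script(genuine))  # prints False
print(looks_mixed_script(spoofed))  # prints True
```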

To defend against such attacks, the researchers said, compilers, interpreters, and build pipelines that support Unicode should throw errors or warnings for unterminated bidirectional control characters in comments or string literals, and for identifiers with mixed-script confusable characters.

“Language specifications should formally disallow unterminated bidirectional control characters in comments and string literals,” they added. “Code editors and repository frontends should make bidirectional control characters and mixed-script confusable characters perceptible with visual symbols or warnings.”
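A build pipeline check along these lines might look like the Python sketch below. It is a simplification under stated assumptions: it examines whole lines rather than parsing comments and string literals, it pairs embeddings and overrides only with Pop Directional Formatting and isolates only with Pop Directional Isolate, and the function name is hypothetical.

```python
# Bidirectional control characters that open a new directional context.
EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202c"  # Pop Directional Formatting, closes an embedding or override
PDI = "\u2069"  # Pop Directional Isolate, closes an isolate


def unterminated_bidi_lines(source):
    """Return 1-based numbers of lines that leave a bidi control unclosed."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        open_embeddings = 0
        open_isolates = 0
        for ch in line:
            if ch in EMBEDDING_OPENERS:
                open_embeddings += 1
            elif ch in ISOLATE_OPENERS:
                open_isolates += 1
            elif ch == PDF and open_embeddings:
                open_embeddings -= 1
            elif ch == PDI and open_isolates:
                open_isolates -= 1
        if open_embeddings or open_isolates:
            flagged.append(number)
    return flagged


sample = "a = 1\nb = 'x\u2067'\nc = '\u2066y\u2069'\n"
print(unterminated_bidi_lines(sample))  # prints [2]
```

A stricter check could simply reject any bidirectional control character in source files, which is easier to reason about when a project has no need for right-to-left text in its code.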
