DeepMind's AI can lip read better than humans

DeepMind AI beats a professional lip reader in a head-to-head test

Google's DeepMind has partnered with researchers at the University of Oxford to create a new AI that can read lips, called Watch, Listen, Attend and Spell (WLAS).

The researchers have released a scientific paper suggesting that the new AI can correctly interpret more words than a trained professional lip reader.

When tested on the same 200 randomly selected clips, a professional lip reader identified words correctly 12.4% of the time, while WLAS achieved an accuracy of 46.8%.

The paper reads: "The WLAS model trained on the LRS dataset surpasses the performance of all previous work on standard lip reading benchmark datasets, often by a significant margin. This lip reading performance beats a professional lip reader on videos from BBC television, and we also demonstrate that visual information helps to improve speech recognition performance even when the audio is available."

The system was trained on a dataset of 118,000 sentences, with a vocabulary of 17,500 words, drawn from 5,000 hours of BBC video footage.

The BBC videos were prepared using machine learning algorithms, and the AI was also taught to realign video and audio when the two were out of sync.

Earlier this month, the University of Oxford published a similar research paper testing a lip reading program called LipNet. LipNet achieved 93.4% lip reading accuracy, compared with 52.3% for a human expert shown the same material.

However, LipNet was tested on videos of volunteers reciting formulaic sentences drawn from a vocabulary of only 51 words, whereas WLAS was tested on a much broader range of data: real conversations from BBC programmes.

The technology has a number of possible applications. An AI tool such as WLAS could help improve the quality of live subtitles and better support people with hearing impairments.

It could also be integrated into virtual assistants such as Siri, which could use a phone's camera to read lips and so better understand users' words, even in crowded or noisy environments.

Such a tool could also be used for surveillance, although reading lips from grainy CCTV footage would likely prove more challenging.
