Google to change AI forever with open source 'Parsey McParseface'


Google has open sourced its language parsing model, SyntaxNet, calling the English version Parsey McParseface.

The system understands human language with an impressive degree of accuracy, but much of the attention has centred on its choice of name, a nod to the public vote to call a science research ship Boaty McBoatface - the vessel was ultimately named after Sir David Attenborough.

The open sourcing of Google's parsing model means that the broader community can employ the tool to up the game of artificial intelligence (AI): machines can now parse sentences from a standard corpus of English-language journalism, the first step in their journey to take over the world.

"At Google, we spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways," explained Google senior staff research scientist, Slav Petrov.

"Today, we are excited to share the fruits of our research with the broader community by releasing SyntaxNet, an open source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding systems."

SyntaxNet is built on powerful machine learning algorithms that learn to analyse the linguistic structure of language, and that can explain the functional role of each word in a given sentence.

But how does it work? The system identifies the subject and object of a sentence and makes sense of what they mean by determining the syntactic relationships between the words, represented in a dependency parse tree.

To explain this visually, here is a simple tree diagram for the sentence "Alice saw Bob".

The structure encodes that 'Alice' and 'Bob' are nouns and 'saw' is a verb. The main verb 'saw' is the root of the sentence; 'Alice' is the subject (nsubj) of 'saw', while 'Bob' is its direct object (dobj). This structure is what lets Parsey McParseface suss out the meaning of the sentence and analyse it correctly.
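
To make that concrete, the same parse can be written down as plain data. The short Python sketch below is not SyntaxNet's own output format; it simply lists each word of "Alice saw Bob" with its part of speech, the word it attaches to (its head), and the label of that relationship, then prints the relations.

    # A minimal sketch of the dependency parse described above, using plain
    # Python tuples rather than SyntaxNet's actual output format.
    # Each entry is (word, part_of_speech, head, relation); 'ROOT' marks the main verb.
    parse = [
        ("Alice", "NOUN", "saw",  "nsubj"),  # Alice is the subject of 'saw'
        ("saw",   "VERB", "ROOT", "root"),   # 'saw' is the root of the sentence
        ("Bob",   "NOUN", "saw",  "dobj"),   # Bob is the direct object of 'saw'
    ]

    for word, pos, head, rel in parse:
        print(f"{word:<6} {pos:<5} -> {head:<5} ({rel})")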

"Our release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface," added Petrov. "SyntaxNet applies neural networks to the ambiguity problem. An input sentence is processed from left to right, with dependencies between words being incrementally added as each word in the sentence is considered."

According to Google, Parsey McParseface achieves 94 per cent accuracy on news text, compared with roughly 96 or 97 per cent for human linguists, but it doesn't do quite as well on random sentences from the web, where it manages about 90 per cent accuracy.