AI can play poker, but I’m neither shaken nor stirred
Chess, Jeopardy, Go, it’s all the same to a machine – the real challenge is playing the person opposite
Imagine, just for a moment, that Ian Fleming didn't base James Bond on a mixture of Royal Navy officers and the singer Hoagy Carmichael and instead made him a machine. A big, bulky contraption, sort of like the robot from Lost in Space, but wearing a dinner jacket. 1962's Dr. No would have been very different.
Think of the first scene in which we meet the world's most famous spy, playing baccarat with Sylvia Trench. She loses a lot of money to this mysterious thing sitting opposite, which says, in a generic robot voice, "I admire your courage, Miss?"
She eyes it curiously and replies: "Trench, Sylvia Trench. I admire your luck, Mr?"
The camera pans up to reveal a robot lighting a cigarette. It says: "Bond, (dramatic pause) Android Bond".
Much later, the pair rendezvous in a hotel suite where the robot declines Trench's advances because, well, it's a machine and incapable of that sort of thing. But we do know what AI-based programs are good at: games. This was highlighted, again, by a Facebook AI called Pluribus, which was developed by a teacher and student from Carnegie Mellon University.
Pluribus (a name meaning "more" in Latin) took part in a 12-day poker marathon against human professionals, each of whom had previously won more than $1 million playing the game. After 10,000 hands of no-limit Texas hold'em, it earned a virtual $48,000 (£38,000), beating all five elite players.
What Facebook says sets this apart from chess, Jeopardy, Go and other games where AI programs have beaten humans is that Pluribus took on multiple players at once, and that poker is a game of hidden information. The machine couldn't see the other players' hands and, likewise, the opposing humans didn't know what cards it was holding. Poker, of course, is a game of 'bluffing' and concealing or implying things to fool opponents.
Pluribus works as a sort of forward-looking reinforcement learning model: it is given the rules of the game and 'learns' through trial and error. In this case, it runs scenarios against copies of itself and then applies the most successful strategy to the actual game. It chooses its bets this way too, using its mathematical equations to bluff and to make what's known as a "donk bet", where a player who ended the previous betting round with a call opens the next one with a bet.
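To give a flavour of how trial and error against a copy of yourself can produce a bluff, here is a minimal sketch. It is not Pluribus's code: the one-round game, its payoffs and the names `regret_matching` and `train` are all assumptions made up for illustration, and the learning rule is a textbook technique called regret matching, applied to a game vastly simpler than hold'em.

```python
def regret_matching(regrets):
    """Turn accumulated regrets into a strategy: play each action in
    proportion to its positive regret, or uniformly if none is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def train(iterations=50000):
    """Self-play on a toy one-round bluffing game (an illustrative assumption).

    The bettor is dealt a strong or weak hand with equal chance and may
    bet or check. Facing a bet, the caller may call or fold.
    Payoffs to the bettor: check -> +1 (strong) or -1 (weak);
    bet and fold -> +1; bet and call -> +2 (strong) or -2 (weak).
    Returns (average bluff frequency, average call frequency).
    """
    # Index 0 is the aggressive action (bet / call), index 1 the passive one.
    regret = {"strong": [0.0, 0.0], "weak": [0.0, 0.0], "caller": [0.0, 0.0]}
    strategy_sum = {name: [0.0, 0.0] for name in regret}

    for _ in range(iterations):
        strategy = {name: regret_matching(r) for name, r in regret.items()}
        call = strategy["caller"][0]
        bet_strong = strategy["strong"][0]
        bet_weak = strategy["weak"][0]

        # Bettor: compare betting with checking for each hand.
        for hand, showdown in (("strong", 1.0), ("weak", -1.0)):
            values = [call * 2.0 * showdown + (1.0 - call) * 1.0, showdown]
            expected = sum(p * v for p, v in zip(strategy[hand], values))
            for action in range(2):
                # Each hand is dealt half the time, hence the 0.5 weight.
                regret[hand][action] += 0.5 * (values[action] - expected)

        # Caller: payoffs are the negative of the bettor's, weighted by
        # how often each hand actually bets.
        call_value = 0.5 * bet_strong * -2.0 + 0.5 * bet_weak * 2.0
        fold_value = (0.5 * bet_strong + 0.5 * bet_weak) * -1.0
        values = [call_value, fold_value]
        expected = sum(p * v for p, v in zip(strategy["caller"], values))
        for action in range(2):
            regret["caller"][action] += values[action] - expected

        # The average strategy over all rounds is what settles down.
        for name in strategy:
            for action in range(2):
                strategy_sum[name][action] += strategy[name][action]

    bluff = strategy_sum["weak"][0] / iterations
    call_freq = strategy_sum["caller"][0] / iterations
    return bluff, call_freq


if __name__ == "__main__":
    bluff, call_freq = train()
    print(f"bluff with a weak hand: {bluff:.2f}, call a bet: {call_freq:.2f}")
```

Nobody tells the program to bluff. Under these made-up payoffs, the balanced strategy is to bet a weak hand about a third of the time and to call about two thirds of the time, and the averages should drift towards those figures simply because that is where neither side regrets its choices.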
"We think of bluffing as this very human trait," explained Noam Brown, the lead researcher from Facebook's AI team, to the BBC.
"But what we see is that bluffing is actually mathematical behaviour. When the bot "bluffs", it doesn't view it as deceptive or dishonest, it's just the way to make the most money."
Now, I don't mean to take the martini out of their glasses, but so what? Games are largely environments of rules and maths, and AI has already proven it can best humans at that, over and over. What's interesting about poker is that it's not always the best hand that wins the pot, and the Facebook team are right to work on bluffing. But it would be far more interesting for a machine to spot the 'tells' in the other players than to work out when it's best to bluff itself.
For instance, in Casino Royale, the key to Bond winning the big hand is that his opponent, Le Chiffre, has a "tell" in the form of a twitch above the eye that appears when bluffing (if you think that's bad, his other eye leaks blood every so often!). As Bond says earlier in the film, "in poker you never play your hand, you play the man across from you." It's a ploy that doesn't require maths, but rather a keen eye and an understanding of idiosyncrasies.
Also, Bond is never just playing the game; he's always after something else. He isn't just reading Le Chiffre for an advantage, he's got to bring him in to MI6 for questioning. Likewise, in Dr. No, he's winning Sylvia Trench's money but also engineering a romantic liaison. He's playing games within games where the maths might not always add up.
There are types of facial recognition programs in development that aim to read emotions, but "tells" that signal a player's intentions are more subtle than a quivering lip or a furrowed brow. Many people don't realise they have these tics and, in more complicated cases, people know all too well that they do. While Bond spots Le Chiffre's twitch early, the villain uses that to his advantage, faking a 'bluff' that ends up taking Bond out of the game (briefly; he buys back in).
For me, a more interesting approach for AI-powered software to take to Texas hold'em would be to read its opponents' facial expressions, the slight changes in their voices and their unusual involuntary movements, whether they know about them or not, and make in-game decisions based on that data.
Of course, this would mean that a robot would need to be built with a face that's honest to a fault or else it would have an unfair advantage. It would literally need to be programmed with some kind of feature or glitch that occurs when it's bluffing, which then raises the question of whether it should know it's got one or not.
Also, why not give the AI a voice? Poker's a social game where players converse. This, too, is used to gain advantages, to get inside an opponent's head and place doubt in their mind. After a short break in Casino Royale to do spy stuff, Bond comes back to the table in a fresh shirt and Le Chiffre notices. "I hope our little game isn't causing you to perspire," he quips.
"A little," Bond replies. "But I won't consider myself to be in trouble until I start weeping blood."