Pluribus learned the nuances of Texas Hold ’Em by playing trillions of hands against itself. After each hand, it would review every decision it had made and determine whether a different choice would have produced a better result.
Mr. Brown called this process “counterfactual regret minimization,” and compared it to the way humans learn the game. “One player will ask another, What would you have done if I had raised here instead of called?”
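For readers who want to see the idea in miniature, the sketch below applies regret matching, the update rule at the core of counterfactual regret minimization, to rock-paper-scissors played against itself. It is a toy illustration and not Pluribus's code: the game, the function names, the starting bias and the iteration count are all assumptions chosen for brevity, and real poker adds hidden cards and sequential betting that this example leaves out. After each round, each player asks how much better every alternative action would have done, adds that difference to a running regret total, and then favors actions with positive regret.

```python
# Toy sketch: regret matching in self-play on rock-paper-scissors.
# Not Pluribus's algorithm; all names and constants here are illustrative.

ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[a][b] is the payoff to a player who chooses action a against action b.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # No action is regretted yet, so play uniformly at random.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(iterations=50_000):
    """Return each player's average strategy after self-play."""
    n = len(ACTIONS)
    # Assumption: player 0 starts with a small bias toward rock so that
    # play does not begin exactly at equilibrium and stay there.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]

    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            mine, theirs = strategies[player], strategies[1 - player]
            # Expected payoff of each action against the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * theirs[b] for b in range(n)) for a in range(n)
            ]
            # Expected payoff of the mix the player actually used.
            expected = sum(mine[a] * action_values[a] for a in range(n))
            for a in range(n):
                # Regret: how much better action a would have done.
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += mine[a]

    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(train()):
        print(player, [round(p, 3) for p in average])
```

The strategy in any single round can swing heavily toward one action, but the average over all rounds settles close to one third for each action, which is the unexploitable way to play rock-paper-scissors. This sketch updates regrets using expected payoffs rather than sampled hands, a simplification that keeps the run deterministic.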
Unlike systems that can master three-dimensional video games like Dota and StarCraft — systems that need weeks or even months to train to play against humans — Pluribus trained for only about eight days on a fairly ordinary computer at a cost of about $150. The hard part was creating the detailed algorithm that analyzed the results of each decision. “We’re not using much computing power,” Mr. Brown said. “We can cope with hidden information in a very particular way.”
In the end, Pluribus learned to apply complex strategies, including bluffing and random behavior, in real time. Then, when playing against human opponents, it would refine these strategies by looking ahead to possible outcomes, as a chess player might. This spring, the researchers tested the system in games in which a single human professional played against five separate instances of Pluribus.
In that format, Mr. Elias was unimpressed. “You could find holes in the way it played,” he said; among other bad habits, Pluribus tended to bluff too often. But after taking suggestions from him and other players, the researchers modified and retrained the system. In subsequent games against top professionals, Mr. Elias said, the system seemed to have reached superhuman levels.
The system did not play for real money. But if the chips had been valued at a dollar apiece, Pluribus would have won about $1,000 an hour against its elite opponents. “At this point, you couldn’t find any holes,” Mr. Elias said.
All the matches were played online, so the system was not deciphering the emotions or physical “tells” of its human opponents. The success of Pluribus showed that poker can be boiled down to nothing but math, Mr. Elias said: “Pure numbers and percentages. It is solving the game itself.”