Three Ways Missing Data Can Mislead You in Poker


One of the most interesting books I read last year was How Not To Be Wrong: The Power of Mathematical Thinking by Jordan Ellenberg. If you’ve heard of it, it might be because it made a little splash of news recently when Bill Gates put it on his list of five books you should read this summer. (And I agree — you should.)

Ellenberg’s book is where I first learned of Abraham Wald. He was a brilliant mathematician who worked in a little-known group supporting the U.S. military during World War II, the Statistical Research Group. As its name implies, the group was tasked with answering statistically complex military questions, such as finding optimal sampling inspection protocols for rocket propellants.

Here’s one of the problems the group was asked to solve: What’s the best way to armor planes? You can’t apply huge amounts of armor everywhere, or the plane gets so heavy that its speed, maneuverability, and range all suffer. But too little armor, and it can’t survive hits from anti-aircraft guns or enemy fighters.

Wald’s group was told that planes coming back from engagements over Europe had 1.11 bullet holes per square foot over the engine, and 1.73 over the fuselage. Should they rearrange the armor? If so, how?

At first glance, you’d think you should put less armor on the engine and more on the fuselage, because the latter is taking more hits. But Wald realized that the key to the answer was in the data they didn’t have. The planes returning safely from their sorties were not a representative sample. The crucial missing data pertained to the planes that didn’t come back.

The greater density of bullet holes in the fuselage of the survivors meant that the planes could take those hits and still fly home. But the planes that were getting shot in the engine were falling out of the sky. The solution to the problem, then, was to move armor from the fuselage to the engine, a counterintuitive result if you looked only at the available numbers.
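If you like to see this kind of reasoning in numbers, here is a rough simulation sketch. Everything in it is made up for illustration: the areas, the number of hits per plane, and the chance that a single hit brings a plane down are my assumptions, not Wald's data. Hits land evenly per square foot on every plane that flies, but engine hits are assumed to be far more lethal, and holes are counted only on the planes that make it home.

```python
import random


def simulate_returning_planes(n_planes=20000, hits_per_plane=6, seed=1):
    """Count bullet holes per square foot on surviving planes only."""
    rng = random.Random(seed)
    area = {"engine": 50.0, "fuselage": 200.0}       # square feet (assumed)
    lethality = {"engine": 0.40, "fuselage": 0.05}   # chance one hit downs the plane (assumed)
    engine_share = area["engine"] / sum(area.values())

    holes = {"engine": 0, "fuselage": 0}
    survivors = 0
    for _ in range(n_planes):
        plane_hits = {"engine": 0, "fuselage": 0}
        alive = True
        for _ in range(hits_per_plane):
            # Each hit lands on a part in proportion to that part's area.
            part = "engine" if rng.random() < engine_share else "fuselage"
            plane_hits[part] += 1
            if rng.random() < lethality[part]:
                alive = False
        if alive:
            # Only returning planes get inspected.
            survivors += 1
            for part in holes:
                holes[part] += plane_hits[part]

    density = {part: holes[part] / (survivors * area[part]) for part in holes}
    return survivors, density


survivors, density = simulate_returning_planes()
print("planes that came back:", survivors)
print("holes per sq ft on survivors:", density)
```

Under these assumptions the survivors show fewer holes per square foot over the engine than over the fuselage, even though every plane was hit evenly. The gap comes entirely from which planes are missing from the sample.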

Missing data can mislead you in poker, too. Let’s consider three ways this can happen.


1. Win Rates

You look at your log book or spreadsheet of buy-ins, wins, and losses, and see a decent profit. Great! You’re a winning player overall!

Or are you?

Have you faithfully recorded every single dollar, in and out, 100% of the time? For most poker players, the answer is no.

If the missing data were random, it wouldn’t matter much; the bottom-line conclusion would remain the same. But that’s not how it tends to happen.

Rather, the missing results tend to be the ones we want to forget, so we find flimsy excuses for not including them. That session “doesn’t count” because I was too drunk to play well, or it was just a home game rather than a casino, or I was playing a poker variant that was new to me. There’s no end to the creative justifications we can concoct when we want to avoid a harsh truth.

But even if you have honestly documented all of your results, you still might be misled by missing data. Sure, you have a good net win. But how are you getting it? Maybe you’re winning at hold’em, but losing at Omaha. Maybe you’re winning at $1/$2 games, but losing at $2/$5 games, or at least winning less per hour. Maybe you do great with shorter sessions — less than four hours, say — but tend to turn loser when you stretch that out. How are you doing live versus online? Cash games versus tournaments?

How can you possibly know any of these things if such details are not incorporated into your record-keeping? How can you optimize your hourly win rate if you don’t have the data to figure out when and where you’re performing best? How can you plug leaks in your game selection if you can’t sort your data by many different criteria?
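As a minimal sketch of what that kind of record-keeping could look like, here is a small log in Python. The field names, the `hourly_by` helper, and every session in the list are hypothetical; the point is only that once each session carries details like game, stakes, venue, and length, you can slice your hourly rate any way you like.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    game: str     # e.g. "holdem" or "omaha"
    stakes: str   # e.g. "1/2" or "2/5"
    venue: str    # "live" or "online"
    hours: float
    net: float    # dollars won (positive) or lost (negative)


def hourly_by(sessions, key):
    """Return {group: dollars per hour}, grouping sessions with `key`."""
    net = defaultdict(float)
    hours = defaultdict(float)
    for s in sessions:
        group = key(s)
        net[group] += s.net
        hours[group] += s.hours
    return {g: net[g] / hours[g] for g in net if hours[g] > 0}


# Made-up sessions, purely for illustration.
log = [
    Session("holdem", "1/2", "live", 3.0, 240.0),
    Session("holdem", "1/2", "live", 6.0, -60.0),
    Session("holdem", "2/5", "live", 4.0, -200.0),
    Session("omaha", "1/2", "online", 2.0, -150.0),
    Session("holdem", "1/2", "online", 3.0, 210.0),
]

print("overall:", sum(s.net for s in log) / sum(s.hours for s in log))
print("by game:", hourly_by(log, lambda s: s.game))
print("by stakes:", hourly_by(log, lambda s: s.stakes))
print("by venue:", hourly_by(log, lambda s: s.venue))
print("by length:", hourly_by(log, lambda s: "short" if s.hours < 4 else "long"))
```

In this invented log the overall hourly rate is slightly positive, yet the breakdowns show losses at Omaha, at $2/$5, and in sessions of four hours or more. None of that would be visible from the bottom line alone.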

2. Tells

An opponent has moved all in, and you have to decide whether to call. It’s a tough decision, and could go either way. Perhaps this is a situation in which a tell can act, as Mike Caro has long advised, as a “tiebreaker” on a close decision.

You noticed that the guy slammed his chip stacks down forcefully when he put them out in front of him. You’ve read that this is an attempt at intimidation, a “strong means weak” tell. Therefore, you conclude that he doesn’t have much, and he’s trying to get you to fold.

But you’re missing some data here, aren’t you? Though there are certainly generalizable trends in interpreting tells correctly, there’s also a ton of player-to-player variation. Have you been paying enough attention to know about this particular opponent? Maybe this is the way he always does his all-in moves, whether strong or weak. Or maybe he’s the exception to the general rule, so amateurishly transparent that he has the nuts and can’t hide his confidence.

The answer lies in the missing data — how he behaved all the previous times he was all in. Were you watching? Did you notice how the cards revealed at the end of the hand correlated with his physical actions? If not, you’re stuck trying to interpret a tell with no context — or at least with a much less rich context than you could have had.
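Here is one rough way to picture that context, assuming you had been jotting down what this opponent did with his chips and what he turned over at showdown. The observations, the labels ("slam", "slide", "strong", "weak"), and the `strong_rate` helper are all invented for illustration.

```python
from collections import Counter

# Each note: (how he put his chips in, what he showed down).
# Made-up observations of one hypothetical opponent.
observations = [
    ("slam", "strong"),
    ("slam", "strong"),
    ("slam", "weak"),
    ("slide", "weak"),
    ("slide", "strong"),
    ("slam", "strong"),
]


def strong_rate(observations, behavior):
    """Fraction of showdowns with `behavior` where the hand was strong.

    Returns None when that behavior has never been seen at showdown,
    which is exactly the no-context situation described above.
    """
    counts = Counter(hand for b, hand in observations if b == behavior)
    total = sum(counts.values())
    if total == 0:
        return None
    return counts["strong"] / total


print("slam:", strong_rate(observations, "slam"))
print("slide:", strong_rate(observations, "slide"))
print("toss:", strong_rate(observations, "toss"))
```

With only a handful of showdowns these fractions are very noisy, so treat them as a tiebreaker at most. Still, even a few notes on this particular player tell you more than the general rule does.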

3. Tournament Results

Poker media are constantly saying of a well-known player that he has won so many dollars in tournaments in his career. But what are they not telling you? What are the missing data?

For one thing, they don’t tell you how many times that player entered tournaments and busted without cashing. Therefore, you have no way of knowing what his return on investment (ROI) is. It’s entirely possible that a player touted (correctly, so far as it goes) as having won $5 million is actually an overall loser once you add up all of his buy-ins. Or he might be a small enough winner that if you knew how much he racked up in airline, hotel, and food expenses on the tournament circuit, you’d see that he’s barely breaking even.
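To put numbers on that, here is a small sketch of the arithmetic. The $5 million in cashes comes from the example above; the buy-in and expense totals are figures I made up. I'm assuming ROI means net profit divided by total buy-ins, with travel costs subtracted from profit; other people define it a little differently.

```python
def tournament_roi(total_cashes, total_buy_ins, expenses=0.0):
    """Return (cashes - buy-ins - expenses) / buy-ins as a fraction."""
    if total_buy_ins <= 0:
        raise ValueError("total_buy_ins must be positive")
    return (total_cashes - total_buy_ins - expenses) / total_buy_ins


cashes = 5_000_000  # the headline number the media report

# Hypothetical player A: his buy-ins exceed his cashes.
print(f"A: {tournament_roi(cashes, 5_500_000):.1%}")

# Hypothetical player B: a winner before expenses, nearly flat after them.
print(f"B before expenses: {tournament_roi(cashes, 4_600_000):.1%}")
print(f"B after expenses:  {tournament_roi(cashes, 4_600_000, expenses=350_000):.1%}")
```

Both hypothetical players get the same "$5 million in winnings" headline. The difference lives entirely in the numbers that never get reported.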

Even if he’s a substantial winner when you factor in those things, you probably don’t know what other financial leaks he has. Does he tend to blow his winnings at the craps table, on lavish celebratory parties for his friends, or on bad investments backing other players? If so, his actual take-home might be peanuts, or even negative.

A Challenge

Here’s my challenge to you: Think of a fourth example in which the key to a poker problem lies with the missing data. It might be a situation in which it’s obvious that missing data are important, or it might be one in which you’re not currently aware that data are missing. How would the missing data change your solution to the problem?

If you feel like it, put your best response in the comments below. I’ll be interested to see what you can come up with.