Thursday, August 30, 2007

Bayes Theorem -- Problem Solutions

OK we got a great set of responses to the two problems I posed yesterday to help discuss Bayes Theorem, and eventually its application to the game we all know and love. First, let's review the two problems in case you did not read yesterday and/or are too lazy to just scroll down a bit and read yesterday's post (and the corresponding comments) first:

Question #1: You are on the popular 70s game show Let's Make a Deal. Monty Hall is up there and he shows you 3 boxes on the stage, and asks you to pick one at random, explaining that under one box is a million dollars cash, but under each of the other two is a dead rodent of some kind. You choose Box #1. He then opens up Box #3 and shows you that there is in fact a dead rodent under that box. Now he offers you the choice to either keep Box #1 or switch to Box #2. What do you do? Do you switch, do you keep your first box, or does it make any difference?

OK the answer to Question #1, as maybe half of the commenters intuited, is that you have to switch boxes here. Normally I would give my standard explanation that I have repeated many times to friends and family of mine, but I really like the way that Goat introduced it in his first comment to yesterday's post. So for starters I am just going to repeat what he said here:

"Always switch. I'm betting the comments I haven't read have already explained why, but the basic deal is when you pick box #1, there is an approx. 33% shot at a million bucks.

So you're holding 33%. The 'field' is holding 66%. When Monty lifts box #3 to show the dead rat, the 'field' is narrowed to box #2. Which still holds 66% equity. The 'field's' equity has not shrunk with your new information. You just happen to know which of the two boxes now holds all of the 66%.
"

In a nutshell, this is the reason why you should always switch boxes in this scenario, and why, as a few of the commenters correctly pointed out, you literally double your chances of winning the million if you switch. At the time you chose box #1, you had a 33% chance of being right. We can all agree on that -- a random pick of 1 out of 3 items. And that means that the odds that the money is in (either box #2 or box #3) = 66%. Now when box #3 is taken out of the equation, hopefully you can see that, since we know the probability of the money being in (either box #2 or box #3) = 66%, we now know that the probability of the money being in box #2 alone = 66%. Which makes sense, since we all know that we had only a random 1-out-of-3 chance of picking the money box when the game began.

Let me quickly address those of you who make the very, very common argument that this is incorrect because we obviously have a 50-50 chance of holding the money box, since we now hold one box out of two boxes total, or 1/2 = 50%. That would be the right answer if, after opening a dead rat box, Monty had then shuffled box #1 and box #2 and randomly re-assigned the money, so that it was now an independent 50% chance that the money was behind either remaining box. But that is not the case here. In the example posed, we already had an existing 33% probability that our box #1 was the box with the money, and that fact never changes -- remember, Monty knows where the money is and will only ever open a dead rat box, so his reveal tells us nothing new about our own box. We hold a 33% chance of being right, leaving 66% to the other two boxes, and when Monty eliminates one of those other two boxes (the "field", as Goat helpfully defines it in his comment above) without re-randomizing the remaining two boxes, our box #1 retains its 33% chance, and box #2 inherits the full remaining 66% chance of being the one with the million inside.
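For anyone who still doesn't buy it, here is a quick simulation sketch in Python (this is just my own illustration, with my own names for things -- it is not from any book) that plays the game a bunch of times under both the stay and switch strategies:

import random

def play_monty(switch, trials=100000):
    wins = 0
    for _ in range(trials):
        boxes = [1, 2, 3]
        money = random.choice(boxes)   # the million is hidden at random
        pick = random.choice(boxes)    # contestant's random first pick
        # Monty, who knows where the money is, opens a box that is
        # neither the contestant's pick nor the money box
        monty_opens = random.choice([b for b in boxes if b != pick and b != money])
        if switch:
            # switch to the one remaining unopened box
            pick = [b for b in boxes if b != pick and b != monty_opens][0]
        if pick == money:
            wins += 1
    return wins / trials

print("Stay:  ", play_monty(switch=False))   # comes out around 0.33
print("Switch:", play_monty(switch=True))    # comes out around 0.67

Run that and the always-switch strategy wins roughly twice as often as the always-stay strategy, right in line with the 33% / 66% breakdown above.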

This solution has proven over time to be a very, very difficult thing to explain, so I probably did not do a good enough job here for most of you, and I'm sorry about that. Despite having had this conversation with maybe 100 people in my day, I have never yet devised an explanation that convinces everyone, or even most people. But rest assured, this explanation is the correct one. The reason I used this problem is that it is one of the best possible examples of how the human brain misapplies new information to a probability when there is already an existing underlying probability in place -- which is essentially the situation Bayes Theorem is built to handle. And that tendency, in my opinion, spills right over onto the poker table with many of the people I run into on a nightly basis on the virtual felt.

Now on to Question #2 from yesterday:

Question #2 (this one comes directly from The Mathematics of Poker): Suppose doctors have a screening process for, say, Lupus, and that if a person is screened who actually has Lupus, the screening process will return a positive result for Lupus 80% of the time. Assume also that if a person who does not have Lupus is screened, the test will return a (false) positive result for Lupus 10% of the time. Lastly, assume that we know that 5% of the total population on average actually has Lupus.

A person is selected at random from the population at large and screened, and the test returns a positive result for Lupus. What is the likelihood that this person, who just tested positive for Lupus, actually has Lupus?

I was impressed with how many of the commenters got this one right on, which is great. I have read about this problem in other studies, and I was really hoping that some of our doctor bloggers would have weighed in with their own thoughts, because in reality something like 75% of doctors asked have given answers to this very problem that were a good 40-50 percentage points away from the right answer. The right answer, as a number of the commenters figured, is that even having received a positive test result, the chance that this randomly-selected patient does in fact have Lupus is only about 29.6%. This can be worked out very easily if you just pick an actual sample size for the total population and count it up (something many of the commenters did):

So, assume the world has 1000 people in it. 50 of them will actually have Lupus (since 5% of the population on average has it). Of the 50 Lupus people, 40 of them will test positive, given the test's 80% true positive rate. Of the other 950 people who do not in fact have Lupus, 95 of them will test positive, given the test's 10% false positive rate. So, a total of 135 people out of our 1000-person population will test positive for Lupus under this screening process -- 40 of whom actually have Lupus, and 95 of whom actually do not. Given that someone is among the 135 people who test positive, the odds are 40/135 that they actually have Lupus, and 95/135 that they do not. This equates to about a 29.6% chance that the guy who tests positive actually has Lupus.
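For those who would rather see the formula itself than the head count, here is a small Python sketch (the variable names are just mine for illustration) that computes the same answer straight from Bayes Theorem and then double-checks it against the 1000-person population above:

# The numbers given in the problem
p_lupus = 0.05               # 5% of the population actually has Lupus
p_pos_given_lupus = 0.80     # true positive rate of the screening test
p_pos_given_healthy = 0.10   # false positive rate of the screening test

# Bayes Theorem: P(Lupus | positive) = P(positive | Lupus) * P(Lupus) / P(positive)
p_positive = p_pos_given_lupus * p_lupus + p_pos_given_healthy * (1 - p_lupus)
p_lupus_given_positive = p_pos_given_lupus * p_lupus / p_positive
print(p_lupus_given_positive)        # about 0.296, i.e. roughly 29.6%

# Same thing using the 1000-person population from the paragraph above
true_positives = 0.80 * 50           # 40 people with Lupus who test positive
false_positives = 0.10 * 950         # 95 people without Lupus who test positive
print(true_positives / (true_positives + false_positives))   # 40/135, about 0.296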

So think about that again -- we have a test with an 80% detection rate and only a low 10% false positive rate. That's actually a pretty good test. But in stark contrast to the analysis in some of the comments (Goat goat goat), the most relevant figure in this problem from a mathematical perspective is the fact that only 5% of the underlying population actually has Lupus. The relative rareness of the condition drags down that 80% detection rate so far that even a positive test result still leaves the subject only about 29.6% likely to actually have Lupus. Funny enough, as I mentioned earlier, when hundreds of doctors were asked this question, I believe more than 75% estimated the chances of the subject actually having Lupus to be somewhere between 70 and 80%. I myself originally estimated a much higher number before reading the simple solution that produces the 29.6% answer, as did most of the people I have asked this question over the past couple of days.

As I alluded to above, the results of both of the above problems show how strongly the human mind is wired to underweight existing, known probabilities and instead to put more importance than it should on later changes to those probabilities. Thus, even though there was already only a 1/3 chance that your box was the money box in problem #1, the mind intuitively wants to believe that probability has jumped much higher once there are only two boxes left in the equation. Similarly, in problem #2, the inclination of most people is to significantly overweight the 80% detection rate of the test, and to significantly underweight the already-given fact that only 5% of the total population has the condition to begin with. Bayes Theorem and the concepts underlying it help us identify the situations where people commonly make these faulty assumptions and faulty estimates -- sometimes arriving at a completely wrong result, such as the doctor who assumes that his positive-testing patient is 75% likely to have Lupus when in reality he is less than 30% to have the condition. These errors are commonly made at the poker table as well, and tomorrow I plan to discuss some of the situations where Bayes' rule can be used to make such calculations properly.


22 Comments:

Blogger KajaPoker said...

I would still get a second opinion. And probably a third...

11:19 PM  
Blogger Eric a.k.a. Bone Daddy said...

my head hurts

12:00 AM  
Blogger Julius_Goat said...

A goat is asked two questions. One of his answers is correct. The other one is a dead rat. What are the odds that he will answer a third question properly?

12:10 AM  
Blogger Blinders said...

The Let's Make a Deal one always screws with me. Let me explain it this way (you are right BTW). If Monty always shows a dead rat, and you always switch:

33% of the time you are right and switch to the wrong box.

66% of the time you are wrong and switch to the right box.

You win 66% of the time with this strategy.

Quite the mindfuck though

12:29 AM  
Blogger Schaubs said...

Great thinker post.

Nicely done.

12:35 AM  
Blogger BigPirate said...

This comment has been removed by the author.

1:02 AM  
Blogger BigPirate said...

Retry!

http://www.marilynvossavant.com/articles/gameshow.html

1:03 AM  
Blogger Mike Maloney said...

Man, that first question is really fascinating. I tried explaining it to my fiancée and she told me I was dumb.

1:54 AM  
Blogger 4dbirds said...

I think the truly amazing thing of all this is that Mike actually got a girl to agree to marry him. Only kidding.

2:02 AM  
Blogger Hammer Player a.k.a Hoyazo said...

Mike, I have encountered that exact same reaction with probably 98% of the people I have tried to explain this to since the first time I was presented with it some years ago. The human mind just does not want to believe sometimes. Keep at it.

2:32 AM  
Blogger BigPirate said...

If anyone is trying to reach the website I attempted to post and can't, add .html after gameshow.

2:45 AM  
Blogger Unknown said...

Incorrect.

The correct answer is that it makes no difference if you switch or stay - your odds are 50/50.

The problem is in always assuming that your odds of winning remain constant throughout the exercise. As soon as Monty reveals the dead rat, you have put new information into the system. The statistics have to be redone, and they are very simple now. You have two boxes, one has a rat and one has money. You will always be 50/50 to get the money when Monty reveals the rat.

Look at it this way:

What if he revealed the money? Would you still consider your odds to be 1/3? No, of course not. Your odds have dropped to 0. You now have 100% chance of picking a dead rat. This is courtesy of the new information you have received.

I recall reading this problem before and realizing how poorly it had been applied towards describing Bayesian statistics. It's a shame to see this mistake propagated!

Biggestron

3:45 AM  
Blogger Hammer Player a.k.a Hoyazo said...

Biggestron, you are experiencing the totally typical reaction to what is admittedly a very difficult problem to grasp for just about everyone. You sound 100% like I did for the first couple of months after I first heard this problem described to me. Someday you will get it. Or maybe you won't.

But the answer will still be that your odds are 66% if you switch, and 33% if you stay.

I suggest that you take a look at the link that Pirate Wes provided in his comments above. The best part is that your comment here reads almost exactly like the initial letters received by Marilyn vos Savant when she first posed this problem 15 years ago or so, including your expressions of shame that I would propagate such a mathematical error.

Let me ask you this though: suppose that, instead of 3 boxes, there were 100 million boxes and you randomly picked one of them -- say, for the sake of argument, box #1. If Monty Hall then proceeds to open up 99,999,998 of the other boxes and show you dead rats, leaving only box #98,765,432 and the box #1 that you initially picked, are you still sticking with your initial pick, even though you chose it at random out of 100 million boxes? Surely you have to see how silly that is.

4:00 AM  
Blogger jobo said...

I was struggling with this, so I decided to run some simulations.

The bottom line is that you're right (as you know).

If you (or more importantly, any of the other doubting Thomases) want to check out the results of my exploration, they are here

4:56 AM  
Blogger BigPirate said...

A vital concept to remember is that Monte will NEVER pick the door with the big prize behind it. He makes a knowing pick, not a random one, since he waits until the contestant has picked before deciding which other door he will reveal. That is what gives the contestant the additional information necessary to change her choice. Your odds will never drop to 0% after Monte reveals, because he will never open the grand prize door.

5:54 AM  
Blogger Dr Zen said...

The hundred million thing is really good, because if you think it's 50/50 after thinking about that, you probably get really upset at not winning the lottery every week.

6:03 AM  
Blogger Pseudo_Doctor said...

hey hoy, just got to reading your last two posts today since I was busy this week. Awesome thinking posts. My ex-girlfriend and I have had the same argument on the first question before, because it took me some time to understand it. Though after researching it for a while I realized that she was (as were you) right. The second question is funny though because it's quite a simple problem, and even though I'm not a stats person it is quite easy to come to the correct answer.

Though it doesn't surprise me that most doctors get it wrong. The reason is that in most medical schools the stats class is a 2-3 week crash course for licensing, not a semester-long class like it is in college.

6:11 AM  
Blogger Astin said...

Kinda makes Deal or No Deal more interesting, doesn't it?

No wait, that game still requires no skill.

But, assuming you eliminate all but two cases, and one has the million, do you switch? 1:24 (or however many they have), and then 24:1 if you switch at the end.

6:24 AM  
Blogger Unknown said...

Well...

The 100 million boxes is a *different* problem. In that case, your initial chance of picking the $ is very small, whereas Monty's chance of picking the dead rats is very good (as in 100% - he knows where they are). In that case you should switch.

For the case of only 3 boxes, I maintain that it is 50/50 once one of the dead rats has been revealed.

There is likely some cutoff in terms of number of boxes after which switching is +EV, but it is definitely not 3, as you initially had posed in the problem.

Anybody with a solid foundation in statistics (upon which odds-making is based) should know that low number problems are different beasties than high number problems.

I maintain that your first solution (which is the widely cited example) is incorrect. And yes Wes, I saw MVS's column on the problem and I think that she is incorrect as well.

I understand where you are going with the idea (I use Bayesian statistical reasoning almost every day in the lab), but the example given is mistaken and a poor piece of pedantry.

Biggestron

8:00 AM  
Blogger Astin said...

I'm surprised Biggestron is so adamant about this, considering the fact it has been proven time and again since MVS made the problem famous years ago. It's a classic stats problem that's been shown to be right countless times.

I will grant that Hoy missed a key point when he presented the problem (although it's implied by the fact it's Let's Make A Deal) - Monty will ALWAYS pick the dead rat.

There is a variable that the mind doesn't intuitively take into account here - that your choice of door limits the options Monty has when revealing. If you pick the winning prize, he has two choices. If you pick a rat, he only has one choice.

It's a priori reasoning, so it's a pain to wrap your brain around.

It doesn't matter what door you pick initially, the probability is 1/3. So let's say you pick door A.

Monty can only open a door that doesn't have a prize. He opens C.

Probability he opens C if prize is behind A? 1/2 = P(c|A)

Probability he opens C if prize is behind B? 1 = P(c|B)

Probability he opens C if prize is behind C? 0 = P(c|C)

So, probability he opens C (P(c))?

1/3 * 1/2 + 1/3 * 1 + 1/3 * 0
= 1/6 + 1/3 + 0
= 1/2

And now Bayes theorem:

Let P(x|y)= Probability prize is behind door x if door y is opened.

P(x|y) = P(x)P(y|x)/P(y)

Probability prize is behind A (your pick) if door C is opened:

P(A|c) = P(A)P(c|A)/P(c)
= (1/3)(1/2)/(1/2)
= (1/6)/(1/2)
= 1/3

Probability prize is behind B (not your pick) if door C is opened:

P(B|c) = P(B)P(c|B)/P(c)
= (1/3)(1)/(1/2)
= (1/3)/(1/2)
= 2/3

There's the statistical proof. But like I said, the fact that Monty ALWAYS picks a door that doesn't have the prize was missed in Hoy's posing of the question.

10:50 PM  
Blogger Blinders said...

The way I explained it is best, guys. It is so simple. You make an initial pick and it is either correct or wrong. It is correct 33% of the time. When you are correct, Monty shows you a dead rat, and you switch boxes to the wrong one; that's the 33% of the time you are now wrong. When you pick wrong initially (66% of the time), Monty shows you a rat, and you switch from the incorrect choice to the correct choice. This happens 66% of the time, so you pick the right box 66% of the time with the always-switch strategy.

12:57 AM  
Blogger statman said...

There are many ways to show this -- mathematically, by actual experiment, by simulation, by logic. We will do a thought experiment. Think of playing this game 300 times and assume that the contestant always picks Door 1 while the car (money) is placed randomly behind one of the three doors. Then on average the contestant will have picked the right door 100 times. Of the two hundred other times, the car will be behind Door 2 100 times and behind Door 3 100 times. For any trial, when the contestant is offered a switch, he is essentially being offered to trade his Door 1 for both Doors 2 and 3. Clearly he knows that when he makes the switch, one of those two doors must have a goat (rat); by opening the door that does have a goat, the MC is simply confirming that. Remember, the MC knows where the car is, so he only opens a goat door before making the switch offer. But, whether the MC opens a door or not, the offer of switching is equivalent to giving the contestant the choice of two doors instead of one, thus doubling his chances of getting the car.

10:13 PM  
