Sunday, January 25, 2009

Evidence-based medicine and false precision

In Nassim Taleb's outstanding book, The Black Swan, he discusses how mathematical models are used in predicting the stock market. His general contention is that while there is a lot of thought and brainpower used in designing these models, they are still at best approximations of real financial markets, and can be catastrophically wrong. He discusses the 1987 stock market crash as an example. The book was published before the 2008 market collapse, but that crash helped illustrate just how important his point was.

I would term the phenomenon he is discussing "false precision." Because the statistical models work 70%, 80%, or 90% of the time, there is a false assumption that they will work 100% of the time. That is not true, but the precision that we see under "ordinary" circumstances can fool us into thinking that we will see that precision all of the time.

Many people have seen a similar phenomenon when they play fantasy sports. I have played both fantasy football and baseball in the past, but I'll use football as an example, since it is the more popular game (I don't play either anymore, since while fun, they are major time sinks). Fantasy football is what some will call a "mirror game"- it is based on football, uses many of the same principles of football, but it is not the same game. Back in the era when I played, Daunte Culpepper was probably the best QB in fantasy football, since he would throw a lot of touchdowns (thanks to Randy Moss and Cris Carter) and run for a decent amount of yards.

After playing for a few years, it was easy to start thinking that Daunte Culpepper was the best QB in the NFL. He wasn't- Peyton Manning and Tom Brady were, even though that wasn't reflected in their fantasy football numbers. What happened was that it became very easy to forget that while fantasy football is a reasonable approximation of football, it is NOT the same thing as football. It is a mirror, a reflection. But the output data we received from playing fantasy football made us think that we knew more about the game than we really did. This is false precision.

The data for the stock market and the data for sports are much, much cleaner than they are for medicine. It is not even close. The data for educational outcomes (used for things like No Child Left Behind) is much cleaner than it is for medicine, and No Child Left Behind, for all of its strengths, is still dealing with a messy data set that only mirrors reality. Medicine is much messier than that.

Don't get me wrong- I think medicine needs to move toward using evidence to inform our judgments. Just like you need to prep for your fantasy draft by reading and evaluating, just like you should pick your mutual funds by reading their prospectuses, and just like you want your kids to be educated using the most evidence-based techniques, when it comes to treating patients, I think we need to look to the evidence. We shouldn't be just making stuff up- we should make our best effort to actually use data.

We just can't be too arrogant about it. We can't assume that just because we are using the best data available that we are infallible. We can't assume that because we are using evidence-based practices that our precision is any better than 50-70%.

I will be absolutely shocked if, within my lifetime, the predictive models we use for health care even approach the precision of the predictive models we currently use for the stock market, baseball, and the weather. And as we've all experienced, those models aren't particularly precise, and when they are wrong, they can be wrong with disastrous consequences.

The main reason medical models will never be as good is that the populations we are looking at are too heterogeneous. For example, the largest medical study ever conducted was the Women's Health Initiative. I am having trouble locating a specific cost for the WHI, but my memory was that it was a $12 billion study.

The study was about as well designed as a medical study could be. That said, it was largely inconclusive, and there have been literally hundreds of papers that have tried to parse the data to figure out what it means. The underlying problem is that the group of patients they studied- older women- was too heterogeneous. It turns out that "older women" is too broad a category, and so the conclusions that would be true with one subgroup don't really apply to another subgroup.

Within my own field of specialty, the largest study ever dedicated to outcomes for patients with low back pain was the SPORT trial. Same problem- the group studied was too heterogeneous, and the study could not account for human behavior sufficiently, so we don't really have much more insight on how well back surgery works for low back pain than we did before the study was conducted.

Just to beat this dead horse some more, I am going to pick the first study in this month's New England Journal of Medicine. It is about using oral steroids for kids with wheezing, trying to determine whether steroids make a difference or not. The answer is "we can't tell," although I suspect it will be reported as "steroids don't make a difference." I just checked- the first story in Google News under the search terms (steroids, wheezing) is "Oral Steroids Ineffective in Treatment of Preschool Virus-Induced Wheezing." Why is it reported this way? I suspect that headlines of "we just spent several million dollars studying something, and we know only marginally more than we did before we did the study" don't sell a lot of copy.

If we look at the actual data, the kids who got steroids did spend less time in the hospital than kids who didn't (medians were 11 hours v 13.9 hours), but this difference did not achieve statistical significance. Reporting this as "steroids don't make a difference" is a simplification, because the real picture is more complicated. This is not a criticism of the study authors- they acknowledge the study's limitations in their discussion. But most people don't have time for shades of gray.

The way I would interpret this study is that there probably is a subgroup of kids who would benefit from steroids, and we haven't figured out what that subgroup is yet. We just don't know enough yet. Even if we studied this with the same resources we use for analyzing the stock market, we are going to be wrong at least 5-10% of the time. Sometimes catastrophically wrong.
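
To make the subgroup idea concrete, here is a rough sketch of the arithmetic. The numbers are entirely made up for illustration and are not taken from the NEJM study; the function name and the 25%/75% split are my own assumptions. The point is only that a large benefit in a minority of patients can show up as a small average benefit across everyone.

```python
# Hypothetical illustration: a treatment that helps only a subgroup
# looks like a small average effect across the whole study population.
# All numbers are invented for illustration; none come from the study.

def average_effect(subgroups):
    """Population-weighted mean effect (hours of hospital stay saved)."""
    total_share = sum(share for share, _ in subgroups)
    return sum(share * effect for share, effect in subgroups) / total_share

# Each entry: (share of study population, hours saved by treatment)
subgroups = [
    (0.25, 10.0),  # assumed responders: large benefit
    (0.75, 0.0),   # assumed non-responders: no benefit
]

overall = average_effect(subgroups)
print(f"Average hours saved across everyone: {overall:.1f}")
print(f"Hours saved among responders only:   {subgroups[0][1]:.1f}")
```

With these assumed numbers, the average across everyone is 2.5 hours, even though the responders gain 10. A study that cannot identify the responders in advance only ever sees the diluted average.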

Going back to something closer to my scope of practice- low back pain. Low back pain is the most common chief complaint I see, and I think that I am very good at managing it. I am not perfect, and nobody else is either. If you have a failure rate of 10% in treating low back pain, you are outstanding.

One of the most frequent questions I am asked is whether a patient is best managed by surgery, spine injections, physical therapy, acupuncture, massage, chiropractic manipulation, nutritional supplementation, etc. The short answer is "I have no idea, really."

That is just being appropriately humble- in truth, I am easily in the top 1% in my competence to answer the question, but I am aware of the limitations of my and all of medicine's knowledge on the topic. Low back pain is a very complex topic, far more complex than the stock market, and there is an upper limit to how precisely we can answer these questions. I am just aware of our limitations.

What I believe is that all these different approaches, ranging from traditional approaches like physical therapy, surgery, and injections to more non-traditional approaches like acupuncture and nutritional supplements, all work, but we have not yet perfected how to stratify which approach will work for which patients.

Does this mean that "well, since we can't be perfect, we should throw away this information?" Absolutely not! But it does mean that we need to be humble enough to realize that we are going to be wrong 15-20% of the time, at least.

Giving specific examples: for patients with low back pain and radiating leg pain, the evidence for use of a neuropathic agent like Neurontin is significantly stronger than for a muscle relaxant like Flexeril (which, in truth, is more of an anxiolytic than it is a muscle relaxant), so I will use Neurontin more often.

As another example, the data is much stronger in support of using a transforaminal approach than an interlaminar approach for epidural steroid injections in the low back, so that is the approach I use the vast majority of the time.

As yet another example, if a patient has a directional preference (e.g., pain worse with bending forward), then a physical therapy program that takes that directional preference into account is far more likely to be successful.

So, of course, I should adhere to these evidence-based guidelines. I just can't be so arrogant as to assume that just because I am adhering to evidence-based guidelines that I will always be right.

Coming back now to health policy .... there is a strong movement underway to make physicians more accountable to evidence-based guidelines. I think this is a good thing. Patients who have diabetes should have their hemoglobin A1c checked regularly, patients should have their blood pressure checked regularly and placed on appropriate agents, etc.

We just can't get arrogant about it. These standards we are moving toward will better approximate good health care- they do not equal good health care.

Just my 2 cents.


Frodo said...

Interesting perspective.

The Stock Market analogy is probably not the correct one to make, and you may be better off making comparisons to meteorology. The stock market is not a "real" physical system bound by the laws of chemistry and physics; at best it is linked to economics and psychology. Extremely short term predictions can be made (bad quarter, stock falls), or long term (30+ years - "the market as a whole will most likely outperform other investments"). But any real models predicting stock behavior fall apart. This is why it is more lucrative to sell day trading strategies instead of applying them.

Things like medicine or meteorology are actually linked to physical phenomena. The 2-3 day weather forecast is pretty good these days, where enough of the major variables are known, and a decent approximation of what will happen can be made. As the forecast extends outward, the butterfly effect begins to take over, and noise from "minor" variables begins to accumulate, making prediction limits extremely wide.

Likewise with medicine, it is reasonable to expect some accuracy with pharmacokinetics, and have some predictive ability about the tightly defined effects of something like a drug in the short term, but as time builds on, things change, and the condition must be observed, and the new state must be accounted for.

Gary P. Chimes, MD, PhD said...

Appreciate the feedback, Frodo. I still think medicine is significantly harder to model than meteorology. I think this is primarily because the outcomes are harder to quantify. For weather, the outcome is something like the temperature, barometric pressure, or inches of precipitation. Even indexes like heat index are relatively straightforward.

Outcomes in medicine are much harder to define. What is a good outcome? Wellness?

For the area that I know best, low back pain, this has been really hard to define. Pain scores are notoriously unreliable. There are more developed outcome measures, things like the Oswestry Disability Index (ODI), but these tools are far from complete and transparent.

Using a sports analogy again, think of the level of dissatisfaction many have with QB rating- most people have no idea what it really means. But that is still a much more straightforward idea than the ODI is for low back pain.

I think that, to me, gets to the essence of what makes medicine so hard to model- you can't predict an outcome well unless there is a general agreement on what outcome you should be tracking.

Frodo said...

Fair enough. I guess my point is, that nobody really knows what they are doing when predicting the Stock Market.

I would hope that Doctors and Weathermen do.

Gautam M said...

would you agree that our field is relatively uncomfortable with regard to outcomes when compared to trauma surgery, oncology, or infectious disease?

Gary P. Chimes, MD, PhD said...

I think Physical Medicine and Rehabilitation is uncomfortable making definitive comments in general, whether it be regarding outcomes or any other statement. I don't know that is necessarily a bad thing, though. I think many specialties speak with a false precision they are not really entitled to. To be fair, it is probably not the medical scientists themselves who speak with false precision, but more the media's interpretation of the results. I think there are very few things that medicine can state with more than, say, 70% certainty.