Thursday, December 06, 2007

File Under "Cooking the Data"

Heavens! JAMA published a study suggesting that there may be some shenanigans in the way scientific papers are cited by those with an ax to grind. The article implies that "true believers" may be, shall we say...selective in their arguments when promoting their pet theories.

The authors looked at two observational studies published by the New England Journal of Medicine in 1993 suggesting that high vitamin E intake was associated with marked reductions in cardiac events. In 2000, a randomized controlled trial (RCT) was performed that showed no benefit (and possibly some deleterious effects). They then looked at how frequently the two "disproven" studies were cited later in the medical literature.

One would have expected a rapid fall-off in the citation curve after the RCT. In fact, there was a fall-off, but not nearly as precipitous as one might have expected. Even as late as 2005 (the last year they looked), 50% of the papers citing the original articles were still favorable towards vitamin E. Many authors were still promoting vitamin E on the basis of two non-RCTs despite the markedly better contradictory evidence that had become available in the meantime.

This is rather damning of either the thoroughness of the citing authors (who perhaps missed the RCT in their literature searches) or their honesty (in perhaps knowing of the RCT and choosing not to cite it). The researchers behind the JAMA article did something rather slick that points toward dishonesty.

They looked for a correlation between how favorable the citing articles were towards vitamin E and whether or not those articles also cited the RCT. This is the disappointing part. They found that 35% of the articles unfavorable or equivocal towards vitamin E failed to cite the RCT, but 83% of the pro-vitamin E articles failed to. Of course, the erroneous conclusions could be due to sloppy research, but it seems much more likely that true believers simply choose to ignore data that doesn't support their beliefs.

As my five-year-old likes to say when she doesn't get her way: "How sad".

Tuesday, November 20, 2007

Statistics and the Medical Residents Who Misunderstand Them

Dr. Helen recently blogged on a study showing that medical residents aren't too good at statistics. This of course raises the possibility that they won't be able to critically read the medical literature and that their patients will ultimately be shortchanged.

As one who teaches residents, I have a couple of points to make about this.

First of all, the study's results seem quite plausible to me. I like to think, however, that our residents at Harbor-UCLA do a little better, since we run a weekly Journal Club (which I occasionally lead). It's fairly well-attended because there's usually a free drug company lunch...I know. That's another blog topic altogether.

In Journal Club, we take one article with important clinical implications and dissect it in all of its gory details. We pay particular attention to the biostatistics and methodologies involved. The purpose of such discussions is to understand the study and its potential biases. We also try to determine whether its results are valid and its conclusions generalizable to the patients we actually treat. The bottom line is that I think we make a fairly reasonable attempt to teach this stuff.

I will concede, however, that my belief that our residents are not typical of those studied may be wishful thinking on my part.

But there are some larger issues here. While I agree that an understanding of basic biostatistics is essential to putting the articles comprising the medical literature in their proper perspective, many of the methodologies currently employed are extremely complex. Without a very strong background, rigorous understanding of a lot of these articles is all but impossible.

Even for those papers that don't use esoteric statistical methods (stochastic modeling, complex applications of logistic regression, nonlinear correlation methods, etc.) the amount of time necessary to digest them just isn't there most of the time; not for residents, not for most clinicians. That being the case, many doctors rely on clinical guidelines for basic decision-making. The idea is that a bunch of top experts in a particular specialty get together in Zurich for a week or so and discuss the world's literature between ski runs. They then hammer out a set of recommendations that summarize their collective knowledge.

These guidelines are generally quite readable and have the advantage of representing a consensus of these supposedly great minds.

Admittedly, such position statements have many disadvantages. There are all kinds of biases that can creep into them especially given the substantial conflicts of interest that top opinion leaders accumulate over their careers. The reality is that few other solutions are that well embraced at present and no one is going to read every significant study that comes out on his own.

My point is, don't come down too hard on the poor resident with his or her suboptimal understanding of basic statistics. I bet Sir William Osler wasn't that mathematically inclined either.

Here is a whimsical piece of irony: Dr. Donna Windish, the first author of the study mentioned above, points out that many residents read only the abstract of journal articles rather than the body itself. She goes on to say that there is data to suggest that abstracts don't accurately reflect the implications of their studies. I then noticed the following in the abstract of her paper:
"Residency programs should include more effective biostatistics training in their curricula to successfully prepare residents for this important lifelong learning skill."
For this statement to be true, the authors would have to be able to cite data establishing that
  1. Such training does in fact "successfully prepare residents for this important lifelong learning skill."

  2. This learning skill makes them demonstrably better clinicians (or why develop it?).
I seriously doubt that either of these points has been firmly proven in the literature...which more or less justifies Dr. Windish's point that abstracts don't accurately state the implications of their studies.

Finally, if my discussion of this study seems a bit superficial, it's probably because I only read the abstract.

Wednesday, November 09, 2005

A New Medical Blog

I'm adding a new blog to my blogroll today: Within Normal Limits of Reason.

Not knowing the gender attached to the name of its author, Ming-Chih Kao, I've made, in the interest of readability, the regrettably sexist assumption that he is a he. Kao is a third-year medical student at the University of Michigan, and his blog appears to have begun last month, so it's quite new.

I doubt that this will be of general relevance throughout the blogosphere, but for those of us involved in study design and methodology, it can be quite interesting. Kao is a newly minted Ph.D. in biostatistics from Harvard and has posted some provocative thoughts on the medical literature, often from a statistical perspective.

For example, this post caught my attention.

It deals with the common practice of discontinuing randomized clinical trials once statistical significance is achieved. For example, assume a study is analyzing the outcome of a disease following treatment with drug A vs. drug B over a period of time. What is generally done is to analyze the data repeatedly as the study progresses.

If drug A shows statistically significant superiority at any time during the analysis, the study would be stopped. The idea is that the study objective has already been achieved and that to continue would subject the patients in the drug B arm of the trial to the deleterious effects of being treated with an inferior drug. This would be unethical.

Kao, however, points to an article in JAMA (the Journal of the American Medical Association) that demonstrates some pitfalls in this approach that I doubt most people would have thought of.

The argument is subtle, but the gist of the concept is this: each time you re-analyze the accumulating data (with new data points), there is a statistical dispersion in the results, and some interim looks will cross the significance threshold by chance alone. If one simply stops the study when statistical significance is achieved, without taking this predictable dispersion into account, the results can be skewed toward large treatment effects that would perhaps have been markedly smaller (or even reversed) had the study been permitted to continue to completion.
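To make this concrete, here is a toy simulation of my own (a sketch of the general phenomenon, not the JAMA article's actual analysis). Every simulated trial compares two identical treatments, peeking at the data after each new batch of patients and stopping at the first "significant" look:

```python
import numpy as np

# Toy simulation: two-arm trials in which drug A and drug B are IDENTICAL,
# with an interim analysis after every batch of patients. Each trial stops
# at the first look where the difference reaches p < 0.05 (|z| > 1.96).
rng = np.random.default_rng(0)

n_trials = 5000   # simulated trials, all with a true effect of zero
batch = 50        # patients added to each arm between interim looks
n_looks = 10      # maximum number of interim analyses per trial

stopped_early = 0
effects_at_stop = []

for _ in range(n_trials):
    a = np.empty(0)
    b = np.empty(0)
    for _look in range(n_looks):
        a = np.append(a, rng.normal(0.0, 1.0, batch))  # drug A outcomes
        b = np.append(b, rng.normal(0.0, 1.0, batch))  # drug B outcomes
        diff = a.mean() - b.mean()
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        if abs(diff / se) > 1.96:      # "significant" -- stop the trial early
            stopped_early += 1
            effects_at_stop.append(abs(diff))
            break

print(f"Trials stopped as 'significant': {stopped_early / n_trials:.1%}"
      " (the nominal rate is 5%)")
print(f"Mean |effect| in stopped trials: {np.mean(effects_at_stop):.2f}"
      " (the true effect is 0)")
```

With ten looks per trial, far more than 5% of these no-difference trials stop "significant," and the stopped trials report a spuriously large effect.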

Kao concludes with the apropos observation that "A watched pot never boils. Or not."

It's pretty intriguing stuff and challenges some basic assumptions many of us have about such an approach. Kao's blog is filled with such postings. I look forward to his reflections on the medical literature from the vantage point of his statistical background.

Additionally, count on me to correct the error of my ways if I discover that he is in fact a she!

Thursday, May 19, 2005

Liars, Clinical Tests and a Bit of Math

This next post is of virtually no general interest whatsoever. I'm being totally self-indulgent in that I really like this stuff but hey, it's my blog! Some clinicians may be a little interested however. We shall see.

Bryan Caplan of EconLog posted on our ability to distinguish truth tellers from liars. It's not as good as we might think. According to one study cited "People correctly identify truths 70% of the time and correctly identify lies only 50% of the time."

If lies are the disease, then this corresponds to what statisticians refer to as a sensitivity of 50% and a specificity of 70%. Please note that I'm restating Caplan's analysis in its more pessimistic form, treating lying rather than truth as the disease.

Both of these performance indicators are pretty bad by medical standards. You'd definitely hope that that expensive spiral CT was better at detecting a serious blood clot in your lung!

The concept of test sensitivity and specificity is confusing to a lot of people, especially doctors. Although it's a simplification, for most purposes these numbers reflect only the test under consideration. What clinicians (and, in today's example, CIA interrogators) are mainly interested in is predictive ability: the ability of a test to predict the probability that a person has or doesn't have a particular disease (or is telling a fib).

We use knowledge of a test's sensitivity and specificity to help us with these predictions using a horrible piece of mathematics called Bayes' theorem.

To use the example here, assume a person lies 50% of the time. This is called the pre-test probability (or prevalence) of lying. This is about what we'd expect in a politician. With the above sensitivity and specificity, Bayes' theorem would predict the following:

If a person "seems" to be lying, there's a 63% post-test probability that he is.
If a person "seems" to be truthful, there's a 58% post-test probability that he is.

So given a 50% probability of lying in the first place, if after actually observing the person you suspect a lie, your post-test conviction has risen only from 50% to 63%. Not very good in my opinion.

If we suspect truthfulness, our post-test conviction of truthfulness only rises from 50% to 58%. It would appear that we're worse at detecting truth.

One problem with this kind of analysis is that it is not very intuitive. Juggling sensitivities and specificities requires some mental gymnastics that few of us are capable of. Many different combinations of sensitivity and specificity (and of the probability of having the disease in the first place) can yield the same predictive ability for a test. What is needed is a framework for thinking about tests that combines sensitivity and specificity into one number which can then be thought of independently of the disease prevalence.

Fortunately, such a framework was published (at least in the medical literature) in 1999 here and here. These articles are not for the faint of heart! On the other hand, the concepts the author discusses actually make this whole thing easier. He has incorporated sensitivity and specificity, two somewhat nebulous concepts, into two indicators: a positive and a negative likelihood ratio (PLR and NLR), which even an internist can understand.

Likelihood ratios are calculated from the sensitivity and specificity using some simple formulas: PLR = sensitivity / (1 - specificity), and NLR = (1 - sensitivity) / specificity. You can find the formulas here, and an online calculator that does the arithmetic for you here. I only want to discuss this conceptually and demonstrate how LRs are used.

When using LRs, you think in terms of odds instead of probability. This is the way things are done in Las Vegas. Instead of saying that there's a 50% chance a coin will come up heads, we say the odds are 1 to 1 (1/1, or 1.0). Instead of saying that the probability of a (non-crooked) die coming up 5 is 1/6, or 17%, we say the odds are 1 to 5 (1/5, or 0.20).

The reason for using odds instead of probability is that the relationship between pre-test odds and post-test odds becomes VERY straightforward mathematically. No horrible Bayes' theorem! The relationships are these:

If the test is positive:
(pre-test odds of condition) × (PLR) = (post-test odds of condition)

If the test is negative:
(pre-test odds of condition) × (NLR) = (post-test odds of condition)

This makes things easy to understand. In our example above, the PLR is 1.7 (you can use the online calculator to verify this; enter proportions instead of percentages). This means that the odds of someone telling a lie increase by a factor of 1.7 if you think he looks dishonest.

No complicated Bayesian analysis. No fiddling with sensitivity or specificity. In fact, you don't even have to know the pre-test odds of disease. You just know that, on the basis of the above-cited data, the post-test odds of lying are 1.7 times the pre-test odds whenever you suspect lying. This gives you a better mental picture of how good the test is.

Likewise, the NLR is 0.7. So if you suspect that the person is telling the truth (a negative test), then the odds that he's lying go down to 0.7 times the pre-test odds.
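Here is a short Python sketch of my own showing the whole shortcut, and confirming that it reproduces the Bayes' theorem numbers from earlier (the variable names are mine; the formulas are the ones given above):

```python
# Likelihood-ratio shortcut with the same numbers as before: sensitivity 0.50,
# specificity 0.70, and a politician's 50% pre-test probability of lying.
sens, spec = 0.50, 0.70

plr = sens / (1 - spec)      # positive likelihood ratio: 1.67, i.e., ~1.7
nlr = (1 - sens) / spec      # negative likelihood ratio: 0.71, i.e., ~0.7

pretest_odds = 0.50 / (1 - 0.50)       # a 50% probability is an odds of 1.0

post_odds_pos = pretest_odds * plr     # "test positive": he seems dishonest
post_odds_neg = pretest_odds * nlr     # "test negative": he seems truthful

def to_prob(odds):
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)

print(f"PLR = {plr:.2f}, NLR = {nlr:.2f}")
print(f"P(lying | seems dishonest): {to_prob(post_odds_pos):.1%}")  # ~63%, as before
print(f"P(lying | seems truthful):  {to_prob(post_odds_neg):.1%}")  # ~42%, i.e., ~58% truthful
```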

Have I hopelessly confused you all? I hope not. The reason I mention all this is that once you get the hang of it, thinking in terms of LRs is much easier than thinking in terms of sensitivity and specificity. The medical literature increasingly calculates and cites the LRs for tests. Hopefully now you'll understand why they are useful.

If you're interested in this topic, I'd strongly suggest reading the links I cited above.

Tuesday, May 03, 2005

Probability and Medicine

EconLog linked an article by Dr. Richard Friedman on probability and medicine. In it, Friedman makes the point that many patients don't really understand the nature of probability in medical decision making. He cites the confusion of a patient who was trying to understand that a 60% response rate with a given anti-depressant didn't mean that she would respond to it 60% of the time.

When he explained that she would either respond or not respond, she became confused and said, "You mean my chances of getting better are really only 50%?" Clearly, she was confusing the binary aspect of the treatment outcome (getting better or not) with the probability that she herself would get a response.

Friedman then speculated on why his patient might have had this misconception. He points out that mathematicians have attributed such problems to innumeracy, "the arithmetic equivalent of illiteracy." He also mentioned that some misunderstanding might arise from a natural human tendency not to attribute bad (or any striking) events to chance.

Personally, I think the example Friedman cited has more to do with innumeracy. However, I don't like the word because of its pejorative connotation. The fact is that most of us have this type of innumeracy, even doctors (if you can believe it). Probability is one of those terribly difficult philosophical problems that trouble just about all of us.

The dynamics of a clinical situation determine the probability of a given patient developing a specific disease. A smoker has a higher probability of getting lung cancer than a nonsmoker, but an individual will either get it or not, period. This sounds straightforward, but a lot of people have problems with it. Some smokers never get lung cancer and some nonsmokers do. The reason is that smoking is not the only factor that leads to lung cancer. The more factors we understand (for example age, exposure to other toxins such as oxidants, and genetics), the more precise the probability estimate will become.
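As a toy illustration of that last point, consider the sketch below. The risk numbers are entirely made up for the example; the point is only that the estimate assigned to a particular patient sharpens as we condition on more of the factors that describe him:

```python
# Hypothetical numbers only: how a risk estimate sharpens as factors are added.
baseline_risk = 0.07                     # everyone lumped together

risk_by_smoking = {                      # stratified by one factor
    "smoker": 0.15,
    "nonsmoker": 0.02,
}

risk_by_smoking_and_age = {              # stratified by two factors
    ("smoker", "over 60"): 0.25,
    ("smoker", "under 60"): 0.08,
    ("nonsmoker", "over 60"): 0.04,
    ("nonsmoker", "under 60"): 0.01,
}

patient = ("smoker", "over 60")
print(f"Knowing nothing about him:  {baseline_risk:.0%}")
print(f"Knowing his smoking status: {risk_by_smoking[patient[0]]:.0%}")
print(f"Knowing smoking and age:    {risk_by_smoking_and_age[patient]:.0%}")
# The patient himself still either gets the disease or he doesn't; only our
# estimate of the probability changes as we learn more about him.
```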

This becomes very important as physicians increasingly embrace evidence-based medicine (EBM). In the desire to cite statistics of medical outcomes (such as the chance of developing a certain disease or the likelihood that a certain treatment will work), it is very important to recognize that every patient is different. The population of a particular study will surely include a cross-section of many different types of participants. A given patient's actual probability will be closer to that observed among the participants most like himself, perhaps closer in ways that weren't even imagined or assessed by the researchers.

The original studies looking at the impact of cholesterol on cardiac outcomes didn't subdivide patients by measuring the different types of cholesterol such as LDL, HDL or triglycerides. Had they done so, individual probabilities of adverse outcomes could have been better stratified (as they have been subsequently).

As physicians, we have to do a better job of explaining these concepts to our patients. At the same time, we need to do a better job of understanding them ourselves! What's true in a study may not be true for a particular patient.

I want to close this post with my favorite probability brainteaser: If you flip a coin nine times and it comes up heads each time, what is the probability that it will come up heads the tenth time? I'll put the correct answer as the first comment to this post.

Monday, March 28, 2005

Requiring Reporting of Negative Drug Trial Results

Internal Medicine News reported on a Harris poll conducted in November 2004 showing, among other things, that 72% of the public would be in favor of some kind of legislation requiring that all negative drug trial results be published.

This is a very interesting question to have asked the public. My feeling is that the other 28% probably didn't understand the ramifications of such a subtle rule, or it would have been 100% in favor.

A problem that has long afflicted the medical literature is what we call "publication bias". This is the selective publication of studies with interesting outcomes, which are usually beneficial results from intervention trials. For example, if a drug is found to be effective at treating a particular condition, the study is more likely to be published than if the drug is found to have no benefit.

Journals don't like reporting negative results because they're not sexy enough. Of course, if a study demonstrates a deleterious effect of the drug, then that would be a sexy result and would likely be published!
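A toy simulation (my own sketch, with hypothetical numbers) shows how this skews the literature. Suppose many small trials are run on a drug that in truth does nothing, and only the trials showing a statistically significant benefit get published:

```python
import numpy as np

# Toy publication-bias simulation: many small trials of a drug with NO true
# benefit. Only trials reaching a significant benefit get "published"; the
# rest end up in the file drawer.
rng = np.random.default_rng(1)

n_trials, n_per_arm = 1000, 30
all_effects, published_effects = [], []

for _ in range(n_trials):
    drug = rng.normal(0.0, 1.0, n_per_arm)        # the true drug effect is zero
    placebo = rng.normal(0.0, 1.0, n_per_arm)
    diff = drug.mean() - placebo.mean()
    se = np.sqrt(drug.var(ddof=1) / n_per_arm + placebo.var(ddof=1) / n_per_arm)
    all_effects.append(diff)
    if diff / se > 1.96:                          # "effective!" -- gets published
        published_effects.append(diff)

print(f"All {n_trials} trials, mean effect:  {np.mean(all_effects):+.3f}")
print(f"{len(published_effects)} published trials, mean effect: "
      f"{np.mean(published_effects):+.3f}")
```

Anyone reading only the published trials would conclude the drug confers a sizable benefit, even though the full set of trials averages out to nothing.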

However, there is a more sinister reason for publication bias. When a drug company provides financial support for a clinical trial of one of its drugs and the results show no benefit, it may try to block the submission, and therefore the publication, of the study. Technically, submission of the article is at the discretion of the principal investigator (PI), but the thought of cutting off a future funding source may induce the PI not to submit.

One proposed solution to this problem would be to establish a registry of all studies, regardless of their results. Outcomes and results would then have to be reported and would be accessible through a medical literature search engine such as www.pubmed.gov, the database maintained by the National Library of Medicine.

Obviously the drug companies would not be in favor of this, and some people might argue about the intervention of big government in privately funded research. I don't believe that this is a valid argument, in that there is clearly an overriding public interest in such a registry. Government is already involved in any study that uses human subjects at present. Mandating the public reporting of the results of such studies seems like a very small additional burden.
