Don’t jump to conclusions with Doctor data

Unless you have been living under a rock (or you are just not “into” healthcare data journalism), you know that CMS is planning to release a massive data set about how doctors provide healthcare to Medicare patients (patient privacy will, of course, be protected).

This is a very exciting day. The Obama administration, HHS and CMS should all be applauded for taking Obama’s commitment to open government seriously! They have already taken, and will continue to take, heat from doctors who believe that the data will be used to hurt them. The AMA put out a press release opposing the data drop, which they hastily removed (but which is still available in Google’s cache). This is what they originally had to say about the matter:

“The American Medical Association (AMA) is committed to transparency and supports the release of physician data to improve quality of care. However, we also believe that certain safeguards are necessary to ensure that information is accurate and reliable for patients and other stakeholders.

“The AMA is concerned that CMS’ broad approach to releasing physician payment data will mislead the public into making inappropriate and potentially harmful treatment decisions and will result in unwarranted bias against physicians that can destroy careers. We have witnessed these inaccuracies in the past.

“To guarantee that information is accurate, complete, and helpful, the AMA strongly recommends that physicians be permitted to review and correct their information prior to the data release. This safeguard is not only practical but was recognized and included in other data release proposals, including bi-cameral and bi-partisan legislation supported by the AMA. Additionally, any analysis of the data released should note methodologies to ensure understanding of its limitations.

“Taking an approach that provides no assurances of accuracy of the data or explanations of its limitations will not allow patients to draw meaningful conclusions about the quality of care.”

Ardis Dee Hoven, M.D.
President, American Medical Association

Now the AMA has reversed course. They have removed the above press release from their news site, and an anonymous official has apparently spoken with an Associated Press reporter, indicating that the AMA will not seek to enjoin the release of the data. For the AMA, “not officially opposing” something is as close to an endorsement as it gets. I expect that they will have something permanent to say on the matter pretty soon, and I will link to that here once they do.

Of course, we here at DocGraph disagree with most of the AMA’s brief opposing position. Generally, what a doctor has billed to Medicare is what they have billed to Medicare. The notion that a doctor is going to “correct” the billing record is a little silly. Even if the billing record were wrong for a doctor, it’s not likely they would go and engage with CMS to fix the data. OIG has already documented the degree to which doctors ignore their responsibility to update their NPPES and PECOS records, and they can go and change that data at any time. So the “review and correct” safeguard would accomplish very little in practice.

Most of the other points that the AMA makes here are pretty valid, however. These criticisms are directed towards the press and the Internet community. Bad stories about doctors (as opposed to good stories about bad doctors) have destroyed many careers, and if this data is presented poorly on the Internet, it could lead patients to make poor decisions.

The journalism and blogging communities have a responsibility to treat this data, and the doctors represented therein, with respect. That means not jumping to conclusions. Using this data, it will be possible to see lots of new information about how doctors practice and how they are paid. But this data is extraordinarily complex; it will be very difficult to draw secondary conclusions from it consistently. Here are some core caveats for those seeking to work with the data set when it is released:

  • The first thing to understand about the data is that it is blinded to protect patient privacy. When fewer than 11 patients were treated by a given doctor, using a given procedure, that data is withheld by Medicare. Ensuring that at least 11 patients were involved in every publicized transaction ensures that this is a data set about doctors, rather than identifiable patients. I am not sure why 11 patients is the threshold instead of 10 or 9, but that is a common standard. Someone explained it to me once… my takeaway from that conversation was “because math”.
  • Some doctors have 90% of their patients paid for by Medicare, others have 9%. Some doctors do a lot of Medicaid (which will not be shown in this data release). Some primarily work with a single commercial payer, some work with lots of payers. You can think of all of the procedures that a doctor performs on all of his/her patients, no matter who is paying for the care, as a pie. Each source of income for the doctor is a slice of that pie.
  • The way the pie is sliced, and the size of each piece going to each payer, is generally referred to as the “payer mix” in the healthcare industry. Payer mix makes analyzing this data especially difficult for doctors who do not bill Medicare much. In many cases, these doctors will have trouble reaching the 11-patient-per-procedure threshold required to even have data in this release.
  • For many doctors this will mean that, in order to protect patient privacy, Medicare won’t have anything to say about how they practice at all. These “low billing doctors” will sometimes cross the threshold of 11 patients per procedure and sometimes not, more or less at random. It will be very difficult to perform accurate analysis on these doctors, in terms of their practice patterns as a whole. We should be very careful to not draw any conclusions at the low end of the spectrum. That doctor who “only” performed procedure X eleven times? That probably means nothing. What the doctor is actually doing with his/her patients is just not showing up at all.
  • The payer mix is further complicated because of the possibility that a doctor might only do one procedure for one payer. One can imagine a brain surgeon that only does brain surgery for Blue Cross Blue Shield patients, but does lots of “neurological consults” for Medicare patients. Typically the services offered by a doctor are relatively consistent between payers, but only “typically”, and at least some variation will be very normal.
  • Doctors that have lots of Medicare business will be easier and more productive to analyze. But be careful drawing conclusions about the “top billers” too. For many procedures (especially complex surgery) there is evidence that suggests that there is a “quality threshold effect”. If a doctor/surgeon does not get enough volume in a particular procedure, then in some cases it is difficult to maintain competency in that procedure. For some procedures this really matters. For others it doesn’t matter at all.
  • Because of the “payer mix problem” it will not be possible to reach a conclusion like “Surgeon A does three times as many X procedures as Surgeon B, therefore Surgeon A is probably better”. Surgeon B might be doing exactly the same number of procedures overall, but have far fewer Medicare patients. If you call both Surgeon A and Surgeon B and ask “what is your payer/procedure mix” then you have a much better chance of getting an accurate picture.
  • This data should include charge data. Charge data is an interesting topic that could use some careful investigation. The charge is like an “opening offer” that a doctor makes to Medicare, and then Medicare actually pays a completely different number. Sometimes charge data is used when billing patients without insurance, and sometimes it is used to calculate patient copay. There have been a lot of interesting and detailed articles recently about the interactions between “charge” prices and what patients ultimately pay. This data should make those analyses more robust, but in reality Medicare and Medicaid have strict policies on what expenses can actually be passed back to a patient. As a result the charge data might have more implications for those who are on commercial plans.
  • Remember that this is not all of Medicare. Medicare Advantage plans are not covered here, and there are lots of people who opt for those plans.
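
To make the threshold and payer mix caveats concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three surgeons, their patient counts, and the function name are hypothetical, and the only thing taken from the caveats above is the rule that rows with fewer than 11 patients are withheld. It is not the actual logic CMS uses to build the release.

```python
# Hypothetical illustration of the suppression threshold and the payer mix problem.
# The surgeons and their counts are invented; only the "fewer than 11 patients
# are withheld" rule comes from the caveats above.

SUPPRESSION_THRESHOLD = 11  # rows with fewer than 11 patients are withheld


def medicare_visible_count(medicare_patients):
    """Return the count that would appear in the release, or None if withheld."""
    if medicare_patients < SUPPRESSION_THRESHOLD:
        return None
    return medicare_patients


# Patients treated with procedure X, split by who paid (the "payer mix").
# All three surgeons perform the same number of procedures overall.
surgeons = {
    "Surgeon A": {"medicare": 90, "other_payers": 10},
    "Surgeon B": {"medicare": 30, "other_payers": 70},
    "Surgeon C": {"medicare": 9, "other_payers": 91},
}

# What the whole pie looks like versus what the Medicare slice reveals.
totals = {name: mix["medicare"] + mix["other_payers"] for name, mix in surgeons.items()}
visible = {name: medicare_visible_count(mix["medicare"]) for name, mix in surgeons.items()}

for name in surgeons:
    shown = "withheld" if visible[name] is None else visible[name]
    print(f"{name}: total procedures = {totals[name]}, visible in release = {shown}")
```

In this made-up example, a naive “sort by count” on the released data would rank Surgeon A at three times the volume of Surgeon B and would not list Surgeon C at all, even though all three do exactly the same amount of procedure X. The only thing that differs is the payer mix, and that is exactly the information the release does not contain.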

I hope I have adequately scared you away from drawing simple “sort by column Y” type conclusions on this data. Medical Billing generally and Medicare specifically are extremely complex fields and it is easy to get lost. You might be asking “well what conclusions can be drawn here?”. I think the most interesting thing about this data is that we have really never had a solid picture of what doctors actually do for a living. Most of what the average public health student learns about the healthcare system is based on hearsay. Someone they know told someone they studied with, who once took a course from a guy whose wife worked at a single Cardiology practice and saw some interesting things in the data. Only people “behind the curtain” in healthcare have ever been able to look at this data. This will not be a surprise to an EHR company, or a claims clearinghouse or an insurance company… but for the rest of us, for data scientists generally, this will be the most accurate picture of the healthcare system as a whole that has ever been revealed.

This data release will allow us to examine the most foundational partnership in our healthcare system, the collaboration between CMS and the AMA. I suspect that on some level, the AMA must be aware that this data release will serve as the ultimate testing ground for its CPT codeset. Currently HIPAA (yep, that healthcare privacy law) gives CMS the authority to dictate how doctors and payers communicate. This website lists their choices: X12 for bill formatting, ICD9 (becoming ICD10) for diagnostic codes, and CPT 4 for procedures. CMS mandates that everyone pay the AMA for CPT copyrights in order to gain the right to transact healthcare business. This is not just for Medicare/Medicaid; the HIPAA rule covers -any- electronic healthcare transaction, including those between doctors and third-party insurance companies. The CEO of Sermo, a doctor social network at one time in partnership with the AMA, frequently rants against the way that the AMA uses its control of CPT codes to bully doctors, insurance companies and even CMS.

For the first time ever, it will now be possible to ask the fundamental question: Are CPT codes working? Is the AMA living up to the monopoly status granted it by the federal government? Using the data from this release, as well as publicly verifiable data analysis methods, we can finally start to tease apart this question. Never before has the effectiveness of CPT codes been subject to public scrutiny. For years, industry insiders have been railing against the CPT code approval process, as well as the controversial Relative Value Units (RVU) process, which ultimately dictates how much a medical procedure is worth (through the lens of the CPT encoding of that procedure). I cannot imagine that anyone other than the AMA believes that the CPT/RVU scheme is actually working well, but for the first time, we can quantify and categorize the problem. That is the first step on the road to something better.

Part of the mission of The DocGraph Journal is to support Healthcare Journalists as they write data-driven stories about the healthcare system. As a result, if you are a healthcare journalist and you are planning on doing a story on this data, I would be happy to provide you with a free copy of my book Hacking Healthcare. David Uhlman (my coauthor) and I both have an extensive background in Medical Billing, and chapters 1, 2, 3 and 10 of the book cover medical billing concepts carefully. If you would like a copy, please send me a shout on Twitter. If you are not a Healthcare Journalist, I will see if I can get the O’Reilly folks to offer a sale on my book so you can save a few bucks. If you do not already know what “CPT, HCPCS and ICD9” mean… then you are really going to be in over your head when looking at this data set.

More importantly, the entire DocGraph community is willing to help Journalists to make better stories with this data release. If you have a question about a particular provider, or about the data regarding a whole city or state, feel free to join the DocGraph mailing list and ask a question. We are here to help.

HHS is taking a huge risk in releasing this data in this manner. I don’t think that the AMA, as a whole, is against data transparency, but there are certainly detractors in that organization. If this data comes out and people publish a lot of half-baked stories or blog posts based on it, that is going to give the “data hoarders” within the AMA and other organizations ammunition to prevent further data releases. The downside of transparency is that it creates the opportunity for careless slander. Let’s not do that. If you need help to write a good story about this, get in touch with me, and I will help. I would be happy to provide quotes, but also to help “sanity-check” conclusions. Believe me, there will be plenty of good-old-fashioned dirt that gets revealed by this data release; there is no reason to manufacture drama prematurely.



Update: Charles Ornstein wrote a piece over at Covering Health about this issue.



