DocGraph Teaming data update

CMS has released an updated version of the DocGraph teaming data set* that was retracted on October 5, 2015. The DocGraph Teaming data set documents how healthcare providers in the U.S. work together. We believe this release corrects the critical issues we identified prior to the retraction. Access to the raw data can be found through our Linea data portal, and higher-level support and software services are available from our sister company, CareSet Systems.

The improvements to the data set include:

  • Updated documentation which clearly states the date ranges found in each downloadable file. (The previous version of the data was retracted because the date ranges were mislabeled);
  • Very recent data. This data release has 2015 data updated through Oct 1, 2015;
  • Use of a consistent algorithm for all years from 2009-2015, which makes year-over-year analysis possible. The CareSet Systems blog will have several articles coming out soon about our year-over-year analysis of these data sets.

We expect to work with CMS to validate the data set in the coming months. We know that this data is a dramatic improvement on what was previously available, but we have not had the opportunity to review the data creation methods and validate that the work was performed according to our algorithm specification. Until that happens, DocGraph cannot vouch for the data. We can only say that CMS is “vouching” that the data is fixed by releasing it, and that most of the issues with the previous data sets appear to have been corrected.

As always, DocGraph would like to thank CMS and HHS for their continued commitment to openness.

There are still a few documentation issues with this data release, and we will be coordinating with CMS on an ongoing basis to correct them. Until then, we encourage the DocGraph community to keep the following in mind:

  • CMS has not explicitly documented the algorithm they used to create the data set. This algorithm has changed and is not 1-to-1 comparable with the data sets previously released. They appear to have made the modifications to the algorithm that we suggested, but we have no way of verifying that for now.
  • CMS continues to refer to the data as a “referral” data set, despite it being a “shared patient in time” data set that includes “referrals” as one subset. While this data does include traditional referrals as a subset, this is not strictly referral data. There are fields used in medical claims to document referrals and this data set was not generated using those fields.
  • CMS continues to label the file as “physician” despite it covering all Medicare provider types (except pharmacies, due to the exclusion of Part D data). Being a physician is not a prerequisite for being included in these data sets; any provider who bills Medicare enough to meet the patient privacy threshold will be included in the data set.
  • There are coding problems in the datasets. Specifically, there are many “impossible” NPIs (National Provider Identifiers). For instance, all of these NPIs are returned by a validity query on the 2014 data:

npi, npi_count, problem
6073299,2,"This NPI is not 10 digits, it has 7: 6073299"
16073299,2,"This NPI is not 10 digits, it has 8: 16073299"
135632034,8,"This NPI is not 10 digits, it has 9: 135632034"
162909399,1,"This NPI is not 10 digits, it has 9: 162909399"
162915969,1,"This NPI is not 10 digits, it has 9: 162915969"
174031524,1,"This NPI is not 10 digits, it has 9: 174031524"
999999992,1,"This NPI is not 10 digits, it has 9: 999999992"
1063828204,1,"This NPI is does not pass luhn: 1063828204"
1194809840,3,"This NPI is does not pass luhn: 1194809840"
1245396655,1,"This NPI is does not pass luhn: 1245396655"
1619944960,1,"This NPI is does not pass luhn: 1619944960"
1740228645,3,"This NPI is does not pass luhn: 1740228645"
1750458455,5,"This NPI is does not pass luhn: 1750458455"
9999999991,502,"This NPI is does not pass luhn: 9999999991"
9999999992,11634,"This NPI is does not pass luhn: 9999999992"
9999999994,116,"This NPI is does not pass luhn: 9999999994"
9999999996,2975,"This NPI is does not pass luhn: 9999999996"

Based on previous investigations, we know that the contractors who generated the files are faithfully returning what is listed in the NPI field. The underlying problem is with the actual Medicare claims database. We suspect that these are the last vestigial organs of pre-NPI billing systems, but we cannot be sure. Happily, these strange numbers are far less common in the 2015 data set. Perhaps CMS is succeeding in squashing the non-NPI coded transactions once and for all.
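
For readers who want to run the same kind of check on their own copy of the data, the two problem types in the report above can be reproduced with a few lines of code. This is only a sketch under stated assumptions, not necessarily the query used to build the report: it treats each NPI as a string, and it applies the Luhn check after prepending the 80840 prefix that the NPI standard uses for check-digit calculation. The function names `luhn_ok` and `npi_problem` are hypothetical.

```python
from typing import Optional

# Prefix that the NPI standard prepends before running the Luhn check.
NPI_LUHN_PREFIX = "80840"


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def npi_problem(npi: str) -> Optional[str]:
    """Describe what is wrong with an NPI, or return None if it looks valid."""
    if not (npi.isascii() and npi.isdigit()):
        return f"This NPI is not numeric: {npi}"
    if len(npi) != 10:
        return f"This NPI is not 10 digits, it has {len(npi)}: {npi}"
    if not luhn_ok(NPI_LUHN_PREFIX + npi):
        return f"This NPI does not pass luhn: {npi}"
    return None


if __name__ == "__main__":
    for candidate in ["6073299", "9999999992", "1234567893"]:
        print(candidate, "->", npi_problem(candidate))
```

A check like this only says that a number is well formed. It cannot say whether a well-formed NPI belongs to a real provider.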

Here is an NPI validity report for the 2015 data:

npi, npi_count, problem
9999999991,103,"This NPI is does not pass luhn: 9999999991"
9999999992,1569,"This NPI is does not pass luhn: 9999999992"
Much better!

For real-time discussion about these data sets, join the DocGraph Google group.

For a more thorough exploration of this data, sign up for CareSet news (link to sign up here).

Enjoy… and watch this space! We will be opening lots more data in 2016 and beyond.


Fred Trotter

Co-founder, DocGraph and CareSet Systems
* DocGraph teaming data shows how healthcare providers who bill Medicare cooperate to deliver care to their patients. Essentially, the teaming dataset documents Medicare providers who share patients in a given year. The result of this is a data structure that data scientists call a weighted directed graph of relationships. To be more specific, the method used to generate the graph converts the bi-partite (two types of nodes, patients and providers) graph structure into a graph with just one type, a graph showing relationships between providers. In layman’s terms, the DocGraph teaming data set is a massive map of the healthcare system in the United States. It shows referrals, ordering patterns and many other types of healthcare provider collaborations. This data set exists as the result of a FOIA request made by DocGraph. Since that time, DocGraph has continued to collaborate with CMS to ensure that the data was updated and reliable.

Announcing EHR Vendor Attestation Report Card and Data

UPDATE (May 20, 2015) It looks like the forces for open data won this one! Here is the summary at fiercehealth and the actual policy change letter sent to payers.

DocGraph is crowdfunding the release of a dataset revealing how specific EHR vendors perform on Meaningful Use attestation, bringing greater transparency into the EHR industry. Until now it was easy to see how providers were performing on Meaningful Use attestation, but it has been difficult to hold EHR vendors accountable for attestation performance. What will likely be more controversial is that this data release will amount to the release of the “attesting” client list of every EHR vendor in the country.

An initial dataset will become available on the last full day at HIMSS, and the crowdfund will continue until Datapalooza. This post discusses our underlying motivation for creating a new dataset, as well as some of our goals with its release.

I enjoy and appreciate many aspects of the annual HIMSS conference: the people who run it, the attendees, educational sessions, and keynotes. Further, I find that regional and local HIMSS events are well worth attending. However, I am not a fan of the “big” HIMSS tradeshow floor. The parallels between walking down the “main aisle” at HIMSS and walking down the strip at Vegas are striking. The opulence of the Vegas strip and the excess of the HIMSS tradeshow floor both stir a sense of unease and bring up the same questions: “Who is paying for all of this? Is someone getting fleeced? Is it me? If it is not me, would that make the fleecing OK?”

The HIMSS tradeshow floor is a necessary evil because we have, in Health IT, no better way to make decisions about what products we buy. As it stands, figuring out which vendors have the biggest booths at HIMSS is probably not the worst way to make decisions about EHR systems.

The alternative is to hire someone to tell us which EHR vendor fits us best. Probably the most famous provider in this space is the “Best in Klas” service. However, Klas is famous for being paid by both sides of the industry: it is paid both by potential EHR purchasers and by those who sell EHR systems. Like HIMSS, Klas creates a space for buyers and sellers to meet. I think Klas and HIMSS both do an admirable job trying to maintain fairness and objectivity, given the massive financial biases under which both organizations operate.

Both HIMSS and Klas are profiting from an underlying problem in the Health IT industry: Information Asymmetry. Anytime two people are making a business arrangement and one party has substantially more information than the other, there is a tendency to abuse that knowledge. This is why it is so uncomfortable to buy a car from a used car salesman. What do they know about this car that they are not telling you? Even though most used car salespersons are likely to be honest people who do not take advantage of their extra information, enough of them succumb to temptation to give that industry its seedy reputation. There are parallels between the Health IT industry and the used car marketplace:

  • Buyers are less informed than sellers. Health IT is complex enough that buyers do not understand how to evaluate an EHR product effectively, or even how to gain that understanding.
  • Vendors in the marketplace use Information Asymmetry to their advantage. Multiple vendors offer services that are very reasonable in cost. Some of those reasonably priced vendors do a terrible job, and some do a great job. Vendors with established reputations charge tremendously expensive prices. Buyers want to avoid fly-by-night operators, so they frequently overpay for merely adequate products.
  • Established vendors want to control their reputation in the marketplace. But what is unique to the EHR industry is how they do this: frequently introducing contractual constraints to ensure that buyers do not leak negative information into the public. This makes it very difficult to establish transparency, since many EHR buyers are forbidden from discussing the performance of their EHR software or their EHR software vendor.

There are several services, most notably the Kelley Blue Book, that help to provide consumers with the information that they need in order to make used car purchases effectively. While Klas and HIMSS attempt to take this role in the Health IT marketplace, both of them “serve two masters” and have become, in some respects, part of the problem. Of course, Blue Book likely sells more of its pricing guide information to car salesmen than it does to consumers. It should be noted that DocGraph is sponsored by EHR industry players as well. How does the Kelley Blue Book maintain its credibility? The Blue Book is purely objective data. It does not make subjective statements about a particular car, or a particular car manufacturer. Its role is only as a data aggregator.

Recently, Niall Brennan’s team at CMS, and their counterparts at ONC, modified how the EHR Attestation data was released. This modification, which we will discuss in detail later, has made it possible for DocGraph to create a new merged dataset about the EHR industry. We cleaned up this dataset in order to make a functional data release. This “derived EHR dataset” allows specific EHR vendors as well as specific EHR products to be tied with specific healthcare provider attestation results. The “unmerged and uncleaned” original dataset only allows for EHR providers to be compared in their attestation performance. The new DocGraph EHR dataset will allow vendors and products to be compared too.

This dataset can produce a “Blue Book” style report card on vendors and products in the EHR industry. Over the next few days, we will be releasing such a report card. As far as we know, this will be the first time the EHR industry as a whole will be held accountable for its performance in Meaningful Use attestation.

This is the first of several useful reports that are relatively easy to produce, using a dataset which associates specific attestations with specific EHR products. It is also important to understand if some EHR products are more likely to be “fired” and swapped, or to “fall out” of the attestation program. We can calculate who has benefited most by the extension of the Stage 1 funding. We can generate report cards that detail which EHR vendors perform best on measures that are beneficial to patients, public health reporting, etc.

A more controversial side-effect of this data release will be the association of attesting providers with their respective EHR provider(s). This amounts to publishing the majority of the client lists for every EHR vendor in the United States.

In reality, completely measuring the nuances of the Health IT industry by leveraging Meaningful Use attestation data will not work. As a result, we are announcing a crowdfund to support our efforts to create more open data about EHR vendors. Like all crowdfunds we offer rewards like books and t-shirts. Like all DocGraph crowdfunds we are also offering exclusive early access, and significant discounts to the new EHR dataset.

Tired of “out of network” insurance games

UPDATE (May 20, 2015) It looks like the forces for open data won this one! Here is the summary at fiercehealth and the actual policy change letter sent to payers.

If you spend much time in the patient community, you meet someone who has been badly burned by the “out of network” game that insurance companies play with/against healthcare providers.

It’s simple: you get insurance plan A from company Z. Then you go to a specialist or get a scan or something and you ask, “do you take company Z insurance?” They say “sure”. You hand them the insurance card. What they don’t tell you is that they will be billing “out of network”, which means the visit will hardly be covered at all.

You go to the insurance company, they point to the provider. You go to the provider, they point to the insurance company. Who is left with the huge bill? The patient.

Sometimes this gets really bad. In the worst cases, important treatments to relieve suffering are delayed.

Are you tired of this? In order to fix this, we need to be able to build systems that tell us for sure which providers are in a given plan at a given time. We need to have that system available when we purchase our health insurance so that we can buy insurance that covers the doctors that we already use, or the ones that we want to use. We can imagine a theoretical tool called that solves this problem in a user friendly way.

There are lots of companies and journalists in the DocGraph community that would love to be able to build such a tool. DocGraph would love to provide the data for such a tool but right now that would require that we scrape the websites of every insurance company provider directory in the country. Those websites are really unfriendly to such efforts. The following text was taken from the user agreement of the doctor finder tool for Aetna:

By using DocFind, you acknowledge and agree that DocFind and all of the data contained in DocFind belongs exclusively to Aetna Inc. and is protected by copyright and other law. DocFind is provided solely for the personal, non-commercial use of current and prospective Aetna members and providers. Use of any robot, spider or other intelligent agent to copy content from DocFind, extract any portion of it or otherwise cause DocFind to be burdened with unwarranted high access or transaction activity is strictly prohibited. Aetna reserves all rights to take appropriate civil, criminal or injunctive action to enforce these terms of use. 

Provider information contained in this directory is updated 6 days per week, excluding holidays, Sundays, or interruptions due to system maintenance, upgrades or unplanned outages. This information is subject to change at any time. Therefore please check with the provider before scheduling your appointment or receiving services to confirm he or she is participating in Aetna’s network. Participating physicians, hospitals and other health care providers are independent contractors and are neither agents nor employees of Aetna. The availability of any particular provider cannot be guaranteed, and provider network composition is subject to change. Notice of the change shall be provided in accordance with applicable state law.

The underlines are mine.

First, Aetna does not want anyone scraping their website. They do not want people like DocGraph to create these data sets. They view their list of providers as a protected information asset that only they can leverage.

But more importantly, they put the responsibility for “who is in what plan” squarely on the doctors. Which really means the patients, because the doctors’ websites will just say “check the insurance company website”. See what I mean about finger pointing?

Insurance companies and healthcare providers need to be held accountable for their in vs out status. The only way to do this is to create an open data set that maps Plans to Providers so that projects like is really easy to build.

The policy wonks at HHS/CMS/ONC et al get this. They have recently added the following text to the rules for the 2016 insurance plans.

…we propose that a QHP issuer must publish an up-to-date, accurate, and complete provider directory, including information on which providers are accepting new patients, the provider’s location, contact information, specialty, medical group, and any institutional affiliations, in a manner that is easily accessible to plan enrollees, prospective enrollees, the State, the Exchange, HHS and OPM. As part of this requirement, we propose that a QHP issuer must update the directory information at least once a month, and that a provider directory will be considered easily accessible when the general public is able to view all of the current providers for a plan on the plan’s public Web site through a clearly identifiable link or tab without having to create or access an account or enter a policy number….(blah blah)…We also are considering requiring issuers to make this information publicly available on their Web sites in a machine-readable file and format specified by HHS.

underlines are mine…

This would solve the problem. Anyone who wanted to create a website that showed what plans any given provider accepted would be able to do so easily.
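
To give a sense of how little code such a site would need once the data is open, here is a minimal sketch. Everything about the file layout is hypothetical: HHS has not specified a format in the text quoted above, so the `plan_id` and `provider_npis` field names, the sample plans, and the made-up NPIs are illustrative assumptions only.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical machine-readable directory: each plan lists the NPIs of its
# in-network providers. The field names and values are invented for illustration.
SAMPLE_DIRECTORY = """
[
  {"plan_id": "PLAN-A", "provider_npis": ["1111111111", "2222222222"]},
  {"plan_id": "PLAN-B", "provider_npis": ["2222222222"]}
]
"""


def plans_by_provider(directory_json: str) -> Dict[str, List[str]]:
    """Invert a plan-to-providers listing into a provider-to-plans lookup."""
    index: Dict[str, List[str]] = defaultdict(list)
    for plan in json.loads(directory_json):
        for npi in plan["provider_npis"]:
            index[npi].append(plan["plan_id"])
    # Sort plan names so the lookup result is stable.
    return {npi: sorted(plans) for npi, plans in index.items()}


if __name__ == "__main__":
    lookup = plans_by_provider(SAMPLE_DIRECTORY)
    print(lookup.get("2222222222", []))
```

The hard part is not the code. It is getting every issuer to publish the file, and to keep it current.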

But the key word here is “propose”. Insurance companies in this country benefit greatly from the confusion about in network and out of network, and so do some unethical healthcare providers. There will be lots of people who oppose this proposal.

I hope that I have made the case that this information needs to be open and machine readable. If you’re convinced, then you can find the comment page to support this policy here. If you disagree with us, and you still want to submit a comment, you can use this page.

Please take a few moments and write in to support this policy change. The comments are due Dec 22nd 2014 which is basically tomorrow.

If you would like to read the in-progress comments from the DocGraph Journal you can go here. Feel free to cut and paste from our comments into your own comments; we would be flattered.

Feel free to tell them that I sent you 😉

-Fred Trotter


DocGraph Summit recap


The DocGraph Summit was a great success. A big thanks to everyone who made it down. We filled our day discussing current open health data initiatives, questions, and goals. We are grateful to Houston Technology Center for providing an excellent venue, and we are already scheming for the 2nd annual Summit next year! Check out the Storify here: .










Next Generation DocGraph Data

After many months of mild government employee harassment, and delays based mostly on “other projects” that HHS has been working on, (have I ever told you how many of my friends at HHS were pulled on to the launch… almost all of them), I am proud to announce that a new, improved update of the DocGraph Edge data set has been released.

This is not just an update, but a dramatic improvement in what data is available. After working with the original DocGraph, we thought of several fundamental improvements, and our FOIA request was much meatier this time. If you liked DocGraph before, you are going to love it now.

First, let’s talk about the contents of the data. The original data set had three columns:

FirstNPI, SecondNPI, SharedTransactionCount

The SharedTransactionCount was the number of times that FirstNPI had seen a given patient first, and SecondNPI had seen the same patient later, within a 30 day window. (If that is tough to follow, you can read the full documentation of the original version of the data set). SharedTransactionCount was a measure of overlapping patient transactions, but we did not know how many patients were included. There was a threshold of patient count of at least 11 patients that had to be met. So if the SharedTransactionCount was 1100 there was no way to know if that meant 1100 patients, or 11 patients 100 times each. At least, that’s how the previous data set worked.
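
To make that definition concrete, here is a minimal sketch of how a count like this could be computed from raw visit records. It is an illustration only, and it is not the algorithm that was actually run on the claims data: the `Visit` record, the function name, and the choice to count every ordered pair of visits inside the window are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Visit:
    """Hypothetical record: one provider seeing one patient on one day."""
    patient_id: str
    npi: str
    day: date


def shared_transaction_counts(
    visits: List[Visit], window_days: int = 30, min_patients: int = 11
) -> Dict[Tuple[str, str], int]:
    """Count (FirstNPI, SecondNPI) visit pairs that fall inside the window."""
    by_patient: Dict[str, List[Visit]] = defaultdict(list)
    for visit in visits:
        by_patient[visit.patient_id].append(visit)

    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    patients: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for patient_id, history in by_patient.items():
        history.sort(key=lambda v: v.day)
        for i, first in enumerate(history):
            for second in history[i + 1:]:
                if (second.day - first.day).days > window_days:
                    break  # history is sorted, so later visits are further away
                if first.npi != second.npi:
                    # Same-day visits keep their input order (assumption).
                    edge = (first.npi, second.npi)
                    counts[edge] += 1
                    patients[edge].add(patient_id)

    # Drop edges that do not meet the patient privacy threshold.
    return {e: n for e, n in counts.items() if len(patients[e]) >= min_patients}
```

Note that the function returns only the transaction count for each edge, even though it has to track the distinct patients internally to apply the threshold. That mirrors the limitation described above: the count alone does not tell you how many patients are behind it.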

The new data set includes the actual number of patients in the patient sharing relationship. The new data set has the following data structure:

FirstNPI, SecondNPI, SharedTransactionCount, PatientTotal, SameDayTotal

The PatientTotal field is the total number of the patients involved in a treatment event (a healthcare transaction), which means that you can now tell the difference between high transaction providers (lots of transactions on few patients) and high patient flow providers (a few transactions each but on lots of patients).

In the original data set, you knew that the two treatment events happened somewhere between “on the same day” and within 30 days. In this new data set, you can differentiate treatment events that happened on the same day, using the SameDayTotal field. Now you can see how often the services were provided on the same day, which is really a whole new graph, with a 0-day window.
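
As a rough illustration of what the extra columns make possible, the sketch below reads rows in the five-column layout and computes transactions per patient. The column order follows the structure listed above, but the absence of a header row, the `Edge` class, and the sample NPIs (which are made up) are assumptions for the sake of the example.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class Edge:
    """One row: FirstNPI, SecondNPI, SharedTransactionCount, PatientTotal, SameDayTotal."""
    first_npi: str
    second_npi: str
    shared_transactions: int
    patient_total: int
    same_day_total: int

    @property
    def transactions_per_patient(self) -> float:
        # High values suggest many transactions on few patients;
        # values near 1 suggest high patient flow.
        return self.shared_transactions / self.patient_total


def read_edges(handle: TextIO) -> Iterator[Edge]:
    """Yield an Edge for each non-empty CSV row (assumes no header row)."""
    for row in csv.reader(handle):
        if not row:
            continue
        first, second, shared, patients, same_day = (cell.strip() for cell in row)
        yield Edge(first, second, int(shared), int(patients), int(same_day))


if __name__ == "__main__":
    # Made-up rows echoing the 1100-transaction example from the text.
    sample = io.StringIO(
        "1111111111,2222222222,1100,11,40\n"
        "1111111111,3333333333,1100,1100,0\n"
    )
    for edge in read_edges(sample):
        print(edge.second_npi, edge.transactions_per_patient, edge.same_day_total)
```

Both sample rows have the same SharedTransactionCount, so the old three-column data could not tell them apart. With PatientTotal, the first is clearly 11 patients seen about 100 times each and the second is 1100 patients seen once each.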

But wait…there’s more!! We also got additional “windows” beyond 30 days. We have data for 60, 90, 180 and 365 day windows. These data sets are much larger. The data is spread between 2012 and the middle of 2013 (which is not actually what we asked for, but we will take it). These data sets are enormous:

Window, Edge Count
30 day, 73 million edges
60 day, 93 million edges
90 day, 107 million edges
180 day, 132 million edges
365 day, 154 million edges

This means that for every edge in the database, we now have three weights instead of just one, and we have more than double the number of edges in our largest-window data set. I look forward to the DocGraph community doing a much more detailed analysis of this data set.

Probably the most significant announcement that we have to make is that we are releasing this data set for free and without any restriction. We have started a new DocGraph Alliance in which large companies pay the DocGraph Journal to reveal new and more interesting data sets, and to support the DocGraph community in analyzing open data. We will probably still crowdfund data sets when they are “brand new” but for older datasets like the DocGraph Edge data set, we are moving towards a fully open model, sponsored by Alliance Members. With that in mind, we asked HHS to go ahead and publish the newest version of DocGraph data directly on their site for everyone to see. This means that the data can be used for any reason by anyone, without a license restriction by anyone. We hope to be announcing the initial DocGraph Alliance members soon, but you can thank them for sponsoring this model!

With that in mind, please find the data below:

What kinds of amazing things can you do with this new dataset?

Let us know on the DocGraph Community Google Group.


Fair Health: One step forward, One step back

Fair Health was formed as the result of a settlement between a group of insurance companies and the New York State Attorney General’s office. The site provides a front end to its considerable pricing data, but the data is coded in CPT codes. Fair Health has taken the approach of licensing CPT descriptions from the AMA, as well as making agreements to get access to even more claims data than it was originally entitled to under the settlement. Specifically, from the Fair Health FAQ:

FAIR Health welcomes organizations to link to our website and download materials for consumer use. FAIR Health incurs fees from third parties such as the American Medical Association for use of healthcare codes in its Lookup tools, however, so links to for commercial purposes require a license agreement and payment of nominal fees.  Such commercial purposes include, but are not limited to, links established by providers or third party payors in connection with participation on state or federal health benefit exchanges.  Please contact us at for further information.

FAIR Health also licenses our consumer resources, including educational material, videos and cost lookup tools for use on organization websites and for other uses. To learn more about licensing opportunities and the associated costs, contact

(emphasis mine)

These agreements culminate in a service agreement that actually attempts to ensure that third parties that link to the site pay a fee. I would be hard pressed to find any other site on the Internet that makes such a stance in its Terms/Conditions/AUP, and I am a little surprised that Fair Health believes this is reasonable… Here it is in short:

Hyperlink Use and Disclaimer. If you or your company has accessed the FAIR Health Consumer Site through the use of a hyperlink, you agree and acknowledge that you will follow the rules set forth below:

a. All links shall link only to the FAIR Health Consumer site home page currently located at (“Consumer Site”).

b. You shall not attempt to modify, alter or frame any content on the Consumer Site. We reserve the right to review your website at any time to ensure that the link is being used appropriately.

c. FAIR Health is a New York not-for-profit corporation qualifying under section 501(c) (3) of the Internal Revenue Code. Your use of a hyperlink shall not be construed to imply sponsorship or endorsement by FAIR Health of you, your website or your products.

d. We do not necessarily review or approve of the content displayed on all websites that have linked to the Consumer Site.

e. Your website shall not include any description of FAIR Health or its products without the prior written consent of FAIR Health.

f. You agree that all FAIR Health proprietary trademarks, service marks and logos (collectively, “Marks”), belong exclusively to FAIR Health and when you use these Marks on your website, you must comply with FAIR Health’s standards. Any such use must be approved in writing by FAIR Health.

g. If we object to the link between your website and the Consumer Site for any reason in our sole discretion, you agree to remove it within twenty-four (24) hours of receiving notice from us.

h. Your use of a hyperlink linking your website with the Consumer Site is at your own risk.

(emphasis mine)

You can read the whole Terms and Conditions here. It is not lost on us that Fair Health believes that providing a direct link to the Terms and Conditions is in fact a contradiction of the Terms and Conditions in (a). Of course, given that we are writing this article without permission, we are also violating (e). We will obviously not be subjecting this article to approval from Fair Health, which means we are also contradicting (b) and (g). In the interest of full disclosure, we also submitted the FAQ and terms to the Internet Archive, so that we can tell if they have been changed. This would also contradict the terms, which would categorize this action as equivalent to uploading a virus to their servers. Given that we obviously cannot accept the Terms and Conditions, we will obviously not be using the website. However, I am not sure how Fair Health believes that, given our outright rejection of these terms, they can control what we do or do not link to. Or how they expect to give Google and Yahoo access to the facts about their changing content, but to deny us the same privileges. Most importantly, in Fair Health’s mind, our ability to read the terms means that we have agreed to the terms. Fascinating, no?

Their commitment to “reading but no parsing” goes so deep that Fair Health has disabled right mouse clicks using JavaScript. This prevents not only copy and paste, but also “open in a new tab” etc etc. Normally when I see draconian steps like this, I also see a complete lack of attention to accessibility issues. To their credit, this is not the case with Fair Health. With the exception of some missing labels, their initial form was actually fairly accessible. While I am concerned that some users with disabilities might rely on right click menu items to enable their plugins, most of the standard screen reader technology should work just fine on the site.

This puts Fair Health into the interesting position of providing some transparency, without taking an open data approach. This is problematic because the “interests of the public” are only halfway met. Fair Health indeed enables patients to look up cost data, but it does not allow for data journalists and data scientists to examine the trends and patterns in the same data. This halfway measure would not be such a problem were it not for the implicit endorsements that Fair Health seems to be getting from both industry and government, that this approach equates to transparency for the health insurance industry. It clearly does not.

Fair Health is obviously providing an important service for consumers, but I fear that in the long term this half measure may come at the expense of true transparency. It is easy enough to endorse them for working hard to move in the right direction. However much we dither about open data tactics, they deserve credit for the progress they have made! Obviously, the DocGraph project will never pay for access to data that comes with a caveat that it cannot be shared openly, and we certainly do not accept the terms of the Fair Health AUP. But in the spirit of not being needlessly confrontational, we can at least link you to Google Search results for Fair Health, rather than linking to them directly.

Great reference article about the settlement.

Comments on the aftermath of Fair Health

UPDATE April 29, 2015: Recently, Fair Health further restricted the number of searches that are available.


HIMSS always has lots of Open Source and Open Data, but you have to know where to look.

This year, the standout sessions were from the FDA Open Data talk and Fair Health.

The FDA is slowly announcing its new set of APIs. You can read more at and follow the latest at the @openfda twitter account. I heard a talk from Taha Kass-Hout, the Chief Informatics Officer of the FDA.

I will try to get slides to post here, or I will upload the blurry photos I took of the most important slides…

For now I want to cover some of the important themes from the talk.

First, the FDA is continuously investing in cloud-based technology that enables it to be open when it should be open, and secure when it should be secure. Pharma companies are now uploading genomic sequencing data to support many of their drug approval processes, and these uploads can be many terabytes of data each. The space needed to effectively process and examine this data is frequently an order of magnitude larger than this. This is not at all the only source of genomic data that the FDA is processing. When it investigates a food-borne pathogen outbreak, it is sampling those pathogens and sequencing many of them. All in all, the FDA is now dealing with a deluge of incoming and outgoing data.

The FDA has built/is building a three-tiered cloud approach in response to this massive growth in data processing requirements. First, it is investigating internal cloud infrastructure with strict access control in order to protect the trade secrets that it implicitly gets from pharma companies, along with their data uploads. It is also building a public cloud infrastructure in order to effectively collaborate in the open, when its data allows openness, which it frequently does. Lastly, it is developing a hybrid cloud infrastructure to handle “middle cases”. All of these cloud infrastructures are designed to exchange data with each other, and with FDA sites spread across the country.

Of course, the DocGraph project is probably most interested in coming improvements to FDA labeling data. Drug labels are famously difficult to deal with: they are flat text files which must be carefully processed in order to correctly interpret them. The FDA is promising dramatic improvements in what is available here. This could have tremendous implications for in-the-open analysis of medication data…

The Open FDA also has a mostly empty GitHub page, which is worth watching as it populates.

CMS Changes data policy and prepares to release data

Lots of you have written me to let me know that CMS has changed its policy regarding physician payment-related data and plans to release data soon.

They have also published the comments that they received on the policy change. We will be looking at these comments further, because they are an interesting data point in themselves.

The good news here is that this represents an acknowledgement that physician salary will no longer result in a guaranteed no for FOIA requests. The downside is that this is something of a non-policy, with some mixed messaging. For instance, the blog post and the actual policy document don’t actually say the same thing at all.

As CMS makes a determination about how and when to disclose any information on a physician’s Medicare payment, we intend to consider the importance of protecting physicians’ privacy and ensuring the accuracy of any data released as well as appropriate protections to limit potential misuse of the information.

The problem here is that there is no information on what the balance between “accuracy of data”, “physician privacy” and “misuse avoidance” might mean. More importantly, it is not clear at all that CMS actually has the right to make determinations on “the accuracy of data” or what “misuse” might entail. The actual policy document is much more circumspect about the determinations that CMS can make:

CMS will make case-by-case determinations as to whether exemption 6 of the Freedom of Information Act applies to a given request for information  pertaining to the amounts that were paid to individual physicians under Medicare

The exemptions to FOIA are clearly spelled out, and there is no exemption for “inaccurate” data, or for “misuse” unless misuse = national security risk (exemption 1). So the blog post is somewhat confusingly claiming that A. They are going to do something they do not have the right to do under FOIA and B. Something beyond what their actual published policy change states.

Clear as mud.


MRW I am trying to understand the new CMS FOIA policy

Essentially, the new policy is a somewhat confusing non-policy. They will release data when it is a “good idea” to release data, invoking the 6th exemption to FOIA (that’s the one that ensures that FOIA requests do not invade privacy) whenever they feel that is appropriate. Not sure what happens when I request something that can be “misused” or something that might be “inaccurate”. Personally I am probably more interested in the inaccurate data than anything else, since that is where the juicy stuff probably is…