Data release: Open Provider Directory and Open Formulary comment data

Recently, HHS released a proposed rule regarding new regulations for health insurance companies. The specific document is called:

Patient Protection and Affordable Care Act: HHS Notice of Benefit and Payment Parameters for 2016

In that proposed rule were two open data concepts that are worth noting:

  • A suggestion that insurance companies be required to release their formulary data as machine readable data sets.
  • A suggestion that insurance companies be required to release data about their current provider directory as machine readable data sets.

As you might imagine, the DocGraph Journal consistently advocates for open data and indeed, we did submit comments regarding this issue…

We had one of our part-time researchers (thanks Armie!!) search all of the comments for mentions of “machine readable” and/or “data” to see who besides us had commented on this matter. We then created a Google Sheets page with all of the relevant comments in one place, and we are now releasing this data to the public.

Read on to access the data, and to read our first-pass analysis of what we found!

Accessing the comment data directly

You can access, comment on, copy, and fork this comment data set using this link.

If you would just like to read the comments, this link might work a little better.

The columns of the document are described here:

  • Commenter name – the name of the person or group that submitted the comment.
  • Commenter url – the URL of the person or group that submitted the comment, if we could find one
  • Commenter description – a short description of the commenter taken from the “about us” section of their web page or from Wikipedia, if we could find one
  • Link to comments – the url to the specific set of comments that we mined for this information
  • Comments on Machine Readable Formularies (if any) – contains the full text of the comments that were made on open formulary data
  • Med formulary data for/against – a column where we noted people who were “for”, “against”, or “concerned” about the open formulary data release
  • Comments about Machine Readable Provider Directories (if any) – contains the full text of the comments that were made on open provider network data
  • Provider data for/against – a column noting people who were “for”, “against”, or “concerned” about the open provider network data release
  • Type – the type of organization
  • Notes – any notes that I thought were worth making
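If you export the sheet as CSV, a first tally of the for/against columns takes only a few lines. What follows is a minimal sketch, not the method we used: it assumes the CSV headers match the column names listed above (the real sheet may spell them differently), and the sample rows are made-up placeholders, not entries from the data set.

```python
import csv
import io
from collections import Counter

# Headers as described in the column list; the real export may differ (assumption).
PROVIDER_COL = "Provider data for/against"
FORMULARY_COL = "Med formulary data for/against"
TYPE_COL = "Type"

def tally_stances(csv_file, stance_col):
    """Count (organization type, stance) pairs, skipping rows with no stance."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        stance = (row.get(stance_col) or "").strip().lower()
        if stance:
            org_type = (row.get(TYPE_COL) or "").strip().lower()
            counts[(org_type, stance)] += 1
    return counts

# Made-up placeholder rows, NOT taken from the real comment data.
SAMPLE = """Commenter name,Type,Med formulary data for/against,Provider data for/against
Example Org A,Consumer,for,for
Example Org B,Providers,,for
Example Org C,Pharma,concerned,
"""

counts = tally_stances(io.StringIO(SAMPLE), PROVIDER_COL)
for (org_type, stance), n in sorted(counts.items()):
    print(org_type, stance, n)
```

To run it on the real data, pass an open file handle for your own CSV export instead of the in-memory sample.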

Data license

We made enough use of Wikipedia content that, for good form, we are releasing the data under the same license Wikipedia uses: the Creative Commons Attribution-ShareAlike License.

Please consider letting us know if you extend this data, so that we can link to your work (if it’s good).

Data quality caveats

We did our best to copy the comments accurately from the originals. Sometimes a commenter had two things to say about open data in two very different parts of a document, so we occasionally elected to drop the middle parts and include only the text that related directly to open data. It is also possible that someone discussed these open data issues in a way that our searches missed. It is possible that we pasted things into the wrong columns (you know... fat fingering). We might also have missed some commenters entirely, since there is no way to tell if you accidentally skipped an entry. We have been over the sheet once to make sure the errors are not too glaring. Still, if we missed your comments or misrepresented them, please contact us and we will correct the data immediately.

We might have made a mistake getting the organization description or url into the chart, because we generally trusted Google searches not to mislead us. If we did this to your organization, please contact us and we will correct the data immediately.

Data that we made

We made determinations about who was “for”, “against”, or “concerned” about the data release. “For” was the simplest, because support for the open data release was usually clear and unambiguous. “Against” and “concerned”, on the other hand, were more difficult judgement calls. Generally, if a comment explicitly said “HHS should not do this”, or complained so much that the commenter was obviously against the whole idea, we marked it as “against”. Sometimes, however, a commenter would describe themselves as supportive of the open data release in principle, but concerned that such a release needed more deliberation, time, or work. We labeled these as “concerned”, since their issues could be addressed, and it seems like the commenter would have been “for” the release if they were.

We had very few “types” of organizations that commented on these proposals.

  • Opendata – people like DocGraph who were clearly just saying nice things about the possibility of new open data
  • Consumer – people or groups whose role was advocating for a specific consumer group, or a particular type of patient
  • Providers – doctors, hospitals, nurses, and other people who deliver healthcare
  • Pharma – organizations that make or sell medications
  • Biotech – for the rare biotech organization that commented

About the data

The site lists 310 data points, but we found a couple of duplicate submissions and we did not include our three comments (because that skews the data from our point of view), so our table has 301 rows.

The vast majority of the commenters did not mention anything about the open data issues. In fact only 79 commenters out of the 301 that we looked at in total mentioned the machine readable data requirements.
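To put those two numbers in proportion, here is a quick back-of-the-envelope calculation using only the counts given above (the variable names are just for illustration):

```python
total_rows = 301   # rows in our table
mentioned = 79     # commenters who mentioned the machine readable data requirements

share_mentioned = mentioned / total_rows
share_silent = (total_rows - mentioned) / total_rows

print(f"mentioned open data: {share_mentioned:.1%}")  # 26.2%
print(f"did not mention it:  {share_silent:.1%}")     # 73.8%
```

So roughly one commenter in four had anything to say about the open data proposals.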

Healthcare providers, including representatives, hospitals, doctors, nurses, and midwife groups, universally wanted the open data releases. Consumer- or patient-focused groups were uniformly for the data release. In fact, support for the open data efforts was consistent in every group except the insurance companies.

For the most part, only insurance companies were against or concerned about open data, although there were two insurance company groups who were for the open data releases (Alliance of Community Health Plans and Delta Dental Plans Association).

There were four insurance company groups who were clearly opposed to the open data release:

There were five insurance company groups in the “concerned” category:

I have to admit, this last one surprised me. I had thought that AHIP would have been more clearly opposed to these open data efforts, given how burdensome they must be on at least some of their members. However, the two relevant sections in the comments from AHIP begin:

We support the goal of making formulary drug lists publically available. We agree that it is important for consumers to have information available to them to ensure they choose a plan that best meets their needs – including information on covered drugs. However, we have concerns with the risks inherent in aspects of HHS’ proposed approach


We support transparent provider networks to ensure choice among consumers, but recommend postponing the inclusion of a machine-readable provider file or an alternative until standards exist to ensure uniformity.

These kinds of comments, generally supportive, but cautious, were consistent among the insurance companies we labeled as “concerned”.

Concept analysis

We really do not have the volume of comments here to justify a cluster analysis, or even enough to achieve statistical significance. However, we did think it was useful to classify and count the various concepts that appeared frequently in the comments. It should be noted that this analysis is highly biased because it was done quick-and-dirty. We started tracking the appearance of a concept only once we thought we had already read it a couple of times, and we frequently lost count… Please consider helping to redo this work in a careful and methodical way. In any case, here is the concept frequency Google Sheet.

The first thing that jumps out is that everyone loves the Medicare Part D plan search tool. The people who do not want to have open data coming to the public directly from the insurance companies want HHS to build one using the information that the insurance companies already send them. The people who were for opening up the data thought that third-parties would be better at building something like this.

After that you see the basic arguments for and against the open data. Given how inaccurate our counting methods are, it is best not to presume that a 9 beats a 7… but generally the insurance companies are concerned about the fidelity of third-party tools.

I would encourage you to read carefully the concepts or arguments that were only made once or twice. This is where you find the ideas that actually expand your conceptualization of the problem at hand. For instance, the Midwives professional organization pointed out that they are not typically “findable” as providers by the current insurance company plan search tools. This is a great point that I had not thought of: what happens if, instead of “I want insurance my doctor takes”, what you really want is “I want insurance my favorite mid-level healthcare provider takes”? There are great points to be had by scrolling to the bottom.

We would really like help making this better. So if you are going to read all of the comments in any case, please consider helping us extend this analysis.

Repetitive term analysis

It is fairly obvious, if you actually try to read everything, that some points and phrases are clearly variations on the same text. Mostly this is cut and paste by organizations seeking to save time by collaborating with other, like-minded organizations. The first example we noted was from the insurance industry. Search for the word “confidentiality” in the website version of the data and you will see what I mean. Confidentiality is a relationship between two parties. It can be between patient and doctor, lawyer and doctor, lawyer and patient, insurance company and insurance exchange… but by itself the word means nothing. The word by itself just raises questions: What should be kept confidential? Who gets to know, but not tell? Who is doing the telling?

It’s just a weird use of the term “confidentiality”, but it is what made me realize that all of the insurance replies were heavily coordinated. The pattern is easier to see with a little of the surrounding text…

From AHIP’s comments for the provider data:

“… and urge additional stakeholder input around key issues, such as data integrity, accuracy, potential for consumer confusion, and confidentiality.”

From Cigna’s comments for the provider data:

“… and urge additional stakeholder input around key issues, such as data integrity, accuracy, potential for consumer confusion and confidentiality.”

From the Kaiser family plan comments for the provider data:

“in addition to issues of data integrity, accuracy, interpretation, and confidentiality”

From HealthPartners’ comments for the provider data:

“We strongly urge additional stakeholder input around key issues, such as data integrity, accuracy, and potential for consumer confusion, plan liability and confidentiality.”

From AHIP’s comments for the formularies:

“Similar to our current practices, confidentiality, use controls, and cost should also be addressed related to this proposal.”

From HealthPartners’ comments for the formularies:

“Similarly to our current practices, we recommend confidentiality, use controls, and cost should also be addressed in this proposal.”

This occurred with the provider groups too.

Searching for the phrase:

Finally, we encourage CMS to require issuers to make this information publicly available on their Web sites in a machine-readable file and format so as to provide the opportunity for third parties to create resources that aggregate information on different plans in a consumer-friendly format.

reveals comments from two different provider organizations that both include it. Of course, this argues for a Reddit-style comment submission system, where friendly organizations can just upvote the comments that reflect what they think. But since that is not going to happen… it might benefit someone at Code for America to write a comment parser that takes all of the text from the comments and does some basic “originality analysis” to see where the connections between comment makers are.
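To give a flavor of what such an “originality analysis” might look like, here is a rough sketch using Python’s standard difflib module on three of the provider-data snippets quoted earlier. It is only one possible approach, not something we have built or run on the full data: the function names are hypothetical, and the 0.8 threshold is an arbitrary assumption that would need tuning against real comments.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def similarity(a, b):
    """Word-level similarity between two passages, from 0.0 to 1.0."""
    return SequenceMatcher(None, words(a), words(b)).ratio()

def likely_shared_text(snippets, threshold=0.8):
    """Return (name_a, name_b, score) for pairs at or above the threshold.

    The threshold is an arbitrary starting point, not a validated cutoff.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(snippets.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Snippets copied from the provider-data quotes in this post.
SNIPPETS = {
    "AHIP": "and urge additional stakeholder input around key issues, such as "
            "data integrity, accuracy, potential for consumer confusion, and confidentiality.",
    "Cigna": "and urge additional stakeholder input around key issues, such as "
             "data integrity, accuracy, potential for consumer confusion and confidentiality.",
    "Kaiser": "in addition to issues of data integrity, accuracy, interpretation, "
              "and confidentiality",
}

for name_a, name_b, score in likely_shared_text(SNIPPETS):
    print(f"{name_a} / {name_b}: {score:.2f}")
```

Comparing words rather than characters means that a dropped comma, like the one separating the AHIP and Cigna versions, does not hide the match. A real parser would have to split each full comment into passages first, which this sketch does not attempt.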

In summary

Thanks for thinking about these comments with us. I was actually happy to see the degree to which the average insurance company really did seem to want to be generally transparent. I will likely be publishing an analysis of what positions HHS and CMS could take on this open data issue, in order to be most useful to patients and fairest to the insurance industry (which, to be fair, has gone through a lot of change lately).

More soon,

Fred Trotter