HOUSTON, June 1, 2015 /PRNewswire/ -- DocGraph is launching Linea, a new web-based portal, to enable the health data science community to discover, aggregate and enrich new open healthcare datasets.

DocGraph Linea is based on technology developed and contributed by Merck (known as MSD outside the United States and Canada). DocGraph Linea will provide data scientists a socially-enabled community open data platform that collects details about disparate healthcare datasets, and further allows the community to extend what data is available. Users will be able to search datasets, understand data lineage, view relationship matrices, add metadata, and see community algorithms.

Initially, DocGraph will seed the site with its known list of viable data sources.  Users will be able to contribute data they discover or create themselves, and DocGraph Linea will act as a marketplace for innovative data releases and code. DocGraph Linea will pull together and link to assorted datasets under Public Domain, Open Source, Creative Commons, and other data licenses specific to the data’s source. The community will be able to review and evaluate datasets on the site to ensure quality. DocGraph Linea will provide a curated, disambiguated, and accessible directory of open data.

Fred Trotter, Founder and Data Journalist at DocGraph, said, “While there are already several places to discover and download open healthcare data, there is almost nothing available to help people learn to exercise these data sets. Merck’s IT group has made a substantial technology contribution, which will allow the larger healthcare community to derive new open healthcare data sets. The end result will be lots of new innovations, many new healthcare data startups, and ultimately better healthcare as our society’s understanding of the nuances of healthcare delivery accelerates.”

Peter Lega, Director of Emerging Technology at Merck, said, “As the ecosystem of open data grows, these new capabilities to easily discover, share and enrich it will help foster collaboration, a better corpus of data, and new insights in the open data community.”

DocGraph is an organization that works to create, maintain, and improve open healthcare datasets. It aims to grow the open health data movement and build a community of data scientists, journalists, and clinical enterprises who use open data to understand and help evolve the healthcare system.

As always, DocGraph will be attending Datapalooza, May 31-June 3, 2015, in Washington, D.C.

Thanks to our friends at we will be hosting a datathon the day before Datapalooza starts.

  • DocGraph data visualization hacking
  • Data structure tutorials for multiple open data sets
  • State-level doctor data hacking
  • Food data demos
  • Sessions on the new prescribing pattern data
  • Sessions on the procedure pattern data
  • Sessions on the open payments data


1776 (12th Floor)
1133 15th Street Northwest
Washington, DC 20005

Saturday, May 30, 2015 from 9:00 AM to 5:00 PM (EDT)

Registration costs $35.

Announcing EHR Vendor Attestation Report Card and Data

UPDATE (May 20, 2015) It looks like the forces for open data won this one! Here is the summary at fiercehealth and the actual policy change letter sent to payers.

DocGraph is crowdfunding the release of a dataset revealing how specific EHR vendors perform on Meaningful Use attestation, bringing greater transparency into the EHR industry. Until now it was easy to see how providers were performing on Meaningful Use attestation, but it has been difficult to hold EHR vendors accountable for attestation performance. What will likely be more controversial is that this data release will amount to the release of the “attesting” client list of every EHR vendor in the country.

An initial dataset will become available on the last full day at HIMSS, and the crowdfund will continue until Datapalooza. This post discusses our underlying motivation for creating a new dataset, as well as some of our goals with its release.

I enjoy and appreciate many aspects of the annual HIMSS conference: the people who run it, the attendees, educational sessions, and keynotes. Further, I find that regional and local HIMSS events are well worth attending. However, I am not a fan of the “big” HIMSS tradeshow floor. The parallels between walking down the “main aisle” at HIMSS and walking down the Strip in Vegas are striking. The opulence of the Vegas Strip and the excess of the HIMSS tradeshow floor both stir a sense of unease and bring up the same questions: “Who is paying for all of this? Is someone getting fleeced? Is it me? If it is not me, would that make the fleecing OK?”

The HIMSS tradeshow floor is a necessary evil because we have, in Health IT, no better way to make decisions about what products we buy. As it stands, figuring out which vendors have the biggest booths at HIMSS is probably not the worst way to make decisions about EHR systems.

The alternative is to hire someone to tell us which EHR vendor fits us best. Probably the most famous provider in this space is the “Best in Klas” service. However, Klas is famous for being paid by both sides of the industry: by potential EHR purchasers and by those who sell EHR systems. Like HIMSS, Klas creates a space for buyers and sellers to meet. I think Klas and HIMSS both do an admirable job trying to maintain fairness and objectivity, given the massive financial biases under which both organizations operate.

Both HIMSS and Klas are profiting from an underlying problem in the Health IT industry: Information Asymmetry. Anytime two people are making a business arrangement and one party has substantially more information than the other, there is a tendency to abuse that knowledge. This is why it is so uncomfortable to buy a car from a used car salesman. What do they know about this car that they are not telling you? Even though most used car salespersons are likely to be honest people who do not take advantage of their extra information, enough of them succumb to temptation to give that industry its seedy reputation. There are parallels between the Health IT industry and the used car marketplace:

  • Buyers are less informed than sellers. Health IT is complex enough that buyers do not understand how to evaluate an EHR product effectively, or even how to gain that understanding.
  • Vendors in the marketplace use Information Asymmetry to their advantage. Multiple vendors offer services that are very reasonable in cost. Some of those reasonably priced vendors do a terrible job, and some do a great job. Vendors with established reputations charge tremendously expensive prices. Buyers want to avoid fly-by-night operators, so they frequently overpay for merely adequate products.
  • Established vendors want to control their reputation in the marketplace. But what is unique to the EHR industry is how they do this: frequently introducing contractual constraints to ensure that buyers do not leak negative information into the public. This makes it very difficult to establish transparency, since many EHR buyers are forbidden from discussing the performance of their EHR software or their EHR software vendor.

There are several services, most notably the Kelley Blue Book, that help to provide consumers with the information that they need in order to make used car purchases effectively. While Klas and HIMSS attempt to take this role in the Health IT marketplace, both of them “serve two masters” and have become, in some respects, part of the problem. Of course, Blue Book likely sells more of its pricing guide information to car salesmen than it does to consumers. It should be noted that DocGraph is sponsored by EHR industry players as well. How does the Kelley Blue Book maintain its credibility? The Blue Book is purely objective data. It does not make subjective statements about a particular car, or a particular car manufacturer. Its role is only as a data aggregator.

Recently, Niall Brennan’s team at CMS and their counterparts at ONC modified how the EHR Attestation data was released. This modification, which we will discuss in detail later, has made it possible for DocGraph to create a new merged dataset about the EHR industry. We cleaned up this dataset in order to make a functional data release. This “derived EHR dataset” allows specific EHR vendors as well as specific EHR products to be tied to specific healthcare provider attestation results. The “unmerged and uncleaned” original dataset only allows for EHR providers to be compared in their attestation performance. The new DocGraph EHR dataset will allow vendors and products to be compared too.

What this dataset can produce is a “Blue Book” style report card on vendors and products in the EHR industry. Over the next few days, we will be releasing such a report card. As far as we know, this will be the first time the EHR industry as a whole will be held accountable for its performance in Meaningful Use attestation.
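As a rough illustration of how a dataset that ties attestations to products can feed a vendor report card, here is a minimal Python sketch. The field names (`npi`, `product_id`, `vendor`, `attested`), the vendor and product names, and the sample rows are all hypothetical; they are not the actual CMS/ONC or DocGraph schema.

```python
from collections import defaultdict

# Hypothetical sample rows; the real attestation files use different fields.
attestations = [
    {"npi": "1000000001", "product_id": "P1", "attested": True},
    {"npi": "1000000002", "product_id": "P1", "attested": False},
    {"npi": "1000000003", "product_id": "P2", "attested": True},
    {"npi": "1000000004", "product_id": "P3", "attested": True},
]

# Hypothetical product lookup that links each product to its vendor.
products = {
    "P1": {"vendor": "Vendor A", "product": "Alpha EHR"},
    "P2": {"vendor": "Vendor A", "product": "Alpha Lite"},
    "P3": {"vendor": "Vendor B", "product": "Beta EHR"},
}


def vendor_report_card(attestations, products):
    """Return {vendor: (providers, successful, success_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # vendor -> [providers, successful]
    for row in attestations:
        product = products.get(row["product_id"])
        if product is None:
            continue  # skip rows that cannot be tied to a known product
        tally = counts[product["vendor"]]
        tally[0] += 1
        tally[1] += int(row["attested"])
    return {v: (n, ok, ok / n) for v, (n, ok) in counts.items()}


report = vendor_report_card(attestations, products)
for vendor, (n, ok, rate) in sorted(report.items()):
    print(f"{vendor}: {ok}/{n} attested ({rate:.0%})")
```

The join on a product identifier is the key step: without it, attestation rows can only be grouped by provider, not by vendor or product.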

This is the first of several useful reports that are relatively easy to produce, using a dataset which associates specific attestations with specific EHR products. It is also important to understand if some EHR products are more likely to be “fired” and swapped, or to “fall out” of the attestation program. We can calculate who has benefited most by the extension of the Stage 1 funding. We can generate report cards that detail which EHR vendors perform best on measures that are beneficial to patients, public health reporting, etc.

A more controversial side effect of this data release will be the association of attesting providers with their respective EHR vendor(s). This amounts to publishing the majority of the client lists for every EHR vendor in the United States.

In reality, completely measuring the nuances of the Health IT industry by leveraging Meaningful Use attestation data will not work. As a result, we are announcing a crowdfund to support our efforts to create more open data about EHR vendors. Like all crowdfunds, we offer rewards like books and t-shirts. Like all DocGraph crowdfunds, we are also offering exclusive early access and significant discounts to the new EHR dataset.

Data release: Open Provider Directory and Open Formulary comment data

Recently, HHS released a proposed rule regarding new regulations for health insurance companies. The specific document is called:

Patient Protection and Affordable Care Act: HHS Notice of Benefit and Payment Parameters for 2016

In that proposed rule were two open data concepts that are worth noting:

  • A suggestion that insurance companies be required to release their formulary data as machine readable data sets.
  • A suggestion that insurance companies be required to release data about their current provider directory as machine readable data sets.

As you might imagine, the DocGraph Journal consistently advocates for open data and indeed, we did submit comments regarding this issue…

We had one of our part-time researchers (thanks Armie!!) search all of the comments for mentions of “machine readable” and/or “data” to see who had commented on this matter besides us. Then we created a Google Sheets page with all of the relevant comments in one place. We are now releasing this data to the public.
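The keyword search described above was done by a researcher, but a similar filter could be sketched in Python as follows. The comment records and their `id` and `text` fields are hypothetical stand-ins for the real public comments.

```python
import re

# Hypothetical comment records standing in for the real public comments.
comments = [
    {"id": "c1", "text": "We support machine-readable formulary files."},
    {"id": "c2", "text": "Premium rates should be adjusted."},
    {"id": "c3", "text": "Provider directory DATA must be accurate."},
    {"id": "c4", "text": "Please update the rule."},
]

# Match "machine readable" (with a space, a hyphen, or neither)
# or the standalone word "data", ignoring case.
PATTERN = re.compile(r"machine[\s-]?readable|\bdata\b", re.IGNORECASE)


def relevant_comments(comments):
    """Keep only the comments whose text mentions one of the keywords."""
    return [c for c in comments if PATTERN.search(c["text"])]


matches = relevant_comments(comments)
for c in matches:
    print(c["id"], c["text"])
```

The word boundary around "data" keeps longer words that merely contain those letters from matching, at the cost of also missing forms such as "dataset".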

Read on to access the data, and to read our first-pass analysis of what we found!

Tired of "out of network" insurance games

If you spend much time in the patient community, you meet someone who has been badly burned by the “out of network” game that insurance companies play with/against healthcare providers.

It’s simple: you get insurance plan A from company Z. Then you go to a specialist or get a scan or something and you ask, “Do you take company Z insurance?” They say “sure”. You hand them the insurance card. What they don’t tell you is that they will be billing “out of network”, which means the visit will hardly be covered at all.

You go to the insurance company, they point to the provider. You go to the provider, they point to the insurance company. Who is left with the huge bill? The patient.

Sometimes this gets really bad. In the worst cases, important treatments to relieve suffering are delayed.

Are you tired of this? In order to fix this, we need to be able to build systems that tell us for sure which providers are in a given plan at a given time. We need to have that system available when we purchase our health insurance so that we can buy insurance that covers the doctors that we already use, or the ones that we want to use. We can imagine a theoretical tool that solves this problem in a user-friendly way.

There are lots of companies and journalists in the DocGraph community that would love to be able to build such a tool. DocGraph would love to provide the data for such a tool but right now that would require that we scrape the websites of every insurance company provider directory in the country. Those websites are really unfriendly to such efforts. The following text was taken from the user agreement of the doctor finder tool for Aetna:

By using DocFind, you acknowledge and agree that DocFind and all of the data contained in DocFind belongs exclusively to Aetna Inc. and is protected by copyright and other law. DocFind is provided solely for the personal, non-commercial use of current and prospective Aetna members and providers. Use of any robot, spider or other intelligent agent to copy content from DocFind, extract any portion of it or otherwise cause DocFind to be burdened with unwarranted high access or transaction activity is strictly prohibited. Aetna reserves all rights to take appropriate civil, criminal or injunctive action to enforce these terms of use. 

Provider information contained in this directory is updated 6 days per week, excluding holidays, Sundays, or interruptions due to system maintenance, upgrades or unplanned outages. This information is subject to change at any time. Therefore please check with the provider before scheduling your appointment or receiving services to confirm he or she is participating in Aetna’s network. Participating physicians, hospitals and other health care providers are independent contractors and are neither agents nor employees of Aetna. The availability of any particular provider cannot be guaranteed, and provider network composition is subject to change. Notice of the change shall be provided in accordance with applicable state law.

The underlines are mine.

First, Aetna does not want anyone scraping their website. They do not want people like DocGraph to create these data sets. They view their list of providers as a protected information asset that only they can leverage.

But more importantly, they put the responsibility for “who is in what plan” squarely on the doctors. Which really means the patients, because the doctors’ websites will just say “check the insurance company website”. See what I mean about finger pointing?

Insurance companies and healthcare providers need to be held accountable for their in vs out status. The only way to do this is to create an open data set that maps Plans to Providers so that such projects are really easy to build.

The policy wonks at HHS/CMS/ONC et al get this. They have recently added the following text to the rules for the 2016 insurance plans.

…we propose that a QHP issuer must publish an up-to-date, accurate, and complete provider directory, including information on which providers are accepting new patients, the provider’s location, contact information, specialty, medical group, and any institutional affiliations, in a manner that is easily accessible to plan enrollees, prospective enrollees, the State, the Exchange, HHS and OPM. As part of this requirement, we propose that a QHP issuer must update the directory information at least once a month, and that a provider directory will be considered easily accessible when the general public is able to view all of the current providers for a plan on the plan’s public Web site through a clearly identifiable link or tab without having to create or access an account or enter a policy number….(blah blah)…We also are considering requiring issuers to make this information publicly available on their Web sites in a machine-readable file and format specified by HHS.

underlines are mine…

This would solve the problem. Anyone who wanted to create a website that showed what plans any given provider accepted would be able to do so easily.
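To show how little code such a lookup would need once the data is open, here is a minimal Python sketch. HHS had not specified a file format at the time of writing, so the row layout (plan, provider NPI, start date, end date), the plan names, and the sample rows are hypothetical.

```python
from datetime import date

# Hypothetical machine-readable directory rows: (plan, provider NPI, start, end).
# An end date of None means the provider is still in the network.
directory = [
    ("Plan A", "1000000001", date(2015, 1, 1), None),
    ("Plan A", "1000000002", date(2015, 1, 1), date(2015, 6, 30)),
    ("Plan B", "1000000001", date(2015, 3, 1), None),
]


def plans_for_provider(directory, npi, on_date):
    """List the plans in which a provider is in-network on a given date."""
    return sorted(
        plan
        for plan, provider, start, end in directory
        if provider == npi
        and start <= on_date
        and (end is None or on_date <= end)
    )


print(plans_for_provider(directory, "1000000001", date(2015, 4, 1)))
```

Keeping start and end dates on each row is what lets the lookup answer the "at a given time" part of the question, rather than only reflecting the directory as it stands today.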

But the key word here is “propose”. Insurance companies in this country benefit greatly from the confusion about in network and out of network, and so do some unethical healthcare providers. There will be lots of people who oppose this proposal.

I hope that I have made the case that this information needs to be open and machine readable. If you’re convinced, then you can find the comment page to support this policy here. If you disagree with us, and you still want to submit a comment, you can use this page.

Please take a few moments and write in to support this policy change. The comments are due Dec 22nd 2014 which is basically tomorrow.

If you would like to read the in-progress comments from the DocGraph Journal, you can go here. Feel free to cut and paste from our comments into your own comments; we would be flattered.

Feel free to tell them that I sent you 😉

-Fred Trotter


DocGraph Summit recap


The DocGraph Summit was a great success, and a big thanks to everyone who made it down. We filled our day discussing current open health data initiatives, questions, and goals. We are grateful to Houston Technology Center for providing an excellent venue, and we are already scheming for the 2nd annual Summit next year! Check out the Storify here.