DocGraph Open Health Data Summit – Oct 8

The DocGraph Summit is just around the corner!

This “unconference” will include short presentations on current projects of the participants, and discussions on the topics, challenges, and ideas deemed most relevant and paramount to the open health data community. Our goal is to set an atmosphere conducive to in-depth dialogue, concept mapping, networking, and brainstorming.

The Summit will also review DocGraph’s open healthcare data initiatives. These projects include food, medical, doctor, and hospital data, as well as other fun topics that are not easily categorized.

Currently we have academics, corporate delegates, researchers, and entrepreneurs attending the Summit. Their areas of focus include data analytics, open source drug databases, EHRs, gene/drug interactions, VistA, Health IT, ACOs, statistics, etc. Attendees are coming from Rice, Stony Brook, UTHSC, e-mds, PwC, Baylor Medicine, the DocGraph community, and more.

Join us!

Eventbrite - The DocGraph Summit

Email for university student and faculty discount codes.

The DocGraph Summit is being held alongside the International Conference on Biomedical Ontology (ICBO) 14

Retiring Omni

DocGraph Omni was a website that we used to display a merged set of the open data that is available on healthcare providers.

It was a good idea, but it did not work. Or at least, it is used so infrequently that it is not worth the resources that DocGraph is spending on it. Omni was interesting only to the degree that it could serve as a crowdsourcing mechanism for even more awesome open data about doctors and hospitals. Omni is just not doing its job as a crowdsourcing tool.

More importantly, two of our informal journalism partners, ProPublica and US News, have both begun offering more popular consumer-facing systems using our data.

We would rather invest in doing a better job providing US News and ProPublica with data than offer a clearly inferior consumer-facing product ourselves. We will do our best to ensure that both ProPublica and US News at least have the option of replicating all of the functionality of DocGraph Omni.

More importantly, CareSet Systems, the sister company to DocGraph which focuses on healthcare system analytics, is offering a commercial product called Patch that does far more than Omni ever did. But Patch functionality is focused on the needs of healthcare organizations, like Hospitals, ACOs, SNFs and LTACs.

We have decided to retire Omni, and invest in our relationships with other data journalists and in the CareSet Patch service. We will leave the Omni server up for the next few days, but expect that site to be forwarded soon.

-Fred Trotter

DocGraph Summit 2014

The DocGraph Journal creates multiple, unprecedented datasets to improve healthcare. It is focused on building an open community of data scientists primed to share analysis of the torrential amount of new healthcare data posted by federal and state governments. The DocGraph Journal links government affairs (with HHS and CMS) to journalism organizations (O’Reilly Media, US News, ProPublica), academics, and entrepreneurs. The journal is supported by research grants from Merck, athenahealth, and the Robert Wood Johnson Foundation.

The 2014 Summit will review DocGraph’s open data healthcare projects. These projects include food, medical, doctor, and hospital data, as well as other fun topics that are not easily categorized.

Join fellow health data enthusiasts for an engaging day of unconference-style discussions and presentations, as well as meals and happy hour within walking distance of the venue.

Date: October 8, 2014

Location: Houston Technology Center

Eventbrite - The DocGraph Summit

The DocGraph Summit is being held alongside the International Conference on Biomedical Ontology (ICBO) 14

ICBO 14 runs Oct 6-9 and we are encouraging DocGraph Summit participants to attend the first two days of ICBO (Oct 6, 7), which will feature workshops discussing the options for an Open Source Medication Database.

NY Release of Taxi data: Still the right move

Today, word came out that the taxi data NY released has been entirely reidentified.

The technique and concepts to conduct the attack can be found here, and I also found the slashdot discussion interesting.

The result is that the identity and paths of specific named taxi cabs are now public information. This is not entirely bad, since now the data set will be extensively used to detect specific bad actors. Still, it was more than the NY government intended and will probably result in a lawsuit.

That lawsuit will be mostly justified, since security professionals understand well how to do de-identification right, and the rules were not followed. If you are doing this with health data, I can recommend fellow O’Reilly author Khaled El Emam, who wrote both Anonymizing Health Data and Guide to the De-Identification of Personal Health Information. You can hire him through Privacy Analytics. He is the de-identification expert that I know best and can endorse, but he is far from the only one.

Generally, hashing can be a reasonable approach as long as salts are used in combination with a secure hash algorithm. I prefer to use a different salt for every id, which makes a rainbow table attack (like this one) pretty hard to do.
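To make the per-id salt idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method NY used and not a complete de-identification process: the function name `salted_pseudonyms` and the sample ids are hypothetical, SHA-256 stands in for "a secure hash", and the salts are assumed to be kept secret (or destroyed) by whoever publishes the data.

```python
import hashlib
import secrets


def salted_pseudonyms(ids):
    """Return (mapping, salts) for the distinct ids in `ids`.

    Each distinct id gets its own random 16-byte salt, so a table of
    precomputed hashes over the id space does not match the output.
    The salts must never be published alongside the data.
    """
    salts = {}    # id -> secret salt (hypothetical storage, kept private)
    mapping = {}  # id -> published pseudonym
    for raw in ids:
        if raw not in salts:
            salts[raw] = secrets.token_bytes(16)
            digest = hashlib.sha256(salts[raw] + raw.encode("utf-8"))
            mapping[raw] = digest.hexdigest()
    return mapping, salts


if __name__ == "__main__":
    # Made-up ids for illustration only.
    mapping, _salts = salted_pseudonyms(["5X55", "7A21", "5X55"])
    for raw, pseudonym in mapping.items():
        print(raw, "->", pseudonym)
```

The same id always maps to the same pseudonym within one release, so trips by the same cab can still be linked, but an attacker who hashes every possible id without the salts gets nothing that matches.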

More importantly, it is also entirely appropriate to simply use a randomly generated number instead of a hash. Hashes are convenient when you need to rely on a dynamic and extensible process, rather than static data. Hashing also allows you to throw away the original data, and know that you can reliably repeat the process given new data. That is why it is used so frequently in password storage.
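The random-number alternative can be sketched the same way. Again this is a hedged illustration: `random_pseudonyms` and the sample ids are hypothetical names, and the sketch assumes the whole data set is static and available at once, which is exactly the case where a hash is not needed.

```python
import secrets


def random_pseudonyms(ids):
    """Assign each distinct id a random 16-hex-character token.

    The token carries no information about the original id, so there is
    nothing to reverse. The mapping itself is the only link back and
    should be kept private or discarded after the release is built.
    """
    mapping = {}
    used = set()
    for raw in ids:
        if raw in mapping:
            continue
        token = secrets.token_hex(8)
        while token in used:  # extremely unlikely, but keep tokens unique
            token = secrets.token_hex(8)
        used.add(token)
        mapping[raw] = token
    return mapping


if __name__ == "__main__":
    # Made-up ids for illustration only.
    print(random_pseudonyms(["5X55", "7A21", "5X55"]))
```

The trade-off matches the paragraph above: if new records arrive later, the stored mapping is needed to keep pseudonyms consistent, whereas a salted hash can be recomputed from the id and its salt.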

Unfortunately, this will have a chilling effect on open data releases, but I am glad it happened. This is a relatively unimportant data set. Which is to say, this could have been much worse. This could have happened with patient data. I work with stuff like HIV and TB infection data, as well as EHR notes containing infidelities etc. I hate to say it, but it's better for governments to learn on taxi cabs.

Lastly, I would encourage those who are considering doing data releases like this to reach out to organizations like ProPublica and/or DocGraph. If you cannot afford to hire Khaled, we can at least help to ensure that you avoid the basic mistakes. Believe it or not, data journalists like myself are not interested in violating legitimate privacy rights (although we can have a healthy debate around the word “legitimate”) and we would be more than happy to help ensure that a data release is free from reidentification drama.

Part of me wonders why they didn’t just release the taxi data with the taxi numbers intact. I strongly prefer real-name accountability in data sets like this. It might be because by learning the identity of the taxi, you might be able to infer the identity of the passenger, who has a legitimate privacy concern.

Accidents like this will happen, and NY was right to make a release rather than hold back a release because there “might” be a way to reidentify a data set. My hat is off again to NY state/city… innovators in open data.

-Fred Trotter

The DocGraph Journal adds 3 founding members to the DocGraph Alliance

The DocGraph Alliance is a new group of organizations committed to supporting data journalism and data science community efforts. Three global leaders in healthcare, athenahealth, CareSet, and Merck (known as MSD outside the United States and Canada), have signed on as founding members of the Alliance.

The DocGraph Alliance’s community mission is to encourage an ecosystem of innovators to collaborate and share tools and research methodologies around open healthcare datasets. This Alliance will help further develop technical analysis and methods around data released by federal, county, and state entities, as well as those originated by the community.

The DocGraph Alliance is a project of The DocGraph Journal, which shares data with a community of quantitatively minded professionals who mine publicly available clinical datasets to uncover interesting and meaningful insights. Support from the Alliance members means the DocGraph Journal can continue providing support for the growing community of data scientists focused on leveraging initiatives of transparency in healthcare. As a result of the community’s work, specific news coverage has incorporated DocGraph data, including work from US News, ProPublica, and the Kansas City Star.

“The DocGraph project created a platform for data scientists to collaborate openly on publicly available health data sources where nothing existed before”, said James Ciriello, Associate Vice President of Merck IT Strategy and Innovation, “and as we watched this community become more and more active in trying to address significant problems, we wanted to support it and help it grow. As publicly available healthcare data continues to grow at a fast pace, coordination and comparatives of care become commonplace, and insights on therapy start to drive novel innovation.”

“We are thrilled to partner with the DocGraph Alliance. Fred Trotter in particular has taken on ambitious and important work to socialize open data assets in healthcare and to leverage data in meaningful ways to advance the industry,” said Todd Rothenhaus, chief medical officer, athenahealth, Inc. “At athenahealth, we believe healthcare could benefit from more data openness and transparency. Access to expanded and new types of data through the DocGraph Alliance will support our work to improve our cloud-based services and further innovate based on evidence-based insights and industry trends.”

“Our business, as well as countless others, rely on the availability of Open Healthcare datasets. Our healthcare system modeling tools improve with every Open Data release,” said Ashish Patel, founder of CareSet Systems. “We want to ensure that DocGraph continues to flourish! The healthcare system needs a cadence of Open Data in order to effectively pursue the Triple Aim.”

DocGraph will work to grow and nurture an open community of data professionals through a series of trainings and events with a focus on further use of open health datasets and development of new methods and tools to analyze those datasets.

About The DocGraph Journal

The DocGraph Journal seeks to create and disseminate new open healthcare data sets, and to foster a community of data scientists who contribute tools and expertise to the analyses of open healthcare data. The Journal was founded after Fred Trotter’s crowdfunding of the first DocGraph data set demonstrated a demand for open healthcare data. The original data set, created from a FOIA request, showed how physicians and other healthcare providers collaborate to deliver care to Medicare patients. This original DocGraph data set remains the largest real-name social graph available to the public.


DocGraph Journal

Alma Trotter

Fred Trotter speaking at Health Datapalooza 2014

Fred Trotter, co-founder of The DocGraph Journal, will be speaking at Health Datapalooza in D.C. this week! Health Datapalooza is a national conference focused on liberating health data, and bringing together the companies, startups, academics, government agencies, and individuals with the newest and most innovative and effective uses of health data to improve patient outcomes.

If you are attending the conference, be sure to attend one of his panels listed below, and watch for tweets from @DocGraph and @fredtrotter.


2:45-3pm, Genius Stage

4:15-5:45pm, Tech Track: HealthCare Entrepreneurs BootCamp: Strategy, Practice & Games for Using Public Data to Build, Scale and Deliver Value


1:30-2:30pm, Consumer Track: Data Scientist: Extracting Data Forcefully From Bureaucracies

3-4pm, Tech Track: “What if It Actually Works?” A World with People Using Open Health Data: Dystopia or the Singularity?

DocGraph is an ONC code-a-thon finalist

To catch everyone up, here is the brief sequence of events:

With that in mind, we are happy to announce that DocGraph Omni has been selected as one of the winning entries into the Code-a-Palooza!

Now it's time for a look at the competition!

  • Arcadia Solutions: Arcadia looks like a top-of-the-line Health IT consulting shop, and they have previously won a Surescripts hackathon.
  • DocSpot: DocSpot is an advanced doctor search tool. They are also an active member of the DocGraph community and have done some innovative work with chargemaster data in hospitals.
  • karmadata: karmadata is an advanced API to lots of healthcare data sets. They seem to have lots of international data, and solid data sets for clinical trials. They should be able to come up with something really easy using their other data sources!
  • Lyfechannel: Develops advanced patient intervention mobile apps.
  • Medecision: Population management with Big Data.
  • Team FloriDUH: Another DocGraph community member, Mandi Bishop, is leading a team of top thinkers!
  • University of Wisconsin-Madison: Can't find a link for this one, but it's one of the two academic teams!!
  • Zynx Health: Another Big Data player, this time with expertise in Clinical Decision Support.

Lots of Big Data expertise, experience designing software, even AI and robotics experience. You can expect some crazy good applications and a fierce competition. Which we plan on winning.