Open FDA at HIMSS – The DocGraph Journal

HIMSS always has lots of Open Source and Open Data, but you have to know where to look.

This year, the standout sessions were the FDA Open Data talk and the one from Fair Health.

The FDA is slowly announcing its new set of APIs. You can read more at open.fda.gov and follow the latest on the @openfda Twitter account. I heard a talk from Taha Kass-Hout, the Chief Informatics Officer of the FDA.

I will try to get slides to post here, or I will upload the blurry photos I took of the most important slides…

For now, I want to cover some of the important themes from the talk.

First, the FDA is continuously investing in cloud-based technology that enables it to be open when it should be open, and secure when it should be secure. Pharma companies are now uploading genomic sequencing data to support many of their drug approval processes, and each of these uploads can run to many terabytes. The space needed to effectively process and examine this data is frequently an order of magnitude larger still. Nor is this the only source of genomic data that the FDA is processing. When it investigates a food-borne pathogen outbreak, it samples those pathogens and sequences many of them. All in all, the FDA is now dealing with a deluge of incoming and outgoing data.

The FDA has built, or is building, a three-tiered cloud approach in response to this massive growth in data processing requirements. First, it is investigating internal cloud infrastructure with strict access control in order to protect the trade secrets that it implicitly receives from pharma companies along with their data uploads. It is also building a public cloud infrastructure in order to collaborate effectively in the open when its data allows openness, which it frequently does. Lastly, it is developing a hybrid cloud infrastructure to handle “middle cases”. All of these cloud infrastructures are designed to exchange data with each other, and with FDA sites spread across the country.

Of course, the DocGraph project is probably most interested in coming improvements to FDA labeling data. Drug labels are famously difficult to deal with: they are flat text files that must be carefully processed in order to interpret them correctly. The FDA is promising dramatic improvements in what is available here. This could have tremendous implications for in-the-open analysis of medication data…

The Open FDA also has a mostly empty GitHub page, which is worth watching as it populates.