Announcing MrPUP

We are happy to announce that the DocGraph Journal is releasing the first Medicare Public Use File produced by a private organization.

The new public use file is called MrPUP, and it details how outpatient providers in Medicare refer procedures. MrPUP stands for Medicare Referring Provider Utilization for Procedures. You can download the data here. The Open Source Eventually version is free to attendees of Datapalooza!

There are many Medicare procedures, including most lab and imaging tests, that require a specific physician to refer them. When a Medicare beneficiary gets an X-Ray or a blood test done, Medicare wants to know what provider ordered these tests. As a result, for some, but not all, procedures, Medicare requires the referring doctor’s NPI number.

Previously, CMS has released information on the performed procedures in Medicare, coded by NPI and hcpcs_code. MrPUP has a similar data structure, but instead of including the performing NPI, it includes the referring NPI.
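To make that structure concrete, here is a minimal sketch of reading such a file and grouping procedures by referring provider. The column names (referring_npi, hcpcs_code, line_count), the sample NPIs and the counts are assumptions made up for illustration; check the actual MrPUP file for its real layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The column names and all values are invented
# for illustration and are not taken from the real MrPUP file.
SAMPLE = """referring_npi,hcpcs_code,line_count
1111111111,80053,42
1111111111,85025,17
2222222222,71020,5
"""

def procedures_by_referrer(csv_text):
    """Map each referring NPI to a dict of {hcpcs_code: line_count}."""
    result = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        result[row["referring_npi"]][row["hcpcs_code"]] = int(row["line_count"])
    return dict(result)

referred = procedures_by_referrer(SAMPLE)
print(referred["1111111111"])
```

The same function would work on the performing NPI PUF if the key column were the performing NPI instead, which is the sense in which the two files share a structure.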


The simplest way to explain the value of this new data is to show how the analysis of lab referrals worked on open data before this release, and how it looks after. Before this data release, it was possible to understand which providers were sending data to which labs by leveraging the teaming data set, which was the first data set that we released and the one that gave DocGraph its name. We also know, from the current performing NPI PUF from CMS, what kinds of tests those laboratories are performing. Using the analysis tools provided by CareSet Systems, we can see how that data analysis might look for a specific doctor:



This is a flow graph from an actual interface inside CareSet Systems. On the left, in blue, we see the procedures that our doctor is responsible for in Medicare. Our doctor, Dr. Magid, is in orange in the center. The green dots represent the laboratories that Dr. Magid shares patients with. The red dots show the procedures that the laboratories are performing. Obviously the labs that this provider works with, which include LabCorp, Quest, and the lab at this provider’s group practice, do hundreds of distinct lab tests with huge variety and overlap. Our doctor could only be responsible for a tiny fraction of these… but which ones?

Enter MrPUP. Using MrPUP we can generate a new graph, where the edges of the connections between the labs and the provider can be labeled with specific lab tests. This gives us insight into how this doctor is referring labs, in a way that the first analysis just does not.
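One way such edge labels could be approximated from open data is sketched below. This is not the CareSet Systems implementation: it simply assumes that an edge between a doctor and a lab can be labeled with the codes the doctor refers (from MrPUP) that the lab also performs (from the performing NPI PUF). The function name, the identifiers and the codes are all hypothetical.

```python
def label_edges(referred_codes, teaming_edges, performed_codes):
    """Label each (doctor, lab) edge with the HCPCS codes that the doctor
    refers and the lab performs. This intersection is only an approximation:
    when two labs perform the same test, it cannot tell which lab actually
    received a given referral."""
    labeled = {}
    for doctor, lab in teaming_edges:
        shared = referred_codes.get(doctor, set()) & performed_codes.get(lab, set())
        labeled[(doctor, lab)] = sorted(shared)
    return labeled

# Hypothetical inputs; identifiers and codes are placeholders.
referred_codes = {"DOC_A": {"80053", "85025"}}          # from MrPUP
teaming_edges = [("DOC_A", "LAB_X"), ("DOC_A", "LAB_Y")]  # from the teaming data
performed_codes = {                                      # from the performing PUF
    "LAB_X": {"80053", "85025", "84443"},
    "LAB_Y": {"84443"},
}

edges = label_edges(referred_codes, teaming_edges, performed_codes)
print(edges)
```

In this toy input, the edge to LAB_X picks up two labels and the edge to LAB_Y picks up none, which is the kind of narrowing the referring data makes possible.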


You can see how much more powerful this analysis method is.


There are some caveats to how this data should be analyzed. Not every procedure in Medicare requires a referral NPI to be included in the claim. But all claims allow for this field to be filled in. There are some cases where CMS requires this field to be properly filled in, but there are no cases that we are aware of where its use is forbidden. That means that this data set might include some interesting data that is not indicative of any trends. Let’s imagine a cardiologist who configures her billing system to always include the NPI of the primary care doctor who referred a patient. In our data set, those primary care providers would appear to have a strange and unusual amount of “referred cardiology procedures”. It would make them really stand out in the data set as unique. But in fact, that information does not say anything interesting at all about those primary care doctors… it’s just an artifact of the strange way that one cardiologist has decided to bill.

There are many cases where the underlying CMS claims data includes what appear to be self-referrals. Of course, self-referral is technically not allowed by policy, but that policy does not extend to requirements about how providers have to fill out specific claims forms. So the underlying data includes lots of cases where the same provider is included in both the referring provider NPI field and the performing provider NPI field. The vast majority of these providers are not actually doing anything shady, but are just honoring CMS requirements about how to use the various claims fields. More importantly, when a self-referral is made, those procedure patterns already appear in the standard CMS outpatient utilization PUF. For those reasons, and to generally avoid drama, we have excluded self-referring procedures from this PUF. We might change how we address this in the future, but this is the simplest way to keep this data release clean.
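The exclusion rule can be sketched as a simple filter applied before aggregation. This is an illustration of the idea only, not our actual pipeline: the claim-line field names (referring_npi, performing_npi, hcpcs_code) and the sample values are assumptions.

```python
from collections import Counter

def aggregate_without_self_referrals(claim_lines):
    """Count claim lines per (referring_npi, hcpcs_code), skipping any line
    where the referring and performing NPI are the same provider."""
    counts = Counter()
    for line in claim_lines:
        if line["referring_npi"] == line["performing_npi"]:
            continue  # apparent self-referral: excluded from the output
        counts[(line["referring_npi"], line["hcpcs_code"])] += 1
    return counts

# Hypothetical claim lines; NPIs and codes are placeholders.
claim_lines = [
    {"referring_npi": "1111111111", "performing_npi": "9999999999", "hcpcs_code": "80053"},
    {"referring_npi": "1111111111", "performing_npi": "9999999999", "hcpcs_code": "80053"},
    {"referring_npi": "2222222222", "performing_npi": "2222222222", "hcpcs_code": "93000"},
]

counts = aggregate_without_self_referrals(claim_lines)
print(counts)
```

Here the third line is dropped because the same NPI appears in both fields, so provider 2222222222 does not appear in the aggregated result at all.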

Hopefully the release of the MrPUP data will draw attention to the requirements that CMS makes regarding this specific billing field, and future policies will ensure that the data becomes more reliable over time. Or not, one never can tell about these things.

Data Licensing

Although this is open data, it is not costless. It takes money for us to work on DocGraph, and as a result we are charging a nominal fee for access to the data. If you are a student, researcher, academic or hacker, you probably want to purchase the Open Source Eventually (OSE) version of the data. This version of the data is much cheaper (think textbook) and in one year will become a Creative Commons-licensed data file. In the meantime, any work you do with or on the file must be released under a Creative Commons license or some other open source license. This version does not allow you to share the data in any way.

If you would like to use the data in your product or service, or otherwise leverage it commercially, you can purchase a commercial-friendly license for it. This costs a little more, but it is still hundreds of thousands of dollars less than it would cost to create the data set yourself. We appreciate those who choose to purchase this license, because this is what allows us to continue our work at DocGraph.

How to get the data for free for Datapalooza attendees:

In order to get the data for free (you will be getting the OSE version), you must mention @DocGraph in a tweet that shows you pictured with something fun that clearly demonstrates that you are in attendance at Datapalooza. In fact, if you are not at Datapalooza, and your tweet pretending that you are at Datapalooza is clever enough, we might just decide to give you a free copy anyway. After that, go ahead and apply for the free data at the MrPUP download page. Once you have tweeted at us, follow @DocGraph so that we can DM you a link to the download file!