OpenHospitalCost

Methodology

How raw hospital files become comparable prices — and the limits of what the data can tell you.

Sourcing

We start from the federal hospital directory to identify each hospital and its identifiers, then locate the machine-readable file (MRF) of standard charges that the hospital must publish under 45 CFR Part 180. We supplement this with public bed counts and metro-area (CBSA, Core-Based Statistical Area) data for context. We download files directly from each hospital, fetching only the price files hospitals are required to publish, and we identify ourselves honestly when we do.

Parsing & code matching

MRFs come in many shapes — CSV and JSON, often compressed, sometimes multiple gigabytes — and hospitals format and label them inconsistently. We parse each file and match its line items to the procedures we track. Because the same procedure can be represented in very different ways from one hospital to the next, a good deal of our work goes into matching it reliably across those variations.
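As a rough illustration of the matching step, here is a minimal Python sketch. It assumes each line item arrives as a code-type label plus a raw code string, that tracked procedures are keyed by HCPCS/CPT code, and that a trailing two-character modifier (such as a professional or technical component marker) should be split off rather than break the match. The `TRACKED` table, the alias list, and the function names are hypothetical, not our production matcher.

```python
import re
from typing import Optional

# Hypothetical table of tracked procedures, keyed by normalized HCPCS/CPT code.
TRACKED = {
    "70551": "MRI brain without contrast",
    "80053": "Comprehensive metabolic panel",
}

# Assumed spellings of the code-type column. CPT is Level I of HCPCS,
# so both labels map to one shared code space here.
CODE_TYPE_ALIASES = {"CPT": "HCPCS", "CPT4": "HCPCS", "CPT-4": "HCPCS", "HCPCS": "HCPCS"}

# Five-character base code, optionally followed by a two-character modifier
# (e.g. "26" or "TC"), joined directly, by a hyphen, or by whitespace.
CODE_RE = re.compile(r"([0-9A-Z]{5})(?:[-\s]?([0-9A-Z]{2}))?")


def parse_code(raw: str) -> Optional[tuple[str, Optional[str]]]:
    """Return (base_code, modifier), or None if the string is not a recognizable code."""
    cleaned = raw.strip().strip('"').strip().upper()
    m = CODE_RE.fullmatch(cleaned)
    if m is None:
        return None
    return m.group(1), m.group(2)


def match_line_item(code_type: str, raw_code: str) -> Optional[str]:
    """Map one MRF line item to a tracked procedure name, or None if it doesn't match."""
    system = CODE_TYPE_ALIASES.get(code_type.strip().upper().replace(" ", ""))
    if system != "HCPCS":
        return None
    parsed = parse_code(raw_code)
    if parsed is None:
        return None
    # The modifier is ignored here; a fuller matcher would use it to
    # separate professional and technical components.
    base, _modifier = parsed
    return TRACKED.get(base)
```

For example, `match_line_item("cpt", " 70551-26 ")` resolves to the brain MRI despite the lowercase label, stray whitespace, and modifier, while a DRG-coded row is simply left unmatched.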

Quality scoring

Each file gets a File Quality Score (0–100) based on how completely it populates the expected fields and whether it actually contains usable negotiated and cash prices. A hospital becomes “money-page eligible” (shown on comparison pages) only when its latest file meets our minimum score. This keeps thin or unparseable files out of comparisons.
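A minimal sketch of how a score like this might be computed, assuming rows have already been parsed into dictionaries. The field names, the 60/20/20 split between field completeness and the presence of usable negotiated and cash prices, and the eligibility cutoff of 70 are all hypothetical placeholders, not the values we actually use.

```python
import math
from typing import Any, Mapping, Sequence

# Hypothetical, simplified field names; real MRFs use their own column labels.
EXPECTED_FIELDS = (
    "description", "code", "code_type", "setting", "gross_charge",
    "discounted_cash", "payer_name", "plan_name", "negotiated_dollar",
)
ELIGIBILITY_THRESHOLD = 70  # hypothetical cutoff for "money-page eligible"


def is_usable_price(value: Any) -> bool:
    """A price counts as usable if it parses to a finite number greater than zero."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return False
    return math.isfinite(x) and x > 0


def file_quality_score(rows: Sequence[Mapping[str, Any]]) -> int:
    """Score a parsed file from 0 to 100 (assumed weights: 60 completeness, 20 + 20 prices)."""
    if not rows:
        return 0
    # Count every expected field that is present and non-empty, across all rows.
    filled = sum(
        1 for row in rows for field in EXPECTED_FIELDS
        if row.get(field) not in (None, "")
    )
    completeness = filled / (len(rows) * len(EXPECTED_FIELDS))
    has_negotiated = any(is_usable_price(r.get("negotiated_dollar")) for r in rows)
    has_cash = any(is_usable_price(r.get("discounted_cash")) for r in rows)
    return round(60 * completeness + 20 * has_negotiated + 20 * has_cash)


def money_page_eligible(rows: Sequence[Mapping[str, Any]]) -> bool:
    """Eligible only if the file's score meets the (hypothetical) threshold."""
    return file_quality_score(rows) >= ELIGIBILITY_THRESHOLD
```

Under these assumed weights, a file whose columns are all filled but whose negotiated prices are all "N/A" still scores 80, while a file that only carries descriptions and codes scores 13 and stays off comparison pages.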

Representative price

A single procedure can appear many times in one file: across payers, plans, and settings, and as separate professional and facility components. We distill these rows into one representative, comparable price per hospital that reflects the typical cost of the full service rather than a single fragment of it, so a “$67 MRI” that is really just one component doesn't understate the real price. For negotiated rates we also show the range across payers, and national figures summarize prices across hospitals along with the spread between them.
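One way to do this is sketched below, under assumptions that are ours for illustration only: each negotiated rate is tagged with a payer and a hypothetical component label (`global`, `facility`, or `professional`); a payer's full-service price is its global rate, or else its facility plus professional rates; payers with only a fragment are dropped; and the median stands in for "representative". The median and interquartile range are a swapped-in choice, not necessarily the statistics the site uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median, quantiles
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rate:
    payer: str
    component: str  # hypothetical labels: "global", "facility", or "professional"
    amount: float


@dataclass(frozen=True)
class HospitalPrice:
    representative: float  # median full-service price across payers
    low: float             # lowest payer's full-service price
    high: float            # highest payer's full-service price


def full_service_by_payer(rates: Iterable[Rate]) -> dict[str, float]:
    """Combine each payer's rows into one full-service price, skipping fragments."""
    by_payer: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for r in rates:
        by_payer[r.payer][r.component].append(r.amount)
    out: dict[str, float] = {}
    for payer, comps in by_payer.items():
        if comps.get("global"):
            out[payer] = median(comps["global"])
        elif comps.get("facility") and comps.get("professional"):
            out[payer] = median(comps["facility"]) + median(comps["professional"])
        # Otherwise only one component exists: skip it rather than report
        # a misleadingly low price.
    return out


def hospital_price(rates: Iterable[Rate]) -> Optional[HospitalPrice]:
    """One representative price per hospital, plus the range across payers."""
    values = list(full_service_by_payer(rates).values())
    if not values:
        return None
    return HospitalPrice(median(values), min(values), max(values))


def national_summary(hospital_prices: list[float]) -> tuple[float, float, float]:
    """Return (25th percentile, median, 75th percentile) across hospitals."""
    if len(hospital_prices) < 2:
        raise ValueError("need at least two hospitals to describe a spread")
    # statistics.quantiles uses its default "exclusive" method here.
    q1, q2, q3 = quantiles(hospital_prices, n=4)
    return q1, q2, q3
```

With a global rate of $1,200 from one payer, $900 facility plus $150 professional from another, and a lone $67 professional rate from a third, this sketch reports $1,125 with a range of $1,050 to $1,200; the $67 fragment never reaches the page.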

Freshness & provenance

Price records are append-only; we keep the history and surface the current snapshot. Every hospital page cites the source file and the date it was ingested, and raw files are archived so prices can be re-derived if our processing changes.
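A minimal in-memory sketch of the append-only pattern, assuming each record carries the source file it came from and the date it was ingested. `PriceRecord`, `PriceLog`, and the archive paths in the example are hypothetical; a real deployment would more likely use a database table that only ever receives inserts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceRecord:
    hospital_id: str
    procedure: str
    price: float
    source_file: str   # hypothetical: URL or archive path of the originating MRF
    ingested_on: date


class PriceLog:
    """Append-only store: records are added but never updated or deleted."""

    def __init__(self) -> None:
        self._records: list[PriceRecord] = []

    def append(self, record: PriceRecord) -> None:
        self._records.append(record)

    def history(self, hospital_id: str, procedure: str) -> list[PriceRecord]:
        """Every record ever ingested for this hospital and procedure, oldest first."""
        matches = [r for r in self._records
                   if r.hospital_id == hospital_id and r.procedure == procedure]
        return sorted(matches, key=lambda r: r.ingested_on)  # stable for equal dates

    def current(self) -> dict[tuple[str, str], PriceRecord]:
        """The current snapshot: the latest record per (hospital, procedure)."""
        latest: dict[tuple[str, str], PriceRecord] = {}
        for r in self._records:
            key = (r.hospital_id, r.procedure)
            # On equal dates, the later-appended record wins.
            if key not in latest or r.ingested_on >= latest[key].ingested_on:
                latest[key] = r
        return latest
```

Because nothing is overwritten, the snapshot shown on a hospital page always points back to a specific `source_file` and `ingested_on` date, and older prices remain available for re-derivation.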

Limitations

Found an error? Submit a correction — every report is checked against the source file.