Why Legacy EDC Is Holding Back Outcomes Research
Electronic data capture was built for clinical trials, not for the messy, longitudinal reality of outcomes research. It’s time we stopped pretending otherwise.
Ben Smith
Founder & CEO, Principia Health Sciences
There’s a dirty secret in outcomes research: most teams are still running their studies on tools designed for Phase III clinical trials.
Electronic data capture systems — the EDCs that have dominated clinical research for two decades — were built for a specific job. They excel at structured, protocol-driven data collection where every field is predefined, every visit is scheduled, and every deviation is a violation. That’s exactly what you need for a randomized controlled trial.
It’s exactly wrong for outcomes research.
The mismatch nobody talks about
Outcomes research is fundamentally different from a clinical trial. The data is messy. Patients don’t follow protocols — they live their lives. Data arrives from EHRs, registries, patient-reported outcomes, labs, claims databases, and increasingly from wearables and genomic platforms. It arrives on its own schedule, in its own format, with its own quality issues.
Legacy EDC handles this the way a spreadsheet handles a job meant for a database: technically possible, increasingly painful, and eventually catastrophic.
Here’s what that looks like in practice:
- Data silos everywhere. Each data source gets its own capture mechanism, its own validation rules, its own export format. Integrating them is a manual, error-prone process that happens after the fact.
- Rigid schemas that fight reality. When your data model can’t accommodate a new biomarker or an unexpected clinical event, you’re stuck filing change requests and waiting for IT.
- No longitudinal awareness. EDC thinks in visits. Outcomes research thinks in patient journeys. The gap between those two mental models creates enormous friction.
- Cost multiplication. Every site, every study, every data source requires its own setup, its own validation, its own training. Costs scale at least linearly with complexity, when the marginal cost of each new site or source should be falling.
What outcomes research actually needs
The infrastructure for outcomes research needs to start from different assumptions:
Data arrives continuously, not in visits. The system needs to ingest data as it’s generated — from EHR feeds, lab results, PRO submissions, registry updates — and harmonize it in real time, not in a batch process six months after the study closes.
Schemas evolve instead of breaking. When a new data element becomes relevant mid-study (and it will), adding it should be a configuration change, not a development project.
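To make “a configuration change, not a development project” concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular product: the names FieldSpec, Schema, and with_field, and the nfl_pg_ml biomarker field used in the usage note, are all hypothetical. The idea is that the schema is plain data, each change produces a new version, and fields added mid-study are optional so that earlier records stay valid.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field in the study schema (hypothetical structure)."""
    name: str
    type_: type
    required: bool = False


@dataclass(frozen=True)
class Schema:
    """A versioned schema held as data rather than as code."""
    version: int
    fields: dict = field(default_factory=dict)  # field name -> FieldSpec

    def with_field(self, spec: FieldSpec) -> "Schema":
        """Return a new schema version with one extra field.

        The current version is left untouched. New fields must be
        optional, so records captured earlier still validate.
        """
        if spec.required:
            raise ValueError("fields added mid-study must be optional")
        if spec.name in self.fields:
            raise ValueError(f"field already exists: {spec.name}")
        return Schema(self.version + 1, {**self.fields, spec.name: spec})

    def validate(self, record: dict) -> list:
        """Return a list of problems; an empty list means the record is valid."""
        errors = []
        for spec in self.fields.values():
            if spec.name not in record:
                if spec.required:
                    errors.append(f"missing required field: {spec.name}")
                continue
            value: Any = record[spec.name]
            if not isinstance(value, spec.type_):
                errors.append(
                    f"{spec.name}: expected {spec.type_.__name__}, "
                    f"got {type(value).__name__}"
                )
        return errors
```

In use, a team would build version 1 with its original fields, then call something like v1.with_field(FieldSpec("nfl_pg_ml", float)) when the new biomarker becomes relevant. Records captured before the change still pass validate under version 2, which is the property that matters mid-study.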
Integration is the default, not the exception. Data from different sources needs to be linked at the patient level automatically, with transparent provenance and quality scoring.
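As a rough illustration of patient-level linkage with provenance and quality scoring, here is a small Python sketch. It makes two big simplifying assumptions: every source already shares a resolved patient identifier (real record linkage is much harder than a key match), and “quality” is reduced to field completeness. SourceRecord, completeness, and link_by_patient are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    """One observation batch from one source (hypothetical structure)."""
    patient_id: str    # assumed to be already resolved across sources
    source: str        # provenance label, e.g. "ehr", "lab", "pro"
    observed_on: date
    values: dict


def completeness(record: SourceRecord, expected: list) -> float:
    """Fraction of expected fields that are present and not None.

    With no expectations configured for a source, the score is 1.0.
    That is a design choice for this sketch, not a standard.
    """
    if not expected:
        return 1.0
    present = sum(1 for name in expected if record.values.get(name) is not None)
    return present / len(expected)


def link_by_patient(records: list, expected_fields: dict) -> dict:
    """Group records into one chronological timeline per patient.

    Each timeline entry keeps its source (provenance) and a quality score,
    so downstream analysis can see where every value came from.
    """
    timelines = defaultdict(list)
    for record in records:
        timelines[record.patient_id].append({
            "observed_on": record.observed_on,
            "source": record.source,
            "values": record.values,
            "quality": completeness(
                record, expected_fields.get(record.source, [])
            ),
        })
    for entries in timelines.values():
        entries.sort(key=lambda entry: entry["observed_on"])
    return dict(timelines)
```

The point of the design is that linkage happens as records arrive rather than in a reconciliation step after the fact, and that no value loses track of its source along the way.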
Analysis starts on day one. Researchers shouldn’t have to wait for a data lock to start exploring their data. Interim analyses, safety signals, and quality checks should be continuous.
The real cost isn’t the license fee
When I talk to research teams about their EDC costs, they usually start with the per-study license. That’s the smallest number in the equation.
The real costs are the ones that don’t show up on the vendor invoice: the data management team spending 60% of their time on data cleaning and reconciliation. The six-month delay between study completion and final dataset. The analyses that never happen because the data isn’t structured for the question. The studies that don’t get funded because the projected data management costs make the ROI impossible.
These are the costs that compound. And they’re the costs that purpose-built infrastructure eliminates.
A different approach
At Principia, we built CuRE because we lived this problem. We spent years watching research teams duct-tape EDC systems into shapes they were never designed for, and we decided to start from the actual requirements of outcomes research rather than from the legacy of clinical trials.
That means continuous data ingestion over FHIR and SMART on FHIR, flexible schemas, built-in harmonization to a shared OMOP model, and analytics that work from the moment the first data point arrives. It means AI-assisted capture and validation that’s attributable and human-in-the-loop, not a black box. And it means treating integration as a first-class concern — one governed record across the whole platform, not an afterthought bolted on between tools.
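For readers who have not seen harmonization up close, here is a deliberately small Python sketch of the general idea: taking one FHIR Observation and producing a row shaped like the OMOP MEASUREMENT table. This is not CuRE’s implementation. The function name, both lookup tables, and the concept id 900001 are hypothetical placeholders; a real pipeline would resolve concepts against the OMOP vocabulary tables and handle many more Observation shapes than a single valueQuantity.

```python
from datetime import datetime


def observation_to_measurement(obs: dict, concept_lookup: dict,
                               person_lookup: dict) -> dict:
    """Map a FHIR Observation (as a parsed JSON dict) to an OMOP-style row.

    concept_lookup: (coding system, code) -> concept id  (hypothetical table)
    person_lookup:  FHIR patient reference -> person_id  (hypothetical table)
    """
    if obs.get("resourceType") != "Observation":
        raise ValueError("expected a FHIR Observation resource")

    coding = obs["code"]["coding"][0]           # first coding only, for brevity
    when = datetime.fromisoformat(obs["effectiveDateTime"])
    quantity = obs.get("valueQuantity", {})

    return {
        "person_id": person_lookup[obs["subject"]["reference"]],
        # OMOP uses concept id 0 when no standard concept matches.
        "measurement_concept_id": concept_lookup.get(
            (coding["system"], coding["code"]), 0
        ),
        "measurement_date": when.date().isoformat(),
        "measurement_datetime": when.isoformat(),
        "value_as_number": quantity.get("value"),
        "unit_source_value": quantity.get("unit"),
        # Keep the original code so the mapping stays auditable.
        "measurement_source_value": coding["code"],
    }


# Example input: a hemoglobin A1c result coded in LOINC (4548-4).
example_observation = {
    "resourceType": "Observation",
    "status": "final",
    "subject": {"reference": "Patient/123"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4"}]},
    "effectiveDateTime": "2024-03-01T09:30:00+00:00",
    "valueQuantity": {"value": 7.2, "unit": "%"},
}

# Placeholder lookups; the ids are illustrative, not real vocabulary entries.
example_concepts = {("http://loinc.org", "4548-4"): 900001}
example_persons = {"Patient/123": 42}

example_row = observation_to_measurement(
    example_observation, example_concepts, example_persons
)
```

Keeping the source code next to the mapped concept is the small detail that makes the result attributable: anyone reviewing the row can see both what arrived and how it was interpreted.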
It’s not a better EDC. It’s what comes after EDC — for the research that matters most.