The Real-World Evidence Generation Gap
Everyone agrees RWE is the future. Almost nobody has the infrastructure to generate it systematically. Here's what's missing.
Ben Smith
Founder & CEO, Principia Health Sciences
The enthusiasm for real-world evidence has outpaced the infrastructure to produce it. That gap — between what regulators and payers are asking for and what most research organizations can actually deliver — is the defining challenge of outcomes research today.
Let’s be specific about what the gap looks like.
What regulators are asking for
The regulatory landscape has shifted dramatically. FDA’s Framework for its Real-World Evidence Program, the joint HMA-EMA Big Data Steering Group, and similar initiatives worldwide have established that evidence from real-world settings can support regulatory decisions, including drug approvals, label expansions, and post-market safety monitoring.
This isn’t theoretical. We’re seeing real-world evidence contribute to regulatory submissions with increasing frequency. The question is no longer whether RWE will play a role in drug development. The question is whether your organization can generate it.
What most organizations can deliver
Here’s the honest assessment: most research organizations, even well-funded ones, are generating real-world evidence through heroic manual effort rather than systematic infrastructure.
A typical RWE study today looks something like this:
1. Identify a research question that could be answered with existing clinical data
2. Negotiate data use agreements with contributing sites (3-12 months)
3. Extract data from disparate EHR systems (each one a custom project)
4. Clean, map, and harmonize the data into a common format (the most labor-intensive step)
5. Build an analytical dataset (another custom project)
6. Conduct the analysis
7. Discover data quality issues that require going back to step 3
Steps 2 through 5 typically consume 70-80% of the total project timeline and budget. The actual science — the part that generates evidence — gets squeezed into whatever time and resources remain.
This isn’t sustainable, and it’s not scalable. You can do one study this way. You can’t build a research program.
What systematic RWE infrastructure looks like
The organizations that are closing the generation gap share a few characteristics:
They invest in data pipelines, not data projects. Instead of building a custom extraction and harmonization process for each study, they build reusable infrastructure that continuously ingests, maps, and quality-scores data from their source systems. They lean on standards rather than reinventing schemas: FHIR and SMART on FHIR for connectivity, the OMOP Common Data Model as the analytical backbone so every source lines up, and the CDISC standards (CDASH for collection, SDTM for tabulation) for submission-grade outputs. The upfront cost is higher, but the cost per study drops dramatically.
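To make "pipelines, not projects" concrete, here is a minimal sketch of a reusable harmonization step. Everything in it is an assumption for illustration: the record types, the `CODE_MAP` table, the site names, and the concept ids are placeholders, not real OMOP concept ids or any particular vendor's schema. The point is only that the mapping lives in shared data, so adding a new site means adding rows rather than writing a new extraction project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    source: str       # contributing site or system (hypothetical names)
    patient_id: str
    local_code: str   # code exactly as it appears in the source system
    value: float


@dataclass(frozen=True)
class HarmonizedRecord:
    source: str
    patient_id: str
    concept_id: int   # placeholder standard concept id
    value: float


# Hypothetical mapping table: (source, local_code) -> standard concept id.
# The ids below are placeholders, not real OMOP concept ids.
CODE_MAP = {
    ("site_a", "GLU"): 9000001,
    ("site_b", "glucose_serum"): 9000001,
    ("site_a", "HBA1C"): 9000002,
}


def harmonize(records, code_map):
    """Split records into harmonized rows and unmapped rows that need review."""
    mapped, unmapped = [], []
    for r in records:
        concept_id = code_map.get((r.source, r.local_code))
        if concept_id is None:
            unmapped.append(r)  # surfaced for a human, never silently dropped
        else:
            mapped.append(
                HarmonizedRecord(r.source, r.patient_id, concept_id, r.value)
            )
    return mapped, unmapped


records = [
    SourceRecord("site_a", "p1", "GLU", 5.4),
    SourceRecord("site_b", "p2", "glucose_serum", 6.1),
    SourceRecord("site_b", "p3", "XYZ", 1.0),
]
mapped, unmapped = harmonize(records, CODE_MAP)
```

Two differently coded glucose results land on the same concept, and the unrecognized code is held back for review instead of being discovered during analysis.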
They treat data quality as a continuous process. Rather than discovering quality issues during analysis, they monitor data quality metrics in real time and address issues at the source. This is a fundamentally different mindset from the “clean it when you need it” approach that most organizations default to.
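As a rough illustration of what continuous monitoring can mean in practice, the sketch below scores each incoming batch for completeness and plausibility and lists the checks that fall outside tolerance. The field names, plausible ranges, and thresholds are assumptions chosen for the example, not recommended clinical values.

```python
def quality_metrics(rows, required_fields, plausible_ranges):
    """Compute per-field completeness and out-of-range rates for one batch."""
    total = len(rows)
    if total == 0:
        return {"rows": 0, "completeness": {}, "out_of_range": {}}

    completeness = {
        field: sum(1 for r in rows if r.get(field) is not None) / total
        for field in required_fields
    }

    out_of_range = {}
    for field, (low, high) in plausible_ranges.items():
        values = [r[field] for r in rows if r.get(field) is not None]
        bad = sum(1 for v in values if not low <= v <= high)
        out_of_range[field] = bad / len(values) if values else 0.0

    return {"rows": total, "completeness": completeness, "out_of_range": out_of_range}


def failing_checks(metrics, min_completeness=0.95, max_out_of_range=0.01):
    """Return the names of checks that breach the (assumed) thresholds."""
    failures = []
    for field, rate in metrics["completeness"].items():
        if rate < min_completeness:
            failures.append(f"completeness:{field}")
    for field, rate in metrics["out_of_range"].items():
        if rate > max_out_of_range:
            failures.append(f"out_of_range:{field}")
    return failures


batch = [
    {"patient_id": "p1", "heart_rate": 72},
    {"patient_id": "p2", "heart_rate": 400},   # implausible value
    {"patient_id": "p3", "heart_rate": None},  # missing value
    {"patient_id": "p4", "heart_rate": 65},
]
metrics = quality_metrics(
    batch,
    required_fields=["patient_id", "heart_rate"],
    plausible_ranges={"heart_rate": (20, 250)},
)
failures = failing_checks(metrics)
```

Run on every load, a check like this turns a quality problem into a same-day message to the contributing site rather than a surprise months into the analysis.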
They build for the portfolio, not the study. Individual studies have specific data requirements. But the underlying infrastructure — consent management, data governance, harmonization rules, analytical tools — should serve the entire research portfolio. Every study should make the next study easier.
They engage sites as partners. Contributing sites aren’t just data sources. They’re research partners who need tools that fit their workflows, feedback on data quality, and visibility into how their contributions are being used. Infrastructure that treats sites as passive data sources will always struggle with engagement and quality.
They put AI to work where it earns its keep. The labor-intensive steps — extracting structure from clinical notes, classifying columns, mapping local codes to a common vocabulary, flagging validation issues — are exactly where embedded, attributable AI pays off. Not generative hype, but practical assistance that keeps a human in the loop: confidence scores, visible sources, and clear correction paths. That’s how harmonization stops being the step that eats the budget.
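One way to picture "confidence scores, visible sources, and clear correction paths" is a simple review queue. The sketch below is an assumption-heavy illustration: the `MappingSuggestion` fields, the 0.9 threshold, and the concept names are all hypothetical, and no particular model or product is implied. Every suggestion still reaches a person; confidence only decides how much attention it gets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappingSuggestion:
    local_code: str
    suggested_concept: str
    confidence: float                  # 0.0 to 1.0, as reported by the model
    evidence: str                      # source text shown to the reviewer
    reviewer_decision: Optional[str] = None


def triage(suggestions, threshold=0.9):
    """Split suggestions into a quick-confirm queue and a full-review queue."""
    quick_confirm = [s for s in suggestions if s.confidence >= threshold]
    # Least confident first, so reviewers start where the model is weakest.
    full_review = sorted(
        (s for s in suggestions if s.confidence < threshold),
        key=lambda s: s.confidence,
    )
    return quick_confirm, full_review


def record_decision(suggestion, corrected_concept=None):
    """Store the reviewer's choice: confirm the suggestion or override it."""
    suggestion.reviewer_decision = corrected_concept or suggestion.suggested_concept
    return suggestion


suggestions = [
    MappingSuggestion("GLU", "glucose_serum", 0.97, "GLU mg/dL, chem panel"),
    MappingSuggestion("K+", "potassium_serum", 0.55, "K+ (site lab feed)"),
    MappingSuggestion("HGB", "hemoglobin", 0.80, "HGB g/dL, CBC"),
]
quick_confirm, full_review = triage(suggestions)

# The reviewer confirms one suggestion and corrects another.
confirmed = record_decision(quick_confirm[0])
corrected = record_decision(full_review[0], corrected_concept="potassium_plasma")
```

Keeping the evidence and the reviewer's decision on the same record is what makes the assistance attributable: anyone auditing the mapping later can see what the model proposed, why, and who approved or changed it.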
The opportunity
The generation gap is real, but it’s also an opportunity. Organizations that build systematic RWE infrastructure now will have a compound advantage as demand for real-world evidence continues to grow. Each study improves the infrastructure. Each improvement reduces the cost and timeline for the next study.
The organizations that wait — that continue to approach RWE as a series of one-off projects — will find themselves increasingly unable to compete for the research programs, partnerships, and funding that require systematic evidence generation.
The question isn’t whether to invest in RWE infrastructure. It’s whether you can afford not to.