AI systems are making high-stakes medication safety decisions in hospitals. No independent, clinically rigorous evaluation measures whether they are safe. Posognos publishes the first continuously updated safety benchmark for clinical AI, built on the standards healthcare already trusts.
AI is replacing the clinical decision support systems hospitals have relied on for decades. These new models make medication safety recommendations that directly affect patient outcomes, but until now, no independent organization has measured whether they actually work.
Hospital systems still miss roughly one in three harmful medication orders. AI is replacing legacy rules engines, but no one is verifying whether it performs better.
Clinicians override up to 96% of drug safety alerts because most are irrelevant. AI that cannot distinguish critical alerts from routine ones makes the problem worse.
Annual U.S. exposure from preventable adverse drug events. Independent evaluation is the missing infrastructure to reduce that number.
PsiBench translates the clinical safety standards hospitals already trust into automated evaluation scenarios, runs them against AI models independently, and publishes the results.
Domain experts translate established medication safety standards into automated benchmark scenarios. Every scenario is validated by named clinical authorities and grounded in standards the industry already reports against.
Posognos evaluates clinical AI models through EHR test environments and API endpoints using synthetic patient scenarios. No protected health information is accessed, generated, or stored.
Aggregate scores are published on the PsiBench scorecard, freely available to the public. Detailed failure analysis, expert annotations, and remediation guidance are available to subscribers.
The scorecard measures what matters to the people who carry liability when AI gets it wrong: Does it catch the orders that could harm a patient? How does it compare to alternatives? Does performance hold across updates?
| AI Model | Contraindication Detection | Alert Specificity | Override Appropriateness | Overall Score |
|---|---|---|---|---|
| Model A | 87 | 72 | 64 | 74 |
| Model B | 69 | 81 | 78 | 76 |
| Model C | 58 | 55 | 41 | 51 |
| Legacy Rules Engine | 44 | 22 | 30 | 32 |
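
The overall scores in the sample scorecard are consistent with an unweighted mean of the three component scores, rounded to the nearest whole number. The sketch below reproduces them under that assumption. The aggregation rule, the `SCORECARD` structure, and the `overall_score` helper are illustrative and are not a published PsiBench formula.

```python
# Component scores copied from the sample scorecard:
# (Contraindication Detection, Alert Specificity, Override Appropriateness)
SCORECARD = {
    "Model A": (87, 72, 64),
    "Model B": (69, 81, 78),
    "Model C": (58, 55, 41),
    "Legacy Rules Engine": (44, 22, 30),
}


def overall_score(components):
    """Unweighted mean of the component scores, rounded to an integer.

    Assumption: equal weighting. The source does not state how the
    overall score is actually derived.
    """
    return round(sum(components) / len(components))


if __name__ == "__main__":
    for name, components in SCORECARD.items():
        print(f"{name}: {overall_score(components)}")
```

If the real scorecard weights some dimensions more heavily, for example contraindication detection, only the body of `overall_score` would need to change.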
Whether you build clinical AI, deploy it, or set the standards it should meet, PsiBench gives you the independent safety data you need to make better decisions.
Hospital systems are starting to require independent safety validation for clinical AI. PsiBench provides third-party evaluation against the standards hospitals already trust, so you can demonstrate safety performance with data, not claims.
Clinical AI vendors are making safety claims you cannot independently verify. PsiBench provides the evaluation layer that lets you compare products against the standards your organization already uses, without building the testing infrastructure yourself.
We do not invent safety metrics. We operationalize the clinical safety standards the industry already uses, so evaluation results are immediately meaningful to the organizations that rely on them.
Posognos' first benchmark is built on the medication safety evaluation standard already adopted by 2,000+ U.S. hospitals.
Automated, continuous evaluation replaces weeks of manual testing. Synthetic-first methodology. Zero PHI. Privacy by design.
Credible safety evaluation requires independence from the organizations being evaluated, deep clinical expertise, and access to the standards the industry already trusts. Posognos was built on all three.
Posognos' founding experts co-created the national medication safety evaluation used by 2,000+ U.S. hospitals. They bring decades of domain authority and direct relationships with the standards bodies that define clinical safety.
Every PsiBench scenario is built and peer-reviewed by named domain experts with verifiable credentials. Their names appear on the evaluations they validate, because accountability is how trust is built.
Posognos is not funded by EHR vendors or AI labs. We do not consult for the entities we evaluate. Evaluation results are published independently. The integrity of the benchmark depends on it.
Posognos' credibility comes from who builds and validates the evaluation. The team includes the original authors of the national medication safety standard, clinical informatics leaders, and the engineers who built the clinical decision support platforms used in thousands of hospitals.
Co-created the CPOE safety evaluation adopted by 2,000+ U.S. hospitals. Built one of the first computerized physician order entry systems. National patient safety informatics leader.
Medication safety pioneer. Founding CEO of TheraDoc (acquired). Creator of one of the first real-time clinical decision support systems. Former chief strategy officer at Pascal Metrics.
Former EIR at the Allen Institute for AI (AI2); led product at PierianDx (45+ clinical genomics labs) and Tute Genomics through its acquisition.
SVP Product at Vizient; previously built and scaled the clinical quality platforms at TheraDoc and Safe & Reliable Healthcare through two acquisitions.
Architected TheraDoc's clinical decision-support alerting engine; co-founded Safe & Reliable Healthcare (acquired by Vizient); re-architected the Knome genomics platform at Tute Genomics.
Whether you build clinical AI, evaluate it for procurement, or hold the clinical expertise that should inform how it is measured, we want to hear from you.