
Why LLMs demand a rethink of healthcare AI governance

The introduction of Large Language Models (LLMs) and Generative AI to healthcare has created a new set of governance challenges that have rendered traditional approaches inadequate.

Justin Norden, left, and Kedar Mate

Most healthcare organizations today have developed governance frameworks for new healthcare technologies (e.g., EMRs and digital health tools), typically relying on periodic committee reviews and manual oversight.

However, Generative AI has driven three fundamental shifts beyond traditional predictive models, shifts that demand a transformation in how healthcare organizations govern AI:

  • Unprecedented impact and demand-driven scale of deployment
  • The need for new types of metrics and ongoing, continuous monitoring
  • A dramatically accelerated rate of change in models and use patterns

As a result of these rapid shifts, healthcare organizations must evolve toward dynamic, real-time governance and risk-based management of AI tools. Getting there will require technical infrastructure that orchestrates policies across multiple AI applications and provides real-time monitoring, automated risk detection, and granular controls that can adapt to rapidly changing conditions.
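What might that orchestration layer look like in practice? Below is a minimal sketch; the tool names, risk tiers, and monitoring intervals are illustrative assumptions, not a prescribed standard.

```python
from dataclasses import dataclass
from enum import Enum

class RiskTier(Enum):
    LOW = "low"            # e.g., scheduling, administrative drafts
    MODERATE = "moderate"  # e.g., routine clinical documentation
    HIGH = "high"          # e.g., decision support that touches treatment

@dataclass
class AIToolPolicy:
    tool_name: str
    risk_tier: RiskTier
    requires_human_review: bool
    monitoring_interval_minutes: int  # how often automated checks run

# Hypothetical policy registry spanning multiple AI applications.
POLICIES = {
    "documentation-assistant": AIToolPolicy("documentation-assistant", RiskTier.MODERATE, False, 60),
    "icu-decision-support": AIToolPolicy("icu-decision-support", RiskTier.HIGH, True, 5),
    "patient-messaging": AIToolPolicy("patient-messaging", RiskTier.MODERATE, True, 30),
}

def controls_for(tool_name: str) -> AIToolPolicy:
    """Look up the active controls for a given AI application."""
    if tool_name not in POLICIES:
        raise KeyError(f"No governance policy registered for {tool_name}")
    return POLICIES[tool_name]

print(controls_for("icu-decision-support").monitoring_interval_minutes)  # 5
```

The point is less the code than the shape: one registry, many applications, each with its own controls and its own monitoring cadence.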

For healthcare organizations, failure to embrace this new dynamic approach to AI governance could bring heavy costs: either unnecessarily restricting the use of these powerful tools or, worse, putting AI tools into patient care without adequate safety measures in place ahead of time.

Why LLMs break traditional governance models

To illustrate how Gen AI tools differ from technologies governed under traditional models, consider a comparison between a traditional stroke detection algorithm and a modern LLM-based clinical documentation assistant. The stroke model delivers binary results (i.e., “yes or no” predictions) for a narrowly defined clinical task. The documentation assistant, by contrast, generates complex narrative outputs (i.e., free-text summaries) that require new approaches to evaluation, introducing greater risk and complexity.

Unlike traditional ML tools, where performance may change slowly through data drift, Gen AI tools are constantly changing as companies update the foundation models behind the scenes and users rapidly evolve their usage patterns. Additionally, LLM-based tools can be used flexibly, meaning a single tool might support documentation, decision support, and patient communication – all with different safety profiles and governance needs.

Below is a deeper look at the three fundamental ways LLMs are shifting governance requirements:

Impact and scale

Traditional predictive AI in healthcare operated within narrowly defined contexts. These models were designed for specific tasks, user groups, and points in the clinical workflow, delivering critical but isolated decisions without broader systemic influence. Their integration was not easy, but it was straightforward, as these technologies typically affected a fairly small number of users (ER clinicians, as in the stroke detection example), with clear opportunities for governance to double-check wayward technologies.


In contrast, LLM tools fundamentally reshape clinical workflows by offering broad, flexible capabilities across the entire health and care workforce. This versatility introduces significant governance challenges, as organizations must ensure that the tools are used appropriately in numerous different contexts and safeguard against unintended influences on physician behavior and documentation practices.

Governance complexity is further intensified by the rapid and widespread pressure to adopt multiple LLM-based applications at once. Unlike traditional technology implementations that followed deliberate, phased rollouts, healthcare organizations now face the need to evaluate and integrate various LLM-based AI tools in parallel across documentation, clinical decision support, revenue cycle, and patient communication.

These interconnected applications often interact with each other and legacy systems, adding layers of complexity that traditional governance frameworks were not designed to handle. Consequently, healthcare organizations must quickly adapt their oversight strategies to manage an evolving and expansive AI ecosystem, balancing innovation with patient safety and clinical integrity.

Metrics and monitoring

Traditional technology governance in healthcare relied on simple, direct metrics like positive predictive value and negative predictive value to evaluate model performance, particularly for deterministic models such as stroke detection algorithms. These models produced consistent, binary outputs that were easy to monitor through regular accuracy reviews and edge case assessments. When performance issues arose, they were typically easy to detect using standard metrics.

Gen AI, on the other hand, often produces a non-categorical response (free text rather than a binary result, or even a small set of categories), so assessing the “accuracy” of the AI is more subjective and much more challenging. Worse, because the errors it introduces can confidently imitate accurate clinical or administrative information, they may go unnoticed for extended periods, making traditional validation difficult.

Because of these complexities, monitoring Gen AI applications in clinical environments requires a shift toward frameworks more akin to those used in other safety-critical fields like autonomous vehicles. Healthcare organizations must implement both leading indicators, such as shifts in note structure or medication list anomalies, and lagging indicators, like tracking actual medication errors or misdiagnoses, to catch problems early and assess long-term impacts.
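As a rough illustration of a leading indicator, the sketch below flags drift in average note length between a pre-deployment baseline and recent AI-drafted notes; the signal and the threshold are simplifying assumptions, not a recommended metric.

```python
import statistics

def note_length_drift(baseline_lengths: list[int],
                      recent_lengths: list[int],
                      threshold: float = 0.25) -> bool:
    """Leading indicator: flag when mean note length shifts by more than
    `threshold` (25% by default) relative to the baseline period."""
    baseline_mean = statistics.mean(baseline_lengths)
    recent_mean = statistics.mean(recent_lengths)
    return abs(recent_mean - baseline_mean) / baseline_mean > threshold

# Compare last week's AI-drafted notes (word counts) to a pre-deployment baseline.
if note_length_drift([820, 790, 845, 810], [1190, 1230, 1175, 1260]):
    print("Leading indicator tripped: route a sample of notes for human audit")
```

A production system would track many such signals (section structure, medication-list anomalies) and pair them with lagging indicators such as confirmed medication errors.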


Ultimately, Gen AI governance will require a more sophisticated, dynamic, technology-enabled monitoring system than what was needed for traditional health technology deployments.

Because of its accessibility and ease of use, Gen AI also needs to be monitored for unauthorized use. “Shadow AI,” the unsanctioned or untracked use of AI within an organization without the knowledge or approval of IT or security departments, is rampant in healthcare. It raises the risk of data security breaches, PHI leakage, compliance violations, and other misuse, along with the reputational damage that follows.
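One common starting point, sketched below under heavy assumptions, is scanning outbound proxy logs for traffic to known public AI services that are not on an approved list; the domains and log format here are illustrative, not a vetted blocklist.

```python
# Illustrative domains only; a real deployment would maintain and update
# its own lists through IT and security teams.
KNOWN_AI_DOMAINS = {"chat.openai.com", "claude.ai", "gemini.google.com"}
APPROVED_DOMAINS = {"approved-llm.internal.example.org"}

def flag_shadow_ai(proxy_log_lines: list[str]) -> list[str]:
    """Return log lines referencing known AI services outside the approved
    set - candidates for follow-up and training, not proof of misuse."""
    unapproved = KNOWN_AI_DOMAINS - APPROVED_DOMAINS
    return [line for line in proxy_log_lines
            if any(domain in line for domain in unapproved)]

logs = [
    "2025-01-15T09:12:03 user42 GET https://chat.openai.com/ ...",
    "2025-01-15T09:13:11 user17 GET https://approved-llm.internal.example.org/ ...",
]
for hit in flag_shadow_ai(logs):
    print("Possible shadow AI use:", hit)
```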

Rate of change

Models are advancing rapidly as the result of unprecedented investment. The best-performing model on public leaderboards changes constantly among OpenAI, Anthropic, Google, and others, and new model versions arrive every few months. Users expect “the best” models, so organizations must constantly adapt to the state of the art or people will drift off approved tools into “shadow AI.”

Just as the models change, so do the methods for getting the most out of them: prompting techniques, retrieval-augmented generation (RAG) setups, reasoning models, and more. Each of these changes the surface area and the risk parameters for how these models must be managed.

The workforce is still adapting to the capabilities of the models and learning new ways to interact with these systems. It’s been said that even if the models stopped improving, it would take a decade to learn and adapt to the power of these new tools. This high-speed evolution requires constant updates to risk profiles and governance methods.

A new model of governance

In the face of these rapidly evolving conditions, healthcare organizations have an obligation to measure, monitor, and govern Gen AI technologies.

One option for healthcare organizations is to rely on vendors to monitor their own performance and safety profiles, but this introduces an obvious conflict of interest. Instead, health system leaders and their boards will have to design internal monitoring, security, and governance tools tailored to their workflows, risk tolerances, and regulatory requirements, so they can assure themselves that these powerful tools are being used safely and securely.


Such tools must capture all LLM use across the organization, especially the unsanctioned “shadow use” of AI. When regular audit trails surface improper usage, the organization should respond with targeted training for its employees.

Effective LLM governance requires a shift from static oversight to a more agile, technology-enabled approach. At the core of this model is real-time monitoring that continuously cross-references LLM-generated outputs with electronic health record (EHR) source data.

This ensures early detection of problems such as hallucinations, clinical inaccuracies, or workflow disruptions – risks that might otherwise go unnoticed under traditional evaluation methods. Of course, not all AI needs to be governed equally; critical clinical decisions must be monitored more closely than administrative functions.
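Here is a minimal sketch of one such cross-reference, using simple string matching for clarity; real systems would rely on clinical NLP and coded terminologies such as RxNorm, and the medication watchlist below is a hypothetical example.

```python
# Hypothetical watchlist of medication terms to scan for in generated notes.
SURVEILLED_MEDS = {"warfarin", "insulin", "metformin"}

def unverified_medications(note_text: str, ehr_med_list: set[str]) -> set[str]:
    """Return medications the note mentions that the EHR does not contain -
    candidate hallucinations to surface for human review."""
    text = note_text.lower()
    mentioned = {med for med in ehr_med_list | SURVEILLED_MEDS if med in text}
    return mentioned - ehr_med_list

note = "Patient continues metformin and was started on warfarin today."
ehr_meds = {"metformin"}
print(unverified_medications(note, ehr_meds))  # {'warfarin'} -> flag for review
```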

In addition to monitoring, healthcare organizations must implement dynamic risk management that adapts to clinical context. Rather than relying on binary “yes or no” decisions, modern governance systems can automatically calibrate controls – for instance, routing high-risk ICU documentation for human review while allowing automated processing for routine visits. Governance committees also need to change: rather than convening only for periodic reviews, they should continuously measure LLM usage data against organizational needs.
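To make the idea of calibrated controls concrete, here is a minimal routing sketch; the care settings, document types, and rules are assumptions chosen for illustration, and actual tiers would come from each organization’s own risk assessment.

```python
def route_output(care_setting: str, document_type: str) -> str:
    """Decide how an LLM output is handled before it reaches the chart."""
    if care_setting == "icu" or document_type == "discharge_summary":
        return "human_review"      # high stakes: a clinician signs off first
    if document_type in {"progress_note", "visit_summary"}:
        return "automated_checks"  # routine: automated QA plus spot audits
    return "human_review"          # default to the safer path when unsure

print(route_output("icu", "progress_note"))     # human_review
print(route_output("clinic", "visit_summary"))  # automated_checks
```

Note the design choice of defaulting to the stricter path: when the context is unknown, the system errs toward human review rather than automation.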

Successfully governing LLMs requires both technical sophistication and human judgment. Automation can enhance governance, but it cannot replace the need for clinicians and administrators to stay vigilant and adaptive.

Healthcare organizations that embrace this shift from static oversight to dynamic management will be better equipped to unlock the promise of LLMs while protecting patient safety and clinical integrity.

Dr. Justin Norden is co-founder and CEO of Qualified Health, a digital health company. Kedar Mate, MD is co-founder and chief medical officer of Qualified Health.


