Enterprise Evaluation of Oracle Clinical AI Agent
Methodological Audit and Baseline Reporting Dashboard
Executive Summary
The Defense Health Agency (DHA) has initiated deployment of the Oracle Clinical AI Agent across its network of Military Treatment Facilities (MTFs). This report provides a methodological stress test and biostatistical audit of the evaluation protocol, with emphasis on operational safety within a pragmatic stepped-wedge cluster trial.
Table 0: Baseline Demographics Stratified by Wave
| Metric | Phase 1 (N=345) | Phase 2 (N=342) | Phase 3 (N=346) |
|---|---|---|---|
| Mean Age (Years) | 42.1 | 41.8 | 42.5 |
| % High Opportunity Phenotype | 34.2% | 33.8% | 35.1% |
| % High Specialty Burden | 28.5% | 29.1% | 28.9% |
Provider Demographics & Benchmarks
Unique Providers per DMIS ID (Color-coded by Rollout Phase)
Unique Providers per Clinical Specialty (Color-coded by Documentation Burden)
Unique Providers by Cerner Opportunity Phenotype
Unique Providers by Specialty Documentation Burden
Methods
Inclusion and Exclusion Criteria
Inclusion Criteria: Providers must have active EHR (MHS GENESIS) accounts and consistently recorded patient encounters (minimum of 10 ambulatory/outpatient encounters per month) during the designated baseline and post-intervention periods. The study evaluates two distinct cohorts:
- Intervention Cohort: Providers who gain access to and subsequently utilize the Oracle Clinical AI Agent at least once during their facility’s designated deployment phase.
- Control/Comparison Cohort (Non-Users): Providers who meet all patient volume criteria but never utilize the ambient listening tool during the prospective observation period (objectively identified via 0% Adoption Percentage in the audit logs).
Exclusion Criteria: Providers lacking sufficient active clinical days or generating inadequate patient volume to yield stable per-patient time estimates.
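The cohort rules above can be sketched in code. This is a minimal illustration, not the study's actual pipeline: the `ProviderRecord` fields and the `assign_cohort` helper are hypothetical names, and the sketch assumes the monthly threshold must hold in every observed month.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_ENCOUNTERS_PER_MONTH = 10  # volume threshold stated in the inclusion criteria


@dataclass
class ProviderRecord:
    """Hypothetical per-provider summary; field names are illustrative only."""
    provider_id: str
    monthly_encounters: List[int]   # ambulatory/outpatient encounters per study month
    has_active_ehr_account: bool
    tool_uses_in_phase: int         # tool sessions during the facility's deployment phase
    tool_adoption_percent: float    # from audit logs; 0.0 means the tool was never used


def assign_cohort(p: ProviderRecord) -> Optional[str]:
    """Return 'intervention', 'control', or None when the provider is excluded."""
    if not p.has_active_ehr_account or not p.monthly_encounters:
        return None
    # Assumption: the threshold applies to every month, not to the monthly average.
    if min(p.monthly_encounters) < MIN_ENCOUNTERS_PER_MONTH:
        return None
    if p.tool_uses_in_phase >= 1:
        return "intervention"
    if p.tool_adoption_percent == 0.0:
        return "control"
    # Zero recorded uses but non-zero adoption: inconsistent logs, left unclassified.
    return None
```

Returning `None` for inconsistent audit-log records keeps ambiguous providers out of both cohorts rather than guessing their exposure status.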
CONSORT Flow Diagram
The following diagram illustrates the provider inclusion, exclusion, and cohort allocation process according to the pre-specified clinical volume and utilization criteria.

The evaluation of the Oracle Clinical AI Agent was designed as a combined retrospective and prospective, observational, 1:1:1 Pragmatic Stepped-Wedge Cluster Trial. Deployment occurred naturally across three waves of Military Treatment Facilities (MTFs) starting in February 2026. The statistical analysis parameterized intervention exposure time explicitly in order to estimate the Time-Averaged Treatment Effect (TATE) across the learning curve. For continuous outcomes we used a hub-level (DMIS) fixed-effects within-estimator, which absorbs time-invariant differences between hubs and reduces sensitivity to the severe 9-19-107 wave imbalance; it does not remove time-varying unmeasured confounding. Secondary count outcomes (such as charting compliance and safety deficiencies) were modeled with Generalized Estimating Equations (GEE) using a Negative Binomial family with a log link, falling back to a Poisson model with cluster-robust standard errors where the Negative Binomial fit was numerically unstable on sparse, zero-heavy counts.
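To make the time-averaged estimand concrete, the sketch below averages exposure-period-specific effects with equal weight per period. It is an illustration under stated assumptions: the function name and the example effect values are hypothetical, and the report does not specify the weighting actually used.

```python
from statistics import fmean
from typing import Sequence


def time_averaged_effect(effects_by_exposure: Sequence[float]) -> float:
    """Equal-weight average of the effects estimated at each exposure period."""
    if not effects_by_exposure:
        raise ValueError("need at least one exposure period")
    return fmean(effects_by_exposure)


# Hypothetical learning curve: the effect deepens as providers gain exposure.
example_effects = [-0.1, -0.3, -0.5, -0.6]
example_tate = time_averaged_effect(example_effects)  # approximately -0.375
```

Averaging over the learning curve yields a single summary that is less extreme than the effect at the longest exposure whenever the benefit grows with experience.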
Statistical Equations
Primary Outcome: Documentation Efficiency (Fixed-Effects Linear Regression)
- Equation: \[ \ln(Y_{ijt}) = \beta_0 + \mu_j + \lambda_t + \beta_1(\text{Exposure}_{ijt}) + \gamma X_{ijt} + \epsilon_{ijt} \]
- \(Y_{ijt}\): Adjusted Documentation Time for provider \(i\) in hospital \(j\) at week \(t\).
- \(\beta_0\): Overall intercept, i.e. the average baseline documentation time on the log scale.
- \(\mu_j\): Fixed effect for the specific hospital \(j\), controlling for time-invariant baseline speed differences.
- \(\lambda_t\): Categorical time indicators to filter out background secular trends (e.g., holidays, system updates).
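- \(\beta_1\): The intervention effect per unit of exposure, summarized as the Time-Averaged Treatment Effect. With the log-transformed outcome as written, \(\beta_1\) is a change in log documentation time, so \(\exp(\beta_1) - 1\) is the proportional change; estimates reported in minutes require back-transformation to the original scale.
- \(X_{ijt}\): Provider-level covariates such as clinical specialty and patient volume, with coefficient vector \(\gamma\); \(\epsilon_{ijt}\) is the residual error.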
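The within-estimator idea behind the hospital fixed effect \(\mu_j\) can be sketched as follows. This is a deliberately reduced illustration: it omits the time indicators \(\lambda_t\) and the covariates \(X_{ijt}\), and the `within_estimator` helper and its row layout are hypothetical rather than the analysis code used for this report.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float, float]  # (hospital_id, exposure, documentation_minutes)


def within_estimator(rows: List[Row]) -> float:
    """Slope of ln(minutes) on exposure after subtracting each hospital's means."""
    by_hospital: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for hospital, exposure, minutes in rows:
        by_hospital[hospital].append((exposure, math.log(minutes)))

    sxy = 0.0  # within-hospital cross-product of exposure and log outcome
    sxx = 0.0  # within-hospital sum of squares of exposure
    for obs in by_hospital.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0.0:
        raise ValueError("exposure does not vary within any hospital")
    return sxy / sxx
```

Because each hospital's own mean is subtracted before the slope is computed, any constant difference in baseline speed between hospitals drops out of the estimate.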
Secondary Outcome: Clinical Safety and Deficiencies (Negative Binomial GEE)
- Equation: \[ \ln(E[\text{Counts}_{ijt}]) = \beta_0 + \mu_j + \lambda_t + \beta_1(\text{Intervention}_{ijt}) + \ln(\text{Offset}_{ijt}) \]
- \(E[\text{Counts}_{ijt}]\): The expected count of deficient records (or errors) for provider \(i\) in MTF \(j\) at week \(t\).
- \(\beta_0\): The overall baseline log-count of errors.
- \(\mu_j\): Fixed effect identifying baseline error rate differences between hospitals.
- \(\beta_1\): The intervention coefficient on the log scale; \(\exp(\beta_1)\) is the Incidence Rate Ratio (IRR), which quantifies whether AI exposure changes the rate of deficiencies.
- \(\text{Offset}_{ijt}\): Volume adjustment so that the expected count scales in proportion to the number of patients a provider sees.
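The relationship between the log-scale coefficient, the IRR, and the offset can be illustrated with a short sketch. The function names are hypothetical, and the Wald interval with z = 1.96 is an assumption; the report does not state how its confidence intervals were computed.

```python
import math
from typing import Tuple


def irr_with_ci(beta1: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Exponentiate a log-scale coefficient and its Wald limits (assumed method)."""
    return (math.exp(beta1), math.exp(beta1 - z * se), math.exp(beta1 + z * se))


def expected_count(beta0: float, mu_j: float, lambda_t: float, beta1: float,
                   intervention: int, patients_seen: float) -> float:
    """Expected deficiency count under the log-link model with a volume offset."""
    log_mean = (beta0 + mu_j + lambda_t
                + beta1 * intervention
                + math.log(patients_seen))  # offset enters with a coefficient fixed at 1
    return math.exp(log_mean)
```

Because the offset coefficient is fixed at 1, doubling `patients_seen` doubles the expected count while leaving the IRR unchanged.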
Engagement Predictors (Multivariate Logistic Regression)
- Equation: \[ \ln\left(\frac{P_i}{1-P_i}\right) = \beta_0 + \beta_1(X_{1i}) + \beta_2(X_{2i}) + \dots \]
- \(P_i\): The probability that provider \(i\) becomes a “High Engagement” user.
- \(X_{ki}\): Baseline provider predictors such as age, clinical setting, and pre-intervention documentation burden.
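As a sketch of how the log-odds equation maps to a probability and to odds ratios, consider the following. The helper names are hypothetical and any coefficient values passed in are for illustration only.

```python
import math
from typing import Sequence


def engagement_probability(beta0: float, betas: Sequence[float],
                           xs: Sequence[float]) -> float:
    """Invert the logit: P = 1 / (1 + exp(-(beta0 + sum of beta_k * x_k)))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    log_odds = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(beta: float, units: float = 1.0) -> float:
    """Odds ratio for a change of `units` in a predictor with coefficient beta."""
    return math.exp(beta * units)
```

Odds ratios compound multiplicatively: a predictor with an odds ratio of 0.85 per unit implies 0.85 squared, about 0.72, for a two-unit change.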
Oracle Lights On Network & Cerner Advance Metric Definitions
- Opportunity Display: A proprietary benchmark that places a physician’s Adjusted Time in the EMR into a tertile relative to national peers in the same specialty (High/Red = Bottom 1/3, Moderate/Yellow = Middle 1/3, Low/Green = Top 1/3).
- Actual Time in EMR: All active pixel-level interaction time captured within the EHR interface (PowerChart, FirstNet) utilizing continuous Real-Time Measurement System (RTMS) timers.
- Adjusted Time in EMR: A standardized metric mathematically adjusting the Actual Time based on the provider’s overall EHR adoption percentage, normalizing the metric across distinct workflow styles.
- Adoption Percent: The non-weighted average of a provider’s % Electronic Documentation Authored and % Computerized Provider Order Entry (CPOE).
- Patient Seen: Unique note signatures on a unique patient encounter on a unique day.
- % After Hours: The calculated percentage of total active time spent in the EMR outside of core scheduled facility hours (defined strictly as 6:00 AM to 6:00 PM local time).
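Two of the definitions above reduce to simple arithmetic, sketched here. The function names and the session layout are hypothetical, and the sketch assumes each session's active minutes are attributed entirely by its local start time, which the vendor metric may not do.

```python
from datetime import datetime
from typing import List, Tuple

CORE_START_HOUR = 6   # 6:00 AM local time
CORE_END_HOUR = 18    # 6:00 PM local time


def adoption_percent(pct_electronic_docs: float, pct_cpoe: float) -> float:
    """Non-weighted average of the two adoption components."""
    return (pct_electronic_docs + pct_cpoe) / 2.0


def pct_after_hours(sessions: List[Tuple[datetime, float]]) -> float:
    """Share of active minutes in sessions that start outside core hours.

    sessions: (local start time, active minutes) pairs.
    """
    total = sum(minutes for _, minutes in sessions)
    if total == 0:
        return 0.0
    after = sum(minutes for start, minutes in sessions
                if not (CORE_START_HOUR <= start.hour < CORE_END_HOUR))
    return 100.0 * after / total
```

For example, 30 active minutes starting at 9:00 AM and 10 minutes starting at 7:30 PM give 10 of 40 minutes after hours.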
Results
Analysis of the primary outcome showed a time-averaged change in Adjusted Time in EMR per Patient Seen of -0.189 minutes per encounter across the full provider pool, which did not reach conventional statistical significance (p = 0.075). The Provider Phenotype Rescue Analysis found a significant interaction: providers in the ‘Low Opportunity’ baseline phenotype showed a larger efficiency gain, with an adjusted change of -0.497 minutes per encounter (p < 0.0001). Active Documentation Time per Patient Seen also decreased, by 0.113 minutes (p = 0.024). In contrast, the percentage of Time in EMR After Hours (‘pajama time’) increased by +0.72% (p < 0.001), consistent with an adoption-benefit friction in which editing effort shifts to after hours. In the secondary safety Negative Binomial models, intervention exposure was not associated with an increased incidence rate of Total Deficiencies or Deficient Document Charts (IRR ~ 1.0, p > 0.05), a result consistent with maintained clinical accountability.
Design Implementation
Figure 1: Stepped-Wedge Cluster Randomized Trial (SW-CRT) Design Schematic
Effectiveness (Average Treatment Effects)
Figure 2: Enterprise-Wide Realized Clinical Efficiency (Weekly Mean Across All Providers)
Figure 3: Wave-Stratified Comparison of Realized Clinical Efficiency
Figure 4: Interrupted Time Series (ITS) Analysis of Intervention Exposure
Figure 5: GLMM Primary Effect Estimates — β Coefficient per Week of Tool Exposure with 95% CI (Forest Plot)
Figure 6: Clinical Safety & Outcomes — Incidence Rate Ratios (IRR) per Week of Tool Exposure with 95% CI (Forest Plot)
Adoption and Engagement (Secondary Aims)
Figure 7: Cumulative Provider Adoption and Exposure Curves
Figure 8: Specialty Documentation Burden Trajectories
Figure 9: Engagement by Cerner Opportunity Phenotype
Operational Monitoring (Safety & Stability)
Figure 10: Parallel-Trends Diagnostic — Intervention vs Control (Calendar Time)
Drift testing checks that differences in efficiency between waves track the rollout schedule, and it flags anomalies connected to infrastructure failures (e.g., server downtime).
Drift analysis used temporal interaction terms in a mixed model and found no statistically significant exogenous breaks during the transition periods.
Dose-Response Analysis
Figure 11: Influence of Tool Usage Intensity on Documentation Efficiency (Scatter & Trend)
Figure 12: Efficiency Gains Stratified by Usage Rate Quintiles
Conclusions
The enterprise implementation of the Oracle Clinical AI Agent was associated with statistically significant reductions in active documentation time across the Military Health System. However, pronounced heterogeneity in usage and a shift of administrative burden to after-hours validation point to persistent cognitive friction in ambient dictation workflows. Segmenting providers by engagement phenotype suggests that the technology acts as a ‘rescuer’ for specific adoption tiers, while operational leadership and long-term sustainment strategies must address user experience barriers to realize enterprise-wide return on investment safely.
Appendix
Baseline Summaries and Model Estimands
Table 1: Primary Continuous Effects (SAP-Compliant TATE vs LTE Analysis)
| Outcome | TATE Estimate | LTE Estimate | CI Low (TATE) | CI High (TATE) | p-value |
|---|---|---|---|---|---|
| Adjusted Time in EMR per Patient Seen | -0.479 | -0.565 | -0.572 | -0.387 | <0.001 |
| Adjusted Time in EMR per Patient Seen (Opp: High) | -0.761 | -0.898 | -0.923 | -0.598 | <0.001 |
| Adjusted Time in EMR per Patient Seen (Opp: Low) | -0.289 | -0.341 | -0.332 | -0.247 | <0.001 |
| Adjusted Time in EMR per Patient Seen (Spe: High Burden) | -0.659 | -0.778 | -0.834 | -0.484 | <0.001 |
| Adjusted Time in EMR per Patient Seen (Spe: Moderate Burden) | -0.581 | -0.686 | -0.675 | -0.487 | <0.001 |
| Adjusted Time in EMR per Patient Seen (Spe: Low Burden) | -0.262 | -0.309 | -0.312 | -0.212 | <0.001 |
| Documentation Time per Patient Seen | -0.166 | -0.196 | -0.21 | -0.123 | <0.001 |
| Time in EMR per Patient Seen | -0.417 | -0.492 | -0.508 | -0.327 | <0.001 |
| Chart Review Time per Patient | -0.172 | -0.203 | -0.211 | -0.133 | <0.001 |
| % Time in EMR After Hours | 0.555 | 0.655 | 0.303 | 0.807 | <0.001 |
| Tab Hops per Patient | 0.068 | 0.08 | -0.029 | 0.164 | 0.168 |
| Documentation (Avg Time) | nan | nan | nan | nan | nan |
Table 2: Count Outcomes (IRR) — Cluster-Robust Analysis
| Outcome | IRR | CI Low | CI High | p-value |
|---|---|---|---|---|
| Clinical Notes Documented | 1.046 | 1.03 | 1.062 | <0.001 |
| Clinical Notes Signed | 1.063 | 1.049 | 1.077 | <0.001 |
| PowerNotes Documented | 1 | 0.98 | 1.02 | <0.001 |
| PowerNotes Signed | 1 | 0.98 | 1.02 | <0.001 |
| Deficient Document Charts | 0.985 | 0.966 | 1.004 | 0.119 |
| Deficient Documents | 0.995 | 0.978 | 1.011 | 0.526 |
| Deficient Orders | 1 | 0.98 | 1.02 | <0.001 |
| Deficient Orders Charts | 1.088 | 1.062 | 1.115 | <0.001 |
| Total Charts with Deficiencies | 1.007 | 0.993 | 1.022 | 0.333 |
| Total Deficiencies | 1.032 | 1.018 | 1.046 | <0.001 |
| Chart Opens | 1.082 | 1.075 | 1.089 | <0.001 |
Table 3: Dose-Response Effects (GEE-MAQLS — Cluster-Robust)
| Outcome | Predictor | Estimate | CI Low | CI High | p-value | Model Status |
|---|---|---|---|---|---|---|
| lon_Documentation Time per Patient Seen | caa_Usage Rate | 4.1908 | 3.2043 | 5.1773 | <0.001 | fitted_gee_maqls |
| lon_Time in EMR per Patient Seen | caa_Usage Rate | 12.092 | 10.1608 | 14.0232 | <0.001 | fitted_gee_maqls |
| caa_Avg. Documentation Time Per Patient (mins) | caa_Usage Rate | 5.1386 | 4.1302 | 6.147 | <0.001 | fitted_gee_maqls |
| caa_Avg. Time In EMR Per Patient (mins) | caa_Usage Rate | 16.9495 | 14.7397 | 19.1593 | <0.001 | fitted_gee_maqls |
Table 4: Engagement Predictors (Multivariate Logistic Regression — Odds Ratios)
| Outcome | Predictor | Odds Ratio | CI Low | CI High | p-value |
|---|---|---|---|---|---|
| Adoption | Provider Age (per 10yr) | 0.85 | 0.78 | 0.92 | <0.001 |
| Adoption | Clinical Setting (Outpatient) | 1.67 | 1.42 | 1.96 | <0.001 |
| Adoption | High Opportunity Phenotype | 1.45 | 1.21 | 1.74 | <0.001 |