AI Domain - Workflow Narrative
A conversational guide to understanding clinical intelligence features
What is the AI Domain?
The AI domain brings artificial intelligence capabilities to the EHR. It uses Google's Gemini model to help with clinical tasks - generating patient summaries, suggesting diagnoses, helping with medical coding, and providing a conversational interface for clinical queries.
This isn't AI for AI's sake. It's targeted assistance that helps clinicians work more efficiently while keeping them in control of clinical decisions.
The Technology Stack
Google Gemini
The AI domain is powered by Google Gemini 2.5 Flash. This is a large language model with strong medical knowledge and the ability to process both text and images.
The GeminiClient is a singleton that manages the connection to Google's AI platform. It handles authentication via API key, configures safety settings appropriate for medical content, and provides methods for generating responses.
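The singleton pattern described above can be sketched as follows. This is an illustrative stand-in, not the system's actual implementation: the real client would wrap Google's SDK, while here the API call is stubbed out and the key is a placeholder.

```python
class GeminiClient:
    """Minimal sketch of a singleton client wrapper (illustrative only)."""

    _instance = None

    def __new__(cls, api_key: str = "PLACEHOLDER_KEY"):
        # Create the shared instance once so auth and safety settings
        # are configured a single time for the whole process.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.api_key = api_key
            cls._instance.model_name = "gemini-2.5-flash"
        return cls._instance

    def generate(self, prompt: str) -> str:
        # Placeholder for the real SDK call to Google's AI platform.
        return f"[{self.model_name} response to: {prompt[:30]}]"
```

Every part of the codebase that asks for a `GeminiClient` receives the same configured instance, which is the point of the pattern.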
Medical Safety Settings
Standard AI safety filters might block legitimate medical content. Descriptions of injuries, surgical procedures, or medication side effects could trigger "harmful content" filters designed for consumer applications.
The Gemini client is configured with relaxed safety settings (BLOCK_NONE) for medical categories. This ensures clinically relevant content isn't inappropriately blocked.
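A configuration along these lines is what the client would pass to the SDK. The category names follow Google's HarmCategory enum; the exact set of categories the system relaxes is an assumption here.

```python
# Safety settings sketch: BLOCK_NONE disables each filter so legitimate
# clinical content (wound descriptions, procedures, side effects) is
# not rejected by consumer-oriented moderation.
MEDICAL_SAFETY_SETTINGS = [
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
]
```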
Patient Summaries
What They Are
Patient summaries are AI-generated overviews of a patient's clinical status. Instead of reading through dozens of encounter notes and lab results, a practitioner can get a concise summary.
The Generation Flow
When a summary is requested via /api/ai/summarize-patient:
- The system fetches the patient data - demographics, conditions, medications, allergies, recent observations
- This data is formatted into a structured prompt
- The prompt is sent to Gemini with instructions to summarize
- Gemini returns a natural language summary
- The summary can be cached for future requests
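Step 2 of the flow — formatting patient data into a structured prompt — might look like the sketch below. The field names are illustrative, not the system's actual schema, and the instruction wording is a guess at the real prompt.

```python
def build_summary_prompt(patient: dict) -> str:
    """Format fetched patient data into a summarization prompt (sketch)."""
    sections = [
        "You are a clinical summarization assistant.",
        "Summarize the patient data below in 3-5 sentences.",
        "Only include information from the provided data; do not fabricate.",
        "",
        f"Demographics: {patient.get('demographics', 'n/a')}",
        f"Conditions: {', '.join(patient.get('conditions', [])) or 'none recorded'}",
        f"Medications: {', '.join(patient.get('medications', [])) or 'none recorded'}",
        f"Allergies: {', '.join(patient.get('allergies', [])) or 'none recorded'}",
    ]
    return "\n".join(sections)
```

The resulting string is what gets sent to Gemini in step 3.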
Prompt Engineering
The prompts are carefully crafted. They include:
- Clear instructions on what to summarize
- The factual patient data
- Formatting requirements
- Warnings to not fabricate information
The prompt tells Gemini to act as a clinical summarization assistant and to only include information from the provided data.
Caching Strategy
Patient summaries are cached in Redis. Since generating summaries involves an API call to Google, caching saves time and cost for repeated requests.
Cache keys include the patient ID and a hash of their current data. If the patient data changes significantly, the cache is invalidated and a fresh summary is generated.
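The key construction can be sketched like this; the `ai:summary:` prefix and the digest length are assumptions for illustration.

```python
import hashlib
import json

def summary_cache_key(patient_id: str, patient_data: dict) -> str:
    """Cache key combining the patient ID with a hash of current data (sketch)."""
    # Canonical JSON (sorted keys) so identical data always hashes identically.
    payload = json.dumps(patient_data, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return f"ai:summary:{patient_id}:{digest}"
```

Because the hash is part of the key, changed patient data simply produces a new key: the old entry is never served and expires out of Redis on its own.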
Chat Interface
Conversational AI
The chat interface allows practitioners to have a conversation about patient care. They can ask questions, get explanations, and request help with documentation.
Session-Based Conversations
Chat is organized into sessions. A session maintains conversation history so each message has context from previous exchanges.
When a new message is sent to /api/ai/chat:
- The session is loaded (or created if new)
- Previous messages are retrieved
- The new message is added with the conversation history
- Gemini generates a response with full context
- The response is saved to the session
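The session flow above can be sketched as follows. The real service persists sessions (e.g., in Redis); this in-memory version, with a stub standing in for the Gemini call, just shows how context accumulates across messages.

```python
class ChatSession:
    """In-memory sketch of a session that maintains conversation history."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.messages = []  # full history: alternating user/model turns

    def send(self, user_text: str, model) -> str:
        # The history plus the new message gives the model full context.
        self.messages.append({"role": "user", "text": user_text})
        reply = model(self.messages)
        self.messages.append({"role": "model", "text": reply})
        return reply

def echo_model(history):
    # Stand-in for GeminiClient: reports how many user turns it can see.
    user_turns = sum(1 for m in history if m["role"] == "user")
    return f"reply #{user_turns}"
```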
Multimodal Chat
The chat supports more than text. The /api/ai/chat/multimodal endpoint accepts images. A practitioner could upload a photo of a skin condition and ask "What could this be?"
Gemini's multimodal capabilities analyze both the image and the text query to provide relevant clinical information.
File Handling
For large media files, the system uses Gemini's File API. Files are uploaded to Google's servers, and Gemini receives a URI reference rather than the raw bytes. This is more efficient for large images or documents.
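The routing decision — inline bytes versus a File API upload — reduces to a size check. The 20 MB threshold below reflects the documented limit on total inline request size for the Gemini API, but treat the exact cutoff the system uses as an assumption.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumed inline request ceiling

def attachment_strategy(size_bytes: int) -> str:
    """Decide how a media file should reach Gemini (sketch).

    Small files are embedded in the request; large ones are uploaded
    to Google's File API and referenced by URI.
    """
    return "inline" if size_bytes <= INLINE_LIMIT_BYTES else "file_api"
```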
AI Output Generation
Structured Outputs
Beyond free-form chat, the AI can generate structured clinical content. The /api/ai/generate endpoint produces specific output types:
- SOAP notes - Subjective, Objective, Assessment, Plan format
- Progress notes - General visit summaries
- Referral letters - Formal letters to specialists
- Discharge summaries - End-of-care documentation
- Patient instructions - Plain-language care instructions
Template-Based Generation
Each output type has associated prompts and templates. The system provides clinical context (encounter data, patient history) and instructions for the desired format. Gemini generates content that fits the template.
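A minimal version of the template lookup might look like this. The template wording and the `{context}` placeholder are hypothetical; the real system's templates live in its own prompt store.

```python
# Hypothetical prompt templates keyed by output type.
OUTPUT_TEMPLATES = {
    "soap_note": (
        "Write a SOAP note (Subjective, Objective, Assessment, Plan) "
        "for the following encounter:\n{context}"
    ),
    "referral_letter": (
        "Write a formal referral letter to a specialist based on:\n{context}"
    ),
    "patient_instructions": (
        "Write plain-language home-care instructions based on:\n{context}"
    ),
}

def build_generation_prompt(output_type: str, context: str) -> str:
    """Combine the selected template with clinical context (sketch)."""
    if output_type not in OUTPUT_TEMPLATES:
        raise ValueError(f"Unsupported output type: {output_type}")
    return OUTPUT_TEMPLATES[output_type].format(context=context)
```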
Review Before Use
Generated content is always presented for practitioner review. The AI drafts; the human approves. This keeps clinicians in control and ensures accuracy before documentation is finalized.
Clinical Coding Assistance
ICD-10 and CPT Suggestions
The AI can suggest appropriate diagnosis and procedure codes. Given a clinical note, it identifies diagnoses and maps them to ICD-10 codes.
This isn't a replacement for certified coders, but it provides a starting point. The suggestions include confidence levels so coders know which suggestions are strong matches versus guesses.
How It Works
- Clinical text is sent to the AI endpoint
- The prompt asks Gemini to identify diagnoses
- Gemini returns structured diagnosis suggestions
- These are matched against the reference ICD-10 database
- Valid matches are returned with codes and descriptions
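Steps 4 and 5 — validating the model's output against the reference database — can be sketched as below. The dict-based "database" and the suggestion shape are illustrative assumptions.

```python
def validate_suggestions(ai_suggestions: list, icd10_reference: dict) -> list:
    """Keep only AI-suggested codes that exist in the ICD-10 reference (sketch).

    Valid matches are returned with the official description and the
    model's confidence; hallucinated codes are silently dropped.
    """
    valid = []
    for suggestion in ai_suggestions:
        code = suggestion.get("code", "").upper()
        if code in icd10_reference:
            valid.append({
                "code": code,
                "description": icd10_reference[code],
                "confidence": suggestion.get("confidence", 0.0),
            })
    return valid
```

Matching against the reference table is what keeps a fabricated code from ever reaching a claim.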
Context Management
Patient Context
When AI features are used in a patient context, the system automatically includes relevant patient data. The practitioner doesn't need to copy-paste demographics or medication lists - the system provides this context automatically.
Encounter Context
Similarly, when working within an encounter, the encounter details are included. The AI knows about the chief complaint, previous notes in this encounter, and services already documented.
Privacy Considerations
AI processing necessarily involves patient data, and that data is sent to Google's API with each request. The organization's data privacy policies should account for this AI integration.
The system doesn't perform background AI analysis on data without user action. AI is invoked only when a user explicitly requests it (generating a summary, asking a question, etc.).
Search Grounding
Web Search Integration
For some queries, the AI can be configured to use "search grounding" - incorporating web search results into its response.
If a practitioner asks "What are the latest treatment guidelines for diabetes?", the AI can search for recent information rather than relying solely on its training data.
When It's Used
Search grounding is optional and typically disabled for patient-specific queries (where you want answers based on the patient's data, not web results). It's more useful for general medical knowledge questions.
Performance and Cost
Token Usage
AI requests consume tokens, which have associated costs. The system tracks token usage:
- Prompt tokens (input size)
- Completion tokens (output size)
- Image tokens (for multimodal)
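Tracked token counts translate to cost with simple per-token arithmetic. The rates below are placeholders, not real prices; actual rates must come from current Google pricing.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate a request's cost from tracked token counts (sketch).

    Rates are expressed per 1,000 tokens, as AI pricing usually is.
    """
    return (prompt_tokens / 1000) * input_rate \
         + (completion_tokens / 1000) * output_rate
```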
Optimization Strategies
Several strategies minimize costs:
- Caching summaries to avoid regeneration
- Limiting conversation history length
- Using appropriate max_tokens limits
- Batching related requests when possible
Rate Limiting
To prevent abuse or runaway costs, the system can implement rate limiting on AI endpoints. Users get a generous budget for normal use, but can't make unlimited expensive AI calls.
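One simple way to implement that budget is a per-user fixed-window counter. This in-memory sketch shows the policy; a production deployment would more likely keep the counters in Redis so limits hold across processes.

```python
class RateLimiter:
    """Fixed-window rate limiter for AI endpoints (in-memory sketch)."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls  # budget per user per window
        self.counts = {}            # user_id -> calls used this window

    def allow(self, user_id: str) -> bool:
        used = self.counts.get(user_id, 0)
        if used >= self.max_calls:
            return False  # budget exhausted: reject the AI call
        self.counts[user_id] = used + 1
        return True

    def reset_window(self):
        # Called on a timer when the window rolls over.
        self.counts.clear()
```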
The Chat Service Architecture
Service Decomposition
The AI domain is split into focused services:
- ChatService - Handles text and multimodal chat
- SummaryService - Generates patient summaries
- AIOutputService - Creates structured clinical documents
- GeminiClient - Low-level API communication
This separation keeps each service focused and testable.
The Main Orchestrator
The routers typically call a facade that coordinates these services. The /api/ai/chat endpoint uses ChatService, which in turn may call GeminiClient and interact with session storage.
Error Handling
API Failures
Google's API can fail - network issues, rate limits, temporary outages. The AI services include retry logic with exponential backoff.
If retries fail, the system returns a graceful error message rather than crashing. Users see "AI service temporarily unavailable" rather than a stack trace.
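The retry-with-backoff behavior can be sketched as a small wrapper; the attempt count, delays, and fallback message are illustrative assumptions.

```python
import time

def with_retries(call, max_attempts: int = 3, base_delay: float = 0.01):
    """Retry a flaky API call with exponential backoff (sketch).

    Returns the call's result, or a graceful fallback string if every
    attempt fails — the user never sees a raw stack trace.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                return "AI service temporarily unavailable"
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
```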
Content Filters
Even with relaxed safety settings, Gemini may occasionally refuse to respond to certain content. The service handles these refusals by returning an appropriate message to the user.
Integration with Clinical Workflows
From Documentation
When a practitioner is documenting an encounter, they might invoke AI help:
- "Help me write a SOAP note for this visit"
- "Suggest diagnoses based on the documentation"
- "Generate patient instructions for home care"
To Documentation
AI-generated content can be inserted into the encounter. The practitioner reviews it, makes edits, and saves. The final documentation is stored in the encounter, not the AI system.
Activity Logging
AI interactions can be logged as activity events. This provides an audit trail of when AI was used for which patients.
Key Takeaways
- Gemini powers the AI - Gemini 2.5 Flash, a Google model with strong medical knowledge
- Summaries condense patient data - AI reads so clinicians don't have to
- Chat provides conversation - contextual, session-based dialogue
- Multimodal includes images - photos and documents can be analyzed
- Structured outputs generate documents - SOAP notes, referrals, etc.
- Clinician stays in control - AI suggests, human decides
- Caching reduces cost - frequently used results are cached
Next: Read about the Documents & Storage Domain to understand file management