AI Domain - Workflow Narrative
A conversational guide to understanding clinical intelligence features
What is the AI Domain?
The AI domain brings artificial intelligence capabilities to the EHR. It uses Google's Gemini model to help with clinical tasks - generating patient summaries, suggesting diagnoses, helping with medical coding, and providing a conversational interface for clinical queries.
This isn't AI for AI's sake. It's targeted assistance that helps clinicians work more efficiently while keeping them in control of clinical decisions.
The Technology Stack
Google Gemini
The AI domain is powered by Google Gemini 2.5 Flash. This is a large language model with strong medical knowledge and the ability to process both text and images.
The GeminiClient is a singleton that manages the connection to Google's AI platform. It handles authentication via API key, configures safety settings appropriate for medical content, and provides methods for generating responses.
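The singleton pattern described above can be sketched as follows. This is an illustrative stand-in, not the system's actual implementation: the real client would wrap Google's SDK, while here the API call is stubbed out and the key is a placeholder.

```python
class GeminiClient:
    """Minimal sketch of a singleton client wrapper (illustrative only)."""

    _instance = None

    def __new__(cls, api_key: str = "PLACEHOLDER_KEY"):
        # Create the shared instance once so auth and safety settings
        # are configured a single time for the whole process.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.api_key = api_key
            cls._instance.model_name = "gemini-2.5-flash"
        return cls._instance

    def generate(self, prompt: str) -> str:
        # Placeholder for the real SDK call to Google's AI platform.
        return f"[{self.model_name} response to: {prompt[:30]}]"
```

Every part of the codebase that asks for a `GeminiClient` receives the same configured instance, which is the point of the pattern.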
Medical Safety Settings
Standard AI safety filters might block legitimate medical content. Descriptions of injuries, surgical procedures, or medication side effects could trigger "harmful content" filters designed for consumer applications.
The Gemini client is configured with relaxed safety settings (BLOCK_NONE) for medical categories. This ensures clinically relevant content isn't inappropriately blocked.
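A configuration along these lines is what the client would pass to the SDK. The category names follow Google's HarmCategory enum; the exact set of categories the system relaxes is an assumption here.

```python
# Safety settings sketch: BLOCK_NONE disables each filter so legitimate
# clinical content (wound descriptions, procedures, side effects) is
# not rejected by consumer-oriented moderation.
MEDICAL_SAFETY_SETTINGS = [
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
]
```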
Patient Summaries
What They Are
Patient summaries are AI-generated overviews of a patient's clinical status. Instead of reading through dozens of encounter notes and lab results, a practitioner can get a concise summary.
The Generation Flow
When a summary is requested via /api/ai/summarize-patient:
- The system fetches the patient data - demographics, conditions, medications, allergies, recent observations
- This data is formatted into a structured prompt
- The prompt is sent to Gemini with instructions to summarize
- Gemini returns a natural language summary
- The summary can be cached for future requests
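Step 2 of the flow — formatting patient data into a structured prompt — might look like the sketch below. The field names are illustrative, not the system's actual schema, and the instruction wording is a guess at the real prompt.

```python
def build_summary_prompt(patient: dict) -> str:
    """Format fetched patient data into a summarization prompt (sketch)."""
    sections = [
        "You are a clinical summarization assistant.",
        "Summarize the patient data below in 3-5 sentences.",
        "Only include information from the provided data; do not fabricate.",
        "",
        f"Demographics: {patient.get('demographics', 'n/a')}",
        f"Conditions: {', '.join(patient.get('conditions', [])) or 'none recorded'}",
        f"Medications: {', '.join(patient.get('medications', [])) or 'none recorded'}",
        f"Allergies: {', '.join(patient.get('allergies', [])) or 'none recorded'}",
    ]
    return "\n".join(sections)
```

The resulting string is what gets sent to Gemini in step 3.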
Prompt Engineering
The prompts are carefully crafted. They include:
- Clear instructions on what to summarize
- The factual patient data
- Formatting requirements
- Warnings to not fabricate information
The prompt tells Gemini to act as a clinical summarization assistant and to only include information from the provided data.
Caching Strategy
Patient summaries are cached in Redis. Since generating summaries involves an API call to Google, caching saves time and cost for repeated requests.
Cache keys include the patient ID and a hash of their current data. If the patient data changes significantly, the cache is invalidated and a fresh summary is generated.
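The key construction can be sketched like this; the `ai:summary:` prefix and the digest length are assumptions for illustration.

```python
import hashlib
import json

def summary_cache_key(patient_id: str, patient_data: dict) -> str:
    """Cache key combining the patient ID with a hash of current data (sketch)."""
    # Canonical JSON (sorted keys) so identical data always hashes identically.
    payload = json.dumps(patient_data, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return f"ai:summary:{patient_id}:{digest}"
```

Because the hash is part of the key, changed patient data simply produces a new key: the old entry is never served and expires out of Redis on its own.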
Chat Interface
Conversational AI
The chat interface allows practitioners to have a conversation about patient care. They can ask questions, get explanations, and request help with documentation.
Session-Based Conversations
Chat is organized into sessions. A session maintains conversation history so each message has context from previous exchanges.
When a new message is sent to /api/ai/chat:
- The session is loaded (or created if new)
- Previous messages are retrieved
- The new message is added with the conversation history
- Gemini generates a response with full context
- The response is saved to the session
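The session flow above can be sketched as follows. The real service persists sessions (e.g., in Redis); this in-memory version, with a stub standing in for the Gemini call, just shows how context accumulates across messages.

```python
class ChatSession:
    """In-memory sketch of a session that maintains conversation history."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.messages = []  # full history: alternating user/model turns

    def send(self, user_text: str, model) -> str:
        # The history plus the new message gives the model full context.
        self.messages.append({"role": "user", "text": user_text})
        reply = model(self.messages)
        self.messages.append({"role": "model", "text": reply})
        return reply

def echo_model(history):
    # Stand-in for GeminiClient: reports how many user turns it can see.
    user_turns = sum(1 for m in history if m["role"] == "user")
    return f"reply #{user_turns}"
```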
Multimodal Chat
The chat supports more than text. The /api/ai/chat/multimodal endpoint accepts images. A practitioner could upload a photo of a skin condition and ask "What could this be?"
Gemini's multimodal capabilities analyze both the image and the text query to provide relevant clinical information.
File Handling
For large media files, the system uses Gemini's File API. Files are uploaded to Google's servers, and Gemini receives a URI reference rather than the raw bytes. This is more efficient for large images or documents.
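The routing decision — inline bytes versus a File API upload — reduces to a size check. The 20 MB threshold below reflects the documented limit on total inline request size for the Gemini API, but treat the exact cutoff the system uses as an assumption.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumed inline request ceiling

def attachment_strategy(size_bytes: int) -> str:
    """Decide how a media file should reach Gemini (sketch).

    Small files are embedded in the request; large ones are uploaded
    to Google's File API and referenced by URI.
    """
    return "inline" if size_bytes <= INLINE_LIMIT_BYTES else "file_api"
```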
AI Output Generation
Structured Outputs
Beyond free-form chat, the AI can generate structured clinical content. The /api/ai/generate endpoint produces specific output types:
- SOAP notes - Subjective, Objective, Assessment, Plan format
- Progress notes - General visit summaries
- Referral letters - Formal letters to specialists
- Discharge summaries - End-of-care documentation
- Patient instructions - Plain-language care instructions
Template-Based Generation
Each output type has associated prompts and templates. The system provides clinical context (encounter data, patient history) and instructions for the desired format. Gemini generates content that fits the template.
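A minimal version of the template lookup might look like this. The template wording and the `{context}` placeholder are hypothetical; the real system's templates live in its own prompt store.

```python
# Hypothetical prompt templates keyed by output type.
OUTPUT_TEMPLATES = {
    "soap_note": (
        "Write a SOAP note (Subjective, Objective, Assessment, Plan) "
        "for the following encounter:\n{context}"
    ),
    "referral_letter": (
        "Write a formal referral letter to a specialist based on:\n{context}"
    ),
    "patient_instructions": (
        "Write plain-language home-care instructions based on:\n{context}"
    ),
}

def build_generation_prompt(output_type: str, context: str) -> str:
    """Combine the selected template with clinical context (sketch)."""
    if output_type not in OUTPUT_TEMPLATES:
        raise ValueError(f"Unsupported output type: {output_type}")
    return OUTPUT_TEMPLATES[output_type].format(context=context)
```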
Review Before Use
Generated content is always presented for practitioner review. The AI drafts; the human approves. This keeps clinicians in control and ensures accuracy before documentation is finalized.
Clinical Coding Assistance
ICD-10 and CPT Suggestions
The AI can suggest appropriate diagnosis and procedure codes. Given a clinical note, it identifies diagnoses and maps them to ICD-10 codes.
This isn't a replacement for certified coders, but it provides a starting point. The suggestions include confidence levels so coders know which suggestions are strong matches versus guesses.
How It Works
- Clinical text is sent to the AI endpoint
- The prompt asks Gemini to identify diagnoses
- Gemini returns structured diagnosis suggestions
- These are matched against the reference ICD-10 database
- Valid matches are returned with codes and descriptions
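Steps 4 and 5 — validating the model's output against the reference database — can be sketched as below. The dict-based "database" and the suggestion shape are illustrative assumptions.

```python
def validate_suggestions(ai_suggestions: list, icd10_reference: dict) -> list:
    """Keep only AI-suggested codes that exist in the ICD-10 reference (sketch).

    Valid matches are returned with the official description and the
    model's confidence; hallucinated codes are silently dropped.
    """
    valid = []
    for suggestion in ai_suggestions:
        code = suggestion.get("code", "").upper()
        if code in icd10_reference:
            valid.append({
                "code": code,
                "description": icd10_reference[code],
                "confidence": suggestion.get("confidence", 0.0),
            })
    return valid
```

Matching against the reference table is what keeps a fabricated code from ever reaching a claim.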
Context Management
Patient Context
When AI features are used in a patient context, the system automatically includes relevant patient data. The practitioner doesn't need to copy-paste demographics or medication lists - the system provides this context automatically.
Encounter Context
Similarly, when working within an encounter, the encounter details are included. The AI knows about the chief complaint, previous notes in this encounter, and services already documented.
Privacy Considerations
AI processing necessarily involves patient data, and that data is sent to Google's API with each request. The organization's data privacy policies should account for this AI integration.
The system doesn't perform background AI analysis on data without user action. AI is invoked only when a user explicitly requests it (generating a summary, asking a question, etc.).
Search Grounding
Web Search Integration
For some queries, the AI can be configured to use "search grounding" - incorporating web search results into its response.
If a practitioner asks "What are the latest treatment guidelines for diabetes?", the AI can search for recent information rather than relying solely on its training data.
When It's Used
Search grounding is optional and typically disabled for patient-specific queries (where you want answers based on the patient's data, not web results). It's more useful for general medical knowledge questions.
Performance and Cost
Token Usage
AI requests consume tokens, which have associated costs. The system tracks token usage:
- Prompt tokens (input size)
- Completion tokens (output size)
- Image tokens (for multimodal)
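Tracked token counts translate to cost with simple per-token arithmetic. The rates below are placeholders, not real prices; actual rates must come from current Google pricing.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate a request's cost from tracked token counts (sketch).

    Rates are expressed per 1,000 tokens, as AI pricing usually is.
    """
    return (prompt_tokens / 1000) * input_rate \
         + (completion_tokens / 1000) * output_rate
```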
Optimization Strategies
Several strategies minimize costs:
- Caching summaries to avoid regeneration
- Limiting conversation history length
- Using appropriate max_tokens limits
- Batching related requests when possible
Rate Limiting
To prevent abuse or runaway costs, the system can implement rate limiting on AI endpoints. Users get a generous budget for normal use, but can't make unlimited expensive AI calls.
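One simple way to implement that budget is a per-user fixed-window counter. This in-memory sketch shows the policy; a production deployment would more likely keep the counters in Redis so limits hold across processes.

```python
class RateLimiter:
    """Fixed-window rate limiter for AI endpoints (in-memory sketch)."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls  # budget per user per window
        self.counts = {}            # user_id -> calls used this window

    def allow(self, user_id: str) -> bool:
        used = self.counts.get(user_id, 0)
        if used >= self.max_calls:
            return False  # budget exhausted: reject the AI call
        self.counts[user_id] = used + 1
        return True

    def reset_window(self):
        # Called on a timer when the window rolls over.
        self.counts.clear()
```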
The Chat Service Architecture
Service Decomposition
The AI domain is split into focused services:
- ChatService - Handles text and multimodal chat
- SummaryService - Generates patient summaries
- AIOutputService - Creates structured clinical documents
- GeminiClient - Low-level API communication
This separation keeps each service focused and testable.
The Main Orchestrator
The routers typically call a facade that coordinates these services. The /api/ai/chat endpoint uses ChatService, which in turn may call GeminiClient and interact with session storage.
Error Handling
API Failures
Google's API can fail - network issues, rate limits, temporary outages. The AI services include retry logic with exponential backoff.
If retries fail, the system returns a graceful error message rather than crashing. Users see "AI service temporarily unavailable" rather than a stack trace.
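The retry-with-backoff behavior can be sketched as a small wrapper; the attempt count, delays, and fallback message are illustrative assumptions.

```python
import time

def with_retries(call, max_attempts: int = 3, base_delay: float = 0.01):
    """Retry a flaky API call with exponential backoff (sketch).

    Returns the call's result, or a graceful fallback string if every
    attempt fails — the user never sees a raw stack trace.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                return "AI service temporarily unavailable"
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
```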
Content Filters
Even with relaxed safety settings, Gemini may occasionally refuse to respond to certain content. The service handles these refusals by returning an appropriate message to the user.
Integration with Clinical Workflows
From Documentation
When a practitioner is documenting an encounter, they might invoke AI help:
- "Help me write a SOAP note for this visit"
- "Suggest diagnoses based on the documentation"
- "Generate patient instructions for home care"
To Documentation
AI-generated content can be inserted into the encounter. The practitioner reviews it, makes edits, and saves. The final documentation is stored in the encounter, not the AI system.
Activity Logging
AI interactions can be logged as activity events. This provides an audit trail of when AI was used for which patients.
Key Takeaways
- Gemini powers the AI - Gemini 2.5 Flash, a Google model with strong medical knowledge
- Summaries condense patient data - AI reads so clinicians don't have to
- Chat provides conversation - contextual, session-based dialogue
- Multimodal includes images - photos and documents can be analyzed
- Structured outputs generate documents - SOAP notes, referrals, etc.
- Clinician stays in control - AI suggests, human decides
- Caching reduces cost - frequently used results are cached
Next: Read about the Documents & Storage Domain to understand file management