Documents & Storage Domain - Workflow Narrative
A conversational guide to understanding file management workflows
What is the Documents & Storage Domain?
This domain handles all file management in the EHR - uploading, storing, retrieving, and organizing documents. In healthcare, documents are everywhere: lab reports, imaging studies, consent forms, clinical photographs, and scanned records from other providers.
The challenge is storing these files securely while making them easily accessible to authorized users.
Storage Architecture
Cloudflare R2
The primary storage backend is Cloudflare R2, an S3-compatible object storage service. Files are uploaded to R2 buckets and referenced by their storage paths.
Why R2? It's cost-effective, globally distributed, and exposes an S3-compatible API (it implements most, though not all, S3 operations). Code written against the S3 API can therefore switch between compatible storage backends with little change.
The Storage Service
FileStorageService is the abstraction layer over storage. It supports multiple backends:
- local - For development, stores files on the local filesystem
- s3 - For production, uses S3-compatible storage (Cloudflare R2)
- gcs - Google Cloud Storage (available but not primary)
The service determines which backend to use based on the STORAGE_TYPE environment variable.
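As a rough illustration of that selection step, here is a minimal sketch. The function name resolve_storage_backend, the fallback to local, and the case-insensitive matching are assumptions made for the example; only the STORAGE_TYPE variable and the three backend names come from the description above.

```python
import os

# Backend names listed above; anything else is rejected.
SUPPORTED_BACKENDS = {"local", "s3", "gcs"}


def resolve_storage_backend(env=None, default="local"):
    """Return the backend name selected by STORAGE_TYPE (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: unset means "local", and the value is matched case-insensitively.
    backend = env.get("STORAGE_TYPE", default).strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported STORAGE_TYPE: {backend!r}")
    return backend
```

Failing fast on an unknown value keeps a typo in the environment from silently falling back to local disk in production.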
Signed URLs
Files aren't directly accessible via public URLs. Instead, the system generates signed URLs - time-limited, authenticated URLs that provide temporary access.
When you request a document, you get a signed URL valid for a short period (typically 15-60 minutes). This prevents unauthorized access even if someone discovers a file path.
Document References
Metadata vs Content
The actual file content lives in R2. The database stores metadata about the file in the DocumentReference model:
- id - Unique document ID
- patient_id - Which patient this document belongs to
- encounter_id - Optional link to an encounter
- title - Human-readable name
- category - Classification (lab, imaging, consent, etc.)
- content_type - MIME type (image/jpeg, application/pdf)
- url - Storage path (s3://bucket/path)
- file_size - Size in bytes
- thumbnail_url - For images, a smaller preview
This separation means queries against documents don't need to fetch file content - just metadata.
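To make the shape of the record concrete, the fields above could be modeled roughly like this. The field names come from the list; the Python types and which fields are optional are assumptions, and the real model is presumably an ORM class rather than a dataclass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentReference:
    """Metadata only; the file bytes live in object storage."""
    id: str
    patient_id: str
    title: str
    category: str        # e.g. "lab", "imaging", "consent"
    content_type: str    # MIME type, e.g. "application/pdf"
    url: str             # storage path, e.g. "s3://bucket/path"
    file_size: int       # size in bytes
    encounter_id: Optional[str] = None    # optional link to an encounter
    thumbnail_url: Optional[str] = None   # only set for images (assumed)
```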
Uploading Documents
The Upload Flow
When a user uploads a file via /documents:
- File Validation - Is the file type allowed? Is it within size limits?
- Storage Upload - File is uploaded to R2 with a generated path
- Thumbnail Generation - For images, a thumbnail is created
- Metadata Creation - DocumentReference record is created
- Activity Event - Event logged for the upload
- Response - Return document ID and signed URL
Path Generation
Files are stored with organized paths:
{organization_id}/{patient_id}/documents/{year}/{month}/{uuid}.{ext}
This structure makes it easy to manage storage by organization and patient.
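A minimal sketch of building such a path follows. The helper name build_document_path is hypothetical, and the zero-padded month, lowercased extension, and use of UTC are assumptions not stated above.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_document_path(organization_id, patient_id, filename, now=None):
    """Build {organization_id}/{patient_id}/documents/{year}/{month}/{uuid}.{ext}."""
    now = now or datetime.now(timezone.utc)
    # Take the extension from the uploaded filename, without the dot.
    ext = PurePosixPath(filename).suffix.lstrip(".").lower()
    if not ext:
        raise ValueError(f"filename has no extension: {filename!r}")
    # A random UUID keeps names unique and avoids leaking the original filename.
    return (
        f"{organization_id}/{patient_id}/documents/"
        f"{now:%Y}/{now:%m}/{uuid.uuid4()}.{ext}"
    )
```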
Thumbnail Generation
For image files (JPEG, PNG), thumbnails are automatically generated. These smaller previews load faster in document galleries without needing to download full-resolution images.
Thumbnails are stored alongside the original with a _thumb suffix.
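One way to derive the thumbnail's location from the original path is sketched below. It assumes the _thumb suffix goes before the file extension and that the thumbnail keeps the original's extension; neither detail is confirmed above, and thumbnail_path is a hypothetical helper.

```python
from pathlib import PurePosixPath


def thumbnail_path(original_path):
    """Return the sibling path with _thumb inserted before the extension."""
    path = PurePosixPath(original_path)
    return str(path.with_name(f"{path.stem}_thumb{path.suffix}"))
```

For example, thumbnail_path("org1/pat1/documents/2024/03/abc.jpg") gives "org1/pat1/documents/2024/03/abc_thumb.jpg".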
Bulk Upload
Multiple Files at Once
The bulk_upload_documents() method handles multiple file uploads in a single request. A common use case: a patient brings five pages of records from another provider.
Transaction Safety
Bulk uploads are wrapped in a transaction. If any file fails (validation error, storage error), the entire batch is rolled back. You don't end up with partial uploads.
Progress Tracking
For large batch uploads, the response includes status for each file:
- Which succeeded
- Which failed and why
- Overall success/failure count
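The all-or-nothing behavior and the per-file report can be sketched together. This is an illustration under stated assumptions, not the actual bulk_upload_documents() implementation: a plain dict stands in for storage, and the allowed types, size limit, and result field names are all hypothetical.

```python
ALLOWED_TYPES = {"application/pdf", "image/jpeg", "image/png"}  # assumed allow-list
MAX_BYTES = 10 * 1024 * 1024                                    # assumed size limit


def bulk_upload(files, store):
    """files: list of (name, content_type, data); store: dict standing in for storage."""
    results, written = [], []
    for name, content_type, data in files:
        if content_type not in ALLOWED_TYPES:
            results.append({"file": name, "ok": False, "error": "type not allowed"})
        elif len(data) > MAX_BYTES:
            results.append({"file": name, "ok": False, "error": "file too large"})
        else:
            store[name] = data
            written.append(name)
            results.append({"file": name, "ok": True, "error": None})
    failed = sum(1 for result in results if not result["ok"])
    if failed:
        # Roll back: remove everything this batch wrote so nothing partial remains.
        for name in written:
            store.pop(name, None)
    return {
        "committed": failed == 0,
        "passed": len(results) - failed,
        "failed": failed,
        "results": results,
    }
```

Even when the batch is rolled back, the report still shows which files passed and which failed, so the caller can fix the failing file and resubmit.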
Retrieving Documents
List Documents
Document listing supports filters:
- By patient (most common)
- By category (show only lab reports)
- By encounter (documents from a specific visit)
- By date range (documents from last 6 months)
Results include metadata but not file content. To get the actual file, you request a download URL.
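The filters combine naturally as optional criteria. The sketch below works on plain dictionaries; in practice this would be a database query. The created_at field used for the date range is an assumption, since the metadata list above doesn't name a date field, and filter_documents is a hypothetical helper.

```python
from datetime import date


def filter_documents(docs, patient_id=None, category=None,
                     encounter_id=None, start=None, end=None):
    """Return the metadata records that match every filter that was supplied."""
    def matches(doc):
        if patient_id is not None and doc["patient_id"] != patient_id:
            return False
        if category is not None and doc["category"] != category:
            return False
        if encounter_id is not None and doc.get("encounter_id") != encounter_id:
            return False
        # created_at is a hypothetical date field used for the range filter.
        if start is not None and doc["created_at"] < start:
            return False
        if end is not None and doc["created_at"] > end:
            return False
        return True

    return [doc for doc in docs if matches(doc)]
```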
Download URLs
DocumentUrlService.get_download_url() generates a signed URL for accessing the file. The frontend uses this URL in an <img> tag or download link.
The signed URL includes:
- The storage path
- Expiration time
- Cryptographic signature
R2 validates the signature before serving the file.
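To show how the three parts fit together, here is a deliberately simplified sketch using an HMAC over the path and expiry. It is not what R2 actually does: R2 accepts S3-style presigned URLs, which use AWS Signature Version 4. The secret, query parameter names, and function names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET_KEY = b"demo-secret"  # hypothetical; a real key never lives in source code


def sign_path(path, expires_at, key=SECRET_KEY):
    """Signature over the storage path and the expiration timestamp."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_signed_url(base_url, path, ttl_seconds=900, now=None, key=SECRET_KEY):
    """Build a URL carrying the path, expiration time, and signature."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at,
                       "signature": sign_path(path, expires_at, key)})
    return f"{base_url}/{path}?{query}"


def is_valid(path, expires_at, signature, now=None, key=SECRET_KEY):
    """What the storage side checks: not expired, and the signature matches."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(sign_path(path, expires_at, key), signature)
```

Because the expiry is part of the signed message, a client can't extend a link's lifetime by editing the URL: changing either the path or the timestamp invalidates the signature.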
Streaming for Large Files
Large files (videos, DICOM imaging) might be downloaded in pieces rather than all at once. The client sends HTTP Range requests for byte ranges of the file, which R2 supports.
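For a sense of what the client sends, this sketch computes the Range header values needed to fetch a file in fixed-size chunks. The helper name and the chunking strategy are assumptions for illustration; the header syntax itself is standard HTTP, with inclusive byte offsets.

```python
def byte_ranges(total_size, chunk_size):
    """Return Range header values covering total_size bytes in chunk_size pieces."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        # Both ends of an HTTP byte range are inclusive, hence the "- 1".
        f"bytes={start}-{min(start + chunk_size, total_size) - 1}"
        for start in range(0, total_size, chunk_size)
    ]
```

For a 10-byte file fetched 4 bytes at a time, byte_ranges(10, 4) returns ["bytes=0-3", "bytes=4-7", "bytes=8-9"].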
Document Categories
Organization by Type
Documents are categorized for easier organization:
- lab - Laboratory reports and results
- imaging - X-rays, CT scans, ultrasounds
- clinical-note - Scanned clinical notes
- consent - Signed consent forms
- referral - Referral letters
- insurance - Insurance cards, EOBs
- photo - Clinical photographs
- other - Miscellaneous documents
Category-Based Access
Access control can be category-aware. Billing staff might see insurance documents but not clinical photos. Categories enable this granular control.
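A category-aware check could look something like this. The role names and the role-to-category mapping are invented for the example, following the billing-staff scenario above; the real rules live in the access control layer.

```python
# Hypothetical mapping of roles to the document categories they may see.
ROLE_CATEGORIES = {
    "billing": {"insurance"},
    "clinician": {"lab", "imaging", "clinical-note", "consent",
                  "referral", "insurance", "photo", "other"},
}


def can_view_category(role, category):
    """Unknown roles see nothing (deny by default)."""
    return category in ROLE_CATEGORIES.get(role, set())


def visible_documents(role, docs):
    """Keep only the documents whose category the role is allowed to see."""
    return [doc for doc in docs if can_view_category(role, doc["category"])]
```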
Document Annotations
Markup and Comments
The DocumentAnnotation model allows users to add annotations to documents. This is especially useful for imaging:
- Drawing on an X-ray to highlight a fracture
- Adding comments to a lab result
- Marking areas of concern on clinical photos
Annotation Data
Annotations are stored as JSON with:
- type - marker, drawing, text, highlight
- content - The annotation content (text, coordinates)
- page_number - For multi-page documents
- created_by - Who made the annotation
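As an example of what one stored annotation might look like, here is a small sketch with a basic validity check. The four top-level keys and the type values come from the list above; the shape of content (a text note plus a bounding box) and the choice of required fields are assumptions.

```python
import json

ANNOTATION_TYPES = {"marker", "drawing", "text", "highlight"}


def validate_annotation(annotation):
    """Check required fields and the type value; return the annotation unchanged."""
    missing = {"type", "content", "created_by"} - annotation.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if annotation["type"] not in ANNOTATION_TYPES:
        raise ValueError(f"unknown annotation type: {annotation['type']!r}")
    return annotation


example = validate_annotation({
    "type": "highlight",
    # Assumed content shape: a note plus a bounding box in pixel coordinates.
    "content": {"text": "Possible fracture",
                "bbox": {"x": 120, "y": 80, "width": 60, "height": 40}},
    "page_number": 1,
    "created_by": "practitioner-123",
})
payload = json.dumps(example)  # the JSON string that would be stored
```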
FHIR-Inspired Design
The annotation model follows FHIR patterns. Annotations can reference specific regions using bounding boxes or free-form paths.
Profile Images
Special Case: Profile Photos
User and patient profile images are a special document type. They're handled by ProfileImageService which provides:
- Automatic resizing to appropriate dimensions
- Thumbnail generation for avatars
- Update handling (replacing old images)
- Cleanup of old images when replaced
Storage Path
Profile images use a different path pattern:
{organization_id}/profiles/{type}/{id}/{uuid}.{ext}
Where type is 'patient' or 'practitioner'.
Security Considerations
Organization Isolation
Documents are strictly scoped to organizations. The storage path includes organization_id. Queries filter by organization. Cross-organization document access is prevented at every layer.
Access Control
If ACL is enabled, document access follows patient access. If you can't view a patient, you can't view their documents. This is checked when generating download URLs.
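The order of checks matters: authorization happens first, and a URL is only signed once it passes. The sketch below illustrates that flow with hypothetical names throughout. The can_view_patient and sign_url callables are stand-ins for the real ACL check and signing step, and an organization_id field on the document is assumed.

```python
class AccessDenied(Exception):
    """Raised when the caller may not access the requested document."""


def authorized_download_url(user, document, can_view_patient, sign_url,
                            acl_enabled=True):
    """Return a signed URL only after the organization and patient checks pass."""
    # Organization isolation always applies, whether or not ACL is enabled.
    if user["organization_id"] != document["organization_id"]:
        raise AccessDenied("document belongs to another organization")
    # With ACL enabled, document access follows patient access.
    if acl_enabled and not can_view_patient(user, document["patient_id"]):
        raise AccessDenied("user cannot view this patient")
    return sign_url(document["url"])
```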
Audit Logging
Document access can be logged. When a user downloads a document, an audit entry records who accessed what and when. This supports HIPAA compliance requirements.
No Direct Storage Access
End users never get direct access to the R2 bucket. They always go through the API, which enforces authentication and authorization before generating signed URLs.
Integration with Clinical Domains
From Encounters
Many documents are uploaded in encounter context. A lab result from today's visit gets the encounter_id, linking it to that specific clinical interaction.
To AI
Documents can be sent to the AI domain for analysis. A clinical photo can be included in a multimodal chat. Lab reports can be summarized by AI.
From External Sources
Documents might come from external sources - faxed records, patient portal uploads, integration feeds. These flows use the same upload service but with different triggers.
Clean-up and Retention
Retention Policies
Healthcare documents typically have long retention requirements (often 7 years or more, depending on jurisdiction and record type). The system is designed for long-term storage, not automatic deletion.
Manual Deletion
Document deletion is restricted. Usually only administrators can delete documents, and soft-delete is preferred. The document reference is marked inactive but the file remains for compliance.
Orphan Cleanup
A maintenance process can identify orphaned files - storage files without corresponding database references. These might result from failed uploads or deleted references.
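At its core, orphan detection is a set difference between what storage holds and what the database references. A minimal sketch, assuming both sides can be listed as plain storage keys (find_orphans is a hypothetical helper):

```python
def find_orphans(storage_keys, referenced_keys):
    """Return storage keys that no database record points to, sorted for stable output."""
    # referenced_keys should include thumbnail paths as well as original files,
    # otherwise every thumbnail would be reported as an orphan.
    return sorted(set(storage_keys) - set(referenced_keys))
```

Given that soft-deleted documents keep their files for compliance, their paths would need to count as referenced too, so a process like this should report candidates rather than delete them outright.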
Key Takeaways
- R2 is the storage backend - S3-compatible object storage
- Signed URLs provide access - time-limited, authenticated access
- Metadata in database - DocumentReference stores metadata, not content
- Thumbnails for images - automatic preview generation
- Categories organize documents - lab, imaging, consent, etc.
- Annotations allow markup - drawings, comments, highlights
- Security is layered - organization isolation, ACL, audit logging
Next: Read about the Access Control Domain to understand security workflows