Documents & Storage Domain - Workflow Narrative

A conversational guide to understanding file management workflows

What is the Documents & Storage Domain?

This domain handles all file management in the EHR - uploading, storing, retrieving, and organizing documents. In healthcare, documents are everywhere: lab reports, imaging studies, consent forms, clinical photographs, scanned records from other providers.

The challenge is storing these files securely while making them easily accessible to authorized users.

Storage Architecture

Cloudflare R2

The primary storage backend is Cloudflare R2, an S3-compatible object storage service. Files are uploaded to R2 buckets and referenced by their storage paths.

Why R2? It's cost-effective (it charges no egress fees), globally distributed, and implements the S3 API. Because the code talks to a standard S3 interface, it can switch to another S3-compatible backend with little change if needed.

The Storage Service

FileStorageService is the abstraction layer over storage. It supports multiple backends:

local - For development, stores files on the local filesystem
s3 - For production, uses S3-compatible storage (Cloudflare R2)
gcs - Google Cloud Storage (available but not primary)

The service determines which backend to use based on the STORAGE_TYPE environment variable.
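
A minimal sketch of how that selection could look. Only the STORAGE_TYPE variable and the three backend names come from the description above; the helper name resolve_backend, the fallback to "local", and the error handling are assumptions for illustration.

```python
import os

# Backend names taken from the list above.
SUPPORTED_BACKENDS = {"local", "s3", "gcs"}


def resolve_backend(env=None):
    """Return the storage backend name selected by STORAGE_TYPE.

    Hypothetical helper: falling back to "local" when the variable is
    unset is an assumption, not documented behavior.
    """
    env = os.environ if env is None else env
    backend = env.get("STORAGE_TYPE", "local").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported STORAGE_TYPE: {backend!r}")
    return backend
```

Failing fast on an unknown value keeps a typo in the environment from silently writing production files to the wrong place.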

Signed URLs

Files aren't directly accessible via public URLs. Instead, the system generates signed URLs - time-limited, authenticated URLs that provide temporary access.

When you request a document, you get a signed URL valid for a short period (typically 15-60 minutes). Knowing a file path is not enough to fetch the file, and a signed URL that leaks stops working once it expires.

Document References

Metadata vs Content

The actual file content lives in R2. The database stores metadata about the file in the DocumentReference model:

id - Unique document ID
patient_id - Which patient this document belongs to
encounter_id - Optional link to an encounter
title - Human-readable name
category - Classification (lab, imaging, consent, etc.)
content_type - MIME type (image/jpeg, application/pdf)
url - Storage path (s3://bucket/path)
file_size - Size in bytes
thumbnail_url - For images, a smaller preview

This separation means queries against documents don't need to fetch file content - just metadata.
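
To make the shape of that metadata concrete, here is a sketch of the record as a plain Python dataclass. The field names come from the list above; the types, the defaults, and the use of a dataclass rather than the project's actual ORM model are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentReference:
    """Metadata for a stored file; the file bytes live in object storage."""

    id: str
    patient_id: str
    title: str
    category: str          # e.g. "lab", "imaging", "consent"
    content_type: str      # MIME type, e.g. "application/pdf"
    url: str               # storage path, e.g. "s3://bucket/path"
    file_size: int         # size in bytes
    encounter_id: Optional[str] = None   # optional link to an encounter
    thumbnail_url: Optional[str] = None  # only set for images
```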

Uploading Documents

The Upload Flow

When a user uploads a file via /documents:

File Validation - Is the file type allowed? Is it within size limits?
Storage Upload - File is uploaded to R2 with a generated path
Thumbnail Generation - For images, a thumbnail is created
Metadata Creation - DocumentReference record is created
Activity Event - Event logged for the upload
Response - Return document ID and signed URL
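
The first step of that flow, file validation, can be sketched as below. The allowed MIME types and the 25 MiB limit are placeholder assumptions, and validate_upload is a hypothetical name; the real limits are not stated in this guide.

```python
# Assumed values for illustration; the real limits are not documented here.
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "application/pdf"}
MAX_FILE_SIZE = 25 * 1024 * 1024  # 25 MiB


def validate_upload(content_type, size_bytes):
    """Return a list of validation errors; an empty list means the file passes."""
    errors = []
    if content_type not in ALLOWED_CONTENT_TYPES:
        errors.append(f"file type not allowed: {content_type}")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_FILE_SIZE:
        errors.append(f"file exceeds {MAX_FILE_SIZE} bytes")
    return errors
```

Returning every error at once, rather than stopping at the first, lets the caller show the user everything that needs fixing in one round trip.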

Path Generation

Files are stored with organized paths:

{organization_id}/{patient_id}/documents/{year}/{month}/{uuid}.{ext}

This structure makes it easy to manage storage by organization and patient.
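
A sketch of a helper that builds such a path. The function name, the zero-padded month, the lowercased extension, and the "bin" fallback for files without an extension are all assumptions; only the path template itself comes from this guide.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_document_path(organization_id, patient_id, filename, now=None):
    """Build {organization_id}/{patient_id}/documents/{year}/{month}/{uuid}.{ext}."""
    now = now or datetime.now(timezone.utc)
    # Take the extension from the uploaded filename; "bin" is an assumed fallback.
    ext = PurePosixPath(filename).suffix.lstrip(".").lower() or "bin"
    return (
        f"{organization_id}/{patient_id}/documents/"
        f"{now:%Y}/{now:%m}/{uuid.uuid4()}.{ext}"
    )
```

Using a random UUID for the file name means the original filename, which may contain patient details, never appears in the storage key.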

Thumbnail Generation

For image files (JPEG, PNG), thumbnails are automatically generated. These smaller previews load faster in document galleries without needing to download full-resolution images.

Thumbnails are stored alongside the original with a _thumb suffix.
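
Deriving the thumbnail location from the original path might look like this. Placing _thumb before the file extension is an assumption, since the guide only says the suffix is used; thumbnail_path is a hypothetical helper.

```python
from pathlib import PurePosixPath


def thumbnail_path(original_path):
    """Return the sibling path with _thumb inserted before the extension."""
    p = PurePosixPath(original_path)
    return str(p.with_name(f"{p.stem}_thumb{p.suffix}"))
```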

Bulk Upload

Multiple Files at Once

The bulk_upload_documents() method handles multiple file uploads in a single request. A common use case: a patient brings in five pages of records from another provider.

Transaction Safety

Bulk uploads are wrapped in a transaction. If any file fails (validation error, storage error), the entire batch is rolled back. You don't end up with partial uploads.

Progress Tracking

For large batch uploads, the response reports a status for each file, so the caller can see which file caused a batch to fail:

Which succeeded
Which failed and why
Overall success/failure count
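
One way to combine all-or-nothing behavior with a per-file report is sketched below, using a plain dict as a stand-in for storage. This is not the actual bulk_upload_documents() implementation; the function signature, the status strings, and the report layout are assumptions.

```python
def bulk_upload(files, store, validate):
    """All-or-nothing upload of (name, data) pairs into `store` (a dict).

    `validate(name, data)` returns an error string or None.
    Assumes every name is a new key, as with UUID-based storage paths.
    """
    results, written, failed = [], [], False
    for name, data in files:
        error = validate(name, data)
        if error:
            results.append({"file": name, "status": "failed", "error": error})
            failed = True
            continue
        store[name] = data
        written.append(name)
        results.append({"file": name, "status": "ok", "error": None})

    if failed:
        # Roll back: remove everything written during this batch.
        for name in written:
            del store[name]
        for result in results:
            if result["status"] == "ok":
                result["status"] = "rolled_back"

    return {
        "committed": not failed,
        "succeeded": 0 if failed else len(written),
        "failed": sum(r["status"] == "failed" for r in results),
        "files": results,
    }
```

Files that were individually fine but belonged to a failed batch are reported as rolled_back, so the user knows they need to be sent again.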

Retrieving Documents

List Documents

Document listing supports filters:

By patient (most common)
By category (show only lab reports)
By encounter (documents from a specific visit)
By date range (documents from last 6 months)

Results include metadata but not file content. To get the actual file, you request a download URL.
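
The filters above can be sketched over an in-memory list of metadata dicts. In the real service this would be a database query; the function name and the created_at field used for the date range are assumptions, since the metadata list in this guide does not name a date field.

```python
from datetime import date


def filter_documents(docs, patient_id=None, category=None,
                     encounter_id=None, created_from=None, created_to=None):
    """Return the documents matching every filter that was supplied."""
    def matches(doc):
        if patient_id is not None and doc["patient_id"] != patient_id:
            return False
        if category is not None and doc["category"] != category:
            return False
        if encounter_id is not None and doc.get("encounter_id") != encounter_id:
            return False
        # created_at is a hypothetical field holding a datetime.date.
        if created_from is not None and doc["created_at"] < created_from:
            return False
        if created_to is not None and doc["created_at"] > created_to:
            return False
        return True

    return [doc for doc in docs if matches(doc)]
```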

Download URLs

DocumentUrlService.get_download_url() generates a signed URL for accessing the file. The frontend uses this URL in an <img> tag or download link.

The signed URL includes:

The storage path
Expiration time
Cryptographic signature

R2 validates the signature before serving the file.
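
To show how those three parts fit together, here is a deliberately simplified HMAC-based sketch. It is not the scheme R2 uses: real S3-compatible presigned URLs are signed with AWS Signature Version 4 and are normally produced by an SDK. All function names, the query parameter names, and the example host are assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import quote, urlencode


def make_signature(path, expires, secret):
    """Sign the storage path together with its expiry timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url, path, secret, ttl_seconds=900, now=None):
    """Return a URL carrying the path, an expiry time, and a signature."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({
        "expires": expires,
        "signature": make_signature(path, expires, secret),
    })
    return f"{base_url}/{quote(path)}?{query}"


def verify(path, expires, signature, secret, now=None):
    """Accept the request only if it is unexpired and the signature matches."""
    current = int(time.time() if now is None else now)
    if current > int(expires):
        return False
    expected = make_signature(path, expires, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Because the expiry is part of the signed message, a client cannot extend access by editing the expires parameter: the signature would no longer match.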

Streaming for Large Files

Large files (videos, DICOM imaging) can use streaming downloads. The client requests byte ranges of the file rather than downloading it all at once. R2 supports HTTP Range requests, which makes this possible.
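
A small sketch of how a client could split a download into ranges. The helper name and the chunking strategy are assumptions; the header format is the standard HTTP Range syntax, where both ends of a range are inclusive.

```python
def byte_ranges(total_size, chunk_size):
    """Yield HTTP Range header values that together cover the whole file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # inclusive end
        yield f"bytes={start}-{end}"


# A 10-byte file fetched in 4-byte chunks:
# list(byte_ranges(10, 4)) -> ["bytes=0-3", "bytes=4-7", "bytes=8-9"]
```

Each value would be sent as the Range header of a separate GET request against the signed URL.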

Document Categories

Organization by Type

Documents are categorized for easier organization:

lab - Laboratory reports and results
imaging - X-rays, CT scans, ultrasounds
clinical-note - Scanned clinical notes
consent - Signed consent forms
referral - Referral letters
insurance - Insurance cards, EOBs
photo - Clinical photographs
other - Miscellaneous documents

Category-Based Access

Access control can be category-aware. Billing staff might see insurance documents but not clinical photos. Categories enable this granular control.
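
As an illustration of category-aware access, a role-to-category mapping could be checked like this. The role names and the mapping are hypothetical; the guide only gives the billing example, and the real rules live in the access control layer.

```python
# Hypothetical mapping; category names are taken from the list above.
ROLE_CATEGORIES = {
    "billing": {"insurance"},
    "clinician": {
        "lab", "imaging", "clinical-note", "consent",
        "referral", "insurance", "photo", "other",
    },
}


def can_view_category(role, category):
    """Return True if the role may see documents in this category."""
    # Unknown roles get an empty set, so access is denied by default.
    return category in ROLE_CATEGORIES.get(role, set())
```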

Document Annotations

Markup and Comments

The DocumentAnnotation model allows users to add annotations to documents. This is especially useful for imaging:

Drawing on an X-ray to highlight a fracture
Adding comments to a lab result
Marking areas of concern on clinical photos

Annotation Data

Annotations are stored as JSON with:

type - marker, drawing, text, highlight
content - The annotation content (text, coordinates)
page_number - For multi-page documents
created_by - Who made the annotation
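
An example of what one annotation could look like, with a small validation helper. The four top-level keys and the type values come from the list above; the shape of content (a text note plus a bounding box), the example values, and treating page_number as optional are assumptions.

```python
import json

ANNOTATION_TYPES = {"marker", "drawing", "text", "highlight"}
REQUIRED_KEYS = {"type", "content", "created_by"}  # page_number assumed optional


def validate_annotation(data):
    """Raise ValueError if the annotation is missing keys or has a bad type."""
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["type"] not in ANNOTATION_TYPES:
        raise ValueError(f"unknown annotation type: {data['type']!r}")
    return data


annotation = validate_annotation({
    "type": "highlight",
    "content": {
        "text": "Possible hairline fracture",
        # Hypothetical bounding box in image pixel coordinates.
        "bbox": {"x": 120, "y": 340, "width": 80, "height": 45},
    },
    "page_number": 1,
    "created_by": "practitioner-123",
})
payload = json.dumps(annotation)  # the JSON that would be stored
```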

FHIR-Inspired Design

The annotation model follows FHIR patterns. Annotations can reference specific regions using bounding boxes or free-form paths.

Profile Images

Special Case: Profile Photos

User and patient profile images are a special document type. They're handled by ProfileImageService which provides:

Automatic resizing to appropriate dimensions
Thumbnail generation for avatars
Update handling (replacing old images)
Cleanup of old images when replaced

Storage Path

Profile images use a different path pattern:

{organization_id}/profiles/{type}/{id}/{uuid}.{ext}

Where type is 'patient' or 'practitioner'.

Security Considerations

Organization Isolation

Documents are strictly scoped to organizations. The storage path includes organization_id. Queries filter by organization. Cross-organization document access is prevented at every layer.

Access Control

If ACL is enabled, document access follows patient access. If you can't view a patient, you can't view their documents. This is checked when generating download URLs.
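
The order of checks before a download URL is issued can be sketched as follows. This is not DocumentUrlService.get_download_url() itself; the function name, the dict shapes, and the injected can_view_patient and sign callables are assumptions made so the example stays self-contained.

```python
class AccessDenied(Exception):
    """Raised when a user may not access a document."""


def authorize_and_sign(user, document, can_view_patient, sign):
    """Check organization and patient access, then return a signed URL."""
    # Organization isolation comes first: never cross tenant boundaries.
    if user["organization_id"] != document["organization_id"]:
        raise AccessDenied("document belongs to another organization")
    # Document access follows patient access.
    if not can_view_patient(user, document["patient_id"]):
        raise AccessDenied("user cannot view this patient")
    return sign(document["url"])
```

Signing is the last step, so a URL is never generated for a request that fails either check.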

Audit Logging

Document access can be logged. When a user downloads a document, an audit entry records who accessed what and when. This supports HIPAA compliance requirements.

No Direct Storage Access

End users never get direct access to the R2 bucket. They always go through the API, which enforces authentication and authorization before generating signed URLs.

Integration with Clinical Domains

From Encounters

Many documents are uploaded in encounter context. A lab result from today's visit gets the encounter_id, linking it to that specific clinical interaction.

To AI

Documents can be sent to the AI domain for analysis. A clinical photo can be included in a multimodal chat. Lab reports can be summarized by AI.

From External Sources

Documents might come from external sources - faxed records, patient portal uploads, integration feeds. These flows use the same upload service but with different triggers.

Clean-up and Retention

Retention Policies

Healthcare documents typically have long retention requirements (7+ years). The system is designed for long-term storage, not automatic deletion.

Manual Deletion

Document deletion is restricted. Usually only administrators can delete documents, and soft-delete is preferred. The document reference is marked inactive but the file remains for compliance.

Orphan Cleanup

A maintenance process can identify orphaned files - storage files without corresponding database references. These might result from failed uploads or deleted references.
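
At its core, orphan detection is a set difference between what is in storage and what the database references. A minimal sketch, assuming both sides can be listed as storage keys (find_orphans is a hypothetical name):

```python
def find_orphans(storage_keys, referenced_keys):
    """Return storage keys that no database record points to, sorted."""
    # referenced_keys should include thumbnail paths as well as originals,
    # otherwise every thumbnail would be reported as an orphan.
    return sorted(set(storage_keys) - set(referenced_keys))
```

Given the soft-delete approach described above, a real job would likely report orphans for review rather than delete them straight away.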

Key Takeaways

R2 is the storage backend - S3-compatible object storage
Signed URLs provide access - time-limited, authenticated access
Metadata in database - DocumentReference stores metadata, not content
Thumbnails for images - automatic preview generation
Categories organize documents - lab, imaging, consent, etc.
Annotations allow markup - drawings, comments, highlights
Security is layered - organization isolation, ACL, audit logging

Next: Read about the Access Control Domain to understand security workflows
