Solo Operator Runbook

This page is the operating procedure for a controlled GiveCare pilot while one person is reviewing the system. It is internal-facing: it defines what the operator checks, when to intervene, and when to pause enrollment.

Operating Posture

GiveCare is SMS support for caregivers. Mira, its SMS assistant, can remember context, help sort out one next step, check in, and point to resources or benefits worth checking. Mira is not an emergency service, clinician, lawyer, benefits adjudicator, or care coordinator.

During a solo pilot, keep enrollment small enough that one operator can review every exception the same day. If the queue cannot be reviewed reliably, pause new enrollment.

Daily Checks

Check these at least once per active pilot day:

| Area | Admin action | What to look for |
| --- | --- | --- |
| Production flow | `productionFlow` | failed outbound messages, stuck jobs, fallback traces, human-review traces |
| Safety queue | `listSafetyEvents` | open critical or high-severity events |
| Review queue | `listReviewQueue` | blocked model replies, benefits certainty, resource issues, clinical/legal boundary flags |
| Population health | `listOrgs`, `getOrgPopulationHealth` | enrollment, consent rate, score coverage, pressure zones, open loops, opt-outs |
| Caregiver trace | `getCaregiverByPhone`, `getCaregiverTraces`, `getCaregiverMemory` | whether Mira is using memory correctly and staying inside boundaries |
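
The daily pass can be tracked as a plain checklist. The sketch below is a hypothetical helper, not part of the GiveCare admin API: it reuses the admin action names from the table only as labels, and `DailyCheck`, `DAILY_CHECKS`, and `missingChecks` are assumed names for illustration.

```typescript
// Hypothetical checklist helper. The action names come from the table above;
// the types and function names are assumptions made for this sketch.
type DailyCheck = {
  area: string;
  adminActions: string[];
};

const DAILY_CHECKS: DailyCheck[] = [
  { area: "Production flow", adminActions: ["productionFlow"] },
  { area: "Safety queue", adminActions: ["listSafetyEvents"] },
  { area: "Review queue", adminActions: ["listReviewQueue"] },
  { area: "Population health", adminActions: ["listOrgs", "getOrgPopulationHealth"] },
  {
    area: "Caregiver trace",
    adminActions: ["getCaregiverByPhone", "getCaregiverTraces", "getCaregiverMemory"],
  },
];

// Returns the areas the operator has not yet marked as checked today,
// in the same order as the table.
function missingChecks(completedAreas: readonly string[]): string[] {
  return DAILY_CHECKS
    .filter((check) => completedAreas.indexOf(check.area) === -1)
    .map((check) => check.area);
}
```

A non-empty result at the end of an active pilot day means the day's review is incomplete.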

Safety Event Procedure

1. Open the safety event and recent caregiver traces.
2. Confirm whether the deterministic safety response was sent.
3. If there is imminent danger, the SMS response should route to emergency/crisis resources; do not attempt to manage the emergency yourself.
4. If a human follow-up is appropriate, send a short message that reinforces safety and encourages local/emergency support.
5. Resolve the safety event only after the record has an outcome note.
6. If multiple safety events appear or review falls behind, pause the org or public signup path.

Operator messages should be short, factual, and boundary-safe. Do not diagnose, promise availability, or imply human clinical monitoring.
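
The last two steps of the safety procedure can be read as guards. This is a minimal sketch under stated assumptions: the `SafetyEvent` shape is hypothetical (the real record behind `listSafetyEvents` may differ), the `medium` and `low` severities are assumed, and the pause threshold is a parameter because the runbook says "multiple" without giving a number.

```typescript
// Hypothetical record shape; not taken from the GiveCare codebase.
type SafetyEvent = {
  id: string;
  severity: "critical" | "high" | "medium" | "low";
  status: "open" | "resolved";
  outcomeNote?: string;
};

// An event may be resolved only once it carries a non-empty outcome note.
function canResolve(event: SafetyEvent): boolean {
  return event.outcomeNote !== undefined && event.outcomeNote.trim().length > 0;
}

// Suggests a pause when more open critical/high events exist than the
// operator-chosen limit. The limit is an assumed parameter.
function shouldPauseForSafety(events: SafetyEvent[], maxOpenUrgent: number): boolean {
  const openUrgent = events.filter(
    (e) => e.status === "open" && (e.severity === "critical" || e.severity === "high")
  );
  return openUrgent.length > maxOpenUrgent;
}
```

The guard only checks that a note exists. Whether the note describes a real outcome is still the operator's judgment.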

Review Queue Procedure

Use the review queue for messages blocked by evaluator policy or routed to human review.

1. Read the triggering trace, model draft, flags, and recent context.
2. Decide one of: approve the block and send nothing, send a safer operator reply, or mark resolved with no reply.
3. If replying, keep it to one useful next step or one clarifying question.
4. Resolve the queue item with the reason.

Common reasons to avoid sending: legal certainty, benefits certainty, unsupported resource claims, medical advice, excessive length, or too many steps.
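
The mechanical parts of that list (length, number of questions, number of steps) can be checked before sending. The sketch below is a hypothetical pre-send check, not an existing GiveCare function. The 320-character default is an assumed limit, and the heuristics cannot detect legal, benefits, or medical certainty; those remain a human call.

```typescript
type ReplyCheck = { ok: boolean; problems: string[] };

// Flags replies that are empty, too long, ask more than one question,
// or look like a multi-step list. maxChars is an assumed default.
function checkOperatorReply(reply: string, maxChars: number = 320): ReplyCheck {
  const problems: string[] = [];
  const text = reply.trim();

  if (text.length === 0) problems.push("empty reply");
  if (text.length > maxChars) problems.push("too long");

  // Count question marks as a rough proxy for the number of questions.
  const questionCount = text.split("?").length - 1;
  if (questionCount > 1) problems.push("more than one question");

  // Lines starting with "1.", "2)", "-" or "*" suggest several steps.
  const listLines = text
    .split("\n")
    .filter((line) => /^\s*(\d+[.)]|[-*])\s+/.test(line));
  if (listLines.length > 1) problems.push("multiple steps");

  return { ok: problems.length === 0, problems };
}
```

For example, `checkOperatorReply("Would it help to call the pharmacy about the refill today?")` passes, while a reply with two question marks or a numbered list is flagged.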

Failed Delivery Procedure

If production flow shows failed outbound messages:

1. Check whether failures are isolated or systemic.
2. Confirm Twilio status if available.
3. For isolated failures, leave the caregiver record intact and retry only if the failure reason is transient.
4. For systemic failures, pause proactive outreach until delivery recovers.
5. Do not manually re-send multiple texts to the same caregiver without checking recent outbound history.
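
The isolated-versus-systemic call and the retry rule can be sketched as two small functions. Everything here is an assumption for illustration: the runbook does not define a failure-rate threshold (20% is a placeholder), and the `FailureReason` categories are hypothetical labels the operator would assign after reading the provider's failure reason.

```typescript
type FailureReason = "transient" | "permanent" | "unknown";
type DeliveryTriage = "none" | "isolated" | "systemic";

// Classifies the day's failures by the share of attempted sends that failed.
// systemicRate is an assumed threshold, not a value from the runbook.
function triageFailures(
  failed: number,
  attempted: number,
  systemicRate: number = 0.2
): DeliveryTriage {
  if (failed <= 0 || attempted <= 0) return "none";
  return failed / attempted >= systemicRate ? "systemic" : "isolated";
}

// A manual retry is allowed only for transient failures, and only after the
// operator has checked the caregiver's recent outbound history.
function mayRetry(reason: FailureReason, outboundHistoryChecked: boolean): boolean {
  return reason === "transient" && outboundHistoryChecked;
}
```

With very low daily volume a single failure can cross the rate threshold, so treat a "systemic" result as a prompt to look closer rather than as a verdict.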

Enrollment And Pilot Health

For each partner cohort, review weekly:

- enrolled caregivers
- consent granted / pending / revoked
- active caregivers in the last 7 days
- inbound and outbound message counts
- failed or pending outbound messages
- assessment completion and score coverage
- top pressure zones
- open caregiver loops
- suggested benefit/resource opportunities
- safety and review events

This is a manual report until a partner-facing dashboard is intentionally released.
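
Until that dashboard exists, a fixed record shape keeps the weekly numbers comparable across cohorts. The type below is a hypothetical sketch that mirrors the list above. The field names are illustrative, and the rate definitions (each divided by enrolled caregivers) are assumptions, since the runbook does not define how consent rate or score coverage are computed.

```typescript
// Hypothetical weekly report record; field names are illustrative.
type WeeklyCohortReport = {
  cohort: string;
  enrolled: number;
  consent: { granted: number; pending: number; revoked: number };
  activeLast7Days: number;
  messages: { inbound: number; outbound: number; failedOrPending: number };
  assessmentsCompleted: number;
  caregiversWithScore: number;
  topPressureZones: string[];
  openLoops: number;
  suggestedOpportunities: number;
  safetyEvents: number;
  reviewEvents: number;
};

// Returns part/whole, or null when there is nothing to divide by.
function rate(part: number, whole: number): number | null {
  return whole > 0 ? part / whole : null;
}

// Derived rates, each measured against enrolled caregivers (an assumption).
function summarize(report: WeeklyCohortReport) {
  return {
    consentRate: rate(report.consent.granted, report.enrolled),
    activeRate: rate(report.activeLast7Days, report.enrolled),
    scoreCoverage: rate(report.caregiversWithScore, report.enrolled),
  };
}
```

Returning `null` for an empty cohort avoids reporting a misleading 0% before anyone has enrolled.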

Data Requests

For deletion requests, use the admin wipe path only after confirming the caregiver identity and recording the request outside the caregiver record. For export requests, collect conversation, assessment, score, memory, and screening data manually until an explicit export workflow exists.

Pause Conditions

Pause enrollment or the affected org when any of these are true:

- safety or review queues cannot be checked that day
- outbound delivery is failing broadly
- repeated evaluator fallbacks indicate a broken model or prompt release
- a partner cohort exceeds the solo review capacity
- public copy or partner expectations no longer match actual system behavior
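
The conditions above combine with a simple "any of" rule, sketched here. `PilotStatus` and `pauseReasons` are hypothetical names. Each flag is an operator judgment recorded by hand, not a value read from the system.

```typescript
// Hypothetical status record; each flag is set by the operator.
type PilotStatus = {
  queuesCheckedToday: boolean;
  deliveryFailingBroadly: boolean;
  repeatedEvaluatorFallbacks: boolean;
  cohortExceedsReviewCapacity: boolean;
  copyMatchesBehavior: boolean;
};

// Returns every pause condition that currently holds.
// A non-empty result means: pause enrollment or the affected org.
function pauseReasons(status: PilotStatus): string[] {
  const reasons: string[] = [];
  if (!status.queuesCheckedToday) reasons.push("safety or review queues not checked today");
  if (status.deliveryFailingBroadly) reasons.push("outbound delivery failing broadly");
  if (status.repeatedEvaluatorFallbacks) reasons.push("repeated evaluator fallbacks");
  if (status.cohortExceedsReviewCapacity) reasons.push("cohort exceeds solo review capacity");
  if (!status.copyMatchesBehavior) reasons.push("public copy or partner expectations out of date");
  return reasons;
}
```

Returning the full list rather than a single boolean gives the operator a ready-made note on why the pause happened.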

InvisibleBench Use

InvisibleBench is the regression and safety-evaluation layer. Use it before changing model, prompt, tool, safety, or benefits behavior. It does not replace live operations: benchmark pass rates show expected behavior in known scenarios, while this runbook covers what the operator does when real caregivers trigger exceptions.