HiringCoachAI

Disaster recovery plan

Last reviewed 2026-05-19

Recovery objectives

Scope | RTO (Recovery Time) | RPO (Recovery Point)
Application (Vercel) | 1 h | 0: stateless, git-sourced
Firestore data | 4 h target | 24 h target from managed daily backups; 7-day PITR available for point-in-time recovery
Cloud Storage (backup/export buckets) | 4 h target | 24 h target for supported export workflows; bucket soft-delete/versioning evidence captured for the primary backup/export bucket
Identity (Firebase Auth) | 4 h | 0: rebuild via admin SDK export if needed
DNS | 1 h | 0
Email (SendGrid) | 8 h | 0 (queues retry)
End-to-end user-facing restore | 4 h | 24 h
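
The objectives above can also be kept as data so that a measured drill result is checked against them mechanically. The following TypeScript is a minimal sketch under stated assumptions: the type names, the meetsObjective helper, and the choice to express every target in minutes are illustrative and are not part of the existing tooling.

```typescript
// Hypothetical encoding of the recovery objectives (all values in minutes).
interface RecoveryObjective {
  scope: string;
  rtoMinutes: number; // maximum tolerated time to restore service
  rpoMinutes: number; // maximum tolerated window of lost data
}

const objectives: RecoveryObjective[] = [
  { scope: "Application (Vercel)", rtoMinutes: 60, rpoMinutes: 0 },
  { scope: "Firestore data", rtoMinutes: 240, rpoMinutes: 1440 },
  { scope: "Cloud Storage (backup/export buckets)", rtoMinutes: 240, rpoMinutes: 1440 },
  { scope: "Identity (Firebase Auth)", rtoMinutes: 240, rpoMinutes: 0 },
  { scope: "DNS", rtoMinutes: 60, rpoMinutes: 0 },
  { scope: "Email (SendGrid)", rtoMinutes: 480, rpoMinutes: 0 },
  { scope: "End-to-end user-facing restore", rtoMinutes: 240, rpoMinutes: 1440 },
];

// A measured outcome from a drill or a real incident (hypothetical shape).
interface MeasuredRecovery {
  scope: string;
  restoreMinutes: number;
  dataLossMinutes: number;
}

// Returns true only when both the RTO and the RPO for the scope were met.
function meetsObjective(
  measured: MeasuredRecovery,
  table: RecoveryObjective[] = objectives,
): boolean {
  const target = table.find((o) => o.scope === measured.scope);
  if (!target) {
    throw new Error(`No objective defined for scope: ${measured.scope}`);
  }
  return (
    measured.restoreMinutes <= target.rtoMinutes &&
    measured.dataLossMinutes <= target.rpoMinutes
  );
}
```

For example, a drill that restored Firestore data in 47 minutes with a 30-minute recovery point would pass meetsObjective for the "Firestore data" scope, while a 90-minute DNS recovery would not pass the 1 h DNS target.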

Backup strategy

Current posture verified on 2026-05-07:

  • Google Cloud provides managed platform durability and encryption for Firestore and Cloud Storage.
  • Firestore is in a US (North America) multi-region Google Cloud location, with PITR enabled for a 7-day window.
  • Firestore managed daily backups are configured with 98-day retention, and ready backup snapshots are present.
  • A separate US multi-region backup/export bucket uses Nearline storage, object versioning, public access prevention, uniform bucket-level access, 90-day soft delete, and an unlocked 90-day retention policy.
  • A manual local JSON export of Firestore can be produced with a retention-managed export script.
  • Live Google Cloud backup-configuration evidence is retained in the internal audit evidence set; restore procedures reference the verified live settings rather than checking backup configuration into the repository.
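
As one possible way to compare live settings against the posture listed above, the sketch below diffs an observed configuration against the expected values. It is illustrative only: the BackupPosture shape and the postureMismatches helper are assumptions, and how the observed values are collected from Google Cloud is deliberately left out.

```typescript
// Hypothetical summary of the backup settings described in this section.
interface BackupPosture {
  pitrWindowDays: number;
  managedBackupRetentionDays: number;
  bucketStorageClass: string;
  bucketVersioning: boolean;
  publicAccessPrevention: boolean;
  uniformBucketLevelAccess: boolean;
  softDeleteDays: number;
  retentionPolicyDays: number;
  retentionPolicyLocked: boolean;
}

// Expected values, taken from the posture verified on 2026-05-07.
const expectedPosture: BackupPosture = {
  pitrWindowDays: 7,
  managedBackupRetentionDays: 98,
  bucketStorageClass: "NEARLINE",
  bucketVersioning: true,
  publicAccessPrevention: true,
  uniformBucketLevelAccess: true,
  softDeleteDays: 90,
  retentionPolicyDays: 90,
  retentionPolicyLocked: false,
};

// Returns one human-readable line per setting that differs from the expectation.
function postureMismatches(
  observed: BackupPosture,
  expected: BackupPosture = expectedPosture,
): string[] {
  const keys = Object.keys(expected) as (keyof BackupPosture)[];
  return keys
    .filter((key) => observed[key] !== expected[key])
    .map(
      (key) =>
        `${key}: expected ${String(expected[key])}, observed ${String(observed[key])}`,
    );
}
```

An empty result means the observed settings match the documented posture; any returned line is a candidate finding for the evidence record.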

Recovery scenarios

Scenario A: Firestore data corruption or accidental deletion

Trigger: an application error, security-rule misconfiguration, or human error that causes data loss on a subset of data.

1. Identify affected paths; snapshot current state.
2. For recovery points within the 7-day PITR window, export from Firestore PITR (gcloud firestore export gs://<export-bucket> --database='(default)' --collection-ids=... --snapshot-time=<timestamp>). For older recovery points, use the latest managed backup or a confirmed export.
3. Restore to a staging project; diff against production; merge the corrected data back into production.
4. Verify with targeted queries; log evidence.

Target RTO: 1-2 h for targeted restore.
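
The choice in step 2 between PITR and a managed backup depends only on how old the desired recovery point is. The sketch below shows that decision under stated assumptions: the function names are hypothetical, the 7-day and 98-day windows come from the backup strategy above, and the minute truncation reflects the general expectation that PITR reads are taken at whole-minute timestamps.

```typescript
type RecoverySource = "pitr" | "managed-backup" | "none";

const PITR_WINDOW_DAYS = 7; // PITR window stated in this plan
const BACKUP_RETENTION_DAYS = 98; // managed daily backup retention stated in this plan
const DAY_MS = 24 * 60 * 60 * 1000;

// Picks the recovery mechanism for a desired recovery point (hypothetical helper).
function chooseRecoverySource(recoveryPoint: Date, now: Date): RecoverySource {
  const ageMs = now.getTime() - recoveryPoint.getTime();
  if (ageMs < 0) {
    throw new Error("Recovery point is in the future");
  }
  if (ageMs <= PITR_WINDOW_DAYS * DAY_MS) return "pitr";
  if (ageMs <= BACKUP_RETENTION_DAYS * DAY_MS) return "managed-backup";
  return "none";
}

// Truncates a timestamp to the start of its minute, for use as a PITR snapshot time.
function truncateToMinute(timestamp: Date): Date {
  const minuteMs = 60 * 1000;
  return new Date(Math.floor(timestamp.getTime() / minuteMs) * minuteMs);
}
```

Because managed backups are taken daily, a "managed-backup" result means restoring from the nearest available backup before the desired point, not from the exact timestamp.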

Scenario B: Full Firestore loss / region outage

1. Provision a new Firestore instance (alternate region if the original is unavailable).
2. Import the most recent verified backup or manual export.
3. Apply the version-controlled Firestore Security Rules and Firestore indexes from git.
4. Point the application at the new project (update the relevant Firebase configuration in the hosting platform).
5. Verify sign-in and core reads/writes.
6. Communicate the 24-hour data-loss window to any affected users.

Target RTO: 4 h.
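
For step 6, the data-loss window to communicate is the gap between the last verified backup and the incident. A minimal sketch of that arithmetic follows; the dataLossWindow helper and its return shape are assumptions made for illustration, with the 24 h default taken from the RPO in the recovery objectives.

```typescript
interface DataLossWindow {
  hours: number; // gap between last verified backup and incident, rounded to 0.1 h
  withinRpo: boolean; // true when the gap does not exceed the RPO
}

// Computes the window of potentially lost data (hypothetical helper).
function dataLossWindow(
  lastVerifiedBackupAt: Date,
  incidentAt: Date,
  rpoHours: number = 24,
): DataLossWindow {
  const gapMs = incidentAt.getTime() - lastVerifiedBackupAt.getTime();
  if (gapMs < 0) {
    throw new Error("Backup timestamp is later than the incident timestamp");
  }
  const hourMs = 60 * 60 * 1000;
  return {
    hours: Math.round((gapMs / hourMs) * 10) / 10,
    // Compare the unrounded gap so rounding cannot hide an RPO miss.
    withinRpo: gapMs <= rpoHours * hourMs,
  };
}
```

A backup taken at 02:00 UTC and an incident at 20:30 UTC the same day gives an 18.5-hour window, which is within the 24 h RPO.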

Scenario C: Vercel hosting outage

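1. Detect via external monitoring and the Vercel status page.
2. If the outage is prolonged: build a static bundle (next build && next export; on Next.js 14 and later, where the next export command was removed, next build with output: 'export' in next.config.js) and deploy it to Firebase Hosting as a read-only site.
3. Re-enable writes once Vercel is restored.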

Target RTO: 2 h for read-only; depends on Vercel for writes.
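
The plan does not define when an outage counts as prolonged. The sketch below shows one way to make that decision from external probe results; the ProbeResult shape, the helper names, and the 30-minute default threshold are all assumptions chosen for illustration.

```typescript
// One external health-check observation (hypothetical shape).
interface ProbeResult {
  at: Date;
  ok: boolean;
}

// Minutes since the start of the current unbroken run of failed probes, or 0 if
// the most recent probe succeeded (or there are no probes).
function currentOutageMinutes(probes: ProbeResult[], now: Date): number {
  const newestFirst = [...probes].sort((a, b) => b.at.getTime() - a.at.getTime());
  let outageStart: Date | undefined;
  for (const probe of newestFirst) {
    if (probe.ok) break; // the failure run ends at the latest successful probe
    outageStart = probe.at;
  }
  return outageStart ? (now.getTime() - outageStart.getTime()) / 60000 : 0;
}

// True when the outage has lasted at least the threshold (assumed default: 30 min).
function shouldFailOverToReadOnly(
  probes: ProbeResult[],
  now: Date,
  thresholdMinutes: number = 30,
): boolean {
  return currentOutageMinutes(probes, now) >= thresholdMinutes;
}
```

Keeping the threshold as a parameter lets the team tune it against the 2 h read-only target without changing the detection logic.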

Scenario D: Compromised credentials / production breach

See incident response for full procedure. At minimum:

1. Revoke all sessions (sessions collection purge).
2. Rotate all secrets listed in the internal secrets rotation log (available on request via [email protected]).
3. Force re-auth for all users.
4. Restore from last known-clean backup if data tampering suspected.

Scenario E: Primary incident-owner unavailability

  • Recovery relies on documented recovery procedures, recovery materials version-controlled in git, and the offline credential-recovery process.

Scenario F: Sub-processor outage

  • OpenAI → AI features degraded; failover to an alternate AI provider requires code/config changes plus DPA execution and sub-processor inventory update before production traffic flows.
  • SendGrid → Manual DNS + template switch to AWS SES or Resend (pre-staged).
  • Stripe → No automatic failover; pause signups with a banner; existing subscriptions are unaffected.
  • Deepgram / ElevenLabs → Google STT/TTS fallback wired at application layer.
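
An application-layer fallback of the kind described for Deepgram / ElevenLabs can be as small as a wrapper that tries the primary provider and falls back on failure. The sketch below is not the actual wiring: withFallback, FallbackResult, and the callback parameter are hypothetical names, and the provider calls are passed in as plain functions.

```typescript
interface FallbackResult<T> {
  value: T;
  usedFallback: boolean; // lets callers record that the degraded path was taken
}

// Tries the primary provider first; on any error, reports it and calls the fallback.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  onPrimaryError?: (error: unknown) => void,
): Promise<FallbackResult<T>> {
  try {
    return { value: await primary(), usedFallback: false };
  } catch (error) {
    if (onPrimaryError) onPrimaryError(error);
    // If the fallback also fails, its error propagates to the caller.
    return { value: await fallback(), usedFallback: true };
  }
}
```

A caller would pass, for instance, a function that calls the primary speech provider and another that calls the Google STT/TTS path, then log usedFallback so sub-processor outages show up in monitoring.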

Drill schedule

Frequency | Scope
Quarterly | Tabletop scenario review
Quarterly | Synthetic safe restore drill using npm run backup:restore-drill:safe
Weekly / on demand | Automated continuity evidence bundle (BCP/DRP documentation checks, synthetic restore verification, public health probes, and optional read-only Google Cloud backup checks)
Semi-annually | Partial restore drill (one collection to staging)
Annually | Full restore drill, RTO measured
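
To keep the calendar-based drills from slipping, the next due date can be derived from the last run and the frequency. The following is a rough sketch covering only the month-based frequencies in the schedule; the type and function names are assumptions, and dates are handled in UTC.

```typescript
type DrillFrequency = "quarterly" | "semi-annually" | "annually";

const monthsBetweenDrills: Record<DrillFrequency, number> = {
  quarterly: 3,
  "semi-annually": 6,
  annually: 12,
};

// Returns the date on which the next drill of this frequency falls due.
// Note: day-of-month overflow follows JavaScript Date rules (for example,
// 30 November plus 3 months rolls over into early March).
function nextDueDate(lastRun: Date, frequency: DrillFrequency): Date {
  const due = new Date(lastRun.getTime());
  due.setUTCMonth(due.getUTCMonth() + monthsBetweenDrills[frequency]);
  return due;
}

// True when the due date has already passed as of `today`.
function isOverdue(lastRun: Date, frequency: DrillFrequency, today: Date): boolean {
  return today.getTime() > nextDueDate(lastRun, frequency).getTime();
}
```

For a partial restore drill last run on 2026-04-24, nextDueDate with "semi-annually" yields 2026-10-24.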

Current evidence and gaps

As of 2026-05-07, live GCP evidence confirms Firestore PITR, managed daily Firestore backups with 98-day retention, and a separate US multi-region backup/export bucket with versioning, 90-day soft delete, and a 90-day retention policy. The evidence record is retained in the internal audit evidence set.

As of 2026-05-19, an automated continuity evidence bundle runs the BCP/DRP documentation-review checks, a safe synthetic restore drill, HECVAT caveat checks, and read-only health probes. The bundle complements (does not replace) the human tabletop and full live restore/failover drills.

Last-drill log

Date | Scenario | Duration | RTO achieved | RPO achieved | Findings | Record
2026-04-24 | Targeted Firestore PITR: single subcollection to staging | 47 min | ≤ 1 h ✅ | 30 min ✅ | Doc fix: prefer gcloud storage over deprecated gsutil. | Restricted evidence record
2026-05-02 | Documentation-review tabletop (automated) | <1 min | n/a (not a live restore) | n/a | 27/27 checks PASS. Drill records, sections, retention claims, sub-processor list, and repo artifacts all internally consistent. Live restore + RTO measurement deferred to next semi-annual cycle. | Internal drill record
2026-05-07 | Synthetic safe local restore drill | <1 min | n/a (no production restore) | n/a | PASS. Synthetic Firestore-shaped export restored into isolated local scratch target and checksum matched. No production data read or written. | Restricted evidence record
2026-05-14 | Automated continuity evidence bundle | <1 min | n/a (not a live restore) | n/a | PASS: 0 blocking failures. Bundle covered BCP/DRP documentation checks, synthetic restore verification, HECVAT caveat check, and read-only public health probes. | Restricted evidence record
2026-05-19 | Automated continuity evidence bundle | <1 min | n/a (not a live restore) | n/a | PASS: 0 blocking failures. Bundle covered BCP/DRP documentation checks, synthetic restore verification, HECVAT caveat check, and read-only public health probes; no production writes performed. | Restricted evidence record

Each drill appends a row here. Detailed drill records are retained in the restricted disaster-recovery evidence set.
