HiringCoachAI

Business continuity plan

Last reviewed 2026-05-19

Objective

Maintain the HiringCoachAI service for users and customers during disruption to people, technology, or third-party services. Keep customer data safe and recoverable throughout.

Business impact

Function | Tolerable downtime | Tolerable data loss
Authentication (signin, signup) | 4 h | 0
Customer-facing web app | 4 h | 0
Resume / cover-letter generation (AI) | 8 h | 0
Payment processing (Stripe) | 4 h | 0
Transactional email (SendGrid) | 8 h | 0
Admin tooling | 24 h | 0
Analytics ingestion | 7 d | 7 d

These tolerances inform the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) in the disaster recovery plan.
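
As a rough illustration of how the tolerances in the business impact table could be checked during an incident, the sketch below encodes each row and reports which limits an outage has exceeded. The dictionary keys and the `breaches` function are hypothetical names chosen for this example; only the durations come from the table.

```python
from datetime import timedelta

# (tolerable downtime, tolerable data loss) per function, copied from the table.
# The keys are illustrative labels, not identifiers used by the service.
TOLERANCES = {
    "authentication": (timedelta(hours=4), timedelta(0)),
    "web_app": (timedelta(hours=4), timedelta(0)),
    "ai_generation": (timedelta(hours=8), timedelta(0)),
    "payments": (timedelta(hours=4), timedelta(0)),
    "transactional_email": (timedelta(hours=8), timedelta(0)),
    "admin_tooling": (timedelta(hours=24), timedelta(0)),
    "analytics_ingestion": (timedelta(days=7), timedelta(days=7)),
}


def breaches(function: str, downtime: timedelta, data_loss: timedelta) -> list[str]:
    """Return which tolerances ("downtime", "data_loss") have been exceeded."""
    max_downtime, max_data_loss = TOLERANCES[function]
    exceeded = []
    if downtime > max_downtime:
        exceeded.append("downtime")
    if data_loss > max_data_loss:
        exceeded.append("data_loss")
    return exceeded


# Example: payments down for 5 hours with no data loss.
print(breaches("payments", timedelta(hours=5), timedelta(0)))
```

A zero data-loss tolerance means any confirmed loss window, however short, counts as a breach for that function.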

Key dependencies

Dependency | Role | Failure impact | Alternative
Vercel | Application hosting | App unreachable | Fail over to Firebase Hosting (static) + direct Cloud Functions (documented in DRP)
Firebase Auth | Authentication | No signin | No alternative (re-provision in another GCP project in worst case)
Firestore | Primary database | No reads/writes | Restore from backup to new Firestore project
Cloud Storage | Backup/export buckets and Cloud Functions source buckets; user file uploads are not live in production yet | Backup/export workflows affected; live user workflows continue unless they depend on restore/export operations | Alternate bucket in different region
Stripe | Payments | No new subscriptions; existing charges continue | Payments pause with customer notice
SendGrid | Email | Magic-link signin & notifications fail; in-app still works | Failover to AWS SES or Resend (requires DNS change + template migration)
OpenAI (called both directly and via Vercel AI Gateway) | AI generation | AI features degraded | Vercel AI Gateway provides a routing layer to alternate AI providers; failover would require code/config changes plus DPA execution and sub-processor inventory update for any new provider before production traffic flows
Deepgram | Transcription | Voice features disabled | Queue for later or switch to Google STT
ElevenLabs | Speech synthesis | Voice features disabled | Google TTS fallback already wired
DNS (Namecheap/Cloudflare) | Resolution | Site offline | Alternate registrar record + longer TTL
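
Several alternatives in the table follow a "try the primary, then fall back" pattern. The sketch below shows one common shape for that pattern. It is not the production wiring: the provider callables are stand-ins, and `call_with_fallback` is a hypothetical helper written for this example.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(providers: Sequence[tuple[str, Callable[[], T]]]) -> tuple[str, T]:
    """Try each (name, call) pair in order; return the first success with its name."""
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # broad on purpose: any failure moves to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in callables for illustration only; real calls would hit the vendor APIs.
def primary_tts() -> bytes:
    raise ConnectionError("primary provider unavailable")


def fallback_tts() -> bytes:
    return b"synthesized-audio"


used, audio = call_with_fallback([("ElevenLabs", primary_tts), ("Google TTS", fallback_tts)])
print(used)
```

Returning the provider name alongside the result makes it easy to log which path served the request, which is useful evidence after an incident.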

Activation triggers

The Security Officer activates the BCP when any of the following occurs:

  • Customer-facing downtime exceeds 1 h
  • Data loss is confirmed or suspected
  • Security incident at Severity 1 or 2 (per incident response)
  • Loss of access to production systems
  • Natural disaster or personnel incident affecting the Security Officer
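
The triggers above can be read as a simple "any of" rule. The following sketch, with hypothetical field and function names, shows one way to evaluate them and report which triggers fired. It assumes that severity is an integer where 1 is the most severe, and that "exceeds 1 h" means strictly more than one hour.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class IncidentState:
    # All field names are illustrative; they mirror the trigger list one to one.
    customer_facing_downtime: timedelta = timedelta(0)
    data_loss_confirmed_or_suspected: bool = False
    security_severity: Optional[int] = None  # None means no security incident
    production_access_lost: bool = False
    security_officer_affected: bool = False


def activation_reasons(state: IncidentState) -> list[str]:
    """Return the triggers that fired; the BCP activates if the list is non-empty."""
    reasons = []
    if state.customer_facing_downtime > timedelta(hours=1):
        reasons.append("customer-facing downtime exceeds 1 h")
    if state.data_loss_confirmed_or_suspected:
        reasons.append("data loss confirmed or suspected")
    if state.security_severity in (1, 2):
        reasons.append("security incident at Severity 1 or 2")
    if state.production_access_lost:
        reasons.append("loss of access to production systems")
    if state.security_officer_affected:
        reasons.append("disaster or personnel incident affecting the Security Officer")
    return reasons


print(activation_reasons(IncidentState(customer_facing_downtime=timedelta(minutes=75))))
```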

Response structure

Role | Responsibility
Incident Commander | Security Officer. Coordinates response, declares severity, authorizes customer comms.
Technical Lead | Diagnoses, orchestrates recovery.
Communications Lead | Updates status page, sends customer comms. Often same person as IC in small-team mode.

Contact info and escalation paths are maintained in an internal on-call register; the current Incident Commander is reachable at [email protected].

Communications

  • Status page: hiringcoach.ai/status, updated within 30 min of a declared incident.
  • Customer email: sent via the backup SES/Resend channel if SendGrid is the failing dependency.
  • Regulatory: per breach notification, if personal data is affected.
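
To make the timing and channel rules above concrete, here is a minimal sketch that computes the status page deadline and picks the customer email channel. The function names and channel labels are assumptions made for this example, not real configuration keys.

```python
from datetime import datetime, timedelta, timezone

# From the plan: the status page is updated within 30 min of a declared incident.
STATUS_PAGE_WINDOW = timedelta(minutes=30)


def status_update_deadline(declared_at: datetime) -> datetime:
    """Latest time by which the first status page update should be posted."""
    return declared_at + STATUS_PAGE_WINDOW


def customer_email_channel(failing_dependencies: set[str]) -> str:
    """Use the backup channel only when SendGrid itself is failing."""
    if "SendGrid" in failing_dependencies:
        return "backup (SES or Resend)"
    return "SendGrid"


declared = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
print(status_update_deadline(declared).isoformat())
print(customer_email_channel({"SendGrid", "Deepgram"}))
```

Using timezone-aware timestamps avoids ambiguity when the deadline is compared against log entries recorded in UTC.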

Continuity of operations

  • The source of truth for policies, config, and code lives in git.
  • Evidence of Firestore PITR, managed daily Firestore backups with 98-day retention, and the US multi-region backup/export bucket is retained in the restricted continuity evidence set.
  • Business-continuity recovery relies on documented continuity procedures, recovery materials version-controlled in git, and the Incident Commander role defined in this plan.

Testing

  • Quarterly: Tabletop scenario review (15 min, Security Officer with any additional engineer available).
  • Annually: Full restore drill (logged in DRP).
  • Post-incident: Plan update within 30 days.
  • Weekly / on demand: An automated continuity evidence bundle runs on a scheduled workflow or on demand. It exercises BCP/DRP documentation reviews, a synthetic safe restore drill, public health probes, and optional read-only Google Cloud backup checks when credentials are available; results are archived in the internal drill evidence set. This complements, and does not replace, the human tabletop and full live restore/failover drills.
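
The cadence above lends itself to a simple overdue check. The sketch below is one possible reading, under stated assumptions: "quarterly" is treated as at most 92 days between sessions, "annually" as at most 366 days, and the 30-day plan update is counted from the date the incident is closed. None of these interpretations, nor the names used, come from the plan itself.

```python
from datetime import date, timedelta

# Assumed maximum gaps between exercises; the plan only says "quarterly" and "annually".
MAX_INTERVAL = {
    "tabletop": timedelta(days=92),
    "full_restore_drill": timedelta(days=366),
}

# From the plan: post-incident plan update within 30 days.
PLAN_UPDATE_WINDOW = timedelta(days=30)


def is_overdue(activity: str, last_done: date, today: date) -> bool:
    """True if more than the assumed maximum interval has passed since last_done."""
    return today - last_done > MAX_INTERVAL[activity]


def plan_update_due(incident_closed: date) -> date:
    """Date by which the plan should be updated (assumes counting from closure)."""
    return incident_closed + PLAN_UPDATE_WINDOW


print(is_overdue("tabletop", date(2026, 1, 1), date(2026, 5, 19)))
print(plan_update_due(date(2026, 5, 19)))
```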

Last-drill log

Date | Scope | Outcome | Record
2026-04-24 | Targeted Firestore PITR restore to staging (single collection, DRP-linked continuity restore drill) | PASS: partial-scope RTO 47 min; RPO 30 min | Internal drill record
2026-05-02 | Documentation-review tabletop (automated) | PASS: 23/23 checks | Internal drill record
2026-05-14 | Automated continuity evidence bundle | PASS: 0 blocking failures across BCP and DRP documentation checks, synthetic restore, HECVAT caveat check, and read-only health probes | Internal drill record
2026-05-19 | Automated continuity evidence bundle | PASS: 0 blocking failures across BCP and DRP documentation checks, synthetic restore, HECVAT caveat check, and read-only public health probes; no production writes performed | Internal drill record

Each drill appends a row here. The automated runner verifies that the required BCP sections are present, that cross-references resolve, and that the dependency list matches the sub-processors list. Targeted restore drills, human tabletop sessions, and full live failover exercises are also tracked here, each with its own dated record.
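
The runner's internals are not described here, so the following is only a sketch of how a dependency-list parity check might work: compare the two name lists as sets and report what is missing on each side. The function name and the sample lists are hypothetical.

```python
def parity_gaps(bcp_dependencies: list[str], sub_processors: list[str]) -> dict[str, list[str]]:
    """Report names present in one list but absent from the other."""
    bcp, subs = set(bcp_dependencies), set(sub_processors)
    return {
        "missing_from_sub_processors": sorted(bcp - subs),
        "missing_from_bcp": sorted(subs - bcp),
    }


# Sample inputs for illustration; they are not the real inventories.
gaps = parity_gaps(
    ["Vercel", "Stripe", "SendGrid", "Deepgram"],
    ["Vercel", "Stripe", "SendGrid", "ElevenLabs"],
)
print(gaps)

# A check like this would pass only when both lists are empty.
in_parity = not any(gaps.values())
print(in_parity)
```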

Resumption

An "all clear" is declared by the Incident Commander when:

  1. Service is restored.
  2. Data integrity is verified.
  3. Root cause is understood.
  4. Short-term mitigation is in place.

Long-term remediation is tracked in the post-mortem (see incident response).
