Business continuity plan

Last reviewed 2026-05-19

Objective

Maintain the HiringCoachAI service for users and customers during disruptions affecting people, technology, or third-party services, and keep customer data safe and recoverable throughout.

Business impact

Function	Tolerable downtime	Tolerable data loss
Authentication (signin, signup)	4 h	0
Customer-facing web app	4 h	0
Resume / cover-letter generation (AI)	8 h	0
Payment processing (Stripe)	4 h	0
Transactional email (SendGrid)	8 h	0
Admin tooling	24 h	0
Analytics ingestion	7 d	7 d

These tolerances inform the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) in the disaster recovery plan.
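The business impact table can be kept as data so that incident and drill timelines are checked against it mechanically rather than by eye. Below is a minimal Python sketch under that assumption; `Tolerance`, `BUSINESS_IMPACT`, and `within_tolerance` are hypothetical names, not existing HiringCoachAI tooling, and the values are transcribed from the table.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tolerance:
    """Tolerable downtime (drives RTO) and tolerable data loss (drives RPO) for one function."""
    downtime: timedelta
    data_loss: timedelta


# Hypothetical encoding of the business impact table above.
BUSINESS_IMPACT = {
    "Authentication (signin, signup)": Tolerance(timedelta(hours=4), timedelta(0)),
    "Customer-facing web app": Tolerance(timedelta(hours=4), timedelta(0)),
    "Resume / cover-letter generation (AI)": Tolerance(timedelta(hours=8), timedelta(0)),
    "Payment processing (Stripe)": Tolerance(timedelta(hours=4), timedelta(0)),
    "Transactional email (SendGrid)": Tolerance(timedelta(hours=8), timedelta(0)),
    "Admin tooling": Tolerance(timedelta(hours=24), timedelta(0)),
    "Analytics ingestion": Tolerance(timedelta(days=7), timedelta(days=7)),
}


def within_tolerance(function: str, outage: timedelta, data_lost: timedelta) -> bool:
    """True if an observed outage and data loss both stay within the tolerances for `function`.

    Raises KeyError for a function that is not in the table.
    """
    tol = BUSINESS_IMPACT[function]
    return outage <= tol.downtime and data_lost <= tol.data_loss
```

For example, a 3 h Admin tooling outage with no data loss is within tolerance, while a 5 h outage of the customer-facing web app is not. The `KeyError` for an unknown function is deliberate in this sketch: a function missing from the table should be added to it rather than silently assumed tolerable.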

Key dependencies

Dependency	Role	Failure impact	Alternative
Vercel	Application hosting	App unreachable	Fail over to Firebase Hosting (static) + direct Cloud Functions (documented in DRP)
Firebase Auth	Authentication	No signin	No alternative (re-provision in another GCP project in worst case)
Firestore	Primary database	No reads/writes	Restore from backup to new Firestore project
Cloud Storage	Backup/export buckets and Cloud Functions source buckets; user file uploads are not live in production yet	Backup/export workflows affected; live user workflows continue unless they depend on restore/export operations	Alternate bucket in different region
Stripe	Payments	No new subscriptions; existing charges continue	Payments pause with customer notice
SendGrid	Email	Magic-link signin & notifications fail; in-app still works	Failover to AWS SES or Resend (requires DNS change + template migration)
OpenAI (called both directly and via Vercel AI Gateway)	AI generation	AI features degraded	Vercel AI Gateway provides a routing layer to alternate AI providers; failover would require code/config changes plus DPA execution and sub-processor inventory update for any new provider before production traffic flows
Deepgram	Transcription	Voice features disabled	Queue for later or switch to Google STT
ElevenLabs	Speech synthesis	Voice features disabled	Google TTS fallback already wired
DNS (Namecheap/Cloudflare)	Resolution	Site offline	Alternate registrar record + longer TTL

Activation triggers

The Security Officer activates the BCP when any of the following occurs:

Customer-facing downtime exceeds 1 h
Data loss is confirmed or suspected
Security incident at Severity 1 or 2 (per incident response)
Loss of access to production systems
Natural disaster or personnel incident affecting the Security Officer
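The activation triggers reduce to a simple any-of rule. The sketch below assumes a hypothetical `IncidentState` record filled in by the Security Officer, and assumes Severity 1 is the most severe level (the severity scale itself is defined in the incident response plan, not here). It returns the triggers that are met rather than a bare boolean, so the reasons can be quoted when the BCP is activated.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class IncidentState:
    """Hypothetical snapshot of an ongoing incident."""
    customer_downtime: timedelta = timedelta(0)
    data_loss_confirmed_or_suspected: bool = False
    security_severity: Optional[int] = None  # assumed: 1 = most severe; None if not a security incident
    production_access_lost: bool = False
    disaster_or_personnel_incident: bool = False


def activation_reasons(s: IncidentState) -> List[str]:
    """Return every BCP activation trigger met by `s`; a non-empty list means activate."""
    reasons = []
    # "Exceeds 1 h" is read as strictly greater than one hour.
    if s.customer_downtime > timedelta(hours=1):
        reasons.append("customer-facing downtime > 1 h")
    if s.data_loss_confirmed_or_suspected:
        reasons.append("data loss confirmed or suspected")
    if s.security_severity in (1, 2):
        reasons.append(f"security incident at Severity {s.security_severity}")
    if s.production_access_lost:
        reasons.append("loss of access to production systems")
    if s.disaster_or_personnel_incident:
        reasons.append("natural disaster or personnel incident affecting the Security Officer")
    return reasons
```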

Response structure

Role	Responsibility
Incident Commander	Security Officer. Coordinates response, declares severity, authorizes customer comms.
Technical Lead	Diagnoses, orchestrates recovery.
Communications Lead	Updates status page, sends customer comms. Often same person as IC in small-team mode.

Contact info and escalation paths are maintained in an internal on-call register; the current Incident Commander is reachable at [email protected].

Communications

Status page: hiringcoach.ai/status, updated within 30 min of a declared incident.
Customer email: sent via the backup SES/Resend channel if SendGrid is the failing dependency.
Regulatory: handled per the breach notification policy if personal data is affected.
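The 30-minute status page target and the SendGrid fallback rule can be expressed as small helpers. This is a sketch only: `status_update_due`, `status_update_late`, and `email_channel` are hypothetical names, and the channel labels are placeholders rather than real provider configuration.

```python
from datetime import datetime, timedelta
from typing import Set

# Target from the communications list: status page updated within 30 min of a declared incident.
STATUS_PAGE_TARGET = timedelta(minutes=30)


def status_update_due(declared_at: datetime) -> datetime:
    """Latest time by which the status page should reflect the declared incident."""
    return declared_at + STATUS_PAGE_TARGET


def status_update_late(declared_at: datetime, first_update_at: datetime) -> bool:
    """True if the first status page update missed the 30-minute target."""
    return first_update_at > status_update_due(declared_at)


def email_channel(failing_dependencies: Set[str]) -> str:
    """Use SendGrid normally; switch to the backup channel only when SendGrid itself is failing."""
    return "SES/Resend (backup)" if "SendGrid" in failing_dependencies else "SendGrid"
```

Timestamps should be timezone-aware (for example UTC) so that declaration and update times recorded by different people compare correctly.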

Continuity of operations

The source of truth for policies, config, and code lives in git.
Firestore PITR, managed daily Firestore backups with 98-day retention, and US multi-region backup/export bucket evidence are retained in the restricted continuity evidence set.
Business-continuity recovery relies on documented continuity procedures, recovery materials version-controlled in git, and the Incident Commander role defined in this plan.

Testing

Quarterly: Tabletop scenario review (15 min, Security Officer plus any available engineer).
Annually: Full restore drill (logged in DRP).
Post-incident: Plan update within 30 days.
Weekly / on demand: an automated continuity evidence bundle runs on a scheduled workflow or on demand. It exercises BCP/DRP documentation reviews, a synthetic safe restore drill, public health probes, and optional read-only Google Cloud backup checks when credentials are available; results are archived in the internal drill evidence set. This complements (does not replace) the human tabletop and full live restore/failover drills.
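The drill log reports automated runs as "PASS: 0 blocking failures". Below is a minimal sketch of how such a summary could be derived, assuming each check reports pass, fail, or skipped (skipped covering, for instance, the optional Google Cloud backup checks when credentials are unavailable) and carries a hypothetical `blocking` flag. `Status`, `CheckResult`, and `summarize` are illustrative names, not the actual runner's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIPPED = "skipped"  # e.g. optional read-only cloud checks when no credentials are available


@dataclass(frozen=True)
class CheckResult:
    name: str
    status: Status
    blocking: bool = True  # hypothetical flag: does a failure of this check fail the whole bundle?


def summarize(results: List[CheckResult]) -> str:
    """Summarize a bundle run; only blocking failures change the outcome to FAIL."""
    blocking_failures = [r.name for r in results if r.blocking and r.status is Status.FAIL]
    n = len(blocking_failures)
    outcome = "PASS" if n == 0 else "FAIL"
    return f"{outcome}: {n} blocking failure{'' if n == 1 else 's'}"
```

In this design a skipped check never fails the bundle, which matches the "optional ... when credentials are available" wording; non-blocking failures are still visible in the individual results for follow-up.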

Last-drill log

Date	Scope	Outcome	Record
2026-04-24	Targeted Firestore PITR restore to staging (single collection, DRP-linked continuity restore drill)	PASS: partial-scope RTO 47 min; RPO 30 min	Internal drill record
2026-05-02	Documentation-review tabletop (automated)	PASS: 23/23 checks	Internal drill record
2026-05-14	Automated continuity evidence bundle	PASS: 0 blocking failures across BCP and DRP documentation checks, synthetic restore, HECVAT caveat check, and read-only health probes	Internal drill record
2026-05-19	Automated continuity evidence bundle	PASS: 0 blocking failures across BCP and DRP documentation checks, synthetic restore, HECVAT caveat check, and read-only public health probes; no production writes performed	Internal drill record

Each drill appends a row here. The automated runner verifies the BCP sections, cross-reference resolution, and parity between the dependency list above and the sub-processor list. Targeted restore drills, human tabletop sessions, and full live failover exercises are tracked here too, each with its own dated record.
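The dependency-list parity check compares vendor names in both directions. The sketch below assumes both lists are available as sets of names and that some BCP rows (Firebase Auth, Firestore, Cloud Storage) roll up to a single vendor entry on the sub-processor list; the `VENDOR_OF` mapping and the vendor name "Google Cloud" are hypothetical, since the actual sub-processor list is not reproduced here.

```python
from typing import Dict, Set

# Hypothetical roll-up from BCP dependency rows to sub-processor vendor names.
VENDOR_OF: Dict[str, str] = {
    "Firebase Auth": "Google Cloud",
    "Firestore": "Google Cloud",
    "Cloud Storage": "Google Cloud",
}


def parity_gaps(bcp_dependencies: Set[str], sub_processors: Set[str]) -> Dict[str, Set[str]]:
    """Report names present on one list but not the other, after rolling BCP rows up to vendors."""
    vendors = {VENDOR_OF.get(dep, dep) for dep in bcp_dependencies}
    return {
        "missing_from_bcp": set(sub_processors) - vendors,
        "missing_from_sub_processors": vendors - set(sub_processors),
    }
```

Reporting both directions matters: a sub-processor missing from the BCP has no documented failure impact or alternative, and a BCP dependency missing from the sub-processor list needs review.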

Resumption

The Incident Commander declares an "all clear" when all of the following hold:
1. Service is restored.
2. Data integrity is verified.
3. Root cause is understood.
4. Short-term mitigation is in place.
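The all-clear criteria act as a checklist in which every item must hold. A minimal sketch, with `ResumptionChecklist` and `outstanding` as hypothetical names, that lists the criteria still unmet so the Incident Commander can see what blocks resumption.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ResumptionChecklist:
    """Hypothetical record of the four all-clear criteria, in the order listed above."""
    service_restored: bool = False
    data_integrity_verified: bool = False
    root_cause_understood: bool = False
    short_term_mitigation_in_place: bool = False


def outstanding(c: ResumptionChecklist) -> List[str]:
    """Criteria not yet met; an all clear is appropriate only when this list is empty."""
    return [f.name for f in fields(c) if not getattr(c, f.name)]
```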

Long-term remediation is tracked in the post-mortem (see incident response).

disaster recovery plan: technical recovery procedures
incident response: severity levels, comms, post-mortems
breach notification: regulatory timelines
