Part 09 of 13

Shipping Code

Three environments with promotion gates at each stage. A 7-stage CI pipeline. Auto-rollback on production failures. AI-powered diagnosis and auto-remediation.

Three Environments

Code flows through three environments, each with its own database, auth instance, and third-party service configuration.

[Diagram: Development (main branch, all tests run on every push) -> promote -> Staging (staging branch, full re-run + perf benchmarks) -> promote -> Production (prod branch, smoke tests + auto-rollback)]

Environment | Branch  | Database            | Stripe    | AI Calls         | Alerts
Development | main    | Dev project         | Test mode | Real (dev keys)  | Console only
Staging     | staging | Staging project     | Test mode | Real (dev keys)  | Email
Production  | prod    | Production (scaled) | Live mode | Real (prod keys) | Email + PagerDuty
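
The table above could be captured as a typed configuration object so that each deployment resolves its settings from the branch name. This is a minimal sketch: the type names, field names, and the `environmentForBranch` helper are hypothetical, and only the values come from the table.

```typescript
// Hypothetical config shape; the values mirror the environment table.
type EnvironmentName = "development" | "staging" | "production";
type AlertChannel = "console" | "email" | "pagerduty";

interface EnvironmentConfig {
  branch: string;
  database: string;
  stripeMode: "test" | "live";
  aiKeys: "dev" | "prod";
  alerts: readonly AlertChannel[];
}

const ENVIRONMENTS: Record<EnvironmentName, EnvironmentConfig> = {
  development: {
    branch: "main",
    database: "Dev project",
    stripeMode: "test",
    aiKeys: "dev",
    alerts: ["console"],
  },
  staging: {
    branch: "staging",
    database: "Staging project",
    stripeMode: "test",
    aiKeys: "dev",
    alerts: ["email"],
  },
  production: {
    branch: "prod",
    database: "Production (scaled)",
    stripeMode: "live",
    aiKeys: "prod",
    alerts: ["email", "pagerduty"],
  },
};

// Resolve the environment for a branch; unknown branches map to no environment.
function environmentForBranch(branch: string): EnvironmentName | undefined {
  const names = Object.keys(ENVIRONMENTS) as EnvironmentName[];
  return names.find((name) => ENVIRONMENTS[name].branch === branch);
}
```

Keeping the three environments in one record makes it harder for live Stripe keys or PagerDuty alerts to leak into a non-production deployment by accident.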

The 7-Stage CI Pipeline

Every push to main triggers a 7-stage pipeline. Every stage is a hard gate: if any stage fails, the pipeline fails, and nothing moves to staging unless all seven pass.

Stage                     | What Runs                                                     | Target Time
1. Install                | pnpm install, dependency audit                                | <30s
2. Lint + Format          | ESLint, Biome, import restrictions                            | <20s
3. Typecheck              | TypeScript strict mode across all packages                    | <40s
4. Unit tests             | Vitest: pure functions, validation, business logic            | <60s
5. Integration tests      | API routes with real DB (Neon), RLS verification, metering    | <90s
6. E2E tests (Tier 1)     | Critical user flows per app, Playwright headless              | <120s
7. Security + Performance | npm audit, bundle size check, Lighthouse scores               | <60s

Total pipeline target: under 7 minutes. If it exceeds 10 minutes, optimize before adding features.
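
The per-stage targets add up to exactly the 7-minute budget (30 + 20 + 40 + 60 + 90 + 120 + 60 = 420 seconds). A small check like the one below could classify a measured run against both thresholds. It is a sketch only: the `budgetStatus` function and its status labels are hypothetical names, not part of any CI tool.

```typescript
interface Stage {
  name: string;
  targetSeconds: number;
}

// Stage targets copied from the pipeline table.
const STAGES: readonly Stage[] = [
  { name: "Install", targetSeconds: 30 },
  { name: "Lint + Format", targetSeconds: 20 },
  { name: "Typecheck", targetSeconds: 40 },
  { name: "Unit tests", targetSeconds: 60 },
  { name: "Integration tests", targetSeconds: 90 },
  { name: "E2E tests (Tier 1)", targetSeconds: 120 },
  { name: "Security + Performance", targetSeconds: 60 },
];

const TARGET_TOTAL_SECONDS = 7 * 60; // "under 7 minutes"
const OPTIMIZE_THRESHOLD_SECONDS = 10 * 60; // "exceeds 10 minutes"

// Hypothetical labels for the three possible outcomes.
type BudgetStatus = "within-target" | "over-target" | "optimize-first";

function totalSeconds(durations: readonly number[]): number {
  return durations.reduce((sum, seconds) => sum + seconds, 0);
}

// Classify a run from its measured per-stage durations (in seconds).
function budgetStatus(measuredSeconds: readonly number[]): BudgetStatus {
  const total = totalSeconds(measuredSeconds);
  if (total > OPTIMIZE_THRESHOLD_SECONDS) return "optimize-first";
  if (total < TARGET_TOTAL_SECONDS) return "within-target";
  return "over-target";
}
```

A run classified as `optimize-first` is the signal to stop adding features and speed up the pipeline; `over-target` is a warning that the budget is eroding.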

Promotion Gates

Main to Staging

Promotion is a manual trigger (not automatic on every push). The orchestrator creates a PR from main to staging, and the full test suite re-runs against the staging environment, together with performance benchmarks.

If any test fails, staging promotion is blocked. The failure enters the diagnosis flow.

Staging to Production

After staging passes completely:

  1. PR created from staging to prod.
  2. Deployment triggered to production environment.
  3. 7-point smoke test runs against production (health endpoints, auth flow, DB read, file access, AI health, email health, payment health).
  4. SLO validation: measure query response times for the first 5 minutes and compare them against targets.
  5. Error monitoring: Sentry error rate compared against pre-deployment baseline. If error rate increases >50% or a new error type appears, trigger rollback.
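
The rollback rule in step 5 can be expressed as a small pure function. The sketch below assumes error data has already been fetched from monitoring and reduced to a rate plus a list of error types; the `ErrorSnapshot` shape and the zero-baseline handling are assumptions, not something the pipeline description specifies.

```typescript
// Hypothetical summary of monitoring data for one time window.
interface ErrorSnapshot {
  errorsPerMinute: number;
  errorTypes: readonly string[];
}

const MAX_RATE_INCREASE = 0.5; // roll back when the rate rises by more than 50%

function shouldRollback(baseline: ErrorSnapshot, current: ErrorSnapshot): boolean {
  // Any error type not seen before the deployment triggers rollback.
  const known = new Set(baseline.errorTypes);
  if (current.errorTypes.some((type) => !known.has(type))) return true;

  // Assumption: with a zero baseline, any error at all counts as a spike.
  if (baseline.errorsPerMinute === 0) return current.errorsPerMinute > 0;

  const increase =
    (current.errorsPerMinute - baseline.errorsPerMinute) / baseline.errorsPerMinute;
  return increase > MAX_RATE_INCREASE;
}
```

For example, a baseline of 10 errors per minute tolerates up to 15 after deployment; 16 per minute, or a single previously unseen error type, would trigger the rollback.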

Auto-Rollback

Production failures trigger instant rollback with zero human intervention:

  1. Smoke test failure or error rate spike detected.
  2. Deployment platform reverts all apps to the previous production deployment instantly (zero downtime, because the old deployment stays warm).
  3. Failure notification sent: email with deployment SHA, failure reason, error logs, monitoring links.
  4. Issue created in the issue tracker with: full diagnosis, affected apps, Sentry link, recommended fix.
  5. Diagnosis flow begins automatically.

Ideally, rollback never happens. The multi-layer testing in dev and staging should catch every issue before production. Rollback is the safety net, not the workflow. If rollbacks happen more than once per month, the test suite has gaps.
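
The "more than once per month" rule is easy to monitor if rollback timestamps are recorded. The sketch below interprets "per month" as a trailing 30-day window, which is an assumption; the function names are hypothetical.

```typescript
// Assumption: "per month" is treated as a trailing 30-day window.
const WINDOW_MS = 30 * 24 * 60 * 60 * 1000;

// Count rollbacks that happened within the window ending at `now`.
function rollbacksInWindow(rollbackTimes: readonly Date[], now: Date): number {
  const end = now.getTime();
  const start = end - WINDOW_MS;
  return rollbackTimes.filter((t) => t.getTime() > start && t.getTime() <= end).length;
}

// More than one rollback in the window suggests the test suite has gaps.
function testSuiteHasGaps(rollbackTimes: readonly Date[], now: Date): boolean {
  return rollbacksInWindow(rollbackTimes, now) > 1;
}
```

A check like this could run after every rollback and add a note to the created issue when the threshold is crossed.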

AI-Powered Diagnosis and Auto-Remediation

When any test fails (in dev, staging, or production), an AI agent on a separate machine diagnoses the failure, proposes a fix, and in some cases applies it automatically.

Three-Tier Auto-Fix

Tier | Fix Type               | Action                       | Examples
1    | Simple, safe           | Auto-commit to main          | Lint errors, missing imports, type mismatches, flaky test retry
2    | Moderate, needs review | Create PR for human approval | API contract changes, schema fixes, dependency updates
3    | Complex or risky       | Create issue with diagnosis  | Data integrity issues, security vulnerabilities, multi-service failures
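
One way to encode the tier table is as two lookup maps: failure category to tier, and tier to action. The category labels below are hypothetical identifiers derived from the table's examples; a real agent would presumably classify failures from logs rather than from a fixed list.

```typescript
type Tier = 1 | 2 | 3;
type Action = "auto-commit-to-main" | "open-pr-for-review" | "open-issue-with-diagnosis";

const ACTION_BY_TIER: Record<Tier, Action> = {
  1: "auto-commit-to-main",
  2: "open-pr-for-review",
  3: "open-issue-with-diagnosis",
};

// Hypothetical category labels taken from the examples column.
type FailureCategory =
  | "lint-error"
  | "missing-import"
  | "type-mismatch"
  | "flaky-test"
  | "api-contract-change"
  | "schema-fix"
  | "dependency-update"
  | "data-integrity"
  | "security-vulnerability"
  | "multi-service-failure";

const TIER_BY_CATEGORY: Record<FailureCategory, Tier> = {
  "lint-error": 1,
  "missing-import": 1,
  "type-mismatch": 1,
  "flaky-test": 1,
  "api-contract-change": 2,
  "schema-fix": 2,
  "dependency-update": 2,
  "data-integrity": 3,
  "security-vulnerability": 3,
  "multi-service-failure": 3,
};

// Look up what the agent is allowed to do for a given failure category.
function actionFor(category: FailureCategory): Action {
  return ACTION_BY_TIER[TIER_BY_CATEGORY[category]];
}
```

Because both maps are typed as `Record`, the compiler rejects a new category that has no tier assigned, which keeps the auto-commit path from silently widening.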

The Diagnosis Flow

  1. Failure detected (CI stage, staging test, production smoke test).
  2. AI agent checks out the codebase on the external machine.
  3. Reads the failure logs, error traces, and test output.
  4. Identifies the root cause and categorizes it (Tier 1, 2, or 3).
  5. For Tier 1: generates fix, runs tests locally, commits directly to main.
  6. For Tier 2: generates fix, creates PR, requests human review.
  7. For Tier 3: creates issue with diagnosis, recommended approach, and estimated complexity.
  8. After fix is merged: re-triggers the promotion flow (max 3 diagnosis cycles before hard escalation).
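
The cap in step 8 (at most 3 diagnosis cycles before hard escalation) amounts to a bounded retry loop. The sketch below models one cycle as a synchronous callback that returns whether the fix was merged and promotion succeeded; that callback and the result shape are simplifying assumptions, since real cycles would be asynchronous and involve CI runs.

```typescript
const MAX_DIAGNOSIS_CYCLES = 3;

interface DiagnosisResult {
  outcome: "promoted" | "escalated";
  cyclesUsed: number;
}

// `attemptFix` is a hypothetical stand-in for one diagnose, fix, re-promote cycle.
// It receives the 1-based cycle number and returns true when promotion succeeded.
function runDiagnosisCycles(attemptFix: (cycle: number) => boolean): DiagnosisResult {
  for (let cycle = 1; cycle <= MAX_DIAGNOSIS_CYCLES; cycle++) {
    if (attemptFix(cycle)) {
      return { outcome: "promoted", cyclesUsed: cycle };
    }
  }
  // All cycles were used without a passing promotion: hand off to a human.
  return { outcome: "escalated", cyclesUsed: MAX_DIAGNOSIS_CYCLES };
}
```

The hard cap matters because an agent that keeps committing fixes for its own failed fixes can loop indefinitely; three attempts bounds the cost before a human takes over.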

Linear Example: Deployment Pipeline

Linear ships updates continuously. Their deployment pipeline uses canary deployments (internal dogfooding before general availability). When a new build is ready, Linear employees use it for their own project management first. If no issues surface after an internal testing window, it rolls out to all users.

For a solo founder, the staging environment serves the same purpose: it is your personal testing ground before customers see the code. The key difference is that your testing is automated, not manual.