Part 09 of 13

Shipping Code

Three environments with promotion gates at each stage. A 7-stage CI pipeline. Auto-rollback on production failures. AI-powered diagnosis and auto-remediation.

Three Environments

Code flows through three environments, each with its own database, auth instance, and third-party service configuration.

Environment	Branch	Database	Stripe	AI Calls	Alerts
Development	main	Dev project	Test mode	Real (dev keys)	Console only
Staging	staging	Staging project	Test mode	Real (dev keys)	Email
Production	prod	Production (scaled)	Live mode	Real (prod keys)	Email + PagerDuty
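
As a rough illustration, the table above could be captured as a typed configuration object so that each deploy resolves its settings from the branch name. This is a minimal sketch: the names `ENVIRONMENTS`, `EnvironmentConfig`, and `environmentForBranch` are hypothetical and not part of any particular platform.

```typescript
// Hypothetical typed config mirroring the environment table.
type EnvironmentName = "development" | "staging" | "production";
type AlertChannel = "console" | "email" | "pagerduty";

interface EnvironmentConfig {
  branch: string;
  database: string;
  stripeMode: "test" | "live";
  aiKeys: "dev" | "prod";
  alerts: readonly AlertChannel[];
}

const ENVIRONMENTS: Record<EnvironmentName, EnvironmentConfig> = {
  development: {
    branch: "main",
    database: "dev",
    stripeMode: "test",
    aiKeys: "dev",
    alerts: ["console"],
  },
  staging: {
    branch: "staging",
    database: "staging",
    stripeMode: "test",
    aiKeys: "dev",
    alerts: ["email"],
  },
  production: {
    branch: "prod",
    database: "production",
    stripeMode: "live",
    aiKeys: "prod",
    alerts: ["email", "pagerduty"],
  },
};

// Resolve the environment for a branch; unknown branches map to nothing,
// so a feature branch can never pick up production settings by accident.
function environmentForBranch(branch: string): EnvironmentName | undefined {
  const names = Object.keys(ENVIRONMENTS) as EnvironmentName[];
  return names.find((name) => ENVIRONMENTS[name].branch === branch);
}
```

Keeping the mapping in one typed object means the compiler flags a missing field when a new environment or setting is added.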

The 7-Stage CI Pipeline

Every push to main triggers a 7-stage pipeline. All stages are hard fails: nothing moves to staging unless every stage passes.

Stage	What Runs	Target Time
1. Install	pnpm install, dependency audit	<30s
2. Lint + Format	ESLint, Biome, import restrictions	<20s
3. Typecheck	TypeScript strict mode across all packages	<40s
4. Unit tests	Vitest: pure functions, validation, business logic	<60s
5. Integration tests	API routes with real DB (Neon), RLS verification, metering	<90s
6. E2E tests (Tier 1)	Critical user flows per app, Playwright headless	<120s
7. Security + Performance	npm audit, bundle size check, Lighthouse scores	<60s

Total pipeline target: under 7 minutes. If it exceeds 10 minutes, optimize before adding features.
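
The hard-fail rule and the time budget can be sketched in a few lines. The stage names and target times below come from the table; `runPipeline`, `budgetStatus`, and the status labels are hypothetical names chosen for this illustration, and the `runStage` callback stands in for whatever actually executes a stage.

```typescript
interface Stage {
  name: string;
  targetSeconds: number;
}

const STAGES: readonly Stage[] = [
  { name: "Install", targetSeconds: 30 },
  { name: "Lint + Format", targetSeconds: 20 },
  { name: "Typecheck", targetSeconds: 40 },
  { name: "Unit tests", targetSeconds: 60 },
  { name: "Integration tests", targetSeconds: 90 },
  { name: "E2E tests (Tier 1)", targetSeconds: 120 },
  { name: "Security + Performance", targetSeconds: 60 },
];

const TARGET_TOTAL_SECONDS = 7 * 60; // "under 7 minutes"
const HARD_LIMIT_SECONDS = 10 * 60; // "optimize before adding features"

type BudgetStatus = "within-target" | "over-target" | "optimize-first";

interface PipelineResult {
  passed: boolean;
  failedStage?: string;
}

// Sum of the per-stage targets; with the table's numbers this is 420s.
function totalTargetSeconds(stages: readonly Stage[]): number {
  return stages.reduce((sum, stage) => sum + stage.targetSeconds, 0);
}

// Classify a measured total run time against the two thresholds.
function budgetStatus(actualTotalSeconds: number): BudgetStatus {
  if (actualTotalSeconds > HARD_LIMIT_SECONDS) return "optimize-first";
  if (actualTotalSeconds < TARGET_TOTAL_SECONDS) return "within-target";
  return "over-target";
}

// Every stage is a hard fail: stop at the first failure and run nothing after it.
function runPipeline(
  stages: readonly Stage[],
  runStage: (stage: Stage) => boolean,
): PipelineResult {
  for (const stage of stages) {
    if (!runStage(stage)) {
      return { passed: false, failedStage: stage.name };
    }
  }
  return { passed: true };
}
```

The per-stage targets add up to exactly 420 seconds, so the 7-minute total only holds if every stage finishes inside its own budget.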

Promotion Gates

Main to Staging

Promotion is triggered manually, not automatically on every push. The orchestrator creates a PR from main to staging, and the full test suite re-runs against the staging environment:

All 7 CI stages re-run against staging database and services.
E2E Tier 2 tests added: extended flows, real email delivery verification, metering exhaustion scenarios.
Performance benchmarks compared against staging baselines.
Real AI extraction with timing validation (must complete within targets).

If any test fails, staging promotion is blocked. The failure enters the diagnosis flow.
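
A staging gate along these lines could be expressed as a pure function over check results. This is a sketch under assumptions: `CheckResult`, `evaluateStagingGate`, and `benchmarkCheck` are hypothetical names, and the 10% benchmark tolerance is an assumed value, since the text does not say how far a benchmark may drift from its baseline.

```typescript
interface CheckResult {
  name: string;
  passed: boolean;
  detail?: string;
}

interface GateDecision {
  promote: boolean;
  failures: CheckResult[];
  nextStep: "promote-to-staging" | "enter-diagnosis";
}

// Assumed tolerance: a benchmark may be up to 10% slower than its baseline.
const DEFAULT_TOLERANCE = 0.1;

// Compare a measured duration against the staging baseline.
function benchmarkCheck(
  name: string,
  actualMs: number,
  baselineMs: number,
  tolerance: number = DEFAULT_TOLERANCE,
): CheckResult {
  return {
    name,
    passed: actualMs <= baselineMs * (1 + tolerance),
    detail: `${actualMs}ms vs baseline ${baselineMs}ms`,
  };
}

// Any single failure blocks promotion and routes to the diagnosis flow.
// An empty list is treated as blocked: no evidence is not a pass.
function evaluateStagingGate(checks: readonly CheckResult[]): GateDecision {
  const failures = checks.filter((check) => !check.passed);
  const promote = checks.length > 0 && failures.length === 0;
  return {
    promote,
    failures,
    nextStep: promote ? "promote-to-staging" : "enter-diagnosis",
  };
}
```

Returning the list of failing checks, rather than a bare boolean, gives the diagnosis flow something concrete to start from.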

Staging to Production

After staging passes completely:

PR created from staging to prod.
Deployment triggered to production environment.
7-point smoke test runs against production (health endpoints, auth flow, DB read, file access, AI health, email health, payment health).
SLO validation: query response times are monitored for the first 5 minutes and compared against targets.
Error monitoring: the Sentry error rate is compared against the pre-deployment baseline. If the error rate rises more than 50% above that baseline, or a new error type appears, rollback is triggered.
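
The error-monitoring rule can be written as a small decision function. The sketch below assumes the error data has already been fetched into plain objects; `ErrorSnapshot` and `shouldRollback` are hypothetical names and do not correspond to any Sentry API. It also assumes that with a baseline of zero, any error at all counts as a spike.

```typescript
interface ErrorSnapshot {
  errorRate: number; // e.g. errors per request
  errorTypes: readonly string[];
}

interface RollbackDecision {
  rollback: boolean;
  reasons: string[];
}

// From the text: roll back if the error rate increases by more than 50%.
const MAX_RATE_INCREASE = 0.5;

function shouldRollback(
  baseline: ErrorSnapshot,
  current: ErrorSnapshot,
): RollbackDecision {
  const reasons: string[] = [];

  // With a zero baseline this reduces to "any error at all" (assumption).
  if (current.errorRate > baseline.errorRate * (1 + MAX_RATE_INCREASE)) {
    reasons.push(
      `error rate ${current.errorRate} exceeds baseline ${baseline.errorRate} by more than 50%`,
    );
  }

  // Any error type not seen before the deployment is a trigger on its own.
  const known = new Set(baseline.errorTypes);
  const newTypes = current.errorTypes.filter((type) => !known.has(type));
  if (newTypes.length > 0) {
    reasons.push(`new error types: ${newTypes.join(", ")}`);
  }

  return { rollback: reasons.length > 0, reasons };
}
```

For example, with a baseline rate of 0.01, a post-deployment rate of 0.014 stays under the threshold, while 0.016 crosses it.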

Auto-Rollback

Production failures trigger instant rollback with zero human intervention:

Smoke test failure or error rate spike detected.
The deployment platform reverts all apps to the previous production deployment instantly, with zero downtime because the old deployment stays warm.
Failure notification sent: email with deployment SHA, failure reason, error logs, monitoring links.
Issue created in the issue tracker with full diagnosis, affected apps, Sentry link, and recommended fix.
Diagnosis flow begins automatically.
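
The order of the steps above matters: the revert comes first, and reporting follows. One way to sketch that ordering is a handler that takes its side effects as injected functions. Everything named here (`RollbackDeps`, `handleProductionFailure`, the method names) is hypothetical; real deployment platforms and issue trackers expose their own APIs.

```typescript
interface RollbackContext {
  deploymentSha: string;
  reason: string;
  affectedApps: readonly string[];
}

// Side effects are injected so the sequence can be tested without
// touching a real deployment platform, mailer, or issue tracker.
interface RollbackDeps {
  revertToPreviousDeployment(): void;
  sendFailureEmail(context: RollbackContext): void;
  createIssue(context: RollbackContext): string; // returns the new issue ID
  startDiagnosis(issueId: string): void;
}

function handleProductionFailure(
  context: RollbackContext,
  deps: RollbackDeps,
): string {
  // 1. Restore service before doing anything else.
  deps.revertToPreviousDeployment();
  // 2. Notify with the deployment SHA and failure reason.
  deps.sendFailureEmail(context);
  // 3. Record the failure in the issue tracker.
  const issueId = deps.createIssue(context);
  // 4. Hand off to the diagnosis flow.
  deps.startDiagnosis(issueId);
  return issueId;
}
```

Because the dependencies are plain functions, a test can pass in fakes that record the call order and confirm the revert always happens before any notification.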

Ideally, rollback never happens. The multi-layer testing in dev and staging should catch every issue before production. Rollback is the safety net, not the workflow. If rollbacks happen more than once per month, the test suite has gaps.

AI-Powered Diagnosis and Auto-Remediation

When any test fails (in dev, staging, or production), an AI agent on a separate machine diagnoses the failure, proposes a fix, and in some cases applies it automatically.

Three-Tier Auto-Fix

Tier	Fix Type	Action	Examples
1	Simple, safe	Auto-commit to main	Lint errors, missing imports, type mismatches, flaky test retry
2	Moderate, needs review	Create PR for human approval	API contract changes, schema fixes, dependency updates
3	Complex or risky	Create issue with diagnosis	Data integrity issues, security vulnerabilities, multi-service failures
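
The tier table maps naturally onto a lookup. In this sketch the failure kinds are taken from the Examples column; the identifiers (`FailureKind`, `planFix`, the action labels) are hypothetical, and treating an unrecognized failure as Tier 3 is an assumption made here for safety, not something the table states.

```typescript
type Tier = 1 | 2 | 3;

type FixAction =
  | "auto-commit-to-main"
  | "open-pr-for-review"
  | "create-issue";

type FailureKind =
  | "lint-error"
  | "missing-import"
  | "type-mismatch"
  | "flaky-test"
  | "api-contract-change"
  | "schema-fix"
  | "dependency-update"
  | "data-integrity"
  | "security-vulnerability"
  | "multi-service-failure";

const TIER_BY_KIND: Record<FailureKind, Tier> = {
  "lint-error": 1,
  "missing-import": 1,
  "type-mismatch": 1,
  "flaky-test": 1,
  "api-contract-change": 2,
  "schema-fix": 2,
  "dependency-update": 2,
  "data-integrity": 3,
  "security-vulnerability": 3,
  "multi-service-failure": 3,
};

const ACTION_BY_TIER: Record<Tier, FixAction> = {
  1: "auto-commit-to-main",
  2: "open-pr-for-review",
  3: "create-issue",
};

// Unrecognized failures escalate to Tier 3 (assumption): when in doubt,
// a human reads the diagnosis instead of the agent committing a fix.
function planFix(kind: FailureKind | "unknown"): { tier: Tier; action: FixAction } {
  const tier: Tier = kind === "unknown" ? 3 : TIER_BY_KIND[kind];
  return { tier, action: ACTION_BY_TIER[tier] };
}
```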

The Diagnosis Flow

Failure detected (CI stage, staging test, production smoke test).
AI agent checks out the codebase on the external machine.
Reads the failure logs, error traces, and test output.
Identifies the root cause and categorizes it (Tier 1, 2, or 3).
For Tier 1: generates fix, runs tests locally, commits directly to main.
For Tier 2: generates fix, creates PR, requests human review.
For Tier 3: creates issue with diagnosis, recommended approach, and estimated complexity.
After fix is merged: re-triggers the promotion flow (max 3 diagnosis cycles before hard escalation).
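
The cap of three diagnosis cycles can be sketched as a bounded loop. Here `attemptFix` is a hypothetical callback standing in for one full cycle (diagnose, fix, merge, re-run the promotion flow), and `runDiagnosisLoop` and the status labels are names invented for this example.

```typescript
// From the text: at most 3 diagnosis cycles before hard escalation.
const MAX_DIAGNOSIS_CYCLES = 3;

type CycleOutcome = "fixed" | "still-failing";

interface DiagnosisResult {
  status: "promoted" | "escalated";
  cyclesUsed: number;
}

// attemptFix performs one diagnose-fix-retest cycle and reports whether
// the re-triggered promotion flow now passes.
function runDiagnosisLoop(
  attemptFix: (cycle: number) => CycleOutcome,
): DiagnosisResult {
  for (let cycle = 1; cycle <= MAX_DIAGNOSIS_CYCLES; cycle++) {
    if (attemptFix(cycle) === "fixed") {
      return { status: "promoted", cyclesUsed: cycle };
    }
  }
  // The cap is reached: stop retrying and hand the problem to a human.
  return { status: "escalated", cyclesUsed: MAX_DIAGNOSIS_CYCLES };
}
```

The bound keeps an agent from looping indefinitely on a failure it cannot fix, which matters most for Tier 1 fixes that commit directly to main.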

Linear Example: Deployment Pipeline

Linear ships updates continuously. Their deployment pipeline uses canary deployments (internal dogfooding before general availability). When a new build is ready, Linear employees use it for their own project management first. If no issues surface after an internal testing window, it rolls out to all users.

For a solo founder, the staging environment serves the same purpose: it is your personal testing ground before customers see the code. The key difference is that your testing is automated, not manual.
