Upgrades & Migrations

Vulcan is designed for zero-downtime upgrades with automatic database migrations.

How Upgrades Work

┌─────────────────────────────────────────────────────────────┐
│                    Upgrade Flow                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. New version pushed to main branch                       │
│        ↓                                                    │
│  2. GitHub Actions builds Docker image                      │
│        ↓                                                    │
│  3. ECS starts new task with new image                      │
│        ↓                                                    │
│  4. New task runs migrations automatically                  │
│        ↓                                                    │
│  5. Health check passes → traffic routes to new task        │
│        ↓                                                    │
│  6. Old tasks drain connections gracefully                  │
│        ↓                                                    │
│  7. Old tasks terminate                                     │
│                                                             │
│  ✅ Database persists throughout — no data loss             │
└─────────────────────────────────────────────────────────────┘

Database Migrations

Vulcan uses a sequential migration system that runs automatically on startup.

How It Works

Version tracking — Each migration has a unique version number
Tracked in database — Applied migrations are recorded in a migrations table
Transactional — Each migration runs in a transaction (automatic rollback on error)
Idempotent — Safe to run multiple times (won't re-apply)

Migration Log

On startup, you'll see:

INF connecting to postgres...
INF postgres connected successfully
INF running database migrations...
INF Applying migration version=1 description="Add relay_tokens table"
INF Migration applied version=1 duration=45ms
INF Applying migration version=2 description="Add threat_intel_config table"
INF Migration applied version=2 duration=12ms
INF migrations complete

Checking Migration Status

SELECT version, description, applied_at, duration_ms 
FROM migrations 
ORDER BY version;

Data Persistence

What Survives Upgrades

Data	Storage	Preserved
Tenants & users	RDS PostgreSQL	✅ Yes
Discovered assets (nodes/edges)	RDS PostgreSQL	✅ Yes
Compliance findings	RDS PostgreSQL	✅ Yes
Credentials (encrypted)	RDS PostgreSQL	✅ Yes
Audit logs	RDS PostgreSQL	✅ Yes
Generated documents	RDS + S3	✅ Yes
Scheduled jobs	RDS PostgreSQL	✅ Yes
License information	RDS PostgreSQL	✅ Yes

What Reconnects Automatically

Component	Behavior
WebSocket connections	Clients auto-reconnect
Relay connectors	Auto-reconnect with backoff
Agent heartbeats	Resume on next interval
Scheduled scans	Continue from schedule

ECS Deployment Details

Rolling Update Strategy

deploymentConfiguration:
  maximumPercent: 200
  minimumHealthyPercent: 100

This means:

New tasks start before old tasks stop
The number of healthy tasks never drops below the desired count (100%)
Up to twice the desired count (200%) can run during the transition
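The percentages translate into task counts as follows. `deployBounds` is a hypothetical helper for illustration. It applies the rounding that the ECS documentation describes: the minimum healthy count rounds up and the maximum count rounds down.

```go
package main

import "fmt"

// deployBounds returns the minimum number of healthy tasks ECS keeps and the
// maximum number it may run during a rolling deployment. The minimum is
// rounded up and the maximum rounded down.
func deployBounds(desired, minHealthyPercent, maxPercent int) (minHealthy, maxTasks int) {
	minHealthy = (desired*minHealthyPercent + 99) / 100 // ceiling division
	maxTasks = desired * maxPercent / 100               // floor division
	return minHealthy, maxTasks
}

func main() {
	for _, desired := range []int{1, 2, 3} {
		lo, hi := deployBounds(desired, 100, 200)
		fmt.Printf("desired=%d: at least %d healthy, at most %d running\n", desired, lo, hi)
	}
}
```

With the 100/200 settings above, a service with two tasks briefly runs up to four during a deploy. This matters for database connection headroom.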

Health Checks

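ECS waits for a new task to pass its health checks before routing traffic to it. The container health check below probes the /health endpoint with curl, so the image must include curl. The interval, timeout and startPeriod values are in seconds: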

healthCheck:
  command: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
  interval: 30
  timeout: 5
  retries: 3
  startPeriod: 60

Connection Draining

ALB drains connections gracefully:

New requests → new tasks
In-flight requests → complete on old tasks
Default drain time: 300 seconds (the target group's deregistration delay)

Rollback Procedures

Automatic Rollback

If a new task fails health checks and the ECS deployment circuit breaker is enabled with rollback, ECS automatically:

Stops the failing task
Keeps old tasks running
Reports deployment failure

Manual Rollback

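Option 1: Revert to the previous task definition revision

The previous version must still work with any schema changes the new version's migrations already applied. This holds when migrations follow the safe patterns listed under Migration Best Practices.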

# Deploy previous version
aws ecs update-service \
  --cluster vulcan-prod \
  --service vulcan-prod \
  --task-definition vulcan-prod:PREVIOUS_REVISION \
  --force-new-deployment

Option 2: Database migration rollback

# Open a shell in a running task via ECS Exec (execute-command must be enabled on the service)
aws ecs execute-command --cluster vulcan-prod --task TASK_ID --interactive --command "/bin/sh"

# Roll back last N migrations
vulcan migrate down 2

Option 3: Point-in-time recovery

# Restore RDS to a specific time (this creates a new instance; vulcan-prod itself is not overwritten)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier vulcan-prod \
  --target-db-instance-identifier vulcan-prod-restored \
  --restore-time 2026-04-10T12:00:00Z

Migration Best Practices

Safe Migration Patterns

✅ Safe	❌ Avoid
`ADD COLUMN`	`DROP COLUMN`
`CREATE TABLE IF NOT EXISTS`	`DROP TABLE`
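`CREATE INDEX CONCURRENTLY`	`ALTER COLUMN TYPE`
New nullable columns	Renaming columns

Note: PostgreSQL does not allow CREATE INDEX CONCURRENTLY inside a transaction block. Each Vulcan migration runs in a transaction, so a concurrent index build cannot be placed in a regular transactional migration.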
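The Avoid column can also be checked mechanically before a migration is merged. The Go sketch below is a rough, hypothetical heuristic: it uses regular expressions, not a SQL parser, and `lintMigrationSQL` is an illustrative name. It is meant for Up SQL only, since a Down function legitimately drops what Up created.

```go
package main

import (
	"fmt"
	"regexp"
)

// riskyPatterns encodes the "Avoid" column of the table above.
// These regexes are a heuristic and will miss some variants.
var riskyPatterns = []struct {
	re     *regexp.Regexp
	reason string
}{
	{regexp.MustCompile(`(?i)\bDROP\s+COLUMN\b`), "DROP COLUMN"},
	{regexp.MustCompile(`(?i)\bDROP\s+TABLE\b`), "DROP TABLE"},
	{regexp.MustCompile(`(?i)\bALTER\s+COLUMN\s+\S+\s+(SET\s+DATA\s+)?TYPE\b`), "ALTER COLUMN ... TYPE"},
	{regexp.MustCompile(`(?i)\bRENAME\s+COLUMN\b`), "RENAME COLUMN"},
}

// lintMigrationSQL returns one warning per risky pattern found in sql.
func lintMigrationSQL(sql string) []string {
	var warnings []string
	for _, p := range riskyPatterns {
		if p.re.MatchString(sql) {
			warnings = append(warnings, p.reason)
		}
	}
	return warnings
}

func main() {
	fmt.Println(lintMigrationSQL(`ALTER TABLE assets ADD COLUMN owner TEXT;`))
	fmt.Println(lintMigrationSQL(`ALTER TABLE assets DROP COLUMN owner;`))
}
```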

Adding a New Migration

Add to internal/storage/postgres/migrations.go:

{
    Version:     6,
    Description: "Add new_feature table",
    Up: func(ctx context.Context, tx pgx.Tx) error {
        _, err := tx.Exec(ctx, `
            CREATE TABLE IF NOT EXISTS new_feature (
                id TEXT PRIMARY KEY,
                tenant_id TEXT NOT NULL,
                created_at TIMESTAMPTZ DEFAULT NOW()
            );
            CREATE INDEX IF NOT EXISTS idx_new_feature_tenant 
            ON new_feature(tenant_id);
        `)
        return err
    },
    Down: func(ctx context.Context, tx pgx.Tx) error {
        _, err := tx.Exec(ctx, `DROP TABLE IF EXISTS new_feature;`)
        return err
    },
},
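Versions must be unique and are applied in order, so a small unit test can catch a duplicated or skipped number before it ships. This Go sketch assumes versions start at 1 and increase by exactly one. That is an assumption about Vulcan's numbering, consistent with the log above and with 6 being the next version in the example. `Migration` here is a trimmed, hypothetical copy of the real struct, and `validateVersions` is an illustrative name.

```go
package main

import "fmt"

// Migration is a trimmed, hypothetical copy of the struct in migrations.go;
// only the fields this check needs are kept.
type Migration struct {
	Version     int
	Description string
}

// validateVersions returns an error unless versions run 1, 2, 3, ... in
// order, which catches duplicates, gaps and out-of-order entries.
func validateVersions(ms []Migration) error {
	for i, m := range ms {
		if want := i + 1; m.Version != want {
			return fmt.Errorf("migration %q has version %d, expected %d", m.Description, m.Version, want)
		}
	}
	return nil
}

func main() {
	ms := []Migration{
		{Version: 1, Description: "Add relay_tokens table"},
		{Version: 2, Description: "Add threat_intel_config table"},
		{Version: 2, Description: "Add new_feature table"}, // copy-paste mistake
	}
	if err := validateVersions(ms); err != nil {
		fmt.Println("invalid:", err)
	}
}
```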

Test locally:

# Start the server; pending migrations are applied on startup
go run ./cmd/vulcan serve --db-type postgres --db-url "$DB_URL"

# Verify (from another terminal)
psql "$DB_URL" -c "SELECT * FROM migrations ORDER BY version;"

Monitoring Upgrades

CloudWatch Metrics

Monitor during deployments:

ECS/CPUUtilization — Should stay stable
ECS/MemoryUtilization — Should stay stable
ALB/HealthyHostCount — Should not drop to zero
ALB/UnHealthyHostCount — Should return to zero
RDS/DatabaseConnections — May spike briefly while old and new tasks run side by side

Deployment Alerts

Set up alerts for:

ECS Deployment State = FAILED
ALB HealthyHostCount < 1
RDS DatabaseConnections > 80% of max
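The 80% connection alert interacts with the rolling update. With `maximumPercent: 200`, old and new tasks overlap, so connections can briefly approach twice the steady-state count. The Go sketch below does that arithmetic. The per-task pool size of 25 and the `max_connections` value of 200 are made-up example values, so substitute your own. Running `SHOW max_connections;` in psql returns the real limit. The function names are hypothetical.

```go
package main

import "fmt"

// peakConnections estimates connections during a rolling deploy: up to
// desiredTasks*maxPercent/100 tasks (rounded down, as ECS does) may run at
// once, each holding up to poolSize connections.
func peakConnections(desiredTasks, maxPercent, poolSize int) int {
	maxTasks := desiredTasks * maxPercent / 100
	return maxTasks * poolSize
}

// alertThreshold is 80% of max_connections, matching the alert above.
func alertThreshold(maxConnections int) int {
	return maxConnections * 80 / 100
}

func main() {
	peak := peakConnections(3, 200, 25) // 3 tasks, example pool of 25
	limit := alertThreshold(200)        // example max_connections of 200
	fmt.Printf("peak=%d threshold=%d within_threshold=%v\n", peak, limit, peak < limit)
}
```

If the peak lands above the threshold, every deploy will trip the alert, which is a sign to raise `max_connections` or shrink per-task pools.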

Scheduled Maintenance Windows

For major version upgrades that require extended migrations:

Announce maintenance — Notify users in advance
Scale down — Reduce to single task
Run migrations — May take longer for large datasets
Verify — Check migration status and data integrity
Scale up — Return to normal capacity
Monitor — Watch metrics for 30 minutes

FAQ

Do I need to stop the service for upgrades?

No. The rolling deployment strategy ensures zero downtime for normal upgrades.

What if a migration fails?

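Each migration runs in its own transaction. On failure:

The failing migration's transaction rolls back automatically
Task fails health check
ECS keeps old tasks running
The failed migration leaves no partial changes; migrations that succeeded earlier in the same run remain applied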

How long do migrations take?

Most migrations complete in under 1 second. Large table alterations may take longer, but these are rare and announced in release notes.

Can I skip migrations?

No. Migrations are required and run automatically. Skipping would cause schema mismatches and errors.

How do I check what migrations have run?

SELECT * FROM migrations ORDER BY version;

Or via API:

GET /api/v1/admin/migrations
