Database Migrations — Zero-Downtime SQL, Alembic & Schema Evolution (2026)
BACKEND ARCHITECTURE MASTERY · Series: Logic & Legacy · Day 12 / 30 · Level: Senior Architecture · ⏱️ 18 min read

⏳ Context: In Day 11, we perfected our relational models. But applications evolve: marketing demands a new column; security demands an index. If you manually run ALTER TABLE in your production database, you are playing Russian roulette. If the command locks the table during peak traffic, your entire API goes offline. Your Python codebase is tracked immutably in Git, but your database schema often lives in the Wild West. To fix this, we use database migrations. The Python ecosystem uses Alembic to manage them, but a senior architect does not blindly trust ORM abstractions: you must understand the exact raw SQL (Data Definition Language, DDL) that Alembic generates. Today, we architect schema evolution entirely through the lens of the Postgres SQL engine.

Table of Contents

- The Paradigm: DDL as Versioned Code
- The Architect's Zero-Downtime Protocol (The 3-Phase SQL Shift)
- Concurrent Indexing: Escaping the Table Lock
- The ENUM Dilemma: Managing Custom Postgres Types
- SQLite Batch Mode: The Secret Table Rebuild
- Day 12 Project: The Advanced Alembic Arsenal
- FAQ: Migration Architecture
- Database Resources

"A database schema is not a destination; it is a timeline. You cannot blindly teleport to the future; you must build a mathematically safe bridge to get there."

The Paradigm: DDL as Versioned Code

Developers often ask: "If I update my SQLAlchemy models, why doesn't the database just update itself?" Because defining state is easy; mutating state is dangerous. If you run SQLAlchemy's Base.metadata.create_all(), it issues CREATE TABLE IF NOT EXISTS commands, but it cannot safely issue ALTER TABLE commands: it has no idea what to do with the millions of rows of existing data when you change a column type.

Alembic works by generating a sequential timeline of Python scripts. When you run alembic upgrade head, Alembic checks a tiny Postgres table called alembic_version, finds out the database is currently on Revision 4, and sequentially executes the raw SQL for Revisions 5, 6, and 7.

The Architect's Zero-Downtime Protocol (The 3-Phase SQL Shift)

The Scenario: You have a users table with a full_name column and 10 million rows. Product wants to split it into first_name and last_name. If you issue ALTER TABLE users DROP COLUMN full_name; and add the new columns, your live API, which is still running the old Python code, will instantly crash with 500 Internal Server Errors because it expects full_name to exist.

To safely mutate data structures in a live environment, architects execute a multi-phase deployment using explicit SQL steps.

Phases 1-3: The Additive Migration, Data Sync & Integrity Lock (Alembic SQL)

def upgrade():
    # PHASE 1: Add the new columns as NULLABLE.
    # UNDER THE HOOD SQL:
    #   ALTER TABLE users ADD COLUMN first_name VARCHAR(100);
    #   ALTER TABLE users ADD COLUMN last_name VARCHAR(100);
    op.add_column('users', sa.Column('first_name', sa.String(100), nullable=True))
    op.add_column('users', sa.Column('last_name', sa.String(100), nullable=True))

    # PHASE 2: Data ETL directly inside the database engine.
    # Raw SQL here is orders of magnitude faster than pulling 10M rows into Python.
    op.execute("""
        UPDATE users
        SET first_name = split_part(full_name, ' ', 1),
            last_name  = substring(full_name from position(' ' in full_name) + 1)
        WHERE full_name IS NOT NULL AND first_name IS NULL;
    """)

    # PHASE 3: Enforce integrity at the disk level.
    # UNDER THE HOOD SQL:
    #   ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;
    #   ALTER TABLE users ALTER COLUMN last_name SET NOT NULL;
    op.alter_column('users', 'first_name', nullable=False)
    op.alter_column('users', 'last_name', nullable=False)

We DO NOT drop the full_name column in this migration. We deploy the updated Python API that reads and writes exclusively to the new columns. Only once we are 100% certain the old API containers are destroyed do we issue a second migration (weeks later) to safely execute ALTER TABLE users DROP COLUMN full_name;.
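That deferred migration is the easy half of the contract. Here is a minimal sketch of what it might look like, with op and sa coming from the standard Alembic revision header as in the snippets above; the revision identifiers are omitted, and the String(200) length in the downgrade is an assumed placeholder, not a value specified in this article:

def upgrade():
    # PHASE 4, weeks later: no running code references full_name anymore,
    # so the destructive drop is finally safe.
    # UNDER THE HOOD SQL:
    #   ALTER TABLE users DROP COLUMN full_name;
    op.drop_column('users', 'full_name')

def downgrade():
    # Re-adding the column restores the schema, but NOT the data it held.
    # The length 200 is a placeholder assumption for this sketch.
    op.add_column('users', sa.Column('full_name', sa.String(200), nullable=True))

Note that the downgrade is lossy: the dropped names are gone forever. That asymmetry is exactly why the drop is deferred until the old containers are provably dead.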
Concurrent Indexing: Escaping the Table Lock

If you run CREATE INDEX idx_users_email ON users(email); on a 50 GB table, Postgres takes a SHARE lock on the table. This blocks all INSERT, UPDATE, and DELETE operations until the index finishes building, which could take 20 minutes. Your API is effectively dead.

Postgres provides a savior: CONCURRENTLY. It builds the index in the background without blocking writes. However, it cannot run inside a transaction block (BEGIN ... COMMIT), which Alembic uses by default.

Zero-Downtime Indexing (Raw SQL via Alembic)

def upgrade():
    # CRITICAL: Step outside the default Alembic transaction block!
    with op.get_context().autocommit_block():
        # UNDER THE HOOD SQL:
        #   CREATE INDEX CONCURRENTLY idx_users_last_name ON users (last_name);
        op.create_index(
            'idx_users_last_name',
            'users',
            ['last_name'],
            postgresql_concurrently=True
        )

The ENUM Dilemma: Managing Custom Postgres Types

Unlike simple VARCHAR columns, Postgres treats ENUM types as first-class database objects. You cannot simply attach an enum to a column; you must run a CREATE TYPE statement first. Alembic's --autogenerate frequently fails to deduce this properly.

Safe Postgres ENUM Creation

def upgrade():
    from sqlalchemy.dialects import postgresql

    # UNDER THE HOOD SQL 1:
    #   CREATE TYPE user_status_enum AS ENUM ('active', 'suspended', 'banned');
    status_enum = postgresql.ENUM('active', 'suspended', 'banned', name='user_status_enum')
    status_enum.create(op.get_bind())

    # UNDER THE HOOD SQL 2:
    #   ALTER TABLE users ADD COLUMN status user_status_enum NOT NULL DEFAULT 'active';
    op.add_column('users', sa.Column(
        'status',
        status_enum,
        server_default='active',
        nullable=False
    ))
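The mirror image is the classic trap: dropping the status column does not drop user_status_enum, and the orphaned type will make a re-run of the upgrade fail because the type already exists. Here is a minimal downgrade sketch under the same assumptions as above (op from the standard revision header); the checkfirst flag is an added safety measure, not part of the recipe above:

def downgrade():
    from sqlalchemy.dialects import postgresql

    # Drop the column first; Postgres refuses to drop a type that is still in use.
    op.drop_column('users', 'status')

    # UNDER THE HOOD SQL:
    #   DROP TYPE user_status_enum;
    # checkfirst=True makes the drop a no-op if the type is already gone.
    postgresql.ENUM(name='user_status_enum').drop(op.get_bind(), checkfirst=True)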
SQLite Batch Mode: The Secret Table Rebuild

If you are developing locally with SQLite, you will eventually hit a wall: SQLite does not fully support ALTER TABLE DROP COLUMN. How does Alembic handle drops in SQLite? READ MORE HERE: https://logicandlegacy.blogspot.com (a minimal sketch of the batch-mode rebuild follows below).
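As a hedged preview of that rebuild, assuming the users/full_name example from the zero-downtime section: Alembic's batch mode does not ALTER the table at all. It creates a temporary copy of the table without the doomed column, copies every row across, drops the original, and renames the copy into place.

def upgrade():
    # UNDER THE HOOD SQL (roughly):
    #   CREATE TABLE _alembic_tmp_users (... every column except full_name ...);
    #   INSERT INTO _alembic_tmp_users SELECT ... FROM users;
    #   DROP TABLE users;
    #   ALTER TABLE _alembic_tmp_users RENAME TO users;
    with op.batch_alter_table('users') as batch_op:
        batch_op.drop_column('full_name')

On backends that support the operation natively, such as Postgres, the same batch_alter_table call degrades gracefully into a plain ALTER TABLE, so the migration stays portable between your SQLite laptop and your production cluster.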