Making Schema Changes
Pages in this section
This user guide describes the problem space of schema changes and the various approaches you may use with Vitess.
Quick links:
- Vitess supports EXPERIMENTAL managed, online schema changes via
gh-ost
orpt-online-schema-change
, and with visibility and control over the migration process - Multiple approaches to unmanaged schema changes, either blocking, or owned by the user/DBA.
Some background on schema changes follows.
The schema change problem #
Schema change is one of the oldest problems in MySQL. With accelerated development and deployment flows, engineers find they need to deploy schema changes sometimes on a daily basis. With the growth of data this task becomes more and more difficult. A direct MySQL ALTER TABLE
statement is a blocking (no reads nor writes are possible on the migrated table) and resource heavy operation; variants of ALTER TABLE
include InnoDB
Online DDL, which allows for some concurrency on a primary
(aka master
) server, but still blocking on replicas, leading to unacceptable replication lags once the statement hits the replicas.
ALTER TABLE
operations are greedy, consume as much CPU/Disk IO as needed, are uninterruptible and uncontrollable. Once the operation has begun, it must run to completion; aborting an ALTER TABLE
may be more expensive than letting it run through, depending on the progress the migration has made.
Direct ALTER TABLE
is fine in development or possibly staging environments, where datasets are either small, or where table locking is acceptable.
ALTER TABLE solutions #
Busy production systems tend to use either of these two approaches, to make schema changes less disruptive to ongoing production traffic:
- Using online schema change tools, such as gh-ost and pt-online-schema-change. These tools emulate an
ALTER TABLE
statement by creating a ghost table in the new desired format, and slowly working through copying data from the existing table, while also applying ongoing changes throughout the migration. Online schema change tools can be throttled on high load, and can be interrupted at will. - Run the migration independently on replicas; when all replicas have the new schema, demote the
primary
and promote areplica
as the newprimary
; then, at leisure, run the migration on the demoted server. Two considerations if using this approach are:- Each migration requires a failover (aka successover, aka planned reparent).
- Total wall clock time is higher since we run the same migration in sequence on different servers.
Schema change cycle and operation #
The cycle of schema changes, from idea to production, is complex, involves multiple environments and possibly multiple teams. Below is one possible breakdown common in production. Notice how even interacting with the database itself takes multiple steps:
- Design: the developer designs a change, tests locally
- Publish: the developer calls for review of their changes (e.g. on a Pull Request)
- Review: developer’s colleagues and database engineers to check the changes and their impact
- Formalize: what is the precise
ALTER TABLE
statement to be executed? If running withgh-ost
orpt-online-schema-change
, what are the precise command line flags? - Locate: where does this change need to go? Which keyspace/cluster? Is this cluster sharded? What are the shards?
Having located the affected MySQL clusters, which is the
primary
server per cluster? - Schedule: is there an already running migration on the relevant keyspace/cluster(s)?
- Execute: invoke the command. In the time we waited, did the identity of
primary
servers change? - Audit/control: is the migration in progress? Do we need to abort for some reason?
- Cut-over/complete: a potential manual step to complete the migration process
- Cleanup: what do you do with the old tables? An immediate
DROP
is likely not advisable. What’s the alternative? - Notify user: let the developer know their changes are now in production.
- Deploy & merge: the developer completes their process.
Steps 4
- 10
are tightly coupled with the database or with the infrastrcture around the database.
Schema change and Vitess #
Vitess solves or automates multiple parts of the flow:
Formalize #
In managed, online schema changes the user supplies a valid SQL ALTER TABLE
statement, and Vitess generates the gh-ost
or pt-online-schema-change
command line invocation. It will also auto generate config files and set up the environment for those tools. This is hidden from the user.
Locate #
For a given table in a given keyspace, Vitess knows at all times:
- In which shards (MySQL clusters) the table is found
- Which is the
primary
server per shard.
When using either managed schema changes, or direct schema changes via vtctl
or vtgate
, Vitess resolves the discovery of the affected servers automatically, and this is hidden from the user.
Schedule #
In managed, online schema changes, Vitess owns and tracks all pending and active migrations. As a rule of thumb, it is generally advisable to only run one online schema change at a time on a given server. Following that rule of thumb, Vitess will queue incoming schema change requests and schedule them to run sequentially.
Execute #
In managed, online schema changes, Vitess owns the execution of gh-ost
or pt-online-schema-change
. While these run in the background, Vitess keeps track of the migratoin state.
In direct schema changes via vtctl
or vtgate
, Vitess issues a synchronous ALTER TABLE
statement on the relevant shards.
Audit/control #
In managed, online schema changes, Vitess keeps track of the state of the migration. It automatically detects when the migration is complete or has failed. It will detect failure even if the tablet itself, which is running the migration, fails. Vitess allows the user to cancel a migration. If such a migration is queued by the scheduler, then it is unqueued. If it’s already running, it is interrupted and aborted. Vitess allows the user to check on a migration status across the relevant shards.
Cut-over/complete #
Vitess runs automated cut-overs. The migration will complete as soon as it’s able to.
Cleanup #
In the case of managed, online schema changes via pt-online-schema-change
, Vitess will ensure to drop the triggers in case the tool failed to do so for whatever reason.
Vitess automatically garbage-collects the “old” tables, artifacts of gh-ost
and pt-online-schema-change
. It drops those tables in an incremental, non blocking method.
The various approaches #
Vitess allows a variety of approaches to schema changes, from fully automated to fully owned by the user.
- Managed, online schema changes are experimental at this time, but are Vitess’s way forward
- Direct, blocking ALTERs are generally impractical in production given that they can block writes for substantial lengths of time.
- User controlled migrations are allowed, and under the user’s responsibility.
See breakdown in managed, online schema changes and in unmanaged schema changes.