Vitess | A database clustering system for horizontal scaling of MySQL

Vitess 4.0 has been released!

Head to the release notes for an overview of functionality added or changed, as well as important changes for those upgrading from earlier releases.

We wanted to use this post as an opportunity to reflect on three aspects of Vitess’ development, as it has evolved from an internal project at YouTube to a CNCF project with almost 200 contributors.

Improved SQL Query Support #

Vitess 4.0 takes a big leap forward in improving the coverage of MySQL syntax that is supported. Vitess can now support bulk statements, such as an insert that may need to cross shard boundaries.

A broader range of SELECT statements is also supported, including support for distinct aggregate queries such as COUNT(DISTINCT …). You can also modify Vitess itself via SQL with ALTER VSCHEMA.

We have also started testing Vitess with common applications and frameworks, and adding failures to our test suite. The goal is to make it possible to move from a monolithic MySQL or MariaDB to sharded Vitess, without the application having to know about it.

Improvements to Usability #

Vitess was created to solve a major challenge - scaling the databases of YouTube during its massive growth. In solving YouTube’s needs, Vitess also added a number of features that help you run databases better in production. The downside however, is that with internal development there is not always enough focus on the onboarding experience of new users.

With Vitess 4.0, we have worked on several initiatives to make it easier for new users:

We’ve worked on refining the getting started tutorials for local development, Kubernetes and Vagrant.
Less configuration is required to get started, as the MySQL version can in many cases be auto-detected.
A number of error messages have been refined, and focus has been placed on making sure they are at the correct log-level (error, warning or info).

This will continue to be a focus as we head into the development of Vitess 5.0.

Experimental Support for VReplication #

One of the downsides of sharding is that you can be forced to make trade-offs. For example in an ecommerce platform where you have buyers and sellers, you could choose to shard by buyers, in which case querying by seller in the sharded system may become much slower.

For applications that need to query efficiently by both buyer and seller, VReplication provides a way to subscribe to changes made on each shard (using the MySQL binary log), and keep redundant copies of key data available on other shards. You can think of this feature as similar to materialized views available in other commercial databases.

Looping back to SQL support, Vitess 4.0 also supports the concept of table equivalence. This means that you can instruct Vitess to read from either the original table or the VReplication materialized view: whichever makes the query execute faster.

–

This was just a quick overview of some of the new features in Vitess 4.0, but again we encourage you to check out the release notes which go into additional detail.

Starting with Vitess 5.0 we will be transitioning to a 12 week release cycle. We welcome your contributions. Please check out the open issues in vitessio/vitess and vitessio/website (docs) and join the Vitess slack channel.