Resharding

This guide follows on from the Get Started guides. Please make sure that you have an Operator, local or Helm installation ready. It also assumes that the MoveTables user guide has been followed.

Preparation #

Resharding enables you to both initially shard and reshard tables so that your keyspace is partitioned across several underlying tablets. A sharded keyspace has some additional restrictions on both query syntax and features such as auto_increment, so it is helpful to plan out a reshard operation diligently. However, you can always reshard again if your sharding scheme turns out to be suboptimal.

Using our example commerce and customer keyspaces, lets work through the two most common issues.

Sequences #

The first issue to address is the fact that customer and corder have auto-increment columns. This scheme does not work well in a sharded setup. Instead, Vitess provides an equivalent feature through sequences.

The sequence table is an unsharded single row table that Vitess can use to generate monotonically increasing IDs. The syntax to generate an id is: select next :n values from customer_seq. The vttablet that exposes this table is capable of serving a very large number of such IDs because values are cached and served out of memory. The cache value is configurable.

The VSchema allows you to associate a column of a table with the sequence table. Once this is done, an insert on that table transparently fetches an id from the sequence table, fills in the value, and routes the row to the appropriate shard. This makes the construct backward compatible to how MySQL’s auto_increment property works.

Since sequences are unsharded tables, they will be stored in the commerce database. Here is the schema:

CREATE TABLE customer_seq (id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
INSERT INTO customer_seq (id, next_id, cache) VALUES (0, 1000, 100);
CREATE TABLE order_seq (id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
INSERT INTO order_seq (id, next_id, cache) VALUES (0, 1000, 100);

Note the vitess_sequence comment in the create table statement. VTTablet will use this metadata to treat this table as a sequence.

  • id is always 0
  • next_id is set to 1000: the value should be comfortably greater than the auto_increment max value used so far.
  • cache specifies the number of values to cache before vttablet updates next_id.

Larger cache values perform better, but will exhaust the values quicker, since during reparent operations the new master will start off at the next_id value.

The VTGate servers also need to know about the sequence tables. This is done by updating the VSchema for commerce as follows:

{
  "tables": {
    "customer_seq": {
      "type": "sequence"
    },
    "order_seq": {
      "type": "sequence"
    },
    "product": {}
  }
}

Vindexes #

The next decision is about the sharding keys, or Primary Vindexes. This is a complex decision that involves the following considerations:

  • What are the highest QPS queries, and what are the WHERE clauses for them?
  • Cardinality of the column; it must be high.
  • Do we want some rows to live together to support in-shard joins?
  • Do we want certain rows that will be in the same transaction to live together?

Using the above considerations, in our use case, we can determine that:

  • For the customer table, the most common WHERE clause uses customer_id. So, it shall have a Primary Vindex.
  • Given that it has lots of users, its cardinality is also high.
  • For the corder table, we have a choice between customer_id and order_id. Given that our app joins customer with corder quite often on the customer_id column, it will be beneficial to choose customer_id as the Primary Vindex for the corder table as well.
  • Coincidentally, transactions also update corder tables with their corresponding customer rows. This further reinforces the decision to use customer_id as Primary Vindex.

There are a couple of other considerations out of scope for now, but worth mentioning:

  • It may also be worth creating a secondary lookup Vindex on corder.order_id.
  • Sometimes the customer_id is really a tenant_id. For example, your application is a SaaS, which serves tenants that themselves have customers. One key consideration here is that the sharding by the tenant_id can lead to unbalanced shards. You may also need to consider sharding by the tenant’s customer_id.

Putting it all together, we have the following VSchema for customer:

{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "customer": {
      "column_vindexes": [
        {
          "column": "customer_id",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "customer_id",
        "sequence": "customer_seq"
      }
    },
    "corder": {
      "column_vindexes": [
        {
          "column": "customer_id",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "order_id",
        "sequence": "order_seq"
      }
    }
  }
}

Since the primary vindex columns are BIGINT, we choose hash as the primary vindex, which is a pseudo-random way of distributing rows into various shards. For other data types:

  • For VARCHAR columns, use unicode_loose_md5 or unicode_loose_xxhash.
  • For VARBINARY, use binary_md5 or xxhash.
  • Vitess uses a plugin system to define vindexes. If none of the predefined vindexes suit your needs, you can develop your own custom vindex.

Apply VSchema #

Applying the new VSchema instructs Vitess that the keyspace is sharded, which may prevent some complex queries. It is a good idea to validate this before proceeding with this step. If you do notice that certain queries start failing, you can always revert temporarily by restoring the old VSchema. Make sure you fix all of the queries before proceeding to the Reshard process.

Using Helm #

helm upgrade vitess ../../helm/vitess/ -f 301_customer_sharded.yaml

Using Operator #

vtctlclient ApplySchema -sql="$(cat create_commerce_seq.sql)" commerce
vtctlclient ApplyVSchema -vschema="$(cat vschema_commerce_seq.json)" commerce
vtctlclient ApplyVSchema -vschema="$(cat vschema_customer_sharded.json)" customer
vtctlclient ApplySchema -sql="$(cat create_customer_sharded.sql)" customer

Using a Local Deployment #

vtctlclient ApplySchema -sql-file create_commerce_seq.sql commerce
vtctlclient ApplyVSchema -vschema_file vschema_commerce_seq.json commerce
vtctlclient ApplyVSchema -vschema_file vschema_customer_sharded.json customer
vtctlclient ApplySchema -sql-file create_customer_sharded.sql customer

Create new shards #

At this point, you have finalized your sharded VSchema and vetted all the queries to make sure they still work. Now, it’s time to reshard.

The resharding process works by splitting existing shards into smaller shards. This type of resharding is the most appropriate for Vitess. There are some use cases where you may want to bring up a new shard and add new rows in the most recently created shard. This can be achieved in Vitess by splitting a shard in such a way that no rows end up in the ‘new’ shard. However, it’s not natural for Vitess. We have to create the new target shards:

Using Helm #

helm upgrade vitess ../../helm/vitess/ -f 302_new_shards.yaml

Using Operator #

kubectl apply -f 302_new_shards.yaml

Make sure that you restart the port-forward after you have verified with kubectl get pods that this operation has completed:

killall kubectl
./pf.sh &

Using a Local Deployment #

for i in 300 301 302; do
 CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-up.sh
 SHARD=-80 CELL=zone1 KEYSPACE=customer TABLET_UID=$i ./scripts/vttablet-up.sh
done

for i in 400 401 402; do
 CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-up.sh
 SHARD=80- CELL=zone1 KEYSPACE=customer TABLET_UID=$i ./scripts/vttablet-up.sh
done

vtctlclient InitShardMaster -force customer/-80 zone1-300
vtctlclient InitShardMaster -force customer/80- zone1-400

Start the Reshard #

This process starts the reshard operation. It occurs online, and will not block any read or write operations to your database:

# With Helm and Local Installation
vtctlclient Reshard customer.cust2cust '0' '-80,80-'
# With Operator
vtctlclient Reshard customer.cust2cust '-' '-80,80-'

Validate Correctness #

After the reshard is complete, we can use VDiff to check data integrity and ensure our source and target shards are consistent:

vtctlclient VDiff customer.cust2cust

You should see output similar to the following:

Summary for customer: {ProcessedRows:5 MatchingRows:5 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}
Summary for corder: {ProcessedRows:5 MatchingRows:5 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}

Switch Reads #

After validating for correctness, the next step is to switch read operations to occur at the new location. By switching read operations first, we are able to verify that the new tablet servers are healthy and able to respond to requests:

vtctlclient SwitchReads -tablet_type=rdonly customer.cust2cust
vtctlclient SwitchReads -tablet_type=replica customer.cust2cust

Switch Writes #

After reads have been switched, and the health of the system has been verified, it’s time to switch writes. The usage is very similar to switching reads:

vtctlclient SwitchWrites customer.cust2cust

You should now be able to see the data that has been copied over to the new shards:

mysql --table < ../common/select_customer-80_data.sql
Using customer/-80
Customer
+-------------+--------------------+
| customer_id | email              |
+-------------+--------------------+
|           1 | alice@domain.com   |
|           2 | bob@domain.com     |
|           3 | charlie@domain.com |
|           5 | eve@domain.com     |
+-------------+--------------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku      | price |
+----------+-------------+----------+-------+
|        1 |           1 | SKU-1001 |   100 |
|        2 |           2 | SKU-1002 |    30 |
|        3 |           3 | SKU-1002 |    30 |
|        5 |           5 | SKU-1002 |    30 |
+----------+-------------+----------+-------+

mysql --table < ../common/select_customer80-_data.sql
Using customer/80-
Customer
+-------------+----------------+
| customer_id | email          |
+-------------+----------------+
|           4 | dan@domain.com |
+-------------+----------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku      | price |
+----------+-------------+----------+-------+
|        4 |           4 | SKU-1002 |    30 |
+----------+-------------+----------+-------+

Cleanup #

After celebrating your second successful resharding, you are now ready to clean up the leftover artifacts:

Using Helm #

helm upgrade vitess ../../helm/vitess/ -f 306_down_shard_0.yaml

Using Operator #

kubectl apply -f 306_down_shard_0.yaml

Using a Local Deployment #

for i in 200 201 202; do
 CELL=zone1 TABLET_UID=$i ./scripts/vttablet-down.sh
 CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-down.sh
done

In this script, we just stopped all tablet instances for shard 0. This will cause all those vttablet and mysqld processes to be stopped. But the shard metadata is still present. After Vitess brings down all vttablets, we can clean that up with this command:

vtctlclient DeleteShard -recursive customer/0

Beyond this, you will also need to manually delete the disk associated with this shard.