Resharding
Preparation #
Resharding enables you to both initially shard and reshard tables so that your keyspace is partitioned across several underlying tablets. A sharded keyspace has some additional restrictions on both query syntax and features such as auto_increment, so it is helpful to plan out a reshard operation diligently. However, you can always reshard again if your sharding scheme turns out to be suboptimal.
Using our example commerce and customer keyspaces, let's work through the two most common issues.
Sequences #
The first issue to address is the fact that customer and corder have auto-increment columns. This scheme does not work well in a sharded setup. Instead, Vitess provides an equivalent feature through sequences.
The sequence table is an unsharded single row table that Vitess can use to generate monotonically increasing IDs. The syntax to generate an id is: select next :n values from customer_seq. The vttablet that exposes this table is capable of serving a very large number of such IDs because values are cached and served out of memory. The cache value is configurable.
The VSchema allows you to associate a column of a table with the sequence table. Once this is done, an insert on that table transparently fetches an id from the sequence table, fills in the value, and routes the row to the appropriate shard. This makes the construct backward compatible with how MySQL's auto_increment property works.
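As an illustration, once customer.customer_id is wired to customer_seq through the VSchema's auto_increment setting (shown later in this guide), an insert that omits the id column gets one filled in automatically. The column value below is made up for the example:
-- customer_id is omitted, so VTGate reserves the next value from
-- customer_seq, fills it in, and routes the row to the owning shard.
INSERT INTO customer (email) VALUES ('newcustomer@domain.com');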
Since sequences are unsharded tables, they will be stored in the commerce database. Here is the schema:
CREATE TABLE customer_seq (id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
INSERT INTO customer_seq (id, next_id, cache) VALUES (0, 1000, 100);
CREATE TABLE order_seq (id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
INSERT INTO order_seq (id, next_id, cache) VALUES (0, 1000, 100);
Note the vitess_sequence comment in the create table statement. VTTablet will use this metadata to treat this table as a sequence.
- id is always 0.
- next_id is set to 1000: the value should be comfortably greater than the auto_increment max value used so far.
- cache specifies the number of values to cache before vttablet updates next_id.
Larger cache values perform better, but will exhaust the values quicker, since during reparent operations the new master will start off at the next_id value. For example, with next_id at 1000 and a cache of 100, vttablet serves values 1000-1099 from memory and persists next_id as 1100; a reparent at that point continues from 1100, skipping any unserved values in the previous block.
The VTGate servers also need to know about the sequence tables. This is done by updating the VSchema for commerce as follows:
{
  "tables": {
    "customer_seq": {
      "type": "sequence"
    },
    "order_seq": {
      "type": "sequence"
    },
    "product": {}
  }
}
Vindexes #
The next decision is about the sharding keys, or Primary Vindexes. This is a complex decision that involves the following considerations:
- What are the highest QPS queries, and what are the WHERE clauses for them?
- Cardinality of the column; it must be high.
- Do we want some rows to live together to support in-shard joins?
- Do we want certain rows that will be in the same transaction to live together?
Using the above considerations, in our use case, we can determine that:
- For the customer table, the most common WHERE clause uses customer_id. So, it shall have a Primary Vindex.
- Given that it has lots of users, its cardinality is also high.
- For the corder table, we have a choice between customer_id and order_id. Given that our app joins customer with corder quite often on the customer_id column, it will be beneficial to choose customer_id as the Primary Vindex for the corder table as well.
- Coincidentally, transactions also update corder tables with their corresponding customer rows. This further reinforces the decision to use customer_id as Primary Vindex.
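With both customer and corder sharded on customer_id, the lookups, joins, and transactions described above can each be served by a single shard. A brief sketch of such statements (the literal values are only examples):
-- single-shard point lookup on the Primary Vindex column
SELECT * FROM customer WHERE customer_id = 4;

-- in-shard join, since both tables share the same vindex column
SELECT c.customer_id, o.order_id, o.sku, o.price
FROM customer c JOIN corder o ON o.customer_id = c.customer_id
WHERE c.customer_id = 4;

-- single-shard transaction touching both tables for the same customer
BEGIN;
UPDATE customer SET email = 'dan@newdomain.com' WHERE customer_id = 4;
UPDATE corder SET price = 40 WHERE customer_id = 4 AND order_id = 4;
COMMIT;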
There are a couple of other considerations out of scope for now, but worth mentioning:
- It may also be worth creating a secondary lookup Vindex on corder.order_id.
- Sometimes the customer_id is really a tenant_id. For example, your application is a SaaS, which serves tenants that themselves have customers. One key consideration here is that sharding by the tenant_id can lead to unbalanced shards. You may also need to consider sharding by the tenant's customer_id.
Putting it all together, we have the following VSchema for customer:
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "customer": {
      "column_vindexes": [
        {
          "column": "customer_id",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "customer_id",
        "sequence": "customer_seq"
      }
    },
    "corder": {
      "column_vindexes": [
        {
          "column": "customer_id",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "order_id",
        "sequence": "order_seq"
      }
    }
  }
}
Since the primary vindex columns are BIGINT, we choose hash as the primary vindex, which is a pseudo-random way of distributing rows into various shards. For other data types:
- For VARCHAR columns, use unicode_loose_md5 or unicode_loose_xxhash.
- For VARBINARY, use binary_md5 or xxhash.
- Vitess uses a plugin system to define vindexes. If none of the predefined vindexes suit your needs, you can develop your own custom vindex.
Apply VSchema #
Applying the new VSchema instructs Vitess that the keyspace is sharded, which may prevent some complex queries. It is a good idea to validate this before proceeding with this step. If you do notice that certain queries start failing, you can always revert temporarily by restoring the old VSchema. Make sure you fix all of the queries before proceeding to the Reshard process.
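For example, when vetting queries you can compare how they route once the keyspace is sharded: a filter on the Primary Vindex column stays on one shard, while a filter on a non-vindex column such as email has to scatter to every shard. The values below are illustrative, and exact support for a given construct depends on your Vitess version:
-- Routed to a single shard via the hash vindex on customer_id:
SELECT * FROM customer WHERE customer_id = 1;

-- Scatters to all shards, since email has no vindex:
SELECT * FROM customer WHERE email = 'alice@domain.com';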
Using Helm #
helm upgrade vitess ../../helm/vitess/ -f 301_customer_sharded.yaml
Using Operator #
vtctlclient ApplySchema -sql="$(cat create_commerce_seq.sql)" commerce
vtctlclient ApplyVSchema -vschema="$(cat vschema_commerce_seq.json)" commerce
vtctlclient ApplyVSchema -vschema="$(cat vschema_customer_sharded.json)" customer
vtctlclient ApplySchema -sql="$(cat create_customer_sharded.sql)" customer
Using a Local Deployment #
vtctlclient ApplySchema -sql-file create_commerce_seq.sql commerce
vtctlclient ApplyVSchema -vschema_file vschema_commerce_seq.json commerce
vtctlclient ApplyVSchema -vschema_file vschema_customer_sharded.json customer
vtctlclient ApplySchema -sql-file create_customer_sharded.sql customer
Create new shards #
At this point, you have finalized your sharded VSchema and vetted all the queries to make sure they still work. Now, it’s time to reshard.
The resharding process works by splitting existing shards into smaller shards. This type of resharding is the most appropriate for Vitess. There are some use cases where you may want to bring up a new shard and add new rows only to that most recently created shard. This can be achieved in Vitess by splitting a shard in such a way that no rows end up in the ‘new’ shard, but it is not a natural fit for Vitess. We have to create the new target shards:
Using Helm #
helm upgrade vitess ../../helm/vitess/ -f 302_new_shards.yaml
Using Operator #
kubectl apply -f 302_new_shards.yaml
Make sure that you restart the port-forward after you have verified with kubectl get pods that this operation has completed:
killall kubectl
./pf.sh &
Using a Local Deployment #
for i in 300 301 302; do
CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-up.sh
SHARD=-80 CELL=zone1 KEYSPACE=customer TABLET_UID=$i ./scripts/vttablet-up.sh
done
for i in 400 401 402; do
CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-up.sh
SHARD=80- CELL=zone1 KEYSPACE=customer TABLET_UID=$i ./scripts/vttablet-up.sh
done
vtctlclient InitShardMaster -force customer/-80 zone1-300
vtctlclient InitShardMaster -force customer/80- zone1-400
Start the Reshard #
This process starts the reshard operation. It occurs online, and will not block any read or write operations to your database:
# With Helm and Local Installation
vtctlclient Reshard customer.cust2cust '0' '-80,80-'
# With Operator
vtctlclient Reshard customer.cust2cust '-' '-80,80-'
Validate Correctness #
After the reshard is complete, we can use VDiff to check data integrity and ensure our source and target shards are consistent:
vtctlclient VDiff customer.cust2cust
You should see output similar to the following:
Summary for customer: {ProcessedRows:5 MatchingRows:5 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}
Summary for corder: {ProcessedRows:5 MatchingRows:5 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}
Switch Reads #
After validating for correctness, the next step is to switch read operations to occur at the new location. By switching read operations first, we are able to verify that the new tablet servers are healthy and able to respond to requests:
vtctlclient SwitchReads -tablet_type=rdonly customer.cust2cust
vtctlclient SwitchReads -tablet_type=replica customer.cust2cust
Switch Writes #
After reads have been switched, and the health of the system has been verified, it’s time to switch writes. The usage is very similar to switching reads:
vtctlclient SwitchWrites customer.cust2cust
You should now be able to see the data that has been copied over to the new shards:
mysql --table < ../common/select_customer-80_data.sql
Using customer/-80
Customer
+-------------+--------------------+
| customer_id | email |
+-------------+--------------------+
| 1 | alice@domain.com |
| 2 | bob@domain.com |
| 3 | charlie@domain.com |
| 5 | eve@domain.com |
+-------------+--------------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 1 | 1 | SKU-1001 | 100 |
| 2 | 2 | SKU-1002 | 30 |
| 3 | 3 | SKU-1002 | 30 |
| 5 | 5 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
mysql --table < ../common/select_customer80-_data.sql
Using customer/80-
Customer
+-------------+----------------+
| customer_id | email |
+-------------+----------------+
| 4 | dan@domain.com |
+-------------+----------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 4 | 4 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
Cleanup #
After celebrating your second successful resharding, you are now ready to clean up the leftover artifacts:
Using Helm #
helm upgrade vitess ../../helm/vitess/ -f 306_down_shard_0.yaml
Using Operator #
kubectl apply -f 306_down_shard_0.yaml
Using a Local Deployment #
for i in 200 201 202; do
CELL=zone1 TABLET_UID=$i ./scripts/vttablet-down.sh
CELL=zone1 TABLET_UID=$i ./scripts/mysqlctl-down.sh
done
In this script, we just stopped all tablet instances for shard 0. This will cause all those vttablet and mysqld processes to be stopped. But the shard metadata is still present. After Vitess brings down all vttablets, we can clean that up with this command:
vtctlclient DeleteShard -recursive customer/0
Beyond this, you will also need to manually delete the disk associated with this shard.