Photo by Markus Spiske on Unsplash
Updates, Order, & the Binlog
A theoretical exercise to realize the purpose of the binlog
Imagine you had two separate databases, a read-write master and a read-only slave, and you wanted to keep a table between them synchronized. Writes would occur on the master table and should eventually appear on the slave table. To synchronize the databases, you could just write a cron job which diffs the two tables and updates the slave so it properly reflects the master. Simple enough.
Let's start with a table populated in master but not in slave (the id is your primary key):
master_tbl
----------
id, letter
1 A
2 B
3 C
slave_tbl
----------
id, letter
So the script runs some INSERT .... ON DUPLICATE KEY UPDATE
type statement to update or insert the missing rows. You will end up with this:
master_tbl
----------
id, letter
1 A
2 B
3 C
slave_tbl
----------
id, letter
1 A
2 B
3 C
Now you run UPDATE master_tbl SET letter = 'A' WHERE id = 3 LIMIT
to get:
master_tbl
----------
id, letter
1 A
2 B
3 A
slave_tbl
----------
id, letter
1 A
2 B
3 C
Then when the sync runs:
master_tbl
----------
id, letter
1 A
2 B
3 A
slave_tbl
----------
id, letter
1 A
2 B
3 A
No problem.
Now lets go back to before the update statement where our tables looked like:
master_tbl
----------
id, letter
1 A
2 B
3 C
slave_tbl
----------
id, letter
1 A
2 B
3 C
Now let's introduce a very normal complication - a uniqueness constraint forcing letter to be unique. With our new constraint, UPDATE master_tbl SET letter = 'A' WHERE id = 3 LIMIT
would now fail. But we can perform a series of updates to achieve the same result.
UPDATE master_tbl SET letter = 'D' WHERE id = 3 LIMIT
UPDATE master_tbl SET letter = 'C' WHERE id = 1 LIMIT
UPDATE master_tbl SET letter = 'A' WHERE id = 3 LIMIT
Now we end up with a master_tbl which looks like:
master_tbl
----------
id, letter
1 A̷ C
2 B
3 C̷ D̷ A
Now when the syncer goes to update the slave. It tries to update row 1 to C - this would throw an error since C already exists in row 3.
To allow row 1 to update to C, you have to update row 3. So let's do the update on row 3 first, then row 1. However, row 3 cannot be updated to A since that will also throw a uniqueness error
And now you realize the crux of the issue. Without a uniqueness constraints our syncer could mindlessly update/insert slave rows such that they matched the master. But, now with the uniqueness constraint, such operations cannot occurs.
To circumvent this issue, perhaps you could use a transaction, and do all the updates at the same time to prevent any duplicate values, but languages like MySQL do not defer foreign key constraints (see here).
If somehow the syncer knew row 3 was first changed to D then row 1 to C, and then row 3 to A then the updates could happen. That is exactly what a binlog is - it records every transaction and then the slave just runs through the transaction in the same order that occurred on the master!
Running through this theoretical exercise made me realize why the binlog actually exits and how diffing two tables is not enough. Hopefully, you realized the same!