Comparing Data Between Master and Slave on MySQL

Friday, February 4, 2011

In a replicated environment, replication often breaks during the initial stages of setup. A break happens when a statement applied on the slave produces a result different from what was observed on the master. This is most common with statement-based replication, and the causes are many: duplicate keys, partial success of a bulk update that was not wrapped in a transaction, deadlocks and more.

Often the solution is to skip the error and restart replication from the next position in the log. This can catch the unwary database administrator or developer off guard until it is too late: the master tables are no longer in sync with the slaves. This is a particularly big problem if the system

1) can't be taken offline,
2) has a huge data set,
3) has slaves that are located remotely

So the best thing to do is make sure that critical tables are always in sync. The solution: compare the slave against the master proactively and regularly, so that anomalies are detected before they get too big to handle. To get an efficient comparison without expensively iterating over your records one by one, you only need two things:

1) Get the number of records per table
2) Get the checksum or digest of the table, derived from the individual checksums of each row within the table
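To make the two items above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not the exact procedure used on a live server. The rows are plain in-memory tuples standing in for the result of a full table read. The helper names (row_checksum, table_fingerprint, tables_in_sync), the field separator and the NULL marker are all made up for this example. Each row is reduced to a CRC32 value and the per-row values are folded together with XOR, so the result does not depend on the order in which rows come back. MySQL has CRC32() and BIT_XOR() functions that can express a similar idea on the server side.

```python
import zlib

FIELD_SEP = "\x1f"   # assumed separator so ("ab", "c") and ("a", "bc") differ
NULL_MARK = "\\N"    # assumed marker so NULL differs from an empty string


def row_checksum(row):
    """Return a CRC32 checksum for one row (a tuple of column values)."""
    text = FIELD_SEP.join(NULL_MARK if v is None else str(v) for v in row)
    return zlib.crc32(text.encode("utf-8"))


def table_fingerprint(rows):
    """Return (row_count, digest) for an iterable of rows.

    The digest is the XOR of all row checksums, so it is independent
    of row order. Limitation: two identical rows cancel each other in
    the XOR, which is why the row count is compared as well.
    """
    count = 0
    digest = 0
    for row in rows:
        count += 1
        digest ^= row_checksum(row)
    return count, digest


def tables_in_sync(master_rows, slave_rows):
    """True when both the row count and the digest match."""
    return table_fingerprint(master_rows) == table_fingerprint(slave_rows)


# Hypothetical sample data standing in for one table on each server.
master = [(1, "alice", 10), (2, "bob", 20), (3, "carol", None)]
slave_same = [(3, "carol", None), (1, "alice", 10), (2, "bob", 20)]
slave_drifted = [(1, "alice", 10), (2, "bob", 21), (3, "carol", None)]

print(tables_in_sync(master, slave_same))     # same rows, different order
print(tables_in_sync(master, slave_drifted))  # one value differs
```

Only two small values per table have to cross the network, the count and the digest, which is what makes this practical for large tables and remote slaves. A matching fingerprint is strong evidence rather than proof, since checksums can collide.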
