Recovering from corrupted InnoDB Pages
I recently encountered an issue with various InnoDB pages becoming corrupted on the database that plays host to my JIRA install. It was - to some extent - a mess of my own making for mixing production and development databases (or more precisely, for hosting that production database on a dev machine).
Lesson learnt, sure, but I still needed to address the issue so that I could get JIRA up and running again.
This documentation details the steps to follow - it won't resolve every case of corruption, but it resolved the issues I was seeing
How Do You Know It's Corrupted Pages
There are two signs that you're suffering from corrupted pages, one concrete, one pretty incidental:
- Low MySQL uptime
- Log Entries
The former occurs because the MySQL process will have to restart every time it encounters the corruption. It also writes a page dump out to the logs, so on a busy server your logs get big quite quickly.
The key give-away though, is in the MySQL error log (or the main log if you haven't defined a separate error log). Mine was at /var/log/mysqld.log
140129 23:53:50 InnoDB: Page checksum 3244463932, prior-to-4.0.14-form checksum 3365547667
InnoDB: stored checksum 1349822260, prior-to-4.0.14-form stored checksum 3365547667
InnoDB: Page lsn 0 95653297, low 4 bytes of lsn at page end 95653297
InnoDB: Page number (if stored to page already) 568,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an update undo log page
InnoDB: Page may be an index page where index id is 0 369
InnoDB: (index "cl_username" of table "JiraDB"."columnlayout")
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 568.
MySQL has pretty much told us outright what the issue is. The problem is, how do we recover from it?
Find the Cause
There's no one-size fits all answer to this. In my case the corruption occurred as a result of playing with a database that I knew was performing badly, and trying to push the system to it's limit (as I said before, a mess of my own making). The reasons for your corruption may be hardware related, or simply the way you've got the system configured (are we reporting disk writes as complete before they actually are?)
Whatever the root cause, you must identify and rectify it before proceeding. There's little point in recovering the DB if it's just going to get corrupted again!
Recovering Our Data
Now obviously we take regular backups, but unless the corruption has occurred just after a backup has run, they're likely to be at least a little out of date. So, we're going to try and recover the data. We can't just use mysqldump as we'll likely see
Lost connection to MySQL server
What we need to do, is to put MySQL into recovery mode. Before we begin, take note of the following
- This will affect every DB on the server
- While our changes are active, all INSERT and UPDATE queries will fail
So we edit my.cnf (likely to be either /etc/my.cnf or /etc/mysql/my.cnf depending on your server) to add the following under the [mysqld] section
innodb_force_recovery=1
Next we restart MySQL
service mysqld restart
Now, InnoDB should skip over the corruption, so we should be able to run
mysqldump -u USERNAME -p JiraDB > jiradb_prerecovery.sql
Now we want to clear the corrupted pages, so
echo "DROP DATABASE JiraDB; CREATE DATABASE JiraDB" | mysql -u USERNAME -p
We'll also need to re-define any permissions that users previously had
echo "GRANT ALL PRIVILEGES ON JiraDB.* TO 'JiraUser'@'localhost';" | mysql -u USERNAME -p
Next we need to allow INSERT and UPDATE queries again, so reverse the changes to my.cnf and restart MySQL again, followed by
mysql -u USERNAME -p < jiradb_prerecovery.sql
Checking Other Databases
Depending on the root cause of the corruption (mine was an unholy amount of load), other databases may have been affected. We need to identify them, so run the following
for a in `echo "SHOW DATABASES" | mysql -u root --password='$DBPASSWORD'`; do mysqldump -u root --password="$DBPASSWORD" $a > "${a}.sql"; echo $a; sleep 5; done
Keep an eye on the MySQL logs, but any database that reports LOST CONNECTION is likely suffering. Follow the procedure above for that DB