Recovering from corrupted InnoDB Pages

I recently encountered an issue with various InnoDB pages becoming corrupted on the database that plays host to my JIRA install. It was - to some extent - a mess of my own making for mixing production and development databases (or more precisely, for hosting that production database on a dev machine).

Lesson learnt, sure, but I still needed to address the issue so that I could get JIRA up and running again.

This documentation details the steps to follow - it won't resolve every case of corruption, but it resolved the issues I was seeing

 

How Do You Know It's Corrupted Pages

There are two signs that you're suffering from corrupted pages, one concrete, one pretty incidental:

  • Low MySQL uptime
  • Log Entries

The former occurs because the MySQL process will have to restart every time it encounters the corruption. It also writes a page dump out to the logs, so on a busy server your logs get big quite quickly.

The key give-away though, is in the MySQL error log (or the main log if you haven't defined a separate error log). Mine was at /var/log/mysqld.log

140129 23:53:50 InnoDB: Page checksum 3244463932, prior-to-4.0.14-form checksum 3365547667
InnoDB: stored checksum 1349822260, prior-to-4.0.14-form stored checksum 3365547667
InnoDB: Page lsn 0 95653297, low 4 bytes of lsn at page end 95653297
InnoDB: Page number (if stored to page already) 568,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an update undo log page
InnoDB: Page may be an index page where index id is 0 369
InnoDB: (index "cl_username" of table "JiraDB"."columnlayout")
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 568.

MySQL has pretty much told us outright what the issue is. The problem is, how do we recover from it?

 

Find the Cause

There's no one-size fits all answer to this. In my case the corruption occurred as a result of playing with a database that I knew was performing badly, and trying to push the system to it's limit (as I said before, a mess of my own making). The reasons for your corruption may be hardware related, or simply the way you've got the system configured (are we reporting disk writes as complete before they actually are?)

Whatever the root cause, you must identify and rectify it before proceeding. There's little point in recovering the DB if it's just going to get corrupted again!

Recovering Our Data

Now obviously we take regular backups, but unless the corruption has occurred just after a backup has run, they're likely to be at least a little out of date. So, we're going to try and recover the data. We can't just use mysqldump as we'll likely see

Lost connection to MySQL server

What we need to do, is to put MySQL into recovery mode. Before we begin, take note of the following

  • This will affect every DB on the server
  • While our changes are active, all INSERT and UPDATE queries will fail

So we edit my.cnf (likely to be either /etc/my.cnf or /etc/mysql/my.cnf depending on your server) to add the following under the [mysqld] section

innodb_force_recovery=1

Next we restart MySQL

service mysqld restart

Now, InnoDB should skip over the corruption, so we should be able to run

mysqldump -u USERNAME -p JiraDB > jiradb_prerecovery.sql

Now we want to clear the corrupted pages, so

echo "DROP DATABASE JiraDB; CREATE DATABASE JiraDB" | mysql -u USERNAME -p

We'll also need to re-define any permissions that users previously had

echo "GRANT ALL PRIVILEGES ON JiraDB.* TO 'JiraUser'@'localhost';" | mysql -u USERNAME -p

Next we need to allow INSERT and UPDATE queries again, so reverse the changes to my.cnf and restart MySQL again, followed by

mysql -u USERNAME -p < jiradb_prerecovery.sql

 

Checking Other Databases

Depending on the root cause of the corruption (mine was an unholy amount of load), other databases may have been affected. We need to identify them, so run the following

for a in `echo "SHOW DATABASES" | mysql -u root --password='$DBPASSWORD'`; do mysqldump -u root --password="$DBPASSWORD" $a > "${a}.sql"; echo $a; sleep 5; done

Keep an eye on the MySQL logs, but any database that reports LOST CONNECTION is likely suffering. Follow the procedure above for that DB