Implementing Encrypted Incremental Backups with S3cmd

I've previously detailed how to use S3cmd to back up your data from a Linux machine. Unfortunately, because of the way s3cmd works, if you want an incremental backup (i.e. using 'sync') you cannot use the built-in encryption.

In this documentation I'll be detailing a simple way to implement an encrypted incremental backup using s3cmd, as well as a workaround if you're unable to install GPG - using OpenSSL to encrypt the data instead. Obviously, we'll also be exploring how to decrypt the data when the backups are required.

It's assumed that you've already got s3cmd installed and configured to access your S3 account (see my earlier documentation if not).

The Aim

Essentially, we need to solve the very problem that prevents s3cmd from allowing encryption with the sync command.

The way sync works is to generate a checksum of the local file, retrieve a checksum of the remote file, and then upload the file if the two differ. Obviously, when the remote file is encrypted, the checksums will always differ, and so every file will be uploaded every time.
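
You can see this for yourself by comparing the two checksums (the file and bucket names below are purely illustrative, and this assumes your s3cmd build supports the --list-md5 flag):

# MD5 of the local plaintext
md5sum Notes/todo.txt

# MD5 of the encrypted copy on S3 - this is a sum of the ciphertext, so it will never match the local sum
s3cmd ls --list-md5 s3://benbackup/Server1/Notes/todo.txt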

To work around this, we're going to store some checksums locally, and then use 'put' to upload a file if we detect that it's changed since the last backup.

We'll look at two methods of encryption:

Using GPG helps keep our backup script simple; however, s3cmd uses a symmetric key, so anyone who gains access to your server can take your s3cmd config and remotely retrieve and decrypt your backups, giving them potentially perpetual access.

The OpenSSL method uses public key cryptography, so the same attacker would be unable to decrypt your backups unless they managed to obtain a copy of your private key (which should never be stored on the server it relates to).

You can grab commented copies of the scripts used in this article (and some others) from my BackupEncryption Scripts Repo on GitHub.

With GPG

The functionality built into s3cmd utilises GPG for encryption, so if you haven't already got it installed, you'll need to do so.
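
GPG is packaged on most distros, so installation should be a one-liner (package names vary slightly between distros):

# Debian/Ubuntu
apt-get install gnupg

# CentOS/RHEL
yum install gnupg2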

If, for whatever reason, that's not an option, skip down to Without GPG.

Preparation

We need somewhere to store our hashes. As there may be multiple backups, it seems best to give each individual backup area its own directory within a master directory. Change the path to suit your needs:

mkdir /var/backuphashes

We also need to make sure s3cmd has the details it needs for encryption; test by running the following (foo doesn't need to exist):

s3cmd -e put foo s3://randombucket/
ERROR: Encryption requested but no passphrase set in config file.
ERROR: Please re-run 's3cmd --configure' and supply it.

Set up encryption by running configure (the defaults will match your current config where possible):

s3cmd --configure

Enter an encryption password when prompted.
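
Once configured, re-running the earlier test should complain that 'foo' doesn't exist, rather than about a missing passphrase - confirming that encryption is now set up:

s3cmd -e put foo s3://randombucket/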

The Code

You'll need to adjust the initial variables to suit your setup (in this example, the directory I'll be backing up is called 'Notes'):

#!/bin/bash
#
# Incrementally back up a directory to S3, using s3cmd's built-in GPG encryption

# Backup directory specific
loca="Notes"
basedir="/mnt/files"
s3path="benbackup/Server1"

# Largely static (server specific at least)
hashdir="/var/backuphashes"


cd "$basedir/$loca"
if [ ! -d "$hashdir/$loca" ]
then
        mkdir -p "$hashdir/$loca"
fi

# Find all files within this directory and its subdirs
find * -type f | while read -r a
do

        fnamehash=`echo "$a" | sha1sum | cut -d\  -f1`
        filehash=`sha1sum "$a" | cut -d\  -f1`

        # Reset first so a missing hashfile can't leave the previous iteration's value in place
        storedhash=""
        source "$hashdir/$loca/$fnamehash" 2> /dev/null

        if [ "$filehash" != "$storedhash" ]
        then
                /usr/bin/s3cmd put -e "$a" "s3://$s3path/$loca/$a"
                echo "storedhash='$filehash'" > "$hashdir/$loca/$fnamehash"
        else
                # Hashes match, no need to push
                echo "$a unchanged, skipping......"
        fi
done
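
Once you're happy the script works, you'll probably want it running unattended. Assuming you've saved it as /usr/local/bin/gpg-backup.sh (a purely illustrative name and path), a nightly cron entry would look something like this:

# Run the incremental encrypted backup at 02:00 every night
0 2 * * * /usr/local/bin/gpg-backup.sh > /dev/null 2>&1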

Decrypting

Decrypting is as simple as telling s3cmd to retrieve the files from S3 (if you're in a disaster recovery situation, you may first need to configure s3cmd with the same password):

s3cmd --recursive get s3://benbackup/Server1/Notes
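
If you'd rather not sit through the interactive configuration again, the relevant settings can be placed straight into ~/.s3cfg (values illustrative):

gpg_command = /usr/bin/gpg
gpg_passphrase = your-encryption-password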

Without GPG

An alternative to using GPG is to use OpenSSL to encrypt the files. There are various techniques for doing this, but my preference is to encrypt each file with a unique One-Time-Password, which is itself then encrypted using our public key.
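
Concretely, for each changed file the script below does something along these lines (file names purely illustrative):

# Encrypt the file with a freshly generated One-Time-Password (key.txt)
openssl enc -aes-256-cbc -salt -pass file:key.txt -in somefile > somefile.enc

# Encrypt the One-Time-Password with our RSA public key
openssl rsautl -encrypt -pubin -inkey backupkey-public.pem -in key.txt -out key.txt.enc

# Bundle the pair together, ready to be pushed to S3
tar -cf somefile.enc.tar key.txt.enc somefile.enc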

Preparation

As before, we want to create the hashstore directory:

mkdir /var/backuphashes

We also need to generate ourselves an RSA keypair:

openssl genrsa -out backupkey.key 4096
openssl rsa -in backupkey.key -out backupkey-public.pem -outform PEM -pubout

You can put backupkey-public.pem on whatever servers you want, but backupkey.key must be kept secret.
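
It's worth sanity-checking the keypair before trusting your backups to it; a quick round trip with a throwaway test file should print the original string back:

echo "test string" > /tmp/keytest.txt
openssl rsautl -encrypt -pubin -inkey backupkey-public.pem -in /tmp/keytest.txt -out /tmp/keytest.enc
openssl rsautl -decrypt -inkey backupkey.key -in /tmp/keytest.enc
rm -f /tmp/keytest.txt /tmp/keytest.enc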

The Code

There's slightly more to do this time, so the script is a little more complex:

#!/bin/bash
#
# Incrementally back up a directory to S3, encrypting each file with a
# unique One-Time-Password which is itself encrypted with an RSA public key

# Backup directory specific
loca="Notes"
basedir="/mnt/files"
s3path="benbackup/Server1"
# Number of characters to use in the OTP
keylen="256"

# Largely static (server specific at least)
hashdir="/var/backuphashes"

# Public key to be used for encryption (a quoted tilde won't expand, so use $HOME)
keyfile="$HOME/keys/backupkey-public.pem"

# What should we use as a temporary directory?
tmpdir="/tmp"

# Change to urandom if issues with /dev/random blocking are experienced
rand="random"


cd "$basedir/$loca"
if [ ! -d "$hashdir/$loca" ]
then
        mkdir -p "$hashdir/$loca"
fi

# Find all files within this directory and its subdirs
find * -type f | while read -r a
do
        fnamehash=`echo "$a" | sha1sum | cut -d\  -f1`
        filehash=`sha1sum "$a" | cut -d\  -f1`

        # Reset first so a missing hashfile can't leave the previous iteration's value in place
        storedhash=""
        source "$hashdir/$loca/$fnamehash" 2> /dev/null

        if [ "$filehash" != "$storedhash" ]
        then
                echo "Encrypting $a"
                fname=`basename "$a"`

                # Create the OTP
                cat /dev/$rand | tr -dc 'a-zA-Z0-9-_!@#$%^&*()_+{}|:<>?=' | fold -w $keylen | head -n 1 > "$tmpdir/key.txt"

                # Encrypt the file
                openssl enc -aes-256-cbc -salt -pass "file:$tmpdir/key.txt" -in "$a" > "$tmpdir/$fname.enc"

                # Encrypt the key
                openssl rsautl -encrypt -pubin -inkey "$keyfile" -in "$tmpdir/key.txt" -out "$tmpdir/key.txt.enc"

                cd "$tmpdir"
                # Package the two together
                tar -cf "encrypted.enc.tar" key.txt.enc "$fname.enc"

                /usr/bin/s3cmd put encrypted.enc.tar "s3://$s3path/$loca/$a.enc.tar"
                echo "Tidying up"
                rm -f encrypted.enc.tar key.txt.enc key.txt "$fname.enc"
                echo "storedhash='$filehash'" > "$hashdir/$loca/$fnamehash"
                cd "$basedir/$loca"

        else
                # Hashes match, no need to push
                echo "$a unchanged, skipping......"
        fi
done
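
One thing to be aware of: the One-Time-Password briefly exists on disk in $tmpdir, which is usually a world-readable location. If that's a concern, a private scratch directory narrows the window - a minimal sketch:

# mktemp -d creates a mode-0700 directory, so other users can't read the OTP
tmpdir=`mktemp -d`
# ... run the backup loop as above ...
rm -rf "$tmpdir"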

Decryption

Decryption is a little more involved than when using GPG. First, you'll need to make a copy of the private key available to the system that needs to decrypt the backup.
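
For example (hostname and paths purely illustrative):

# Fetch the private key from wherever it's securely stored
scp keystore.example.com:/secure/backupkey.key /root/backupkey.key
chmod 600 /root/backupkey.key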

Create a script 'openssldecrypt.sh' with the content below:

#!/bin/bash
#
# Use OpenSSL to decrypt a file, or find and decrypt encrypted files within a directory structure
#
# Copyright (C) 2014 B Tasker
# Released under GNU GPL V2 - See http://www.gnu.org/licenses/gpl-2.0.txt
#
# Arguments
#
# [-f file] - File to decrypt
# [-d directory] - base directory to find encrypted files in
#
# One of -f or -d must be set
#
# -k [private key file] - the PEM encoded RSA private key to use for decryption [Default = ~/privkey.key]
# -c [encrypted filename] - The filename used within the tarball for the original file [Default = encryptedfile.enc]
# -r - Should the encrypted file be removed?


# Perform the decryption
decryptFile(){
        file=$1
        keyfile=$2
        deffile=$3
        remove=$4

        origfile=`echo -n "$file" | sed 's/\.enc\.tar$//'`
        here=`pwd`
        echo "Decrypting $here/$origfile"

        # Unpack the tarball, then work out which file inside it is the encrypted payload
        tar xf "$file"
        if [ -e "$origfile.enc" ]
        then
                cryptfile="$origfile.enc"
        elif [ -e "$deffile" ]
        then
                cryptfile="$deffile"
        else
                echo "Can't find a file to decrypt, use -c to define the filename"
                return 1
        fi

        # Recover the OTP with the private key, then use it to decrypt the payload
        openssl rsautl -decrypt -inkey "$keyfile" -in key.txt.enc -out key.txt
        openssl enc -aes-256-cbc -d -pass file:key.txt -in "$cryptfile" > "$origfile"
        rm -f key.txt key.txt.enc "$cryptfile"

        if [ "$remove" == "1" ]
        then
                rm -f "$file"
        fi
}

while getopts "f:d:k:c:r" flag
do
        case "$flag" in
                f) file="$OPTARG";;
                d) directory="$OPTARG"; recurse=1;;
                k) keyfile="$OPTARG";;
                c) crypfile="$OPTARG";;
                r) remove=1;;
        esac
done

if [ "$file" == "" ] && [ "$directory" == "" ]
then
cat << EOM
Usage: $0 [-f file] [-d directory] [-k private key file] [-c encrypted filename] [-r]
Example: $0 -f myfile.enc.tar -k ~/privkey.key

Arguments:
[-f file] - File to decrypt
[-d directory] - Directory structure to find encrypted files in
[-k private key file] - the PEM encoded RSA private key to use for decryption [Default = ~/privkey.key]
[-c encrypted filename] - The filename used within the tarball for the original file [Default = encryptedfile.enc]
[-r] - Remove the encrypted version of the file [Default = Off ]

EOM
exit 1
fi

# Set the defaults (a quoted tilde won't expand, so use $HOME)
keyfile=${keyfile:-"$HOME/privkey.key"}
crypfile=${crypfile:-"encryptedfile.enc"}

if [ "$recurse" == "1" ]
then
        if [ ! -d "$directory" ]
        then
                echo "ERROR: $directory doesn't appear to be a directory (did you mean to use -f?)"
                exit 1
        fi

        cd "$directory"
        cur=`pwd`
        find ./ -name "*.enc.tar" -type f | while read -r dir
        do
                cd "$(dirname "$dir")"
                decryptFile "$(basename "$dir")" "$keyfile" "$crypfile" "$remove"
                cd "$cur"
        done
else
        if [ ! -f "$file" ]
        then
                echo "ERROR: $file doesn't appear to be a file (did you mean to use -d?)"
                exit 1
        fi
        decryptFile "$file" "$keyfile" "$crypfile" "$remove"
fi

We can now use this script to decrypt individual files, or to recurse through a directory structure, decrypting any files we know to be encrypted:

# Decrypt a specific file
./openssldecrypt.sh -f testfile.enc.tar -k /root/backupkey.key

# Search a directory structure and decrypt any files we find
./openssldecrypt.sh -d recovered_backup -k /root/backupkey.key

# Search a directory structure, decrypt any files we find, and remove the encrypted version of each file afterwards
./openssldecrypt.sh -r -d recovered_backup -k /root/backupkey.key
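
Putting it all together, a full recovery of the 'Notes' backup looks something like this:

# Pull the encrypted backup down from S3
s3cmd --recursive get s3://benbackup/Server1/Notes recovered_backup/

# Decrypt everything we've retrieved, removing the encrypted copies as we go
./openssldecrypt.sh -r -d recovered_backup -k /root/backupkey.key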

Conclusion

At first glance, s3cmd's inability to use encryption for incremental backups is quite a limitation, but it's fairly straightforward to implement workarounds for the behaviour.

The exact encryption mechanism you use will depend on your wants and needs.

Using OpenSSL makes the backup script a little more complex, and introduces an additional step into recovery. However, because it uses public key cryptography, you get the security advantage of not having the decryption key stored on your server.

Whichever encryption mechanism you use, you need to ensure that you have a (safe) record of whatever details you'll need to decrypt the data in a Disaster Recovery situation. That may mean a record of the password used for GPG, or may be a securely stored copy of your private key.