Rotating Docker Container Logs To Comply With Retention Policies

Docker's default configuration doesn't perform log rotation.

For busy, long-running containers, this can lead to the filesystem filling up with old, uncompressed logging data (as well as making accidental docker logs $container invocations quite painful).
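To get a feel for how much space container logs are currently consuming, something like the following should work (assuming Docker's default data root of /var/lib/docker):

# Per-container logfile sizes, with a grand total on the last line
sudo du -ch /var/lib/docker/containers/*/*-json.log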

It is possible to configure Docker to rotate logs by editing daemon.json (example below), but the rotation threshold options are fairly limited:

  • max-size: size at which to rotate
  • max-file: max number of rotated files
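
For example, a daemon.json enabling rotation at 10MiB with a maximum of 5 files might look like this (the values here are illustrative):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}

Note that log-opts values must be provided as strings, and the new settings only apply to containers created after the daemon has been restarted.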

Whilst these options do help to reduce filesystem usage, being purely size-based they fail to support a number of extremely common log rotation use-cases:

  • Log rotation at a specific time based interval (e.g. daily log rotation)
  • Maximum retention periods (to comply with GDPR retention policies etc)

Unfortunately, json-file isn't the only logging driver to suffer from this limitation: the local driver has the same restrictions. It looks like there's an implicit decision that anyone who wants to follow common rotation practices should just forward logs on to syslog, journald or some other logging infrastructure (such as Logstash). In practice, there are a variety of use-cases where this may be undesirable.

However, as json-file simply writes log lines into a logfile on disk, it's trivial to build a script to implement the rotation that we need.

This documentation details how to set up interval-based log rotation for Docker containers.


The basic command

The meat and bones of our rotation solution is a loop which:

  • Lists running containers
  • Uses docker inspect to identify where their logs are
  • Copies the log to a predefined destination
  • Truncates the original
  • Compresses the copy
  • Removes any rotated logs older than n days

We can achieve this with the following:

# Where do we want to archive logs to?
LOGDIR="/var/log/docker"

# Date to use in rotated filenames
DATESTR=`date +'%Y%m%d-%H%M'`

# ensure the logdir exists
mkdir -p "$LOGDIR"

for container in `docker ps --format '{{.Names}}'`
do
    logpath=`docker inspect --format='{{.LogPath}}' "$container"`
    logdest="${LOGDIR}/${container}-${DATESTR}.json.log"

    # Copy the logfile
    cp "$logpath" "$logdest"

    # Truncate the original
    truncate -s 0 "$logpath"

    # Compress the copy
    gzip -f "$logdest"
done

# tidy out logs older than 90 days
find "$LOGDIR" -name '*.gz' -mtime +90 -exec rm {} \;

We copy and truncate rather than moving the logfile because docker will continue to use its original file handle (meaning it won't write into a replacement logfile unless the container is restarted).
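
If you want to verify this behaviour, lsof can show that dockerd keeps the original file open across a truncate (this assumes lsof is installed; mycontainer is a placeholder name):

# Locate the container's logfile and list the processes holding it open
logpath=`docker inspect --format='{{.LogPath}}' mycontainer`
sudo lsof "$logpath"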


Collecting Statistics

We could just put the above into a shell script, add it to a crontab and call it job done.

But I generally think it's better to collect statistics at the same time: it means log rotation activities can be graphed, making it easier to spot when something unexpected happens.

My preference is to collect stats and write them into InfluxDB, which we can achieve with the following script:

#!/bin/bash
#
# From https://www.bentasker.co.uk/posts/documentation/linux/periodically-rotating-docker-container-logs.html
#

# Where do we want to archive logs to?
LOGDIR=${LOGDIR:-"/var/log/docker"}

# Set this to "" to disable stat submission
INFLUX_HOST=${INFLUX_HOST:-"http://127.0.0.1:8086"}
INFLUX_USER=${INFLUX_USER:-""}
INFLUX_PASS=${INFLUX_PASS:-""}
INFLUX_DB=${INFLUX_DB:-"telegraf"}
INFLUX_LOG_TAG=${INFLUX_LOG_TAG:-"docker"}

# Containers to rotate logs for
#
# If specifying manually, space-separate them
CONTAINERS=${CONTAINERS:-""}

function writeStats(){
    if [[ "$INFLUX_HOST" == "" ]]
    then
        return
    fi

    # Timestamp for the point (matches precision=s in the write call below)
    NOW=`date +'%s'`

    # Build the line protocol point
    POINT="log_rotate,host=$HOSTNAME,logs=$INFLUX_LOG_TAG total_t=${TOTAL_TIME}i,purge_t=${PURGE_TIME}i,purged_files=${PURGE_COUNT}i,rotate_t=${ROTATE_TIME}i,rotate_count=${X}i,skipped_files=${SKIPPED}i,rotated_lines=${LINECOUNT}i $NOW"

    auth="X-Foo: bar"
    if [[ ! "$INFLUX_USER" == "" ]]
    then
        auth="Authorization: basic `echo -n "$INFLUX_USER:$INFLUX_PASS" | base64`"
    fi

    curl -X POST "${INFLUX_HOST}/write?db=${INFLUX_DB}&precision=s" \
    -H "$auth" \
    -d "$POINT"
}

START=`date +'%s'`
DATESTR=`date +'%Y%m%d-%H%M'`

# ensure the logdestination exists
mkdir -p "$LOGDIR"

# Initialise some counters
SKIPPED=0
X=0
LINECOUNT=0

# This default could have been included in the definition above,
# but apparently doing so breaks syntax highlighting on my site.
# Will have to fix that...
if [[ "$CONTAINERS" == "" ]]
then
    CONTAINERS=`docker ps --format '{{.Names}}'`
fi


for container in $CONTAINERS
do
    logpath=`docker inspect --format='{{.LogPath}}' "$container"`

    if [ ! -f "$logpath" ]
    then
        SKIPPED=$(( $SKIPPED + 1 ))
        continue
    fi

    logdest="${LOGDIR}/${container}-${DATESTR}.json.log"

    # Copy the logfile
    cp "$logpath" "$logdest"

    # Truncate the original
    truncate -s 0 "$logpath"

    # Add an informative logline
    echo "{\"log\" : \"`date +'%Y/%m/%d %H:%M:%S'` [info] Log rotated. See $LOGDIR for older logs\\n\", \"stream\":\"stdout\",\"time\":\"`date +'%Y-%m-%dT%H:%M:%SZ'`\"}" >> "$logpath"

    # Increment the line counter
    LINECOUNT=$(( $LINECOUNT + `wc -l "$logdest" | cut -d\  -f1`))

    # Compress the copy
    gzip -f "$logdest"

    # Increment the counter
    X=$(( $X + 1 ))
done

ROTATE_END=`date +'%s'`


# tidy out old logs
PURGE_COUNT=`find "$LOGDIR" -name '*.gz' -mtime +90 -print | wc -l`
find "$LOGDIR" -name '*.gz' -mtime +90 -exec rm {} \;
PURGE_END=`date +'%s'`

# Calculate some stats
TOTAL_TIME=$(( $PURGE_END - $START ))
PURGE_TIME=$(( $PURGE_END - $ROTATE_END ))
ROTATE_TIME=$(( $ROTATE_END - $START ))

# Write to InfluxDB (if enabled)
writeStats

# Write to stdout
cat << EOM
Docker log rotation completed.

Files Rotated: $X
Files Skipped: $SKIPPED
Lines rotated: $LINECOUNT

Old logs purged: $PURGE_COUNT

Total time: $TOTAL_TIME

EOM
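
Because the script takes all of its settings from environment variables, it's easy to do a manual test run before scheduling anything (the container names and InfluxDB URL below are examples):

sudo CONTAINERS="nginx redis" INFLUX_HOST="http://127.0.0.1:8086" bash /path/to/docker_logs_rotate.sh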

This performs the rotation, but also provides some additional statistics:

  • total_t : time spent processing
  • purge_t : time spent purging old logs
  • purged_files : number of files purged
  • rotate_t : time spent rotating
  • rotate_count : number of files rotated
  • skipped_files : number of containers skipped (because no logfile was found)
  • rotated_lines : how many loglines were in this rotation
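
For reference, the line protocol point that the script writes looks something like this (the values are illustrative):

log_rotate,host=docker01,logs=docker total_t=4i,purge_t=1i,purged_files=2i,rotate_t=3i,rotate_count=12i,skipped_files=0i,rotated_lines=18423i 1653868800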

With the statistics safely stored in InfluxDB, we can trivially create a dashboard using Flux queries like:

from(bucket: "telegraf/autogen")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "log_rotate")
  |> filter(fn: (r) => r.host == v.host)
  |> filter(fn: (r) => r._field == "rotate_count")
  |> group(columns: ["logs"])
  |> aggregateWindow(every: 1d, fn: sum)

Scheduling Rotation

Once we've got a script we're happy with, it's simply a case of saving it on the server (I called it docker_logs_rotate.sh) and scheduling it in cron. The following will have the job run once daily at midnight:

echo "0 0 * * * root INFLUX_HOST='https://myinfluxdbhost:8086' /path/to/docker_logs_rotate.sh" | sudo tee /etc/cron.d/docker_logs_rotate

Reading Docker logs directly

Each of the lines within Docker's log is a JSON-encapsulated object:

{"log" : "2022/05/30 00:00:03 [info] Log rotated. See /var/log/docker for older logs\n", "stream":"stdout","time":"2022-05-30T00:00:03Z"}
{"log":"206.189.120.26 - - [30/May/2022:00:00:05 +0000] \"GET /categories/i2p.xml HTTP/1.1\" 304 0 \"-\" \"feedparser/6.0.8 +https://github.com/kurtmckee/feedparser/\"\n","stream":"stdout","time":"2022-05-30T00:00:05.884810401Z"}

This isn't particularly convenient if you're trying to review loglines - especially those which are full of escaped quotes etc.

However, they can be converted back to a more human-readable form by doing:

cat $PATH_TO_LOG | jq -r '.log'

In the example above, this'll print

2022/05/30 00:00:03 [info] Log rotated. See /var/log/docker for older logs

206.189.120.26 - - [30/May/2022:00:00:05 +0000] "GET /categories/i2p.xml HTTP/1.1" 304 0 "-" "feedparser/6.0.8 +https://github.com/kurtmckee/feedparser/"
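
The same technique works on the rotated archives without decompressing them to disk first, and jq's select() can be used to narrow the output down (the filename below is an example, following the script's naming scheme):

zcat /var/log/docker/mycontainer-20220530-0000.json.log.gz | jq -r 'select(.stream == "stderr") | .log'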


Conclusion

Docker's default approach to logging isn't particularly ops friendly: logs aren't rotated by default and, even when rotation is enabled, the default logging driver only supports size-based thresholds (which is problematic for any operator who has to observe time-based retention periods).

However, implementing proper rotation of container log files is simply a case of creating a small script to copy, truncate and compress them.