Overview

Something is coming soon to Docker that will drastically simplify hardware replacement and rolling hardware upgrades (among other things). It’s called CRIU (Checkpoint/Restore In Userspace). I’m going to describe the basic functionality and how it will commonly be used, talk a bit about some more speculative applications of the technology that we might see down the road, and then give a simple demonstration of how to use it.

What is CRIU?

CRIU provides support for checkpointing the complete state of a container, including memory, much like a VM snapshot saves the entire state of a virtual machine. Unlike a VM snapshot, however, a CRIU checkpoint captures only the container’s processes rather than a whole guest operating system, so it can be significantly faster (and smaller), which opens up a number of potentially interesting new uses.

CRIU is a general Linux application snapshot facility and is thus being developed independently of Docker, but support for checkpointing Docker containers is included as an experimental feature in current Docker releases. All necessary kernel support is present in recent Linux distributions, so all that is needed to try it out is to install the CRIU package for your distribution (and enable “experimental” mode for Docker).
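Experimental mode is a daemon-side setting. A minimal sketch, assuming the default Linux configuration file /etc/docker/daemon.json and its “experimental” key, merges that key into any existing configuration instead of overwriting it. The helper names are hypothetical, and the daemon has to be restarted before the change takes effect.

```python
import json
from pathlib import Path

# Default daemon configuration file on Linux (assumed; adjust if yours differs).
DAEMON_JSON = Path("/etc/docker/daemon.json")


def with_experimental(config):
    """Return a copy of a daemon config dict with experimental features on."""
    merged = dict(config)          # preserve any existing settings
    merged["experimental"] = True  # needed for `docker checkpoint` while it is experimental
    return merged


def enable_experimental(path=DAEMON_JSON):
    """Hypothetical helper: merge the flag into daemon.json (usually needs root).

    The daemon only picks the change up after a restart,
    e.g. `sudo systemctl restart docker` on a systemd host.
    """
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_experimental(config), indent=2) + "\n")
```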

What is CRIU good for?

The most direct applications involve migrating containers between hosts to:

  • Patch or upgrade a host
  • Replace a failing host
  • Move to a more powerful host

For any application that keeps significant state in memory, which would be costly to recreate, this is a real boon. I once worked on a system using a RETE rule engine that took quite a while to restart as it had to re-insert each object into the engine to reconstruct its state. This sort of snapshot capability would have been a tremendous help.
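To make the migration case concrete, the sketch below only builds the sequence of commands; it does not run them. It assumes the checkpoint directory can be copied with “rsync” over SSH to the same path on the target host, and that an equivalent container (same image and options) can be created there. The names migration_plan and remote, the host name, and the default paths are all hypothetical. It needs Python 3.8+ for shlex.join.

```python
import shlex


def remote(host, argv):
    """Wrap a command for ssh, which hands the remote shell a single string."""
    return ["ssh", host, shlex.join(argv)]


def migration_plan(container, image, target_host, checkpoint="migrate1",
                   checkpoint_dir="/var/tmp/checkpoints", run_args=""):
    """Return the commands (argument lists) to move `container` to `target_host`.

    Hypothetical helper; names, paths and defaults are illustrative only.
    """
    dir_flag = f"--checkpoint-dir={checkpoint_dir}"
    return [
        # 1. Checkpoint on the source host; without --leave-running it stops.
        ["docker", "checkpoint", "create", dir_flag, container, checkpoint],
        # 2. Copy the checkpoint directory to the same path on the target.
        ["rsync", "-a", f"{checkpoint_dir}/", f"{target_host}:{checkpoint_dir}/"],
        # 3. Create an equivalent, not-yet-started container on the target.
        remote(target_host, ["docker", "container", "create", "--name", container,
                             *shlex.split(run_args), image]),
        # 4. Start it from the copied checkpoint.
        remote(target_host, ["docker", "container", "start", dir_flag,
                             f"--checkpoint={checkpoint}", container]),
    ]
```

Each list could be passed, in order, to subprocess.run(cmd, check=True) on the source host. The target host also needs the image and a CRIU-capable Docker with experimental mode enabled.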

Other applications described on the CRIU wiki include:

  • Slow-boot service speed-up

Any service that requires a long initialization sequence can be started and then checkpointed. After that, it can be started from the checkpoint.

  • Debugging a hung application/service

Often, if a service gets into a hung state, we’re not in a position to just let it sit there; we need to restart it to restore functionality. With CRIU, the hung service can be checkpointed before it is restarted, and debugging can then be done against a copy created from the checkpoint.

  • Update dry run

Complicated update procedures are perilous because it can be difficult to restore service if the update fails. With CRIU, the service can be checkpointed and the update applied to a copy. If it succeeds, it can then be applied to the mainline service.

An additional use case, not mentioned on the CRIU site, is speculative execution or “what if” processing. Intelligent applications often need to search through possible responses and their implications before deciding on a course of action. This is typically dealt with via backtracking in systems that support it. Supporting backtracking can significantly increase the complexity of a system, particularly if the application makes changes to the filesystem or other data stores.

With CRIU, this could largely come for free via simple cooperation between the intelligent application and the Docker host. Imagine the application doing something like:

if (what_if(some_elaborate_operation_that_modifies_state_in_complicated_ways())) {
    do_x();
} else {
    do_y();
}

This would be relatively easy to set up just by using inotify on the host and container and having “what_if” do something like:

Legend:
    --- original container ---
    +++ copy +++
    ... host ...

--- original ---
- write /etc/hostname to in volume that host is inotifywaiting on
- wait for host to touch in volume that container is waiting on
---
... host ...
checkpoint container here and start copy container
(checkpointing stops the original container)
touch that copy is waiting on
...
// both original and copy run the 'if' test (much like fork())
if (hostname in != contents of /etc/hostname) {
    +++ copy +++
    run command and write results into
    touch for host
    +++
    ... host ...
    at this point host removes , restores original container and touches that original is waiting on
    ...
} else {
    --- original ---
    remove
    read and return results from in volume
    ---
}

I haven’t prototyped this yet myself. One reason will be discussed in the “Limitations” section below.
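The pseudocode leans on the comparison with fork(). Purely as an in-process analogy (this uses neither CRIU nor Docker), the same “try it on a throwaway copy, then decide” pattern can be sketched with os.fork() on Linux: the child performs the state-changing operation on its own copy of memory and reports a result through a pipe, while the parent’s state stays untouched. The what_if name comes from the pseudocode; everything else is an assumption. Unlike a container checkpoint, this isolates only process memory, not files the operation writes.

```python
import os
import pickle


def what_if(operation):
    """Run `operation` in a forked child and return its (picklable) result.

    In-memory changes made by the operation stay in the child. Failures are
    reported as None. POSIX-only; filesystem side effects are NOT isolated.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: run the speculative operation on its copy of memory
        os.close(read_fd)
        try:
            try:
                payload = pickle.dumps(operation())
            except Exception:
                payload = pickle.dumps(None)
            with os.fdopen(write_fd, "wb") as out:
                out.write(payload)
        finally:
            os._exit(0)  # never fall through into the parent's code path
    os.close(write_fd)  # parent: read the child's answer, then reap it
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()
    os.waitpid(pid, 0)
    return pickle.loads(data) if data else None


# Usage, following the shape of the pseudocode above:
state = {"balance": 100}


def risky_transfer():
    state["balance"] -= 250       # mutates only the child's copy
    return state["balance"] >= 0  # would we still be solvent?


if what_if(risky_transfer):
    print("do_x")
else:
    print("do_y")  # taken, since 100 - 250 < 0
print(state)       # {'balance': 100}: the parent never saw the change
```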

Current Docker Checkpoint API

With the caveat that checkpointing is experimental in Docker and its API is subject to change, the current Docker API is very simple. There is one new top-level Docker command, “checkpoint”, with “create”, “ls”, and “rm” subcommands:

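docker checkpoint create [OPTIONS] CONTAINER CHECKPOINT
docker checkpoint ls [OPTIONS] CONTAINER
docker checkpoint rm [OPTIONS] CONTAINER CHECKPOINT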

and the “container start” command can now take “--checkpoint” and “--checkpoint-dir” arguments:

docker container start --checkpoint-dir=DIR --checkpoint=CHECKPOINT CONTAINER

The “checkpoint-dir” argument is optional. If it is not specified, the Docker runtime will manage a container’s checkpoints, deleting them when the container goes away. You only need to specify a directory if you want to create multiple containers from the same checkpoint. We’ll see examples of this in the next section.
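The subcommands and flags above map directly onto argument lists. This sketch (hypothetical function names; it builds commands suitable for subprocess.run but runs nothing) shows which pieces are optional. Because the feature is experimental, the exact flags may differ between Docker releases.

```python
def _dir_flag(checkpoint_dir):
    # --checkpoint-dir is optional; without it Docker manages the location itself.
    return [f"--checkpoint-dir={checkpoint_dir}"] if checkpoint_dir else []


def checkpoint_create(container, checkpoint, checkpoint_dir=None, leave_running=False):
    flags = _dir_flag(checkpoint_dir)
    if leave_running:
        flags.append("--leave-running")  # otherwise the container stops
    return ["docker", "checkpoint", "create", *flags, container, checkpoint]


def checkpoint_ls(container, checkpoint_dir=None):
    return ["docker", "checkpoint", "ls", *_dir_flag(checkpoint_dir), container]


def checkpoint_rm(container, checkpoint, checkpoint_dir=None):
    return ["docker", "checkpoint", "rm", *_dir_flag(checkpoint_dir), container, checkpoint]


def start_from_checkpoint(container, checkpoint, checkpoint_dir=None):
    # The container must already exist (e.g. via `docker container create`).
    return ["docker", "container", "start", f"--checkpoint={checkpoint}",
            *_dir_flag(checkpoint_dir), container]
```

To start several containers from one checkpoint, each container is created first and every start_from_checkpoint call passes the same checkpoint name and the same explicit checkpoint_dir.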

A Simple Example (automatic persistence)

I’d like to demonstrate how CRIU can be used to provide automatic state backups for a container. In this example, I’m going to use a Redis container because:

  • It’s easy to interact with
  • The persistence model I’m going to demonstrate is much like the original Redis persistence model; it’s interesting to consider how little work is required to provide something similar.

Demonstration Setup

I’m running my code in a vanilla Ubuntu 16.04 VM with the CRIU package and docker-ce installed. I’ve also installed the Redis client module for Python in order to interact with the Redis container.

Basic Idea

I have a bash script that will periodically checkpoint the Redis container, maintaining a maximum of five checkpoints. Once five checkpoints have been created, it will periodically replace the oldest checkpoint with a new one. I’ll connect to the container interactively via Python and show how the in-memory state is maintained across container restarts and container replacement.

Checkpoint Script

This script will run a container and periodically checkpoint it. It takes at least three parameters:

  • Any arguments to pass to “docker container create” (essentially the options you would give “docker run”) as a single string
  • The name of the container to create
  • The image to run

All extra parameters are passed to the container.

There are two main parts to the script:

  • A backups() function that does the checkpointing
  • Code to create a container if it does not exist and start either the existing container or the newly created one; if a checkpoint exists, it is passed to the container start command

Once the container is started, the backups() function is run to begin the periodic checkpointing.

Let’s look at the container creation code first.

# Create a container if it does not yet exist
docker container inspect ${container_name} &>/dev/null \
|| docker container create --name ${container_name} ${run_args} ${container_image}

We start by using the Docker “container inspect” command to find out if the container already exists. If it doesn’t exist, we create it with the specified name, run arguments, and image.
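The same “inspect, otherwise create” check could also be driven from Python. This is a hedged sketch rather than part of the script: ensure_container and its injectable run parameter are hypothetical, and shlex.split only approximates how bash word-splits the unquoted ${run_args}.

```python
import shlex
import subprocess


def ensure_container(name, run_args, image, run=subprocess.run):
    """Create container `name` from `image` unless it already exists.

    Returns True if a container was created. `run` is injectable so the
    logic can be exercised without a Docker daemon.
    """
    probe = run(["docker", "container", "inspect", name],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if probe.returncode == 0:
        return False  # already exists: bash skips the right side of `||` too
    # shlex.split approximates bash word-splitting of the unquoted ${run_args}
    run(["docker", "container", "create", "--name", name,
         *shlex.split(run_args), image], check=True)
    return True
```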

# Find the most recent checkpoint
latest=$(ls -t ${datadir} 2>/dev/null | head -1)

if [[ "${latest}" != "" ]]; then
    # If a checkpoint exists, use it to start the container
    docker container start --checkpoint=${latest} --checkpoint-dir=${datadir} ${container_name}
else
    # If no checkpoint exists, start the container without one
    docker container start ${container_name}
fi

Next, we use “ls” to find the most recent checkpoint (if any). If a checkpoint is found, we start the container specifying both the checkpoint directory and the name of the checkpoint. If no checkpoint is found, we simply start the container.
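“ls -t” sorts by modification time, so the “latest” checkpoint is simply the most recently modified entry in ${datadir}. A rough Python equivalent, assuming each checkpoint appears as one entry directly under that directory (as the script’s own “ls” calls also assume), makes the rule explicit; latest_checkpoint is a hypothetical name.

```python
import os


def latest_checkpoint(datadir):
    """Return the name of the most recently modified entry in `datadir`, or None.

    Roughly mirrors `ls -t "$datadir" 2>/dev/null | head -1`: hidden entries
    are skipped and a missing directory simply means "no checkpoint yet".
    """
    try:
        with os.scandir(datadir) as it:
            entries = [(e.stat().st_mtime, e.name) for e in it
                       if not e.name.startswith(".")]
    except FileNotFoundError:
        return None
    return max(entries)[1] if entries else None
```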

Now the backup code…

# Create a rolling set of backup checkpoints for the running container
backups() {
    while true; do
        sleep 5

        # Check status of container and exit with error if no container exists
        status=$(docker container inspect -f '{{.State.Status}}' ${container_name} 2>/dev/null) || exit 1

        # If container is stopped, exit without error
        [[ "${status}" != "running" ]] && exit 0

        # Find the current number of checkpoints
        cpcount=$(ls ${datadir} 2>/dev/null | wc -w)

        # If we've reached max, remove oldest checkpoint and
        # set next checkpoint number to number of oldest
        if [[ ${cpcount} -ge ${maxcheckpoints} ]]; then
            oldest=$(ls -rt ${datadir} 2>/dev/null | head -1)
            i=$([[ ${oldest} =~ ${checkpoint_re} ]] && echo ${BASH_REMATCH[1]})
            docker checkpoint rm --checkpoint-dir=${datadir} ${container_name} ${oldest}
        else
            # otherwise, set next checkpoint number to next open number
            i=$((${cpcount} + 1))
        fi

        docker checkpoint create --checkpoint-dir=${datadir} \
            --leave-running=true ${container_name} checkpoint${i}
    done
}

The backups() function checkpoints every five seconds. It uses “container inspect” to find out if the container still exists and, if so, whether or not it is running. If a running container is not found, we exit; otherwise we start creating checkpoints.

First, we find out how many checkpoints already exist using “ls” and “wc”. If we haven’t yet reached our maximum number of checkpoints (5), we set the index of the next checkpoint one higher than the previous highest index (if we stopped at checkpoint2, we’ll start at checkpoint3). Once we’ve reached our max, we find the oldest existing checkpoint, use “checkpoint rm” to delete it, and set the next checkpoint index to the index of the removed checkpoint.

Then, we create a new checkpoint using the new index. Notice the “--leave-running” flag passed to “checkpoint create”. If this flag is left off, the container is stopped as part of the checkpoint.
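The index bookkeeping is the fiddliest part of backups(), so here is the same rotation rule as a pure function that can be checked without Docker. It assumes checkpoint names have the form “checkpoint<N>” (the script’s checkpoint_re isn’t shown, so the regex here is a guess) and that the list is ordered oldest first, as “ls -rt” returns it; next_checkpoint is a hypothetical name.

```python
import re

CHECKPOINT_RE = re.compile(r"checkpoint(\d+)$")  # assumed stand-in for ${checkpoint_re}


def next_checkpoint(existing_oldest_first, max_checkpoints=5):
    """Return (name_to_create, name_to_remove_or_None) for one backup cycle."""
    if len(existing_oldest_first) >= max_checkpoints:
        oldest = existing_oldest_first[0]
        index = int(CHECKPOINT_RE.search(oldest).group(1))
        return f"checkpoint{index}", oldest  # reuse the oldest checkpoint's index
    return f"checkpoint{len(existing_oldest_first) + 1}", None


# Simulate seven cycles starting from an empty checkpoint directory.
history = []
for _ in range(7):
    name, removed = next_checkpoint(history)
    if removed is not None:
        history.remove(removed)
    history.append(name)  # the new checkpoint is now the newest
    print(name)
```

The simulation prints checkpoint1 through checkpoint5 followed by checkpoint1 and checkpoint2, the same cycle the script prints in the run below.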

… and that’s it! We now have the ability to maintain persistent state for a container. Let’s run the script for a Redis container:

./run_with_checkpoints '--net=host' redis redis
8964f51a8f8910499a4ddb3224a4a0ee68c834ca72010744187f712790ce2811
redis
checkpoint1
checkpoint2
checkpoint3
checkpoint4
checkpoint5
checkpoint1
checkpoint2
...

A couple of things to notice:

  • We pass ‘--net=host’ as the run arguments (the script hands them to “docker container create”) so that we’ll be able to talk to the container from Python on the host.
  • The checkpoints cycle as we discussed above (docker checkpoint prints out the name of the created checkpoint; if we were going to use this script in anger, we’d likely redirect that output to /dev/null).

Now let’s connect to the container from Python:

ubuntu:~/sandbox> python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import redis
>>> r = redis.StrictRedis(host='localhost', port=6379, db=0)
>>>

We’ve created a Redis connection. Now let’s store some data:

>>> r.get('data_to_persist')
>>> r.set('data_to_persist', 'persisted_data')
True
>>> r.get('data_to_persist')
'persisted_data'
>>>

Nothing is printed when we first ask (the key doesn’t exist yet, so get() returns None), but the value is available after we set it. Now, in a different window, let’s stop our Redis container:

ubuntu:~/sandbox> docker stop redis
redis
ubuntu:~/sandbox>

Once we do this, our checkpoint script exits as there is no longer a running container. If we try our Redis lookup from Python now, we get:

>>> r.get('data_to_persist')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 880, in get
    return self.execute_command('GET', name)
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 578, in execute_command
    connection.send_command(*args)
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 563, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 538, in send_packed_command
    self.connect()
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 442, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.
>>>

Now, let’s start things up again:

ubuntu:~/sandbox> ./run_with_checkpoints '--net=host' redis redis
checkpoint3
...

and try connecting:

>>> r.get('data_to_persist')
'persisted_data'
>>>

Just like that!

Limitations

Not everything is sunshine and roses just yet, however. Though CRIU in Docker provides a lot of good functionality already, it’s important to be aware of a few things:

  • X applications running against a real X server can’t be checkpointed as some of an application’s state is maintained in the server.
  • Interactive containers with TTY access also can’t be checkpointed (yet).
  • Filesystem changes are not automatically picked up by checkpointing. This is the reason, alluded to above, that I haven’t prototyped the “what if” capability. Currently, you need to use “docker commit” to save any filesystem changes and then use the new image when restoring from checkpoints. This issue is the source of most CRIU bug reports and is under active development.

However, if you have a container that maintains state in memory and interacts with the outside world via sockets, you can already get a lot of leverage out of the checkpointing functionality.
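For the filesystem limitation, the “docker commit” workaround can be sketched as a command sequence. This is an assumption-laden sketch, not a tested recipe: it checkpoints without “--leave-running” so that memory and files are captured at the same moment, the function name, image tag, and container names are hypothetical, and data stored in volumes is not part of a commit at all.

```python
import shlex


def checkpoint_with_filesystem(container, checkpoint, checkpoint_dir, image_tag,
                               new_container, run_args=""):
    """Return commands that save memory (checkpoint) plus files (commit), then restore.

    Hypothetical helper; image_tag and new_container are illustrative names.
    """
    dir_flag = f"--checkpoint-dir={checkpoint_dir}"
    return [
        # No --leave-running: the container stops, so files can't change before the commit.
        ["docker", "checkpoint", "create", dir_flag, container, checkpoint],
        # Save the stopped container's filesystem changes as a new image.
        ["docker", "commit", container, image_tag],
        # Later: create a fresh container from that image with the original options ...
        ["docker", "container", "create", "--name", new_container,
         *shlex.split(run_args), image_tag],
        # ... and restore the in-memory state into it.
        ["docker", "container", "start", dir_flag, f"--checkpoint={checkpoint}", new_container],
    ]
```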

Summary

  • CRIU provides experimental support in Docker for saving and restoring container state
  • CRIU is under active development but already provides useful functionality
  • The CRIU API is simple and easy to use
  • Efficient container snapshot support enables a lot of powerful use cases

Thanks for reading!

