I’ve been working with Link Digital from the Berlin office over the last 18 months, mostly within the CKAN Tech Team, assisting with the regular operations of supporting, improving, and shipping CKAN for its community of users. My time has been spent helping out with CKAN issues and PRs, mainly in the DevOps area: Docker installs, packaging CKAN releases, testing PRs and new releases, plus documentation. However, I have recently been investigating some platform tools for Link Digital. As part of this, I have the following to share, which I think might benefit the community of CKAN folks also working on platform build techniques and methods.

Background

The CKAN software stack can be installed as Docker containers. This is useful when you are enhancing CKAN code, as you can be assured the CKAN stack you are building will work the same way no matter what environment (OS, VM or physical server) hosts it, as long as the Docker Engine is running. It’s a way of controlling how your CKAN additions and enhancements will operate in all environments (Dev, Test and Production). Compared with rolling out CKAN enhancements to a more traditional setup, this can dramatically reduce the changes needed for each environment.

Aim

  • To run the CKAN stack as Docker containers using Docker Compose
  • To analyse where efficiencies can be made to reduce the time to build the CKAN containers
  • To implement those efficiencies and record the time it takes to build and run the CKAN stack
  • Once implemented, it should be straightforward to move to an orchestrated container environment using one of:
      • Docker Swarm
      • Kubernetes
      • amazee.io’s Docker-based Lagoon environment
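
To record build times consistently, a small timing wrapper is enough. The following is a minimal sketch, not the script used for the figures in this post: the helper names time_command and format_duration are hypothetical, and "docker compose build" is assumed to be the build command (older installs use "docker-compose build").

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    # check=False: a failed build should still be timed and reported
    result = subprocess.run(cmd, check=False)
    return time.perf_counter() - start, result.returncode


def format_duration(seconds):
    """Format a duration in the "19 mins, 24 secs" style used in this post."""
    mins, secs = divmod(int(round(seconds)), 60)
    return f"{mins} mins, {secs} secs"


# Example usage (assumed build command, not run here):
# elapsed, code = time_command(["docker", "compose", "build"])
# print("Time to build:", format_duration(elapsed), "exit code:", code)
```

For timings that are comparable between runs, it probably makes sense to build without the layer cache (for example by adding --no-cache to the build command), otherwise a second run mostly measures cache hits.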

CKAN software stack

The CKAN software stack contains

  • CKAN Web (v2.9.2 – to run the web UI)
  • CKAN Workers (v2.9.2 – to run jobs and harvest workers)
  • PostgreSQL (for CKAN and Datastore)
  • Solr
  • Redis

The initial build

  • Both CKAN images use ubuntu:20.10 as the base image, on top of which:
      • System packages are installed
      • A Python virtual environment is created
      • pip and setuptools are installed
      • CKAN and the CKAN extensions are installed
      • Config management software is installed
  • PostgreSQL uses mdillon/postgis:11 as the base image
  • Solr uses solr:6.6.6 as the base image
  • Redis uses amazeeio/redis:6-latest as the base image

Time to build: 19 mins, 24 secs (most of this was building both CKAN containers)

After efficiencies made

The obvious place to look for build-time savings is the two CKAN container builds (ckan-web and ckan-workers), as they duplicate a large number of steps.

A new CKAN base image is now pre-built. It is based on the uselagoon/python-3.8 image and can be downloaded from Docker Hub.

The uselagoon/* images use Alpine as the base image, which is a robust, security-oriented, lightweight Linux distribution.

The new CKAN base image includes:

  • system packages
  • a Python virtual environment
  • pip3 and setuptools
  • CKAN itself
  • Supervisor

The two CKAN images (ckan-web and ckan-worker) are now derived from this new CKAN base image and add:

  • certain CKAN extensions (xloader, harvest, syndicate and scheming)
  • config management (using crudini)
  • setup files
  • file/directory creation and permission changes
  • health checks (CKAN web)

The new PostgreSQL, Solr and Redis builds are based on uselagoon/* images, so they are very lean and add to the overall efficiencies made. A new NGINX container was also added to the CKAN stack, using uselagoon/nginx as its base image.

Time to build: 4 mins, 51 secs (i.e. 4x faster)
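
The speed-up can be checked directly from the two build times reported in this post. A quick sketch of the arithmetic (the helper name to_seconds is only for illustration):

```python
def to_seconds(mins, secs):
    """Convert a "mins, secs" timing into total seconds."""
    return mins * 60 + secs


before = to_seconds(19, 24)  # initial build
after = to_seconds(4, 51)    # build using the pre-built CKAN base image

speedup = before / after     # how many times faster the new build is
saved = before - after       # seconds saved per full build

print(f"{before} s -> {after} s")
print(f"speedup: {speedup:.1f}x, saved: {saved // 60} mins, {saved % 60} secs")
```

Both figures are single measurements, so the ratio is best read as a rough indication rather than a benchmark.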

The repository for the CKAN stack after the efficiencies were made is located at: https://github.com/DataShades/ckan-lagoon

I wasn’t expecting the build time to reduce so drastically with these changes. However, by using a pre-built Docker image for the base CKAN configuration tasks we can re-use a large portion of the build steps for multiple containers, greatly simplifying the build process.

At Link Digital we provide fully managed open data platforms based on CKAN. If you need support for your own data portal please feel free to get in touch and we can find a way to help out.