Introduction

One thing I noticed when researching CKAN running on containers is that there does not seem to be any blog post or article describing the CKAN stack running on a local Kubernetes cluster using plain YAML manifest files, without Helm for the application deployment. The aim of this blog post is to simplify creating CKAN on Kubernetes: Helm (or any other templating engine/overlay tool) is not a requirement for deploying an application to Kubernetes.

This article is the first of two. It covers running a simple Kubernetes development environment with one container per CKAN service, without worrying about maintaining state on the PostgreSQL, Redis or Solr components. This environment is very similar to the current official CKAN Docker Compose installation, and it is good enough to show how a simple, local CKAN environment on Kubernetes can be up and running quite quickly.

The second article will include stateful services, run multiple replicas of all services, and can potentially be used as a base for running CKAN on a Kubernetes cluster in production. Both Dev and Prod environments will run locally, and both implementations will use exactly the same container images.

Approach

I have created a number of Kubernetes manifest YAML files that are 'kubectl applied' by running a simple bash script. Together these manifest files are known as a declarative configuration of the Kubernetes environment: the desired state of the environment is specified in the files, and Kubernetes makes sure that state is reached.
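
As a rough sketch of what such a script can look like (the manifests/ directory name here is illustrative, not necessarily the layout of the repository):

#!/bin/bash
# Apply every manifest in the directory, then show the resulting pods.
# kubectl apply is idempotent, so the script can be re-run safely
# after editing any of the manifest files.
for manifest in manifests/*.yaml; do
    kubectl apply -f "$manifest"
done
kubectl get pods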

Details of the local environment, Kubernetes software used and containers are as follows:

Local Machine:

  • Apple Mac mini (2018) running macOS Big Sur (11.5.2)

Kubernetes:

  • k3s
  • k3d

Containers:

  • Load Balancer (nginx)
  • Ingress
  • PostgreSQL
  • Solr
  • Redis
  • DataPusher
  • CKAN

GitHub Repository: https://github.com/DataShades/ckan-k8s

Details of the Implementation

Firstly, I have used the default namespace. Ideally I would have defined a separate namespace for Dev and for Prod, but I want to keep these Kubernetes clusters simple. All Pods and ReplicaSets are created using Kubernetes Deployments; in the production environment, the Pods and ReplicaSets for components that need to maintain state will be created with StatefulSets. I have used ConfigMaps for all environment variables and Secrets for all passwords (see the sketch below). The Secrets are just vanilla Kubernetes Secrets in manifest files; for production you would need to consider some form of encryption if the Secret manifests reside in Git (an external vault or SealedSecrets, for example).
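
A minimal sketch of this pattern, with illustrative names and values (the actual keys and images live in the repository's manifests): a ConfigMap and a Secret consumed by a Deployment via envFrom.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ckan-env
data:
  CKAN_SITE_URL: http://localhost:8080
---
apiVersion: v1
kind: Secret
metadata:
  name: ckan-secrets
type: Opaque
stringData:
  POSTGRES_PASSWORD: changeme   # plain here; encrypt or vault this for Prod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ckan
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ckan
  template:
    metadata:
      labels:
        app: ckan
    spec:
      containers:
        - name: ckan
          image: ckan/ckan-base:2.9.5   # illustrative image reference
          envFrom:                      # inject all keys as environment variables
            - configMapRef:
                name: ckan-env
            - secretRef:
                name: ckan-secrets
          ports:
            - containerPort: 5000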

Additionally, there is a Kubernetes Job that mounts a ConfigMap containing the SQL required to initialise the Datastore database used by DataPusher. The Job runs that SQL from a separate container against the PostgreSQL container.
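
A sketch of what such a Job could look like; the service name, user, database and SQL here are hypothetical placeholders, not the repository's actual initialisation SQL:

apiVersion: v1
kind: ConfigMap
metadata:
  name: datastore-init-sql
data:
  init.sql: |
    -- placeholder; the real script sets up the Datastore database permissions
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO datastore_ro;
---
apiVersion: batch/v1
kind: Job
metadata:
  name: datastore-init
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: psql
          image: postgres:14            # any image with the psql client
          command: ["psql"]
          args: ["-h", "postgres", "-U", "ckan", "-d", "datastore", "-f", "/sql/init.sql"]
          env:
            - name: PGPASSWORD          # read by psql for authentication
              valueFrom:
                secretKeyRef:
                  name: ckan-secrets
                  key: POSTGRES_PASSWORD
          volumeMounts:
            - name: sql
              mountPath: /sql
      volumes:
        - name: sql
          configMap:
            name: datastore-init-sql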

I have used a NodePort for the PostgreSQL Service type and assigned it a specific ClusterIP address. There is no reason to do this other than to see how it works. The rest of the Services all use ClusterIPs.
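
A sketch of a Service combining both, with illustrative addresses and ports:

apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  type: NodePort
  clusterIP: 10.43.0.50     # illustrative; must fall inside the cluster's service CIDR (10.43.0.0/16 by default on k3s)
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
      nodePort: 30432       # NodePorts default to the 30000-32767 range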

Access from a browser to the running cluster goes through a local Kubernetes LoadBalancer, which talks to the Ingress, which in turn reaches the back-end pod running the CKAN container. Communication between the rest of the components goes through the ClusterIP address of the Service each has been assigned.
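
k3s ships with Traefik as its bundled ingress controller, so a plain Ingress resource is typically picked up without specifying an ingressClassName. A minimal sketch routing all traffic to the CKAN Service (service name and port are illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ckan
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ckan      # the CKAN ClusterIP Service
                port:
                  number: 5000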

The k3d command to create the cluster and set up the load balancer is as follows:

k3d cluster create ckan-k8s --port 8080:80@loadbalancer --port 8443:443@loadbalancer

This says that ports 8080 and 8443 on my local machine map to ports 80 and 443 respectively on the load balancer running in the local Kubernetes cluster. On my local machine the URL to access CKAN is:

http://localhost:8080/ or https://localhost:8443/   

The CKAN container image is built much like the Open Knowledge docker-ckan/ckan-base/2.9 image and could be replaced by that image; the current version of CKAN in this image is 2.9.5. The same goes for the DataPusher image used: it is similar to the Open Knowledge image.

Installing Kubernetes Software

1. k3s – (k3s.io) – A very lightweight Kubernetes distribution, perfect for local Kubernetes clusters and certified by the CNCF. It is also included with Rancher Desktop (rancherdesktop.io).

curl -sfL https://get.k3s.io | sh -  

2. k3d – (k3d.io) – A lightweight wrapper to run k3s clusters, again certified by the CNCF.

curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

Creating a Kubernetes cluster

The following command will create a simple single-node k3s Kubernetes cluster:

k3d cluster create ckan-k8s 

You can view this new cluster with the following commands:

kubectl cluster-info
kubectl get all

What Makes up this Kubernetes Cluster?

This diagram provides a high-level representation of how the Kubernetes infrastructure and application components are laid out. The CKAN component is described in a bit more detail than the other components, which are put together in roughly the same way:

Implementation

Running the script will produce the following output:


At the end of the script, the following command is run:

kubectl get pods

We can take a look at the running services using:

kubectl get svc -o wide

In Kubernetes we use Volumes in a similar way to how we use them in traditional Docker environments. We can look at the volume implementation using:

kubectl get sc -o wide
kubectl get pv -o wide
kubectl get pvc -o wide

Here we see one StorageClass in use, local-path, and two volume claims: one for CKAN and one for the PostgreSQL database.
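
local-path is k3s's built-in dynamic provisioner, so a claim alone is enough to get a volume. A minimal sketch of such a claim (the name and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path   # k3s's default StorageClass
  resources:
    requests:
      storage: 1Gi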

We can take a look at the CKAN pod logs using:

kubectl logs ckan-cb7dc4b5b-84qk4

We can also see configuration details of the CKAN pod using:

kubectl describe pod ckan-cb7dc4b5b-84qk4

If you would like to learn more about CKAN and how you can establish developer operations to support your own data portal projects, please get in touch.