At Link Digital we work on a huge number of CKAN projects. CKAN is an open-source data management system (DMS) used to build data hubs and data portals that make it easy for organisations to publish, share and work with data. With CKAN you can catalogue, store and access datasets through a rich front-end backed by powerful platform features, visualisation tools and customisable plugins. You can also use its API to access the catalogue of datasets and data resources from a wide variety of other applications.
CKAN – The powerful data platform
My role is to help with developer operations and infrastructure, and I’ve been working to expand our team’s capability to host and support CKAN on Microsoft Azure. To deploy CKAN well on Azure, we first need to identify its basic components, understand how they work together, and match each one to a relevant Azure service.
CKAN is made up of three main components: a relational database (PostgreSQL), a search platform (Solr) and a web application (Python). Once we understand each component, we can choose a topology and size the service instances to match the client’s requirements.
Deploying CKAN on Azure to meet business goals
The first option, which is suitable for some of our clients, is to place all components inside a single virtual machine (VM).
Placing all components in a single instance can be a good choice for a small client building a small hub or portal, because the database platform, search platform and web platform (CKAN’s Python codebase) are all easy to set up together on one VM.
The main limitation of this topology, which many clients will run into, is that it handles large volumes of end-user traffic poorly. A single VM also takes a lot of effort to maintain, upgrade and scale up.
The second option is to build and deploy each service instance separately, which makes maintenance easier for system admins. The service instances can also be built as a cluster that provides high I/O throughput and high availability (HA). With the database and search services running on their own nodes, the web instance (Python) can be scheduled to scale out to multiple nodes during peak hours and back to a small number of nodes off-peak.
This deployment suits large systems that handle much more traffic over the internet. The drawback, of course, is the cost of running multiple nodes for both the web and service instances.
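To make the peak/off-peak idea concrete, here is a minimal sketch of a time-based scaling rule for the web tier only. The peak window (08:00 to 18:00) and the node counts are hypothetical values chosen for illustration, not recommendations; on Azure the actual scaling would be carried out by the platform’s own autoscale settings, and this only models the decision logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleSchedule:
    """Hypothetical schedule for the CKAN web tier; all defaults are assumptions."""
    peak_start_hour: int = 8   # assumed start of peak hours (inclusive, 24h clock)
    peak_end_hour: int = 18    # assumed end of peak hours (exclusive)
    peak_nodes: int = 4        # assumed web node count during peak hours
    off_peak_nodes: int = 1    # assumed web node count outside peak hours

    def web_nodes_for(self, hour: int) -> int:
        """Return how many web (Python) nodes should run at the given hour."""
        if not 0 <= hour <= 23:
            raise ValueError(f"hour must be in 0..23, got {hour}")
        if self.peak_start_hour <= hour < self.peak_end_hour:
            return self.peak_nodes
        return self.off_peak_nodes


# Only the web tier follows this schedule; the PostgreSQL and Solr
# service nodes keep a fixed size.
schedule = ScaleSchedule()
for hour in (3, 9, 17, 18, 22):
    print(f"{hour:02d}:00 -> {schedule.web_nodes_for(hour)} web node(s)")
```

Keeping the services on separate nodes is what makes this possible: the web nodes hold no data, so adding or removing them does not touch the database or the search index.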
Wait a minute, is there a hybrid option that combines both? Well, I would say “no”, but also yes. Many clients want to share the service instance to reduce maintenance costs while still getting good I/O for their service.
For this option, we provide the database and search services as a shared service (software as a service). Clients then pay for their own web instance plus an extra fee for our hosting services. Compared with the first option, that additional payment gets clients better I/O from the database and search services, because their own hosts only run web instances, which can be extended easily.
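The saving in the hybrid option comes from splitting one service cluster across several clients. The sketch below compares an illustrative per-client infrastructure cost for dedicated versus shared database/search services. It assumes the shared cluster’s cost is split evenly, leaves out hosting fees, and uses arbitrary cost units; the function names and figures are hypothetical and are not real Azure prices.

```python
def dedicated_cost(web_nodes: int, web_node_cost: float, service_cost: float) -> float:
    """Per-client cost when each client runs its own database/search service nodes."""
    return web_nodes * web_node_cost + service_cost


def shared_cost(web_nodes: int, web_node_cost: float, service_cost: float,
                clients_sharing: int) -> float:
    """Per-client cost when one database/search service cluster is shared.

    An even split of the cluster cost across clients is an assumption
    made for illustration; hosting fees are not modelled.
    """
    if clients_sharing < 1:
        raise ValueError("clients_sharing must be at least 1")
    return web_nodes * web_node_cost + service_cost / clients_sharing


# Arbitrary cost units, NOT real Azure prices.
print(dedicated_cost(web_nodes=2, web_node_cost=1.0, service_cost=6.0))  # 8.0
print(shared_cost(web_nodes=2, web_node_cost=1.0, service_cost=6.0, clients_sharing=3))  # 4.0
```

With a single client sharing the cluster, the two functions return the same value; the benefit grows as more clients share the same database and search services.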
AWS & Azure cost comparison
Before making a comparison, we need to understand the size of the system and the resources CKAN requires. Based on those requirements, CKAN should be deployed on 2 nodes.
Get in touch with us for more information about pricing your project.
Building your CKAN in the real world