This blog post was co-authored by Andrew Nette, Technical Writer/Analyst at Link Digital.
At Link Digital, we work on many data management projects using the open source data management system known as CKAN – the Comprehensive Knowledge Archive Network. But Link Digital often uses CKAN in combination with a range of other tools, one of which is Azure. My role at Link Digital is to help with the developer operations and infrastructure, and I’ve been working to expand our team’s capability to host and support CKAN on Microsoft Azure.
The following article will explore what Azure is and how organisations can use it with CKAN to build powerful data solutions.
What is Azure?
Azure is Microsoft’s cloud computing platform. Link Digital utilises it to host open data platforms and enterprise data solutions using CKAN – software used to power internal data catalogues and public facing data portals that makes it easy for organisations to publish, share and work with data.
Azure offers a diverse range of services for data management, processing and analytics, along with highly scalable storage options that range from small-scale stores to data lakes capable of serving as a secure repository for big data analytics.
Azure can be easily integrated with CKAN to provide single sign-on functionality. CKAN’s application programming interface (API) can be used to interact with Azure services, and custom extensions for CKAN can be developed and hosted on Azure.
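As a rough illustration of working with CKAN’s API, the sketch below queries a portal’s dataset search from a Python script, the kind of script that could run inside an Azure-hosted job or function. The portal address https://data.example.org and the helper names are assumptions made for this example; package_search is one of the standard actions in CKAN’s Action API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url, query, rows=5):
    """Return the CKAN Action API URL for a package_search request."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_body):
    """Extract dataset names from a package_search JSON response body."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful API call")
    return [dataset["name"] for dataset in payload["result"]["results"]]


def search_datasets(portal_url, query, rows=5):
    """Call the portal over HTTP and return the matching dataset names."""
    url = build_search_url(portal_url, query, rows)
    with urlopen(url, timeout=10) as response:
        return dataset_names(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical portal address; replace it with a real CKAN site to try this.
    print(build_search_url("https://data.example.org", "energy"))
```

Keeping the URL building and response parsing separate from the network call makes the logic easy to check without a live portal.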
CKAN and Azure together: the different options
To understand how CKAN and Azure can work together as a powerful data platform, we need to identify the basic CKAN components and match them to relevant Azure services. At a minimum, CKAN requires a relational database system (PostgreSQL), a search platform (Solr) and a web platform (running CKAN’s Python codebase). With an understanding of each component, we can choose a topology and size the service instances to match requirements.
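To make the relationship between these components concrete, here is a minimal sketch of how the CKAN web application is pointed at its database and search platform. The option names sqlalchemy.url, solr_url and ckan.site_url come from CKAN’s configuration file; the host names, credentials and the helper functions are placeholders assumed for illustration only.

```python
def ckan_connection_settings(db_host, solr_host, site_url,
                             db_user="ckan_default",
                             db_password="CHANGE_ME",
                             db_name="ckan_default"):
    """Map the three basic CKAN components to CKAN config options."""
    return {
        # Relational database system (PostgreSQL)
        "sqlalchemy.url": f"postgresql://{db_user}:{db_password}@{db_host}/{db_name}",
        # Search platform (Solr)
        "solr_url": f"http://{solr_host}:8983/solr/ckan",
        # Web platform (the public address of the CKAN site)
        "ckan.site_url": site_url,
    }


def render_ini(settings):
    """Render the settings as lines suitable for a CKAN .ini file."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical host names; a single-instance setup would use localhost for both.
    print(render_ini(ckan_connection_settings(
        db_host="db.internal.example",
        solr_host="solr.internal.example",
        site_url="https://data.example.org",
    )))
```

Whether the components share one machine or run separately then comes down to which hosts these settings point at.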
Option 1: CKAN and Azure in the same instance
The first option, which is suitable for some clients, is to place all the components (the database system, search platform and web platform) into a single instance or virtual machine, which is a software-based version of a physical computer.
There are limitations to this setup, however, which many clients will experience. The first is poor handling of large volumes of end-user traffic. The second is that, while the single-instance topology is easy to build, it takes a lot of effort to maintain, upgrade and scale up.
Option 2: CKAN and Azure in separate instances
These problems can be addressed with the second option: deploying the components on separate instances, with the CKAN web application connecting them together. These service instances can also be built as clusters. This makes maintenance more convenient for systems administrators, as one instance can be repaired or upgraded without disrupting the operation of the others. With the backing services running as separate nodes that the web application depends on, the web instance (Python) can be scheduled to scale out to multiple nodes during peak hours and scale back to a small number of nodes during off-peak hours.
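The idea of scheduled scaling can be sketched in a few lines of Python. This is only an illustration of the decision rule, not how Azure implements autoscaling; the peak window (09:00 to 17:00) and the node counts are assumed values chosen for the example.

```python
def desired_web_nodes(hour, peak_start=9, peak_end=17,
                      peak_nodes=4, off_peak_nodes=1):
    """Return how many web nodes to run for a given hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # Scale out during the peak window, scale back in otherwise.
    if peak_start <= hour < peak_end:
        return peak_nodes
    return off_peak_nodes


# A full-day schedule: hour of day -> number of web nodes.
schedule = {hour: desired_web_nodes(hour) for hour in range(24)}

if __name__ == "__main__":
    for hour, nodes in schedule.items():
        print(f"{hour:02d}:00 -> {nodes} web node(s)")
```

In practice a rule like this would be expressed in the cloud platform’s own scaling settings rather than in application code.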
This second option is preferred for big data or enterprise-level operations, because it can handle much more traffic and the maintenance and/or upgrading of one instance does not take the whole service offline or interfere with the operations of the others. This provides:
- High input and output. Input is the data sent to a system, while output is the data the system processes and returns. The more nodes serving a component, the more data can move between it and the other systems it communicates with.
- High availability for business-critical functions. This means reducing or eliminating single points of failure so that the web application continues to operate even if one of the components it depends on fails or needs servicing.
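A back-of-the-envelope calculation shows why removing single points of failure matters. The sketch below uses the standard availability formulas and assumes that nodes fail independently of one another, which is a simplification; the 99% figure is an illustrative number, not a measured one.

```python
import math


def serial_availability(*component_availabilities):
    """Availability of a system that needs every component to be up."""
    return math.prod(component_availabilities)


def parallel_availability(node_availability, nodes):
    """Availability of a component served by redundant, independent nodes."""
    return 1 - (1 - node_availability) ** nodes


if __name__ == "__main__":
    # Three components (database, search, web), each assumed 99% available.
    single_nodes = serial_availability(0.99, 0.99, 0.99)
    # The same three components, each backed by two redundant nodes.
    redundant = serial_availability(*[parallel_availability(0.99, 2)] * 3)
    print(f"one node per component:  {single_nodes:.4%}")
    print(f"two nodes per component: {redundant:.4%}")
```

Under these assumptions, a second node takes a component from 99% to about 99.99% availability, because both nodes would have to fail at the same time for it to go down.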
The drawback of option two is the cost.
Link Digital utilised this second option in work for two clients.
The first client was the Energy and Resources Markets Branch of New Zealand’s Ministry of Business, Innovation and Employment. As the country’s regulatory steward for resources and energy market systems, the Branch holds a huge amount of data, including field seismic data on tape, databases relating to mines and petroleum production, and documents relating to royalties, fees and permits, as well as significant spatial data.
The project involved replacing the Branch’s existing online data storage platform, which included an internal asset register as well as a public-facing portal. Link Digital helped to create separate instances for each of the main components: Postgres, Solr and Redis (an open source, in-memory data structure store that functions primarily as a NoSQL database, cache and message broker). We also used Azure’s Active Directory Single Sign-On to allow convenient login to web-based software applications.
The second client was Médecins Sans Frontières/Doctors Without Borders (MSF). MSF has a global federated structure comprising 26 self-governed associations, each of which is affiliated with one of five MSF Operational Centres. The organisation wanted to streamline and improve the sharing of research data collected by its staff, including its field practitioners, both internally and with external third parties. After an initial exploration of MSF’s internal data sharing capacity, the offices in the Netherlands and Japan jointly decided to develop a data sharing platform.
Working with MSF staff, Link Digital augmented a CKAN-based data sharing infrastructure with Azure. This involved two instances: one combining Postgres, Redis, and Solr, and another containing the CKAN core and nginx, a web server, reverse proxy, content cache, load balancer, and mail proxy server. The build also utilised Azure’s single sign-on to enable users to conveniently connect to and use cloud-based apps over the Internet. You can read more about Link Digital’s work with MSF here.
Whichever option an organisation adopts, clients pay for their web instance plus an additional fee for our hosting services. For an additional payment, clients can get improved input and output performance from the database and search services. To estimate the cost, Link Digital would need to understand the size of the intended system and the resources required to run it.
Want to know more about how Azure and CKAN can work together?
Get in touch with us for more information about pricing your project.