A large international development agency working to support food security and promote sustainable agricultural development.
The client wanted to modernise its enterprise-level public data catalogue to make it more secure and efficient. Using CKAN – the Comprehensive Knowledge Archive Network – the catalogue collects data on agriculture and sustainable farming from a wide range of sources and shares it with stakeholders in Africa, Asia, Europe, Latin America, and the Middle East. The catalogue acts as a secure means of storing and sharing this data, turning it into actionable information that can contribute to economic growth while promoting the sustainable use of natural resources.
Link Digital upgraded the existing CKAN instance, which is used for the data cataloguing component, and refactored a critical custom-built integration with Google Cloud. The integration centres on Google Cloud BigQuery, a data warehouse and analytics service that enables users to store, manage, and analyse large volumes of structured and semi-structured data. We adapted the client’s data pipeline procedures and practices to fit this new configuration, including modifications that enable users to search and preview very large datasets. The project showcases Link Digital’s in-depth CKAN expertise. It is also a textbook example of CKAN’s flexibility in working with other software components, and of the ‘no vendor lock-in’ promise of open source software more generally.
Link Digital was subcontracted to provide specialist services to the project from January to June 2024. The work was executed using an Agile methodology, with bi-weekly sprints. Regular status meetings ensured transparency and excellent communication between the various stakeholders. The final production deployment took place on schedule, with Link Digital’s expert senior developers remaining on standby for real-time support.
The project involved the following technological components:
The client’s catalogue is a public-facing application designed to provide global open access to agroinformatics-related data. Agroinformatics is the use of data-driven decision making in agriculture, helping farmers to make more informed decisions and ensuring greater productivity and environmental sustainability in food production.
The data comes from a diverse range of sources and covers topics ranging from crop planting, soil health and fertiliser use to pest control, supply chain optimisation, and measures to help with disaster preparedness and climate change abatement and mitigation. The catalogue offers a range of tools for customising data visualisation and producing data maps and statistical time series, which help turn data into actionable insights. These are used by stakeholders including national governments, digital agriculture experts, economists, not-for-profits, and farmers’ groups, including those in regional and remote areas. Stakeholders can search for information by keyword, organisation, country, resource type and theme.
The catalogue contains over two million data layers, thousands of statistics series and approximately 4,000 metadata records. It includes data in CSV, XLS and XML formats, as well as geospatial data, including advanced geospatial modelling. This is aggregated from a vast range of sources, including businesses, not-for-profit organisations, inter-governmental agencies, academics, economists, and space agencies.
The core problem faced by the client – the need to update the key software components to modernise the catalogue’s operations – is a common one in the lifecycle of enterprise software. Nonetheless, it required some complex customisation work. Link Digital not only upgraded the existing CKAN instance and its Python interface to make the catalogue more functional and secure, but also implemented several plugins to make the benefits of CKAN available within Google Cloud.
We connected CKAN to BigQuery, developing a BigQuery-csv resource type that allows users to upload a CSV file, which is then ingested into a BigQuery table via a cloud function. All information is fetched ‘on the fly’ from Google Cloud rather than being stored in CKAN, which helps to maintain a single source of truth. The platform relies on a highly specific CKAN extension, ckanext-jsonschema, to integrate with Google BigQuery.
This integration was not a standard off-the-shelf feature; it had been custom built. A standard CKAN upgrade would likely have broken the link between CKAN and BigQuery, disrupting data workflows and functionality crucial to the platform’s mission.
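The CSV ingestion step can be sketched roughly as follows. This is a minimal illustration, not the client’s actual code: it assumes the uploaded CSV lands in a Cloud Storage bucket and that a cloud function reacts to the storage event by submitting a BigQuery load job. The function names, the project and dataset names, and the rule for deriving a table name are all hypothetical. The dictionary follows the shape of the load configuration in BigQuery’s REST API.

```python
import json


def build_csv_load_job(bucket, object_name, project_id, dataset_id, table_id):
    """Return a BigQuery load-job request body for a single CSV object."""
    if not object_name.lower().endswith(".csv"):
        raise ValueError(f"expected a CSV object, got {object_name!r}")
    return {
        "configuration": {
            "load": {
                "sourceUris": [f"gs://{bucket}/{object_name}"],
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "sourceFormat": "CSV",
                "skipLeadingRows": 1,  # assumption: the first row is a header
                "autodetect": True,  # assumption: let BigQuery infer the schema
                "writeDisposition": "WRITE_TRUNCATE",  # replace the table on re-upload
            }
        }
    }


def handle_upload_event(event, project_id="example-project", dataset_id="catalogue"):
    """Hypothetical cloud function entry point for a storage upload event.

    `event` is assumed to carry the `bucket` and `name` of the new object.
    """
    # Hypothetical naming rule: file name without ".csv", hyphens to underscores.
    file_name = event["name"].rsplit("/", 1)[-1]
    table_id = file_name[:-4].replace("-", "_")
    return build_csv_load_job(
        event["bucket"], event["name"], project_id, dataset_id, table_id
    )


if __name__ == "__main__":
    job = handle_upload_event(
        {"bucket": "example-uploads", "name": "resources/maize-yield.csv"}
    )
    print(json.dumps(job, indent=2))
```

In a sketch like this, CKAN would only need to record which table a resource maps to, which is consistent with fetching the data itself from Google Cloud on demand.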
CKAN has powerful metadata capabilities, including the ability for users to customise their metadata schema with additional information. However, the client had a completely different metadata schema. We focused on resolving the problems within that schema rather than insisting that they change it.
Another major challenge was the size of some of the client’s datasets, which run to hundreds of gigabytes. This required allowing a user on the CKAN side to search datasets stored in the client’s Google Cloud environment and download only part of the data using Google BigQuery, giving them a quick overview of what is available. Via CKAN, a user can then request the full dataset and is notified by email when it is ready. They can then download one or more files from Google BigQuery. We also worked on resumable uploads. The catalogue’s data managers are now able to upload very large files to Google Cloud through CKAN, pause the upload process and return later to resume it.
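To illustrate how pausing and resuming a large upload can work, here is a minimal sketch. It assumes the uploads follow the Google Cloud Storage resumable upload protocol, which the project description does not confirm. Under that protocol, a file is sent in chunks whose size is a multiple of 256 KiB, each labelled with a Content-Range header, and the server reports how many bytes it has stored so the client can continue from there. The function names and chunk size are hypothetical.

```python
CHUNK_MULTIPLE = 256 * 1024  # chunk sizes must be a multiple of 256 KiB


def next_offset(range_header):
    """Return the byte offset to resume from, given the server's Range header.

    The header looks like "bytes=0-N". An empty or missing header means
    nothing has been stored yet.
    """
    if not range_header:
        return 0
    return int(range_header.rsplit("-", 1)[1]) + 1


def plan_chunks(total_size, chunk_size=8 * CHUNK_MULTIPLE, resume_from=0):
    """Yield (start, end, content_range) for every chunk still to be sent."""
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    if not 0 <= resume_from <= total_size:
        raise ValueError("resume_from must lie within the file")
    start = resume_from
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive end byte
        yield start, end, f"bytes {start}-{end}/{total_size}"
        start = end + 1


if __name__ == "__main__":
    total = 5 * 1024 * 1024  # a 5 MiB file, small enough to print the plan
    # Pretend the upload was paused after the server stored the first 2 MiB.
    offset = next_offset("bytes=0-2097151")
    for start, end, header in plan_chunks(total, resume_from=offset):
        print(header)
```

Because the server tracks how much it has received, the client does not need to remember its own progress between sessions. It asks for the current offset and carries on from that byte.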
The client needed a vendor with not just general software development skills but deep, specialist expertise in CKAN’s architecture and extension development to successfully adapt the custom code. As a company with more than a decade of CKAN experience and a co-steward of the software, Link Digital demonstrated that expertise by successfully reverse-engineering and upgrading a complex integration it did not originally build, within the assigned schedule.
The project is a textbook case study in the adaptability and flexibility of open source software. Because the client was using CKAN, they were not locked into a single vendor and could procure specialised expertise from the open market to support the customisation and evolution of their system, even though the original developers were no longer involved. The project was delivered on time and on budget, ensuring the continuity and modernisation of a critical data platform and leaving the door open for future collaboration between Link Digital and the client.
The client’s catalogue is now on a modern, secure, and supported version of CKAN, enabling it to undertake its important work and positioning it well for future evolution. The catalogue is now fully interoperable with global data standards such as DCAT, SDMX, WMS, and ISO 19115.
The project showcases CKAN’s vital role as a Digital Public Good (DPG). CKAN has long been recognised as a powerful open-source data management system that can facilitate the distribution of data across various sectors. In June 2023, the Digital Public Goods Alliance, a multi-stakeholder UN-endorsed initiative, added CKAN to its official DPG Registry. This status is granted to open-source solutions that adhere to the DPG Standard, which includes privacy best practices, doing no harm, and contributing to the UN’s Sustainable Development Goals (SDGs). The recognition of CKAN as a DPG is also associated with its use by multilateral UN organisations, as well as by philanthropic organisations and grassroots civil society movements.
Link Digital is ready to start a conversation with you and your team about all your open data requirements. Contact us and tell us about your project.