Who migrated IDB's data portal to CKAN?

Link Digital migrated IDB's proprietary open data catalogue to CKAN, handling datasets of around 12 million rows, adding multilingual support for English, Spanish, Portuguese, and French, and achieving full DCAT compliance.

Why did the Inter-American Development Bank (IDB) choose CKAN for its open data portal?

The IDB chose CKAN to transition from a proprietary system to an open-source solution that offers greater flexibility, scalability, and community support. This allows the organisation to share economic and social development data across Latin America and the Caribbean more effectively.

What were the main technical challenges in this CKAN migration?

The project required migrating a massive volume of data under a tight 6-month deadline. Key technical hurdles included modifying CKAN’s core code to handle exceptionally large datasets and integrating the platform with AWS S3 for scalable storage and PoolParty for semantic interoperability.

How does the IDB open data portal support multiple languages?

Link Digital customised the CKAN instance to provide full multilingual functionality. The portal is fully accessible and searchable in the Bank’s four official languages: English, Spanish, Portuguese, and French.

How was search discoverability improved for the IDB datasets?

Discoverability was enhanced by integrating the PoolParty semantic suite. This introduced a controlled vocabulary and semantic tagging system, ensuring that datasets are consistently labelled and easier for users to find through advanced search filters.

Case study: CKAN-powered open data catalogue for the Inter-American Development Bank → Link Digital

Key summary

Impact: IDB Global Data Transformation

Strategic ROI: Migrated from high-cost proprietary systems to scalable, open-source CKAN, eliminating vendor lock-in.
Unmatched performance: Achieved a 26x increase in processing speed for datasets containing over 12 million rows.
Regional reach: Fully deployed in 4 official languages, enabling data accessibility for 48 member countries.
Regulatory readiness: Guaranteed 100% DCAT compliance and met WCAG 2.1 accessibility standards for global interoperability.
Innovation: Link Digital’s contributions to CKAN core code future-proof the IDB’s roadmap.

Organisation

Inter-American Development Bank

Context of the project

The Inter-American Development Bank (IDB) partnered with Link Digital to revamp its existing open data catalogue, considerably extending its functionality to meet the expectations of modern data users and the challenges of the rapidly evolving open data ecosystem. The project was complex and was completed within a very tight deadline. The new open data catalogue provides advanced, multilingual search and sharing capabilities for the organisation’s open datasets.

Technological services provided

The project involved the following technology services:

Managed hosting.
Transition of the IDB’s existing catalogue from its proprietary software to the open source CKAN – Comprehensive Knowledge Archive Network.
Customisation of the CKAN instance with the IDB’s internal workflow systems through the installation of a range of CKAN extensions.
Integration of CKAN with third-party software and cloud services, including the PoolParty semantic suite, which implements a controlled vocabulary so that datasets are consistently labelled, more discoverable, and semantically interoperable, and an Amazon Web Services S3 bucket for scalable storage.
Significant hands-on work readying the IDB’s data, particularly some of their very large datasets, to be loaded into CKAN and enabling them to be easily searchable, viewable and downloadable. This included changes to CKAN’s core code to enable it to deal with datasets significantly larger in size than those the software usually handles.
Modifications to CKAN to expand its language functionality and enable all data to be displayed in the Bank’s four official languages: Spanish, English, Portuguese, and French.

Screenshot of the IDB Open Data Portal built by Link Digital

Timeline and deployment

Link Digital began work on the site in October 2024 and the new catalogue went live in April 2025. The tight project turnaround was necessary because the IDB contract with the proprietary software provider managing the catalogue was ending. Link Digital is now working with the IDB on phase II of the project and providing ongoing managed hosting for the new catalogue.

Link to the site

https://data.iadb.org

About IDB

Founded in 1959, the IDB is an international development finance institution headquartered in Washington, D.C. It is the largest provider of development financing for Latin America and the Caribbean. With 48 member countries—including 26 borrowing members from the region and 22 non-borrowing members from Asia, Europe, and North America—the IDB is dedicated to improving lives by providing financial and technical support to national and sub-national governments and institutions. The Bank also delivers policy advice, cutting-edge research, and training. Together with IDB Invest, which supports the private sector, and IDB Lab, the innovation laboratory that tests and scales entrepreneurial solutions for inclusion and sustainability, the IDB Group mobilises capital, knowledge, and partnerships to address the region’s most pressing challenges and promote inclusive, sustainable development.

Objectives of the data catalogue

The primary objective of the new open data catalogue was to maximise the IDB’s value as a knowledge bank by generating and disseminating research and insights that help foster development. It also aimed to increase access to key IDB datasets and improve the sustainability, searchability and interoperability of IDB data. The project updates a previous open data catalogue, built in 2015. A major refresh and reimagining was needed to bring the catalogue in line with modern standards and user expectations, and to meet the challenges arising from both a rapidly evolving open data ecosystem and increasing data needs from governments and institutions.

Other key requirements identified by the IDB included:

A stronger and more robust data storage and retrieval system, including increased capacity to deal with some of the Bank’s very large datasets.
More structured metadata and categorisation, to improve data discoverability and standardisation.
Increased language support. The previous catalogue was available in English and Spanish, but the IDB needed to extend this to the Bank’s other two official languages, Portuguese and French.
Optimised API architecture to improve scalability and support for larger datasets.

Data catalogue migration

Another major driver of the project was the strong desire to migrate the data catalogue from a proprietary to an open-source platform. The previous catalogue was built on a proprietary, closed-source system that limited scalability and customisation capabilities, unlike the new open-source solution. The conclusion of the contract with this provider offered the opportunity to make major changes.

The IDB’s open data team undertook an extensive process to scope out what was required in the new catalogue. This included interviewing external and internal stakeholders, a comparison of data catalogue technology solutions used by other organisations, including their look and what features they offered, and an analysis of the various open source software technologies on offer.

CKAN emerged as the option that most aligned with the IDB’s vision. The IDB’s open data team were particularly attracted by CKAN’s wide range of extensions and easy customisation, which gave them the ability to do some of the work on the catalogue themselves. It also gave the IDB flexibility to make further changes and scale the catalogue up to meet the emerging demands of data users and/or shifts in the Bank’s situation. The IDB’s open data team was also energised by the prospect of contributing to the CKAN software’s growing international community and has even included CKAN in its open source software catalogue.

The result is an attractive, high-impact open data catalogue designed to enhance the discoverability and easy reusability of IDB datasets. It achieves this by leveraging a scalable architecture and adhering to best-practice metadata standards to ensure international interoperability with data harvesters.

Data catalogue specifics

The catalogue currently hosts over 200 datasets. Much of this data focuses on economic and social development, including detailed information on macroeconomic and social indicators for the Latin American and Caribbean region. Most of the data is tabular, provided in formats such as CSV, Excel spreadsheets (.xls, .xlsx), Stata files (.dta), and R data files (.rds, .RData). There are also numerous PDF documents. The data includes several extremely large datasets, which presented significant challenges for the CKAN software (discussed later).

In terms of overall presentation, the new catalogue has been configured into four categories.

Datasets: Raw data, macroeconomic datasets, indicators and surveys, etc.
Datasets linked to publications: Datasets that directly support IDB research publications, provided to facilitate the validation and replication of findings.
Data portals: Dashboards and interactive stories created by the IDB to highlight trends and changes in key indicators across Latin America and the Caribbean.
Indicators: These resources are linked to the datasets and were created in response to user feedback on the previous data catalogue. Users previously had to download entire datasets and search through them to locate specific indicators. The indicators make data more discoverable by enabling users to search the catalogue for a specific term or topic they are interested in, for example ‘unemployment’, and easily access all the available data on it from different datasets.

Objective	Solution	Outcome
Scalability	CKAN core optimisation	26x Faster API Response
Interoperability	DCAT Metadata Mapping	Global Data Harvesting
Usability	PoolParty Integration	Reduced Data Discovery Time
Inclusivity	Multilingual Logic Core	Expanded Reach in LATAM/Caribbean

Innovative features

CKAN operates as the open data catalogue’s spine, providing advanced metadata support and effective cataloguing and searching. Link Digital worked closely with the IDB open data team on several features, which all contribute to bringing the catalogue in line with global best practice.

Full DCAT compliance

The catalogue is fully Data Catalog Vocabulary (DCAT) compliant.

DCAT is an RDF vocabulary designed for describing datasets to enhance interoperability between systems that manage and expose data, such as data portals and data catalogues. It provides a standardised metadata model and vocabulary for data publishers to describe datasets and data services, enabling these systems to better exchange and integrate metadata, thereby making data more readable by a wider range of applications and, thus, more discoverable and reusable. While uptake is by no means universal, DCAT is arguably the closest thing that currently exists to a worldwide metadata standard for describing datasets in data portals and catalogues, and it is widely used by many open data portals and catalogues globally, including extensively in Latin America.

The IDB’s open data team gave Link Digital an application profile based on the DCAT standard to describe their data. Link Digital then designed a data schema to accommodate this and mapped it onto CKAN’s existing internal schema using the ckanext-dcat extension. This extension allows CKAN to export and publish metadata in DCAT format and enables CKAN-based data catalogues to be integrated with other data catalogues or platforms that use DCAT metadata.
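To make the idea of mapping CKAN metadata onto DCAT concrete, the following is a minimal sketch in plain Python. It is not how ckanext-dcat is implemented and it does not reflect the IDB’s application profile, which is not described here. The input uses the standard CKAN dataset fields (`name`, `title`, `notes`, `tags`, `resources`); the function name, the example dataset and the `data.example.org` URLs are assumptions made for illustration.

```python
import json

def ckan_to_dcat(dataset: dict, base_url: str) -> dict:
    """Map a CKAN-style dataset dict to a minimal DCAT JSON-LD document."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": f"{base_url.rstrip('/')}/dataset/{dataset['name']}",
        "@type": "dcat:Dataset",
        "dct:title": dataset["title"],
        "dct:description": dataset.get("notes", ""),
        # CKAN stores tags as a list of {"name": ...} dicts.
        "dcat:keyword": [tag["name"] for tag in dataset.get("tags", [])],
        # Each CKAN resource becomes one DCAT distribution.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": resource["url"]},
                "dct:format": resource.get("format", ""),
            }
            for resource in dataset.get("resources", [])
        ],
    }

# Hypothetical dataset, for illustration only.
example_dataset = {
    "name": "social-indicators",
    "title": "Social indicators",
    "notes": "Illustrative description.",
    "tags": [{"name": "unemployment"}],
    "resources": [
        {"url": "https://data.example.org/files/indicators.csv", "format": "CSV"}
    ],
}

dcat_document = ckan_to_dcat(example_dataset, "https://data.example.org")
print(json.dumps(dcat_document, indent=2))
```

A harvester that understands DCAT can read a document of this shape without knowing anything about CKAN’s internal schema, which is the interoperability benefit described above.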

Controlled vocabulary

A controlled vocabulary is a carefully selected and standardised list of terms used to consistently tag, describe, and index datasets via metadata for improved discoverability and retrieval. The IDB’s open data team uses PoolParty, a non-open-source semantic suite, to manage the controlled vocabularies they use to categorise datasets within the catalogue, which Link Digital integrated into the CKAN application.

The IDB open data team was keen to introduce a predetermined list of metadata terms for those uploading data to the catalogue. This reduces the confusion caused by unstructured language and by variant or incorrect spellings of social and statistical terms, country names and the like, by ensuring that only the preferred term can be used and that it corresponds to a single subject. Having standardised themes also enables users to search for and retrieve the dataset(s) they need more easily.
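The effect of a controlled vocabulary can be illustrated with a small sketch. It does not use PoolParty or its API; the vocabulary, the variant spellings and the function names are all hypothetical. It shows one way variant or misspelled terms might be collapsed onto a single preferred term, with anything outside the vocabulary rejected.

```python
from typing import List, Optional, Tuple

# Hypothetical vocabulary: preferred term -> known variants and misspellings.
VOCABULARY = {
    "Unemployment": {"joblessness", "unemployement"},
    "Brazil": {"brasil"},
}

# Flat lookup from every lower-cased spelling to its preferred term.
_LOOKUP = {}
for preferred_term, variants in VOCABULARY.items():
    _LOOKUP[preferred_term.lower()] = preferred_term
    for variant in variants:
        _LOOKUP[variant.lower()] = preferred_term

def normalise_tag(raw: str) -> Optional[str]:
    """Return the preferred term for a raw tag, or None if it is not in the vocabulary."""
    return _LOOKUP.get(raw.strip().lower())

def validate_tags(raw_tags: List[str]) -> Tuple[List[str], List[str]]:
    """Split raw tags into de-duplicated preferred terms and rejected input."""
    accepted: List[str] = []
    rejected: List[str] = []
    for raw in raw_tags:
        preferred = normalise_tag(raw)
        if preferred is None:
            rejected.append(raw)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, rejected

accepted_tags, rejected_tags = validate_tags(["Brasil", "brazil", "Unemployement", "jobs"])
print(accepted_tags, rejected_tags)
```

Because every accepted tag resolves to exactly one preferred term, a search on that term finds all the datasets tagged with it, however the uploader originally spelled it.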

Automating Digital Object Identifier registration

A Digital Object Identifier (DOI) is a unique combination of letters and numbers used to provide a permanent link to the location of an object, such as a document, image, dataset, or research paper, on the Internet, even if its URL changes.

Using an extension called ckanext-doi, Link Digital migrated legacy DOIs to the new CKAN-powered catalogue and connected it to the registration agency DataCite so that new DOIs are minted automatically. Because users will always be able to reach the catalogue’s resources, this encourages openness and keeps the site aligned with the IDB’s policy of promoting persistent access to knowledge resources. The use of DataCite also makes datasets in the IDB’s catalogue findable in other repositories, for example DataCite Commons, increasing their discoverability.

Multilingual functionality

The default language of the old open data catalogue was English, with some Spanish language functionality. The IDB was keen to expand this to cover all four official Bank languages: English, Spanish, French and Portuguese. While basic language support is a standard out-of-the-box feature of CKAN, the software’s default ability in this area was limited. It did not support complete translations and did not allow users to toggle between multiple languages within a single CKAN instance.

The IDB’s open data team undertook the time-consuming task of translating everything on the catalogue: not just site headings, interface language and metadata, but all the datasets as well. Link Digital extended CKAN’s core functionality to make all the content on the site accessible in four languages and enable users to toggle between them. This removes a major barrier to data discoverability and reuse by significantly expanding the accessibility of the site to non-English speakers in Latin America and the Caribbean.
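As a rough illustration of what toggling between languages involves at the metadata level, the sketch below stores one value per language for a field and falls back to a default language when a translation is missing. This is not Link Digital’s implementation; the field shape, the function name and the fallback rule are assumptions.

```python
SUPPORTED_LANGUAGES = ("en", "es", "pt", "fr")

def localised(field: dict, lang: str, default: str = "en") -> str:
    """Return the value of a translated field for `lang`, falling back to `default`."""
    if lang not in SUPPORTED_LANGUAGES:
        lang = default
    # An empty or missing translation also falls back to the default language.
    return field.get(lang) or field.get(default, "")

# Hypothetical translated title, for illustration only.
title = {
    "en": "Social indicators",
    "es": "Indicadores sociales",
    "pt": "Indicadores sociais",
    "fr": "Indicateurs sociaux",
}

print(localised(title, "pt"))
```

With a complete set of translations, as the IDB’s open data team prepared, the fallback is never needed; it only guards against gaps in newly added content.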

Improved auditing functionality

This is the first catalogue to use Link Digital’s new auditing functionality. Every event in the catalogue is now sent to Amazon CloudWatch, giving Link Digital a log of everything that occurs. This improves maintenance and makes it easier to maintain security.

Technical challenges

Meeting the six-month deadline for the catalogue’s completion, necessitated by the approaching end of the contract with the existing software provider, was made possible by the considerable preparation undertaken by the IDB data team before Link Digital commenced work. The IDB open data team created approximately one thousand new indicators for the catalogue and did a great deal of work cleaning and restructuring the data, including aligning it for compliance with the DCAT standard.

Nonetheless, the catalogue was a complex project that required considerable hands-on work and out-of-the-box thinking on Link Digital’s part. These challenges primarily related to the size of some of the IDB’s datasets, which are far larger than those typically hosted on CKAN catalogues and open data portals.

12 million rows of data

For example, the IDB’s Social Indicators of Latin America and the Caribbean dataset alone comprises approximately 12 million rows of data. It was very much a hands-on process to optimise code and customise indexes so that the performance of the catalogue when working with very large datasets was more than twenty times faster than the default.

Accessing these large datasets using the existing CKAN architecture was challenging, as the ckanext-xloader extension normally used to load them was failing. Beyond that, just having some of these datasets in the catalogue was going to present challenges for querying. The number of rows and the size of the data meant that the types of queries users were going to run could take tens of seconds to run or they would not work at all because the API would just time out. Fixing these problems required Link Digital to improve the catalogue’s API and the performance of CKAN for previewing large datasets more generally.

CKAN’s new Table Designer feature

Link Digital also had to do other hands-on work to get the IDB’s data loaded into the CKAN site, using CKAN’s new Table Designer feature for the very large datasets. Table Designer gave the IDB an API for updating or adding only the records that had changed, so it could push partial updates to some of the large datasets rather than reloading the entire file. Because Table Designer maintains the table in the database, Link Digital was also able to add custom indexes for the particularly large datasets.
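A partial update of this kind might look like the sketch below, which builds (but does not send) a request to CKAN’s DataStore `datastore_upsert` action. Which API action the IDB’s procedures actually call is not stated here, so treating it as `datastore_upsert` is an assumption, as are the host name, token, resource ID and column names. An upsert also assumes the table has a unique key defined, so that existing rows can be matched.

```python
import json
import urllib.request

def build_upsert_request(base_url: str, api_token: str,
                         resource_id: str, changed_rows: list) -> urllib.request.Request:
    """Build a POST request that sends only changed or new rows to the DataStore."""
    payload = {
        "resource_id": resource_id,
        "records": changed_rows,
        # "upsert" updates rows whose key already exists and inserts the rest.
        "method": "upsert",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/3/action/datastore_upsert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )

# Hypothetical values, for illustration only.
request = build_upsert_request(
    "https://data.example.org/",
    "my-api-token",
    "00000000-0000-0000-0000-000000000000",
    [{"country": "BRA", "indicator": "unemployment", "year": 2019, "value": 11.9}],
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The point of the sketch is that the payload contains only the rows that changed, not the whole file.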

Indexes are a Postgres feature that gives the database a fast way to look up records matching the filters in a query. Link Digital looked at the queries the IDB needed and, for each of these datasets, chose combinations of columns to index so that those queries run far faster. Link Digital had to help with loading the data initially because there was so much of it, but the IDB open data team now has procedures to do this itself using the API. Creating the indexes, however, was very much a hands-on process that Link Digital carried out directly on the Postgres DataStore database, tuning it so that performance would be much better than with the default procedure.
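The effect of indexing a combination of columns can be seen in a small, self-contained sketch. It uses SQLite from the Python standard library as a stand-in for the Postgres DataStore, purely so that it runs anywhere. The table, column and index names are hypothetical, and the indexes Link Digital actually created are not described here.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (country TEXT, indicator TEXT, year INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?)",
    [
        (country, indicator, year, 1.0)
        for country in ("BRA", "MEX", "COL")
        for indicator in ("unemployment", "poverty")
        for year in range(2000, 2020)
    ],
)

SQL = "SELECT year, value FROM observations WHERE country = ? AND indicator = ?"
PARAMS = ("BRA", "unemployment")

# Without an index the database has to scan every row.
plan_before = query_plan(conn, SQL, PARAMS)

# A composite index on the filtered columns lets it jump to the matching rows.
conn.execute("CREATE INDEX idx_country_indicator ON observations (country, indicator)")
plan_after = query_plan(conn, SQL, PARAMS)

print("before:", plan_before)
print("after: ", plan_after)
```

In Postgres the analogous steps are a `CREATE INDEX` statement on the DataStore table and an `EXPLAIN` of the query. On a table of 120 rows the difference is invisible, but on millions of rows a full scan versus an index lookup is the difference between a timeout and a near-instant response.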

Pages on the IDB’s open data catalogue now load almost immediately. The API calls made when a page fetches information about a dataset, which were extremely slow, are now faster as well. This work necessitated several changes to CKAN’s core code to optimise how the software handles very large files. These optimisations will be included in the next CKAN update, release 2.12, so that all the software’s users can benefit from them.

A smaller challenge related to the IDB’s open data team wanting to offer users a zip file of a whole dataset to download. Building the zip on demand could have exposed the catalogue to misuse: if several users with malicious intent hit the button to create a zip file at the same time, the resulting spike in load could have caused serious problems for the back end of the catalogue. To get around this, Link Digital built the zip files in advance and uploaded them to the Amazon cloud service, from which the IDB open data team manages the download. As long as the dataset does not change after it is compiled and uploaded to the catalogue, the pre-built zip file remains on the site and can be easily downloaded.
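The pre-building approach can be sketched as follows. This is an illustration of the general technique, not Link Digital’s code: a local directory stands in for the Amazon storage, and the function name, the file naming scheme and the use of a content hash to detect changes are all assumptions. The archive is built once and reused until the dataset’s files change.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path

def ensure_prebuilt_zip(files: dict, store: Path) -> Path:
    """Return a zip of `files` (name -> bytes), building it only if no up-to-date copy exists."""
    # Fingerprint the dataset so that any change in names or content yields a new archive name.
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(files[name])
    target = store / f"dataset-{digest.hexdigest()[:16]}.zip"

    # The expensive step runs only when this exact version has not been built yet.
    if not target.exists():
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
            for name in sorted(files):
                archive.writestr(name, files[name])
    return target

# Hypothetical dataset contents, for illustration only.
dataset_files = {"indicators.csv": b"country,year,value\nBRA,2019,11.9\n"}

with tempfile.TemporaryDirectory() as tmp:
    first = ensure_prebuilt_zip(dataset_files, Path(tmp))
    second = ensure_prebuilt_zip(dataset_files, Path(tmp))
    print(first.name, first == second)
```

Because download requests only ever read an existing file, many simultaneous clicks cannot trigger many simultaneous builds, which is the load spike the design avoids.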

Data migration from the old open data catalogue was also tricky and took a considerable amount of time because of the number of datasets.

The benefits of the IDB portal

Despite being relatively new, the catalogue is already showing positive impact, with several Latin American governments expressing interest in the platform and exploring ways to make use of it. The IDB’s open data team has also found the workflow process involved in publishing new datasets to the catalogue is easier and has far less friction.

The introduction of controlled vocabulary is proving particularly helpful. The publishing process will continue to improve as the IDB data team finalises work on their new internal publishing workflow.

The CKAN community also benefits. The following core CKAN contributions came directly from work on the IDB open data catalogue:

https://github.com/ckan/ckan/pull/8603: makes the table preview of some of the largest datasets on IDB open data load 26x faster
https://github.com/ckan/ckan/pull/8597: makes the table previews an additional 3x faster
https://github.com/ckan/ckan/pull/8590: makes a common API call 24x faster for some of the largest datasets on the IDB open data catalogue, including when users view the dataset page.

Future plans

Link Digital is already working with the IDB on phase II of the project to continue expanding the benefits of the solution. This includes:

Further improving API response times.
Search engine optimisation for mobile devices.
Work to enhance the catalogue’s user interface. The user interface design will continue to evolve based on user feedback collected since the catalogue went live. This includes the addition of some pages in HTMX. A new feature in CKAN, HTMX is an open-source front-end JavaScript library that allows developers to create a faster and more interactive UI experience.
The implementation of additional accessibility features to bring the catalogue fully into line with Web Content Accessibility Guidelines (WCAG) 2.1.
