Fifteen times faster: CKAN 2.12 to revolutionise large dataset downloads

This article is co-authored by Ian Ward, Senior Solutions Architect/CKAN Expert at Link Digital.

The 30-minute nightmare
A 15x leap forward
Advanced query filters
What this means for CKAN users

Something major is happening for CKAN users. If you’ve ever tried to download a massive dataset from the Datastore, you know the frustration: long waits, spinning wheels, and dreaded timeouts.

This is why Link Digital is so excited to announce a milestone update coming in CKAN 2.12 that will deliver a 15x performance boost to large data downloads.

CKAN 2.12: now 15x faster for multi-million-record dataset downloads

The 30-minute nightmare

Let’s talk about a real-world example: a social indicators dataset with 13 million records. This was one of the datasets involved in a recent project, upgrading the Inter-American Development Bank’s (IDB) open data catalogue, including migrating the platform from proprietary software to a CKAN-powered solution.

On a standard CKAN instance, attempting to download this file via the Datastore dump could take 30 minutes or more. And that’s if it didn’t fail with a timeout first.

This wasn’t just a minor bug. One of our developers, Yan Rudenko, identified it as a fundamental limitation in how CKAN retrieved data. The problem was a method called offset pagination, which, in simple terms, tells the database to skip the first X million rows and return the next batch. As the download progresses, the number of rows to skip grows larger and larger, forcing the database to do more work for each new page. Because every page costs more than the one before it, total export time grows far faster than the dataset itself (roughly with the square of the row count), and massive exports can grind the server to a halt.
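To make that growth concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes the database has to walk past every skipped row on every page, and the page size is a hypothetical value chosen for illustration, not a figure from CKAN.

```python
def rows_touched_with_offset(total_rows: int, page_size: int) -> int:
    """Count rows the database skips or returns over a full OFFSET-based export."""
    touched = 0
    for offset in range(0, total_rows, page_size):
        page = min(page_size, total_rows - offset)
        touched += offset + page  # walk past `offset` rows, then read one page
    return touched


# Hypothetical page size; the article does not state the one CKAN uses.
PAGE_SIZE = 100_000

for total in (13_000_000, 26_000_000):
    touched = rows_touched_with_offset(total, PAGE_SIZE)
    print(f"{total:>11,} rows -> {touched:>13,} touched ({touched / total:.1f}x the dataset)")
```

Under these assumptions, doubling the number of records roughly quadruples the work, whereas an export that reads each row exactly once would only double it.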

A 15x leap forward

That 30-minute nightmare is now a 2-minute task.

Thanks to a groundbreaking new implementation, that same 13-million-record dataset now downloads in just two minutes. This 15x improvement is achieved by replacing the old, inefficient method with keyset pagination.

Instead of slowly counting millions of rows to skip, keyset pagination is much smarter. It remembers the last ID it returned and uses the database’s indexed ID field to jump straight to the starting point for the next page of data. It’s efficient, it’s scalable, and it means the last page of data arrives about as quickly as the first.
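The idea can be sketched in a few lines of Python. This is an illustration of the general technique, not CKAN's actual code: it uses an in-memory SQLite database as a stand-in for the Datastore's PostgreSQL backend, and the `records` table, its columns, and the `iter_rows_keyset` helper are hypothetical names.

```python
import sqlite3
from typing import Iterator


def iter_rows_keyset(conn: sqlite3.Connection, page_size: int = 1000) -> Iterator[tuple]:
    """Yield every row in ID order, fetching one page at a time by key."""
    last_id = 0  # IDs in this sketch start at 1, so 0 means "before the first row"
    while True:
        rows = conn.execute(
            # The index on _id lets the database seek straight to the next page.
            "SELECT _id, value FROM records WHERE _id > ? ORDER BY _id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]  # remember where this page ended


# Small hypothetical table standing in for a Datastore resource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (_id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO records (value) VALUES (?)",
    ((f"row {i}",) for i in range(2500)),
)

exported = list(iter_rows_keyset(conn, page_size=1000))
print(len(exported), exported[0], exported[-1])
```

Each query filters on the last ID seen rather than on a row count, so no page has to re-read the rows that came before it.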

Advanced query filters

This massive performance boost is just the beginning. This work was integrated with an even more powerful feature also coming in CKAN 2.12: the new Advanced Query Filter specification.

This is a game-changer for anyone who interacts with data on CKAN. For the first time, you’ll be able to build complex, sophisticated queries directly. This new feature allows for:

Range queries: Find all data where ‘age’ is greater than 24 or ‘year’ is between 2010 and 2019.
Complex logic: Easily combine filters with nested AND / OR conditions for truly granular results.
A unified language: A common, powerful way to filter data across CKAN, making it more predictable and developer friendly.

This new filtering capability, the result of work by Link Digital’s Senior Solutions Architect Adrià Mercader, is what helps power the new pagination, but it also unlocks a new world of possibilities for search and data discovery.
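To give a feel for what range queries and nested AND / OR logic involve, here is a small Python sketch that turns a nested filter into a parameterised SQL condition. The filter layout, the operator names, and the `to_where` helper are all made up for illustration; they are not the syntax of the CKAN Advanced Query Filter specification.

```python
from typing import Any

# Hypothetical operator names for this sketch; NOT taken from the CKAN specification.
OPERATORS = {"eq": "=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def to_where(node: dict[str, Any]) -> tuple[str, list[Any]]:
    """Turn a nested filter into a parameterised SQL fragment plus its values."""
    if "and" in node or "or" in node:
        keyword = "and" if "and" in node else "or"
        parts: list[str] = []
        params: list[Any] = []
        for child in node[keyword]:
            sql, values = to_where(child)  # recurse into nested groups
            parts.append(sql)
            params.extend(values)
        return "(" + f" {keyword.upper()} ".join(parts) + ")", params

    # Leaf condition: one field compared against one value.
    field, op, value = node["field"], node["op"], node["value"]
    if not field.isidentifier():
        raise ValueError(f"unsafe field name: {field!r}")
    return f'"{field}" {OPERATORS[op]} ?', [value]


# 'age' greater than 24, OR 'year' between 2010 and 2019.
query = {
    "or": [
        {"field": "age", "op": "gt", "value": 24},
        {
            "and": [
                {"field": "year", "op": "gte", "value": 2010},
                {"field": "year", "op": "lte", "value": 2019},
            ]
        },
    ]
}

sql, params = to_where(query)
print(sql)
print(params)
```

Keeping the values separate from the SQL text, as placeholders, is a deliberate choice in this sketch: user-supplied filter values never get spliced into the query string.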

What this means for CKAN users

This is about more than increasing the speed with which CKAN downloads data; it’s about reliability and power. For data publishers and users, this means:

No more timeouts: Large dataset downloads via the Datastore dump will just work.
Faster access to data: Users will have the data they need, when they need it, without the frustrating wait.
Deeper, smarter search: The new filters will unlock new ways to discover and analyse data.

These powerful features have been brought together by Link Digital’s Senior Solutions Architect Ian Ward and are slated to be part of the upcoming CKAN 2.12 release.

Get ready for a faster, smarter, and more reliable CKAN.

Get in touch

Want to talk about how CKAN can meet your data needs? Get in contact and tell us about your project, and one of our data experts will be in touch.
