[Image: CKAN Insider with Andrew Nette and Ian Ward, Link Digital]

For over a decade, Link Digital has been using the Comprehensive Knowledge Archive Network (CKAN) to build data infrastructure globally. Since 2019, we have been a co-steward of the CKAN software. But what exactly does it mean to be a co-steward? And when organisations evaluate a vendor like Link Digital as a potential partner in their data operations, does it matter whether the vendor contributes back to CKAN’s core software?

One person who can answer these and other questions is Link Digital’s Senior Solutions Architect, Ian Ward. Ian is also part of the CKAN Tech Team, which is responsible for the technical vision and the core software.

As the CKAN community gears up to celebrate the software’s 20th anniversary, I spoke to Ian about what co-stewardship means, the day-to-day work that takes place to maintain CKAN’s core functions, and why contributing back to the software is not just good for the CKAN community but good business sense.

Learning to appreciate open source

While Ian has been involved with CKAN for the last decade, he started to appreciate the advantages of open source over 25 years ago. For him, open source technology gives users control over what’s running on their computer. “You don’t always have to get permission for everything or pay for everything necessarily and you can actually have freedom to build what you want.”

Ian’s involvement in open source software took a major leap when the opportunity came up to work on the federal Canadian government’s national data portal, Open Canada, which was built using CKAN and launched in early 2011. “To work on an open data platform that was also built on open-source technology… was exciting due to the ability to collaborate with other governments and technology professionals doing the same work.”

Working ‘under-the-hood’ of CKAN

Much of Ian’s CKAN work focuses on the ‘under-the-hood’ architecture and developer experience of CKAN.

“I’m very interested in how all this stuff works and making improvements that I find interesting. The cool and exciting things that people are doing with CKAN often are about integrating [it] into big existing systems that are publishing datasets or consuming datasets, that are creating visualisations, that are tying it into larger processes or workflows.”

This often means that, alongside work on integrations involving visualisations and large system workflows, Ian makes smaller changes to the core CKAN software that remove limitations for developers and improve the experience for all users. “Some of the stuff that I do might seem very boring or not super flashy. But it’s all about trying to enable people that are doing those more exciting things so that they can use CKAN to do that.”

CKAN co-stewardship in practice

What exactly does it mean to be a CKAN co-steward?

“For me… I’m helping review pull requests. I’m helping to submit changes that I think would be better for CKAN in the future to unblock functionality that maybe hasn’t been as easy to do in the past. I’ll be providing feedback as part of reviewing those pull requests and regularly attending the developer meetings and trying to make a space that’s inviting for new developers to come and join and to have a place where people can have a conversation about some of the changes that are being made and influence decisions as we’re making them.”

Co-stewardship also means contributing important fixes to the CKAN core to improve the software’s performance. These fixes often originate from work done for Link Digital clients. One example is the performance work Ian assisted with for handling very large DataStore tables, which is covered in more detail on our site.

“Some of the limitations that we’ve run into for Link’s clients are in terms of the performance when working with very, very large DataStore tables. And this is something that we’ve been working on to make it possible to have millions and millions of records in the datastore and still have queries running quickly [and] still have the data being possible to update quickly.”
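
To make the DataStore discussion concrete, here is a minimal sketch of how a client might read a large table in pages through CKAN's `datastore_search` API action. It does not show the performance work Ian describes; it only illustrates the kind of query that work is meant to keep fast. The site URL, resource ID and helper names are placeholders invented for illustration, and the code builds request URLs without sending them.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs


def datastore_search_url(site_url, resource_id, limit=100, offset=0, filters=None):
    """Build a GET URL for CKAN's datastore_search action.

    site_url and resource_id are placeholders supplied by the caller.
    """
    params = {"resource_id": resource_id, "limit": limit, "offset": offset}
    if filters:
        # On a GET request the filters dictionary is passed as a JSON string.
        params["filters"] = json.dumps(filters, sort_keys=True)
    base = site_url.rstrip("/")
    return f"{base}/api/3/action/datastore_search?{urlencode(params)}"


def page_urls(site_url, resource_id, total_rows, page_size):
    """Return one URL per page needed to cover total_rows records."""
    return [
        datastore_search_url(site_url, resource_id, page_size, offset)
        for offset in range(0, total_rows, page_size)
    ]


if __name__ == "__main__":
    # Hypothetical portal and resource ID, used only to show the URL shape.
    for url in page_urls("https://data.example.org", "my-resource-id", 250, 100):
        print(url)
```

Each URL could then be fetched with any HTTP client. For tables with millions of records, how quickly each page comes back depends on the DataStore itself, which is where the improvements described above matter.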

[Image: CKAN co-stewardship activities by Link Digital]

The in-built flexibility of CKAN

Ian believes that one of CKAN’s major strengths is its in-built flexibility.

“It’s very good at integrating with other data systems. And really everything that any business, any government is doing is powered more and more by data. So, having a sensible way to track that data, view changes on it, store things that you need like provenance information or licensing information or specific metadata fields that are customised for your organisation, all that stuff is key.”

And while open data portals are what brought CKAN to the fore of data management, its wider applicability is huge. This includes tools for internal data organisation.

“The fact is that you can customise every bit of the workflow, the way everything is connected, you can integrate it with any existing system very easily. The documentation is very good, especially for an open-source project. So, when you’re getting new people coming on or trying to build extensions to it, the learning curve is smooth.”

[Image: CKAN beyond data portals: how it tracks data]

Contributing back to CKAN’s core

But when an organisation evaluates a vendor as a CKAN partner, is it important whether that potential partner contributes back to CKAN’s core?

In this respect, while the argument against secrecy and unnecessary competition in terms of software development has been largely accepted in the government space, Ian believes it also makes good sense in other areas, including the private sector. By sharing the solutions created with open source tools, he argues, “we all become better at our jobs. It doesn’t become a disadvantage.”

“By working on this stuff together, we’re able to have a lot more potential ideas, potential eyes on problems. Things can be solved in ways that you might not expect. If we want to do something one way but someone else has got a better idea or they’ve got something that works differently, then there is that competition and it’s done in the open and everybody can benefit from that.”

In this regard, Ian believes that selecting partners that contribute back to the CKAN core does more than demonstrate their understanding of the community’s value and foster collaboration on shared problems. It also ensures that functionality is maintained and allows all parties to benefit from reusable and improved code.

“Things that are built don’t quietly disappear when a customer goes away. They’re maintained in the community. Anybody can pick them up and run with them. And better yet, where there’s opportunity to create reusable functionality, there’s a real benefit to doing that and sharing that work so that the next time we need to build something similar or anybody else needs to build something similar, they can reuse that, potentially improve it.”

Helping guide the transition to CKAN 3.0

Ian is heavily involved in helping to guide the transition of CKAN toward its 3.0 release. Why is this release important and what will it offer?

“3.0 will give us a chance to say, ‘Okay, we’ve looked at the way that we’ve been doing it for 20 years and now we would like to reorganise things, to simplify them, to make it a little bit easier to customise, to make it a little bit more modern.’ And at the same time, we can look at breaking some of the internal assumptions that were made about the way certain data structures are organised because the earlier decisions made sense at the time, but they might be holding us back in terms of flexibility or performance or for some other reasons.”

“The opportunity to step away from some of the things in the past, I think, is a very exciting time for the project. And while it does potentially mean a little bit more work for people to make that upgrade, it’s going to clear the way for us to implement some of the exciting features that people are hoping for in the future.”

Also read: CKAN roadmap – stability, speed, and 3.0

Future challenges for CKAN

“CKAN, like any open-source project, definitely faces the challenge of how we keep the community going and continuing to grow. I would love to see more events or more structure being built up around the project to try to continue to grow the community.”

Another challenge is the rapid rise of AI. “I think that we’re definitely seeing more people wanting the types of tools that they see coming out of major vendors. They want to be able to interact with their data catalogues more like they’re interacting with ChatGPT.”

“When we’re building tools that are based on data, I believe it’s not the best idea to feed the raw data into an LLM (large language model) and ask it to do stuff with the data. So, the biggest opportunities are around having those tools provide code or connect other existing tools that are deterministic that then work with the data. And CKAN, by providing access to the data for those tools, makes it possible to have agents that are doing things in an intelligent way which is usually not having the LLM do the work, but having a custom tool do the specific work that’s required, whether that’s generating queries on data or connecting different systems to solve a particular problem.”
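
One way to picture the pattern Ian describes is a small deterministic tool that sits between an LLM agent and the data. The sketch below is an assumption-laden illustration, not CKAN's or Link Digital's implementation: the agent supplies only structured arguments, and ordinary code validates them and builds a payload shaped like the parameters of CKAN's `datastore_search` action. The allowed field names, the row cap and the function name are all hypothetical.

```python
# Hypothetical column names the tool is willing to filter on.
ALLOWED_FILTER_FIELDS = {"year", "region", "category"}
# Hypothetical cap on rows per request, so an agent cannot ask for everything.
MAX_LIMIT = 1000


def build_tool_query(tool_args):
    """Turn structured arguments proposed by an LLM into a validated query.

    The LLM never touches the raw data. It only proposes arguments, and
    this deterministic function decides what is actually requested.
    """
    resource_id = tool_args["resource_id"]
    filters = dict(tool_args.get("filters", {}))

    # Reject any filter on a column outside the allow-list.
    unknown = set(filters) - ALLOWED_FILTER_FIELDS
    if unknown:
        raise ValueError(f"unsupported filter fields: {sorted(unknown)}")

    # Clamp the requested row count to the range 1..MAX_LIMIT.
    limit = max(1, min(int(tool_args.get("limit", 100)), MAX_LIMIT))

    # Keys mirror datastore_search parameters: resource_id, filters, limit.
    return {"resource_id": resource_id, "filters": filters, "limit": limit}


if __name__ == "__main__":
    proposed = {"resource_id": "my-resource-id", "filters": {"year": 2020}, "limit": 5000}
    print(build_tool_query(proposed))
```

The resulting dictionary could be posted to a CKAN site's API by separate code. The point of the design is that the query is produced by predictable code the operator controls, while the LLM's role is limited to choosing arguments.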

“I see CKAN as something that empowers systems that are going to be built on AI tools and all of this stuff needs data to function and the best way to know which data that you need to use is by having good metadata and that’s exactly what CKAN’s about.”

You can watch the full interview with Ian on our YouTube channel.