Success with CKAN requires viewing the platform as an evolving framework rather than a one-time product launch.

Link Digital’s Chief Revenue Officer, Scott Lingard, talks to clients and prospects about the Comprehensive Knowledge Archive Network (CKAN) every working day. Each of them is at a different stage of their data journey and has their own challenges when it comes to data infrastructure.

This gives Scott excellent insight into their key concerns and common misconceptions about CKAN. He also has a clear understanding of the preparation needed before implementing new data infrastructure, and of the differences between successful CKAN projects and those that struggle.

In the first instalment of CKAN Insider, a new series to mark the 20th anniversary of CKAN, Link Digital’s Chief Marketing Officer Kate Sampher spoke to Scott about these issues.

Major challenges and concerns

Scott made clear from the outset that there’s no such thing as “a typical client”. They’re a broad spectrum of organisations, from large not-for-profits to niche research organisations and national-level government bodies, all working with us to implement, maintain and grow their CKAN instances.

One thing most of them have in common is that “they’re at a data crossroads. They’ve outgrown a basic file sharing system or legacy portal that’s become basically a graveyard for all their data and all their information, and they realise they need a full professional ecosystem not just a website.”

Often, they are struggling with new mandates that require them to be more transparent, or are facing challenges such as the growing complexity of their metadata or an increase in the number of their users. These pressures have exposed the technical or other failures of their current data infrastructure.

A comparison of a website vs a data portal

“That’s when we really see them come to us, and the types of organisations are varied, but they’re always at that breaking point, and they always need something, and they need it to sort of manage things as they move.”

The top questions that organisations have when they first start evaluating a potential data infrastructure project using open source software such as CKAN are around sustainability and control.

“They’re tired of having been locked in with a big global corporate vendor, and they want to know, if they build on CKAN, do they own it? Can they move it? And who supports it [the new infrastructure] if the original dev team, all of a sudden, leaves?”

Scott’s response to these concerns is to emphasise the benefits of the open source nature of CKAN. The software not only eliminates vendor lock-in by giving users the freedom to modify, upgrade, tailor or combine software solutions, it also guarantees data sovereignty, meaning they will own the code, their data, and their platform.

A comparison of working with proprietary vendors vs open source

This links to another common misconception about CKAN:

“The belief it is a plug-and-play application like Microsoft Word that can solve all problems. CKAN is a framework, like a commercial kitchen providing the industrial stove and prep tables, but the organisation must still decide on the menu and hire the chefs. CKAN is intentionally minimalist to allow for customisation, not because it is missing features.”

CKAN, like many open source software products, is designed to be modular, allowing users to swap one component for another without having to rebuild their entire data infrastructure.

Also read: Planning your data infrastructure? The benefits of open source compared to proprietary software.

Preparation is key

While an organisation cannot figure everything out in advance of starting a major data project, preparation in some key areas is crucial.

Foremost among these, according to Scott, is understanding what their metadata and metadata schema are.

“They need to know what is on their shelves before they start building the shelves.” The consequence of skipping this step is that “they end up with a messy portal that no one can search effectively. And you’re just moving the mess from one room of the house to another effectively, without being able to know what’s on those shelves.”
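One way to take stock of what is “on the shelves” is a quick audit of existing records before any migration work begins. The following is a minimal sketch in Python, not a CKAN feature: the field names in `REQUIRED_FIELDS` and the sample records are hypothetical, and a real list of required fields would come from the organisation’s own metadata schema.

```python
from collections import Counter

# Hypothetical required fields; a real list would come from the
# organisation's own metadata schema.
REQUIRED_FIELDS = ("title", "description", "licence", "publisher")


def audit_metadata(records, required=REQUIRED_FIELDS):
    """Count how many records are missing (or leave blank) each required field."""
    missing = Counter()
    for record in records:
        for field in required:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return dict(missing)


# Illustrative records only, not real data.
legacy_records = [
    {"title": "Rainfall 2019", "description": "Monthly totals",
     "licence": "CC-BY-4.0", "publisher": "Water Unit"},
    {"title": "Rainfall 2020", "description": "",
     "licence": None, "publisher": "Water Unit"},
    {"title": "  ", "description": "Untitled upload",
     "publisher": "Water Unit"},
]

report = audit_metadata(legacy_records)
print(report)
```

A report like this shows which fields are most often missing or blank, which helps an organisation judge how much clean-up is needed before the portal is built.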

Overestimating and underestimating needs

Another issue Scott notices with many organisations putting new data infrastructure in place is that they overestimate how much customised code they will need. The wide array of existing CKAN extensions can usually handle 90% of their requirements and “they don’t need to do a lot of custom work”.

Conversely, they often underestimate the importance of user experience and governance. This can result in too much data that is not organised well, particularly the metadata. A data portal is useless, notes Scott, if a non-technical researcher cannot find a data set in three clicks. In this respect, “product failure is less about technology and more about the lack of an internal workflow for approving and updating data.”

Matching the correct tech solution to the client’s needs requires honest conversations. “If they just need to host five PDF reports a year, they don’t need CKAN, they can use content management software. If you need a private collaborative sandbox for messy drafts, you need a document management system, not a data portal.”

Also read: Frequently asked questions about end-to-end CKAN services.

The metadata mirror

“I think some of the biggest challenges you see during that transition is what I refer to as the metadata mirror. Migrating data [infrastructure] acts as a giant spotlight sometimes on years of bad data entry. And you’re not just moving those files… you’re mapping messy legacy metadata into a clean new standard.”

This presents real challenges and can make the process much longer, because once a client is in the midst of the data transfer, they have to work with the developers to go back and fix all the past errors.
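To make the idea of mapping legacy metadata into a clean standard more concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than how any particular migration is done: the legacy field names in `FIELD_MAP`, the licence spellings in `LICENCE_ALIASES` and the sample record are all hypothetical.

```python
# Hypothetical mapping from inconsistent legacy field names to one target schema.
FIELD_MAP = {
    "Title": "title",
    "dataset_name": "title",
    "Desc": "description",
    "Licence Type": "licence",
}

# Hypothetical spellings of the same licence found in legacy records.
LICENCE_ALIASES = {
    "cc by": "CC-BY-4.0",
    "cc-by": "CC-BY-4.0",
    "creative commons attribution": "CC-BY-4.0",
}


def map_record(legacy):
    """Return (clean, unmapped): the cleaned record plus any fields with no known target."""
    clean, unmapped = {}, {}
    for key, value in legacy.items():
        target = FIELD_MAP.get(key)
        if target is None:
            # Keep unknown fields aside for a human to review.
            unmapped[key] = value
            continue
        if isinstance(value, str):
            value = value.strip()
        clean[target] = value
    licence = clean.get("licence")
    if isinstance(licence, str):
        # Normalise known spellings; leave anything unrecognised untouched.
        clean["licence"] = LICENCE_ALIASES.get(licence.lower(), licence)
    return clean, unmapped


# Illustrative legacy record only.
legacy = {
    "dataset_name": " Rainfall 2020 ",
    "Desc": "Monthly totals",
    "Licence Type": "CC by",
    "uploaded_by": "jsmith",
}

clean, unmapped = map_record(legacy)
print(clean)
print(unmapped)
```

Setting unmapped fields aside, rather than silently dropping them, gives the client and developers a concrete list of past errors and oddities to work through together.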

Scott also has advice for any organisation considering or in the middle of planning a change in their data infrastructure to CKAN.

“If there’s something that’s not going to add any value into your portal and make it more complex, don’t migrate it across. Just leave it where it is. And then there’s also the sort of the 80/20 rule, where you identify the 20% of your data that is going to get 80% of the traffic and you move that first, get that perfect, and then you decide if the rest even needs to be migrated or just archived.”
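The 80/20 rule Scott describes can be sketched in a few lines of Python. This is an illustration only: the dataset names and view counts are made up, and the function `select_priority_datasets` is a hypothetical helper, not part of CKAN. It assumes an organisation already has some measure of traffic per dataset, for example from its web analytics.

```python
def select_priority_datasets(traffic, share=0.8):
    """Return the most-viewed datasets whose combined views reach `share` of the total."""
    total = sum(traffic.values())
    if total == 0:
        return []
    selected, running = [], 0
    # Walk through datasets from most to least viewed.
    for name, views in sorted(traffic.items(), key=lambda item: item[1], reverse=True):
        if running >= share * total:
            break
        selected.append(name)
        running += views
    return selected


# Made-up view counts for ten datasets (total: 1,000 views).
page_views = {
    "rainfall": 500, "budget": 330, "roads": 40, "parks": 30, "permits": 25,
    "libraries": 20, "census-2001": 20, "archive-1998": 15,
    "archive-1997": 10, "archive-1996": 10,
}

first_wave = select_priority_datasets(page_views)
print(first_wave)
```

In this made-up example, two of the ten datasets account for more than 80% of the views, so they would be migrated and polished first. The remaining eight can then be reviewed to decide whether they should be migrated at all or simply archived.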

A diagram of data federation with CKAN

Case study of how Link Digital developed a CKAN-powered open data catalogue for the Inter-American Development Bank.

CKAN projects that succeed and those that struggle

What are the early warning signs in a project that predict trouble down the line?

“The projects that struggle are the ones that treat the launch of the portal as the finish line. But the projects that succeed treat that launch of the portal as day one. They know it’s an ongoing thing… and they can add more.”

“The beauty of using something like CKAN is you can keep evolving it. You can keep making it better. It’s not a set and forget. And that’s where I sort of see that project versus product. So, it’s not the finished product on day one that’s the end result. That’s the start and you can start iterating and developing further and further from there.”

Another warning sign, according to Scott, is if a CKAN data portal is owned solely by the tech people or the IT department of the organisation implementing it. The entire organisation – not just its technical people – needs to be in lockstep with the project.

“It [also] needs to be owned by the business and even the policy side of the organisation. So, they know what’s moving across and how the data is collected, how the data is going to be shown, what data is going to be in there, and how it’s going to drive engagement with what is in that portal.”

An evolution, not a big bang

Scott has one piece of advice for those at the very beginning of their data journey:

“Don’t let the perfect be the enemy of the good.” Start with a solid core CKAN implementation, “get your best data in there and let user feedback guide your growth. It’s really an evolution. It’s not a big bang.”

Want to discuss your data needs?

Reach out to Link Digital’s sales team here for a no-pressure conversation.