One topic that we have touched on throughout the CKAN Insider series is how AI is changing the way people interact with data. For CKAN Insider #5, I talk to one member of the Link Digital team who is directly working on this issue, Oleksandr Cherniavksy. Oleksandr is a Full Stack Developer at Link Digital and one of our CKAN experts. He has also been a key player in the design of our new Ask AI feature – an AI-powered layer on top of CKAN that lets users query CKAN data portals using natural language.
In the final instalment of the CKAN Insider series, Oleksandr discusses what is broken about the discovery experience on many data portals, and how our new Ask AI feature can help fix the situation.
You shouldn’t have to be a data expert to use a portal
While open data portals have matured significantly over the past decade, with many organisations now operating CKAN instances containing hundreds, if not thousands, of datasets, Oleksandr maintains something is broken about the discovery experience on many of them: specifically, the friction users experience when searching through large volumes of datasets without knowing the right keywords.
“Most data portals are built for people who already know what they are looking for. If you are visiting a data portal, and it has for example one thousand data sets, or ten thousand data sets and you don’t know exactly what you are looking for, you have a few key words, but you can’t find the text that represents your key word. You [also] tried a search, and it didn’t return any results. So, this is sometimes a bit of a lottery, when you try to find something.”
This situation is especially true for a regular person who is trying to find data but does not know anything about the specifics of the data portal they are on. There is also an issue with browsing. “If you are looking for something inside thousands of data sets, browsing won’t help you. You can’t do it.”
The limitations of manual scrolling and exact keyword matching motivated Link Digital to create its new ‘Ask AI’ feature. Rather than matching exact keywords, it searches by semantic similarity: it identifies terms with related meanings, so it can interpret user intent and group related concepts like ‘auto’ and ‘car’.
How Ask AI works
Ask AI introduces an AI-powered layer on top of CKAN to address this gap. The goal is straightforward: to make data portals conversational, contextual, and semantically searchable.
“Imagine that you have a dataset. We read it carefully and programmatically and transform it into some kind of unique fingerprint that this dataset has. And when a user searches, we do the same thing with the user’s search query. We transform it into a fingerprint and then compare those two fingerprints and say that this query search looks like this fingerprint of a data set, and then we have a result.”
“If you query Ask AI, ‘what is this dataset about?’ or ‘can you find me datasets related to this topic?’, the application will do exactly the same. It will just take the query, hook into the storage that holds all the information and retrieve the results.”
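The fingerprint comparison Oleksandr describes can be illustrated with a minimal sketch. This is not Ask AI's implementation: the `CONCEPTS` table is a hypothetical stand-in for a real embedding model, the dataset names and descriptions are invented, and cosine similarity is assumed as the comparison measure, since the interview does not name one.

```python
import math
from collections import Counter

# ASSUMPTION: a tiny hand-written word-to-concept table standing in for a real
# embedding model. It exists only so that 'auto' and 'car' end up close together.
CONCEPTS = {
    "car": "vehicle", "auto": "vehicle", "vehicle": "vehicle",
    "registration": "registration", "registrations": "registration",
    "rain": "weather", "rainfall": "weather", "weather": "weather",
    "school": "education", "enrolments": "education",
}

# Hypothetical datasets: name -> description text.
DATASETS = {
    "vehicle-registrations": "Annual car registrations by postcode",
    "rainfall-daily": "Daily rainfall totals by weather station",
    "school-enrolments": "School enrolments by year level",
}


def fingerprint(text):
    """Turn text into a sparse concept-count vector (the 'fingerprint')."""
    words = text.lower().split()
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a)  # a Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, datasets):
    """Rank datasets by similarity to the query, dropping non-matches."""
    q = fingerprint(query)
    scored = [(cosine(q, fingerprint(desc)), name) for name, desc in datasets.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


print(search("auto registration", DATASETS))
```

The query says ‘auto’ while the dataset description says ‘car’, yet the two still match because both map to the same concept. A real deployment would replace `fingerprint` with a call to an embedding model and store the resulting vectors in a database rather than recomputing them per query.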
Oleksandr also emphasises that Ask AI:
- Relies on local hardware, such as the GPU and CPU, rather than cloud-based large language models.
- Can assist in filling out or improving metadata for datasets.
- Can interact with different file formats. For PDF resources, the tool provides a summary of the content, including information on individual fields and imagery. For CSV files, the feature provides dataset summaries based on available metadata and allows for table previews (although how comprehensive the responses are depends on the metadata provided for the dataset in question).
- Has the capacity to build charts and graphs based on real data requests. Users have the option to download these generated charts as images or export the underlying chart data in CSV format.
The importance of data security
Ask AI also addresses another major concern for many organisations and government agencies: data sovereignty.

Ask AI runs local models through tools such as Ollama and stores vector data within a PostgreSQL database. “This is very important for data sovereignty, when you understand that all your information stays inside your local network and, for example, the vector database also lives in your environment because we are using an extension for the Postgres database. When you create so-called fingerprints you use a local small language model, and you store it in the local database environment.”

“We can plug in different providers. We have implemented the support only for OpenAI now, but we want to look at the client’s response and what they need, but it is totally doable for different solutions like Anthropic.”
Ask AI also respects CKAN’s existing permission labels to ensure that users only access data they are authorised to view.
“When you, for example, ask something about the datasets [or] searching datasets, you are using CKAN permission labels. Each dataset has its permission labels. If, for example, it is a public dataset, you have a permission label called public. If it is a dataset that is not public but private, it will have a permission label that is private and that says that only, for example, members of a certain organisation can use it. And when users ask a question or search something, we calculate the permission label for the specific user. So, even here, we are not showing any results that users shouldn’t see.”
“If a user does not have access to a dataset, it is neither retrieved nor referenced in responses. When deployed with local AI models, data remains entirely within organisational infrastructure. For institutions with strict data residency requirements, this is a critical capability.”
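The permission-label check described above can be sketched as a simple set intersection. This is an illustration only, not Ask AI's code or CKAN's API: the `IndexedDataset` type, the helper names, the example datasets, and the exact label strings (`public`, `member-<organisation>`) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDataset:
    """Hypothetical search-index entry: a dataset name plus its permission labels."""
    name: str
    labels: frozenset


# Hypothetical search candidates, as they might come back from a similarity search.
CANDIDATES = [
    IndexedDataset("air-quality", frozenset({"public"})),
    IndexedDataset("draft-budget", frozenset({"member-treasury"})),
]


def user_labels(user_orgs):
    """Labels a user holds: everyone gets 'public', plus one per organisation."""
    return {"public"} | {f"member-{org}" for org in user_orgs}


def visible(candidates, user_orgs):
    """Keep only candidates that share at least one label with the user."""
    allowed = user_labels(user_orgs)
    return [d.name for d in candidates if d.labels & allowed]


print(visible(CANDIDATES, []))            # anonymous visitor
print(visible(CANDIDATES, ["treasury"]))  # member of the 'treasury' organisation
```

Filtering before any text reaches the language model is what keeps a private dataset from being retrieved or referenced in a response: a candidate the user has no label for never enters the answer's context.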
You can watch the full interview with Oleksandr, which includes his demo of the Ask AI feature here.
If you’d like to learn more about how we can support your data portal to better integrate with modern AI technologies, message Link Digital here.