CKAN users can now ask questions directly instead of manually filtering datasets, using Ask AI, a new CKAN extension created by Link Digital.

CKAN extension: Ask AI (Knowledge Assistant)

Ask AI is a conversational interface layer for CKAN data portals. Users can ask questions directly instead of manually filtering datasets, downloading CSV files, or writing complex SQL queries.

The modern data challenge

Open data portals have matured significantly over the past decade. Many organisations now operate CKAN – Comprehensive Knowledge Archive Network – instances containing hundreds, if not thousands, of datasets. Metadata is structured, application programming interfaces are exposed, and search is functional.

Yet a persistent friction remains: discovery and interpretation.

Users still struggle to locate the right dataset when they don’t know the exact terminology. They manually inspect metadata. They download files to understand the structure. In practice, access exists, but usability often lags far behind.

Ask AI, a new Link Digital feature, introduces an AI-powered layer on top of CKAN to address that gap. The goal is straightforward: to make data portals conversational, contextual, and semantically searchable.

Moving beyond keyword search

Traditional CKAN search matches keywords in titles, descriptions, and other metadata fields. This works well when users know the exact phrasing used by publishers. But real-world users often search differently. A person looking for ‘police reports’ might miss a dataset labelled ‘crime statistics’. Someone interested in ‘ocean pollution’ may never type ‘marine ecosystem degradation’.

Ask AI introduces semantic similarity search. Instead of matching words, it matches meaning. Behind the scenes, dataset and resource metadata – along with supported document content – are converted into vector embeddings and stored in PostgreSQL using an extension called pgvector. When a user performs a search, the system retrieves results based on semantic proximity rather than simple text overlap.

The effect is immediate:

Broader recall without sacrificing relevance
Discovery across inconsistent terminology
Reduced dependence on perfect metadata alignment

For organisations managing large portals, this materially improves dataset discoverability without requiring publishers to change workflows.
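
The ranking idea behind semantic similarity search can be sketched in a few lines of Python. This is an illustration only, not Ask AI's implementation: the three-dimensional vectors are invented stand-ins for real embeddings, and the function names are hypothetical. In a pgvector-backed store, the same ordering would normally be computed in SQL with a distance operator such as `<=>` (cosine distance) rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, index, top_k=2):
    """Return the top_k titles ranked by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), title)
              for title, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Invented toy embeddings; a real model produces hundreds of dimensions.
INDEX = {
    "Crime statistics 2023": [0.9, 0.1, 0.0],
    "Marine ecosystem degradation": [0.0, 0.2, 0.9],
    "Road traffic counts": [0.1, 0.9, 0.1],
}

# Pretend this vector is the embedding of the query "police reports".
query_vec = [0.8, 0.2, 0.1]
results = semantic_search(query_vec, INDEX, top_k=1)
```

With these toy vectors, the query lands closest to the crime statistics dataset even though it shares no keywords with the title.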

‘Ask AI’ at the dataset level

Each dataset page includes an ‘Ask AI’ interface. Instead of manually parsing descriptions, formats, or publisher information, users can ask direct questions:

What is this dataset about?
Who published it and when?
What formats are available?

Dataset metadata is indexed and embedded in the background whenever a dataset is created or updated. This indexing process runs asynchronously to avoid impacting user performance. Updates are synchronised as needed to ensure AI responses remain aligned with current portal content.

The Ask AI assistant uses structured prompt templates to generate consistent, context-aware responses. It does not operate blindly but receives specific metadata context relevant to the dataset being viewed.
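
As a rough illustration of what a structured prompt template can look like, the sketch below fills a fixed template with metadata from the dataset being viewed. The template wording, field names, and sample dataset are all assumptions made for this example; the actual templates used by Ask AI are not shown here.

```python
from string import Template

# Hypothetical template: the model is told to answer only from the
# metadata supplied for the dataset currently being viewed.
DATASET_PROMPT = Template(
    "You are an assistant for a CKAN data portal.\n"
    "Answer using only the dataset metadata below.\n\n"
    "Title: $title\n"
    "Publisher: $publisher\n"
    "Last updated: $updated\n"
    "Formats: $formats\n"
    "Description: $description\n\n"
    "Question: $question\n"
)


def build_prompt(dataset, question):
    """Fill the template with one dataset's metadata and the user's question."""
    return DATASET_PROMPT.substitute(
        title=dataset["title"],
        publisher=dataset["publisher"],
        updated=dataset["updated"],
        formats=", ".join(dataset["formats"]),
        description=dataset["description"],
        question=question,
    )


# Invented sample metadata for illustration.
sample_dataset = {
    "title": "Crime statistics 2023",
    "publisher": "Example Police Service",
    "updated": "2024-02-01",
    "formats": ["CSV", "PDF"],
    "description": "Recorded offences by suburb and offence type.",
}

prompt = build_prompt(sample_dataset, "What formats are available?")
```

Because every request passes through the same template, answers to the same question stay consistent in structure from one dataset to the next.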

Document-level intelligence

Many portals host more than structured metadata. Reports, PDFs, spreadsheets, and presentations often contain critical context. Ask AI extends indexing beyond metadata to selected resource file types:

PDF
DOCX
TXT
Markdown
HTML
CSV
XLSX / XLS
PPTX

Supported documents are processed and embedded into the vector store. Users can then query the content directly, for example:

Can you summarise this report?
What are the key findings?
Is there any mention of a specific topic?

Instead of downloading a 100-page PDF and searching it manually, users receive contextual responses grounded in the document itself. For organisations publishing analytical reports or policy documents, this significantly improves accessibility.
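
One plausible shape for the document processing step is sketched below: check whether a resource's format is on the supported list, then split the extracted text into overlapping chunks ready for embedding. Chunking is a common technique and an assumption here, not something confirmed about Ask AI; the function names, chunk size, and overlap are illustrative.

```python
# Formats taken from the supported list above, normalised to lower case.
SUPPORTED_FORMATS = {
    "pdf", "docx", "txt", "markdown", "html", "csv", "xlsx", "xls", "pptx",
}


def is_indexable(resource_format):
    """True if the resource's declared format is on the supported list."""
    return resource_format.strip().lower() in SUPPORTED_FORMATS


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of chunk_size characters that overlap by
    `overlap` characters, so sentences cut at a boundary keep some context."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


# Invented sample text: 500 characters of repeating digits.
sample_text = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(sample_text)
```

Each chunk would then be embedded and stored alongside a reference to its source resource, so an answer can be traced back to the document it came from.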

Querying tabular data without writing SQL

One of the most technically impactful features is the integration with CKAN’s DataStore extension. When users ask analytical questions about tabular data, Ask AI can generate SQL queries and run them through the DataStore API. Retrieved records are then provided as structured context to the language model to produce a response.

For example, a simple analytical request may trigger a generated query such as:

SELECT fields
FROM resource_table
LIMIT N

Users do not see or write SQL. The system constructs queries dynamically based on schema awareness.
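
To make the query template above concrete, here is a minimal sketch of building such a statement from a known schema and preparing a request to CKAN's `datastore_search_sql` action. The helper names, site URL, resource id, and column names are invented for the example, and the sketch only builds strings; it does not send a request or reflect how Ask AI itself constructs queries.

```python
from urllib.parse import urlencode


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_datastore_sql(resource_id, fields, limit=100):
    """Build a SELECT over a DataStore table, which is named by resource id."""
    if not fields:
        raise ValueError("at least one field is required")
    limit = int(limit)
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    columns = ", ".join(quote_ident(f) for f in fields)
    return f"SELECT {columns} FROM {quote_ident(resource_id)} LIMIT {limit}"


def build_request_url(site_url, sql):
    """URL for the datastore_search_sql action of the CKAN API."""
    query = urlencode({"sql": sql})
    return f"{site_url.rstrip('/')}/api/3/action/datastore_search_sql?{query}"


# Hypothetical resource id and column names.
sql = build_datastore_sql("abc-123", ["suburb", "offence_count"], limit=5)
url = build_request_url("https://demo.example.org/", sql)
```

Quoting identifiers and forcing the limit to an integer keeps the generated statement well-formed even when column names contain spaces or unusual characters.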

This lowers the barrier for non-technical users while preserving the integrity of the DataStore. It transforms the portal from a static repository into an interactive analytical surface. In addition, the assistant does not rely solely on live querying for every interaction. For DataStore-backed resources, we precompute lightweight statistical summaries, such as column types, value ranges, distinct counts, and basic aggregates. These summaries are stored and used as contextual hints during prompt construction.

This serves two purposes. First, it improves performance by reducing the need for exploratory SQL queries when the model only requires structural understanding. Second, it guides the model toward generating more accurate and efficient SQL, since it already has awareness of the dataset’s shape and characteristics.
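
A lightweight summary of this kind might be computed along the following lines. The sketch assumes rows arrive as dictionaries, as they would from a DataStore query result; the function names, summary fields, and sample rows are illustrative assumptions rather than the extension's actual output.

```python
def summarise_column(values):
    """Basic profile of one column: counts, and a range if it is numeric."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(present),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary.update(
            type="numeric",
            min=min(present),
            max=max(present),
            mean=sum(present) / len(present),
        )
    else:
        summary["type"] = "text" if present else "unknown"
    return summary


def summarise_rows(rows):
    """Group row values by column name and profile each column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: summarise_column(vals) for name, vals in columns.items()}


# Invented sample rows for illustration.
rows = [
    {"suburb": "Acton", "offences": 4},
    {"suburb": "Braddon", "offences": 10},
    {"suburb": "Acton", "offences": None},
]
summary = summarise_rows(rows)
```

A summary like this can be serialised into the prompt, so the model knows a column is numeric and what range it spans before it writes any SQL.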

Architecture overview

The architecture is intentionally conservative and infrastructure-aligned.

Embeddings are stored in PostgreSQL using pgvector.
Indexing runs in background jobs and synchronises on updates.
No specialised caching layer is introduced; the system relies on model-level caching where applicable.
Prompt templates ensure structured and repeatable interactions.
Access is restricted to registered users for agent-based queries.

This design minimises operational complexity while remaining scalable within existing CKAN deployments. Organisations do not need to adopt an entirely new stack. The assistant integrates directly into the CKAN ecosystem.

Different organisations have different risk profiles and infrastructure constraints. Ask AI supports multiple backend options:

Ollama (local models) for privacy-sensitive, on-premises deployments
OpenAI for high-quality cloud-based responses

Configuration is handled via the CKAN configuration file, and several runtime parameters can be adjusted through the Admin Panel UI:

Model temperature
Request timeouts
Semantic search top-k values
Query expansion toggles
DataStore result limits
Context window settings (for Ollama)

This allows administrators to tune performance, cost, and response behaviour without redeploying the system.

Security and permission awareness

Security alignment with CKAN is fundamental. Ask AI respects CKAN’s native permission model. Private datasets remain inaccessible to unauthorised users. Permission labels are applied during filtering to ensure that search and AI responses only include datasets visible to the querying user.

If a user does not have access to a dataset, it is neither retrieved nor referenced in responses. When deployed with local AI models, data remains entirely within organisational infrastructure. For institutions with strict data residency requirements, this is a critical capability.
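
The filtering step can be pictured as a set intersection between the labels attached to each indexed dataset and the labels held by the querying user. The sketch below assumes label strings in the style of CKAN's default permission labels (`public`, `member-<organisation id>`, `creator-<user id>`); the ids, hit structure, and function name are invented for illustration.

```python
def visible_hits(hits, user_labels):
    """Keep only hits that share at least one permission label with the user."""
    allowed = set(user_labels)
    return [hit for hit in hits if allowed.intersection(hit["labels"])]


# Invented search hits, each carrying the labels stored at indexing time.
hits = [
    {"title": "Crime statistics 2023", "labels": ["public"]},
    {"title": "Draft budget papers", "labels": ["member-org-finance", "creator-u42"]},
    {"title": "Internal audit notes", "labels": ["member-org-audit"]},
]

anonymous_view = visible_hits(hits, ["public"])
finance_view = visible_hits(hits, ["public", "member-org-finance"])
```

Applying the filter before any context is passed to the language model means a private dataset cannot leak into a response, because the model never sees it.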

From portal to intelligent interface

The transformation can be summarised simply:

Before:

Keyword-only search
Manual inspection of metadata
SQL required for deeper analysis

After:

Semantic similarity search
Conversational dataset exploration
LLM-generated DataStore queries

This shift repositions CKAN from a repository to an intelligent interface layer over public sector knowledge assets.

A practical step towards AI-enabled open data

Artificial intelligence is often discussed abstractly in the open data ecosystem. Ask AI takes a pragmatic approach. It does not attempt to reinvent CKAN or introduce excessive architectural overheads. It focuses on usability, discovery, and interaction.

For organisations managing large data portals, the question is no longer whether AI will influence user expectations. It already has. The question is whether your portal will remain a searchable archive, or evolve into a conversational, intelligent gateway to public data.

Ask AI represents one practical path forward.

Sign up for a demo

Sign up for a demo, and we can talk about your project and how Ask AI can improve your users’ interactions with data.
