Earlier this year, Link Digital announced funding for a new initiative to enhance the core capabilities and reach of CKAN – the Comprehensive Knowledge Archive Network. A major focus of this initiative has involved an extensive rewrite of one of CKAN’s key extensions, ckanext-DCAT. This has enabled easier use, deeper customisation, and greater support for the latest version of DCAT and its various profiles, such as DCAT-AP, the European Union’s DCAT profile, and DCAT-US, the DCAT profile for the United States.
What is DCAT?
But what exactly is DCAT, why are we excited about enhanced CKAN-DCAT compatibility, and why should those working on data catalogs and open data portals be paying attention to it?
DCAT, or Data Catalog Vocabulary, is a World Wide Web Consortium (W3C) recommendation designed to facilitate interoperability between data catalogues published on the web, for example, open data catalogues or public-facing data portals. DCAT uses the Resource Description Framework (RDF), a standard model for data interchange on the web, to enable a data publisher to describe datasets and data services using a standard model and vocabulary. This makes data catalogues and portals readable by a wider range of applications, which in turn makes the data on them more discoverable and reusable.
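To make this more concrete, the sketch below shows roughly what a DCAT description of a single dataset can look like when serialised as JSON-LD, one of the formats used to write RDF. The dataset, its URLs and the describe_dataset helper are invented for illustration; only the dcat: and dct: terms come from the published vocabularies, and a real portal would normally publish many more properties.

```python
import json

# Namespace prefixes for the DCAT and Dublin Core Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_dataset(uri, title, description, download_url, media_type_uri):
    """Return a JSON-LD document describing one dataset with one distribution."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type_uri},
            }
        ],
    }


# Hypothetical example values, not a real dataset.
record = describe_dataset(
    uri="https://data.example.org/dataset/air-quality",
    title="Air quality readings",
    description="Hourly air quality readings by monitoring station.",
    download_url="https://data.example.org/files/air-quality.csv",
    media_type_uri="https://www.iana.org/assignments/media-types/text/csv",
)

print(json.dumps(record, indent=2))
```

Because every portal that follows DCAT uses the same terms for the same things, an application that understands this record can read an equivalent record from any other DCAT-compliant catalogue.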
The first official iteration of DCAT, DCAT Version 1, originated in early 2014 out of discussions about how to make data less prone to being siloed and more findable by humans and machines. This led to work on a method that allows data providers to describe their datasets in a standard way that different data catalogues and portals around the world can understand, facilitating greater interoperability between them. The latest version is DCAT 3, introduced in 2022.
DCAT and the European Union
DCAT has assumed particular importance in the context of the European Union (EU), where there are many nation states with an even larger number of different government public data portals, all of which have their own governance frameworks. The need for public sector data to be searchable and discoverable across these different geographic, administrative and legal jurisdictions makes interoperability between these portals a high priority. The 2020 document Towards a European Strategy on Business to Government Data Sharing for the public interest states the challenge thus: ‘With the continuously growing amount of data, interoperability is increasingly a key issue in exploiting its full value when combining data from different data sources, reusing it for multiple purposes and across sectors.’[1]
While DCAT has been developed as a general standard, it is designed to enable users to build what are known as profiles to make it more applicable to a specific geographic region or subject domain. The EU has developed its own profile for European data portals, known as the DCAT Application Profile or DCAT-AP. It is designed to help standardise how metadata about public sector datasets is published in Europe, with the aim of making it easier for different European data portals to communicate and integrate with each other.
Economic and trade development are key factors driving the EU’s uptake of DCAT, particularly the need to create a single market for data to ensure Europe’s global competitiveness. But creating a set of common standards is also aimed at enhancing the user experience by making data sharing more efficient and streamlining data management. In addition, DCAT is key to what is known as The European Data Strategy 2020 and the creation of Common European Data Spaces, in which governments will make a more robust attempt to balance the need to make data more easily discoverable and shareable with the privacy and security concerns of businesses and individuals.
DCAT across the rest of the world
Another country to develop its own specific DCAT profile is the United States. This profile, DCAT-US, is used to standardise metadata across the datasets published by the government’s national open data portal, Data.gov. Canada is another country that has adopted DCAT and uses it to structure and standardise metadata across all its data platforms.
But while DCAT may be the closest thing that currently exists to a worldwide metadata standard, it is by no means universally accepted. Even within the EU there are many public organisations that prefer their own metadata standards to DCAT. Beyond Europe, the United States and Canada, it also remains unresolved which standard, if any, will become dominant, as governments in China and India have very different visions for their own digital future.[2]
China has not officially adopted DCAT and appears to be pushing ahead with developing its own profiles and standards for the data management of its open government data (OGD) portals. A 2019 survey found, however, that provenance metadata elements ‘in different OGD portals vary and there is no specific and well-defined provenance description scheme for OGD in China.’[3] India has also not formally adopted DCAT as a national standard, although it does engage in data management practices that have elements of alignment with DCAT.
Australia and DCAT
Australia has what can be described as a very limited uptake of DCAT. The Australian Government’s 2015 Public Data Policy Statement, a foundational document in terms of its directions in the data and digital space, encourages data sharing and accessibility, which broadly aligns it with DCAT’s intent. The national open data platform, data.gov.au, uses the DCAT standard to describe datasets, although it is not clear whether this is uniformly applied across state and territory data portals. The Australian Government Record Keeping Metadata Standard also incorporates some DCAT-compliant principles. And some Australian universities and government bodies use DCAT and its extensions for better integration of data catalogs across different subjects and research datasets.
Australia is a relatively small player in the global data and digital space and lacks the geographical and jurisdictional diversity of Europe. But while aligning more closely with DCAT standards may not create immediate value for Australian data catalogues and open data portals, there are nonetheless significant benefits that could flow to us from adhering to DCAT more closely.
Foremost of these are returns in terms of our international trade. Ensuring that Australia is using a common standard for our data will help those researching potential commercial opportunities in Australia, foster innovation, and could facilitate investment.
One standard for metadata avoids costly crosswalks (the mapping of specific elements in one metadata standard to those in another standard) between datasets. This will save overseas researchers from different jurisdictions time and ensure that Australian datasets can be combined in whatever work they are doing without extensive reformatting or transformation, and vice versa. Along with the increased discoverability of datasets on public data portals, this will promote global engagement in terms of research and cross-border policy initiatives. If a researcher at an American university, for example, wants to harvest data from an Australian data portal, they will have a much easier time if the portal has a DCAT-compliant version of its metadata.
Diplomatically, greater interoperability between Australian data catalogs and portals and those in other national jurisdictions also helps to align us with an open, democratic and inclusive data culture.
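For readers unfamiliar with crosswalks, the following is a minimal sketch of what one involves. The in-house field names on the left and the to_dcat helper are hypothetical; the terms on the right are from DCAT and Dublin Core. Every pair of non-aligned standards needs a table like this, which is the cost a shared standard avoids.

```python
# Hypothetical crosswalk: field names in an in-house metadata schema (keys)
# mapped to the DCAT / Dublin Core terms that carry the same meaning (values).
CROSSWALK = {
    "name": "dct:title",
    "summary": "dct:description",
    "tags": "dcat:keyword",
    "last_updated": "dct:modified",
}


def to_dcat(local_record):
    """Translate a local record into DCAT terms.

    Returns the translated record plus a list of fields that have no
    mapping, so gaps in the crosswalk are visible rather than silently lost.
    """
    translated = {"@type": "dcat:Dataset"}
    unmapped = []
    for field, value in local_record.items():
        term = CROSSWALK.get(field)
        if term is None:
            unmapped.append(field)
        else:
            translated[term] = value
    return translated, unmapped


# Invented example record in the in-house schema.
local = {
    "name": "Rainfall observations",
    "summary": "Daily rainfall totals by station.",
    "tags": ["weather", "rainfall"],
    "internal_code": "RF-01",
}

dcat_record, missing = to_dcat(local)
print(dcat_record)
print("No DCAT mapping for:", missing)
```

In this sketch the internal_code field has no counterpart in the mapping and is reported as unmapped, which is typical of the information that is lost or needs manual handling when metadata is converted between standards.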
Conclusion
Both DCAT and CKAN are designed to support the management and publication of open data. CKAN aligns with DCAT in several important ways. For example, the ckanext-DCAT extension mentioned at the opening of this post allows CKAN to export and publish its metadata in DCAT format and enables CKAN-based data catalogs to be integrated with other data catalogs or platforms that use DCAT metadata. It also enables CKAN users to easily define and create their own DCAT application profiles.
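As an illustration of what this export looks like from the outside, the sketch below builds the addresses at which a CKAN portal running the extension typically serves DCAT metadata for a single dataset and for the whole catalogue. It assumes the extension's RDF endpoints are enabled with their default paths; the portal address and dataset name are invented, and the helper functions are hypothetical, so check the extension's documentation for the version you are running.

```python
from urllib.parse import quote

# RDF serialisations assumed to be enabled on the portal.
FORMATS = {"rdf", "xml", "ttl", "n3", "jsonld"}


def _check_format(fmt):
    if fmt not in FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")


def dataset_dcat_url(portal_url, dataset_id, fmt="ttl"):
    """Address of the DCAT description of one dataset (assumed default path)."""
    _check_format(fmt)
    base = portal_url.rstrip("/")
    return f"{base}/dataset/{quote(dataset_id, safe='')}.{fmt}"


def catalog_dcat_url(portal_url, fmt="ttl"):
    """Address of the DCAT description of the whole catalogue (assumed default path)."""
    _check_format(fmt)
    base = portal_url.rstrip("/")
    return f"{base}/catalog.{fmt}"


# Hypothetical portal and dataset name.
portal = "https://data.example.org/"
print(dataset_dcat_url(portal, "rainfall-observations", "jsonld"))
print(catalog_dcat_url(portal))
```

A harvester on another portal can request these addresses and read the response with any RDF-aware tool, without needing to know anything about CKAN itself.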
Get in touch
If your organisation is interested in exploring how a CKAN open data catalog or portal can be integrated with efforts to standardise your data with DCAT, contact us and tell us about your project.
[1] European Commission: Directorate-General for Communications Networks, Content and Technology, Towards a European strategy on business-to-government data sharing for the public interest – Final report prepared by the High-Level Expert Group on Business-to-Government Data Sharing, Publications Office, page 65, 2020, https://data.europa.eu/doi/10.2759/731415
[2] Gordon LaForge and Patricia Gruver, Governing the Digital Future, New America, October 2023, page 65, https://www.newamerica.org/planetary-politics/reports/governing-the-digital-future/
[3] Chunqiu Li, Yuhan Zhou, and Kun Huang, 2019, ‘A Survey of Metadata Elements for Provenance Provision in China Open Government Data Portals,’ paper presented at International Conference on Dublin Core and Metadata Applications, Seoul, South Korea, September 23-25, 2019, ACM Digital Library https://dl.acm.org/doi/abs/10.5555/3379957.3379968