As a data steward in a large research organisation, you face many challenges. Not only do you have to manage a seemingly ever-expanding amount of data. You are responsible for overseeing data quality and how to enable the finding and sharing of data quickly and securely internally, externally, or both, across its life cycle. Whatever your organisation’s data focus, effective metadata management will be critical.
Link Digital develops metadata catalogues using CKAN – the Comprehensive Knowledge Archive Network – a leading open-source data management platform. These use metadata to organise and structure your internal data, supporting better data governance and discovery. A metadata catalogue can also align your research organisation with the globally recognised FAIR Principles.
Let’s look in more detail at how a CKAN metadata catalogue can help your research organisation better manage its data.
The importance of metadata management
Metadata is meaningful and useful data about your data, i.e., its origins, history, location, ownership and versions. It plays a vital role by offering additional information about a dataset that informs users of the data’s content and context. In many respects, it is the thread that helps data stewards create a comprehensive and coherent picture of their organisation’s data assets.
Metadata is vital to enabling organisations to organise and extract insights and value from data, to inform research, policy analysis and decision making. In turn, metadata can help reduce the time researchers spend searching for data or, in the worst case, recreating data assets that already exist but lack internal visibility. In addition, metadata is important in assisting research organisations to comply with legal and regulatory requirements, particularly relating to personal or sensitive data.
Metadata also plays a crucial important role in bringing research organisations into alignment with the FAIR principles. First articulated in a 2016 article in the open access journal, Scientific Data, the FAIR Principles are Findability, Accessibility, Interoperability, and Reuse of digital data. These principles have gained considerable traction amongst the international research community and adherence to them is now widely seen as a best practice in terms of data management.
How can a CKAN metadata catalogue help?
Metadata catalogues – essentially an organised collection of details that describe the various data assets within an organisation – are becoming a core component of modern data management infrastructure. They give research organisations the ability to develop a centralised and comprehensive index of data assets that goes beyond the basic collection and storage needs of their primary use cases. CKAN has wide ranging functionality in relation to metadata catalogues. This includes searching datasets by any metadata field, data previews for different formats, extra fields for additional information, and the ability to track changes to metadata and view whether a dataset is available under an open licence and, if not, who it is owned by.
In reference to the FAIR Principles, a CKAN metadata catalogue offers the following:
Findability
Rich metadata improves the findability and sharing of data, both across different platforms and domains, as well as within an open data portal or an internal data management platform, by enabling users to search for and discover data assets based on various criteria, such as keywords and tags. Users can customise metadata to capture subject and organisational level vocabulary, to a very detailed level. In addition, CKAN provides a powerful API for close integration with other systems, as well as support for common metadata standards like DCAT.
Accessibility
Data stewards can set permission settings to manage who can view, edit or use the metadata and associated data sets. This enables data to be shared in a highly secure and controlled access environment. Accessibility is also strengthened through the ability to make metadata openly accessible even if the dataset attached to it has access restrictions. The creation of a streamlined approval procedure for data sharing improves transparency and enables data stewards to discern and report on data usage patterns over time and across different levels and functionalities of an organisation. The other unique aspect of CKAN is that being open source, it can be customised without restriction to allow data access to be tailored very specifically and appropriately to whatever ‘better’ access means for an organisation, without any compromise to its core value in the area of machine access.
Interoperability
Detailed and consistent metadata is a key component of machine actionability – the capacity of computers to locate, access, interoperate, and reuse data with none or minimal human intervention. This includes the ability to connect with various data sources and systems to automatically ingest or update data across different systems and platforms, including legacy systems. CKAN supports all the key metadata standards for consistent description and features such as customised metadata templates save time and provide consistent metadata descriptions.
Reusability
Rich metadata, including detailed information about the content, context and ownership of each dataset, makes it easier for researchers to find and reuse data. It also helps to ensure that reuse is carried out properly and adheres to any necessary regulatory and legal frameworks, including data usage rights and licensing. This not only saves time but builds confidence in the quality and relevance of your data, facilitating greater data use, reuse and collaboration.
And because CKAN is open source, there is no vendor lock in and no licensing fees. Instead, the costs are associated with hosting and customisation by a third-party service provider. This makes it more accessible and cost effective. No vendor lock-in also means that users keep complete control of their data and metadata.
Case study: Canadian Watershed Information Network
Managed by University of Manitoba’s Centre for Earth Observation Science, the Canadian Watershed Information Network (CanWIN) is an internal data management platform that collects and manages a large array of climate data and shares it with researchers. Link Digital developed a detailed metadata catalogue for the site as part of an overhaul of it in 2020. This has enabled researchers to undertake far finer grained exploration of datasets and tell far more complex stories about the data used in their work.
The changes have also powered greater collaboration and enabled researchers to better collate and access data for analysis from a hyperlocal to global scale. According to Claire Herbert, CanWIN Coordinator, it has also created ‘a more reproducible data as all details of data collection and analysis can be recorded, and creates a more collaborative, open environment.’ Read more about our work for CanWin here.
Do you want to improve your research organisation’s metadata management and better align with the FAIR principles? If so, contact us to tell us about your needs and one of our data experts will be in touch.