About the client
The Canadian Watershed Information Network (CanWIN) is a Canadian spatial research data infrastructure (SRDI) system hosted at the University of Manitoba (UM) and managed by the Centre for Earth Observation Science (CEOS) within the Faculty of Environment, Earth and Resources. CEOS is a leading multidisciplinary and collaborative research centre focused on understanding how the Earth will respond to climate change. CEOS researchers conduct their fieldwork all over the world; however, the Arctic freshwater marine system is a unifying focus of activity because climate change affects this region more acutely than anywhere else in the world. Their Arctic research is also one of three signature research areas for the University, with the other two areas being in medicine.
Objectives
Over the last 20 years, environmental and climate change, including priority concerns such as water security, have moved to the forefront of scientific and societal concerns. Projections of climate change, identification of emerging issues, and the development of evidence-based adaptation and approaches are urgent. Addressing the Arctic and freshwater change at an ecosystem scale requires access to large sets of heterogeneous data and necessitates cooperation across disciplinary, cultural, and political boundaries. This comprehensive approach requires a focused research data infrastructure (RDI) to accelerate sharing, discovery, visualisation, and analysis of diverse data, including potentially sensitive Indigenous Knowledge.
The original CanWIN database was developed by the federal government department Environment and Climate Change Canada (ECCC) to share freshwater research in the Lake Winnipeg Basin. It was created as a relational database using Visual Studio 2008, but it ultimately could not capture and share the complexity of the multi-disciplinary data produced. After much consultation with users and data providers, we realised the CanWIN ecosystem required a non-structured database that was flexible, modular and open-source, with an easy-to-use user interface and good community development support.
In 2020, CanWIN had the opportunity to redevelop and redesign its current CKAN implementation to a) implement a distributed ecosystem consisting of multiple software platforms with CKAN as the central hub and b) redesign CKAN’s UI in line with the UM’s visual identity.
Challenges
The biggest challenge for CanWIN was showing the complex relationships between the data and the programs, sub-projects, facilities, instruments and other records the data is related to. In addition, data is collected, processed and packaged into datasets, publications or collections, which contain a lot of metadata that is very useful to data consumers and required for reproducibility. A key challenge, and the initial focus of the redesign, was linking all these records so that users could find all related information when searching and using the data, thereby minimising metadata duplication. The redesign also needed to let users view and download metadata in a human-readable format, and to make the metadata machine-readable so it is accessible to data aggregators like Google Dataset Search. Another challenge was ensuring all these records are easy and intuitive for users to discover.
After CanWIN engaged Link Digital, the two teams began working together to address these challenges.
Our solution
Link Digital completely overhauled the data portal and added a Drupal CMS component alongside CKAN. The development work was undertaken on Link Digital’s development environment, and then the overhauled portal was deployed to a staging environment for user acceptance testing before releasing it to production.
FAIR (findable, accessible, interoperable and reusable): The relationships between different types of datasets and their integration into the enhanced search bar and spatial display make the data findable, and providing the data in an open-access portal makes them accessible. Furthermore, creating detailed templates mapped to internationally recognised standards (e.g. schema.org and ISO 19115) and providing multiple export formats (human and machine-readable) make the data accessible and interoperable. Finally, the creation of “data packages”, which provide users with a zipped package containing the data, the metadata in human and machine-readable formats and the data dictionary, along with any codebooks and cookbooks created, maximises the reusability of the data.
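As an illustration of the “data package” idea described above, the following is a minimal Python sketch rather than the code that runs on the CanWIN portal. It assumes a simple metadata record and bundles the data, a data dictionary, a schema.org `Dataset` description (machine-readable) and a plain-text summary (human-readable) into one zip file. The function names, file names and record fields are hypothetical.

```python
import io
import json
import zipfile


def to_schema_org(record: dict) -> dict:
    """Map a simple metadata record to a schema.org Dataset description."""
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
    }
    # Optional fields are only written when the record supplies them.
    if record.get("keywords"):
        dataset["keywords"] = record["keywords"]
    if record.get("license"):
        dataset["license"] = record["license"]
    return dataset


def to_readme(record: dict) -> str:
    """Render the same record as plain text for human readers."""
    lines = [record["title"], "", record["description"]]
    if record.get("keywords"):
        lines += ["", "Keywords: " + ", ".join(record["keywords"])]
    return "\n".join(lines) + "\n"


def build_data_package(record: dict, data_csv: str, data_dictionary_csv: str) -> bytes:
    """Return the bytes of a zip file holding data, dictionary and metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("data.csv", data_csv)
        package.writestr("data_dictionary.csv", data_dictionary_csv)
        package.writestr("metadata.jsonld", json.dumps(to_schema_org(record), indent=2))
        package.writestr("README.txt", to_readme(record))
    return buffer.getvalue()
```

Keeping the machine-readable and human-readable metadata inside the same archive as the data means a downloaded package stays self-describing even when it is separated from the portal.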
Seamless integration with various data stores (like ERDDAP or GeoNode/GeoServer) has been added to allow the data to be previewed and downloaded in multiple formats without the need to duplicate and maintain the data. ERDDAP also creates and allows the export of data files in NetCDF format, which embeds key metadata into the data file itself and is considered a “gold standard” for reproducibility in a file format. The ERDDAP data is linked into CKAN to allow users to download the data in multiple formats using the user-friendly CKAN interface.
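To show how one ERDDAP dataset can be offered in several formats without duplicating the data, here is a small Python sketch that builds download links following ERDDAP's tabledap URL pattern (dataset ID, then a file-type extension, then the requested variables and constraints). The server address, dataset ID and variable names are placeholders, and this is an illustration of the pattern rather than the portal's actual integration code.

```python
from urllib.parse import quote


def tabledap_url(base_url, dataset_id, file_type, variables, constraints=()):
    """Build an ERDDAP tabledap request URL for one output format."""
    query = ",".join(variables)
    for constraint in constraints:
        # Characters such as ">" and "<" must be percent-encoded in the query.
        query += "&" + quote(constraint, safe="=")
    return f"{base_url.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


def download_links(base_url, dataset_id, variables, constraints=(), file_types=("csv", "json", "nc")):
    """Return one link per format, all pointing at the same source dataset."""
    return {
        file_type: tabledap_url(base_url, dataset_id, file_type, variables, constraints)
        for file_type in file_types
    }


if __name__ == "__main__":
    links = download_links(
        "https://example.org/erddap",   # placeholder server
        "example_dataset",              # placeholder dataset ID
        ["time", "temperature"],
        constraints=["time>=2020-01-01"],
    )
    for file_type, url in links.items():
        print(file_type, url)
```

Links like these can be stored as resources on a CKAN dataset, so the file a user downloads is generated by ERDDAP on request instead of being kept as a separate copy.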
With the CKAN data portal now extended with a Drupal CMS, data custodians can feature and promote data and tell stories about the data. The CMS also connects users seamlessly to tools that visualise the datasets published on the portal.
Benefits
Link Digital ensured the solution incorporated CanWIN’s commitment to CARE, OCAP and FAIR principles for data sharing – the ethical sharing of open data in a way that considers both people and purpose in open data advocacy.
The need for data to be openly accessible was at the very core of the database design, specifically for data to be open broadly without restriction and on an equal basis (with exemptions for ethical reasons), to be discoverable via web-enabled standards, to have the ability to be used and understood in the context of other data both by humans and machines and to be reproduced easily to facilitate an understanding of how the data was collected and who should be contacted for additional information. The client also required that the data be built into the existing secure hosting infrastructure to safeguard against corruption and loss.
To deliver these benefits, Link Digital built the platform in Python on CKAN. The platform uses data schemas so that the data can be harvested by federated metadata aggregators (including Google Dataset Search and the Canadian Federated Research Data Repository), together with controlled vocabularies, the Canadian Consortium for Arctic Data Interoperability (CCADI) crosswalks for common metadata standards, and CKAN’s native RESTful APIs.
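For readers who want to see what working with CKAN's RESTful API looks like, the sketch below queries the standard `package_search` action and pulls the dataset names and titles out of the JSON response. The portal address in the usage example is a placeholder, and the helper names are our own. It is a minimal example, not a description of how aggregators harvest CanWIN.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(portal_url: str, query: str, rows: int = 10) -> str:
    """Build the URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def parse_search_response(payload: dict) -> list[tuple[str, str]]:
    """Extract (name, title) pairs from a package_search response body."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN reported an error: {payload.get('error')}")
    return [
        (dataset["name"], dataset.get("title", ""))
        for dataset in payload["result"]["results"]
    ]


def search_datasets(portal_url: str, query: str, rows: int = 10) -> list[tuple[str, str]]:
    """Run a search against a CKAN portal and return the matching datasets."""
    with urlopen(package_search_url(portal_url, query, rows), timeout=30) as response:
        payload = json.load(response)
    return parse_search_response(payload)


if __name__ == "__main__":
    # Placeholder address; substitute the URL of a real CKAN portal.
    for name, title in search_datasets("https://data.example.org", "lake winnipeg", rows=5):
        print(name, "-", title)
```

Because every CKAN action returns the same envelope (`success` plus `result` or `error`), the parsing step can be reused for other read-only actions with only the result handling changed.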
The end result was a distributed data ecosystem that was FAIR (findable, accessible, interoperable and reusable) and easy to use.
Global Benefits
CanWIN is part of the Canadian Consortium for Arctic Data Interoperability (ccadi.org), a pan-Canadian collaboration between six universities and multiple Indigenous, governmental and non-profit organisations to develop an Arctic Research Data Infrastructure (ARDI).
In collaboration with additional global partners (including POLDER, Australia), the CCADI is building a federated data aggregator which will allow all partner repository data to be searchable from one website, as well as provide tools highlighting how the standardisation of data allows researchers to easily analyse multi-disciplinary datasets. The data available through the CCADI site can be used for a myriad of purposes, including examining the impacts of climate change at local, regional and global scales, gaining an understanding of current data gaps, and providing the information required to make science-based decisions and shape policy that aims to support the balance of ecosystem function and human activities.
This was an exciting project for Link Digital, and we were thrilled to be involved in a project of such significance.