The Resources Markets Branch of New Zealand’s Ministry of Business, Innovation and Employment
CKAN, Azure, Solr, geospatial tools such as ArcGIS and FME Workbench
2020 – Mid 2023, with ongoing support
https://geodata.nzpam.govt.nz/
The Resources Markets Branch of New Zealand’s Ministry of Business, Innovation and Employment (MBIE) is the country’s regulatory steward for resources market systems. Through its regulatory activities under New Zealand’s resources legislation, MBIE is responsible for collecting and managing petabytes of geoscience data, including data gathered under mandatory industry reporting. This includes:
This data is stored and managed in a variety of different formats: tabular data, PDF reports and document archives, imagery & geospatial data, partitioned data sets, and specialised mining and petroleum-related data in geospatial and geodata formats.
MBIE required a replacement for their existing online data storage platform, which was nearing the end of its support lifecycle. This existing data setup included an internal data catalogue and a public-facing data portal.
The client felt strongly that the existing software components, operating systems and database management systems were not meeting the rapidly changing needs of users and stakeholders. MBIE was data rich but found it difficult to make connections and gain insights from the data in its possession. In addition, the existing setup ran on end-of-life proprietary software that provided far more functionality than was needed.
There was also a strong desire to adopt open source technology and cloud solutions (the previous system was operated under licence by a private sector provider using proprietary software). This was in line with the New Zealand Government’s drive to improve the discovery and reuse of its open data, and its support for the FAIR principles of Findability, Accessibility, Interoperability, and Reuse of digital data.
In the initial stages of the market analysis to replace its existing online data storage setup, MBIE became aware of work undertaken by Link Digital on the public-facing geoscience open data portal of the Geological Survey of Queensland (GSQ). GSQ is the state’s custodian of mineral and energy resources data, including data gathered via mandatory industry reporting. The Queensland portal was a smaller, less complex version of what MBIE was trying to achieve, and MBIE liked its clean, modern interface and focus on information discovery and access.
Taking the Queensland portal as a starting point and adding a range of significant customised features, the client wanted a revamped internal data portal and a new public-facing open data portal, both built on CKAN and connected so that data could be transferred between the two. The agency also wanted the portals to be scalable so that they could continue to be enhanced in successive phases. Link Digital took part in several online workshops to understand the client’s needs and help scope workable solutions and the work required to achieve them.
Link Digital was one of three vendors contracted to work as a consortium on the project, but the only vendor working on the CKAN aspects. The project represents one of Link Digital’s most complex open data implementations. The development process was complicated, with multiple stakeholders and the need to formulate a method to test and integrate the contributions of the different vendors. MBIE was also very concerned about data security during the build phase, so we and the other vendors had only restricted access to its systems, which presented further challenges.
The project involved replacing MBIE’s existing online data storage platform with two interconnected CKAN implementations: an internal geodata catalogue and a public-facing open data portal. Both implementations were connected to the data ingestion process via the CKAN application programming interface (API).
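As an illustration of what ingestion through the CKAN API can look like, the following is a minimal sketch that builds, but does not send, a request to CKAN’s standard `package_create` action. It is not the project’s actual ingestion code: the base URL, API token, organisation and dataset fields are hypothetical placeholders, and in this project the calls would have been issued from the data pipeline rather than from a standalone script.

```python
import json
import urllib.request


def build_package_create_request(base_url, api_token, dataset):
    """Build (but do not send) a CKAN Action API request that creates a dataset."""
    url = base_url.rstrip("/") + "/api/3/action/package_create"
    body = json.dumps(dataset).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # CKAN reads the API token from the Authorization header.
            "Authorization": api_token,
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_package_create_request(
    "https://ckan.example.org",
    "example-api-token",
    {
        "name": "example-seismic-survey",
        "title": "Example seismic survey",
        "owner_org": "example-org",
        "private": True,  # keep the record non-public until it is cleared for release
    },
)
# Passing `request` to urllib.request.urlopen would send it to a live CKAN instance.
```

Building the request separately from sending it keeps the payload easy to inspect and test before anything reaches a live catalogue.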
The project involved a complex technical process of integrating the CKAN instances with multiple systems involved in MBIE’s data pipeline and workflow processes.
MBIE works under a complex and strictly legislated set of processes for the collection, processing and publication of data. After being collected, data is processed, including the population of metadata, and uploaded into the internal data portal. Once several conditions have been met, some of this data is then uploaded into the public portal. This process must be undertaken within strict deadlines set by legislation, which includes an embargo period under which data provided by commercial entities remains private before being made public.
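The release rule described above can be sketched as a simple date check. This is only an illustration under stated assumptions: the function name, the length of the embargo and the list of release conditions are hypothetical, since the actual periods and conditions are set by legislation and are not given here.

```python
from datetime import date, timedelta


def is_ready_for_public_release(received_on, embargo_days, conditions_met, today):
    """Return True once the embargo has elapsed and every release condition holds."""
    embargo_ends = received_on + timedelta(days=embargo_days)
    return all(conditions_met) and today >= embargo_ends


# Hypothetical example: a 365-day embargo and two release conditions, both satisfied.
ready = is_ready_for_public_release(
    received_on=date(2020, 1, 15),
    embargo_days=365,
    conditions_met=[True, True],
    today=date(2021, 6, 1),
)
```

A check of this kind would sit between the internal catalogue and the public portal, so that a dataset is only promoted once it passes.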
MBIE’s existing data pipeline was migrated to FME Workbench, and its workflow processes were undertaken within Jira. The latter included a ticketing system for the various stages of the data’s journey to publication and any updates or modifications that are made to the status of a published data set. Link Digital configured and improved the new setup to integrate the CKAN instances with MBIE’s existing processes so that they did not have to be implemented again from scratch.
In addition to Jira and FME Workbench, the CKAN instances had to be integrated to work with various other systems, including:
Link Digital oversaw data migration from MBIE’s legacy infrastructure to the new CKAN system. This was a major task given the large amount of data involved and the need to migrate it without problems, with some migration work continuing even after launch.
In addition to developing a data schema and process for the migration, Link Digital used Azure Blob Storage to create a staging location for the data en route to ingestion by the CKAN system. This saved time and cost and made the process considerably less complex, because the client could upload and temporarily store the large volume of data on the same cloud storage service it was already using before migrating it to the new CKAN system.
The new online data storage platform was launched in 2023, with Link Digital providing ongoing support and maintenance. This included, most recently, upgrading the setup to the latest version of CKAN.
The client has noted definite improvements with the new system, including evidence that data users previously found hard to find is now easily discoverable. The public-facing data portal alone currently hosts over 60,000 datasets.
MBIE plans to use the new setup to expose a very large quantity of offline seismic field data that was previously stored on magnetic tape. These data resources will remain ‘offline’ in cloud cold storage, but the datasets will be discoverable in the public-facing open data catalogue. Users will then be able to request the data from within the catalogue, with delivery via Azure download or physical media.