Switching digital infrastructure from proprietary to open source software is still seen by some as fraught with complexity. One organisation that has undertaken this process is the Resource Markets Branch of New Zealand’s Ministry of Business, Innovation and Employment (MBIE), which replaced the proprietary software used to run its online data storage platform with the open source Comprehensive Knowledge Archive Network (CKAN).
Miles Dunkin, MBIE’s Manager Geoscience and Geospatial, leads the team that manages the stewardship and release of geospatial and geoscience exploration data and reports collected by permit holders and the New Zealand government. Dunkin contends that not only was the switch “relatively straightforward,” but the benefits have also been substantial. These include increased engagement with the new platform, especially from non-expert users, and improved workflow and understanding for his data team behind the scenes.
I spoke to Dunkin recently about the challenges of his job at MBIE, what was involved in swapping from proprietary to open source, the benefits that have resulted from the shift, and what the plans might be for the MBIE’s data operations in the future.
One hundred and sixty years of data
The Resource Markets Branch is responsible for regulation and policy in the minerals and petroleum resources sector across New Zealand, including exploration for oil, gas, coal, gold, silver and many other minerals. This involves collecting, storing and managing huge amounts of data, including data gathered under mandatory industry reporting, and sharing it with the public.
“There is a requirement, under the permits that we issue, for the operators to supply data back. That’s what I’m responsible for, the ingestion and the curation and then, at the appropriate time, the sharing back with the industry and the public the relevant information that has been supplied.”
“There are a massive number of files and resources and data sets. Looking back, I think the oldest is 1866, so it is almost one hundred and sixty years of records. And then the other thing that is a challenge, because seismic is part of that, you get some very large datasets.”
The work to replace the existing data setup was undertaken from 2020 to 2023. It involved installing two interconnected CKAN portals: an internal geoscience data asset register and a public-facing open data portal. MBIE’s data team works day to day in the internal portal, which holds all the private data, while the public portal is purely an open data site.
Three vendors were involved in the project, including Link Digital, which worked exclusively on its CKAN aspects. Link Digital’s work covered customising CKAN to integrate with the client’s internal data pipeline and workflow systems, migrating data from the legacy infrastructure to the new CKAN instances, and providing ongoing support and maintenance, most recently upgrading the setup to the latest version of CKAN.
Using a sledgehammer to crack a nut
When it comes to the rationale for switching to open source, Dunkin is very clear.
“Basically, we had a sledgehammer to crack a nut. We had industry specific software that was designed for the interchange across different government departments and organisations.”
It was meant to be set up almost as a data mart, giving people the ability “to trade and exchange data, as well as regulate what was going on. It was very comprehensive but also very proprietary and we were probably using a minuscule amount of it.” It was also “quite expensive”. “So, we looked around and thought, what can we do? Because it was quite a complex set of architecture, not just a single piece of software.”
The complexity and cost of the existing proprietary software were not the only factors behind the decision, according to Dunkin. “Cloud storage was becoming a thing, rather than having to have your own tin sitting in a room somewhere on premise.” There was also a strong desire to adopt open source solutions, in line with the New Zealand Government’s drive to improve the discovery and reuse of its open data and its support for the FAIR principles of Findability, Accessibility, Interoperability, and Reuse of digital data.
“While we were looking around, one of the organisations that we thought we were quite closely aligned to was the Geological Survey of Queensland (GSQ) and its public-facing geoscience open data portal. It seemed that they were doing something, a couple of years ahead of us, that was probably pretty similar to what we needed.”
“And they kindly were very open and sharing with us what they developed. So, it was really the fact that they [GSQ] were ahead of us doing basically what we wanted to do and were willing to share that was a strong contributing factor and the fact that they were complimentary about Link Digital, which had been involved in the project.”
What do the interconnected portals do?
As noted earlier, the MBIE project consists of two interconnected portals, one public and the other private.
“They’re effectively almost exactly the same. Under the Crown Minerals Act, which is the act that we regulate under, there’s a requirement to release information under certain conditions. The main condition is that after 5 years, the data becomes publicly available. That’s the main driver of the two portals. One is where we store everything, including the commercially sensitive material that has not reached its release date. The ingest of the material goes into the internal portal, anything new that comes in. And then the system is set up so that when it hits the 5-year date, it triggers my team to go and make sure that it’s appropriate for it to be released and then to release it into the public portal.”
The release of data to the public portal happens semi-automatically. “There is always a human involved,” stresses Dunkin, to avoid the danger of inadvertent data release. “This was a key piece of the work with all the vendors, trying to figure out how all the backend pieces fit together, such that there was no way this could happen.”
MBIE used Jira to support the workflow around the portals. Link Digital configured and improved the integration between the CKAN instances and these existing Jira processes, so that the processes did not have to be rebuilt from scratch.
“So, CKAN lets Jira know when something is coming up: two weeks before it’s due for release, it builds a ticket into Jira and then we work in Jira to do all the checks and balances. And once that has happened and all that work has been completed, it makes the publish button available. It is hard wired into the system that the publish button will not be available until the appropriate date.”
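The release gate Dunkin describes combines three rules: data becomes releasable five years after it is supplied, a review ticket is raised two weeks before that date, and publishing stays disabled until both the date has passed and a human has completed the checks. The minimal Python sketch below illustrates that date logic only. It is not MBIE’s or Link Digital’s implementation, and it does not use CKAN or Jira APIs. The `Dataset` record, its `received` and `checks_complete` fields, and the helper functions are hypothetical, and the sketch assumes the five-year clock runs from the date the data was received.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed values, taken from the article: data is releasable after 5 years,
# and a review ticket is raised two weeks before the release date.
CONFIDENTIALITY_YEARS = 5
TICKET_LEAD_TIME = timedelta(weeks=2)


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping 29 February to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass
class Dataset:
    name: str
    received: date                 # hypothetical: date the operator supplied the data
    checks_complete: bool = False  # hypothetical: set when the review ticket is closed


def release_date(ds: Dataset) -> date:
    """Date on which the confidentiality period ends."""
    return add_years(ds.received, CONFIDENTIALITY_YEARS)


def ticket_due(ds: Dataset, today: date) -> bool:
    """True once today falls within the two-week window before release."""
    return today >= release_date(ds) - TICKET_LEAD_TIME


def can_publish(ds: Dataset, today: date) -> bool:
    """Publishing needs BOTH the release date to have arrived and human sign-off."""
    return today >= release_date(ds) and ds.checks_complete
```

Making `can_publish` depend on both conditions reflects Dunkin’s point that “there is always a human involved”: reaching the date alone never enables publishing, and neither does sign-off before the date.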

Benefits of the new data set up
The main improvement Dunkin has noticed since the transition to the new data setup is better findability.
“We’ve noticed a big uptick in the number of people that have registered when we went from the old system to the new system. It’s clearly more user friendly from a public perspective. So, the requirement for [improved] findability has definitely been met.”
This touches on a key concern for many data portal managers: reaching beyond expert users. “We know the experts can use it, but how do you get it out to a larger group of people who are not data statisticians?”
“You know, with the gold price going gangbusters in New Zealand we’re getting a lot of interest. And some of these people just have a ute and a sluice and may struggle with technology. Even some midsize organisations still struggle a bit with the technology side of things. So, we’ve tried consciously to simplify things as much as possible and the CKAN interface is far simpler than the old one. So that’s been a help.”
“In terms of the permit side of things, we’re breaking the previous year’s records of the number of permit applications. And I’m anticipating that that’s going to translate down the track when people start sending us reports and data back, to quite a big jump in the amount of material coming back to us. So, I think we’re in good shape before this hits.”
But there have also been benefits in how MBIE’s data team interacts with the portal. As Dunkin puts it:
“We have more ability to understand what’s going on under the hood. You know, obviously we’ve got the vendors as the experts on this stuff, but it has improved the team’s knowledge of how the system works because it’s not proprietary. It’s not locked away. We’ve got a couple of people that really understand the workflow.”
Dunkin also notes that the new CKAN system makes his job as data manager easier.
“The way it is set up and how CKAN has improved workflows, it is quite easy to manage the team workloads across the system. It is simple for [the team] to pick up each other’s work, so that you do not have one person sitting on a ticket and then that person goes on leave and the process stalls.”
Future plans
One area flagged for improvement in the very near future is the download experience for users of the public portal.
“When there is a break or an interrupt, a download just stops. So, you can be partway through downloading a terabyte of information and you could be on the last byte to come, and your download fails, you’ve got to start again. Improving that is our priority.”
The other topic that is on the minds of MBIE’s data team is how they might use AI.
“Those of us who use the system a lot have got quite good at finding what we want quickly and CKAN, with its search capabilities, certainly helps in that regard. But we’re thinking, wouldn’t it be great if we have some sort of functionality that allows us to say, ‘find me all the files that relate to X’.”
“This is a challenge given the fact that the portals contain masses of unstructured data, everything from PDFs to who knows how many thousands of spreadsheets. That’s semi-structured data but there’s an awful lot of information there that would be super valuable but requires a large amount of effort to draw value from.”
Want to know more? Read the full case study of the MBIE data portal project on our site.
