While making data ‘open’ for the common good arguably dates back as far as the earliest shipping logs and trade registries, ‘open data science’ only began to take shape in the late 1950s. According to a 2023 article by Paul Connor, Executive Director of the Canadian Open Data Society, ‘open data’ first emerged as an agreed descriptor in 1995. The term is now in frequent use, and its most common manifestation is the open data portal. The first of these appeared in 2009: data.gov, the United States’ vast online repository of open data, and datos.gob.es, the Spanish Government’s open data portal. There are now literally thousands of open data platforms around the world.

But although they may be ubiquitous, have open data portals met the promise that often accompanied their release, and what is their future?

These questions were discussed in a panel at the State of Open Data Con in London in early February, titled ‘Reimagining open data portals: towards a collaborative research agenda.’ The panel featured Elena Simperl, Professor of Computer Science at King’s College London and Director of Research at the Open Data Institute; Dr Jonathan Gray, Director of the Centre for Digital Culture at King’s College London and co-founder of the Public Data Lab; and Link Digital’s recently appointed Chief Technology Officer and Head of Europe, Paul Walsh.

Gray’s contribution expanded on his research published in a March 2023 article in the journal Data & Policy. The article analysed the evolution of open data portals, the veracity of claims made about their contribution to ‘innovation, public empowerment and government transparency,’ and some of the critiques of them that have surfaced. For example, among the criticisms specifically levelled at the United Kingdom’s data.gov.uk are that its ‘datasets were “old,” “irrelevant” and infrequently updated, that important datasets were missing (e.g., election results), and the portal does not make data accessible for broader publics.’

“We are seeing a renewed interest in data portals, or data cataloguing systems more generally,” Walsh said in comments made after the panel. “I think that is tied to a general problem of ‘knowledge management’, rather than in the early stages where open data release was closely tied to transparency initiatives in government. However, governments still need to address fundamental issues around data: how it is shared, internally and externally, catalogued, and how data is made discoverable for a range of human and non-human actors.” 

Gray’s article cites a 2022 literature review on data portals that posits an ‘expectations gap’ between the different aims of portals and their ‘limitations, faults and unfulfilled promise,’ including shortcomings related to transparency and broader accessibility. “I think we can look at this from two perspectives, consumers of data and publishers of data,” Walsh maintained. “In my reading, the sector in general, and research in particular, is focussed on the consumer, the generic constituent or business that can leverage open data that a government publishes. This often leads to an assessment of the ways in which open data portals don’t deliver on their promise, due to lack of data reuse and so on.”

“Going forward, we obviously need to create better user experiences for this audience, but we also focus more on programmatic access to data and metadata, and related functionality like access and sharing mechanisms, and data lineage information. As for publishers, there are huge amounts of friction in organisations like governments to publicly releasing data in the first place. Even just having an open data portal and a policy around it starts to create internal capability and processes to support this data release, and that in general is an outcome that is too often overlooked in analysis of data portals.”

These thoughts were echoed by Simperl in her contribution to the State of Open Data Con panel. She argued that while open data portals remain a key form of open data infrastructure, they need to become more multifaceted and focus on enhanced data transferability and usability features. She mentioned several specific features that portals will need to offer in the future to be considered best practice in open data. These include: accompanying datasets with a comprehensive descriptive record that goes beyond a collection of basic metadata; providing supporting documentation that can be accessed directly from within the dataset and is context-sensitive, so that users can immediately find information about a specific item of concern; allowing an extract of the data to be previewed for sense-making; and ensuring that portals recommend related datasets and allow users to review and rate them.

Simperl’s focus on best practice in discoverability, improved standards to maximise interoperability, and the provision of quality metadata to enhance comprehension and re-use is very much in line with the ‘Third Wave of Open Data’. This proposes that the open data ecosystem has moved beyond a focus on simplifying and disseminating data to how data can be better systematised, and how governments, private corporations and civil society organisations can better know and understand the data they have and share it to provide improved products and services and a more informed citizenry. As the Third Wave of Open Data Tool Kit puts it: ‘While the open data portal format will likely remain a common piece of technical infrastructure, new and sophisticated technological developments could facilitate greater collaboration and responsibility in data re-use.’

Put more simply, there is a need to move beyond the tick-box approach to making data open that characterises some open data portals, and to ensure portals become more interactive, including by enabling users to contribute data, collaborate on open data projects, and engage in real-time discussion about data. In the Australian context, this relates to comments Link Digital made about the Metrics Framework that accompanied the Australian Government’s recently released Data and Digital Government Strategy. One of the metrics was ‘# of data assets discoverable and available for use’. While this was certainly an early metric adopted by the open data movement, it is highly malleable and ignores important issues such as data quality and relevance. The mere existence of an open data portal is not enough. We also need to ask questions such as: is it up to date, is the data high quality, and is the portal embedded in a deliberate strategy to engage, empower, and inform?
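To make the difference between a raw count and a quality-oriented metric concrete, here is a minimal Python sketch. It is an illustration only, not a metric proposed by Link Digital or the Metrics Framework. The sample records, the one-year threshold and the function name `freshness_share` are all assumptions; the `metadata_modified` field name follows the last-modified timestamp that CKAN records for each dataset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample records, invented for illustration. The field name
# `metadata_modified` mirrors CKAN's per-dataset last-modified timestamp.
SAMPLE_DATASETS = [
    {"name": "election-results", "metadata_modified": "2024-01-15T09:30:00"},
    {"name": "road-closures", "metadata_modified": "2019-06-02T14:00:00"},
    {"name": "air-quality", "metadata_modified": "2023-11-20T08:45:00"},
    {"name": "library-locations", "metadata_modified": "2016-03-11T11:10:00"},
]


def freshness_share(datasets, as_of, max_age_days=365):
    """Return the fraction of datasets modified within max_age_days of as_of.

    `as_of` must be a timezone-aware datetime. Timestamps without a
    timezone are assumed (for this sketch) to be in UTC.
    """
    if not datasets:
        return 0.0
    cutoff = as_of - timedelta(days=max_age_days)
    fresh = 0
    for record in datasets:
        modified = datetime.fromisoformat(record["metadata_modified"])
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)
        if modified >= cutoff:
            fresh += 1
    return fresh / len(datasets)


if __name__ == "__main__":
    as_of = datetime(2024, 2, 29, tzinfo=timezone.utc)
    # The count-style metric: how many data assets are listed.
    print("datasets listed:", len(SAMPLE_DATASETS))
    # A complementary metric: how many were touched in the last year.
    print("share updated in the last year:", freshness_share(SAMPLE_DATASETS, as_of))
```

Two portals with the same number of listed datasets can score very differently on a measure like this, which is the point of looking beyond the count. A last-modified timestamp is itself only a rough proxy for quality and relevance.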

The State of Open Data Con panel also discussed future research to inform how open data portals might be designed and developed differently. “The main takeaway was a desire to do further research on how data portals are used and how that has changed over time,” Walsh said. He added that the Comprehensive Knowledge Archive Network (CKAN), a piece of open-source software launched in 2006 that can be configured and set up to function as an open data platform once it is deployed and hosted on a web server, will remain of central relevance. “I strongly believe open source matters for government. CKAN is by far the most widely adopted open source solution for data portals and for data cataloguing generally. CKAN and the community around it are uniquely positioned to be a driving change in better user experience for data consumers and publishers, and in developing novel approaches to data.” 
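For readers curious about what programmatic access to a CKAN portal looks like, the following Python sketch queries the `package_search` endpoint of CKAN's Action API, which returns dataset metadata as JSON. It is a sketch under stated assumptions rather than a recommended client: the helper names are hypothetical, the portal address in the usage note is a placeholder, and a production client would add error handling, paging and, where required, authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url, query="", rows=10):
    """Build a CKAN Action API URL for the package_search endpoint."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarise_search(payload):
    """Extract the total match count and dataset titles from a reply.

    CKAN wraps Action API replies as {"success": ..., "result": ...};
    for package_search the result holds "count" and "results".
    """
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    # Fall back to the dataset's URL name when no title is set.
    titles = [d.get("title") or d.get("name") for d in result["results"]]
    return result["count"], titles


def search_portal(portal_url, query="", rows=10, timeout=30):
    """Fetch and summarise matching datasets (performs a network request)."""
    url = build_search_url(portal_url, query, rows)
    with urlopen(url, timeout=timeout) as response:
        return summarise_search(json.load(response))
```

Calling `search_portal("https://demo.example.org", "air quality", rows=5)` against a real CKAN site (the address here is a placeholder) would return the total number of matching datasets and the titles of the first five. The same metadata that a person browses on the portal is available to scripts, which is one way a portal can serve the ‘non-human actors’ Walsh mentions.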

Data collections such as Link Digital’s Datashades.info and DataPortals.org, run by the Open Knowledge Foundation (OKF), will be of vital importance in this research. The OKF’s DataPortals.org lists 598 open data portals based on CKAN, and research into CKAN carried out by the Pathways to Enable Open-Source Ecosystems project identified nearly 400 regularly updated CKAN data portals in 59 countries, ranging from large to small. “Platforms like CKAN provide a technical foundation for addressing these issues, and this includes, but is well beyond, ‘the open data portal’ use case. There is so much more that can be done in terms of technical platforms like CKAN and capacity building, data literacy initiatives, to streamline data publication.”

Topics like the one in this post are discussed in a series of forums by Link Digital on the last Thursday of every month, Australian EDT. These forums will connect you with like-minded experts who are passionate about the importance of open data and want to stay updated on the latest developments in the field. They are free to attend and open to everyone. Register today.