Note: This is a repost from discussion I started on the Open Knowledge forum.

I’ve been doing work with open government data for the last few years, mostly around the platform capabilities of CKAN. As I’ve gone deeper into the various areas concerning data, I’ve inevitably bumped up against project requirements for hosting a platform for research data. While CKAN needs some maturing in this space, it has immediate application in the area of open access catalogues, making available both research papers and research datasets. It is less mature in the area of research management systems or managing ‘working data’, but I don’t think it is too much of a stretch to integrate or develop with these extended requirements in mind.

However, when looking at open access I’ve been considering what frictionless science might look like. I’ve been thinking about how to publish the full set of research artifacts needed to replicate and review work undertaken by labs, or to swap out data and reconstitute the research in a new context. That thinking, done with only limited access to end users, has produced the following short list of what might be published as a ‘dataset’ listing of ‘resources’.

  1. Paper – the summary narrative which explains all context for the work
  2. Data – any raw data used to test a hypothesis
  3. Code – any algorithms or open source codebases and configuration details used to work with the raw data to produce insights or secondary analysis inputs
  4. Environment – any infrastructure orchestration scripts for automating the replication of data analysis in publicly available cloud environments
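To make the idea concrete, the four artifact types above could be published as a single CKAN dataset holding four resources, one per artifact. The sketch below assembles the kind of metadata payload CKAN's `package_create` action API accepts; the dataset name, file names, and hosting URL are hypothetical placeholders, not part of any real catalogue.

```python
import json

# One entry per artifact type from the list above: label, hypothetical
# file name, format, and a short description.
ARTIFACTS = [
    ("Paper", "paper.pdf", "PDF",
     "Summary narrative explaining the context for the work"),
    ("Data", "raw-data.csv", "CSV",
     "Raw data used to test the hypothesis"),
    ("Code", "analysis-code.tar.gz", "TGZ",
     "Algorithms and configuration used to produce the analysis"),
    ("Environment", "orchestration.yml", "YAML",
     "Orchestration scripts to replicate the analysis in the cloud"),
]

def build_dataset(name, title, base_url):
    """Assemble a payload for CKAN's package_create action API."""
    return {
        "name": name,    # URL slug; must be unique within the CKAN instance
        "title": title,
        "resources": [
            {
                "name": label,
                "url": f"{base_url}/{filename}",
                "format": fmt,
                "description": desc,
            }
            for label, filename, fmt, desc in ARTIFACTS
        ],
    }

payload = build_dataset(
    "frictionless-science-example",           # hypothetical dataset slug
    "Frictionless Science Example Study",
    "https://example.org/artifacts",          # hypothetical hosting location
)

# In practice this would be POSTed to /api/3/action/package_create with
# an API key; here we only show the shape of the metadata.
print(json.dumps(payload, indent=2))
```

The point of the sketch is that nothing new is needed from CKAN to represent the full replication bundle: a dataset is already just a container of typed, described resources.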

With all that said, I thought I’d point those interested in this idea to the following recent work:

Pyramids, pipelines and a can-of-sweave – work done by Florian Mayer for the WA Department of Parks and Wildlife. I think this is a great example of how to cover the first three points above.

How to build a supercomputer on AWS with spot instances – work done by Link Digital (disclosure – this is my company) as a funded proof of concept thanks to Intel, AWS and the NCI (24th largest supercomputer facility in the world as of this afternoon).

You can review the work via the video demonstrations below.

Steven De Costa
Steven has worked in the internet and multimedia industry since 1997, founding Link Digital in 2001. As the executive director at Link Digital, his focus is on the strategy and execution of complex digital projects over the long term, including the development of community and market opportunities surrounding the public cloud and open data. He is a co-steward of the CKAN open source project, a former national organiser of GovHack.org, and in 2018 stepped down from his position as Secretary and Treasurer for the Board of Open Knowledge Australia. A strong communicator, Steven holds a Bachelor of Economics from the Australian National University and is a long-standing contributor within open data and open knowledge initiatives, regularly co-organising and presenting at related events.