About this guide
To effectively structure and migrate the world’s expansive sources of knowledge into machine-readable open data platforms of real value, we need both technical and non-technical data boffins to work together. Vast linked data platforms which span aspects of every human endeavour are on our collective horizon. The Internet of Things, of People, and of Humanity.
Let’s put data in its rightful place – First and Last
Twenty years ago we’d dismiss indecipherable data with throwaway lines like “You only get out what you put in”. We didn’t use thoughtful terms like ‘stewardship’ when we talked about data. We labelled it ‘maintenance’, which meant hiring a temp with no prior knowledge of your business, products, services or goals, to half-heartedly scan a bunch of almost integrated spreadsheets that reportedly housed the same data. They’d be given a five-minute brief, mostly explaining where the kitchen was, and instructed to delete double-ups in the spreadsheets and get rid of anything that looked, well, wrong. Data was little more than a largely annoying and convoluted reporting tool, tacked onto the end of Board Meetings when everyone was just waiting for their cab to the airport. The importance of data as an asset was not well understood.
The repercussions of this attitude to data resonate today. Because we didn’t think strategically about the collection, maintenance and integration of our data sets, we ended up leaving this extraordinary asset not only in the hands of disengaged temps, but spread across fragmented and loosely coupled IT systems and a grubby patchwork of legacy software. Shame on us.
And as if that’s not enough guilt to fuel the downward spiral of data neglect, now there’s the whole open data movement creeping into mandated ICT strategies that not only expect everyone to keep useful, quality data, but to share it, with the public, without restriction!
The love of data
Thankfully, the value and potential of data is achieving recognition as a strategic consideration. World-class organizations and governments alike are mining vast troves of data to obtain actionable intelligence, leading to deep customer/population insights, improved business processes, higher profits and/or savings where applicable, political gain and even transparency. More and more we are considering the governance, quality and full life cycle of our data: including how it will retain value for future use. In a growing number of scenarios the value of opening up and sharing data more broadly is also being recognised: not least of all, as a means of getting the most mileage, which of course ties in nicely to trends in developing new sources of data and new storage and retrieval platforms.
Start with a tidy up
Think: consolidation, metadata and stewardship
Once you understand the key strategic and operational questions you need your data to respond to, you can start the data cleanse, improvement and consolidation process. It’s easy enough to say that in one sentence but, yes, it does involve a lot of work at the get-go (and a future blog piece). I’m not referring to the old process of hiring a temp. Your data is simply not worth holding onto if you don’t invest some resources up front in ensuring it is harmonised, integrated and defined by sound policy and architecture. Whether you bring in a consultant, seek out an all-new consolidation platform or choose to go it alone, remember this is also the critical time to fix your metadata collection. Without the right metadata there’s simply no “Who, What, Where, When, and How” to describe your data. Such a lack of structure and meaning means no one can responsibly re-use or interpret the data and the whole exercise becomes invalid.
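For the technically minded, here is one way the “Who, What, Where, When, and How” could be pinned down as a record that travels with each data set. It is a minimal sketch only: the `MetadataRecord` class, its field names and the `missing_fields` helper are all made up for illustration, and a real collection would more likely follow an established metadata standard.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical minimal descriptor: Who, What, Where, When and How."""
    who: str    # person or organisation responsible for the data
    what: str   # short description of the data set
    where: str  # place the data describes or was collected
    when: date  # date of collection
    how: str    # collection method


def missing_fields(record: MetadataRecord) -> list[str]:
    """Return the names of fields left empty, so gaps are caught at upload time."""
    return [
        name
        for name, value in asdict(record).items()
        if value is None or (isinstance(value, str) and not value.strip())
    ]


# Invented example values: the "where" has been left blank on purpose.
record = MetadataRecord(
    who="Parks team",
    what="Street tree survey",
    where="",
    when=date(2015, 3, 1),
    how="Field inspection",
)
print(missing_fields(record))
```

A check like this at the point of upload is one way to make some metadata mandatory rather than hoping a future temp fills it in.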
Effective data stewardship is more than verifying quality, security and protection of your data; it’s also about letting your data thrive. Even if you don’t wish to open things up to all and sundry, you may still seek to allow a degree of broader access privileges, at least in regard to non-proprietary data.
The pursuit of data
Now that you know data has such far-reaching value and potential, it may be time to extend and enhance the types of data sets you hold and the means by which you capture them. Sourcing universally, or at least via a network of like-minded collaborators (and depending possibly too on where your data sits within the whole Open Data movement), is an investment in the ongoing relevance of data. It’s an easy equation: people filling a database with what matters to them = a database full of what matters to the people. Additionally, allowing for a mix of media – photos, video, audio and other formats – can augment your collection and influence longevity.
Next, what should the solution look like?
In summary, a nice platform for the storage, access and retrieval of all your consolidated and/or new data needs to enable the following:
- Private and public user interface components as required
- A web-based data upload facility – could be universal or access controlled
- A variety of media formats
- Compelling, or at the very least interpretable, data visualisation
- Intuitive and enduring search and browse functionality, faceted for narrowing a search
- Engaging design
- Agility, conducive to insights-driven service delivery or information sharing
- Sound and relevant collection of mandatory and automatic metadata
- Readiness for integration with other applications
- Multi-tenant users, potentially unlimited
- Low-cost media storage, with a storage infrastructure component for transcoding large files and managing the archiving of lesser-used resources
Effective data stewardship with a view to the full data life cycle suggests that, while you may not wish to have all the above functionality switched on at the start, it should be possible for the future.
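To make one item on that list a little more concrete, the sketch below shows the basic idea behind faceted search: narrow a collection by one facet, then show how many results sit under each value of another. Everything here is an assumption for illustration, including the tiny made-up catalogue, the facet names and the `narrow` and `facet_counts` helpers. A real platform would normally hand this job to a search engine rather than plain Python lists.

```python
from collections import Counter

# Made-up catalogue entries, purely for illustration.
RECORDS = [
    {"title": "Harbour at dawn", "media": "photo", "year": 2012, "access": "public"},
    {"title": "Council minutes", "media": "document", "year": 2012, "access": "public"},
    {"title": "Oral history interview", "media": "audio", "year": 2013, "access": "private"},
    {"title": "Street parade", "media": "photo", "year": 2013, "access": "public"},
]


def narrow(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)


# Narrow to photos, then show what a "year" facet would offer next.
photos = narrow(RECORDS, media="photo")
print([r["title"] for r in photos])
print(facet_counts(photos, "year"))
```

The counts are what let a browse screen show the user how many items each further click would leave them with, before they click it.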
So if that’s all about putting data first, what do I mean by also putting it last?
Forget “first things first”: the best data platforms start by considering the end – or at least the end user, and the worth of your data to the end user down the track. As futurist Ray Kurzweil once said, “an invention has to make sense in the world in which it is finished not the world in which it is started”. So too, the data we collect and store today has to be useful for retrieval tomorrow, although it also has to make sense today so, um, let’s end things here and not dissect the Kurzweil reference too much.
Except perhaps to just ask, is the end user a human or a machine?