Six Principles for Sound Data Management
Data makes the modern world go round.
Whatever sector you work in, data should always be driving decision making, growth and innovation. The more complete, reliable, intelligible and available that data is, the greater the growth. However, curating high-quality data is hard, especially where that data has been collected by an error-prone person instead of a machine.
What guiding principles can be identified that will lead us towards sound data management?
Since 2015, the Fjelltopp team have been working with the United Nations and national governments to help harmonise health data amongst low and middle income countries across the globe. We’ve seen and supported many national and international efforts to level up the quality of health data management, because improving the quality of health data will improve the quality of health services and ultimately save lives.
Why do we want sound data management?
Despite an explosion of innovation in our sector over the past decade, we consistently see a lack of support and capacity for ongoing health data management at the lowest levels. The symptoms of this problem include, but are not limited to:
- Implausible data leading to unconvincing insights and results
- Slow and unusable data management systems
- Fragmented data in siloed systems, wasting valuable resources and time
- Data loss due to turnover of staff and equipment
- Poor security practices, including shared passwords, security by obscurity, and unencrypted email attachments
- Data that partners and stakeholders cannot find or reuse
- Slow decision making resulting in a catastrophic loss of efficiency across the entire sector
Whatever your sector, and whatever your context, it is likely that some of the problems above are familiar to you. Improving the quality of your data management will help overcome these problems and increase the potential impact of your data, making it discoverable, accessible, intelligible, and reliable.
This article is the first in a series from Fjelltopp CEO, Jonathan Berry, introducing six principles for sound data management. It should be possible to measure the efficacy of any given data management system or practice against these principles. These principles are written especially for individuals responsible for contracting and managing software and data engineers, or for new engineers joining the field.
This article will be followed by further articles addressing and motivating each principle in detail. We would love to hear your thoughts concerning the completeness of this list, especially in light of your own experiences in data management.
Centralise (where possible) in a feature-rich storage tool, protected with industry-standard security, flexible read/write access management for data and metadata, user-friendly interfaces, and API access.
Define standards for metadata management. Take time to configure your software tools to enforce these standards and auto-populate where possible to minimise overhead.
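As a rough illustration of enforcing a metadata standard in software, the sketch below checks a metadata record against a list of required fields and fills in the creation date automatically. The field names and the `apply_metadata_standard` function are hypothetical, chosen only for this example; a real standard would be agreed with your stakeholders and would usually be configured in your storage tool rather than hand-written.

```python
from datetime import date

# Hypothetical metadata standard: field name -> expected type.
REQUIRED_FIELDS = {"title": str, "owner": str, "licence": str, "created": str}


def apply_metadata_standard(metadata, today=None):
    """Return (completed_metadata, errors) for one dataset's metadata record."""
    completed = dict(metadata)
    # Auto-populate what the system already knows, so users type less.
    completed.setdefault("created", (today or date.today()).isoformat())

    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = completed.get(field)
        if value is None or value == "":
            errors.append(f"missing required field: {field}")
        elif not isinstance(value, expected_type):
            errors.append(f"{field} should be of type {expected_type.__name__}")
    return completed, errors


completed, errors = apply_metadata_standard({"title": "Clinic visits", "owner": "MoH"})
print(errors)  # the licence field was never supplied, so it is reported
```

Rejecting an upload while `errors` is non-empty is one simple way to make the standard binding rather than advisory.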
Automation may require significant initial investment, but data management is repetitive and prone to human error. This means your investments typically pay off quickly. The codebase also acts as documentation of your analysis, making it traceable and repeatable. Ensure there is API access for building automation.
Automate checks for data formatting, content and plausibility. Ensure these checks can be run frequently and as close to the source as possible to avoid cascading effects down the data pipeline.
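The three kinds of check mentioned above (formatting, content and plausibility) can be sketched for a single reported record as follows. The record layout (`week`, `cases`, `deaths`) and the rules themselves are assumptions made for illustration, not a description of any particular surveillance system.

```python
import re


def check_record(record):
    """Return a list of problems found in one reported record."""
    problems = []

    # Formatting: the reporting week should look like "2024-W05".
    week = str(record.get("week", ""))
    if not re.fullmatch(r"\d{4}-W\d{2}", week):
        problems.append("week is not in YYYY-Www format")

    # Content: counts must be whole, non-negative numbers.
    counts_ok = True
    for field in ("cases", "deaths"):
        value = record.get(field)
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"{field} must be a non-negative integer")
            counts_ok = False

    # Plausibility: deaths cannot exceed cases in the same report.
    if counts_ok and record["deaths"] > record["cases"]:
        problems.append("deaths exceed cases")

    return problems
```

Because the function is cheap and has no side effects, it could be run at the point of data entry as well as on every later stage of the pipeline.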
Ensure you can browse, access and restore old versions of both data AND metadata. Track who is making changes and when, so that there is a complete and clear audit trail linking the data to the source.
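To make the idea concrete, here is a minimal in-memory sketch of version history for both data and metadata, with a record of who changed what and when. The `VersionedRecord` class is hypothetical and exists only for this example; in practice you would rely on the versioning features of your storage tool or database.

```python
import copy
from datetime import datetime, timezone


class VersionedRecord:
    """Keeps every version of a dataset's data and metadata, with an audit trail."""

    def __init__(self):
        self._versions = []  # append-only history, oldest first

    def save(self, data, metadata, user):
        """Store a new version and return its version number (starting at 1)."""
        self._versions.append({
            "version": len(self._versions) + 1,
            "data": copy.deepcopy(data),
            "metadata": copy.deepcopy(metadata),
            "changed_by": user,
            "changed_at": datetime.now(timezone.utc).isoformat(),
        })
        return len(self._versions)

    def get(self, version=None):
        """Return a copy of the given version, or of the latest if none is given."""
        if version is None:
            version = len(self._versions)
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"no such version: {version}")
        return copy.deepcopy(self._versions[version - 1])

    def restore(self, version, user):
        """Restore an old version by saving it as a new one; history is never rewritten."""
        old = self.get(version)
        return self.save(old["data"], old["metadata"], user)

    def audit_trail(self):
        """List (version, user, timestamp) for every change, oldest first."""
        return [(v["version"], v["changed_by"], v["changed_at"]) for v in self._versions]
```

Restoring by appending, rather than by deleting later versions, is a deliberate choice here: it keeps the audit trail complete even when a change is undone.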
For traceable, transparent data-based decisions, archive an accompanying immutable copy of the data at the time of use. Archives should be secure and backed up to multiple locations.
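One possible way to archive a copy of a file at the time of use is sketched below, using only the Python standard library. The function names and the naming scheme are assumptions made for this example. The copy is named after its SHA-256 digest and marked read-only; this guards against accidental edits but is not a security control in itself, so the archive location still needs proper access management and backups.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def archive_snapshot(source, archive_dir):
    """Copy `source` into `archive_dir` under a name that includes its SHA-256 digest."""
    source = Path(source)
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{digest[:16]}_{source.name}"
    if not target.exists():
        shutil.copy2(source, target)
        # Read-only permissions discourage accidental modification of the copy.
        target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target, digest


def verify_snapshot(target, digest):
    """Check that an archived copy still matches the digest recorded at archive time."""
    return hashlib.sha256(Path(target).read_bytes()).hexdigest() == digest
```

Recording the returned digest alongside the decision or report lets anyone confirm later that the archived copy is the data that was actually used. Calling `archive_snapshot` once per archive location is a simple way to keep copies in several places.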
Fjelltopp has extensive experience working with the open source data management platform CKAN to deliver each of these principles for health data hubs. These hubs help Ministries of Health and NGOs disseminate their datasets and reports in an efficient and secure manner. We have spoken about our experiences at the CKAN monthly live webinar, which you can view here. We have also applied these principles whilst building a real-time, electronic, case-based, integrated national public health surveillance framework with Python, used by the WHO especially during times of crisis.
We’d love to know what you think.
You can set up a meeting with us here. In the meantime, here are some questions to get you thinking…
- Would you add, or draw out, any additional principles to those presented above?
- How would you prioritise the above principles?
- How would you score your own data systems (say, out of 5) against each principle?
- In your opinion, which areas most urgently need improving?