This piece unpacks why we hoard, when “data is the new gold” stops being true, and how to confidently delete what no longer creates value.
We instinctively understand the dangers of hoarding in the physical world. We’ve all seen stories of hoarders’ homes stacked with everything they’ve ever owned. Organisations are practising the same behaviour when they hold onto data long after it is useful, safe or relevant.
In the past, physical storage limits forced decisions. As network drives filled up, teams had to delete low-value material. Cloud computing has removed that pressure. Data is easily copied into reports, spreadsheets and dashboards, then forgotten on shared drives. Each copy may be justified at the time, but kept beyond its use-by date, it becomes expensive clutter - and a disaster waiting to happen.
Data hoarding multiplies breach risks
Every unnecessary copy of data kept beyond its useful life becomes another point of unwanted exposure. Data breach risks do not just arise from hackers getting into a company’s network. They often stem from data hoarding.
In recent years, multiple high-profile data breaches have been traced back to forgotten files on company devices. In one case, stolen laptops exposed decades of a university’s student records, including tax numbers and identity details. In another case, tens of thousands of patient records were lost when a laptop was taken from a car. At the extreme end, Morgan Stanley was fined US$35 million after client data was discovered on hard drives it had not properly decommissioned.
In each of these cases, the data was originally copied for legitimate operational reasons. Yet no one remained accountable for managing it through the rest of its lifecycle, including its secure disposal.
Risk of uncontrolled cost of storage
Storing data has always come at a cost. There is the storage itself, but also the infrastructure around it, including systems, networking, energy, physical space and the people required to manage it.
Cloud computing has removed that friction. What was once a large, carefully deliberated expense is now absorbed into a monthly operating cost that grows incrementally and often unnoticed. A small increase here, another there, then a supplier price rise, and storage costs can balloon.
Because the impact of more storage is delayed until the next budget review, people hold onto data longer than necessary, and far longer than the data is useful or valuable.
Even social media companies, some of the worst data hoarders and the best at turning user data into advertising gold, are feeling the pinch of giving infinite storage to their users, with companies like Snapchat introducing fees for storing old photos.
Risk of conflicting data or errors in data
Much of what constitutes data hoarding, namely the duplication of records and data, not only increases storage costs but also creates compliance risks. Hoarding can lead organisations to rely on inaccurate or incomplete information when making decisions or meeting legal and policy requirements – as several government agencies have found out the hard way.
Of course, it’s normal for teams to aggregate, cleanse, or remove records to suit a specific business question. However, when different parts of the organisation apply different methods to the same underlying data, they also run the risk of arriving at different numbers.
This problem may be manageable when the analysis remains within a team and is used for short-term purposes. When conflicting outputs are shared widely, creating multiple versions of the truth, the risks for organisations increase significantly. Uncertainty can start to creep in about which data decision-makers should trust.
Why does data hoarding occur?
To mitigate the risks of data hoarding, it helps to understand the underlying causes. Hoarding data stems, in part, from the business mantra, “data is the new gold”. Leaders have been led to believe that data is a corporate asset and is extremely valuable in all situations.
In my experience, data hoarding in organisations can be traced to two major cultural reasons:
- The value of an organisation’s data isn’t known or is exaggerated, so teams will tend to hoard data to ensure they aren’t responsible for value being lost.
- Staff aren’t given clear indications on when data can or should be deleted. Employees will tend to hoard data to make sure they comply with data retention rules.
If employees aren’t sure of what to do about data, they will choose the safest option that they think will prevent them from getting fired, which is keeping the data.
What can be done to combat data hoarding?
Data hoarding can be addressed through organisation-wide behavioural change. Giving teams a rubric for cleaning up data can systematise data best practices.
Staff should understand how to assess the value of data, how to dispose of different types of data compliantly, and what needs to be documented afterward.
Here are examples of data that are often hoarded and can be regularly cleaned up with clear policies in place:
- Any ad-hoc reports in employees’ download folders that have been generated from live systems
- Data that was used for projects that are now completed and that has no reason for retention (this is especially true for migration activities)
- Working, staging or temporary files that were used while developing a dashboard that has since been completed
- System backups that are no longer needed for service level agreements, for example, backups older than a few years
- Old transactional or event data that is no longer relevant for business purposes
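As a starting point for the first item in this list, a clean-up can begin with a simple inventory of files that have not been touched for a while. The following is a minimal sketch in Python, not a recommended tool: the function name find_stale_files, the 90-day threshold and the idea of using "last modified" as a proxy for "no longer needed" are all assumptions made for illustration. It only lists candidates; deciding what to delete remains a policy question.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def find_stale_files(folder: Path, max_age_days: int,
                     now: Optional[float] = None) -> List[Path]:
    """List files under `folder` last modified more than `max_age_days` ago.

    Hypothetical helper: modification time is only a rough proxy for
    whether a file is still in use, so treat the result as a review list.
    """
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in folder.rglob("*")  # walk sub-folders as well
        if path.is_file() and path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Example: review anything in the downloads folder older than 90 days.
    for stale in find_stale_files(Path.home() / "Downloads", max_age_days=90):
        print(stale)
```

The script deliberately deletes nothing. Its output is meant to feed the periodic review, where the retention rules decide what happens next.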
Marking staff calendars for periodic data clean-ups can help reinforce this behaviour. Dedicated initiatives like
Anticipate objections
There will always be people in an organisation who insist everything needs to be kept forever. Their thinking is, “if we can keep all the client data and analyse it, we can get new insights on how to serve them better”. Consider marketing, for example. Customer data can be valuable for understanding marketing trends when it’s current. But when consumer behaviour changes every five years, older data is less likely to offer accurate insight.
A useful test is simple: does this data still create value? If it no longer informs decisions, reflects current market or stakeholder reality, or meets a legal retention requirement, keeping it only adds cost and risk.
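For teams that want to apply this test consistently, the three questions can be written down as an explicit check. The sketch below is one possible encoding in Python; the class DatasetAssessment, its field names and the function still_creates_value are hypothetical names chosen for illustration, and the yes/no answers would still have to come from the people who know the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetAssessment:
    """Answers to the three questions for one dataset (illustrative fields)."""
    name: str
    informs_decisions: bool
    reflects_current_reality: bool
    under_legal_retention: bool


def still_creates_value(assessment: DatasetAssessment) -> bool:
    """Return True if at least one reason to keep the data still applies."""
    return (
        assessment.informs_decisions
        or assessment.reflects_current_reality
        or assessment.under_legal_retention
    )


if __name__ == "__main__":
    old_campaign = DatasetAssessment(
        name="campaign results from a retired product line",
        informs_decisions=False,
        reflects_current_reality=False,
        under_legal_retention=False,
    )
    if not still_creates_value(old_campaign):
        print(f"Disposal candidate: {old_campaign.name}")
```

A dataset that fails all three questions is only a candidate for disposal. The check makes the reasoning visible without replacing the organisation's retention policy.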
Metadata is insurance for data disposal
Good metadata is essential for data security. As data about data, metadata is the evidence organisations need for maintaining audit trails. Staff should be trained to document what data is deleted, creating a clear record of what was removed, by whom, when, how and why.
Metadata is even more critical when data cannot be deleted. Recording why data has ongoing value makes it easier to find, manage and protect, while supporting future audits and risk reviews. Just as importantly, documenting high-value data reduces the risk that it is accidentally deleted. That kind of assurance is key for engendering trust in data management decisions.
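The disposal record described above (what was removed, by whom, when, how and why) can be captured in a very small structure. The following Python sketch assumes a JSON-lines style audit log. The names DisposalRecord, record_disposal and to_log_line are hypothetical, and a real organisation would more likely record this in its existing records-management or ticketing system.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DisposalRecord:
    """One audit-trail entry for a deletion (illustrative field names)."""
    what: str        # description or identifier of the data removed
    removed_by: str  # person or team accountable for the deletion
    removed_at: str  # ISO 8601 timestamp
    method: str      # how the data was disposed of
    reason: str      # why it no longer needed to be kept


def record_disposal(what: str, removed_by: str, method: str, reason: str,
                    when: Optional[datetime] = None) -> DisposalRecord:
    """Build a record, refusing entries with any of the five answers missing."""
    answers = {"what": what, "removed_by": removed_by,
               "method": method, "reason": reason}
    for field_name, value in answers.items():
        if not value.strip():
            raise ValueError(f"'{field_name}' must not be empty")
    timestamp = (when or datetime.now(timezone.utc)).isoformat()
    return DisposalRecord(what, removed_by, timestamp, method, reason)


def to_log_line(record: DisposalRecord) -> str:
    """Serialise a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    entry = record_disposal(
        what="staging extracts for a completed dashboard project",
        removed_by="analytics team",
        method="deleted from shared drive",
        reason="project complete; no retention requirement",
    )
    print(to_log_line(entry))
```

Rejecting incomplete entries is a design choice in this sketch: an audit trail with a missing "why" is hard to defend later, so the gap is better caught at the moment of deletion.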
Samuel Spencer is an Adjunct Professor at the University of Canberra, where he acts as an industry advisor, research partner and speaker on data governance and strategy. Sam is currently writing his new book, Mostly Quadrants, exploring the elements of effective decision-making for organisations. For more on Samuel’s upcoming book, please visit his Substack: https://mostlyquadrants.substack.com/