Data destruction is a routine part of what a business does on a day-to-day basis. Very rarely, however, is real consideration given to what this data destruction actually does to the hardware that hosts the data. Hardware has a limited lifespan, and as such, each interaction with it should be treated as a cost-benefit decision.
Not all data must be destroyed, and destroying it unnecessarily can directly shorten the lifespan of hard disks. Considering the “data overhead” of your processes, and utilizing metrics to make better-informed decisions, is extremely important.
In this piece, we’re going to talk about why hard disk life is limited, what happens when data is deleted, and the very real costs of data destruction. We’ll also compare the costs of different approaches, and their relative compliance with best practices.
Why Hard Disks Have Lifespans
A hard disk is a bit like the front door of a house – when data is created, something walks through that doorway, and when data is deleted, something walks out of the doorway. While a doorway might be stable with four or five people walking through it two or three times a day, imagine millions upon millions of people walking through those doors every single hour, 24 hours a day, and you start to see the lifespan limitations of a hard disk.
A secondary consideration is the rate at which these hard disks need to be replaced. It’s one thing to replace a backup drive every four years, but having to replace a drive every year would be a significant investment. The average operating time between one failure and the next is called “mean time between failures” (MTBF), and it is a vital calculation for project managers, hardware manufacturers, and equipment maintenance.
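As a rough illustration, fleet MTBF is commonly estimated as total operating hours divided by the number of failures observed in that window. Here is a minimal sketch in Python, with purely made-up numbers:

```python
# Illustrative MTBF estimate for a drive fleet.
# All figures are invented for the example, not vendor data.

fleet_size = 100                  # drives in service
hours_per_year = 8760             # 24/7 operation
total_hours = fleet_size * hours_per_year
failures = 4                      # failures observed over that year

mtbf_hours = total_hours / failures
print(f"Estimated MTBF: {mtbf_hours:,.0f} hours "
      f"(~{mtbf_hours / hours_per_year:.1f} drive-years between failures)")
```

Even a crude estimate like this lets you compare the replacement burden of a heavily exercised drive pool against a lightly used one.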
With all of this in mind, how does this apply to data deletion? There are a few considerations here.
Mechanical Failures
Using a drive means spinning up the platters and sweeping the read/write heads across them, and just like our door analogy above, this involves a certain amount of wear and tear. While each operation may demand only a negligible amount of spin-up from the drive motor and movement from the actuator arm, it is still a mechanical effort.
Therefore, reducing the amount of time these components are engaged is beneficial. On a system that is meant to be accessible 24/7, this is a non-issue, as the platters are already spinning, but for long-term, sporadic-access drives, this is a huge consideration.
Data Integrity
Modern hard drives are more dependable than at any time in history, but that’s not to say they’re beyond reproach. When wiping a hard drive, there is a small but non-zero rate of write failure, which is why wipes are typically verified and may require multiple passes, adding to the mechanical stress mentioned above.
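To make this concrete, here is a minimal sketch of an overwrite-verify-retry loop, using an ordinary file to stand in for a raw device; real wiping tools operate at the block-device level, and the three-pass cap here is an illustrative choice, not a standard:

```python
import os

BLOCK = 4096
MAX_PASSES = 3

def wipe_pass(path: str) -> None:
    """Overwrite the target in place with zeros."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for offset in range(0, size, BLOCK):
            f.write(b"\x00" * min(BLOCK, size - offset))
        f.flush()
        os.fsync(f.fileno())  # force the write to physical storage

def verify_pass(path: str) -> bool:
    """Read the target back and confirm every byte is zero."""
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            if any(chunk):
                return False
    return True

def secure_wipe(path: str) -> bool:
    for _ in range(MAX_PASSES):
        wipe_pass(path)
        if verify_pass(path):
            return True  # wipe confirmed on this pass
        # A failed verify forces another full pass over the platters.
    return False         # flag the drive for inspection or destruction
```

Every failed verification costs a full extra pass, which is exactly the mechanical wear we are trying to minimize.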
Availability
Wiping data ties a hard disk up for the duration of the wipe. While this isn’t an issue for a sporadic-access drive, it’s a huge issue for a 24/7-access drive. Taking a little space offline might be fine in small doses, but if you’re liberally deleting content on a regular basis, this quickly adds up, and you can end up with half your drives occupied by read/write processes that aren’t actually contributing to your business efforts.
The Solution
The solution, then, is to consider the metrics behind data deletion. One method is to look at the ratio between data creation, destruction, and usage. Tracking when and under what circumstances data is created is helpful as an auditing methodology, but it becomes far more powerful when combined with other signals, such as how the data was used and when, and why, it was deleted.
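As a sketch of what that tracking might look like, consider a simple lifecycle event log; the event names and fields below are assumptions made for illustration, not a prescribed schema:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DataEvent:
    """One row per lifecycle action on a record."""
    record_id: str
    action: str        # "created", "read", or "deleted"
    timestamp: datetime
    reason: str = ""   # e.g. "order shipped", "retention expired"

def lifecycle_ratios(events: list[DataEvent]) -> dict[str, float]:
    """Ratios of deletion and usage relative to creation."""
    counts = Counter(e.action for e in events)
    created = counts["created"] or 1  # guard against division by zero
    return {
        "deletes_per_create": counts["deleted"] / created,
        "reads_per_create": counts["read"] / created,
    }
```

A reads-per-create ratio near zero for a class of records is a strong hint that the data never needed to be stored, and therefore never needed to be wiped.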
For example, let’s say we’re managing a small online storefront. Every time a customer places a purchase order, two copies of their data are stored: one for direct sales (i.e., email address and mailing address), and another for shipping. To ease the complexities of the system, the two copies are stored in separate databases.
Make the Database More Efficient
The problem here is that our database is in an improper “normal form” – it holds duplicated data. When we go to erase one drive because an item has shipped and is no longer an active order, we’re wiping data that never needed to exist in the first place. This uses up bandwidth, locks the disk into a delete cycle, and wears the disk down.
A better solution would be to look at where the data was collected, and why it was deleted. Did we ever use the state field? Surely we can cross-reference the zip code against a lookup database to get the state when shipping the item. Are we storing phone numbers with dashes, or as plain digits? The dashes don’t seem like much, but a few extra bytes over hundreds of orders, replicated four or five times per order? That adds up fast.
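Here is a minimal sketch of that clean-up, with hypothetical field names and a toy zip-to-state table standing in for a real reference database:

```python
import re

# Toy lookup table; in practice this would be a full
# zip-to-state reference database.
ZIP_TO_STATE = {"90210": "CA", "10001": "NY"}

def normalize_order(raw: dict) -> dict:
    """Store one lean copy of the order instead of two padded ones."""
    return {
        "email": raw["email"].strip().lower(),
        "street": raw["street"].strip(),
        "zip": raw["zip"],
        # Digits only: "555-867-5309" -> "5558675309"
        "phone": re.sub(r"\D", "", raw["phone"]),
        # No state field at all: it is derived at shipping time
        # via ZIP_TO_STATE.get(raw["zip"]).
    }
```

Each field we decline to store is a field we never have to wipe.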
Schedule Deletion
When we get to the actual deletion, we have to ask why it’s done “on demand”. Why are we storing two different forms of the same content, and then deleting one? If we make our database more efficient and delete data in scheduled batches, rather than “when it’s no longer needed”, we can significantly increase the efficiency of our system, but we’ve still got some overhead issues.
A better approach is to assign each piece of data an expiration date when it is created, and to write it sequentially to the disk holding the batch that expires at the same time. By doing so, you create a funnel in which aging data moves steadily toward deletion, and as old data is wiped in large batches, the freshly cleaned disks are returned to service for new data.
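Here is a minimal sketch of that funnel, assuming a 90-day retention period and one disk per expiration month; both choices are illustrative, not a prescription:

```python
from datetime import date, timedelta

RETENTION = timedelta(days=90)      # assumed retention period

# Map each "disk" to the expiration bucket (month) it holds.
disks: dict[str, list[dict]] = {}

def store(record: dict, created: date) -> None:
    """Write the record to the disk for its expiration month."""
    expires = created + RETENTION
    bucket = expires.strftime("%Y-%m")   # e.g. "2025-07"
    disks.setdefault(bucket, []).append({**record, "expires": expires})

def wipe_expired(today: date) -> list[str]:
    """Wipe whole disks whose bucket has passed: one pass, one batch."""
    expired = [b for b in disks if b < today.strftime("%Y-%m")]
    for bucket in expired:
        disks.pop(bucket)  # stands in for a single full-disk wipe
    return expired         # freshly cleaned disks, ready for new data
```

Because every record on a disk expires together, one full-disk wipe replaces thousands of piecemeal deletions.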
Ensuring Compliance
This entire process also has some serious implications for compliance. While data deletion is quite straightforward, complying with the various regulations governing the practice is not. Any additional complexity in the process is only going to make compliance that much more difficult.
By simplifying our data collection methodologies, and utilizing metrics to see why this data is being stored and deleted (and specifically how), we’re essentially conducting an audit, something we’ve advocated previously outside of this context.
Using The Proper Tool
Something to keep in mind here is that not every tool is the same. Getting an efficient tool is only half the battle – ensuring your tool is actually doing what it should do, and doing so in a legally compliant way, is just as important.
ClaraWipe is a great solution for this, as it not only gets rid of your data efficiently, it does so in a legally compliant way. ClaraWipe adheres to or exceeds the following major regulatory and technical standards:
• Sarbanes-Oxley Act (SOX)
• HIPAA & HITECH
• The Fair and Accurate Credit Transactions Act of 2003 (FACTA)
• US Department of Defense 5220.22-M
• CSEC ITSG-06
• Payment Card Industry Data Security Standard (PCI DSS)
• Personal Information Protection and Electronic Documents Act (PIPEDA)
• EU Data Protection Directive of 1995 (95/46/EC)
• Gramm-Leach-Bliley Act (GLBA)
• California Senate Bill 1386
By using a proper data deletion system and adhering to some basic metric-driven principles of efficiency and intelligent deletion, data overhead can be drastically reduced, if not eliminated almost entirely.