Data management is fundamentally a fight between business needs and security. In a perfect world with no need for data security, data would be duplicated across multiple machines, accessible to anyone who could possibly need it.

Unfortunately, that’s not the world we live in. Duplicated content might be useful from a business standpoint, allowing departments to access their own custom copies of data sets and forms, but from a security perspective, it’s an incredibly dangerous proposition.

With this in mind, and with the understanding that some data duplication must occur in any business environment, here are some basic concepts to keep in mind when securing remotely duplicated content.

Data Duplication Policies

At the business level, remotely duplicated data can be secured by instituting policy. When crafting a data duplication policy, management needs to keep in mind that what seems good for business in the short term can weaken security and, in the long term, actually damage the business.

With this in mind, duplication policies need to consider whether the duplication is actually warranted. Is data being duplicated so that somebody can access it at home without the trouble of connecting to a VPN? This is not a good use case: laziness is not an excuse for poor security.

Is the data being duplicated for backup purposes? This is a perfectly legitimate reason to duplicate content, with the stipulation that long-term backups should be encrypted, and old backups should be securely deleted when no longer needed.
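To make the retention half of that stipulation concrete, the short Python sketch below lists backup files that have outlived a retention window. It is a minimal illustration under assumed conditions: backups are stored as ordinary files under one directory, a file's last-modified time stands in for its age, and `RETENTION_DAYS` and `expired_backups` are hypothetical names, not part of any product. It only identifies candidates. Removing them with an ordinary file deletion is not secure deletion, so the actual erasure should be handed to a dedicated wiping tool.

```python
import time
from pathlib import Path

# Hypothetical retention window; the real value should come from your policy.
RETENTION_DAYS = 90
SECONDS_PER_DAY = 86_400


def expired_backups(backup_dir, retention_days=RETENTION_DAYS, now=None):
    """Return backup files whose last-modified time is older than the window.

    This only identifies candidates for secure deletion; it deletes nothing.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(backup_dir).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Passing `now` explicitly keeps the check reproducible, which helps when the same policy is audited later.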

These two considerations are a good starting point for such a policy: they outline two specific use cases and how to handle each, along with the caveats that must be followed in each situation (such as requiring encryption for backups and a schedule for deleting old duplicate content).

Track Using a Change Management Solution

Remotely duplicated content will almost always leave some sort of calling card. When a workstation creates a file, the file’s metadata records its owner, and this record can be extended with any number of third-party solutions.

By tracking file creation and modification, you can gather a lot of information: whether the duplication was warranted, who needs to access the data, and, if the duplication violated company policy, who was responsible.
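As an illustration of this kind of tracking, here is a minimal Python sketch that snapshots file metadata under a directory and compares two snapshots to find created, modified, and deleted files. The names `FileRecord`, `snapshot`, and `diff` are hypothetical, and the approach is an assumption made for illustration: it records the numeric owner ID that POSIX file systems expose (on Windows this field does not identify a real owner, and commercial change management tools typically rely on audit logs or ACLs instead), and it treats any change in size or modification time as a modification.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    path: str
    owner_uid: int  # numeric owner ID from the file system (POSIX)
    size: int
    mtime: float


def snapshot(root):
    """Record basic metadata for every file under root, keyed by relative path."""
    records = {}
    for p in Path(root).rglob("*"):
        if p.is_file():
            st = p.stat()
            rel = str(p.relative_to(root))
            records[rel] = FileRecord(rel, st.st_uid, st.st_size, st.st_mtime)
    return records


def diff(old, new):
    """Classify paths as created, modified, or deleted between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(
        k for k in new.keys() & old.keys()
        if (new[k].size, new[k].mtime) != (old[k].size, old[k].mtime)
    )
    return created, modified, deleted
```

Running `snapshot` on a schedule and keeping each result gives a simple history of who created what and when, which is the raw material for the policy questions above.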

Change management tracking isn’t just about responsibility, though. By tracking data this way, you can easily identify out-of-date, obsolete, unneeded, and unnecessarily duplicated data and mark it for deletion. The result is more free space on corporate and personal servers, less data to track, and a clear, established chain of custody.

As a related concept, consider hashing. A hash function computes a short, fixed-length value (a digest) from the contents of a file. Identical contents always produce the same hash, while even a small change to the contents produces a different one. Comparing hashes can therefore automate the detection of duplicated content across a network of resources, even when copies have been renamed or moved.
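The sketch below shows hash-based duplicate detection in Python. It assumes the files are reachable under a single directory tree, and `sha256_of` and `find_duplicates` are hypothetical helper names. SHA-256 is used here; MD5 is also common in e-discovery tooling, but it is no longer collision-resistant, so SHA-256 is the safer default for new code. Because the hash depends only on file contents, renamed or moved copies still land in the same group.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash; keep only groups of 2+ files."""
    by_hash = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_hash[sha256_of(p)].append(p)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Each value in the returned dictionary is a list of paths with identical contents; one copy can be kept as the authoritative version and the rest reviewed for deletion.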

Create a Resource Map

One of the best things a business can do when dealing with duplicate content is to create a resource map. It can be as complicated or as simple as you want, but fundamentally, a resource map simply states “these are the files and resources we have, who is responsible for them, when they were created, and what they are used for”.

By putting in writing who owns each resource and what its properties are, you not only establish an overview of all the data your business uses, you also create a structure that discourages unnecessary duplication.

A good deal of data duplication comes not out of need, but out of a desire for simplicity. A user who routinely needs certain data might simply copy the folder containing it rather than find out whom to ask for the relevant information. Likewise, a contact might find it easier to copy an entire file than to prepare a new file with only the information requested.

A resource map helps fix these issues because it establishes a clear line of responsibility to each content owner. When a user wonders “who is taking care of this customer?”, they can look at the resource map, see that the data concerning “customer allocation” is managed by a certain employee, and request that data from the employee directly. If they need access to it routinely, they can ask that employee to share it with them through a networked drive or VPN access to the resource, rather than having a copy made and emailed to them.
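A resource map can be as simple as a table in a shared document. As a sketch of the same idea in code, the Python below models each entry with the fields a resource map records (the resource, who is responsible, when it was created, what it is for) plus a location to request it from, and supports looking up a resource’s owner by name. `Resource`, `ResourceMap`, and the `location` field are hypothetical, chosen for illustration rather than as a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Resource:
    name: str       # e.g. a data set, share, or form
    owner: str      # person or team responsible for it
    created: date
    purpose: str
    location: str   # hypothetical: canonical place to request or access it


class ResourceMap:
    def __init__(self):
        self._by_name = {}

    def add(self, resource):
        """Register a resource; refuse a second entry under the same name."""
        if resource.name in self._by_name:
            raise ValueError(f"{resource.name!r} is already mapped")
        self._by_name[resource.name] = resource

    def owner_of(self, name):
        """Return the responsible owner, or None if the resource is unmapped."""
        r = self._by_name.get(name)
        return r.owner if r is not None else None
```

After adding an entry for “customer allocation”, `owner_of("customer allocation")` returns its owner, and adding a second entry with the same name raises an error instead of silently creating a parallel record, in keeping with the map’s purpose of one authoritative entry per resource.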

Consider Legal Implications

In the IT space, workers often forget that litigation is, unfortunately, a huge part of business life. Given the rise of litigation in the current corporate climate, you need to be aware of the potentially astronomical cost of implementing data retention and duplication policies improperly.

Discovery is the process by which litigants collect and collate data for use in a trial or hearing. Discovery and assessment are a major cost of litigation, with estimates of the yearly cost ranging from $1 billion to $3 billion USD. That cost is already huge, and poor content duplication and retention policies can drastically increase it.

When data must be rebuilt or collated from multiple sources, you amplify the economic and human cost of the discovery process as a whole. By properly managing remotely duplicated content, you can ensure that local copies are complete, up to date, and controlled. In the poorly managed case, you not only lose effort, time, and money making sure your content is correct, you also risk faulty information being used against you.

Consider the Nature of Data

What this all boils down to is this: remotely duplicated content is dangerous. Imagine you were told to watch a very dangerous animal that, although extremely useful, could do significant damage to your office. Having one of those animals would be stressful enough; now imagine hundreds of them, all appearing out of nowhere and without any control.

This is what happens when you don’t manage remote content effectively. By allowing duplication of content without responsibility, oversight, or tracking, you’re essentially saying “yes, this data is important, and yes, it could be used to hurt us, but make as many copies as you want”. While that’s an oversimplification (there are plenty of legitimate reasons to duplicate content), the sentiment remains valid.


Using a solution like ClaraWipe can make this process much less painful than it would otherwise be. As part of a proper methodology for managing data (and securely wiping it when it is no longer used), ClaraWipe helps ensure remote content is dealt with effectively and efficiently.

Most importantly, ClaraWipe adheres to several legal requirements, including HIPAA, the Sarbanes-Oxley Act, the Fair and Accurate Credit Transactions Act (FACTA), and others. As discussed above, adhering to these legal requirements is of the utmost importance and should be a prime consideration when deciding which service or application best suits your needs.
