What if I told you that you could reduce risk and costs at the same time? Skeptical? I would be. It sounds like some cheesy marketing ploy chuck full of hidden costs or high upfront costs with low ROI. No, I am not pitching a product or trying to sell you a solution. I am however trying to get your attention. I am talking about data minimization.
Companies collect millions of gigabytes of information, all of which has to be stored, maintained, and secured. There is a general fear of removing data lest it be needed some day but this practice is quickly becoming a problem that creates privacy and compliance risk. Some call it “data hoarding” and I am here to help you clean your closet of unnecessary bits and bytes.
Risk and Costs
The news is full of examples of companies losing data. These companies incur significant cost to shore up their information security and their reputations. In a study by the Ponemon Institute, the estimated cost per record for a data breach in 2009 was $204. Based on this, losing 100,000 records would cost a company over twenty million dollars. It is no wonder that companies are concerned. Those that are not in the news are spending a great deal of money to protect the information they collect.
So why are I collecting this information in the first place? Like abstinence campaigns, the best way to avoid a data breach is to not store the data in the first place. This is where data minimization steps in to reduce such risk. As part of the data minimization effort, organizations need to ask themselves three questions:
- Do I really need to keep this data?
- Would a part of the data be as useful as the whole for my purposes?
- Could less sensitive data be used in place of this data?
Do I really need to keep this data?
The first┬ádata minimization┬áquestion to ask is: do I really need to keep this data? Some data is transitive in nature. It is needed in the moment but it is not needed in the long-term. Transitive data should not be stored or archived. It can simply be removed as soon as the transaction is complete. Optimally, this data should not be stored on the hard disk, but rather be kept in memory while processing the transaction and then flushed to avoid risk of storing this data where it could be later obtained by an unauthorized entity.
Other information such as buying preferences or survey data is collected to be used in aggregation and reporting. The individual responses may not be needed once the data has been aggregated so it should be purged. When analyzing business workflows, it is worth considering implementing a purge process following the aggregation and reporting process.
Effort should be made to periodically remove any records that are no longer relevant. After all, information has a shelf life, an expiration date if you will. The plain fact is that information that is no longer useful to the organization should be removed. This removes the privacy, compliance, eDiscovery or other risk associated with the data and allows organizational resources to be spent elsewhere.
Another instance where you should ask if you really need to keep data is when you have a copy of the data elsewhere. In this case, you do not need to keep the data because it is a duplicate. I understand the need for redundancy but build that into a centralized database system. In this way you can protect a single area but still provide high availability. If you absolutely need distributed systems, consider segmenting the database so that distributed systems only contain the portion of the data you need.
Would a part of the data be as useful as the whole for my purposes?
The second┬ádata minimization┬áquestion to ask is: would a part of the data be as useful as the whole for my purposes? Sometimes a part of the data can be as useful as the whole. Take a Social Security Number (SSN) for example. Storing the last fmy digits of the social may be as useful as storing the entire number and the damage associated with the disclosure of just those digits is minimal compared to the entire SSN. Similarly, a company could store just the last few digits of a credit card number rather than the entire thing.
This area of data minimization is extremely important when working with credit cards and PCI compliance as places where numbers are stored need to be in full compliance with the regulation. This is a risk that compliance officers are eager to mitigate.
Could less sensitive data be used in place of this data?
The third┬ádata minimization┬áquestion you should ask is: could less sensitive data be used in place of this data? Instead of storing a value that is global in nature, like a driver’s license number or SSN, consider storing a customer ID that is only used by your company. This will allow you to identify the customer without needing to store personal information and be greatly helpful in reducing compliance costs for securing data such as PHI (Personal Health Information) in HIPAA or credit card information in PCI-DSS.
Another option would be to store a security question such as a place of birth or mother’s maiden name instead of a password. If passwords must be stored, make sure they are stored as a hash value rather than plain text. Passwords should never be stored as plain text.
To sum it all up, data minimization can reduce the amount of data you need to protect and store,┬áreducing IT costs and information security costs and risk. Three questions can aid in determining what data to prune. Ask yourself (1) Do I really need to keep this data? (2) Would a part of the data be as useful as the whole for my purposes? And (3) Could less sensitive data be used in place of this data?
For further reading
Time for a Data Diet? Deciding What Customer Information to Keep — and What to Toss┬á
Ponemon Study Shows the Cost of a Data Breach Continues to Increase
Security special report: The internal threat
Less Data, More Security