Alternative security uses for eDiscovery software

A recent breach at the Memorial Sloan-Kettering Cancer Center called attention to the fact that you can’t protect data from a breach if you don’t know what data your organization possesses. This may sound simple, but many organizations do not have a good grasp of what data exists in their organization and whether that data should be protected against disclosure. This makes it difficult to detect a breach, and thus breached data persists in the wild much longer than it would if organizations had a better understanding of the data they manage.

An interesting solution, documented in Data Breach Today, is being used by Franciscan Health System (FHS) in Washington State. FHS has started using an eDiscovery tool, typically used to gather, filter, prepare and evaluate data for use in litigation, to get a big-picture view of the data they have on their systems. eDiscovery tools allow users to search across a large amount of data to find data of a specific type. In litigation, lawyers ask, “What data is relevant to my case?” and in information security and privacy, the question is, “What sensitive data exists in my company?” FHS and others have found another use for eDiscovery tools in the information security field. These tools are much further along in the maturity cycle than some more recently developed tools. Some eDiscovery tools allow for data visualization, such as the Attenex document mapper from FTI, which shows a picture of the data in the system as a series of circles of varying sizes connected together. The circles and connections depict the classifications of the data and the relationships between them.

Many people in an organization may be creating content, and some sensitive information may accidentally or intentionally be included in a document. eDiscovery software evaluates the content of files to help identify the data that may be hiding within a document, and it can be used for cyber security in addition to litigation. In the case above, Memorial Sloan-Kettering Cancer Center had unencrypted patient information in a set of Microsoft PowerPoint slides that were available online. What’s worse is that the information was available for six years before it was found. An eDiscovery system could have alerted them to this data breach much sooner.
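As a rough illustration of the kind of content evaluation described above, the sketch below scans extracted document text for strings that look like sensitive identifiers. It is not how any particular eDiscovery product works; the pattern names and the `find_sensitive` function are hypothetical, and the two regular expressions are simplistic assumptions that would produce false positives and misses on real data.

```python
import re

# Hypothetical patterns for illustration only; real tools use far richer detection.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def find_sensitive(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in SENSITIVE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


# Example: text extracted from a slide deck (made-up sample data).
sample = "Slide 4: patient John Doe, SSN 123-45-6789, follow-up in May."
print(find_sensitive(sample))
```

Running a check like this across file shares on a schedule is one simple way to learn where sensitive data lives before someone else finds it.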

Data: If you don’t need it, delete it

Organizations are accumulating data at a pace that would cause a hoarder to blush. Just like that old bicycle seat stored in the attic, data is often kept “just in case it may be needed someday.” This practice, however, comes at a cost.

Some organizations think that it is inexpensive to store data, especially with the steady decline in hard drive prices. The fact is, however, that data is expensive to keep. Organizations spend a significant amount of time managing, archiving and securing data. Data is housed on servers, each of which must be maintained. Data is also archived regularly according to the organization’s backup schedule, and it is audited and secured against loss. Each of these activities consumes the time of those in information management and therefore increases cost.

Excessive data retention can also pose a risk to an organization in regard to compliance and electronic discovery requirements. Personally identifiable information that is lost could result in significant fines. In addition, old document drafts that may not provide organizational value could still damage the organization if disclosed. Data related to litigation is costly to obtain, organize, and produce. Searching through an organization’s legacy data adds further complexity and cost.

For the above stated reasons, it is important to remove unnecessary data.  A structured approach is necessary to avoid the loss of important data and to provide consistency throughout an organization.  Structure can be accomplished through a data retention policy.   A data retention policy should specify how long certain types of data such as emails, documents, drafts, instant message conversations, or even voice mails should be kept and how the data will be properly disposed of.


At a minimum, a data retention policy should contain a scope section that outlines the types of data covered. Examples would be tax records, personal information, business records and legal documents. In addition, the policy will need to spell out how long and in what form each type of document will be retained. Some policies may include guidelines on removal of data, or this may be left to a data destruction policy.

Retention Term

One of the most difficult parts of defining a data retention policy is specifying the length of time to retain certain types of documents. Compliance requirements may determine the minimum or maximum length of time, while business requirements may stipulate other terms. Both the compliance and business requirements will need to be considered in defining the duration. The following are some best practices that can be used as a starting point in the formation of a data retention policy:

  • Audit documentation and associated financial documents will need to be kept for at least 7 years if there is a SOX requirement. The IRS requires that tax documents be retained for at least 4 years after they were due.
  • The list of hazardous chemicals provided by OSHA contains many substances common in the workplace, and data retention policies should define how long documentation of hazardous chemical exposure data will be kept. OSHA requires that such documents be retained for 30 years.
  • The Health Insurance Portability and Accountability Act (HIPAA) requires that information disclosure authorizations, patient requests, business associate contracts and other such covered documents be retained for at least 6 years from the last transaction or 2 years following the patient’s death.
  • Exceptions may be made to these recommendations when pending litigation or audits require an information freeze or legal hold for specific data. In these instances, organizations will need to show that they have made reasonable efforts to prevent the destruction of discoverable information.
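The retention terms and the legal hold exception above can be captured in a small lookup, sketched here under some loud assumptions: the category names are hypothetical labels, the periods are simply the minimums quoted in the list, and every period is measured from a single start date, which glosses over the different triggers each regulation actually uses.

```python
from datetime import date

# Minimum retention in years, copied from the list above.
# Category names are hypothetical labels, not regulatory terms.
RETENTION_YEARS = {
    "sox_audit": 7,
    "tax": 4,
    "osha_exposure": 30,
    "hipaa_authorization": 6,
}


def add_years(start, years):
    """Return start shifted forward by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def may_dispose(category, start, today, legal_hold=False):
    """True if the minimum retention term has passed and no legal hold applies."""
    if legal_hold:
        return False  # a hold overrides the schedule
    return today >= add_years(start, RETENTION_YEARS[category])


print(may_dispose("tax", date(2015, 4, 15), date(2020, 1, 1)))
print(may_dispose("tax", date(2015, 4, 15), date(2020, 1, 1), legal_hold=True))
```

A schedule like this only covers the minimum term; business requirements may call for keeping some records longer, so a real policy would layer those on top.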

This article discussed the need for data retention policies and outlined some regulatory requirements that should be included in business retention requirements.   An effective data retention policy can go a long way in reducing data clutter, improving organizational efficiency and reducing risk.  However, defining the policy will not be enough.  Employees will need to be aware of the policy and motivated to follow it.



Data retention policies reduce the risk of data breach

What if I told you that you could reduce risk and costs at the same time? Skeptical? I would be. It sounds like some cheesy marketing ploy chock-full of hidden costs or high upfront costs with low ROI. No, I am not pitching a product or trying to sell you a solution. I am, however, trying to get your attention. I am talking about data minimization.

Companies collect millions of gigabytes of information, all of which has to be stored, maintained, and secured. There is a general fear of removing data lest it be needed some day but this practice is quickly becoming a problem that creates privacy and compliance risk. Some call it “data hoarding” and I am here to help you clean your closet of unnecessary bits and bytes.


Risk and Costs

The news is full of examples of companies losing data. These companies incur significant cost to shore up their information security and their reputations. In a study by the Ponemon Institute, the estimated cost per record for a data breach in 2009 was $204. Based on this, losing 100,000 records would cost a company over twenty million dollars (100,000 × $204 = $20.4 million). It is no wonder that companies are concerned. Those that are not in the news are spending a great deal of money to protect the information they collect.

So why are we collecting this information in the first place? Much like the message of abstinence campaigns, the best way to avoid a data breach is to not store the data in the first place. This is where data minimization steps in to reduce such risk. As part of the data minimization effort, organizations need to ask themselves three questions:


  1. Do I really need to keep this data?
  2. Would a part of the data be as useful as the whole for my purposes?
  3. Could less sensitive data be used in place of this data?


Do I really need to keep this data?

The first data minimization question to ask is: do I really need to keep this data? Some data is transient in nature. It is needed in the moment but not in the long term. Transient data should not be stored or archived. It can simply be removed as soon as the transaction is complete. Optimally, this data should not be written to the hard disk at all, but rather be kept in memory while the transaction is processed and then flushed, so that it is never stored where it could later be obtained by an unauthorized entity.

Other information such as buying preferences or survey data is collected to be used in aggregation and reporting. The individual responses may not be needed once the data has been aggregated so it should be purged. When analyzing business workflows, it is worth considering implementing a purge process following the aggregation and reporting process.
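A purge step that follows aggregation might look like the sketch below. It is a minimal in-memory illustration: the `aggregate_and_purge` function and the `answer` field are hypothetical, and in a real workflow the purge would be a delete against the data store (and its backups) rather than clearing a list.

```python
from collections import Counter


def aggregate_and_purge(responses):
    """Tally survey answers, then remove the individual responses in place."""
    totals = Counter(r["answer"] for r in responses)
    responses.clear()  # the purge step: individual records are no longer kept
    return totals


# Made-up survey data for illustration.
survey = [
    {"customer": "A", "answer": "yes"},
    {"customer": "B", "answer": "no"},
    {"customer": "C", "answer": "yes"},
]
report = aggregate_and_purge(survey)
print(report, len(survey))
```

Only the totals survive, so a later breach of this data set would expose counts rather than who answered what.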

Effort should be made to periodically remove any records that are no longer relevant. After all, information has a shelf life, an expiration date if you will. The plain fact is that information that is no longer useful to the organization should be removed. This removes the privacy, compliance, eDiscovery or other risk associated with the data and allows organizational resources to be spent elsewhere.

Another instance where you should ask if you really need to keep data is when you have a copy of the data elsewhere. In this case, you do not need to keep the data because it is a duplicate. I understand the need for redundancy but build that into a centralized database system. In this way you can protect a single area but still provide high availability. If you absolutely need distributed systems, consider segmenting the database so that distributed systems only contain the portion of the data you need.


Would a part of the data be as useful as the whole for my purposes?

The second data minimization question to ask is: would a part of the data be as useful as the whole for my purposes? Sometimes a part of the data can be as useful as the whole. Take a Social Security Number (SSN) for example. Storing the last four digits of the SSN may be as useful as storing the entire number, and the damage associated with the disclosure of just those digits is minimal compared to the entire SSN. Similarly, a company could store just the last few digits of a credit card number rather than the entire thing.
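Keeping only the trailing digits can be done at the point of capture, so the full number never reaches storage. The helpers below are a minimal sketch; the function names are hypothetical, and the choice of four digits is an assumption based on common practice.

```python
def last_four(number):
    """Return only the last four digits of an identifier, ignoring separators."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) < 4:
        raise ValueError("need at least four digits")
    return digits[-4:]


def masked(number, mask_char="*"):
    """Return a display form with all but the last four digits masked."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) < 4:
        raise ValueError("need at least four digits")
    return mask_char * (len(digits) - 4) + digits[-4:]


# Made-up sample values for illustration.
print(last_four("123-45-6789"))
print(masked("4111 1111 1111 1111"))
```

Note that `masked` is only for display; the point of the exercise is that the value written to the database is the output of `last_four`, not the original number.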

This area of data minimization is extremely important when working with credit cards and PCI compliance, because every place where card numbers are stored needs to be in full compliance with the standard. This is a risk that compliance officers are eager to mitigate.


Could less sensitive data be used in place of this data?

The third data minimization question you should ask is: could less sensitive data be used in place of this data? Instead of storing a value that is global in nature, like a driver’s license number or SSN, consider storing a customer ID that is only used by your company. This will allow you to identify the customer without needing to store personal information, and it can greatly reduce compliance costs for securing data such as PHI (Protected Health Information) under HIPAA or credit card information under PCI-DSS.
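One way to issue a company-specific customer ID is to generate a random identifier that carries no meaning outside your own systems. The sketch below uses Python's standard `uuid` module; the `CustomerDirectory` class is a hypothetical stand-in for whatever customer database you actually use.

```python
import uuid


class CustomerDirectory:
    """Toy in-memory directory keyed by a company-specific customer ID."""

    def __init__(self):
        self._customers = {}

    def register(self, name):
        """Store the customer under a random ID and return that ID."""
        customer_id = uuid.uuid4().hex  # random, unrelated to SSN or license number
        self._customers[customer_id] = {"name": name}
        return customer_id

    def lookup(self, customer_id):
        """Return the stored record for a previously issued ID."""
        return self._customers[customer_id]


directory = CustomerDirectory()
cid = directory.register("Ada Example")
print(cid, directory.lookup(cid))
```

Because the ID is random, disclosing it tells an outsider nothing about the customer, unlike a driver's license number or SSN.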

Another option would be to store a security question such as a place of birth or mother’s maiden name instead of a password. If passwords must be stored, store them as hash values. Passwords should never be stored as plain text.
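Storing a hash instead of the password itself can be sketched with Python's standard library as follows. This is one possible approach, not the only one: the use of PBKDF2 with a per-password random salt, the iteration count, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password):
    """Return (salt, digest) for storage; the plain-text password is not kept."""
    salt = secrets.token_bytes(16)  # random salt, unique per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Recompute the hash for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Only the salt and digest are written to storage, so a stolen table does not directly reveal any password.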

To sum it all up, data minimization can reduce the amount of data you need to protect and store, reducing IT costs and information security costs and risk. Three questions can aid in determining what data to prune. Ask yourself (1) Do I really need to keep this data? (2) Would a part of the data be as useful as the whole for my purposes? And (3) Could less sensitive data be used in place of this data?

For further reading

Time for a Data Diet? Deciding What Customer Information to Keep — and What to Toss

Ponemon Study Shows the Cost of a Data Breach Continues to Increase

Security special report: The internal threat

Less Data, More Security