Content Obesity – Part 2: Treatment
(…continued from Content Obesity – Part 1: Diagnosis)
You can’t, and don’t want to, stop data growth.
The growth of digital volume has been instrumental in driving major operational and cultural change in today’s businesses. Better, more personalised customer interaction, insight from Big Data business analytics, social media and collaboration, effective training and multimedia marketing all rely on the flow of much higher volumes of information through the organisation. Not taking advantage of this would make your organisation less competitive.
So, if reducing the volume of data being consumed is not an option, how else can you manage Content Obesity? There are two approaches to this:
Managing the symptoms
There are some key technologies that help alleviate some of the symptoms of content obesity. These, in our human analogy, are the equivalent of liposuction and nip-and-tuck.
- De-duplication can identify and remove multiple copies of identical documents. It is only effective if you can apply it across all your document stores (ECM systems, records management systems, shared file drives, personal file drives, SharePoint, email servers, etc.). This rarely happens, and when it does, it is usually restricted to one or two of these sources and focuses only on files, not structured data.
- Archiving and tiered storage. Selecting the most appropriate storage type for archived data can reduce storage costs. Not everything needs to be stored on expensive high-availability devices. A lot of the organisation’s data can sit on lower-cost equipment that can be restored from backups in hours, or days, rather than instantly. But how do you decide which information goes where? Most organisations will use this expensive high-availability storage for core systems, regardless of the age or significance of the data stored by these systems, as there is no easy way to apply policies at a granular level. There is certainly no way to map those logical “shared” network drives, where the majority of documents are stored, to tiered storage.
- Compression. There are storage systems that use very sophisticated algorithms to reduce the physical space required, by compressing the data when stored and de-compressing it when it needs to be used. These are also expensive and require additional computing power to maintain reasonable speeds when compressing and de-compressing.
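To make the de-duplication point concrete, here is a minimal sketch that groups files by a hash of their content across several storage locations. The function names and the idea of passing a list of root folders are assumptions made for illustration; real document stores (ECM systems, email servers, SharePoint) would each need their own connector, and this sketch only covers plain files, not structured data.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Group files under all given roots by content hash.

    Returns only the groups that contain two or more files, i.e. the
    byte-for-byte identical copies. `roots` is a hypothetical list of
    folders standing in for the organisation's document stores.
    """
    by_digest = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

Scanning every store in a single pass is what allows copies held in different places to be matched; running the same check on one store at a time would miss them.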
All of these techniques offer some relief, but the relief is marginal if it is not driven by a unified policy, and they do not address the fundamental issue: Whilst they temporarily reduce the impact of storage cost, they do not curb the information growth rate.
They also do not address any of the compliance or legal risks associated with content obesity: The same logical volume of data needs to be preserved, analysed and delivered for litigation, and the same effort is required to manually manage the multiple retention policies and respond to regulatory challenges.
Treating the disease
In order to properly resolve content obesity, we need to consider the organisation’s metabolism: How quickly information is digested, which nutrients (value) can be extracted from content and how the organisation disposes of the waste.
The key question to ask is: “How much of this content do organisations actually need to keep?” Discussions with our customers indicate that, on average, 70% of all retained data is obsolete! (The actual number will vary somewhat by organisation, but I’ll use the 70%/30% split for the purposes of this article.) This represents information that is duplicated, outdated, irrelevant or of no business value. Or it is content that can be readily obtained or reproduced from other sources.
The problem, however, is that nobody within the organisation knows which 70% of the data is obsolete. So nobody has the knowledge, or the authority, to allow that content to be deleted. The criteria for identifying which information that 70% represents are virtually impossible to determine systematically.
A more drastic and more realistic approach is required to provide a permanent solution to the problem.
The concept behind treating Content Obesity is simple: If, and only if, the organisation is able to identify the 30% of information that it needs to keep, then, by definition, any information that falls outside that 30% can be legitimately deleted.
If this level of content metabolism could be controlled automatically, regularly, and effectively, it would free up critical IT storage resources and the corresponding budget, which could be invested in growth projects instead.
What organisations need is the equivalent of a thyroid gland: A centralised Information Lifecycle Governance mechanism that monitors all the different retention requirements, regulates the content metabolism and drives a digestive system that extracts the value from the content and disposes of all the waste. Most organisations do not have such a regulating organ, or function, at all.
Sounds simple enough, but how can you create a centralised policy that determines precisely which 30% of the content needs to be preserved? There are three reasons to keep information, each controlled by a different group:
- Regulatory obligation – controlled by Records Managers
- Litigation – controlled by the Legal department
- Business utility – controlled by each business function or department
These are the three groups in the organisation that are responsible for the metabolic rate of content. Yet these groups rarely connect with each other, do not use the same terminology and, certainly, have never had common policies and control mechanisms that they can communicate to IT. The Legal group issues data preservation orders (legal holds) to custodians. Records Managers define taxonomies, fileplans and retention schedules, and task the business to abide by them. Business functions have more important things to do (like… keeping the business running) and, frankly, don’t have much appetite for understanding, let alone complying with, either legal hold orders or retention schedules. Business functions need the correct information to be available to them, at the right time, to make decisions and to service their customers.
And who has the responsibility to physically protect, or to destroy, digital information? The IT group, which is not usually part of any of the conversations above.
At the heart of an Information Lifecycle Governance function is a unified policy engine: a common logical repository where Records Managers can document, manage and communicate their multiple retention schedules and produce consolidated fileplans; where the Legal group can manage its ongoing legal matters, issue legal hold and preservation orders, and communicate with custodians and the other parts of the business; and where IT and the business functions can identify and document which information is stored in each device and each application, along with the business requirements for information preservation. It is a place where all of these disparate groups can determine the value that each information asset brings to the business, for both structured and unstructured information.
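As a rough illustration of what such a policy engine decides, the sketch below records the three reasons to keep an asset (regulatory obligation, litigation, business utility) in one place and treats anything with no remaining reason as a candidate for disposal. The class, field and function names are hypothetical, and a real engine would hold far richer policy data than three flags.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class InformationAsset:
    """One information asset as seen by all three groups (hypothetical model)."""
    name: str
    retention_until: Optional[date] = None  # set by Records Managers
    on_legal_hold: bool = False             # set by the Legal group
    business_value: bool = False            # set by the owning business function


def reasons_to_keep(asset: InformationAsset, today: date) -> list:
    """List every reason that currently obliges the organisation to keep the asset."""
    reasons = []
    if asset.retention_until is not None and asset.retention_until >= today:
        reasons.append("regulatory obligation")
    if asset.on_legal_hold:
        reasons.append("litigation")
    if asset.business_value:
        reasons.append("business utility")
    return reasons


def disposal_candidates(assets, today: date) -> list:
    """Return the names of assets that no group has a reason to keep."""
    return [a.name for a in assets if not reasons_to_keep(a, today)]
```

The design choice worth noting is that nothing is ever flagged as obsolete directly. An asset becomes a disposal candidate only when none of the three groups has a reason to keep it, which is also what makes each decision auditable.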
Once this thyroid function is established to control the content metabolism, it is key to connect it to the mechanisms that physically manage information: the “organs”. Connecting this policy engine to the document collection tools and repositories, records management systems, structured data archives, eDiscovery tools, tiered storage archives, etc. provides the instrumentation needed to monitor data growth, execute the policies and provide the auditability and defensibility needed to justify regular content purging.
There is no quick fix for Content Obesity and, like medical obesity, it requires a fundamental change in behaviour. But it is achievable. Organisations need to design a governance model that transparently joins the dots: The business needs to describe the information entities, based on their value and utility, mapping them to the asset, system and application descriptions that IT understands. Legal can then manage their legal holds and eDiscovery based on knowing what information exists, what part of the business it relates to, and where the information lives, not only by custodian. Compliance groups can then consolidate their records management directives and apply a unified taxonomy and disposition schedule, relevant to the territory and business function. When all of these policies are systematically connected to the data sources, IT can accurately identify what information should be preserved and, by definition then, what information can be justifiably disposed of. (IBM calls this process Defensible Disposal.)