
Looking for Mr. Right – Revisited

I was reading a recent article by Chris Dale, where he gave an overview of Debra Logan’s “Why Information Governance fails and how to make it succeed” keynote speech. It’s difficult to disagree with most points made in the session, but one point in particular caught my attention. Chris transcribes Debra’s thoughts as:

“…we are at the birth of a new profession, with hybrid players who have multiple strands of skills and experience. You need people with domain expertise, not just about apps and servers but data and information. The usual approach is to take people who already have jobs and give them something else to do on top or instead. You need to find people who understand the subject and teach them to attach metadata to their material, to understand document retention, perhaps even send them to law school to turn them into a legal/IT/subject matter expert hybrid.”

In parallel, I have recently had several conversations about AIIM’s new “Certified Information Professional” accreditation (which I am proud to possess, having passed their stringent exam). It is a valiant attempt to recognise individuals who have enough breadth of skills in Information Management to cover most of the requirements of Debra’s “new profession”.

These two relatively unrelated events prompted me to go and rediscover an article that I wrote for AIIM’s eDoc online magazine, published sometime around June 2005. Unfortunately the article is no longer online, so apologies for embedding it here in its entirety:

Looking for Mr. Right

Why advances in ECM technology have generated a serious skills gap in the market.

ECM technologies have advanced significantly in the last ten years. The convergence of Document/Content Management, Workflow, Searching, web technologies, records management, email capture, imaging and intelligent forms processing, has created a new information management environment that is much more aware of the value of information assets.

Most analysts agree that we are entering a new phase in ECM, where medium and large size organizations are looking to invest in ECM as a strategic enterprise deployment in order to leverage their investment in multiple business areas – especially where improving operational efficiencies and compliance are the key drivers, as these tend to have a more horizontal appeal across the organization.

But as ECM technologies are starting to become pervasive, there is a lot of confusion on the operational management of these systems. Technically, the IT department is responsible for ensuring the systems are up and running as optimally as the technology permits. But whose responsibility is it, to make sure that these systems are configured appropriately and that the information held within them is managed correctly as a valuable asset?

Think about your own company: Who decides how information is managed across your organization? With ECM, you are generating a virtual library of information that should be used and leveraged consistently across departments, geographical boundaries, organizational structures and individual responsibility areas. And if you include Business Process Management in the picture, you are also looking for common, accountable and integrated business practices across the same boundaries. Does this responsibility sit within the business community, the IT department or as a separate internal service function? And what skills would be required to support this?

There is a new role requirement emerging, which is not very well defined or understood at the moment. There is a need for an individual or a group, depending on the size of the organization, who can combine the following capabilities:

  • identify what information should be managed and how, based on its intrinsic value and legal status
  • implement mechanisms for filtering and purging redundant information
  • design and maintain information structures
  • define metadata and classification schemes and policies
  • design folder structures and record management file plans
  • define indexing topologies, thesauri and search strategies
  • implement policies and timelines for content lifecycle management
  • devise and implement record retention and disposition strategies
  • define security models, access controls and auditing requirements
  • devise schemes for the most efficient location of information across distributed architectures
  • devise content and media refresh strategies for long-term archiving
  • consolidate information management practices across multiple communication channels: e.g. email, web, fax, instant messaging, SMS, VoIP
  • consolidate taxonomies, indexing schemes and policies across organizational structures
  • etc.

And all of this for different business environments and different vertical needs, with a good understanding of both business requirements and the capabilities offered by the technology: someone who can comfortably bridge the gap between the business requirements and the IT department.

People who can effectively combine the skills of librarian, administrator, business analyst, strategist and enterprise architect are extremely rare to find. If you can find one, hire them today!

The closest title one can use for this role today is “Information Architect” although job descriptions with that title differ significantly. More importantly, people with this collective skill set are very difficult to find today and even more difficult to train since a lot of “best practices” in this area are not established or documented.

This is a wakeup call for universities, training agencies, consultants and people wanting to re-skill: While the ECM technology itself is being commoditised, more and more application areas are opening up which will require these specialist skills. Companies need more people with these capabilities and they need them today. Without them, successful ECM deployments will remain difficult and expensive to achieve.

The more pervasive ECM becomes as an infrastructure discipline, the bigger the skill gap will become, unless we start addressing this today.

Apart from feeling slightly proud that I highlighted in June 2005 something that Gartner is raising as an issue today, this doesn’t reassure me at all: 7 years have passed and Debra Logan is (and organisations are…) still looking for Mr. Right!

I am happy that Information Governance has finally come to the forefront as an issue, and that AIIM’s CIP certification is making some strides in helping the match-making process.

But I really hoped we would have come a bit further by now…


Content Obesity – Part 2: Treatment

(…continued from Content Obesity – Part 1: Diagnosis)

You can’t, and don’t want to, stop data growth.

The growth of digital volume has been instrumental in driving major operational and cultural change in today’s business. Better, more personalised customer interaction; insight from Big Data business analytics; social media and collaboration; effective training and multimedia marketing: all rely on the flow of much higher volumes of information through the organisation. Not taking advantage of this would make your organisation less competitive.

So, if reducing the volume of data being consumed is not an option, how else can you manage Content Obesity? There are two approaches to this:

Managing the symptoms

There are some key technologies that help alleviate some of the symptoms of content obesity. These, in our human analogy, are the equivalent of liposuction and nip-and-tuck.

  • De-duplication can identify and remove multiple copies of identical documents. It is only effective if you can apply it across all your document stores (ECM systems, records management, shared file drives, personal file drives, SharePoint, email servers, etc.). This rarely happens, and when it does, it is usually restricted to one or two of these sources and focuses only on files, not structured data.
  • Archiving and tiered storage. Being able to select the most appropriate storage type for archived data can reduce storage costs. Not everything needs to be stored on expensive high-availability devices. A lot of the organisation’s data can sit on lower-cost equipment that can be restored from backups in hours, or days, rather than instantly. But how do you decide which information goes where? Most organisations will use this expensive high-availability storage for core systems, regardless of the age or significance of the data stored by these systems, as there is no easy way to apply policies at a granular level. There is certainly no way to map those logical “shared” network drives, where the majority of documents are stored, to tiered storage.
  • Compression. There are storage systems that use very sophisticated algorithms to reduce the physical space required, by compressing the data when stored and de-compressing it when it needs to be used. These are also expensive and require additional computing power to be able to maintain reasonable speeds in the compressing and de-compressing process.
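To make the de-duplication idea concrete, here is a minimal Python sketch of how byte-identical copies could be spotted across several stores by comparing content hashes. The store names, the function names and the sample data are all hypothetical, invented for illustration; real de-duplication products work at far larger scale and often at block level rather than on whole files.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest; identical bytes give identical digests."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(documents: Dict[str, bytes]) -> List[List[str]]:
    """Group the locations of documents whose content is byte-for-byte identical.

    `documents` maps a location (hypothetical: a path in some store) to the
    document's bytes. Only groups with more than one member are returned.
    """
    by_digest = defaultdict(list)
    for location, data in documents.items():
        by_digest[content_digest(data)].append(location)
    return [sorted(locations) for locations in by_digest.values() if len(locations) > 1]


# Hypothetical sample: the same contract sits in three different stores.
documents = {
    "sharepoint/contract.docx": b"signed contract v1",
    "fileshare/contract-copy.docx": b"signed contract v1",
    "email/attachment-17.docx": b"signed contract v1",
    "ecm/minutes.docx": b"board minutes",
}

for group in find_duplicates(documents):
    print("Identical copies:", group)
```

Note that the sketch only finds the copies because all three stores were fed into the same comparison, which is exactly the condition that rarely holds in practice.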

All of these techniques offer some relief, but the relief is marginal, if it’s not driven by a unified policy, and they do not address the fundamental issue: Whilst they temporarily reduce the impact of storage cost, they do not curb the information growth rate.

They also do not address any of the compliance or legal risks associated with content obesity: The same logical volume of data needs to be preserved, analysed and delivered to litigation and the same effort is required to manually manage the multiple retention policies and respond to regulatory challenges.

Treating the disease

In order to properly resolve content obesity, we need to consider the organisation’s metabolism: How quickly information is digested, which nutrients (value) can be extracted from content and how the organisation disposes of the waste.

The key question to ask is: “How much of this content do organisations actually need to keep?” Discussions with our customers indicate that an average of 70% of all retained data is obsolete! (The actual number will vary somewhat by organisation, but I’ll use the 70%/30% split for the purposes of this article.) This represents information that is duplicated, outdated, irrelevant or of no business value. Or it is content that can be readily obtained or reproduced from other sources.

The problem, however, is that nobody within the organisation knows which 70% of the data is obsolete. So nobody has the knowledge, or the authority, to allow that content to be deleted. The criteria for defining or identifying which information that 70% represents, are virtually impossible to determine systemically.

A more drastic and more realistic approach is required, to provide a permanent solution to the problem.

The concept behind treating Content Obesity is simple: If, and only if, the organisation was able to identify the 30% of information which they need to keep then, by definition, any information that falls outside that, could be legitimately deleted.

If this level of content metabolism could be controlled automatically, regularly, and effectively, it would free up critical IT storage resources and the corresponding budget that can be used to invest in growth projects instead.

What organisations need is the equivalent of a Thyroid gland: a centralised Information Lifecycle Governance mechanism that monitors all the different retention requirements, regulates the content metabolism and drives a digestive system that extracts the value from the content and disposes of all the waste. Most organisations do not have such a regulating organ, or function, at all.

Sounds simple enough, but how can you create a centralised policy that determines precisely, which 30% of the content, needs to be preserved?

Studies conducted by the CGOC (Compliance, Governance and Oversight Council), have shown that there are only three key reasons why companies need to preserve data for any length of time:

  • Regulatory obligation – controlled by Records Managers
  • Litigation – controlled by the Legal department
  • Business Utility – controlled by each business function or department.

These are the three groups in the organisation that are responsible for the metabolic rate of content. Yet these groups rarely connect with each other, do not use the same terminology and have certainly never had common policies and control mechanisms that they can communicate to IT. The legal group issues data preservation orders (legal holds) to custodians. Records Managers define taxonomies, fileplans and retention schedules, and task the business to abide by them. Business functions have more important things to do (like… keeping the business running) and, frankly, don’t have much appetite for understanding, let alone complying with, either legal hold orders or retention schedules. Business functions need the correct information to be available to them, at the right time, to make decisions on and to service their customers.

And who has the responsibility to physically protect, or to destroy, digital information? The IT group, which is not usually part of any of the conversations above.

At the heart of an Information Lifecycle Governance function is a unified policy engine: a common logical repository where Records Managers can document, manage and communicate their multiple retention schedules and produce consolidated fileplans; where the Legal group can manage its ongoing legal matters, issue legal hold and preservation orders and communicate with custodians and the other parts of the business; and where IT and the business functions can identify and document which information is stored in each device and each application, and the business requirements for information preservation. It is a place where all of these disparate groups can determine the value that each information asset brings to the business, for both structured and unstructured information.
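As a rough illustration of the logic such a policy engine would apply, the following Python sketch checks the three reasons for keeping data (regulatory obligation, litigation, business utility) and treats anything that satisfies none of them as a candidate for disposal. The class, field and function names are my own assumptions for the example, not the design of any particular product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class InformationAsset:
    """Hypothetical record of what the three groups know about one asset."""
    name: str
    retain_until: Optional[date] = None        # set by Records Management
    on_legal_hold: bool = False                 # set by the Legal department
    business_use_until: Optional[date] = None   # set by the owning business function


def reasons_to_keep(asset: InformationAsset, today: date) -> List[str]:
    """Return every reason that currently obliges the organisation to keep the asset."""
    reasons = []
    if asset.retain_until is not None and today <= asset.retain_until:
        reasons.append("regulatory obligation")
    if asset.on_legal_hold:
        reasons.append("litigation")
    if asset.business_use_until is not None and today <= asset.business_use_until:
        reasons.append("business utility")
    return reasons


def can_dispose(asset: InformationAsset, today: date) -> bool:
    """An asset is disposable only when no group has a reason to keep it."""
    return not reasons_to_keep(asset, today)


# Hypothetical assets and an arbitrary evaluation date.
today = date(2012, 6, 1)
invoice = InformationAsset("2004 invoice batch", retain_until=date(2011, 12, 31))
held = InformationAsset("2004 email archive", retain_until=date(2011, 12, 31), on_legal_hold=True)

print(invoice.name, "-> dispose" if can_dispose(invoice, today) else "-> keep")
print(held.name, "-> keep because of", reasons_to_keep(held, today))
```

The point of the sketch is that the disposal decision needs input from all three groups in one place; a legal hold overrides an expired retention period, and IT can act only once every reason to keep has lapsed.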

Once this thyroid function is established to control the content metabolism, it is key to connect it to the mechanisms that physically manage information: the “organs”. Connecting this policy engine to the document collection tools and repositories, records management systems, structured data archives, eDiscovery tools, tiered storage archives, etc., provides the instrumentation which is needed to monitor the data growth, execute the policies and provide the auditability and defensibility that is needed to justify regular content purging.

Conclusion

There is no quick fix for Content Obesity and, like medical obesity, it requires a fundamental change in behaviour. But it is achievable. Organisations need to design a governance model that transparently joins the dots: The business needs to describe the information entities, based on their value and utility, mapping them to the asset, system and application descriptions that IT understands. Legal can then manage their legal holds and eDiscovery, based on knowing what information exists, what part of the business it relates to, and where information lives, not only by custodians. Compliance groups can then consolidate their records management directives and apply a unified taxonomy and disposition schedule, relevant to the territory and business function. When all of these policies are systematically connected to the data sources, IT can accurately identify what information should be preserved and, by definition then, what information can be justifiably disposed of. (IBM calls this process Defensible Disposal.)

Content Obesity – Part 1: Diagnosis

Obesity: a medical condition in which excess body fat has accumulated to the extent that it may have an adverse effect on health, leading to reduced life expectancy and/or increased health problems

Content Obesity: An organisational condition in which excess redundant information has accumulated to the extent that it may have an adverse effect on business efficiency, leading to depleted budgets, reduced business agility and/or increased legal and compliance risks.

First of all, let me apologise to all the people who are currently suffering from obesity, or who are supporting friends and family that do. I have no intention of making fun of obese people and I have great sympathy and respect for the pain they are going through. I lost my best friend to a heart attack. He was obese.

In a recent conversation with a colleague, about Information Lifecycle Governance and Defensible Disposal, I made a casual remark about an organisation suffering from Content Obesity. I have to admit that it was an off-the-cuff remark, but it conveyed very succinctly the picture I was trying to paint. Since then, the more I think about this analogy the more sense it makes.

People are not born obese, they become obese. And they don’t become obese overnight, it’s a slow, steady process. Unless it’s addressed early, the problem grows in very predictable stages: gaining weight, being overweight, being obese, being morbidly obese, dying. Most people, however, do not want to acknowledge the problem until it is too late. They live in denial, they make excuses, they make jokes. Until it’s often too late to reverse the process.

Organisations consume and generate content at an incredible rate: IDC’s Digital Universe study (2011) predicts an information growth factor of 50x between 2010 and 2020. Just to give that figure some context: if an average grown-up person grew at the same rate, they would weigh 3.5 tons by 2020! Studies we conducted with our own customers put the growth rate at a slightly more conservative 35-40% per year, which is still significant.
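As a rough check on how these figures relate, a 50x increase over ten years implies compound growth of roughly 48% per year, while 35-40% per year compounds to somewhere between roughly 20x and 29x over the same decade. A minimal Python sketch of that arithmetic (the function names are mine, purely for illustration):

```python
def annual_growth_rate(total_factor: float, years: int) -> float:
    """Compound annual rate implied by growing `total_factor` times over `years`."""
    return total_factor ** (1 / years) - 1


def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplication after compounding `annual_rate` for `years`."""
    return (1 + annual_rate) ** years


# 50x between 2010 and 2020, expressed as a yearly rate.
print(f"50x in 10 years = {annual_growth_rate(50, 10):.1%} per year")

# The 35-40% yearly range, expressed as a ten-year factor.
for rate in (0.35, 0.40):
    print(f"{rate:.0%} per year = {growth_factor(rate, 10):.1f}x in 10 years")
```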

We love our digital content these days, we can’t get enough!

We all create office files and our presentations are growing larger, our email rate is not slowing down (we have several accounts each), we communicate with our customers electronically more than ever before, we collaborate inside and outside the firewall, we engage in social media, we text, we document life with our mobile phones’ cameras and we use YouTube videos extensively for marketing and education. We collect and analyse blogs and conferences and twitter streams. We analyse historical transactional data and we create new predictive ones. And if collecting our own streams is not enough, we also collect those of our competitors so that we can analyse them too. Our electricity meter collects data, our car collects data, our traffic sensors collect data, our mobile phones collect data, our supermarkets collect data. We have an average of two game consoles per family (all of which connect to the internet), we watch high-definition TV, from every fixed or portable device that has a screen, our kids have mobile phones, and PSPs and DSs and laptops. We have our home computer, our work laptop, our BYOD tablet and our smart phones. Our average holiday yields over 500 pictures, all of which are 12 Megapixel. And the kids take another 500 with their camera… In fact we generate so much digital data, that we now have special ways of handling it with Big Machines that manage Big Data to give Big Insights. And that is all wonderful, and it all exploded in the last five years.

I’ll say it again: We love digital content.

Going back to my health analogy, you could say that we gorge on content. The problem is, we are now overweight with content, since most of that content has been accumulated without any particular thought of organisation or governance. So today, we can’t lose weight, we can’t clean it up because IT doesn’t know what it is, where it is, who owns it or if it’s of any use to anyone. And, frankly, because it’s far too much hassle and we have better things to do. It’s all digital so… “storage is cheap, we’ll just buy some more storage”: a staggering 78% of respondents to another recent study stated that their strategy for dealing with data growth was to “buy more storage”!

Newsflash: Storage is not cheap! By the time you create your high-availability, tier-1 storage with 3 generations of backup tapes and put it in a data centre, pay for electricity and air-conditioning, and pay people to manage it, it’s no longer cheap. Even if storage prices go down by 20% per year, if your data grows at 40%, your bill still goes up by about 12% every year (0.8 × 1.4 = 1.12)… Simple maths!
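To spell the arithmetic out: multiplying the two effects together (rather than subtracting one percentage from the other) shows that a 20% yearly price drop combined with 40% yearly data growth raises the storage bill by about 12% a year, and that increase compounds. A small Python sketch, assuming the unit price and the data volume are the only two drivers of cost (the function names and the starting figure of 100 are illustrative):

```python
def net_cost_change(price_change: float, data_growth: float) -> float:
    """Yearly change in total spend when the unit price and the volume both change.

    Rates are fractions: -0.20 means prices fall 20%, 0.40 means data grows 40%.
    """
    return (1 + price_change) * (1 + data_growth) - 1


def projected_spend(initial: float, price_change: float, data_growth: float, years: int) -> float:
    """Spend after `years`, compounding the net yearly change."""
    return initial * (1 + net_cost_change(price_change, data_growth)) ** years


yearly = net_cost_change(-0.20, 0.40)
print(f"Net change in storage spend: {yearly:+.0%} per year")

# Starting from an index of 100, follow the bill over five years.
for year in range(1, 6):
    print(year, round(projected_spend(100, -0.20, 0.40, year), 1))
```

On these assumptions the bill is roughly 76% higher after five years, even though the price per unit has fallen every single year.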

Most organisations are still in denial about the problem. The usual answer to the question “How much storage do you currently have and how much does it grow each year?” is “We don’t really know, we never measured it that way”. Well, I would argue that whoever is writing the cheque to the storage vendors every year, ought to know.

Fortunately, for large multinational organisations (banks, pharmaceuticals, energy, etc.), the penny has finally dropped. A growth rate of 40% on a storage estate of 20 Petabytes translates to an increase of tens of millions in storage costs per year. In an economy where IT budgets are shrinking, this is not a pleasant conversation to have with your CFO. These organisations are now self-diagnosed as Content Obese, and are desperately looking for ways to curb the growth, before they become Morbidly Obese.

And, similarly to the human disease, Content Obesity has side effects. Even if you could somehow overcome (or overlook, or sweep under the carpet…) the cost implications, it creates huge health risks for the organisation.

Firstly, it creates risks for the Business. Unruly, high volumes of content clog up processes, the arteries of the business. Content that is lost in the bulk, uncategorised and not readily available to support decision making, slows down the flow of information across the organisation. Content that is obsolete or outdated can create confusion and lead to incorrect decisions. Unmanaged content volumes do not lend themselves to fast-changing business models, marketing innovation, shared services or better customer support. And by consuming huge amounts of IT capital, they also stifle investment and innovation in new business services.

Secondly, it creates a huge Legal risk. All electronic content in the organisation is potentially discoverable. The legal group has a duty to preserve information that is relevant to litigation. When information is abundant and not governed, the only method that the legal group has to identify and preserve it is to notify all the people who may have access to it (the custodians) and ask them to protect it. This approach is inaccurate, expensive and time-consuming. And when it comes to delivering that information to opposing parties or the courts, the organisation has to sift through these huge volumes of content to identify what is actually relevant, often incurring huge legal fees in the process. (Unashamed plug: If you are interested in finding out more about the role of Information Governance in UK civil litigation, I recommend this excellent IBM paper authored by Chris Dale, respected author of the eDisclosure Information Project.)

Finally, Content Obesity creates a huge Compliance risk. Different regulations dictate that records are kept for defined periods of time. Privacy and data protection regulations dictate that certain types of content are disposed of after defined periods of time. Record Managers often have to comply with multiple (and often conflicting) regulations, from multiple jurisdictions, affecting hundreds of systems and millions of records. An ever-growing volume of unclassified content means that records cannot be correctly identified, disposition schedules cannot be executed consistently and policies remain in a binder on the shelf (or in a PDF file somewhere on the intranet). Regulatory audits become impossible, wasting valuable resources and often leading to significant fines. (As the regulator put it in one of many examples: “These failings were made worse by their inability to determine the areas in which the breakdown in its record keeping systems had occurred”.)

So, how much of that content do organisations actually need to keep? And who has the responsibility and the right to get rid of it?

Next: Content Obesity – Part 2: Treatment
