
A clouded view of Records and Auto-Classification

When you see Lawrence Hart (@piewords), Christian Walker (@chris_p_walker) and Cheryl McKinnon (@CherylMcKinnon) involved in a debate on Records Management, you know it’s time to pay attention! 🙂

This morning, I was reading Lawrence’s blog titled “Does Records Management Give Content Management a Bad Name?”, which picks up on one of the points in Cheryl’s article “It’s a Digital-First World: Five Trends Reshaping Records Management As You Know It”, with some very insightful comments added by Christian. I started leaving a comment under Lawrence’s blog (which I will still do, pointing back to this), but there were too many points I wanted to add to the debate and it was becoming too long…

So, here is my take:

First of all, I want to move away from the myth that RM is a single requirement. Organisations look to RM tools as the digital equivalent of a Swiss Army Knife, to address multiple requirements:

  • Classification – Often, the RM repository is the only definitive Information Management taxonomy managed by the organisation. Ironically, it mostly reflects the taxonomy needed for retention management, not the one needed by the operational side of the business. Trying to design a taxonomy that serves both masters leads to the huge granularity issues that Lawrence refers to.
  • Declaration – A conscious decision to determine what is a business record and what is not. This is where both workflow integration and auto-classification have a role to play, and where, in an ideal world, we should try to remove the onus of that decision from the hands of the end-user. More on that point later…
  • Retention management – This is the information governance side of the house. The need to preserve the records for the duration that they must legally be retained, move them to the most cost-effective storage medium based on their business value, and actively dispose of them when there is no regulatory or legal reason to retain them any longer.
  • Security & auditability – RM systems are expected to be a “safe pair of hands”. In the old world of paper records management, once you entrusted your important and valuable documents to the records department, you knew that they were safe. They would be preserved and looked after until you ask for them. Digital RM is no different: It needs to provide a safe-haven for important information, guaranteeing its integrity, security, authenticity and availability. Supported by a full audit trail that can withstand legal scrutiny.

Auto-categorisation, or auto-classification, relates to the first two of these requirements: Classification (using linguistic, lexical and semantic analysis to identify what type of document it is, and where it fits into the taxonomy) and Declaration (deciding if this is a business document worthy of declaration as a record). Auto-classification is not new; it has been available both as a standalone product and integrated within email and records capture systems for several years. But its adoption has been slow, not for technological reasons, but because culturally both compliance and legal departments are reluctant to accept that a machine can be trusted to make this type of decision. And even though numerous studies have shown that machine-based classification can be far more accurate and consistent than a room full of paralegals reading each document, it will take a while before the cultural barriers are lifted. Ironically, much of the recent resurgence and acceptance of auto-classification is coming from the legal field itself, where the “assisted review” or “predictive coding” (just a form of auto-classification to you and me) wars between eDiscovery vendors have brought the technology to the fore, with judges finally endorsing its credibility [Magistrate Judge Peck in Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012), approving the use of predictive coding in a case involving over 3 million e-mails].
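To make the idea concrete, here is a minimal sketch of statistical auto-classification in Python, assuming scikit-learn is available; the taxonomy labels and training snippets are hypothetical, and real engines layer far richer linguistic and semantic analysis on top of this. Note the confidence threshold: low-confidence documents are routed to a human reviewer rather than misfiled.

```python
# A minimal auto-classification sketch (not any vendor's engine).
# Assumes scikit-learn; the labels and training texts are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

training_docs = [
    "Quotation for 500 units at the agreed discount rate",
    "Invoice for consulting services rendered in March",
    "Employment contract terms for the new starter",
]
labels = ["quotation", "invoice", "contract"]  # taxonomy nodes

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(training_docs, labels)

def classify(text, threshold=0.7):
    """Return a taxonomy label, or None to defer to a human reviewer."""
    probabilities = model.predict_proba([text])[0]
    best = probabilities.argmax()
    if probabilities[best] < threshold:
        return None  # not confident enough for automatic filing
    return model.classes_[best]

print(classify("Please find attached our quotation for 200 units"))
```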

The point that Christian Walker makes in his comments, however, is very important: auto-classification can help, but it is not the only, or even the primary, mechanism available for auto-declaration. They are not the same thing. Taking the records declaration process away from the end-user requires more than understanding the type of document and its place in a hierarchical taxonomy. It needs the business context around the document, and that comes from the process. A simple example to illustrate this would be a document containing a pricing quotation. Auto-classification can identify what it is, but not whether it has been sent to a client or formed part of a contract negotiation, and it is that latter contextual fact that makes it a business record. Auto-declaration from within a line-of-business application or a process management system is easy: you already know what the document is (whether it was received externally or created as part of the process), you know who it relates to (client id, case, process) and you know what stage of its lifecycle it is at (draft, approved, negotiated, signed, etc.). These give enough definitive context to accurately identify and declare a record, without the need to involve the users or resort to auto-classification or any other heuristic decision. That assumes, of course, that there is an integration between the LoB/process system and the RM system, to allow that declaration to take place automatically.
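A hedged sketch of what that integration hook might look like, assuming the process engine can call out on document state changes; the event fields and the RM declaration call are hypothetical stand-ins for whatever APIs the LoB and RM systems actually expose:

```python
# A minimal auto-declaration sketch driven by process context.
# The event fields and declare_record() are hypothetical stand-ins
# for the real LoB and RM system APIs.

def declare_record(document_id, record_class, context):
    """Stub: in practice this would call the RM system's declaration API."""
    print(f"Declared {document_id} as {record_class}: {context}")

def on_process_event(event):
    """Called by the process engine whenever a document changes state."""
    # The process already knows what the document is and where it stands,
    # so no heuristics are needed: a quotation sent to a client is a record.
    if event["doc_type"] == "quotation" and event["stage"] == "sent_to_client":
        declare_record(
            document_id=event["doc_id"],
            record_class="sales/quotations",  # fileplan node
            context={"client_id": event["client_id"], "case": event["case_id"]},
        )

on_process_event({"doc_type": "quotation", "stage": "sent_to_client",
                  "doc_id": "DOC-42", "client_id": "C-1001", "case_id": "Q-77"})
```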

The next point I want to pick up is the issue of Cloud. I think cloud is a red herring in this conversation. Cloud should be an architecture/infrastructure and procurement/licensing decision, not a functional one. Most large ECM/RM vendors can offer similar functionality hosted on- and off-premises, and offer SaaS payment terms rather than perpetual licensing. The cloud conversation around RM, however, gets into its own sticky mess when you start looking at guaranteeing location-specific storage (a critical issue for a lot of European data protection and privacy regulation) and at the integration between on-premises and off-premises systems (as in the auto-declaration examples above). I don’t believe that auto-classification is a significant factor in the cloud decision-making process.

Finally, I wanted to bring another element to this discussion. There is another disruptive RM trend that is not explicit in Cheryl’s article (but fits under her point #1), and it addresses the third RM requirement above: “in-place” retention management. If you extract retention schedule management from the RM tool and architect it at a higher logical level, then retention and disposition can be orchestrated across multiple RM repositories, applications, collaboration environments and even file systems, without the need to relocate the content into a dedicated traditional RM environment. It’s early days (and probably a step too far, culturally, for most RM practitioners), but the huge volumes of currently unmanaged information are becoming a key driver for this approach. We had some interesting discussions at the IRMS conference this year (triggered partly by IBM’s recent acquisition of StoredIQ into its Information Lifecycle Governance portfolio) and James Lappin (@JamesLappin) covered the concept in his recent blog: The Mechanics of Manage-In-Place Records Management Tools. Well worth a read…
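To illustrate the architectural shift, here is a minimal, hypothetical sketch of retention orchestrated at that higher logical level: one schedule, several repository connectors, and disposition executed where the content already lives. The connector interface and the seven-year rule are illustrative assumptions, not any product’s API:

```python
# An "in-place" retention sketch: one policy engine, many repositories.
# The connector interface and retention rule are illustrative assumptions.
from datetime import datetime, timedelta

RETENTION = {"invoices": timedelta(days=7 * 365)}  # schedule per record class

class FileShareConnector:
    """One of many connectors (ECM, SharePoint, mail archive, file system...)."""
    def items(self):
        # Yield (item_id, record_class, created) without relocating content.
        yield ("//share/finance/inv-2004-001.pdf", "invoices",
               datetime(2004, 5, 1))

    def dispose(self, item_id):
        print(f"Disposed in place: {item_id}")

def run_disposition(connectors, today=None):
    today = today or datetime.now()
    for connector in connectors:
        for item_id, record_class, created in connector.items():
            limit = RETENTION.get(record_class)
            if limit and created + limit < today:
                connector.dispose(item_id)  # content never left its repository

run_disposition([FileShareConnector()])
```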

So, to summarise my points: RM is a composite requirement; auto-categorisation is useful and starting to become legitimate, but even though it can participate, it should not be confused with auto-declaration of records; and “Cloud” is not a functional decision, it’s an architectural and commercial one.


I buy, sell, market, service… When did ECM become a Monte Carlo celeb?

I am writing this at 40,000 feet, on a morning flight to Nice, final destination Monte Carlo, for what promises to be a very busy 4-day event. The European leg of IBM’s Smarter Commerce Global Summit runs from 17-20 June at the Grimaldi Forum in Monaco, and in a strange twist of fate I am neither a speaker nor an attendee. I am staff!

The whole event is structured around the four commerce pillars of IBM’s Smarter Commerce cycle: Buy, Sell, Market and Service. Each pillar represents a separate logical track at the event, covering the software, services and customer stories.

Enough with the corporate promo already, I hear you say; where does Enterprise Content Management come into this? Surely Smarter Commerce is all about retail, transactional systems, procurement, supply chain, CRM and marketing campaign tools?

Yes and no. It’s true that in the fast-moving, high-volume commercial transaction world, these tools share the limelight. But behind every new promotion, there is a marketing campaign review; behind every supplier and distributor channel, there is a contract negotiation; behind every financial transaction there is compliance; behind every customer complaint there is a call centre; and behind every customer loyalty scheme, there is an application form: ECM underpins every aspect of Commerce. From the first approach to a new supplier to the friendly resolution of a loyal customer’s problem, there is a trail of communication and interaction that needs to be controlled, managed, secured and preserved. Sometimes paper-based, but mostly electronic.

ECM participates in all the commerce cycles: Buy (think procurement contracts, supplier purchase orders and correspondence), Sell (invoices, catalogues, receipts, product packaging, etc.) and Market (collateral review & approval, promotion compliance, market analysis, etc.).

But the Service cycle is where ECM makes its strongest contribution, and its role goes well beyond providing a secure repository for archiving invoices and compliance documents: the quality, speed and efficiency of customer service rely on understanding your customer. They rely on knowing what communication you have previously had with your customer or supplier (regardless of the channel they chose), on understanding their sentiment about your products, and on anticipating and quickly resolving their requests and problems.

As a long-standing ECM advocate, I have had the privilege of leading the Service track content at this year’s IBM Smarter Commerce Global Summit in Monaco. A roller-coaster two-month process, during which we assembled over 250 breakout sessions for the event, covering all topics related to the commerce cycles, and in particular customer service: Advanced Case Management for handling complaints and fraud investigations; Content Analytics for sentiment analysis on social media; mobile interaction monitoring to optimise the user’s experience; a channel-independent 360-degree view of customer interaction; digitising patient records to minimise hospital waiting times; paperless, on-line billing; collaboration tools to maximise the responsiveness of support staff; and many more.

A global panel of speakers, with a common goal: putting the customer at the very centre of the commercial process and offering the best possible experience with the most efficient tools.

More comments after the event…

Content Obesity – Part 2: Treatment

(…continued from Content Obesity – Part 1: Diagnosis)

You can’t, and don’t want to, stop data growth.

The growth of digital volume has been instrumental in driving major operational and cultural change in today’s business. Better, more personalised customer interaction; insight from Big Data analytics; social media and collaboration; effective training and multi-media marketing: all rely on the flow of much higher volumes of information through the organisation. Not taking advantage of this would make your organisation less competitive.

So, if reducing the volume of data being consumed is not an option, how else can you manage Content Obesity? There are two approaches to this:

Managing the symptoms

There are some key technologies that help alleviate some of the symptoms of content obesity. These, in our human analogy, are the equivalent of liposuction and nip-and-tuck.

  • De-duplication can identify and remove multiple copies of identical documents. It is only effective if you can apply it across all your document stores (ECM systems, records management, shared file drives, personal file drives, SharePoint, email servers, etc.). This rarely happens, and when it does, it is usually restricted to one or two of these sources and focuses only on files, not structured data. (A minimal hashing sketch follows this list.)
  • Archiving and tiered storage – Being able to select the most appropriate storage type for archived data can have a positive impact on reducing storage costs. Not everything needs to be stored on expensive high-availability devices. A lot of the organisation’s data can sit on lower-cost equipment that can be restored from backups in hours or days, rather than instantly. But how do you decide which information goes where? Most organisations will use the expensive high-availability storage for core systems, regardless of the age or significance of the data stored by those systems, as there is no easy way to apply policies at a granular level. There is certainly no way to map those logical “shared” network drives, where the majority of documents are stored, to tiered storage.
  • Compression – There are storage systems that use very sophisticated algorithms to reduce the physical space required, compressing data when it is stored and de-compressing it when it needs to be used. These are also expensive, and require additional computing power to maintain reasonable speeds in the compression and de-compression process.
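As promised under the de-duplication bullet, a minimal sketch of cross-store duplicate detection by content hash; the paths are hypothetical, and note that this only catches byte-identical copies, whereas commercial tools also match near-duplicates:

```python
# A minimal de-duplication sketch: hash every file across several stores
# and report byte-identical copies. The paths are hypothetical examples.
import hashlib
from pathlib import Path

def find_duplicates(roots):
    """Map SHA-256 digest -> list of paths holding identical content."""
    seen = {}
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                seen.setdefault(digest, []).append(path)
    return {d: paths for d, paths in seen.items() if len(paths) > 1}

# Keep the first copy, report the rest as candidates for removal.
for digest, paths in find_duplicates(["/shares/finance", "/shares/legal"]).items():
    print(f"{len(paths) - 1} duplicate(s) of {paths[0]}")
```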

All of these techniques offer some relief, but the relief is marginal if it is not driven by a unified policy, and they do not address the fundamental issue: whilst they temporarily reduce the impact of storage costs, they do not curb the information growth rate.

They also do not address any of the compliance or legal risks associated with content obesity: the same logical volume of data needs to be preserved, analysed and delivered for litigation, and the same effort is required to manually manage the multiple retention policies and respond to regulatory challenges.

Treating the disease

In order to properly resolve content obesity, we need to consider the organisation’s metabolism: how quickly information is digested, which nutrients (value) can be extracted from content, and how the organisation disposes of the waste.

The key question to ask is: “How much of this content do organisations actually need to keep?” Discussions with our customers indicate that, on average, 70% of all retained data is obsolete! (The actual number will vary by organisation, but I’ll use the 70%/30% split for the purposes of this article.) This represents information that is duplicated, outdated, irrelevant or of no business value, or content that can be readily obtained or reproduced from other sources.

The problem, however, is that nobody within the organisation knows which 70% of the data is obsolete, so nobody has the knowledge, or the authority, to allow that content to be deleted. The criteria for identifying which information makes up that 70% are virtually impossible to determine systemically.

A more drastic, and more realistic, approach is required to provide a permanent solution to the problem.

The concept behind treating Content Obesity is simple: if, and only if, the organisation were able to identify the 30% of information which it needs to keep, then, by definition, any information that falls outside that could be legitimately deleted.

If this level of content metabolism could be controlled automatically, regularly and effectively, it would free up critical IT storage resources and the corresponding budget, which could be invested in growth projects instead.

What organisations need is the equivalent of a thyroid gland: a centralised Information Lifecycle Governance mechanism that monitors all the different retention requirements, regulates the content metabolism and drives a digestive system that extracts the value from the content and disposes of the waste. Most organisations do not have such a regulating organ, or function, at all.

Sounds simple enough, but how can you create a centralised policy that determines precisely which 30% of the content needs to be preserved?

Studies conducted by the CGOC (Compliance, Governance and Oversight Council) have shown that there are only three key reasons why companies need to preserve data for any length of time (a minimal decision sketch follows the list):

  • Regulatory obligation – controlled by Records Managers
  • Litigation – controlled by the Legal department
  • Business Utility – controlled by each business function or department.
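As flagged above, a minimal sketch of that test: content is retained only while at least one of the three reasons applies, and an empty answer means it is legitimately disposable. The policy lookups are hypothetical stand-ins for the unified policy engine described below:

```python
# A minimal retention-test sketch: keep only while a reason applies.
# The three policy sources are hypothetical stand-ins for real systems.
def retention_reasons(item, regulatory_classes, legal_hold_matters, business_needs):
    reasons = []
    if item["record_class"] in regulatory_classes:        # Records Managers
        reasons.append("regulatory obligation")
    if item["matter_ids"] & legal_hold_matters:           # Legal department
        reasons.append("litigation hold")
    if business_needs.get(item["owner_dept"]):            # Business utility
        reasons.append("business utility")
    return reasons  # an empty list means legitimately deletable

item = {"record_class": "stale-quotes", "matter_ids": set(), "owner_dept": "sales"}
print(retention_reasons(item, {"invoices"}, set(), {"sales": False}) or "dispose")
```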

These are the three groups in the organisation that are responsible for the metabolic rate of content. Yet these groups rarely connect with each other, do not use the same terminology and, certainly, have never had common policies and control mechanisms that they can communicate to IT. The legal group issues data preservation orders (legal holds) to custodians. Records Managers define taxonomies, fileplans and retention schedules, and task the business with abiding by them. Business functions have more important things to do (like… keeping the business running) and, frankly, don’t have much appetite for understanding, let alone complying with, either legal hold orders or retention schedules. Business functions need the correct information to be available to them, at the right time, to make decisions and to service their customers.

And who has the responsibility to physically protect, or to destroy, digital information? The IT group, which is not usually part of any of the conversations above.

At the heart of an Information Lifecycle Governance function is a unified policy engine: a common logical repository where Records Managers can document, manage and communicate their multiple retention schedules and produce consolidated fileplans; where the Legal group can manage its ongoing legal matters, issue legal hold and preservation orders and communicate with custodians and the other parts of the business; and where IT and the business functions can identify and document which information is stored in each device and each application, along with the business requirements for information preservation. A place where all of these disparate groups can determine the value that each information asset brings to the business – for both structured and unstructured information.

Once this thyroid function is established to control the content metabolism, it is key to connect it to the mechanisms that physically manage information – the “organs”. Connecting the policy engine to the document collection tools and repositories, records management systems, structured data archives, eDiscovery tools, tiered storage archives, etc., provides the instrumentation needed to monitor data growth, execute the policies and provide the auditability and defensibility required to justify regular content purging.

Conclusion

There is no quick fix for Content Obesity and, like medical obesity, it requires a fundamental change in behaviour. But it is achievable. Organisations need to design a governance model that transparently joins the dots: the business needs to describe information entities based on their value and utility, mapping them to the asset, system and application descriptions that IT understands. Legal can then manage its legal holds and eDiscovery based on knowing what information exists, what part of the business it relates to and where it lives, not only who its custodians are. Compliance groups can then consolidate their records management directives and apply a unified taxonomy and disposition schedule, relevant to each territory and business function. When all of these policies are systematically connected to the data sources, IT can accurately identify what information should be preserved and, by definition, what information can be justifiably disposed of. (IBM calls this process Defensible Disposal.)

Content Obesity – Part 1: Diagnosis

Obesity: a medical condition in which excess body fat has accumulated to the extent that it may have an adverse effect on health, leading to reduced life expectancy and/or increased health problems

Content Obesity: An organisational condition in which excess redundant information has accumulated to the extent that it may have an adverse effect on business efficiency, leading to depleted budgets, reduced business agility and/or increased legal and compliance risks.

First of all, let me apologise to all the people who are currently suffering from obesity, or who are supporting friends and family that do. I have no intention of making fun of obese people and I have great sympathy and respect for the pain they are going through. I lost my best friend to a heart attack. He was obese.

In a recent conversation with a colleague about Information Lifecycle Governance and Defensible Disposal, I made a casual remark about an organisation suffering from Content Obesity. I have to admit that it was an off-the-cuff remark, but it conveyed very succinctly the picture I was trying to paint. Since then, the more I think about the analogy, the more sense it makes.

People are not born obese, they become obese. And they don’t become obese overnight; it’s a slow, steady process. Unless it’s addressed early, the problem grows in very predictable stages: gaining weight, being overweight, being obese, being morbidly obese, dying. Most people, however, do not want to acknowledge the problem. They live in denial, they make excuses, they make jokes. Until it is too late to reverse the process.

Organisations consume and generate content at an incredible rate: IDC’s Digital Universe study (2011) predicts an information growth factor of 50x between 2010 and 2020. To give that figure some context: if an average grown-up person were to grow at the same rate, they would weigh 3.5 tons by 2020! Studies we conducted with our own customers put the annual growth rate at a slightly more conservative 35-40% per year, which is still significant.
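A quick back-of-envelope check on those figures (my arithmetic, not IDC’s): a 50-fold rise over the ten years from 2010 to 2020 implies roughly 48% compound growth per year, which puts our customers’ measured 35-40% in the same ballpark:

```python
# Back-of-envelope: the annual growth rate implied by 50x over 10 years.
implied_cagr = 50 ** (1 / 10) - 1
print(f"Implied annual growth: {implied_cagr:.0%}")  # roughly 48% per year
```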

We love our digital content these days, we can’t get enough!

We all create office files and our presentations are growing larger, our email rate is not slowing down (we have several accounts each), we communicate with our customers electronically more than ever before, we collaborate inside and outside the firewall, we engage in social media, we text, we document life with our mobile phones’ cameras and we use YouTube videos extensively for marketing and education. We collect and analyse blogs and conferences and twitter streams. We analyse historical transactional data and we create new predictive ones. And if collecting our own streams is not enough, we also collect those of our competitors so that we can analyse them too. Our electricity meter collects data, our car collects data, our traffic sensors collect data, our mobile phones collect data, our supermarkets collect data. We have an average of two game consoles per family (all of which connect to the internet), we watch high-definition TV, from every fixed or portable device that has a screen, our kids have mobile phones, and PSPs and DSs and laptops. We have our home computer, our work laptop, our BYOD tablet and our smart phones. Our average holiday yields over 500 pictures, all of which are 12 Megapixel. And the kids take another 500 with their camera… In fact we generate so much digital data, that we now have special ways of handling it with Big Machines that manage Big Data to give Big Insights. And that is all wonderful, and it all exploded in the last five years.

I’ll say it again: We love digital content.

Going back to my health analogy, you could say that we gorge on content. The problem is, we are now overweight with content, since most of it has been accumulated without any particular thought of organisation or governance. So today we can’t lose weight; we can’t clean it up, because IT doesn’t know what it is, where it is, who owns it or whether it’s of any use to anyone. And, frankly, because it’s far too much hassle and we have better things to do. It’s all digital, so… “storage is cheap, we’ll just buy some more storage”: a staggering 78% of respondents to another recent study stated that their strategy for dealing with data growth was to “buy more storage”!

Newsflash: storage is not cheap! By the time you create your high-availability, tier-1 storage with three generations of backup tapes, put it in a data centre, pay for electricity and air-conditioning, and pay people to manage it, it is no longer cheap. Even if storage prices fall by 20% per year, with your data growing at 40% your bill still rises by roughly 12% every year (1.4 × 0.8 = 1.12), compounding relentlessly… Simple maths!
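For the sceptics, a two-line sketch of that compounding; the 40% growth and 20% price decline are the assumptions from the paragraph above:

```python
# Compounding a 40% volume increase against a 20% unit-price decline.
cost = 1.0
for year in range(1, 6):
    cost *= 1.40 * 0.80  # bill multiplier: volume up 40%, price/TB down 20%
    print(f"Year {year}: storage bill x{cost:.2f}")
# Year 1 is x1.12; by year 5 the bill has still grown by ~76%.
```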

Most organisations are still in denial about the problem. The usual answer to the question “How much storage do you currently have, and how much does it grow each year?” is “We don’t really know, we never measured it that way”. Well, I would argue that whoever is writing the cheque to the storage vendors every year ought to know.

Fortunately, for large multinational organisations (banks, pharmaceuticals, energy, etc.), the penny has finally dropped. A growth rate of 40% on a storage estate of 20 petabytes translates to tens of millions in additional storage costs per year. In an economy where IT budgets are shrinking, this is not a pleasant conversation to have with your CFO. These organisations have now self-diagnosed as Content Obese, and are desperately looking for ways to curb the growth before they become morbidly obese.

And, similarly to the human disease, Content Obesity has side effects. Even if you could somehow overcome (or overlook, or sweep under the carpet…) the cost implications, it creates huge health risks for the organisation.

Firstly, it creates risks for the business. Unruly, high volumes of content clog up processes, the arteries of the business. Content that is lost in the bulk, uncategorised and not readily available to support decision-making slows down the flow of information across the organisation. Content that is obsolete or outdated can create confusion and lead to incorrect decisions. Unmanaged content volumes do not lend themselves to fast-changing business models, marketing innovation, shared services or better customer support. And by consuming huge amounts of IT capital, they also stifle investment and innovation in new business services.

Secondly, it creates a huge legal risk. All electronic content in the organisation is potentially discoverable. The legal group has a duty to preserve information that is relevant to litigation. When information is abundant and not governed, the only method the legal group has to identify and preserve it is to notify all the people who may have access to it – custodians – asking them to protect it. This approach is inaccurate, expensive and time-consuming. And when it comes to delivering that information to opposing parties or the courts, the organisation has to sift through huge volumes of content to identify what is actually relevant, often incurring huge legal fees in the process. (Unashamed plug: if you are interested in finding out more about the role of information governance in UK civil litigation, I recommend this excellent IBM paper authored by Chris Dale, respected author of the eDisclosure Information Project.)

Finally, Content Obesity creates a huge compliance risk. Different regulations dictate that records are kept for defined periods of time. Privacy and data protection regulations dictate that certain types of content are disposed of after defined periods of time. Records Managers often have to comply with multiple (and often conflicting) regulations, from multiple jurisdictions, affecting hundreds of systems and millions of records. An ever-growing volume of unclassified content means that records cannot be correctly identified, disposition schedules cannot be executed consistently, and policies remain in a binder on the shelf (or in a PDF file somewhere on the intranet). Regulatory audits become impossible, wasting valuable resources and often leading to significant fines. (As one regulator put it, in one of many examples: “These failings were made worse by their inability to determine the areas in which the breakdown in its record keeping systems had occurred”.)

So, how much of that content do organisations actually need to keep? And who has the responsibility and the right to get rid of it?

Next: Content Obesity – Part 2: Treatment
