Data Governance is not about Data

December 1, 2011

Those who have been reading my blog for a while know that I have great objections to the term “unstructured” and the way it has been used to describe all information that is text-based, image-based or in any other format that does not fit directly into the rows and columns of a relational database. None of that “unstructured” content exists without structure inside and around it, and databases have long moved on from storing just “rows and columns”.

A conversation last night with IDC analyst @AlysWoodward (at the excellent IDC EMEA Software Summit in London) prompted me to think about another problem that this distinction has created:

Calling that content “unstructured” is a convention invented by analysts and vendors to distinguish the set of tools required to manage that content from the tools that service the world of databases and BI. The technologies used to manage text-based content and digital media need to be different, as they have a lot of different issues to address.

It has also been a great way of alerting business users that, while they are painstakingly taking care of their precious transactional data, that data only represents about 20% of their IT estate, while all this other “stuff” keeps accumulating uncontrolled and unmanaged on servers, C: drives, email servers, etc.

These artificial distinctions, however, are only relevant when you consider HOW you manage that information: the tools and the technologies. They are not relevant when you are trying to understand WHAT business information you hold and need as an organisation, WHY you are holding it and what policies need to be applied to it, or WHO is responsible for it. The scanned image of an invoice is subject to the same retention requirements as the row-level data extracted from it; the Data Protection Act does not give different privacy rules for emails and for client records kept in your CRM system; a regulatory audit scrutinising executive decisions will not care if the decisions are backed by a policy document or a BI query; you can’t have one group of people deciding on security policies for confidential information on your ERP system and another group deciding for the product manufacturing instructions held in a document library.

“Data Governance” (or “Information Governance”, or “Content Governance”, I’ve seen all of these terms used) is not an IT discipline, it’s a business requirement. It does not only apply to the data held in databases and data warehouses, it applies to all information you manage as an organisation, regardless of location, format, origin or medium. As a business, you need to understand what information you hold about your customers, your suppliers, your products, your employees. You need to understand where that information lives and where you would go to find it. You need to understand who is responsible for managing it and making sure it’s secure, and who has the right to decide that you can get rid of it. All of this holds regardless of whether that information lives in a “structured” or “unstructured” medium, and regardless of the tools or technologies that are needed to implement these governance policies.

The Data Governance Council has developed an excellent maturity model for assessing how far your organisation has moved in understanding and implementing Data Governance. It covers areas such as “Stewardship”, “Policy”, “Data Risk management”, “Value Creation”, “Information Lifecycle Management”, “Security”, “Metadata”, etc. All of these disciplines are just as relevant in taking control of the data in your databases as they are for managing the files on your shared drives, your content repositories and the emails on your servers.

I seriously believe that by propagating this artificial divide between “data” and “content”, we are creating policy silos that not only minimise the opportunity for getting value out of our information, but also introduce further risks through gaps and inconsistencies. We may have to use different tools to implement these governance controls on different media, but the business should have ONE consistent governance scheme for all its information.
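For the more technically minded, here is a minimal sketch of what “one governance scheme, many tools” could look like. Everything in it is an assumption made up for illustration: the class names, the owners and the retention figures are hypothetical and do not come from any real product or regulation. The only point is that the policy is looked up by WHAT the information is, never by HOW it is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    owner: str            # WHO is accountable (hypothetical role name)
    retention_years: int  # illustrative figure only, not legal guidance
    confidential: bool


@dataclass(frozen=True)
class Asset:
    name: str
    info_class: str  # WHAT the information is, e.g. "invoice"
    medium: str      # HOW it is stored, e.g. "database row", "scanned image"


# One policy per class of business information, never per storage medium.
POLICIES = {
    "invoice": Policy(owner="Finance", retention_years=7, confidential=False),
    "client_record": Policy(owner="Customer Services", retention_years=6, confidential=True),
}


def policy_for(asset: Asset) -> Policy:
    """Look up the policy by information class only; the medium is ignored."""
    return POLICIES[asset.info_class]


scanned = Asset("invoice-0001.tiff", "invoice", "scanned image")
row = Asset("invoices table, row 1", "invoice", "database row")

# The scanned image and the extracted row resolve to the very same policy.
assert policy_for(scanned) == policy_for(row)
```

The different tools (a records management system, a database archiving job) would each enforce the policy in their own way, but they would all read it from the same place.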

Open to your thoughts and suggestions, as always!

Are Content Analytics turning the grubby ECM worm into a butterfly?

Colleagues who have known me for a while have all heard me bemoaning the use of the term “unstructured” to describe text-based content. Without boring you to tears again, my main issue is that the ECM industry has largely been treating content files as amorphous “unstructured” blobs, ignoring the rich value that is locked inside these content objects.

For the last twenty years or so, ECM systems have been providing a cocoon, where documents and media files have been stored, preserved, secured, archived and generally left to their own devices. But we have been focusing on protecting the whole container, the box, based on the label on the outside, and only looking inside one box at a time.

There is change afoot! 2010 looks set to be the year of Content Analytics, which promises to finally unlock the value that is locked inside our gigantic festering ECM repositories. And if the early signs of success of IBM’s new Content Analytics software are anything to go by, we are starting to witness a fundamental transformation in the way content is leveraged in large organisations.

Much in the same way that Data Warehousing and Business Intelligence transformed the bland data storage provided by databases in the mid-90s, Content Analytics is today bringing natural language processing, trend analysis, contextual discovery and predictive analytics to the “unstructured” world.

Purists will argue that these algorithms are not new and, to a certain extent, that is true. However, this is the first time that we are seeing these technologies applied easily (i.e. with off-the-shelf products, without the need for a PhD statistician or linguist by your side…) in real commercial applications, to solve real business problems: car manufacturers avoiding recalls with early fault trend analysis; pharmaceutical companies recognising equipment failure trends much earlier; large multi-nationals saving millions in litigation fees, etc.

The ECM industry may still be thriving, but in terms of innovation it has reached a plateau that makes most of us uncomfortable (or complacent… depending on your point of view). Basic content management functionality is being commoditised, with CMIS, open source and SharePoint leading the charge. There’s nothing wrong with that, it’s the natural maturity curve for any 20-year-old technology sector. We’ve created a very big ECM cocoon and we’ve filled it to the brim with content worms. It’s time to innovate again!

Making no apologies for the crass analogy (it is March after all and, allegedly, spring is coming…), Content Analytics are starting to finally poke the cocoon, making the value of content slowly emerge, transformed from archived fodder into real business insight.
