Data Governance is not about Data
Those who have been reading my blog for a while know that I have strong objections to the term “unstructured” and the way it has been used to describe all information that is text-based, image-based or in any other format that does not fit neatly into the rows and columns of a relational database. None of that “unstructured” content exists without structure inside and around it, and databases have long since moved beyond storing just “rows and columns”.
A conversation last night with IDC analyst @AlysWoodward (at the excellent IDC EMEA Software Summit in London) prompted me to think about another problem that this distinction has created.
Calling that content “unstructured” is a convention invented by analysts and vendors to distinguish the tools required to manage that content from the tools that serve the world of databases and BI. The technologies used to manage text-based content and digital media do need to be different, because they have a very different set of issues to address.
It has also been a great way of alerting business users that, while they painstakingly take care of their precious transactional data, that data represents only about 20% of their IT estate, and all the other “stuff” keeps accumulating, uncontrolled and unmanaged, on file servers, C: drives, email servers, etc.
These artificial distinctions, however, are only relevant when you consider HOW you manage that information: the tools and the technologies. They are not relevant when you are trying to understand WHAT business information you hold and need as an organisation, WHY you hold it and what policies need to be applied to it, or WHO is responsible for it. The scanned image of an invoice is subject to the same retention requirements as the row-level data extracted from it; the Data Protection Act does not set different privacy rules for emails and for client records kept in your CRM system; a regulatory audit scrutinising executive decisions will not care whether those decisions are backed by a policy document or a BI query; and you can’t have one group of people deciding security policies for confidential information in your ERP system and another group deciding them for the product manufacturing instructions held in a document library.
“Data Governance” (or “Information Governance”, or “Content Governance”; I’ve seen all of these terms used) is not an IT discipline; it’s a business requirement. It does not apply only to the data held in databases and data warehouses; it applies to all the information you manage as an organisation, regardless of location, format, origin or medium. As a business, you need to understand what information you hold about your customers, your suppliers, your products and your employees. You need to understand where that information lives and where you would go to find it. You need to understand who is responsible for managing it and keeping it secure, and who has the right to decide that you can get rid of it. This holds regardless of whether that information lives in a “structured” or an “unstructured” medium, and regardless of the tools or technologies needed to implement these governance policies.
The Data Governance Council has developed an excellent maturity model for assessing how far your organisation has progressed in understanding and implementing Data Governance. It covers areas such as “Stewardship”, “Policy”, “Data Risk Management”, “Value Creation”, “Information Lifecycle Management”, “Security”, “Metadata”, etc. All of these disciplines are just as relevant to taking control of the data in your databases as they are to managing the files on your shared drives, your content repositories and the emails on your servers.
I seriously believe that by propagating this artificial divide between “data” and “content”, we are creating policy silos that not only limit the opportunity to get value out of our information but also introduce further risk through gaps and inconsistencies. We may have to use different tools to implement these governance controls on different media, but the business should have ONE consistent governance scheme for all its information.
Open to your thoughts and suggestions, as always!