Every so often, an idea comes along that stops you in your tracks.
Innovation is happening at the speed of light all around us, but most of the time it consists only of incremental, evolutionary thinking, which takes us a little bit further in the same direction we were going all along. We have become fairly blasé about innovation.
And then you spot something that makes you sit up, pay attention, change direction, and re-think everything. I had one of these moments a few weeks back.
The name “EpyDoc” will probably mean nothing to most of you. Even looking at their existing website I would have dismissed it as a second or third-rate Document Management wannabe. Yet, EpyDoc is launching a new concept in April, that potentially re-defines the whole Data / Content / Information / Process Management industry, as we know it today. You know what happens when you mix comets and dinosaurs? It is that revolutionary.
I have lost track of the number of times over the years that I’ve moaned about the constraints that our current infrastructure is imposing on us:
- The arbitrary segregation of structured and unstructured information [here]
- The inherent synergy of Content and Process management [here]
- The content granularity that stops at the file level [here]
- The security models that protect the container rather than the information [here]
- The lack of governance and lifecycle management of all information, not just records [here]
- The impossibility of defining and predicting information value [here]
…etc. The list goes on. EpyDoc’s “Information Operating System” (a grand, but totally appropriate title), seeks to remove all of these barriers by re-thinking the way we manage information today. Not in small incremental steps, but in a giant leap.
Their approach is so fundamentally different, that I would not do it justice by trying to summarise it here. And if I’m honest, I am still discovering more details behind it. But if you are interested in having a taste of what the future of information management might look like in 5-10 years, I would urge you to read this 10-segment blog set which sets the scene, and let me know your thoughts.
And if, while you are reading through, you are, like me, sceptical about the applicability or commercial viability of this approach, I will leave you with a quote that I saw this morning on the tube:
“The horse is here to stay but the automobile is only a novelty – a fad”
(President of the Michigan Savings Bank, 1903)
P.S. Before my pedant friends start correcting me: I know that dinosaurs became extinct at the end of the Cretaceous period, not the Jurassic… 😉
I’ve been wanting to write this article for a while, but I thought it would be best to wait for the deluge of 2014 New Year predictions to settle down, before I try and look a little bit further towards the horizon.
The six predictions I discuss here are personal, do not have a specific timescale, and are certainly not based on any scientific method. What they are based on, is a strong gut feel and thirty years of observing change in the Information Management industry.
Some of these predictions are more fundamental than others. Some will have immediate impact (1-3 years), some will have longer term repercussions (10+ years). In the past, I have been very good at predicting what is going to happen, but really bad at estimating when it’s going to happen. I tend to overestimate the speed at which our market moves. So here goes…
Behaviour is the new currency
Forget what you’ve heard about “information being the new currency”, that is old hat. We have been trading in information, in its raw form, for years. Extracting meaningful value from this information, however, has always been hard, repetitive, expensive and most often a hit-or-miss operation. I predict that with the advance of analytics capabilities (see Watson Cognitive), raw information will have little trading value. Information will be traded already analysed, and nowhere more so than in the area of customer behaviour. Understanding of lifestyle-models, spending-patterns and decision-making behaviour, will become the new currency exchanged between suppliers. Not the basic high-level, over-simplified, demographic segmentation that we use today, but a deep behavioural understanding of individual consumers that will allow real-time, predictive and personal targeting. Most of the information is already being captured today, so it’s a question of refining the psychological, sociological and commercial models around it. Think of it this way: How come Google and Amazon know (instantly!) more about my online interactions with a particular retailer, than the retailer’s own customer service call centre? Does the frequency of logging into online banking indicate that I am very diligent in managing my finances, or that I am in financial trouble? Does my Facebook status reflect my frustration with my job, or my euphoric pride in my daughter’s achievement? How will that determine if I decide to buy that other lens I have been looking at for my camera, or not? Scary as the prospect may be, from a personal privacy perspective, most of that information is in the public domain already. What is the digested form of that information, worth to a retailer?
Security models will turn inside out
Today most security systems, algorithms and analysis, are focused on the device and its environments. Be it the network, the laptop, the smartphone or the ECM system, security models are there to protect the container, not the content. This has not only become a cat-and-mouse game between fraudsters and security vendors, but it is also becoming virtually impossible to enforce at enterprise IT level. With BYOD, a proliferation of passwords and authentication systems, cloud file-sharing, and social media, users are opening up security holes faster than the IT department can close them. Information leakage is an inevitable consequence. I can foresee the whole information security model turning on its head: If the appropriate security becomes deeply embedded inside the information (down to the file, paragraph or even individual word level), we will start seeing self-describing and self-protecting granular information that will only be accessible to an authenticated individual, regardless of whether that information is in a repository, on a file-system, on the cloud, at rest or in transit. Security protection will become device-agnostic and infrastructure-agnostic. It will become a negotiating handshake between the information itself and the individual accessing that information, at a particular point in time.
Oh, and while we are assigning security at this granular self-contained level, we might as well transfer retention and classification to the same level as well.
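To make the idea a little more concrete, here is a minimal Python sketch of information that carries its own access and retention policy at paragraph level. Everything in it (the `Segment` class, the role names, the `readable_view` function) is my own illustrative assumption, not a description of any real product. A genuine implementation would also need encryption and proper authentication, so treat this purely as a picture of "policy travels with the content".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One granular piece of information carrying its own policy (hypothetical)."""
    text: str
    allowed_roles: frozenset  # assumed: roles permitted to read this segment
    retention_years: int      # retention travels with the segment as well


def readable_view(segments, user_roles):
    """Return only the segments this user may read, wherever they are stored."""
    roles = set(user_roles)
    return [s.text for s in segments if roles & s.allowed_roles]


document = [
    Segment("Quarterly summary for all staff.", frozenset({"staff", "finance"}), 3),
    Segment("Unreleased revenue figures.", frozenset({"finance"}), 7),
]

print(readable_view(document, ["staff"]))
print(readable_view(document, ["finance"]))
```

The point of the sketch is that the decision is made per segment and per individual, with no reference to the device or repository holding the document.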
The File is dead
In a way, this prediction follows on from the previous one and it’s also a prerequisite for it. It is also a topic I have discussed before [Is it a record, who cares?]. Information Management, and in particular Content Management, has long been constrained by the notion of the digital file. The file has always been the singular granular entity, at which security, classification, version control, transportation, retention and all other governance stops. Even relational databases ultimately live in files, because that’s what Operating Systems have to manage. However, information granularity does not stop at the file level. There is structure within files, and a lot of information that lives outside the realm of files (particularly in social media and streams). If Information Management is a living organism (and I believe it is), then files are its organs. But each organ has cells, each cell has molecules, and there are atoms within those molecules. I believe that innovation in Information Management will grow exponentially the moment that we stop looking at managing files and start looking at elementary information entities or segments at a much more granular level. That will allow security to be embedded at a logical information level; value to grow exponentially through intelligent re-use; storage costs to be reduced dramatically through entity-level de-duplication; and analytics to explode through much faster and more intelligent classification. File is an arbitrary container that creates bottlenecks, unnecessary restrictions and a very coarse level of granularity. Death to the file!
BYOD is just a temporary aberration
BYOD is just a transitional phase we’re going through today. The notion of bringing ANY device to work is already becoming outdated. “Bring Work to Your Device” would have been a more appropriate phrase, but then BWYD is a really terrible acronym. Today, I can access most of the information I need for my work, through mobile apps and web browsers. That means I can potentially use smart phones, tablets, the browser on my smart television, or the Wii console at home, or my son’s PSP game device to access work information. As soon as I buy a new camera with Android on it, I will also be able to access work on my camera. Or my car’s GPS screen. Or my fridge. Are IT organisations going to provide BYOD policies for all these devices where I will have to commit, for example, that “if I am using that device for work I shall not allow any other person, including family members, to access that device”? I don’t think so. The notion of BYOD is already becoming irrelevant. It is time to accept that work is no longer tied to ANY device and that work could potentially be accessed on EVERY device. And that is another reason, why information security and governance should be applied to the information, not to the device. The form of the device is irrelevant, and there will never be a 1:1 relationship between work and devices again.
It’s not your cloud, it’s everyone’s cloud
Cloud storage is a reality, but sharing cloud-level resources is yet to come. All we have achieved is to move the information storage outside the data centre. Think of this very simple example: Let’s say I subscribe to Gartner, or AIIM and I have just downloaded a new report or white paper to read. I find it interesting and I share it with some colleagues, and (if I have the right to) with some customers through email. There is every probability that I have created a dozen instances of that report, most of which will end up being stored or backed up in a cloud service somewhere. Quite likely on the same infrastructure where I downloaded the original paper from. And so will many others who have downloaded the same paper. This is madness! Yes, it’s true that I should have been sending out the link to that paper to everyone else, but frankly that would force everyone to have to create accounts, etc. etc. and it’s so much easier to attach it to an email, and I’m too busy. Now, turn this scenario on its head: What if the cloud infrastructure itself could recognise that the original of that white paper is already available on the cloud, and transparently maintain the referential integrity, security, and audit trail, of a link to the original? This is effectively cloud-level, internet-wide de-duplication. Resource sharing. Combine this with the information granularity mentioned above, and you have massive storage reduction, cloud capacity increase, simpler big-data analytics and an enormous amount of statistical audit-trail material available, to analyse user behaviour and information value.
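For the technically minded, the white paper scenario can be sketched in a few lines of Python using content hashing, which is one common way to detect identical content. This is a toy under my own assumptions: the `SharedStore` class and its methods are hypothetical names, and a real cloud service would add security, access control and a far richer audit trail.

```python
import hashlib


class SharedStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}   # digest -> content bytes
        self.links = []   # simple audit trail of (owner, digest) references

    def put(self, owner, content):
        digest = hashlib.sha256(content).hexdigest()
        # Store the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, content)
        # Every share is recorded as a reference to the original.
        self.links.append((owner, digest))
        return digest

    def get(self, digest):
        return self.blobs[digest]


store = SharedStore()
paper = b"White paper: the future of information management"
for person in ["me", "colleague", "customer"]:
    store.put(person, paper)

print(len(store.blobs), "stored copy,", len(store.links), "references")
```

Three people "attach" the paper, yet only one copy is held, and the list of references is exactly the kind of audit-trail material that could later be analysed for user behaviour and information value.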
The IT organisation becomes irrelevant
The IT organisation as we know it today, is arguably the most critical function and the single largest investment drain in most organisations. You don’t have to go far to see examples of the criticality of the IT function and the dependency of an organisation on IT service levels. Just look at the recent impact that simple IT malfunctions have had on banking operations in the UK [Lloyds Group apologies for IT glitch]. My prediction however, is that this mega-critical organisation called IT, will collapse in the next few years. A large IT group – as a function, whether it’s outsourced or not – is becoming an irrelevant anachronism, and here’s why: 1) IT no longer controls the end-user infrastructure, that battle is already lost to BYOD. The procurement, deployment and disposition of user assets is no longer an IT function, it has moved to the individual users who have become a lot more tech-savvy and self-reliant than they were 10 or 20 years ago. 2) IT no longer controls the server infrastructure: With the move to cloud and SaaS (or its many variants: IaaS, PaaS, etc.), keeping the lights on, the servers cool, the backups running and the cables networked will soon cease to be a function of the IT organisation too. 3) IT no longer controls the application infrastructure: Business functions are buying capabilities directly at the solution level, often as apps, and these departments are maintaining their own relationships with IT vendors. CMOs, CHROs, CSOs, etc. are the new IT buyers. So, what’s left for the traditional IT organisation to do? Very little else. I can foresee that IT will become an ancillary coordinating function and a governance body. Its role will be to advise the business and define policy, and maybe manage some of the vendor relationships. Very much like the role that the Compliance department, or Procurement has today, and certainly not wielding the power and the budget that it currently holds. That, is actually good news for Information Management!
Not because IT is an inhibitor today, but because the responsibility for Information Management will finally move to the business, where it always belonged. That move, in turn, will fuel new IT innovation that is driven directly by business need, without the interim “filter” that IT groups inevitably create today. It will also have a significant impact on the operational side of the business, since groups will have a more immediate and agile access to new IT capabilities that will enable them to service new business models much faster than they can today.
Personally, I would like all of these predictions to come true today. I don’t have a magic wand, and therefore they won’t. But I do believe that some, if not all, of these are inevitable and it’s only a question of time and priority before the landscape of Information Management, as we know today, is fundamentally transformed. And I believe that this inevitable transformation will help to accelerate both innovation and value.
I’m curious to know your views on this. Do you think these predictions are reasonable, or not? Or, perhaps they are a lot of wishful thinking. If you agree with me, how soon do you think they can become a reality? What would stop them? And, what other fundamental changes could be triggered, as a result of these?
I’m looking forward to the debate!
It’s Autumn. The trees are losing their leaves, the nights are getting longer, it’s getting cold and grey and generally miserable. It’s also the time for the annual lament of the Enterprise Content Management industry and ECM… the name that refuses to die!
At least once a year, ECM industry pundits go all depressed and introspect and predict, once again, that our industry is too wide, too narrow, too complex, too simplified, too diverse or too boring and dying or not dying or dead and buried. Once again this year, Laurence Hart (aka Pie), Marko Sillanpää, Daniel Antion, John Mancini and, undoubtedly, several other esteemed colleagues, with a collective experience of several hundred years of ECM on their backs, will try (and fail) to reconcile and rationalize the semantics of one of the most diverse sectors in the software industry.
You will find many interesting points and universal truths about ECM if you follow the links to these articles above. Some I agree with wholeheartedly, some I would take with a pinch of salt.
But let me assure you, concerned reader, that the ECM industry is not going anywhere, the name will not change and we will again be lamenting its demise, next Autumn!
There is a fundamental reason why this industry is so robust and so perplexing: This is not a single industry, or even a single coherent portfolio of products. It’s a complex amalgamation of technologies that co-exist and complement each other, with the only common denominator being an affinity for managing “stuff” that does not fit in a traditional relational database. And every time one of these technologies grows out of favour, another new discipline joins the fold: Documents and emails and archives and repositories and processes and cases and records and images and retention and search and analytics and ETL and media and social and collaboration and folksonomies and cloud, and, and, and… The list, and its history, is long. The reason this whole hotchpotch will continue to be called Enterprise Content Management, is that we don’t have a better collective noun that even vaguely begins to describe what these functions do for the business. And finally, more and more of the market (you know, the real people out there, not us ECM petrolheads…) are starting to recognise the term, however vague, inappropriate and irrational it may be to the purists among us.
And there is one more reason: Content Management is not a technology, it’s an operational discipline. Organisations will manage content with or without ECM products. It’s just faster, cheaper and more consistent if they use tools.
As I said, if you have an academic interest in this ECM industry, the articles above are definitely worth reading. For my part, I would like to add one more thought into that mix:
The word “Enterprise” in “ECM” has been the source of much debate. And whilst I agree with Laurence that originally some of the vendors attempted to promote the idea of a single centralised ECM repository for the whole enterprise, that idea was quickly abandoned in the early ’00s as generally a bad idea. Anyone who has tried to deploy this approach in a real world environment, can give you a dozen reasons why it’s really, really a very naïve idea.
Nevertheless, Content Management has always been, and will always be “Enterprise”, in the sense that it very rarely works as a simple departmental solution. There is very little value in doing that, especially when you combine it with process management, which adds the most value when crossing inter-departmental boundaries. It is also “Enterprise” in the sense that as a platform it can support both vertical and horizontal applications across most parts of an organisation. Finally, there are certain applications of ECM, that can only be deployed as “Enterprise” tools: It would be madness to design Records Management, eMail archiving, eDiscovery or Social collaboration solutions, on a department by department basis. There is no point!
That’s why, in my opinion at least, the term ECM will live for a long time yet… Long Live ECM!
It’s not often that I describe a refrigerator as a taxonomy, so bear with me here… So, you loaded up the car with your grocery shopping, you brought it all into the kitchen from the car, and you are about to load up the fridge. Do you organise your fridge layout based on the “Use By” date of the products? No, nobody does. You put the vegetables in the vegetable drawer, you put the raw meats on a shelf of their own, the yoghurts and the dessert puddings on a separate shelf. The eggs go in the door. You may consider the use-by date as you stack things of the same category, e.g. the fresh chicken will have to be eaten before the sausages which will still last until next week, but that’s incidental, it’s not the primary organisational structure. Your fridge has a taxonomy, a classification scheme, and it is organised functionally, by product class, not by date.
Where am I going with this? Records and retention management (where else?). It’s over four years ago, that I wrote an article called “Is it a record? Who cares!” which created quite a bit of animosity in the RM community, and I quickly had to follow it up with a Part 2 to explain that my original title was quite literal, not sarcastic.
Four years later, I find myself still having very similar conversations with clients and colleagues. The more we move into an era of Information Governance, the more the distinction between records and non-records becomes irrelevant. And the more we move from the world of paper documents to the multi-faceted world of electronic content, the more we need to move away from the “traditional” records management organisational models of retention-based fileplans: The physical management of paper records necessitated their organisation in clusters of documents with similar retention requirements in order to dispose of them, so classification taxonomies (fileplans) were organised around that requirement.
In the digital world, this is no longer a requirement. Retention period, is just another logical attribute (metadata) applied to each individual content piece, not an organisational structure. With the right tools in place, a retention model can be associated with each piece of content individually, and collections of content with the same retention and – more importantly, disposition – periods, can be assembled dynamically as and when required.
For me, there are only two logical questions that drive the classification of digital content: “What is it?” (the type of content, or class) and “What is it for?” (the context under which it has been, or will be used). To use an example: An application form for opening a new account, is a certain type of content which will determine its initial retention period while it’s being processed. If that application is approved or rejected, is context that will further affect its retention period. If the client raises a dispute about his new account, it may further impact that retention period of that application form. This context-driven variance, cannot be supported in a traditional fileplan-based records management system, which permanently fixes the record – fileplan – retention relationship.
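The application form example can be sketched in Python to show what "retention as an attribute driven by class and context" might look like. The class names, the retention periods and the rule that the longest applicable period wins are all my own assumptions for illustration, and measuring retention from the creation date is a simplification. Real schedules come from policy and regulation.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical retention rules in years; real values come from policy.
BASE_RETENTION = {"account_application": 1}
CONTEXT_RETENTION = {"approved": 7, "rejected": 2, "disputed": 10}


@dataclass
class ContentItem:
    content_class: str                         # "What is it?"
    created: date
    context: set = field(default_factory=set)  # "What is it for?"

    def retention_years(self):
        # Start from the class default, then let context extend it.
        years = BASE_RETENTION[self.content_class]
        for event in self.context:
            years = max(years, CONTEXT_RETENTION.get(event, 0))
        return years

    def dispose_after(self):
        # Simplification: counts from creation; would fail for a 29 February date.
        return self.created.replace(year=self.created.year + self.retention_years())


def due_for_disposition(items, today):
    """Assemble the disposal collection dynamically, not from a fileplan."""
    return [item for item in items if item.dispose_after() <= today]


form = ContentItem("account_application", date(2014, 3, 1))
print(form.retention_years())   # while it is being processed
form.context.add("approved")
print(form.retention_years())   # after approval
form.context.add("disputed")
print(form.retention_years())   # after the client raises a dispute
```

Nothing here places the form in a folder. Its retention is recalculated from what it is and what has happened to it, and the disposal list is just a query.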
The classification (organisation, taxonomy, use any term you like…) of that content, is not even relevant to this fileplan/retention discussion. The application form in the previous example, will need to be associated with the customer, the account type, and the approval process or the dispute process. That is the context under which the organisation will need to organise and find that particular application form. You will not look for it by its retention period, unless you are specifically looking to dispose of it.
To go back to my original fridge metaphor: You will not start cooking dinner by picking up the item in the fridge that will expire first – that’s probably the pudding. You will look in the relevant shelf for the food you are trying to cook: meat or vegetables or eggs. Only after that you may double check the date, to see if it is still valid or expired.
So… I remain convinced that:
(a) there is no point in distinguishing between records and non-records any more, non-records are just records with zero shelf-life
(b) the concept of a “fileplan” as a classification structure is outdated and unnecessary for digital records, and
(c) it’s time we start managing content “in context”, based on its usage history and not as an isolated self-defining entity.
As always, I’m keen to hear your thoughts on this.
P.S. I read some blogs to learn, some for their amusing content, and some because (even if their content sometimes irritates me) they force me to re-think. I read Chris Walker’s blog because it generally makes me nod my head in violent agreement 🙂 . He often expresses very similar views to mine and I find his approach to Information Governance (which he is now consolidating into a book) extremely down to earth. The reason for this shameless plug to his blog, is that as I was writing the thoughts expressed above, I caught up with his article from last week Big Buckets of Stuff, that covers very similar ground… Well worth a read.
This morning, I was reading Laurence’s blog titled “Does Records Management Give Content Management a Bad Name?”, which picks on one of the points in Cheryl’s article “It’s a Digital-First World: Five Trends Reshaping Records Management As You Know It”, with some very insightful comments added by Christian. I started leaving a comment under Laurence’s blog (which I will still do, pointing back to this) but there are too many points I wanted to add to the debate and it was becoming too long…
So, here is my take:
First of all, I want to move away from the myth that RM is a single requirement. Organisations look to RM tools as the digital equivalent to a Swiss Army Knife, to address multiple requirements:
- Classification – Often, the RM repository is the only definitive Information Management taxonomy managed by the organisation. Ironically, it mostly reflects the taxonomy needed by retention management, not by the operational side of the business. Trying to design a taxonomy that serves both masters, leads to the huge granularity issues that Laurence refers to.
- Declaration – A conscious decision to determine what is a business record and what is not. This is where both the workflow integration and the auto-classification have a role to play, and where in an ideal world we should try to remove the onus of that decision from the hands of the end-user. More on that point later…
- Retention management – This is the information governance side of the house. The need to preserve the records for the duration that they must legally be retained, move them to the most cost-effective storage medium based on their business value, and actively dispose of them when there is no regulatory or legal reason to retain them any longer.
- Security & auditability – RM systems are expected to be a “safe pair of hands”. In the old world of paper records management, once you entrusted your important and valuable documents to the records department, you knew that they were safe. They would be preserved and looked after until you ask for them. Digital RM is no different: It needs to provide a safe-haven for important information, guaranteeing its integrity, security, authenticity and availability. Supported by a full audit trail that can withstand legal scrutiny.
Auto-categorisation or auto-classification, relates to both the first and the second of these requirements: Classification (using linguistic, lexical and semantic analysis to identify what type of document it is, and where it should fit into the taxonomy) and Declaration (deciding if this is a business document worthy of declaration as a record). Auto-classification is not new, it’s been available both as a standalone product and integrated within email and records capture systems for several years. But its adoption has been slow, not for technological reasons, but because culturally both compliance and legal departments are reluctant to accept that a machine can be good enough to be allowed to make this type of decision. And even though numerous studies have proven that machine-based classification can be far more accurate and consistent than a room full of paralegals reading each document, it will take a while before the cultural barriers are lifted. Ironically, much of the recent resurgence and acceptance of auto-classification is coming from the legal field itself, where the “assisted review” or “predictive coding” (just a form of auto-classification to you and me) wars between eDiscovery vendors, have brought the technology to the fore, with judges finally endorsing its credibility [Magistrate Judge Peck in Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y.2012), approving use of predictive coding in a case involving over 3 million e-mails.].
The point that Christian Walker is making in his comments however is very important: Auto-classification can help but it is not the only, or even the primary, mechanism available for Auto-Declaration. They are not the same thing. To take the records declaration process away from the end-user, requires more than understanding the type of document and its place in a hierarchical taxonomy. It needs the business context around the document, and that comes from the process. A simple example to illustrate this would be a document with a pricing quotation. Auto-classification can identify what it is, but not if it has been sent to a client or formed part of a contract negotiation. It’s that latter contextual fact that makes it a business record. Auto-Declaration from within a line-of-business application, or a process management system is easy: You already know what the document is (whether it has been received externally, or created as part of the process), you know who it relates to (client id, case, process) and you know what stage in its lifecycle it is at (draft, approved, negotiated, signed, etc.). These give enough definitive context to be able to accurately identify and declare a record, without the need to involve the users or resort to auto-classification or any other heuristic decision. That’s assuming, of course, that there is an integration between the LoB/process and the RM system, to allow that declaration to take place automatically.
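The pricing quotation example can be reduced to a short Python sketch. It assumes a line-of-business system that already knows the document type, the client and the lifecycle stage; the `ProcessDocument` class, the stage names and the `should_declare` rule are hypothetical and would differ in every organisation.

```python
from dataclasses import dataclass

# Hypothetical lifecycle stages at which a document becomes a business record.
DECLARABLE_STAGES = {"sent_to_client", "approved", "signed"}


@dataclass
class ProcessDocument:
    doc_type: str    # already known to the line-of-business application
    client_id: str   # who it relates to
    stage: str       # where it is in its lifecycle


def should_declare(doc):
    """Decide declaration from process context alone, with no text analysis."""
    return doc.stage in DECLARABLE_STAGES


draft_quote = ProcessDocument("pricing_quotation", "C-1042", "draft")
sent_quote = ProcessDocument("pricing_quotation", "C-1042", "sent_to_client")
print(should_declare(draft_quote), should_declare(sent_quote))
```

Both documents would be classified identically by their content. Only the process context tells them apart, which is exactly why auto-classification and auto-declaration should not be confused.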
The next point I want to pick up is the issue of Cloud. I think cloud is a red herring to this conversation. Cloud should be an architecture/infrastructure and procurement/licensing decision, not a functional one. Most large ECM/RM vendors can offer similar functionality hosted on- and off-premises, and offer SaaS payment terms rather than perpetual licensing. The cloud conversation around RM however, becomes its own sticky mess when you start looking at guaranteeing location-specific storage (a critical issue for a lot of European data protection and privacy regulation) and when you start looking at the integration between on-premises and off-premises systems (as in the examples of auto-declaration above). I don’t believe that auto-classification is a significant factor in the cloud decision making process.
Finally, I wanted to bring another element to this discussion. There is another RM disruptive trend that is not explicit in Cheryl’s article (but it fits under point #1) and it addresses the third RM requirement above: “In-place” Retention Management. If you extract the retention schedule management from the RM tool and architect it at a higher logical level, then retention and disposition can be orchestrated across multiple RM repositories, applications, collaboration environments and even file systems, without the need to relocate the content into a dedicated traditional RM environment. It’s early days (and probably a step too far, culturally, for most RM practitioners) but the huge volumes of currently unmanaged information are becoming a key driver for this approach. We had some interesting discussions at the IRMS conference this year (triggered partly because of IBM’s recent acquisition of StoredIQ, into their Information Lifecycle Governance portfolio) and James Lappin (@JamesLappin) covered the concept in his recent blog here: The Mechanics on Manage-In-Place Records Management Tools. Well worth a read…
So to summarise my points: RM is a composite requirement; Auto-Categorisation is useful and is starting to become legitimate. But even though it can participate, it should not be confused with Auto-Declaration of records; “Cloud” is not a functional decision, it’s an architectural and commercial one.