Archive for the ‘eDiscovery’ Category

What IS the value of a document?

September 1, 2010

I just read an interesting blog post, “What is the cost of a Lost Document” by Jeff Shuey.

The points he makes about the risk of not capturing information appropriately are, of course, valid and often quoted in the world of document management. But they got me thinking about a more fundamental issue: how do you determine the real value of a document?

Clearly not all documents have the same value: losing a grocery receipt is fundamentally different to losing your driver’s license or your passport. Misplacing an expenses claim receipt might be worth £20, misplacing a vital piece of evidence in a litigation case might be worth £20 million. Today’s Financial Times is vital for making business decisions tomorrow, but worthless recycling paper next week.

Which makes generic figures, like the PriceWaterhouse ones quoted in Jeff’s blog, seem about as relevant as ordering clothes for 2.75 children.

So how can we measure the value of a document? What is it worth? None of the Content Management systems that I’m aware of today have provisions for assigning an individual value to stored content, let alone for managing its lifecycle differently based on that value.

Is it even possible to determine the value of a document? (For the purposes of this discussion, “a document” could be anything from a 140-character tweet to a 300,000-page drug application…) Where does its value come from?

  • The cost and effort of preparing or acquiring it?
  • The cost of storing it and managing it?
  • The context in which it has been used in the past, or may be used in the future?
  • Its rarity or brevity or accuracy?
  • Its relevance now? Its potential relevance in the future?
  • How often it has been accessed and referenced and by whom?
  • Who it is relevant to?
  • The length of time that it retains its value?
  • At what point does its value peak and when does it wane?
  • The risk it carries, by its existence or by its absence?
  • etc., etc. …

The list goes on! And this is before we even start thinking about assigning metrics or actual monetary value to any of the above.

Common sense says it’s probably some combination of all of the above. But do we measure any of this today? Should we?
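Purely to make “some combination of all of the above” concrete, here is a minimal Python sketch of one way such a combination might be computed. Everything in it is my own assumption rather than an established method: the four factor names, their weights, the idea that each factor is already normalised to a 0 to 1 score, and the exponential half-life used to model a value that peaks and then wanes.

```python
from dataclasses import dataclass

# Hypothetical weights for four of the factors listed above. Each factor
# score is assumed to be pre-normalised to the range 0.0 to 1.0.
WEIGHTS = {
    "acquisition_cost": 0.2,   # cost and effort of preparing or acquiring it
    "access_frequency": 0.3,   # how often it has been accessed
    "relevance": 0.3,          # its relevance now
    "risk": 0.2,               # risk carried by its existence or absence
}


@dataclass
class DocumentFactors:
    factors: dict[str, float]  # factor name -> normalised score
    age_days: float            # time since the document was created
    half_life_days: float      # assumed time for its value to halve


def relative_content_value(doc: DocumentFactors) -> float:
    """Weighted average of the factor scores, discounted exponentially by age."""
    total_weight = sum(WEIGHTS.values())
    base = sum(w * doc.factors.get(name, 0.0) for name, w in WEIGHTS.items()) / total_weight
    decay = 0.5 ** (doc.age_days / doc.half_life_days)
    return base * decay


if __name__ == "__main__":
    # A week-old newspaper loses value fast; a week-old contract barely changes.
    newspaper = DocumentFactors(
        {"acquisition_cost": 0.1, "access_frequency": 0.9, "relevance": 0.9, "risk": 0.0},
        age_days=7, half_life_days=1)
    contract = DocumentFactors(
        {"acquisition_cost": 0.8, "access_frequency": 0.2, "relevance": 0.7, "risk": 0.9},
        age_days=7, half_life_days=3650)
    print(f"newspaper RCV: {relative_content_value(newspaper):.4f}")
    print(f"contract RCV:  {relative_content_value(contract):.4f}")
```

The hard part, of course, is not the arithmetic but choosing the weights and half-lives, which is exactly the open question.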

Imagine the potential scenarios if every document in a Content Management system carried a continually adjusted “Relative Content Value” property (you heard it here first: a document’s RCV! 🙂). We could easily foresee…

  • A system that automatically discards a document, because it’s readily and securely available online, storing a reference instead
  • A system that automatically archives and protects an email that has been used in contract negotiations
  • A system that automatically hides a document that contains personal or confidential information
  • A system that automatically discards or hides documents that repeatedly appear in search results but that nobody chooses to read
  • A system that automatically relocates content to different risk mediums based on its value
  • A system that automatically calculates the premiums for insuring against loss of its content, based on the total value of that content to the organisation
  • A system that can determine the likely life expectancy of a document, based on the history of how similar documents have been accessed in the past.
  • A system that would notify you, the author, when facts in your original research sources have been disputed or have changed, rendering your document misleading.
  • etc., etc. …
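To illustrate how a stored value might drive the first few of these scenarios, here is a minimal, rule-based Python sketch. It is entirely hypothetical: the `StoredDocument` fields, the `lifecycle_actions` function and the 0.2 threshold are my own inventions, and it assumes an RCV between 0 and 1 has somehow already been calculated. No existing Content Management system is implied to expose anything like it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLACE_WITH_REFERENCE = "discard the local copy, keep a reference"
    ARCHIVE_AND_PROTECT = "archive and protect"
    RESTRICT_ACCESS = "hide from general view"
    MOVE_TO_CHEAP_STORAGE = "relocate to low-cost, higher-risk storage"
    KEEP_ON_PRIMARY_STORAGE = "keep on primary storage"


@dataclass
class StoredDocument:
    rcv: float                             # hypothetical Relative Content Value, 0.0 to 1.0
    online_copy_url: Optional[str] = None  # set if a trusted copy exists online
    used_in_negotiation: bool = False
    contains_personal_data: bool = False


def lifecycle_actions(doc: StoredDocument, low_value: float = 0.2) -> list[Action]:
    """Return the (hypothetical) lifecycle actions implied by a document's metadata."""
    actions = []
    if doc.contains_personal_data:
        actions.append(Action.RESTRICT_ACCESS)
    if doc.used_in_negotiation:
        # Evidence from a negotiation is protected regardless of its score.
        actions.append(Action.ARCHIVE_AND_PROTECT)
    elif doc.online_copy_url is not None:
        # A trusted copy exists elsewhere, so a pointer to it is enough.
        actions.append(Action.REPLACE_WITH_REFERENCE)
    elif doc.rcv < low_value:
        actions.append(Action.MOVE_TO_CHEAP_STORAGE)
    else:
        actions.append(Action.KEEP_ON_PRIMARY_STORAGE)
    return actions


if __name__ == "__main__":
    email = StoredDocument(rcv=0.9, used_in_negotiation=True, contains_personal_data=True)
    for action in lifecycle_actions(email):
        print(action.value)
```

The rules themselves are trivial; the point is that they only become possible once a value like the RCV is stored alongside each document.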

Actually, we have technologies today to implement most of these things, if we knew what that “relative content value” was. What we are missing is a coherent way of calculating and storing that value on an ongoing basis.

Which brings me back to my original conundrum: Is it ever possible to determine what IS the value of a document? How?


Are Content Analytics turning the grubby ECM worm into a butterfly?

Colleagues who have known me for a while have all heard me bemoaning the use of the term “unstructured” to describe text-based content. Without boring you to tears again, my main issue is that the ECM industry has largely treated content files as amorphous “unstructured” blobs, ignoring the rich value locked inside these content objects.

For the last twenty years or so, ECM systems have been providing a cocoon where documents and media files are stored, preserved, secured, archived and generally left to their own devices. But we have been focusing on protecting the whole container, the box, based on the label on its outside, and only ever looking inside the boxes one at a time.

There is change afoot! 2010 looks set to be the year of Content Analytics, which promises to finally release the value trapped inside our gigantic, festering ECM repositories. And if the early signs of success of IBM’s new Content Analytics software are anything to go by, we are starting to witness a fundamental transformation in the way content is leveraged in large organisations.

Much in the same way that Data Warehousing and Business Intelligence transformed the bland data storage provided by databases in the mid-90s, Content Analytics is today bringing natural language processing, trends analysis, contextual discovery and predictive analytics to the “unstructured” world.

Purists will argue that these algorithms are not new and, to a certain extent, that is true. However, this is the first time we are seeing these technologies applied easily (i.e. with off-the-shelf products, without the need for a PhD statistician or linguist by your side…) in real commercial applications, to solve real business problems: car manufacturers avoiding recalls through early analysis of fault trends; pharmaceutical companies recognising equipment failure trends much earlier; large multinationals saving millions in litigation fees, etc.

The ECM industry may still be thriving, but in terms of innovation it has reached a plateau that makes most of us uncomfortable (or complacent… depending on your point of view). Basic content management functionality is being commoditised, with CMIS, open source and SharePoint leading the charge. There’s nothing wrong with that; it’s the natural maturity curve for any 20-year-old technology sector. We’ve created a very big ECM cocoon and we’ve filled it to the brim with content worms. It’s time to innovate again!

Making no apologies for the crass analogy (it is March, after all, and spring is allegedly coming…), Content Analytics is finally starting to poke through the cocoon, letting the value of content slowly emerge, transformed from archived fodder into real business insight.
