Wednesday, August 05, 2009

Documents versus contextual information

Well... Let me start by asking about your needs. If you say that it's for legal purposes, to be used as documentation together with a digital signature, or strictly for printing, I would tend to suggest the document approach. In any other case I would recommend storing your information in a contextual manner.

A document is basically a black box where you can dump your data. You can find your data by a folder path and a filename. That's basically all the metadata you get for your document. As far as I'm concerned, I might as well put the document straight in the trash can, because I know I won't be able to find it when I need it in a month, when there are several versions of it and a lot of other documents have been written since.

The contextual information is different. I'm particularly thinking of wikis. Here information is normalised to the third normal form, so it is not repeated. When you need the information somewhere, you just make a reference to it.
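To make the "reference instead of repeat" idea concrete, here is a minimal sketch in Python. The page names, the `[[PageName]]` reference syntax and the `render` helper are all hypothetical; real wikis differ in how (and whether) they expand references inline. The point is only that each fact is stored once and every page that needs it points to it.

```python
import re

# Hypothetical wiki store: each fact lives on exactly one page (no repetition).
pages: dict[str, str] = {
    "SupportPhone": "ext. 100",
    "Onboarding": "New staff should call [[SupportPhone]] on day one.",
    "Troubleshooting": "If the VPN fails, call [[SupportPhone]].",
}

# Hypothetical reference syntax: [[PageName]] is replaced by that page's text.
REF = re.compile(r"\[\[([^\]]+)\]\]")


def render(name: str, store: dict[str, str], seen: tuple[str, ...] = ()) -> str:
    """Expand references so a shared fact is always read from its single source."""
    if name in seen:
        # Guard against pages that (directly or indirectly) include themselves.
        raise ValueError(f"circular reference: {name}")
    seen = seen + (name,)
    return REF.sub(lambda m: render(m.group(1), store, seen), store[name])
```

With this store, `render("Onboarding", pages)` yields "New staff should call ext. 100 on day one." If the extension changes, you edit the `SupportPhone` page once and both `Onboarding` and `Troubleshooting` pick up the new value the next time they are rendered, which is exactly what a folder of copied Word documents cannot do.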

It's an uneven match. Is your information to be used, or is it just to be archived?


  1. To some extent I agree with you. Word documents, PDF documents, and images are examples of what researchers refer to as hypermedia dead-ends. Once you end up in such a document, you're stuck. A wiki, on the other hand, is an interlinked web of information, *possibly* normalized to 3NF. Unfortunately, all information must now exist within the wiki and be expressed in its simple (compared to Word) markup language.

    Perhaps a middle ground would be to use a document-oriented database. With CouchDB, for instance, key/value pairs can be associated with a document, and version history is retained. Pure document-oriented databases haven't really caught on, though. A SharePoint document library is probably the closest most people get to a document-oriented database.

    Key/value pairs in the form of columns are quickly added to a SharePoint document library. The problem with tagging (used here in the broad sense of any manually entered metadata) is that it isn't utilized within an organization. Organizations are too small to leverage the crowd in the way Dan Bricklin explains in his essay The Cornucopia of the Commons.

  2. Hi Ronnie,

    I think the simple markup is part of the reason for wikis' success. What you often see in a large collection of Word documents is that people use different markup for the same semantic thing. In the extreme, people use different fonts, sizes, ... for headings. People also change font sizes or colors in the middle of the text. It often looks like hell. In my experience, it requires too much discipline to get everybody to strictly follow a template document, if *usable* templates even exist within an organization. To sum up, people cannot handle all the flexibility Word gives them, so it's better to put them in the wiki-markup straitjacket.

    Even if you have good templates, it is next to impossible to change the layout, as you have a ton of old documents that keep the old style. In a wiki it is different. Since users usually express things like "this is a heading level 2" or "this text is emphasized", it becomes easy to change the layout of all documents at once.

    Also, how much markup do you really need? Besides tables, I rarely miss any Word features in wikis. Here I assume that your wiki can handle images.

    Though I do not know how much it is used, having simpler but fairly consistent markup makes the job of internal company search engines easier. They can infer that words marked as "emphasized" or "heading 2" are more important than un-marked-up text. To my knowledge, this is standard behavior for Internet-wide search engines, so maybe internal search engines use it too.

    Finally, using a wiki does not mean everything must go there. You can use a wiki for 95% of your documents and put the rest in Word when the wiki functionality is not good enough.
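Both arguments in the second comment (changing the layout everywhere at once, and letting search rank marked-up words higher) rest on the same property: the markup records *what* a piece of text is, not how it looks. Below is a minimal Python sketch of that idea. The `(role, text)` page model, the `STYLE` table and the `WEIGHT` values are all hypothetical and simplified (emphasis is treated as a whole block rather than inline); real wiki engines and search engines each do this in their own way.

```python
from collections import Counter

# Hypothetical page model: each block carries a semantic role, not a font.
Page = list[tuple[str, str]]

page: Page = [
    ("heading2", "Release process"),
    ("paragraph", "Tag the release and notify the team."),
    ("emphasis", "Never skip the release checklist."),
]

# Layout lives in one site-wide table; restyling means editing only this.
STYLE: dict[str, str] = {
    "heading2": "<h2>{}</h2>",
    "paragraph": "<p>{}</p>",
    "emphasis": "<p><em>{}</em></p>",
}

# Made-up weights: words in headings and emphasis count more than body text.
WEIGHT: dict[str, int] = {"heading2": 3, "emphasis": 2, "paragraph": 1}


def render(page: Page, style: dict[str, str]) -> str:
    """Turn semantic blocks into HTML using whatever the current style table says."""
    return "\n".join(style[role].format(text) for role, text in page)


def term_scores(page: Page) -> Counter:
    """Score each lower-cased word by the weight of the block it appears in."""
    scores: Counter = Counter()
    for role, text in page:
        for word in text.lower().split():
            scores[word.strip(".,!?")] += WEIGHT[role]
    return scores
```

Swapping `"<h2>{}</h2>"` for another template restyles every heading on every page the next time it is rendered, with no old documents left behind in the previous style. Likewise, `term_scores(page)["release"]` adds 3 from the heading, 1 from the paragraph and 2 from the emphasized line, so a search could rank this page above one that merely mentions "release" in body text.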