Information about Information

I attended an IEEE-sponsored workshop on "The Future of Information". It was held in Washington DC and brought together a diverse group of experts on how information will change over the coming decades. IEEE and the CS are both conduits for information and academically (and practically) interested in it, so this was a very interesting workshop.

A key realization that emerged, at least for me, was that information about information is becoming the most valuable information. In some ways we have known this for years. We use peer review to vet information, and the endorsement of content by peers is important (albeit fairly transparent) information about the information. Citation is an additional indicator, more powerful because it is explicit -- you know who cited a document, and the context of that citation. As content has gone online we have started adding metadata, tags, and other data about the data. The entire business of search engines is to help surface 'best fit' content, and recommendation engines (Amazon, Netflix) are going further with this concept. Stephen Hawking, in his book "The Universe in a Nutshell" (p. 158), points out that the rate of publishing of scientific articles is growing exponentially -- which is what we might expect as long as we keep investing in research in these fields on a global basis. But this means it is hopeless to try to "keep up"; the best you can do is to find the best of the best content and invest your time there. This makes information about this information a concept of increasing value, and an essential area for future research and products.

Vint Cerf presented a painfully insightful discussion of bit rot at this conference (see the related presentation on YouTube). Besides his founding role in the Internet (eat your heart out, Al Gore), Vint has been an ongoing voice addressing the challenges we face in getting the full value out of the Internet. In this case, the problem is not just the meta information (the document is in HTML 5.0), but the next tier (a browser is needed to render it), and the next tier (the browser is written in C++), and the next tier (a C++ compiler...), and so forth. In a thousand years, you will need all of those elements with source code, and "period" technology (or an emulation of it), to be able to interpret the initial HTML document. I had the opportunity to suggest that in his role at Google, he is in a good position to encourage and facilitate "bit preservation". Should Google decide to give precedence to well-formed documents (ones that are consistent with a defined set of standards, and for which open source rendering tools exist), then new documents will migrate to one of those priority environments, and the potential for archival preservation will be greatly increased.
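
To make the tiers concrete, here is a minimal sketch in Python of how one might list everything that has to be preserved alongside a single document. The dependency map and its labels are my own illustrative assumptions, not anything presented at the workshop, and the chain is cut off at the compiler even though the real one keeps going.

```python
# Hypothetical dependency map: each artifact lists what is needed to
# interpret it. The labels are illustrative, not real package names.
DEPENDS_ON = {
    "page.html": ["browser"],
    "browser": ["browser C++ source"],
    "browser C++ source": ["C++ compiler"],
    "C++ compiler": [],  # the real chain continues; truncated for this sketch
}


def preservation_set(artifact, depends_on):
    """Return every artifact needed to interpret `artifact`, in discovery order."""
    needed = []
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already recorded; also guards against cycles
        seen.add(current)
        needed.append(current)
        # Push in reverse so dependencies are visited in the order listed.
        stack.extend(reversed(depends_on.get(current, [])))
    return needed


if __name__ == "__main__":
    for item in preservation_set("page.html", DEPENDS_ON):
        print(item)
```

The point of the sketch is only that the archive for one page is the whole reachable set, not the page itself; losing any one entry breaks the chain.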

So we have some challenges that face us in our professional activities. We need to give thought to the question of what information about information we need to capture, document and 'curate'. In our communications, we need to apply this understanding to facilitate long-term access and effective transmission/reception. In our research, standards and other related activities, we need to help build the tools that can facilitate this, and help other professions to understand and adopt them for the benefit of future generations.

In a past life, I chaired a standards effort on web site engineering best practices (IEEE Std 2001, ISO Std 23026), where we sought to identify best practices in this particular information space. One of the realizations that emerged from this effort was the overall impact of 'doing it right'. For example, including captions on images not only improves understanding for the visually impaired, but also increases the page's relevance in search rankings. Similarly, providing a transcript for an audio or video stream facilitates access for deaf persons, makes the content easier to understand for individuals in different language or dialect groups, and supports more accurate page ranking. Getting the information about information right is the challenge of the information age.


