Program: Electronic Library & Information Systems. XLIII/2 (2009): 215-228.
Bekir Kemal Ataman
Faculty of Engineering, Department of Industrial Engineering, Marmara University, Istanbul, Turkey
1. New generation of users
The archival world has been arguing about the paperless office concept for decades. One theory states that there will always be paper and that computers simply produce paper faster than human beings could ever do. This is absolutely correct, of course, but when we look at the real reasons for the increase in paper-based information, a startling truth confronts us. What is being generated so much faster is not mere paper but real information. Computers have given us the capability to create, in just a few years, more information than the whole human race managed to produce during its entire history.
What is more, the information we see on paper now was produced, and stored, in an electronic environment before being printed as hard copy. We seem to be in a transitional period. The stock of information is currently doubling every three years. According to 2002 figures, our civilisation creates five exabytes (5,000,000,000 GB) of unique information every 12 months, enough to fill the Library of Congress in the US 37,000 times over, or the equivalent of about 1.2 trillion books. 92% of this information is recorded on magnetic media (Lyman and Varian, 2003).
According to a study by the International Data Corporation (IDC), the amount of digital information produced in 2006 reached 161 exabytes, a figure which is expected to reach 988 exabytes by 2010 (Gantz et al., 2007). It is glaringly obvious that this figure is going to increase exponentially, and that in the near future little or no unique information is going to be produced by actual pens, pencils and typewriters making marks on paper.
Nevertheless, the need to transfer information onto paper is likely to continue for some time to come, at least as long as members of our generation, who grew up seeing information generated exclusively on paper, are still living. It is not easy for us to shake free from our well-established habits, but the time will come, as we move on and leave the stage to the new generation, when it will no longer be necessary to transfer information onto paper. The archives, libraries and information sources of the future are likely to be based entirely in electronic environments. Major projects are already under way all over the world transferring the information that we and earlier generations produced on paper to electronic media, so that those who come after us can find that information electronically. Substantial financial resources (Tonta 2006, 8-9) are being invested in such projects as:
In addition to such state-supported ventures, there are also a number of private sector initiatives, of which Michael Hart's Project Gutenberg was the vanguard (http://www.gutenberg.org). These are now fast increasing, with developments such as:
The rising generation of users will be, not "cyborgs" in cyberspace, but "inforgs" growing up in the global "infosphere". The infosphere has been defined by Floridi (2006) as "the whole informational environment made up of all informational entities (including informational agents), their properties, interactions, processes, and relations", with "inforgs" being connected informational organisms, living, breathing human beings inhabiting this information universe which is their ecosystem. If a person spends more time online than asleep, they might be defined as an "inforg".
Inforgs do not like their information served up on paper. They see paper as slow, cumbersome, unfriendly and old-fashioned. They are used to getting things quickly, and expect more or less immediate access to any source of information. For them, information that cannot be accessed in increasingly shorter periods of time is just not worth accessing (Cox, 1998, 11). This generation also wants to cut directly to the result, the relevant slice of information, rather than wrestle with the fine detail of the staggering mass of information at their disposal. Various intelligent information agents give them this level of direct access, and therefore information which cannot be processed by a search engine (because it is on paper, for example) is no use to them. Information sources in the form of books or paper record files are simply off their radar screen. The inforg generation are interested in new, current information rather than old information. Information now grows stale very quickly indeed, and its shelf life is growing ever shorter. For example, the deterioration rate for data about individuals runs at an unbelievable 3% per month (Klau, 2003), because people have a habit of moving house, changing their marital status, shifting jobs and, ultimately, dying.
The new generation may not be in the habit of taking trips to archives and libraries, but that does not mean they are not users of information sources. On the contrary, these users, for whom libraries are merely virtual pathways, far more frequently access electronic articles, books and library collections than their predecessors used to access paper records (Tonta 2007, 3). The reason that they are such eager users is that the facilities are at their fingertips, and they also have friendly assistance from the expert members of news and discussion groups, who fulfill the role of a reference librarian (Ataman, 2004, 39).
2. New roles, new skills
In this new environment there is no longer a need for information professionals such as archivists, records managers, and librarians, in their traditional role as "people who assist in accessing information". We now use automatic systems (search robots etc) for this sort of work. In traditional environments, preservation and control of the carrier medium and the physical object was the prime consideration. In an electronic environment, by contrast, content and reliability are key (Duranti, 1998). In examining the future role of information professionals, these two aspects will first be examined, and then some predictions about future changes in the role of archivists and manuscript librarians will be made, firstly as forensic experts testifying authenticity, secondly as conservators, and thirdly as records managers.
2.1 Content presentation
The presentation of content has long been viewed as a fundamental role within the information profession. It is now no longer practical, however, for information professionals to create metadata out of content for each item, by hand, because the volume of information and records has grown too huge to deal with in this way. What is more, the descriptions, which in traditional archives were made at the series level, must now be made at the piece level. Similarly, librarians will need to catalogue each single page (on the Web) rather than the book (the website) as a whole. In our new world where information is accessed by various automatic systems, information professionals will no longer be "people who assist in accessing information" and must transform themselves into "people who design and create ways to access information".
During the transitional period, the new version of the traditional role of "people who assist in accessing information" is the "person who develops online reference services". The reference archivist or librarian may now be providing a variety of services including instant response to questions from users via electronic communication systems, FAQs, help pages, and online tutorial documents (Yakel and Reynolds, 2006).
In a related development, methods of presenting information are also constantly in a state of flux. The approach of many traditional archivists and librarians could hardly be described as user-centred (Yakel and Reynolds, 2006), but in this new world it will be essential to establish who is using the online sources and how, to learn how familiar these users are with archive and library sources, and to plan systems to meet their needs. Concepts such as information architecture (Wodtke, 2003) and usability studies (Garrett, 2003) are becoming increasingly important. The field of Human Computer Interaction (HCI) is also growing in significance, since even quite simple changes of font or colour can drastically affect the interaction of the users with online material (Yakel and Reynolds 2006).
However, these are not the only skills which the information professionals of the future will have to acquire. Take, for example, the simple matter, for today's information professional, of writing a paper label and attaching it to an information source. Information professionals of the future will have to have enough programming knowledge to write a user interface or modify an existing one so that they can perform the same kind of operation just as easily and naturally in the new medium. Conventional forms of metadata, such as the library catalogue and archival finding aid, have been transformed into a variety of electronic metadata types. Metadata can be divided into three distinct classes:
Content metadata is placed there principally so that it can be collected by the search engines that visit and index online resources. Information professionals will need to be able to study, research and prepare content metadata for online sources. These are, after all, the main pathways used by the new generations who rely on intelligent agents to fetch their information for them. Research in this area has made considerable progress. Standards such as the Dublin Core Metadata Initiative (Dublin Core - http://dublincore.org), which was developed by the librarian community, and the Encoded Archival Description (EAD - http://www.loc.gov/ead), developed by the archives community, are the principal foundation stones of this edifice. Mention must also be made of the Z39.50 protocol, which was developed by the librarian community in the 1980s specifically for the distribution and sharing of metadata, and has since received widespread acceptance.
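As a rough illustration of what preparing content metadata for an online source can look like, the following minimal sketch renders a handful of Dublin Core elements as HTML `<meta>` tags, following the common convention of prefixing element names with `DC.`. The function name `dublin_core_meta` and the sample record are hypothetical and are not taken from any project discussed here.

```python
from html import escape

# Namespace of the Dublin Core element set.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(record):
    """Render a {element: value} mapping as HTML <meta> tags."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, value in record.items():
        # Escape quotes and angle brackets so the value is safe in an attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Hypothetical record, for illustration only.
example = {
    "title": "Annual report 2007",
    "creator": "Example Archive",
    "date": "2007-12-31",
    "type": "Text",
}
print(dublin_core_meta(example))
```

The output would be pasted into the `<head>` of the page being described, where harvesting tools that understand the convention can pick it up.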
Instruments like these are viewed primarily as resource discovery techniques. But nowadays, even more sophisticated structures are required which go beyond resource discovery mechanisms to include identifiers, in the form of complex object formats, such as:
All three of these formats are able to verify the actual source of the digital record. These formats typically share the following core characteristics:
OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) is currently gaining acceptance as a shared protocol for co-ordinating and harmonising such sophisticated structures, gathering them all under a single umbrella (Van de Sompel et al., 2004).
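To give a flavour of how harvesting works in practice, the sketch below builds an OAI-PMH `ListRecords` request, which is an ordinary HTTP query string. The verb and argument names (`verb`, `metadataPrefix`, `set`, `from`) come from the protocol; the function name `list_records_url` and the repository address are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec        # restrict harvesting to one set
    if from_date:
        params["from"] = from_date      # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, for illustration only.
print(list_records_url("https://repository.example.org/oai", from_date="2008-01-01"))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the `resumptionToken` it receives until the repository reports that no records remain.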
However, all this will not be enough for information users of the future because they will not want to limit themselves to using cataloguing terminology prescribed by distant authorities. They will expect to use the same colloquial language and the same search and retrieval techniques that they use in their everyday lives. Today such users are already using tagging methods, known as folksonomy, pioneered by sites such as del.icio.us and Flickr. This method uses, in place of the cataloguer's generalised and superficial labels, much richer classifications which may be unique to the topic in question and are produced by experts from communities focussed on a particular field. Moreover, since such tags are entered by users as they read the material, the tagging process is more or less instantaneous.
Not surprisingly, these methods come with some built-in disadvantages. Most obviously there is no clear and shared standard. Worse, these techniques are vulnerable to spam-type abuse (Hammond et al., 2005).
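Both disadvantages can be softened, though not removed, with simple processing. The sketch below is one possible approach rather than a description of how any tagging site works: it folds spelling variants of a tag together and counts each tag at most once per user, so that a single account repeating a tag cannot inflate its weight. The names `normalise_tag` and `tag_counts` are hypothetical.

```python
import re
from collections import Counter


def normalise_tag(tag):
    """Fold case and drop spaces, hyphens and underscores."""
    return re.sub(r"[\s_-]+", "", tag.strip().lower())


def tag_counts(assignments):
    """Count tags from (user, tag) pairs, one vote per user per tag."""
    votes = {(user, normalise_tag(tag)) for user, tag in assignments}
    return Counter(tag for _, tag in votes if tag)


# Hypothetical tag assignments, for illustration only.
assignments = [
    ("reader1", "Digital Preservation"),
    ("reader2", "digital-preservation"),
    ("reader3", "digital_preservation"),
    ("spammer", "buy now"),
    ("spammer", "Buy Now"),
    ("spammer", "buy-now"),
]
print(tag_counts(assignments).most_common())
```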
An alternative would be to develop title and subject metadata by means of text mining, a technique resembling data mining (Stollar and Kiehne, 2006, 5). Such tools, already under development by Google, are intended to form the basis for automatic multilingual translation engines.
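The simplest form of this idea is term-frequency counting: the words that occur most often in a text, once common function words are set aside, are proposed as candidate subject terms. The sketch below shows only that basic technique and makes no claim about how the tools mentioned above work; the stop-word list and the function name `suggest_subject_terms` are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real tool would use a fuller one.
STOP_WORDS = {"the", "and", "that", "with", "from", "this", "have", "which"}


def suggest_subject_terms(text, limit=5):
    """Return the most frequent non-trivial words as candidate subject terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


sample = ("Archives preserve records. Records managers appraise records; "
          "archives store records.")
print(suggest_subject_terms(sample, limit=2))
```

Terms produced this way are suggestions for a cataloguer to confirm or reject, not finished subject headings.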
2.2 Trustworthy storage
A fundamental problem for the information professional of the future will be the question of reliability. The reliability of stored information depends principally on how well its authenticity has been preserved. Archives and libraries offer the only trusted third-party storage for authentic information (InterPARES, 2001, 21), but are they equipped with the security measures necessary for them to maintain this role? In the traditional setting archivists needed to have the physical object under control, but in electronic archives and libraries it is the functions, processes and uses that need to be controlled (Gilliland-Swetland and Eppard, 2000).
2.2.1 Function control
In an electronic environment it is easy to create documents by collaboration either vertically within a single organisation or horizontally with other partner organisations. In such situations (comparatively) new concepts such as multi-provenance emerge (Gavrel, 1990, 26-27). These new problems, which require radical revision of fundamental archive theory, have resulted in lengthy discussions in archival literature. The original discussion started with a questioning of the concept "life-cycle of records" but has moved on to arguments about whether or not a functional approach will have to be adopted in place of strict provenance (Cunningham, 1998; McKemmish et al., 1998).
2.2.2 Process control
One of the basic criteria in identifying the authenticity of a document was the presence of an unbroken, trustworthy, well documented custodial chain (Ashley, 2000). The quest is for an effective method of providing this sort of documentary evidence for electronic records. Rather than have information professionals enter or add the relevant information and metadata, this information could be included in the contextual metadata of the document at the point at which the information is created. Automatic additions to this contextual metadata could be made throughout the subsequent life of the e-document (Wallace, 1995). Systems which can automatically enter and add contextual metadata have not yet been fully developed, but a number of metadata extraction tools have been constructed, such as JHOVE, which can locate and pull out the metadata developed when the document was created (Jhove, 2006). A similar problem occurs with electronic publications. Now that we have on-demand publishing of electronic publications which are also distributed electronically, it is technically very easy to insert last-minute additions. But for libraries, keeping track of such editions - or even worse "manuscripts" - will be a nightmare unless documents are automatically equipped with contextual metadata.
In order to guarantee that the copy in question has not been amended or changed during its journey through the custodial chain on the way to the e-library or the e-archives, it will be necessary to ensure that contextual metadata entries relating to this chain, and all the stops along the way up to the point of arrival at the e-library or the e-archives, are added automatically. This is not going to be easy. Updating contextual metadata at every stage of the chain of a document or publication will inevitably involve a variety of institutions and/or departments, and probably various systems with different formats. Gathering such a wide variety of standards under a single umbrella would not be an easy job even for IT experts, let alone information professionals.
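One way to picture an automatically maintained custodial chain is as an append-only log in which every entry records a fingerprint of the document and of the entry before it, so that any later alteration of an earlier entry becomes detectable. The sketch below illustrates that idea under simplifying assumptions (a single shared format, no timestamps or signatures); it is not a description of an existing system, and the names `add_event` and `verify_chain` are hypothetical.

```python
import hashlib
import json


def _hash_entry(entry):
    """Fingerprint an entry, excluding its own stored fingerprint."""
    payload = {k: v for k, v in entry.items() if k != "entry_hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def add_event(chain, custodian, action, content):
    """Return a new chain with one more custody event appended."""
    entry = {
        "custodian": custodian,
        "action": action,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "previous_hash": chain[-1]["entry_hash"] if chain else "",
    }
    entry["entry_hash"] = _hash_entry(entry)
    return chain + [entry]


def verify_chain(chain):
    """Check that no entry was altered and that the links are unbroken."""
    previous = ""
    for entry in chain:
        if entry["previous_hash"] != previous or entry["entry_hash"] != _hash_entry(entry):
            return False
        previous = entry["entry_hash"]
    return True


# Hypothetical custody history, for illustration only.
document = b"Minutes of the board meeting"
chain = add_event([], "Creating office", "created", document)
chain = add_event(chain, "Records centre", "transferred", document)
chain = add_event(chain, "E-archive", "accessioned", document)
print(verify_chain(chain))
```

The difficulty described above remains: every institution along the chain would have to write entries in a form the others can read and verify.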
Another commonly proposed solution would be to integrate "event awareness" into research tools (Lagoze, 2000). Such research tools would involve monitoring the life cycle of each individual source, maintaining awareness of the relationships between them, the events which took place in the transformation of input sources into output sources, and awareness of their characteristics (or metadata) (Lagoze, Hunter and Brickley, 2000). It would also require the subsequent storage of the contextual metadata which has been originated, and its preservation throughout the life of the document (or perhaps even for longer than that).
2.2.3 Control of uses
A wide variety of techniques have been developed over the years for identifying the authenticity of documents or manuscripts (Ataman, 2005). Unfortunately, hardly any of these can be directly applied to electronic records since a change to an electronic record is currently almost impossible to spot simply by looking at the document itself. There is, therefore, much to be gained from developing methods that will control and track usage of e-records and publications, and developing security systems which ensure that no undocumented change can be made to the material.
Document management system (DMS) software has been designed to enable us to monitor changes made at various times by various people to the same e-document. However, since changes to ordinary documents - once they have assumed the record function - could legally be classed as forgery, such software is not suitable for the management of e-records. The software industry identified this need and set to work adding control and monitoring mechanisms to document management systems, to produce a class of software known as electronic records management systems (ERMS). ERMS applications do not permit alterations to e-records registered on the system and keep an audit log to track all incidents of read-only access to the document, and even access attempts.
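The behaviour described here, refusing alterations while logging every access and every attempt, can be sketched in a few lines. The class below is a toy model written for illustration and does not reflect the design of any particular ERMS product; the name `RecordStore` and its methods are hypothetical.

```python
class RecordStore:
    """Toy record store: registered records are read-only and every
    access or attempted change is written to an audit log."""

    def __init__(self):
        self._records = {}
        self.audit_log = []

    def _log(self, user, action, record_id, allowed):
        self.audit_log.append(
            {"user": user, "action": action, "record": record_id, "allowed": allowed}
        )

    def register(self, record_id, content, user):
        if record_id in self._records:
            self._log(user, "register", record_id, False)
            raise PermissionError("record already registered")
        self._records[record_id] = bytes(content)  # store an immutable copy
        self._log(user, "register", record_id, True)

    def read(self, record_id, user):
        allowed = record_id in self._records
        self._log(user, "read", record_id, allowed)
        if not allowed:
            raise KeyError(record_id)
        return self._records[record_id]

    def modify(self, record_id, content, user):
        # Alterations are never permitted, but the attempt is still logged.
        self._log(user, "modify", record_id, False)
        raise PermissionError("registered records cannot be altered")


store = RecordStore()
store.register("2007/041", b"Signed contract", user="registrar")
print(store.read("2007/041", user="researcher"))
```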
In an age in which information is power, it would of course be unthinkable to leave the storage and security of electronic records and information in the hands of a single software application. Information security is increasingly becoming a field of expertise in its own right. Archivists and records managers will have to start taking a closer interest in the issue, and undoubtedly information professionals in the future will all have to undergo rigorous training on information security.
If it is possible to establish control of functions, processes and usage, then originals of records held by trustworthy third parties, such as archives and libraries, can be used as reference specimens for comparison with materials circulating out in the wider electronic world. Under such an arrangement, comparison of any material with its original could also be carried out automatically by text mining tools.
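A much simpler stand-in for the text mining tools mentioned above is a checksum comparison, followed by a line-by-line comparison when the checksums differ. The sketch below uses that simpler technique purely for illustration; the function names and the sample texts are hypothetical.

```python
import difflib
import hashlib


def fingerprint(data):
    """SHA-256 fingerprint of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_reference(candidate, reference_fingerprint):
    """True if the candidate is bit-for-bit identical to the reference copy."""
    return fingerprint(candidate) == reference_fingerprint


def changed_lines(reference_text, candidate_text):
    """List the lines removed from ('- ') or added to ('+ ') the reference."""
    diff = difflib.ndiff(reference_text.splitlines(), candidate_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical reference specimen and circulating copy.
reference = "Paid 100 lira\nSigned: A. Archivist"
circulating = "Paid 900 lira\nSigned: A. Archivist"

if not matches_reference(circulating.encode("utf-8"), fingerprint(reference.encode("utf-8"))):
    for line in changed_lines(reference, circulating):
        print(line)
```

The archive only needs to publish the fingerprint, not the record itself, for outside parties to check a copy against the original.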
2.3 Digital forensics
Whenever there is doubt about the authenticity of a historical document or manuscript, courts and lawyers often seek the expert opinion of an archivist or a manuscripts librarian. Such professionals reach a judgment by looking at the calligraphy, watermark, diplomatics and physical structure, and assessing its internal consistency and its resemblances with documents of the same period. It may be necessary to carry out very similar sorts of examinations in respect of electronic records and publications, especially when the custodial history of the records cannot be traced fully - in the case of documents in private archive collections, for example, or donated publications.
The equivalent of calligraphic characteristics, for an electronic record, would be the character encoding and details of embedded fonts. These would give us some information about the earliest possible date on which the document or publication could have been created (Lynch, 2000).
There was a fine example of this during the US elections in 2004: "proof" that George Bush had evaded military service, supposedly dating from the early 1970s, was found to be set in a proportionally spaced typeface closely matching Times New Roman as produced by a modern word processor. The typeface itself dates from 1931, but in this digital form it only became commonplace on personal computers in the 1990s. Result: the forgery was being discussed on the new generation's blogs just 18 minutes after the allegation entered the public domain, and Kerry's campaign suddenly lost a significant measure of credibility among the upcoming generation upon whose support he was relying (Karlgaard, 2004).
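The dating argument from character encoding can be sketched in code. The example below assumes only two pieces of evidence: UTF-8 was devised in 1992, and the euro sign was added to Unicode in version 2.1 (1998). A file that uses either cannot plausibly be older than that. The function name `earliest_plausible_year` is hypothetical, and a practical tool would need a far larger, carefully sourced table covering fonts and format versions as well.

```python
from typing import Optional

UTF8_DESIGNED = 1992         # year in which UTF-8 was devised
EURO_SIGN_IN_UNICODE = 1998  # U+20AC was added in Unicode 2.1


def earliest_plausible_year(data: bytes) -> Optional[int]:
    """Return the earliest year in which these bytes could have been
    written, or None if the encoding gives no evidence either way."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not UTF-8, so this check says nothing
    evidence = []
    if any(ord(ch) > 127 for ch in text):
        evidence.append(UTF8_DESIGNED)       # multi-byte UTF-8 is in use
    if "\u20ac" in text:
        evidence.append(EURO_SIGN_IN_UNICODE)
    return max(evidence) if evidence else None


print(earliest_plausible_year("Invoice total: 250 \u20ac".encode("utf-8")))
```

Such a check can only establish the earliest possible date, never the actual one, and bytes in an older encoding may occasionally happen to be valid UTF-8, so the result is a clue rather than proof.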
Establishing the authenticity of original media often involved looking at watermarks, which were mostly datable. Now there is interest in applying the digital equivalent of a watermark on electronic media. DataGlyph technology has been developed by Xerox to embed data on paper or in images in a manner that is undetectable by the human eye but readable with special tools (PARC Research, 2002), and gives technicians hope of developing similar techniques for electronic media.
The science of diplomatics is a fundamental area of archive studies dealing with the construction, style, form and materials involved in the making of documents, bound volumes and other archive material, their presentation and internal structure, and the relation of all these characteristics to their content and context (Duranti, 1998, 27).
The diplomatic characteristics of documents provide fundamental clues to establishing authenticity, because although the formal characteristics of documents chronologically follow the actual creation of the document, they offer a much quicker method of identification (Eastwood, 1988, 248). This branch of science can be deployed in establishing the authenticity of electronic records, but this will entail an updating of the field of diplomatics and the initiation of close study of the diplomatic characteristics of electronic records. For the coming generation, any diplomatics expert who does not know how an e-mail header is formed (to give the simplest example), would be a mere document archaeologist equipped to deal with only the most prehistoric of artifacts.
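To make the e-mail example concrete, the sketch below reads the header of a message with Python's standard `email` package and sets the date claimed by the sender beside the timestamps stamped on by each mail server in the `Received` lines. The message, addresses and the function name `transit_summary` are invented for illustration.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Invented message; servers add Received lines at the top, newest first.
SAMPLE_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
    by mx.example.net with ESMTP id abc123;
    Mon, 22 Oct 2007 09:15:02 +0300
Received: from [192.0.2.55] by mail.example.org;
    Mon, 22 Oct 2007 09:14:58 +0300
From: Sender <sender@example.org>
To: archive@example.net
Date: Mon, 22 Oct 2007 09:14:55 +0300
Message-ID: <1234@example.org>
Subject: Minutes

Body text.
"""


def transit_summary(raw):
    """Extract the claimed date and the server timestamps in travel order."""
    msg = message_from_string(raw)
    hops = []
    for received in msg.get_all("Received", []):
        # The timestamp follows the last semicolon of each Received line.
        stamp = received.rsplit(";", 1)[1].strip()
        hops.append(parsedate_to_datetime(stamp))
    hops.reverse()  # oldest hop first
    return {
        "message_id": msg["Message-ID"],
        "claimed_date": parsedate_to_datetime(msg["Date"]),
        "hops": hops,
    }


summary = transit_summary(SAMPLE_MESSAGE)
print(summary["claimed_date"], [str(h) for h in summary["hops"]])
```

A claimed date later than the first server timestamp, or hops that run backwards in time, would be the electronic counterpart of an anachronism in a charter.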
2.3.4 Physical form
Maintaining the authenticity of records and manuscripts on traditional media involved studying their physical form, such as the fabric of the paper and the chemistry of the ink. The equivalent in an electronic environment would be studying the formal attributes of the record as well as the technological context that determines its external make-up, including:
All these elements are, of course, mostly "transparent or invisible to the user" (Duranti and MacNeil, 1996, 49).
Archivists and librarians of the future are clearly going to have to build themselves an extensive technical information infrastructure in this field if they are going to assume a role in ascertaining authenticity, by means of such documentary elements.
Information professionals will also have to transform their everyday duty to preserve materials from various forms of damage. Research into the durability of traditional carrier media will inevitably continue, but now we must look at the durability of carrier media for digital information. There are several contemporary studies on the durability of CD and DVD media (e.g. Bradley, 2006). In the near future there will presumably be similar research on emergent media such as Blu-ray discs, flash drives, etc.
One of our tasks in ensuring that our digital information heritage is passed on to coming generations and used by them is to monitor current and future file formats, and the structural metadata associated with them. Some of this structural data may be embedded in the documents themselves. Examples include "document properties" information created by word-processing applications, or the ID3 tags attached to MP3 sound files. Considerable care must be taken in moving data between formats in order to prevent the destruction of information. When a document created under Mac OS is converted to Windows format, for example, the "resource fork", which was an integrated element of the original file, is converted to an additional file which can easily become separated from its partner or lost. Furthermore, the file creation date and last amendment date of the Mac file are lost when it is converted to Windows format, because Windows treats the date on which the file was copied as the file creation date (Stollar and Kiehne, 2006, 3).
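Embedded structural metadata of this kind is often easy to read once its layout is known. As a minimal sketch, the function below extracts an ID3 version 1 tag, which occupies the last 128 bytes of an MP3 file and begins with the letters `TAG`. Later ID3 versions are stored differently and are not handled here. The function name `read_id3v1` and the sample bytes are hypothetical.

```python
def read_id3v1(data):
    """Return the ID3v1 fields found at the end of an MP3 file, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(start, length):
        # Fields are fixed-width and padded with NUL bytes or spaces.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    return {
        "title": text(3, 30),
        "artist": text(33, 30),
        "album": text(63, 30),
        "year": text(93, 4),
        "comment": text(97, 30),
        "genre": tag[127],  # numeric genre code
    }


# Synthetic file: some stand-in audio bytes followed by a hand-built tag.
sample_tag = (b"TAG" + b"Oral history 12".ljust(30, b"\x00")
              + b"Example Archive".ljust(30, b"\x00")
              + b"Interviews".ljust(30, b"\x00")
              + b"1999" + b"".ljust(30, b"\x00") + bytes([12]))
sample_file = b"\xff\xfb" + b"\x00" * 100 + sample_tag
print(read_id3v1(sample_file))
```

A conversion routine that copies only the audio stream would silently drop these 128 bytes, which is exactly the kind of loss the paragraph above warns about.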
The UK National Archives took a first step in this direction by establishing a database, PRONOM, of various file formats (http://www.nationalarchives.gov.uk/pronom). Tracking hundreds, and probably thousands, of file formats is a major task on its own, but ensuring that information stored in such formats remains accessible in the long term will be such a tricky, painstaking business that it has been referred to as "digital archaeology" (Ross and Gow, 1999).
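Format registries are useful largely because many formats can be recognised from the first few bytes of a file, whatever the file name says. The sketch below shows the principle with a handful of well-known signatures. The table is a small hand-written sample for illustration, not data drawn from PRONOM, and the function name `identify_format` is hypothetical.

```python
# (leading bytes, format name) pairs for a few widely used formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(data):
    """Guess a file's format from its leading bytes; None if unrecognised."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


print(identify_format(b"%PDF-1.4\n..."))
```

A ZIP signature alone does not say what the container holds, which is one reason real identification tools combine several tests.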
One frequently recommended solution to this problem is to ensure that all file types are transferred into a more convenient format, a process referred to as migration (Wheatley, 2001). The most widely accepted target format is the archival profile of the portable document format (PDF/A). Migration to PDF was already becoming accepted as the de facto industry standard, and has now received recognition in ISO standards (ISO 19005-1:2005 and ISO 15930, parts 1-8, 2001-2003). But archivists and records managers may be unaware that this migration process wipes out much of the detail which would permit authenticity to be ascertained, because much of the structural metadata is lost in the transformation. One would then be left with nothing more than the methods based on controlling functions, processes and usage to prove the authenticity of an electronic record.
Almost all of this information handling becomes completely impossible when it comes to interactive applications. With this sort of document, there are two choices. You can either set up a technology museum - with the massive associated cost of maintaining a wide variety of old systems in working order (Rothenberg, 1998), or you can access the material via emulators (Granger, 2000). The Domesday Project, a geographical information system established on the BBC computer in the UK in the early years of the personal computer in the 1980s, is one interesting example of a body of information salvaged and made accessible by means of an emulator program (Darlington et al., 2003).
The storage of security copies is particularly important for vital records management in an electronic environment, and one of the choices in this field is LOCKSS technology (Lots of Copies Keep Stuff Safe) (Reich and Rosenthal, 2001), which automates the process of ensuring that security copies of vital electronic records are maintained in a second location, or in multiple locations.
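The principle behind keeping many copies can be illustrated with a simple majority vote: the copies held at different sites are compared, and any copy that disagrees with the majority is treated as damaged and replaced. The sketch below shows only this basic idea and is a great simplification of what LOCKSS actually does. The function name `audit_replicas` and the site names are hypothetical.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas):
    """Compare copies held at several sites and repair the odd ones out.

    replicas maps a site name to the bytes stored there. Returns the list
    of damaged sites and a repaired mapping.
    """
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) / 2:
        raise ValueError("no majority: cannot tell which copy is sound")
    good_copy = next(data for site, data in replicas.items() if digests[site] == winner)
    damaged = sorted(site for site, digest in digests.items() if digest != winner)
    repaired = {site: good_copy for site in replicas}
    return damaged, repaired


# Hypothetical holdings; the copy at site_c has suffered a changed byte.
holdings = {
    "site_a": b"Charter of foundation, 1923",
    "site_b": b"Charter of foundation, 1923",
    "site_c": b"Charter of foundation, 1928",
}
damaged, repaired = audit_replicas(holdings)
print(damaged)
```

With only two copies a disagreement cannot be resolved, which is why such schemes rely on three or more independent locations.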
2.5 Records management
At the outset we discussed the current unprecedented speed of information senescence, how the life of information is consequently shortened, and how the accuracy of information about individuals, for example, deteriorates at the strikingly rapid rate of 3% per month. In this context, the quality and usefulness of data can only be protected by regular and systematic data cleansing. Records managers who were traditionally responsible for appraisal of information which had reached the end of its useful life and destroying whatever does not warrant continued storage, seem to be the most likely candidates to be appointed as digital data cleansers of the future.
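The effect of a 3% monthly deterioration rate is easier to appreciate when compounded. The sketch below assumes that each month 3% of the records that are still accurate become stale, which is one reading of the figure quoted above, and asks how much of a data set remains accurate after a given time. The function names are hypothetical.

```python
import math


def share_still_accurate(monthly_decay, months):
    """Fraction of records still accurate after the given number of months."""
    return (1 - monthly_decay) ** months


def months_until_below(monthly_decay, threshold):
    """Whole months before the accurate share falls below the threshold."""
    return math.ceil(math.log(threshold) / math.log(1 - monthly_decay))


after_one_year = share_still_accurate(0.03, 12)
print(f"Accurate after 12 months: {after_one_year:.1%}")
print(f"Months until under 90% accurate: {months_until_below(0.03, 0.90)}")
```

On this assumption roughly three records in ten are out of date after a single year without cleansing, which gives a sense of how often a cleansing cycle would need to run.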
In a world which produces enough unique information to fill the Library of Congress 37,000 times over every year, and where 92% of this information is held in electronic environments, information professionals are going to have to be equipped with the latest technology. This means in turn that institutions training such professionals will have to ensure that they are producing what Haspel (1998) has called "archive engineers", or, in more general terms, information engineers. Unfortunately, some universities have already started using this term to mean what has traditionally been referred to as "management information systems".
Unless they are equipped with new skills appropriate to the new technology, the stereotype of archivists, records managers, librarians and information managers as elderly eccentrics burrowing around in dusty stacks will actually become a reality. It may already be happening. A wide-ranging survey in the US in 2004 found that 60% of existing archivists are over 45 years of age (Walch and Yakel 2006, 21).
Clearly, archives and libraries will need staff with traditional training for some time to come; but if we do not begin training the information professionals of the future right now, and start wresting the new technology aspects of our profession from the hands of IT experts we are going to be swallowed up by the gaping digital divide. Once we have drifted under the control of other fields, we may be forced once again to embark on the "wars of independence" which the archives profession had to fight in its early years. We must therefore ensure that the information professionals of the future are armed not only with the traditional library and archive skills, but also with this new class of technological skills.
The vital question to be asked at this point is: what weight should be given to technology in the training that is going on right now? We must remember that the answer to that question will change with every passing day.
The next and still more pressing question is whether we are going to nurture our future information professionals by adding technical elements to the curricula of institutions focussed on information sciences, or introduce elements on records and information management and librarianship to engineering schools focussed on information technology. A glance at the papers delivered at the American Association of Archivists' 2004 conference on archive education, published in the journal Archival Science in 2006 (Eastwood, 2006; Tibbo, 2006; Uhde, 2006; Cox, 2006), shows that progress in introducing the necessary technology does not give grounds for optimism.
Most of us probably would not much welcome the second of the above options, in which IT people take command of the technological aspects of our work, but leaving information technology training to experts might create space for academicians working on information sciences to concentrate on researching conceptual issues in their own field.
We must give these matters serious thought, and make a start on our plan of action. If we fail in this, we may find that events leave us far behind.
This paper is based on a presentation made, in Turkish, at the Symposium on Information Management in a Changing World organised by Hacettepe University, Ankara, Turkey on 24-26 October 2007.
I would like to express my gratitude to Prof. Yaşar Tonta for his valuable opinions and contributions during the preparation of this text, and to Jonathan Sugden for proof reading and editing the English text.
References (all URLs were checked on 29 November 2008)
Ataman, B.K. (2004), “Technological means of communication and collaboration in archives and records management”, Journal of Information Science, Vol. 30, No.1, pp.30-40. Available at: http://www.archimac.org/BKACV/Articles/TechMeans.spml
Bradley, K. (2006), Risks Associated with the Use of Recordable CDs and DVDs as Reliable Storage Media in Archival Collections - Strategies and Alternatives, UNESCO, Paris. Available at:
Duranti, L. (1998), Diplomatics. New Uses for an Old Science, Scarecrow Press, Lanham MD.