Reformatting computer media
The BDPL creates disk images for most computer media that come through the lab. A disk image is a computer file that contains a complete copy of a storage volume. It is a bit-for-bit copy of the original computer media—for example, a hard disk, a floppy disk, a magnetic storage tape, or a piece of optical media. We can then extract files from those disk images and make the files available to researchers.
KNASTER.DOC, from the floppy disk above. John Perry Barlow papers (M2531). Department of Special Collections and University Archives, Stanford University Libraries, Stanford, California.
Why not just transfer files off the computer media?
Our collections contain computer media from many eras, and disk imaging helps ensure that we preserve legacy filesystem structures, character encodings, and other complex content. In some cases, the media is so old that we can’t expect to continue being able to mount it and access the content, so we want to get the most comprehensive copy of the data on the media that we can.
How do we capture disk images?
To capture disk images, the BDPL uses a range of hardware and software. Before being connected to the host computer doing the imaging work, the computer media is attached to a hardware write blocker, which prevents the host computer from writing to or changing the original media. Depending on the original media, this connection can require a daisy chain of cables—older media can require FireWire, SCSI, SATA, or other legacy connections.
The write blocker is then attached to the host computer, which runs software to image the disk. A wide range of disk imaging software is available for each operating system; the BDPL uses several tools, depending on the original media and the operating system of the host computer.
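To make the idea of a bit-for-bit copy concrete, here is a minimal Python sketch of what imaging software does at its core: read the source in fixed-size chunks, write each chunk unchanged to an image file, and compare checksums afterward. This is an illustration only, not the BDPL's actual tooling. The function names, the chunk size, and the choice of SHA-256 are all assumptions made for this example, and the sketch does not handle bad sectors or read errors the way dedicated imaging tools do.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # assumed chunk size: read 1 MiB at a time


def create_image(source: Path, image: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy `source` to `image` byte for byte.

    Returns the SHA-256 hex digest of the bytes read from the source.
    (Hypothetical helper for illustration.)
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(image, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of source
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_and_verify(source: Path, image: Path) -> str:
    """Create an image, then re-read it to confirm it matches what was read."""
    source_hash = create_image(source, image)
    image_hash = sha256_of(image)
    if source_hash != image_hash:
        raise IOError(f"Checksum mismatch: {source_hash} != {image_hash}")
    return image_hash
```

In practice, `source` would be the device path that the host computer exposes for the write-blocked media (the exact path varies by operating system), and the returned checksum can be stored alongside the image so that later copies can be checked against it.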
Hard drives
Hard drives, whether internal (sealed inside your computer) or external (plugged into an outside port), are ubiquitous in the world and within our collections. They have been commercially available since the 1950s, and the BDPL has seen many of the varieties available since then!
Hard drives, particularly recent models, usually hold much more data than many other computer media. Processing and describing large amounts of data on a hard drive for researcher discovery and access is time-consuming compared to processing the files on a 3.5” floppy disk.
Whole computer systems
The Born-Digital Preservation Lab occasionally acquires whole computing systems as part of a personal, corporate, or academic archive. These are often legacy systems that record creators used to create born-digital archival records, and they often contain legacy software applications that the creators used in their professional or academic work.
Lab staff will consult with the cognizant bibliographer to determine what type of preservation and data capture is appropriate. Examples where lab staff may recommend retaining whole computer systems include:
- The computer has artifactual value, or is unique in its components and configuration.
- The computer contains working legacy software necessary to access the data. This can include obsolete computer software such as operating systems, databases, custom software applications, or digital typefaces or fonts.
- The computer contains legacy hardware interfaces that are necessary to access data stored on older types of computer storage media.
- The computer hardware is currently the only means of providing access to the born-digital collection materials, and the original hardware must be kept operational.
Small computer media
The BDPL reformats a wide range of small computer media in-house, such as 3.5-inch and 5.25-inch floppy disks, Zip 100 and 250 disks, CDs, DVDs, and flash drives. These media span a wide range of eras and storage sizes; a 5.25” floppy disk can date from as early as 1976 (and hold a whopping 110 kilobytes), but we also receive modern flash drives that hold hundreds of gigabytes.
Reformatting and processing small media, particularly those with small storage capacities, presents some familiar technical challenges. For example, the age of the media will have a huge impact on our ability to decode, read, and make accessible the files it stores, considering that many files created with proprietary software won’t render in modern programs without conversion. (Consider the number of desktop publishing software programs that flourished in the 1980s and 1990s—at least, we have!)
Small media also present challenges specific to their technical characteristics. Because of the small storage size and affordable cost of a floppy or CD, people often accumulated dozens or hundreds of these discs to hold all their work, which means we image those dozens or hundreds of discs one by one (rather than imaging a few much more expensive computers and internal hard drives).