Digitization Considerations & Philosophies


The mission of the Digital Production Group (DPG) is to create high-quality cultural heritage images of the library’s diverse collection holdings, allowing both rigorous research and casual browsing of the materials without requiring physical access.

In addition to making the materials digitally accessible, these “digital surrogates” reduce potentially harmful physical handling, while still encouraging viewers to come to the reading room to view objects of particular interest. Digital objects are not merely surrogates for the physical objects, however, as they also enable other types of research and exploration - from viewing details that are beyond observation with the naked eye, to searching for specific terms in the textual output of OCR (optical character recognition), and more.

Digitization is more than just the production workflow from image capture to screen. Behind every digital object are many individual contributors and lab staff members whose choices affect the final digital objects viewers encounter. Every object is unique, even among items similar in content, structural format, and era of creation.

Beyond the purely technical considerations, most of these decisions depend on the specific requirements of each digitization request or project, which vary widely. Some considerations relate to how the images will be used, some are purely a matter of aesthetic preference, and all decisions are made in conjunction with curators and other stakeholders before digitization begins. In addition, a balance must be struck between these considerations and our available resources (primarily time).

The following sections will cover some of the main considerations, and then some technical details, of our work.


Gray, white, or black background?

For most of the considerations discussed on this page, including this one, there is not necessarily a single right answer, but there is usually an option that works best for each digitization effort.

DPG uses a number of background colors – gray, black, and white – behind the archival objects, depending on project needs or the visual characteristics of the material. There are aesthetic considerations and technical considerations for each object or collection that affect this choice, and even among cultural heritage institutions there is not currently a best-practices consensus (except in rare cases where there may be a standard for a particular type of object, such as papyrus fragments).

It is important to note that cameras and digitization equipment have improved considerably over the decades that DPG has been digitizing materials, and some background choice considerations regarding contrast and dynamic range that were important in the past for technical reasons are no longer an issue. Many early DPG images visible in SearchWorks and in other Spotlight exhibits (and in this one!) illustrate this point when compared to newer images.

As technology has continued to improve and our imaging capabilities and philosophies have evolved, aesthetic preferences and end-use considerations are now the primary factors driving background color choice. Ideally, the background which best allows our eyes to interpret the object is chosen.

Cannery Row: Original pencil manuscript (143 pp.)
Coptic 14
Mozart. Konzert fur 3 Klavier. Cembalo terzo

Bound items: spreads or separate pages?

For the most part, this decision is dictated by the objects themselves. It is not always possible to image a bound volume as spreads (in fact, the majority of the time it isn't). Even when it is possible, text or other information towards the center of the book, the "gutter", may become too curved to be legible.

Examples of bound items imaged as separate pages

Examples of bound items imaged as spreads

Some bound items can be imaged as spreads without damaging the binding by carefully propping them up, illustrated below. This isn't appropriate for most bound items, as it could cause damage to the binding or paper, but for some it may actually be the only feasible way to image them. A piece of glass is positioned at a fixed height over the item, supported on the sides, holding the pages open as flat as safely possible without applying more than slight pressure. The pages are also supported from below.

George White daily appointment book, 1948

How should the images be cropped?

Cropping may not seem like a particularly important part of digitization at first, but it makes up a significant portion of the workflow and strongly influences the way our images are perceived.

By default, DPG includes a border around the physical object in the digital image. This enables the viewer to better understand the physical object, and to see that nothing is missing or cut off. The width of the border retained around the item may vary depending on the size/shape of the item and aesthetic factors.

There are many cases where a border may be undesirable, such as for theses and reports - particularly if the end-use of the images is for OCR (optical character recognition).

Cropping the images is a very hands-on process, and while we use an automated cropping tool that suggests an initial crop that gets somewhat close, our staff must still make final adjustments to the crops for each image individually - a sometimes-tedious process.


DPG's Technical Standards


General thoughts on quality and technical considerations

While "digitization" sounds very computer-y and, well, digital, it is in reality a very analog, very human, hands-on process with little automation possible. As a team we produce tens of thousands of digitized images every year, with each of us bringing our own unique background, talents, and sensibilities to bear.

All of the technical decisions involved in digitization are made using norms and best practices developed by the cultural heritage imaging community at institutions like ours all around the country and the world. Even with today's technology, though, we still must make tradeoffs.

It's worth emphasizing that preservation-level accuracy isn't required to provide online access to materials. We often face a choice between creating images that are less-than-perfect or not imaging the item at all - and making these archival materials digitally accessible always wins that argument.

In recent years there have been significant advances in the image quality achievable by DPG and similar groups at other cultural heritage institutions. You'll see evidence of both current and past imaging standards in the items included in this exhibit, some of which were digitized quite some time ago when imaging technology was nowhere near as good as it is now.

As image quality continually improves with better equipment, resources, and training, our time and effort are usually best spent making new materials accessible rather than re-imaging past materials that were shot with “legacy” standards.


Golden Thread calibrated color target

Where are the color bars?

You may be familiar with archival photographs from other institutions that feature prominent color bars. Color bars or other color targets are a critical part of the cultural heritage imaging process, though they may or may not be of use to end-users viewing the final images. There are different brands and types of color bars, and different ways of using them, such as an “object level” color bar which sits next to the object or “device level” color targets which are used for testing but do not get included in final images.

DPG performs device-level color calibration for each imaging session, which involves capturing images of physical color targets that get checked with calibration software to ensure that colors and tones are accurate (it even gives a “pass” or “fail”).

Despite this scientific calibration, we will still visually evaluate our images as there may still be slight variations in tone or color which the human eye is extraordinarily good at detecting. Additional evaluations are made during quality assurance checks, such as for camera focus, organics (something on the object like a hair or bit of fluff), or other issues that may need to be corrected with a reshoot.


Lighting

For our overhead cameras, where the bulk of our digitization takes place, the lighting setups have been designed according to cultural heritage imaging norms to create even lighting that illuminates as much of the object as possible.

Lighting is tested and calibrated daily to ensure that it is even across the imaging area and the object, while still allowing some shadows - shadows help to provide some visual depth which better communicates the three-dimensional characteristics of the objects.

Shadows help to demonstrate things like deep folds in stiff parchment, the texture of coarse handmade paper (and even of finer paper), or the relief designs on a wax seal. Carefully managed shadows are an important aid to our ability to represent the physicality of the objects.


Resolution

The details in any object may be measured using a magnifying loupe or other methods in order to identify the resolution necessary to clearly represent it in a photograph, a method called “item-driven image fidelity” (IDIF). The Smithsonian Institution Digitization Program has published an interesting article (PDF) describing this method.

Based on extensive testing using that method, and in line with the Smithsonian's findings, DPG defaults to 600 ppi resolution - which actually far exceeds the needs of the majority of objects. This resolution captures fine details - including physical characteristics like the texture of paper - and is very manageable with our equipment for most items.

PPI stands for pixels per inch, meaning, in the case of 600 ppi, one inch measured on the original physical object is represented by 600 digital pixels.
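As a rough illustration of that arithmetic, the sketch below converts an object's physical size into pixel dimensions at a given ppi. The function name and the 8.5 x 11 inch example sheet are hypothetical, chosen only for illustration, and the calculation ignores any border retained around the object.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px) for an object measured in inches."""
    return round(width_in * ppi), round(height_in * ppi)


# Hypothetical example: a letter-size sheet imaged at 600 ppi, with no border.
w, h = pixel_dimensions(8.5, 11, 600)
megapixels = w * h / 1_000_000
print(f"{w} x {h} pixels, about {megapixels:.1f} megapixels")
```

Under the same assumptions, the sheet imaged at 400 ppi would come out at 3400 x 4400 pixels, which shows how quickly file sizes change with resolution.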

There are cases where the details in the objects, fluctuating sizes of the materials within a collection, specific end-uses of the images, or other factors might prompt us to image them at 400 ppi in order to have a streamlined and efficient workflow. We typically don't go lower in resolution than that.

There are also cases where resolutions higher than 600 ppi may be desired (or specifically indicated by the IDIF method). Photographic prints are usually imaged at 800-1000 ppi or more, as are artwork, coins, and other items with fine details that benefit from a higher resolution.

Photographic film is a whole other realm, reaching as high as 10,000 ppi, with 7900 ppi being the standard for regular 35mm film.

Coptic 14

High-resolution examples

There are several examples of items imaged at higher than 600 ppi in this exhibit - here are a few.

Marine Algae of New England Coast. Album
Sing Fat Co. : two postcards of the landmark building in San Francisco's Chinatown, from before and after the Chinese revolution of 1911
Negative sheet 11 : frame number 71

Detailed notes on film digitization - resolution & other considerations

As with any type of material DPG works on, we are much more cautious with physical handling than is generally deemed necessary in a commercial setting. To that end, the film digitization method used in cultural heritage imaging does not use fluid mounting or other techniques employed as standard in high-end commercial scanning. The film is touched only by gloved hands, clean and dry anti-Newton-ring glass film holders with no edges, and filtered air from a blower.

Dedicated high-end film scanners used commercially, such as drum scanners and Hasselblad/Imacon Flextight scanners, are no longer made in any case. Meanwhile, newer imaging technologies have advanced so significantly that the results we can achieve with our much more cautious methods already exceed scans from those devices, with much faster throughput and only minor drawbacks.

While resolution and tonality in our images exceed those of typical drum scanning, the dust, debris, hairs/fibers, and so on that are very common in archival film collections are visually reduced by fluid mounting in a drum scanner and will unavoidably be more evident with our dry method.

We look for debris that covers important details or that would be difficult to remove with digital post-processing in our digitized images, and attempt to use a filtered blower to carefully move or remove the debris before imaging (multiple times, if necessary). We don't perform any type of digital cleanup for images that get accessioned into the Stanford Digital Repository, but the idea is to create images that would be easy for someone else who wishes to use them for publication or other reproduction to touch up as desired themselves.

The 7900 ppi number mentioned earlier as standard for 35mm film imaging is not arbitrary; it is what we can image in a single shot with our current equipment. It also happens to far exceed typical film digitization resolution standards (such as the FADGI standards often referenced in cultural heritage imaging).

For most practical purposes besides extreme enlargements, resolutions this high are not strictly necessary, but we feel they are justified by the IDIF philosophy. Though the resolving power of film, not to mention camera lenses, is not that high, we feel this resolution captures the analog quality of film grain, similar to what is achievable with a very fine darkroom print. The analog tonality of the film is captured better than at typical resolutions, and aliasing and other artifacts that give digitized film images a digital look are avoided, producing an image that looks as analog as possible while remaining practical.

For medium and large format photographs, single-shot images must be done at lower resolutions - but our single-shot resolution for these formats also exceeds common imaging standards for medium format (e.g. 120 film and similar formats) and large format such as 4x5" and 8x10" film. Less resolution is generally necessary because the amount of enlargement required is significantly less, but we must still choose resolutions carefully to avoid a digital appearance to the images.
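The link between film format and enlargement can be sketched with simple arithmetic. The example below assumes a print resolution of 300 ppi and a print 20 inches on its long edge; both are hypothetical values chosen for illustration, not DPG standards. The frame sizes are nominal (a 36 mm long edge for 35mm film, 5 and 10 inches for 4x5" and 8x10" sheet film). It covers only the enlargement factor, not the grain and tonality considerations described above.

```python
MM_PER_INCH = 25.4


def required_scan_ppi(film_long_edge_in, print_long_edge_in, print_ppi=300):
    """Scan ppi needed so the enlarged print still has print_ppi pixels per inch."""
    enlargement = print_long_edge_in / film_long_edge_in
    return enlargement * print_ppi


# Nominal long-edge sizes in inches (assumed values for illustration).
formats = {
    "35mm": 36 / MM_PER_INCH,
    "4x5": 5.0,
    "8x10": 10.0,
}

for name, long_edge in formats.items():
    ppi = required_scan_ppi(long_edge, print_long_edge_in=20)
    print(f"{name}: roughly {ppi:.0f} ppi for a 20-inch print")
```

The smaller the piece of film, the more it must be enlarged to reach a given print size, so the scan resolution needed rises accordingly.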

Stitching multiple shots together to obtain higher resolution images of larger formats is possible, and we make the decision to do so based on each particular piece of film. It is a time-consuming and sometimes difficult process with film compared to other types of materials that we commonly stitch such as maps and other oversized items, but it is justified in many cases.