Session 6 – Lightning Talks

How reliable are our forensic tools?

Working with born digital files requires the use of various tools, and there is an expectation that the software will perform as advertised. The tools do not always work properly, but problems are usually apparent in the form of error messages or other clear indicators. Unfortunately, this is not always the case.

In this lightning talk, I will discuss my work using FTK Imager and IsoBuster to image optical discs containing project files and videos of lectures at the Getty. While exporting files from the images, I discovered that FTK Imager and IsoBuster sometimes generate corrupt ISO files without producing error messages. In some cases the ISO files were completely unusable, while in others the videos in the ISO images could be played but were missing portions of the original.

In addition, I found that using FTK Imager’s file export feature on non-corrupt ISO images sometimes produced files with checksums different from those on the mounted image. I will discuss this process of discovery and how the Getty’s Institutional Records and Archives adjusted its workflow in response.
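One way to catch silent discrepancies like these is to hash every file on the mounted image and compare it with its exported copy. The following is a minimal Python sketch, not the Getty's actual workflow; the function names, the directory layout (`reference` as the mounted image, `exported` as the export destination), and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large lecture videos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(reference: Path, exported: Path) -> dict:
    """Compare every file under `reference` (e.g. a mounted ISO) with the file
    at the same relative path under `exported` (e.g. a tool's file export)."""
    report = {"match": [], "mismatch": [], "missing": []}
    for ref_file in sorted(p for p in reference.rglob("*") if p.is_file()):
        rel = ref_file.relative_to(reference)
        candidate = exported / rel
        if not candidate.is_file():
            report["missing"].append(str(rel))
        elif sha256_of(ref_file) == sha256_of(candidate):
            report["match"].append(str(rel))
        else:
            report["mismatch"].append(str(rel))
    return report
```

Any entry under "mismatch" or "missing" flags an export that should not be trusted without further checking, even if the tool reported no error.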

  • Lorain Wang, J. Paul Getty Trust

Managing the Environmental Impact of Digital Preservation

Environmental sustainability is an imperative that has engaged the cultural heritage community for many years. This has taken numerous forms, such as reducing environmental impacts from the built environment, disaster planning and adaptation in the face of climate change, and reevaluating purchasing decisions based on products’ environmental impacts. However, the drive toward environmental sustainability has not been thoroughly explored in relation to digital preservation activities.

In this lightning talk, the speakers will summarize their research on the environmental impact of digital preservation and argue for a shift in the way that digital preservation activities are evaluated. They propose that sustainable practice will come only from critical examination of the underlying motivations and assumptions of digital preservation practice, not from improvements in technological efficiency. The speakers will briefly explore the paradigm shift that is needed in three areas of digital preservation practice: appraisal, permanence, and availability.

  • Tim Walsh, Canadian Centre for Architecture
  • Laura Alagna, Northwestern University
  • Keith Pendergrass, Harvard Business School
  • Walker Sampson, University of Colorado Boulder

The Case of the QIC Data Cartridge Tapes

As an intern at the NASA-Caltech Jet Propulsion Laboratory (JPL), I worked on a digital repository focused on capturing and preserving Entry, Descent, and Landing (EDL) records. I was given two QIC data cartridge tapes as contributions to the repository. My talk will outline the steps I took to try to recover the data on the tapes, an effort that was ultimately unsuccessful.

  • Sara Bond, UCLA GSEIS Information Studies

Introduction and Overview

The project is funded by the Institute of Museum and Library Services (IMLS) to study and advance the adoption of digital forensics tools and methods in libraries and archives through professional education efforts.

This project will address two primary research questions: What are the primary institutional and technological factors that influence adoption of digital forensics tools and methods in library and information science (LIS) classes in different educational settings? What are the most viable mechanisms for sustaining collaboration among LIS programs on the adoption of digital forensics tools and methods? This lightning talk will summarize the project rationale, scope and projected deliverables.

  • Cal Lee, University of North Carolina at Chapel Hill School of Information and Library Science

BitCurator NLP

The BitCurator NLP project is developing software for collecting institutions to extract, analyze, and produce reports on features of interest in text identified in born-digital materials. The software uses existing natural language processing software libraries to identify and report on those items likely to be relevant to ongoing preservation, information organization, and access activities. These may include entities (e.g. persons, places, and organizations), potential relationships among entities, and topic models to provide insight into how concepts are naturally clustered within the documents.

This presentation will focus on two software services. The first, BitCurator Access Webtools, allows users to create customized web-accessible views from groups of raw and forensically packaged disk images identified within collections. Selected disk images are automatically processed by a background service that identifies candidate file types (common document formats), extracts and indexes text from relevant files, and generates statistical reports for each group of images. A web interface allows users to browse the contents of file systems, examine text extracted from files, and view automatically tagged features, including entities. The second, bitcurator-nlp-gentm, uses a similar text-extraction method to prepare candidate materials identified within disk images for topic modeling.
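The extract-and-index step can be illustrated with a minimal Python sketch. It is not the BitCurator Access Webtools code: it assumes files have already been exported from the disk images into directories, treats only a hypothetical whitelist of plain-text suffixes as candidate files (the real service handles common document formats), and builds a simple inverted index plus term counts for the group.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical whitelist of candidate types; real document formats (PDF, DOC, ...)
# would need format-specific text extractors, which this sketch omits.
CANDIDATE_SUFFIXES = {".txt", ".md", ".csv"}
TOKEN = re.compile(r"[a-z]+")


def extract_text(path: Path) -> str:
    """Plain-text extraction only; undecodable bytes are replaced, not dropped."""
    return path.read_text(encoding="utf-8", errors="replace")


def index_group(roots):
    """Build an inverted index (term -> set of file paths) and aggregate term
    counts across a group of directories, one per exported disk image."""
    index = defaultdict(set)
    counts = Counter()
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file() or path.suffix.lower() not in CANDIDATE_SUFFIXES:
                continue  # skip directories and non-candidate file types
            tokens = TOKEN.findall(extract_text(path).lower())
            counts.update(tokens)
            for token in tokens:
                index[token].add(str(path))
    return index, counts
```

The aggregate `counts` stands in for the per-group statistical report; a production pipeline would use a proper search index rather than an in-memory dictionary.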

Abstract topics generated from these materials can provide insight into term clustering and into differences in term distribution between particular disk images and the group as a whole, and can help identify outliers or unrelated materials. The tool incorporates a widely used topic modeling technique, latent Dirichlet allocation (LDA), and uses existing visualization libraries (including pyLDAvis) to display the results. BitCurator NLP is supported by a grant from The Andrew W. Mellon Foundation.
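To show what LDA computes, here is a from-scratch collapsed Gibbs sampler in Python. It is an illustrative sketch only, not how bitcurator-nlp-gentm implements topic modeling (the source does not say, and practical tools usually call an established library); the function name and the hyperparameters `alpha`, `beta`, and `iterations` are assumptions, and visualization is not reproduced.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (e.g. one per extracted document).
    Returns (doc_topic, topic_word): smoothed per-document topic proportions
    and, for each topic, a dict of word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # current topic of each token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][w_id[w]] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word
```

Comparing a disk image's `doc_topic` rows against the group average is one simple way to surface the outliers the abstract mentions.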

  • Cal Lee, University of North Carolina at Chapel Hill School of Information and Library Science
  • Kam Woods, University of North Carolina at Chapel Hill School of Information and Library Science

Normalizing partition system analysis to understand disk images

The objective of disk imaging software is to make a faithful representation of original source media. Likewise, the storage system parsing software used in archival workflows should faithfully reproduce all of the files on the disk image. However, default file extraction workflows may not be designed to recognize all available file systems and may miss entire partitioning systems, potentially omitting significant amounts of data.

File system detection – a different problem from parsing – is a foundational problem in file extraction workflows worth further attention. There are numerous examples of archival material that would be adversely affected by workflows that rely on a single parse of a disk image, including hard drives of desktop computers (especially considering dual-boot computers or drives with recovery partitions), USB keys formatted for multiple operating systems, or older software installers (e.g. hybrid Mac/PC optical media).
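Why a single parse can miss data is easy to demonstrate with a few well-known on-disk signatures. The Python sketch below is not the authors' Disktype-based tooling and emits no DFXML; the function name is hypothetical, it assumes 512-byte sectors and a whole image loaded into memory, and it checks only the Apple Partition Map, MBR, GPT, and ISO 9660. Its point is to report every layer it finds rather than stopping at the first match.

```python
import struct

SECTOR = 512                 # assumption: 512-byte logical sectors/blocks
ISO_VD_OFFSET = 16 * 2048    # ISO 9660 volume descriptors start at sector 16


def detect_layers(image: bytes) -> list:
    """Return (scheme, details) for every signature found, instead of stopping
    at the first match, so hybrid media are not under-reported."""
    found = []
    # Apple Partition Map: "ER" driver descriptor in block 0, "PM" entry in block 1.
    if image[0:2] == b"ER" and image[SECTOR:SECTOR + 2] == b"PM":
        found.append(("apple_partition_map", None))
    # MBR boot signature. FAT boot sectors carry it too, so treat it as a hint.
    if image[510:512] == b"\x55\xaa":
        parts = []
        for slot in range(4):
            off = 446 + 16 * slot             # four 16-byte partition entries
            ptype = image[off + 4]
            start_lba, n_sectors = struct.unpack_from("<II", image, off + 8)
            if ptype != 0:                    # type 0 marks an unused entry
                parts.append({"slot": slot + 1, "type": ptype,
                              "start_lba": start_lba, "sectors": n_sectors})
        found.append(("mbr", parts))
        # Protective MBR entry (type 0xEE) plus a GPT header at LBA 1.
        if any(p["type"] == 0xEE for p in parts) and image[SECTOR:SECTOR + 8] == b"EFI PART":
            found.append(("gpt", None))
    # ISO 9660: one descriptor type byte followed by the identifier "CD001".
    if image[ISO_VD_OFFSET + 1:ISO_VD_OFFSET + 6] == b"CD001":
        found.append(("iso9660", None))
    return found
```

On a hybrid Mac/PC disc, for example, this would report both an Apple Partition Map and an ISO 9660 volume, where a workflow that stops at the first recognized layer would extract files from only one.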

To assist with file system detection, we have released tooling that brings independent partition system parsing perspectives into file extraction workflows: a Disktype output parser that generates DFXML to represent container layers more foundational than file systems.

This work includes updates to the DFXML core language and library to describe the path by which file systems are discovered. The released tools enable archival workflows to discover file systems in a machine-parseable way. Overall, we hope this talk will give institutions more information to support decisions about retaining disk images acquired through archival processing.

  • Alex Nelson, National Institute of Standards and Technology
  • Dianne Dietrich, Cornell University

