Session 1

St John Karp, Pratt Institute; Brock Stuessi, Mike Kelley Foundation for the Arts | BitCurator Consortium

The Interconnectedness of All Things

St John Karp, Pratt Institute

It is in the nature of documents to be interrelated. The traditional finding aid presents documents in a flat or hierarchical structure, but their true nature is a web of interconnected relationships. An archivist working with a born-digital collection can use software to automatically determine the relationships between documents stored on digital media. Different drafts of an author’s novel, edited versions of a photographer’s pictures, and documents reproduced elsewhere in different formats: all such relationships are automatically discoverable and can help the archivist understand and describe a collection that they would otherwise have to examine unaided.

I am developing software that will assist archivists working with born-digital or digitized materials. This software, named “Eltrovo”, measures the similarity between files to determine whether they may represent versions of the same work. Eltrovo is designed to be forward-looking, to provide accessible interfaces to assist the archivist, and to use next-generation conceptual models such as “Records in Contexts”. Archivists can also use Eltrovo to create their own connections between files, annotate relationships, and output finding aids.
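The abstract does not say how Eltrovo measures similarity. As an illustration only, one common way to flag files that may be versions of the same work is to break each text into overlapping character n-grams ("shingles") and compare the sets with Jaccard similarity. The sketch below is not Eltrovo’s implementation; the function names, the shingle length `k=5`, and the `threshold=0.5` cutoff are all hypothetical choices.

```python
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of overlapping k-character substrings of normalized text.

    Hypothetical helper: lowercases and collapses whitespace before shingling.
    """
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def candidate_versions(
    docs: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return pairs of document names whose shingle sets meet the (assumed) threshold."""
    sets = {name: shingles(text) for name, text in docs.items()}
    pairs = []
    # Compare every pair once, in a stable (sorted-by-name) order.
    for (n1, s1), (n2, s2) in combinations(sorted(sets.items()), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((n1, n2, round(score, 3)))
    return pairs


docs = {
    "draft1.txt": "It was a dark and stormy night; the rain fell in torrents.",
    "draft2.txt": "It was a dark and stormy night; the rain fell in heavy torrents.",
    "letter.txt": "Dear Editor, please find my invoice enclosed.",
}
print(candidate_versions(docs))
```

Pairwise comparison grows quadratically with the number of files, so a tool working across whole disk images would likely need an indexing technique such as MinHash with locality-sensitive hashing; the exhaustive loop here is kept only for clarity.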

This project is currently in the design phase. During the first half of 2024 I will implement an alpha version of the software as part of an independent study at the Pratt Institute under the supervision of Anthony Cocciolo, the dean of the School of Information.

Eltrovo is an experiment in different ways to approach discovery and description in digital archives. I hope it will open up new avenues of exploration and development, and provide a glimpse of what is possible with tools that have a sophisticated understanding of digital content.


Attack of the clones: a data-centered approach to file deduplication and appraisal

Brock Stuessi, Mike Kelley Foundation for the Arts

Whether through backups, circulation, or reuse, duplication is an essential characteristic of many creators’ digital workflows. With a growing recognition that appraisal is important in digital preservation work for reducing the environmental and economic costs of storage, how should we as digital preservation stewards approach the complex, duplicate-ridden digital holdings that enter our institutions? In this paper, I present a programmatic approach to large-scale deduplication and appraisal that I developed while working with the digital holdings at the Mike Kelley Foundation for the Arts. This approach borrows tools from data analytics to wrangle an unwieldy web of files spread across many hard drives by representing them in a PostgreSQL database. In sharing my approach, I hope to spark conversations about novel approaches to digital appraisal and to share concepts, arrived at through experimentation, that others can apply to similar problems. As we look to the future of digital archives acquisitions, I believe the data-centered approaches I share in this paper are an important tool in our collective toolbox for making sense of and distilling increasingly complex collections of digital materials.
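The abstract does not detail the database schema or queries used at the Mike Kelley Foundation. As a minimal sketch of the underlying idea, exact duplicates across several drives can be found by hashing every file and grouping paths by digest. Against a hypothetical `files` table with a `sha256` column, the equivalent SQL would be `SELECT sha256, COUNT(*) FROM files GROUP BY sha256 HAVING COUNT(*) > 1`; the Python below performs the same grouping in memory. The function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots: list[Path]) -> dict[str, list[Path]]:
    """Map each content hash shared by two or more files to the paths holding it.

    Hypothetical helper: `roots` stands in for the mount points of several drives.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for root in roots:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                groups[sha256_of(path)].append(path)
    # Keep only hashes that occur more than once, i.e. byte-identical copies.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Hashing only catches byte-identical copies; near-duplicates such as re-saved or re-encoded files would need additional signals (file size, names, timestamps, or content similarity), which is where storing the inventory in a queryable database becomes useful.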



Cite this resource:
St John Karp, Pratt Institute; Brock Stuessi, Mike Kelley Foundation for the Arts. (March 19, 2024). Session 1. BitCurator Consortium.