BitCurator Access and BitCurator NLP – Updates and Future Directions
Cal Lee, Kam Woods | BitCurator Consortium
The BitCurator environment supports a variety of digital curation activities. The BitCurator Access project extended this to the point of interaction with end users, providing and supporting a variety of access mechanisms.
We developed tools that support access to disk images through three exploratory approaches:
- building tools to support web-based services,
- enabling the export of file systems and associated metadata,
- and using emulation environments.
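As a rough illustration of the second approach (exporting file systems with associated metadata), here is a minimal Python sketch. It assumes the files have already been extracted from the disk image into a local directory. The names `FileRecord` and `collect_file_metadata` and the record layout are hypothetical; they do not reflect the metadata format BitCurator actually produces.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical per-file metadata record (not BitCurator's metadata format)."""
    relpath: str
    size: int
    mtime: float
    sha256: str


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large exported files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file_metadata(export_root: str) -> list[FileRecord]:
    """Walk a directory of files exported from a disk image and describe each file."""
    records = []
    for dirpath, dirnames, filenames in os.walk(export_root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            records.append(FileRecord(
                relpath=os.path.relpath(full, export_root),
                size=st.st_size,
                mtime=st.st_mtime,
                sha256=sha256_of(full),
            ))
    return records
```

Recording a cryptographic hash alongside each path lets an archivist later confirm that an exported file still matches what was extracted from the image.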
We’ll highlight two BitCurator Access software products: BitCurator Access Webtools, which supports browser-based search and navigation over data from disk images, and a set of scripts to redact sensitive data from disk images. Members of the BitCurator user community have also asked for tools that help them identify and explore information based on specific entities (e.g., people, places, organizations, events) associated with collections.
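The redaction scripts mentioned above are described here only at a high level. As a sketch of one common approach, the Python code below overwrites byte ranges that match a set of patterns with a fill byte, keeping the image the same length so that offsets used by file system structures still line up. The function `redact_bytes`, the zero fill value, and the SSN-style pattern in the usage example are hypothetical; this is not the BitCurator Access scripts or their configuration.

```python
import re


def redact_bytes(image: bytes, patterns: list[bytes],
                 fill: int = 0x00) -> tuple[bytes, list[tuple[int, int]]]:
    """Overwrite every match of each pattern with `fill`, keeping the length unchanged.

    Returns the redacted image and a sorted list of (offset, length) pairs so each
    redaction can be logged and reviewed. Hypothetical helper for illustration.
    """
    # Assumes the whole image fits in memory; real images would be processed in chunks.
    buf = bytearray(image)
    log = []
    for pattern in patterns:
        # Match against the original bytes so earlier redactions don't hide later matches.
        for match in re.finditer(pattern, image):
            start, end = match.span()
            buf[start:end] = bytes([fill]) * (end - start)
            log.append((start, end - start))
    return bytes(buf), sorted(log)


raw = b"name=Ann ssn=123-45-6789 end"
redacted, log = redact_bytes(raw, [rb"\d{3}-\d{2}-\d{4}"])
print(log)  # [(13, 11)]
```

Plain byte-pattern matching like this misses sensitive data stored inside compressed containers or in encodings such as UTF-16, so a real workflow would need to decode content before matching and keep the log for review.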
The BitCurator NLP project aims to meet this demand for entity-based exploration by incorporating existing natural language processing (NLP) and visualization tools on top of the BitCurator environment and BitCurator Access (BCA) Webtools. Disk images are internally complex, so working with them requires the kind of underlying software available through the BitCurator environment and BCA Webtools, adapted for this purpose. Disks can also contain many data and document types, so considerable pre-processing is needed to extract content that NLP tools can process.
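The pre-processing step described above can be sketched in Python: decode bytes pulled from files on a disk image, replace non-printable debris, and count candidate entity names. The capitalized-phrase regex is a deliberately crude, hypothetical stand-in for the existing NLP tools the project incorporates (a statistical named-entity recognizer would replace it in practice), and the function names are illustrative only.

```python
import re
from collections import Counter

# Hypothetical heuristic: runs of capitalized words such as "Chapel Hill".
# A crude stand-in for real named-entity recognition; it also catches
# sentence-initial words and misses names with non-ASCII letters.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def decode_text(raw: bytes) -> str:
    """Decode bytes extracted from a disk image, trying UTF-8 first, then Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # never fails: every byte maps to a code point


def normalize(text: str) -> str:
    """Replace non-printable characters (e.g. NUL bytes) with spaces and collapse whitespace."""
    text = "".join(ch if ch.isprintable() or ch.isspace() else " " for ch in text)
    return re.sub(r"\s+", " ", text).strip()


def entity_candidates(docs: list[bytes]) -> Counter:
    """Count candidate entity strings across a batch of extracted documents."""
    counts = Counter()
    for raw in docs:
        counts.update(CANDIDATE.findall(normalize(decode_text(raw))))
    return counts


docs = [b"Kam Woods met Cal Lee in Chapel Hill.",
        b"later, Cal Lee\x00returned to Chapel Hill."]
print(entity_candidates(docs).most_common(2))
```

Even this toy pipeline shows why pre-processing matters: without replacing the NUL byte, "Lee" and "returned" would not be separated by whitespace before entity matching.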
We’ll report on the BitCurator NLP project, which builds on and extends a variety of tools and initiatives to provide services that can be run independently or called by existing software environments used by libraries, archives, and museums (LAMs).
Cal Lee, Kam Woods. (April 28, 2017). BitCurator Access and BitCurator NLP – Updates and Future Directions. BitCurator Consortium.