One of the questions Patricia Hswe and Linda Friend have been receiving at their ScholarSphere demonstrations is, “how long will my content remain accessible in ScholarSphere?” The short answer, and the one they’ve been giving, is, “as long as you leave it there”; the ScholarSphere team is committed to archiving and preserving deposited data, and, unless a user chooses to delete his or her own content, all items will remain safely in the repository. For all current, practical purposes this is true. But a longer and more detailed response to the “how long?” question would need to include phrases like “for the foreseeable future” and “as far as we know.”
When I served as Institutional Repository Coordinator at Duke University, one question I frequently received was “What is an Institutional Repository?” My stock answer was that it was an access and discovery platform for Duke faculty and student scholarship as well as born-digital institutional records. The follow-up question almost always had to do with preservation of the content; my answer to that was usually a referral to a list of preferred formats for deposit.
As we head toward the launch of Penn State’s IR, ScholarSphere, these questions now loom large for us. My stock answer at Duke also applies to ScholarSphere, as it will offer access and discovery for faculty and student scholarship. ScholarSphere is also built on a robust platform that allows for flexible preservation services. So what is the baseline for content preservation offered by ScholarSphere?
First, all content made available on ScholarSphere will have redundant backup. Every deposited file will get a SHA-1 (Secure Hash Algorithm) checksum, which is essentially a digital “fingerprint” in the form of a string of characters that can be generated for any digital file. If the file changes in any way, a newly computed checksum will no longer match the one recorded at deposit, indicating the alteration. In addition, ScholarSphere uses FITS (File Information Tool Set) to identify, validate, and extract technical (and some descriptive) metadata from the file, such as the file type, version, and other information that helps us manage the file. Regular fixity checks will be run against the files to detect changes, such as file corruption. Beyond this initial level of preserving the file for access and discovery, additional preservation services are in the planning stages.
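To make the idea of a checksum and a fixity check concrete, here is a minimal sketch in Python. It is an illustration only, not ScholarSphere’s actual implementation; the function names sha1_checksum and fixity_check are hypothetical, and the only library assumed is Python’s standard hashlib module.

```python
import hashlib


def sha1_checksum(path, chunk_size=65536):
    """Return the SHA-1 hex digest of the file at `path`.

    The file is read in chunks so that large deposits do not
    have to fit in memory all at once.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded at deposit.

    `recorded_checksum` is assumed (for this sketch) to be the hex digest
    that was stored when the file was first deposited.
    """
    return sha1_checksum(path) == recorded_checksum
```

In this sketch, a repository would call sha1_checksum once at deposit and store the result, then call fixity_check on a schedule. A result of False would signal that the stored file no longer matches its original fingerprint and should be restored from a backup copy.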
What might these additional preservation services entail? Depending on the Library’s commitment to the files submitted, we may look at normalizing files into standard formats to facilitate migration as formats become obsolete. For example, all Word files (such as .docx) could be migrated to a format like PDF/A, the ISO-standardized archival version of Portable Document Format (PDF). A higher level of preservation would be to preserve both the source file and the normalized copy. For some scholarly works, such as certain types of data sets, preservation or emulation of the software used to create the files may also be needed to carry the content forward through time.
The main drivers for the adoption of additional preservation services such as these will be policy and resources. Each of the services listed above requires increasing amounts of resources (staff, expertise, and IT tools) to accomplish. Just as we have policies that guide us in building and preserving analog collections, and limited resources with which to implement those policies, the same is true of the digital content collected for ScholarSphere. Policy can also help creators make informed decisions about the technologies and formats used for their work, which could reduce the resources required and enhance the longevity of scholarly content. As ScholarSphere evolves, the Library will be prepared to suggest best practices for different documentary types and file formats.
Penn State ScholarSphere is a new research repository service offered by the University Libraries and Information Technology Services, enabling Penn State faculty, staff, and students to share their scholarly works, such as research datasets, working papers, research reports, and image collections. ScholarSphere will make these works more discoverable, accessible, and usable, and thus more broadly recognized and known.