The ScholarSphere Users Group (SUG) – Then and Now

In 2012, in the weeks before the Penn State Libraries and Information Technology Services went live with the beta release of the ScholarSphere repository service, we met frequently to draft a fairly detailed FAQ page (see the web-archived version). In doing so, one of us – Nan Butkovich, head, Physical and Mathematical Sciences Library – had the brilliant idea of forming a user group for ScholarSphere that could serve as a resource for people just starting to use the service. Then we released ScholarSphere 1.0, and the rest is history, as they say. But the idea for the user group lingered on the back burner of our minds.

Fast forward almost two years. In early 2014, as the ScholarSphere Service Team was gearing up to dedicate the next several months to revamping the user interface (UI) and improving the user experience (UX) of the service, we knew we would need a core set of users familiar with ScholarSphere to provide regular feedback on the evolving design. The notion of a user group for ScholarSphere began to percolate again. So, Michael Tribone, UI/UX Developer in ITS Services and Solutions, and I decided to launch the ScholarSphere Users Group (SUG) solely for user response and user assessment purposes. And we decided the group would be based in . . . wait for it . . . Yammer!

Why Yammer? Responsible UX means frequent interactions with users. Because of time constraints, we knew it wasn’t realistic to expect our users to commit to several in-person meetings a month to review wireframes, contribute user stories, participate in user interviews, etc. Yammer was becoming – has become – a key communication channel for user groups at Penn State, though those groups typically begin with face-to-face meetings and continue to hold them. What if we tried forming a small user group that would be completely virtual and thus would meet only in Yammer?

The SUG kicked off in February 2014 with an in-person meeting that allowed members to see each other IRL before we went completely virtual. We kept the group under fifteen members, and we made it a private group in Yammer – largely because we’d never done this before and weren’t quite sure what to expect, and because we definitely wanted current users of ScholarSphere in the group. The SUG was instrumental in providing feedback on the emerging re-design of the UI, informing decisions on information architecture, look and feel of the site, and improvements on existing features and functionalities. It was also (and continues to be) a valuable resource for user interviews. In time, Sarah Irwin, Data Services Manager in SaS, came on board to assist with user assessment activities. We hoped the SUG could play a lightweight role in implementing some basic UX, participatory design practices. And it worked!

The original members of this private Yammer group were the following blend of librarians, teaching faculty, staff, and students:

We also added various staff from ITS Services and Solutions to the group, as a way for them to track user needs: Dan Coughlin, Mike Giarlo, Beth Hayes, Jeff Minelli, Jennifer Montminy, Rose Pruyne, and Adam Wead.

So, it’s time to acknowledge the good work of the virtual SUG. MANY thanks to the original “SUG 14” for their participation in the UI re-design process for ScholarSphere 2.0!

A month following the September 24th release of ScholarSphere 2.0, I wanted to let folks know that the SUG is now a public group in Yammer:

The public presence of this user group in Yammer enables all of the Penn State community to learn more about ScholarSphere; post questions about the service; and contribute insight into, and ideas for, new features and functionalities. It’s also a chance to revive Nan’s original idea for the group, in which community members develop as expert users of the service – and, we hope, act as a knowledge base for the group. We invite everyone to join.

In going public, we also plan to hold in-person, bi-monthly sessions in the Libraries, where users across Penn State can get support depositing files, ask questions about the service, and be apprised of various “tricks of the trade” when it comes to using ScholarSphere. In addition, so that we’re not always learning in a vacuum, so to speak, we plan to use these sessions as opportunities for gaining knowledge in data management planning, data publishing, open-access publishing, and in building one’s online scholarly/professional presence, as well as for eliciting input on emerging new features and functionalities.

So, we hope you will join the public SUG in Yammer and soon be on your way to playing a key role in improving and enriching ScholarSphere!

An early wireframe for the UI re-design that the SUG made decisions on in March 2014.


It’s been a while since we’ve blogged about ScholarSphere activities – but not for any lack of them! The service team has been incredibly busy since the release of ScholarSphere 1.0. Developers, especially, are coding away on ScholarSphere 2.0, for release this fall, when we will unveil – TADA! – a fabulous new user interface. In other words, we’ve been “ScholarSphering”! Below is what’s been happening of late.

ScholarSphere Collaborates with Zotero!

  • Ellysa Cahoy (Education Librarian at Penn State) developed the grant, in collaboration with Sean Takats (Associate Professor of History and Director of Research Projects at the Roy Rosenzweig Center for History and New Media).

ScholarSphere 2.0 and the ScholarSphere Users Group

  • In spring 2014 we launched the ScholarSphere Users Group (SUG), consisting of teaching faculty, librarians, and library staff. The SUG is one of the engines driving the 2.0 release!
  • This is a lightweight commitment, since SUG interactions take place mainly in Yammer.
  • Here’s our process:
      • Michael Tribone, UI/UX designer, posts user interface designs to Yammer.
      • SUG members offer feedback, generating ideas for improvements.
      • A revised design goes up, and additional responses are gathered.
      • Rinse. Repeat.
      • We do a quick poll to decide on a design, and, once decided, it becomes the design for implementation.
  • The service team will be conducting user interviews with a few SUG members in summer 2014, to help us get a fuller sense of what the UX needs to be like for researchers, what tools they currently use, and how ScholarSphere fits, or could fit, into that workflow.
  • Interested in being an SUG member? Request to join our group in Yammer!
  • Watch for a future blog post that will tell more about the SUG, its members, and its activities.

A Gentle Reminder of these Key Features in ScholarSphere

  • Create sets, groups, or collections of files
  • Get large files (> 500MB) into ScholarSphere via Dropbox
  • Give others permission to deposit files on your behalf
  • Transfer ownership of files (perhaps after you’ve given another person permission to deposit those files)

Next blog post: On promoting and marketing ScholarSphere

ArchiveSphere FAQs

1. What are the main ways in which the architecture of ArchiveSphere will differ from that of ScholarSphere? 

In terms of system architecture, ArchiveSphere and ScholarSphere, though they will live on different machines due to the extra level of protection we need for the data in ArchiveSphere, are identical: both are Rails web applications that speak to Fedora, as an asset management system with preservation functions, and Solr, as a search index, via a suite of community-developed Ruby components. That is, they’re both Hydra applications.  

In terms of software architecture, there’s a lot of overlap.  Both are based on a gem called Sufia, which was originally developed as the “guts” of ScholarSphere, and is now used by nearly a dozen institutions to power their own repository applications.  And both use core Hydra components such as hydra-head, active-fedora, and blacklight.  We have also been working on two new community components called hydra-collections and hydra-derivatives, which will again be used across both of our spheres and within the Hydra community.

2. What metadata schemas will be used with ArchiveSphere?

We’re looking at PREMIS implementation for preservation metadata — and, in particular, the RDF-based version — which is an exciting challenge to consider. Descriptive metadata needs a little more fleshing out. At the object level in ScholarSphere we primarily use the RDF-based Dublin Core terms vocabulary (with a couple of other elements thrown in where DC had gaps), but ArchiveSphere is an archival repository, and one of the aspects that sets it apart from our work with ScholarSphere is the need to consider aggregate metadata — the description we assign to collections, series, boxes, etc., all of which apply to the individual objects as well.

Obviously, EAD is the standard we have to work with here, but we want to look at ways to make it less EAD-ish on the public side, and more integrated with ArchivesSpace on the administrative end.
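
To make the idea of aggregate metadata concrete, here is a minimal Python sketch of description flowing down from a collection to the objects inside it. This is not ArchiveSphere code and does not use EAD, PREMIS, or any Hydra component; the `Node` class, its field names, and the sample values are all invented for illustration. It also assumes that a value set lower in the hierarchy overrides an inherited one, which is just one possible rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of a hypothetical archival hierarchy
    (collection, series, box, or individual object)."""
    level: str
    metadata: dict = field(default_factory=dict)
    parent: Optional["Node"] = None

    def effective_metadata(self) -> dict:
        """Merge metadata from the top of the hierarchy down to this node.

        Assumption: values set on a lower level override inherited values.
        """
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


# Hypothetical sample hierarchy: collection -> series -> object
collection = Node("collection", {"creator": "Example Office", "rights": "Restricted"})
series = Node("series", {"subject": "Correspondence"}, parent=collection)
item = Node("object", {"title": "Letter, 1998", "rights": "Open"}, parent=series)

print(item.effective_metadata())
```

The object carries only its own title and rights statement, yet its effective description also includes the creator and subject assigned higher up, which is the behavior that sets archival description apart from the per-object metadata in ScholarSphere.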

3. Is the team considering building in support for forensic processing? What about automated metadata extraction?

Forensic processing is very much on the radar of our group, but it’s not a high priority. Why? For one, Penn State is still in the process of hashing out its forensic workflows, yet development on ArchiveSphere has already started. Furthermore, we feel that other Hydra partners with more mature forensic workflows (UVa, Stanford, etc.) may be better positioned to take what we’re doing with ArchiveSphere and work in their digital forensic concerns, which we can then adopt/adapt locally (the joys of community development!). Finally, we have a lot of material that has come to us (and material that continues to be acquired) virtually, without any use of physical media at all, or using physical media supplied by the archives. These constitute our largest born-digital collections.

For now, we plan to create disk images using local workflows, while keeping an eye on how our development might evolve to include disk images down the line. 

We are planning to use FITS (File Information Tool Set, an open source characterization tool from Harvard U. Libraries) to extract metadata and file format features from deposited files, and Tika for full-text extraction and search.

4. Is the team considering incorporating any of the functionality of tools like Archivematica and/or BitCurator?

ArchiveSphere draws a lot of its influence from the work of Archivematica, and there is some overlap in functionality between the two. Still, we feel that there are certain benefits to developing Archivematica-like tools within the Hydra framework and community rather than seriously considering integration between Archivematica and ArchiveSphere, which run on different technologies and have different architectural and workflow assumptions.

BitCurator: TBD. Penn State is moving toward adoption of the BitCurator tools for its developing forensic workflows, so opportunities may arise. Archivematica will have support for disk images soon, but it won’t actually wrap the BitCurator tools into its interface (as far as we know). There might be some future development opportunities with Hydra here, but we feel this level of forensic tool integration with repository apps is still a ways off in the profession.

5. Are you planning to facilitate various layers of access? For example, items restricted to Penn State users, items restricted to a group defined by the archivist, items kept dark for a period of time, only metadata (not bitstream) accessible to public, certain administrative metadata fields hidden from public, etc.?

Yes, ALL of the above.

Nuanced access options are highly desirable, and one of the primary motivations for this project.  The first phase of the project, however, is a back-office tool only, so we may not build out this level of access controls in our first release.  It should be noted that Hydra already provides tooling for most of this out of the box, so it’s not “hard,” it’s mostly that we have lots of other priorities for the first phase of the project.
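
As a rough illustration of how the access layers in the question could combine, here is a small Python sketch. It is not Hydra’s access-control tooling and not ArchiveSphere code; every name in it (`Visibility`, `AccessPolicy`, `User`, the group name, the dates) is an assumption made up for the example. It covers institution-only, archivist-defined group, embargo, and metadata-only access, and leaves out hiding individual administrative metadata fields.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import FrozenSet, Optional


class Visibility(Enum):
    PUBLIC = "public"
    INSTITUTION = "institution"  # any authenticated institutional user
    GROUP = "group"              # a group defined by the archivist
    DARK = "dark"                # no access at all


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical per-item policy; field names are illustrative only."""
    visibility: Visibility
    allowed_groups: FrozenSet[str] = frozenset()
    embargo_until: Optional[date] = None  # item stays dark before this date
    metadata_public: bool = False         # description visible even if the file is not


@dataclass(frozen=True)
class User:
    authenticated: bool = False
    groups: FrozenSet[str] = frozenset()


def is_embargoed(policy: AccessPolicy, today: date) -> bool:
    return policy.embargo_until is not None and today < policy.embargo_until


def can_download(policy: AccessPolicy, user: User, today: date) -> bool:
    """Decide whether the user may retrieve the file itself (the bitstream)."""
    if is_embargoed(policy, today):
        return False
    if policy.visibility is Visibility.PUBLIC:
        return True
    if policy.visibility is Visibility.INSTITUTION:
        return user.authenticated
    if policy.visibility is Visibility.GROUP:
        return bool(policy.allowed_groups & user.groups)
    return False  # Visibility.DARK


def can_view_metadata(policy: AccessPolicy, user: User, today: date) -> bool:
    """Metadata is visible if it is flagged public or the file itself is accessible."""
    if is_embargoed(policy, today):
        return False
    return policy.metadata_public or can_download(policy, user, today)


# Hypothetical example: files restricted to one group, description open to all
donor_files = AccessPolicy(
    Visibility.GROUP,
    allowed_groups=frozenset({"donor-family"}),
    metadata_public=True,
)
anonymous = User()
family_member = User(authenticated=True, groups=frozenset({"donor-family"}))
today = date(2014, 6, 1)

print(can_view_metadata(donor_files, anonymous, today))
print(can_download(donor_files, anonymous, today))
print(can_download(donor_files, family_member, today))
```

In this sketch an embargo overrides everything else, including public metadata; a real system might reasonably choose to keep the description visible during an embargo instead.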

6. Are you planning to facilitate the rendering of various file types within the user interface? For example, video, audio, CAD?

Our collection development priorities and use cases will drive these decisions. For instance, we don’t have CAD files, so no, but we do have design files, so we will need to accommodate formats from QuarkXPress and InDesign. We do have audio and video to consider, but it’s an open question whether we’ll develop custom functionality for this or build on tools developed by other Hydra partners (such as Indiana’s and Northwestern’s work on Avalon, and WGBH’s work on their audiovisual repository).

7. Are you planning to incorporate automated derivative creation for access copies?

Yep. We realize there is some debate about when the best time to do this kind of transformation is, but sorting out the costs and benefits is murky, and for now we’re operating on the assumption that normalization for access will occur at the point of ingest at the same time as normalization for preservation.   (We are using a brand new Ruby component for this called hydra-derivatives.) 

8. Are you planning to facilitate reuse of the collection material by classes and faculty members? For example, in the way that Northwestern’s Digital Image Library proposes to enable users to create their own image galleries. What about support for user-generated remixing and analysis of data, such as visualizations, data mining, mapping, timeline creation, etc.?

Right now, our delivery and access plans are archivally-focused. But once we can deliver born-digital collection materials, wrap them up with metadata about both digital and analog materials, and even begin to incorporate digitized material, we wonder: what is the utility of the traditional finding aid format? We’ll take it as our starting point and then try to disrupt it in strategic ways, and some of these might/should include visualization tools, or interfaces matched to the particular characteristics and usage needs of a particular genre (think email). But still TBD, as it’s not part of current planning cycles. 

Related: we’ve had requests for corpus data sets (e.g. digital newspapers) from remote researchers, so accommodating such needs is definitely on our radar, but as you can see, a lot of features are on our radar and we’re working on multiple Hydra apps concurrently, so we have to push some off for now.

9. Are you planning to incorporate support for web archiving and access to web archive records?

Penn State is an Archive-It partner, but web captures are archived remotely on Archive-It’s servers. Archive-It does provide WARC export of web captures at the end of the partnership (and we are exploring the use of tools like WARCreate to generate local WARC files), but we have not seen many impressive exemplars of how to deliver WARC for access in locally developed systems. At this point, because of our partnership with Archive-It, it’s not a high-priority use case.

10. Are you planning to facilitate the collection of content as it is created? For example, collecting materials from faculty members as they create their born-digital manuscripts, rather than waiting until they retire to accession them.

Absolutely! We need to work out how material deposited in ScholarSphere ultimately flows into a formal archival repository (university archives/ArchiveSphere), but at a much higher level, we simply need to work out the organizational collaboration that gets everyone on the same page about this. Furthermore, our collecting from faculty members would have to become more active and less reactive (which is not a knock on our fantastic university archives program but an acknowledgement of a problem all archives currently face). We also want to enable this kind of functionality for offices to deposit files into the institutional/university archives.