Monthly Archives: February 2012

Publishing and Curation Services Vision and Policy, Part 2

In my last post a couple of weeks back, I outlined a vision for Penn State publishing and curation services (PCS) in order to frame further discussions about the policies that will guide those services.   Now I’d like to discuss several policies that are implicit in that vision, as well as their implications for how we will offer these services.
1) Everything we do must help researchers and students achieve their scholarly aims. 
Every decision we make needs to be tested against this one. Anyone care to argue with that?  Moving right along, then…
2) Penn State users are the primary contributors of content we publish and curate. 
There will be some exceptions: co-authors or collaborators may be faculty from other universities; a journal will include articles from authors from various schools.  But because we receive our primary funding from Penn State, we will need a contributor or a sponsor of some sort to be directly affiliated with the University.  
3) Non-Penn State consumers of the content are as important as Penn State users. 
This is really important, and, because it’s also somewhat contrary to the primary model of service for an academic library, we are apt to forget it.  Consumers and contributors will interact with content differently and have different needs.  Our researchers’ primary audience for their work may not be their colleagues at Penn State; it is probably their colleagues around the world.  There are implications here for how we direct our efforts to enhance discovery.  Most people will not come to the Penn State Libraries website, or the Cat, or LionSearch, to find the material Penn State researchers trust us with.
The repository is not a box in which we punch some holes for people to reach into. We have to leave the box open, and even strew its contents across the web for others to find.  This means most of our material is available open access.  However, the requirements of the contributor may well trump those of our external researchers, such as when there are embargo requirements on a dataset or a publication.
4) The services we provide will be for stuff that is primarily scholarly. 
What does this exclude?  Hardly anything.  
If the researcher/contributor can articulate a reason that the material is of some scholarly import, why would we not agree to make it accessible?  Again, this is very different from how libraries have traditionally made decisions about what stuff we care for, e.g. what we collect.  But we have done that in the context of other institutions, such as publishers, making decisions for us by choosing what to publish.  Rarely have we had to evaluate the work of our faculty face-to-face.  Our collection development policies broadly support the curricular and research needs of the University.  If material is produced through the Penn State curriculum or research activities, would it not fit? (By the way, I have heretical ideas that talk of “building a collection” may not make sense in the context of publishing services.)
There are some materials that will be better served by developing additional services.  For example, electronic records management will use much of the same infrastructure as publishing and curation services, but the mission is not the same.  Having already had pretty extensive discussions about what those services will look like, we know they are very different from what we are proposing now.
Some matters, such as resource limits and legal or policy restrictions, may limit what we can handle or how we do so.  But the presumption is in favor of the researcher who wants or needs help in sharing and managing their stuff.
5) Most of the stuff we handle will be inactive, if not at a stage of completion. 
We’ve had some internal discussion among the stakeholders about research needs for collaborative workspaces that will allow for teams to more easily share data and tools.  Something like a combination of wikis, GoogleDocs, and DropBox, but with a lot more storage, the ability to execute code, much better object management, and much more security.  There are platforms that can help some communities, and some of our colleagues are piloting similar services through their own publishing services in conjunction with others.     
We need those services, but that’s not what we are developing now. Within the Research Life Cycle, we’re now primarily aiming at the dissemination and discovery stages.  I think that the vision statement in my last post is big enough to accommodate more support during the active phases of research.  But let’s get some basics in place first.
6) The scholarly stuff we handle will persist.
That is my digital preservation policy.  We are a library, and people expect us to keep stuff.  And if the material is to have any value to researchers, it needs to be citeable and continually accessible.  Above all else, researchers value libraries because of our reputation for preservation and stewardship. We should do nothing to call it into question. (How often do libraries promote their weeding projects?) 
So what exactly must persist?  Only the bits? A file or group of files?  The relationships between those files?  The code that prepared derivative data from raw data?  The whole thing, exactly as it is and was?  We worry a lot over format obsolescence, but answering those questions may depend more upon what the content represents, the expectations of the contributor as well as the user community, the value of the content for others, and several other less technological factors. Tim Pyatt pointed out in one recent conversation that depending upon the nature of the material, we may define different tiers of curatorial attention.  Official university scholarship, such as ETDs that are required for the credential, would probably get the highest attention, while less valuable (or perhaps poorly documented) materials would be more lightly touched over time.
We definitely have to educate our clients.  I am not saying that we shouldn’t develop guidance for researchers on best practices to make content durable, or that we wouldn’t develop tools to help them prepare materials for our care. Certainly it would be irresponsible for us to claim materials that we can’t adequately care for.  However, while we may not be able to guarantee the readability of a particularly idiosyncratic pile of data, that in and of itself shouldn’t be a reason to reject it.  The museums of the world still hold thousands of cuneiform tablets for which no Rosetta Stone has been found.
7) We must respect and work within a framework of other policies and laws. 
These include copyright law, conditions imposed by funders, or university policies on intellectual property.  Our services will help researchers navigate those policies and laws to achieve their scholarly aims.  This means that we will at times have to interpret contracts (Does the publisher allow pre-prints to be shared?), law (Is that within the bounds of fair use?), and policy (Does NSF require this stuff be open access, or just require that it not be destroyed?).   So yes, we’ll develop our own policies and we will frequently consult other experts.  All librarians need to develop better understandings of the legal and policy regimes we live in. 
Some of the above may seem obvious, but they deserve to be called out because they need to direct our focus for the next year and more.  Penn State is late to develop coherent services around a commonly defined infrastructure (in other words, we ain’t got no IR, y’all). But that means we can learn (borrow, steal) from colleagues at other institutions to help us answer the next round of questions about guidelines, policies, or practices that will help our researchers to achieve their scholarly aims.
I’ve only tried to provide a framework for future discussions among our developers, our scholarly communications staff, and our public services stakeholders.  It might be interesting to expand on any one or all of the above seven items in full blog posts.  I invite my colleagues to do so, and I invite any reader to chime in with other implicit policies that derive from the original vision.

Publishing and Curation Services Vision and Policy, Part 1

In Patricia’s last blog post, she discussed how we are engaging stakeholders to define our repository services.  As that work has progressed, we have begun to bump up against questions that often lead to a call for the development of a policy.  For example: 
  • How will we decide what material to accept?  “That depends upon our collection policy.”
  • What file formats are okay? How long are we promising to keep this stuff? “That depends upon our digital preservation policy.” 
  • What rights clearances do we need people to provide?  “That depends upon our copyright policy.” 
These are important questions, and they are just samples of what has come up. We have many more to ask and answer.  One thing that bothers me, however, is that these questions imply hurdles, or barriers that we have to put in place.  They begin with exclusion: We will not accept some things; we will reject file formats; we will not handle things if rights aren’t cleared.   Furthermore, they are almost showstoppers that lead to more hard questions:  “Who’s going to develop that policy?”  “What do you mean by collections?”  “How can we establish preservation policy when we don’t have infrastructure in place?”   
Policies must be guided by a strong sense of purpose: they should be designed to help you achieve your goals.  Program Sigma is the handle we’ve adopted to refer to a number of activities we’re undertaking this year, and elsewhere we have said that Sigma is aimed at the “development of new services that leverage existing infrastructure and … the design and development of a repository services platform to support the ingest, management, and delivery of digital library collections, student and faculty papers, research data, and electronic business records.”  
That’s good enough for our internal audience, but we need to be able to describe our vision for ourselves and others, and in language that is generally understood.  So, as a starting point, I offer up a vision statement that I have adapted from previous work: 

Penn State Publishing & Curation Services (PCS) organize, publish, and distribute the results of our community’s research and scholarship, allowing our faculty and students to reach a worldwide audience of scholars.   PCS gives researchers the ability to create new publications, to distribute their papers, presentations, publications, datasets* or other creations, and to comply with policies that require and encourage public access.  Academic departments and colleges will use PCS to collect students’ work to create a record of academic achievements and enable future research.  Readers and researchers worldwide will access our researchers’ work via popular web-based discovery tools such as Google and Bing, as well as library-oriented discovery tools and catalogs, e.g., WorldCat, while Penn State users will also use local tools such as the CAT or LIONSearch.  PCS will provide these users with state-of-the-art interfaces and functionality designed to integrate smoothly into their various research and work environments.  The PCS suite of services provides the Penn State community with a powerful publishing platform, one that will help the University meet the goals of the 21st Century Land Grant University by extending the audience for and application of knowledge.

That still has a way to go, and I welcome it being picked apart and re-drafted.  Does this ring true?  If so, does this help us to lay the ground for further service description and development? 
In an upcoming blog post, I will return to this vision statement and try to articulate a few policies that I think are implicit within it.  If you want to play along, you could start to identify those yourself. 
*Note:  Added “datasets” to vision statement at suggestion of D. Salo after initial publication. I’m glad she pointed out that was missing.