Data, Data Everywhere . . .

. . . and more than a “drop to drink.”

With reports of a “data deluge” (The Economist, February 25, 2010) and the “petabyte age” (Wired, June 23, 2008) in the last few years, it’s little wonder that issues surrounding data – their management, openness, accessibility, usability – are rapidly gaining currency in the library and archive spheres. In 2010 the National Science Foundation made data a more urgent matter when it announced that, beginning in 2011, data management plans would be required in grant proposals submitted to the agency (reference URL for press release). As a result, many libraries and IT divisions, including our own at Penn State, have started the work of addressing research data management. For many of us, this has meant forming new working groups or hot teams, made up of subject specialists, public services librarians, metadata librarians, digital curators, and digital library architects, to create new services and processes for helping faculty and graduate students manage their data.

Management of research data is a priority for the Content Stewardship Program. The time is ripe to create and provide new data management services, not only because of the NSF requirement but also because our users are telling us so. In the year since I started at Penn State Libraries, we have been consulting with a variety of stakeholders in need of data services, from molecular biologists, to graduate students in ecology and environmental sciences, to geoscience researchers eager to share their data, to social scientists with long-term data archiving needs.

As I have been learning more about managing research data and investigating what our peer institutions are doing in this space, I have come across a few key, useful resources. One in particular is guidance on something called the “data interview.” Librarians are familiar with the “reference interview,” a process in which the librarian asks the inquiring patron a range of questions in order to determine the best resources for, and thus the best path to, helping the patron with her research questions. The data interview is quite similar.

Our colleagues at the Purdue University Libraries Distributed Data Curation Center suggest asking questions like these in the data interview:

  • What is the story of the data? (This kind of open-ended inquiry gives the researcher an opportunity to talk about their methodology, workflow, data, topic of research, etc.)
  • What form and format are the data in? (Are the data bound to proprietary structures or systems?)
  • How could the data be used, reused, and repurposed? (Understanding the various uses of data can help increase their relevance for other research communities.)
  • What are the potential audiences for the data?
  • Who owns the data? (What are the intellectual property rights issues, if any, attached to the data? Will an embargo need to be set up?)
  • Does the dataset include sensitive information?
  • How should the data be made accessible? (What possibilities for access exist for the data? Only via a web interface, or are there machine-to-machine channels to consider?)

(For the complete set of data interview questions and their motivations, see Witt, M. & Carlson, J. [2007, December]. Conducting a data interview. Poster session presented at the 3rd International Digital Curation Conference, Washington, DC.)

Tools like the data interview and data management templates (look for examples in a future blog posting), as well as NSF guides tailored to particular directorates and divisions, enable librarians to learn more about the research taking place at their institutions. They also give researchers a sense of how to think through their data in order to manage them effectively and position them for broader sharing and accessibility.