Over the course of a research project, scholars collect data to analyze, write about, and discuss. This guide provides an overview of concepts to consider when managing that research data, so that it can be preserved and shared over the long term to enhance scholarship and fulfill funder requirements.
The content of this guide draws significantly from the New England Collaborative Data Management Curriculum.
There are a number of definitions for ‘research data’. Here are two examples of commonly cited definitions.
“Research data, unlike other types of information, is collected, observed, or created, for purposes of analysis to produce original research results” (University of Edinburgh).
“The recorded factual material commonly accepted in the research community as necessary to validate research findings” (Excerpted from OMB Circular A-110 36.d.2.i).
Data covers a broad range of types of information:
Since 2013, the United States government has required that the results of federally funded research, including data, be made freely available to the public. Many non-governmental grant-funding agencies and a number of publishers, particularly open access publications, now have similar requirements.
This video by the NYU Health Science Library provides a humorous overview of some of the concerns and needs that proper data management addresses:
There are serious issues surrounding data management. Some of these challenges include managing the workflows of team science, getting everyone on the team to follow a plan, and making data management a priority. Others stem from students and postdocs frequently rotating in and out of labs, data being stored in multiple places, and, in some cases, research team members and data being spread across the globe. Drs. Stephen Erickson and Karen M.T. Muskavitch (2013) list some examples of serious data management issues that they identified as needing improvement:
Issues that arise from a lack of clear responsibility for research data:
Best Practices
Here are some best practices for defining roles in managing data and laboratory notebooks. Unless responsibilities are clearly distributed, misunderstandings can arise and compliance can be jeopardized.
Many research funders require that you have a plan to manage and/or share your data.
These are some questions that are commonly addressed in a data management plan:
Here is a simplified example of a data management plan:
Some of the major issues in managing data involve locating it and making sense of it. Practical lessons from the field of records management apply in these situations.
Common records management failures include:
Best Practices
These are some best practices for creating file names. Poorly constructed file names can cause problems when files are converted to another format or moved to another operating system.
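Building names with a small helper, rather than by hand, is one way to apply a naming convention consistently. The sketch below assumes a hypothetical convention of PROJECT_description_YYYYMMDD_vNN.ext that follows widely used rules of thumb (no spaces or special characters, ISO 8601 dates, zero-padded version numbers); it is an illustration, not the convention this guide prescribes, and the function names are made up for the example.

```python
import re
from datetime import date


def safe_token(text: str) -> str:
    """Replace spaces and special characters with single hyphens."""
    # Collapse each run of characters other than letters, digits, or hyphens into one hyphen.
    token = re.sub(r"[^A-Za-z0-9-]+", "-", text.strip())
    # Trim any hyphens left at either end.
    return token.strip("-")


def make_file_name(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a name like PROJECT_description_YYYYMMDD_vNN.ext (hypothetical convention)."""
    parts = [
        safe_token(project),
        safe_token(description),
        when.strftime("%Y%m%d"),  # ISO 8601 basic date, so names sort chronologically
        f"v{version:02d}",        # zero-padding keeps v02 ahead of v10 when sorted
    ]
    return "_".join(parts) + "." + ext.lower().lstrip(".")


print(make_file_name("Soil Survey", "plot #3 pH readings", date(2024, 3, 5), 2, "CSV"))
```

With these inputs the helper produces Soil-Survey_plot-3-pH-readings_20240305_v02.csv. Underscores are reserved as separators between the parts, which is why safe_token turns any underscore inside a part into a hyphen.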
Often described as 'data about data', metadata contextualizes information. It can help you to answer several important questions:
There are several types of metadata that can help make sense of your data. Metadata can be descriptive, structural (helping users navigate the files), administrative, or technical. Each of these types can also improve someone's chances of finding the information when searching a collection or database. The more of these details are available, the more ways a searcher has to locate and make sense of the data.
Best Practices
Here is a list of common metadata fields associated with a data set.
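Keeping a machine-readable metadata record next to the data files makes it easier to spot gaps before the data is shared. The following is a minimal sketch assuming a hypothetical record whose field names are borrowed from the Dublin Core element set; the selection of fields is illustrative and may differ from the list above or from what a given repository requires.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    # Field names borrowed from Dublin Core; this subset is an illustrative assumption.
    title: str
    creator: list[str]
    date: str                # ISO 8601, e.g. "2024-03-05"
    description: str
    subject: list[str] = field(default_factory=list)  # keywords
    format: str = ""         # e.g. "text/csv"
    rights: str = ""         # license or access conditions
    identifier: str = ""     # e.g. a DOI once the data is deposited

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record so it can be saved alongside the data files."""
        return json.dumps(asdict(self), indent=2)


record = DatasetMetadata(
    title="Plot pH readings",
    creator=["A. Researcher"],   # placeholder name
    date="2024-03-05",
    description="Soil pH measured by plot",
)
print(record.missing_fields())
```

For this placeholder record, missing_fields() reports subject, format, rights, and identifier as still empty, which serves as a simple reminder of what to fill in before deposit.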
Properly storing, backing up, and securing data are important responsibilities. Your institution and sponsor want you to take these responsibilities seriously to ensure the integrity of your data.
Here are some guiding questions for this exploration:
Best Practices
DataONE also has a primer on avoiding accidental loss of data:
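One common way to confirm that a backup really matches the original is to compare checksums. This technique is not taken from the DataONE primer. The sketch below assumes SHA-256 checksums and a hypothetical "manifest" that maps each file's relative path to its checksum; the directory paths in the usage line are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path, relative to root, to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(src: dict[str, str], dst: dict[str, str]) -> list[str]:
    """List files missing from, extra in, or changed in the backup manifest."""
    problems = [f"missing: {name}" for name in sorted(src) if name not in dst]
    problems += [f"extra: {name}" for name in sorted(dst) if name not in src]
    problems += [
        f"changed: {name}"
        for name in sorted(src)
        if name in dst and src[name] != dst[name]
    ]
    return problems


# Usage (placeholder paths):
# compare_manifests(build_manifest(Path("data")), build_manifest(Path("/mnt/backup/data")))
```

Comparing manifests rather than the files themselves means a manifest saved at backup time can be checked again later, even on a machine that holds only the backup copy.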
When it comes to data ownership and data retention, there are many overlapping policies. University intellectual property policies can cover the ownership and retention of data related to patents; the Institutional Review Board wants to ensure that documentation of human subjects' data is retained and/or destroyed appropriately; funders and publishers want you to retain data to defend the integrity of your findings; and then there are federal regulations such as HIPAA.
Data retention: how long should I keep my data?
The easy answer to this question is: it depends. Many regulations can overlap, depending on the type of research you’re conducting, the nature of your data, and the sponsors of your research. Here are some examples of overlapping data retention requirements:
Best Practices
Check Your Funder and Publisher Requirements
Questions of data validity: If there are questions or allegations about the validity of the data or appropriate conduct of the research, you must retain all of the original research data until such questions or allegations have been completely resolved.
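When several retention requirements apply, data can be disposed of only after the longest of them has expired, and not at all while questions about validity remain open. The sketch below works through that reasoning using hypothetical rules with placeholder start dates and durations. These values are not actual funder, IRB, or HIPAA requirements, so substitute the ones that apply to your project.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RetentionRule:
    source: str   # who imposes the rule, e.g. "funder" (placeholder)
    starts: date  # event the retention clock runs from, e.g. project end
    years: int    # minimum number of years to keep the data (placeholder)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, moving Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(rules: list[RetentionRule], open_allegation: bool) -> Optional[date]:
    """Return the first date on which every rule is satisfied, or None while on hold."""
    if open_allegation:
        # Unresolved questions about validity suspend disposal entirely.
        return None
    if not rules:
        raise ValueError("no retention rules supplied")
    # The longest-running requirement determines when disposal becomes possible.
    return max(add_years(rule.starts, rule.years) for rule in rules)


rules = [
    RetentionRule("funder", date(2024, 6, 30), 3),  # placeholder values
    RetentionRule("IRB", date(2024, 6, 30), 5),     # placeholder values
]
print(earliest_disposal_date(rules, open_allegation=False))
```

With these placeholder rules, the five-year requirement governs and the function returns 2029-06-30; if an allegation were open, it would return None instead of a date.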
After a project ends, you may want to appraise your data and then publish it or deposit it in a repository. A variety of factors affect your ability to share data with outside parties.
Best Practices
Much of this guide was adapted from the New England Collaborative Data Management Curriculum.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.