Research Data Management

History of Data Sharing

Older Forms of Data Sharing

  • Sharing one on one
  • Sharing as part of a small lab team
  • Sharing between faculty and students
  • Sharing a few compiled results (tables, diagrams) within the context of a publication

Newer Forms of Data Sharing

  • Sharing with large numbers of researchers outside a research team
  • Sharing data as a distinct entity, not as a supplement to a paper
  • Broad dissemination via the internet
  • Sharing with the public

Why Share Data?

  • Data as a public investment
  • Required by publishers
  • Required by government funding agencies
  • Informs new research
  • Maximizes transparency, accountability and scrutiny of research findings
  • Increases the impact and visibility of research
  • Provides credit to the researcher as a research output in its own right
  • Critical to the success of collaborative research
  • Reduces duplication of effort
  • Provides great resources for education and training

Who benefits?

  • Researcher and research team
  • Scientific communities (including citizen science)
  • Students
  • Public
  • Funding agencies

Reasons Why Data is Not Shared

Faculty/Researchers Reluctant to Share Data

  • Do not understand the need for or benefits of sharing data
  • Poor-quality data sets due to experimental design or mismanagement
  • Do not want to lose control of data
  • Fear criticism
  • Legal ramifications

Students’ lack of awareness of value of data

  • Do not understand that others might want their data or that the data could be useful in a future project
  • Lack of data management training
  • Uneasy about sharing potentially confidential data
  • In group projects, lack of clarity over who is responsible for data

Basic Considerations of Data Sharing

Data sharing refers to the practice of making research data available to others for validation and replication of results. Major national funding agencies such as the NSF, NEH, and NIH require that all requests for funding contain a data management plan (DMP) addressing how the proposed project will comply with the agency's Data Sharing Policy. In addition to meeting funder requirements, sharing data benefits the scientific community and society by providing more opportunities for collaboration, better science, and well-informed policy decisions.

The NSF requires that the following question be addressed in all grant proposals' data management plans:

"What will be the policies for data sharing and public access (including provisions for protection of privacy, confidentiality, security, intellectual property rights and other rights as appropriate)?"

When establishing data sharing and access policies and provisions, consider whom you will share your data with, how it will be shared, and when in the research process you will share it. For example:

  • Is the data shared with other researchers or the general public?
  • Whom will you share it with? Who may be interested in your data in the future and what might it be used for?
  • Are there ethical issues or privacy concerns? Do any regulations apply to the data (e.g., HIPAA)?
  • Do you have the right to share the data if it is not produced by you?
  • Will you make the data available before or after you formally publish your results?
  • Is your data understandable by other researchers?
  • Should the data be restricted or embargoed for intellectual property reasons?
  • Will the data be licensed? Will there be any conditions on its reuse?

You may need to restrict access to your data if there are ethical issues, legal issues, or time constraints. There may also be conditions of confidentiality associated with the research. Personal or sensitive data may not be suitable for sharing with other researchers, depending on whether informed consent has been obtained from participants. You may wish to consider techniques such as anonymization or aggregation of numeric data, editing of video or sound recordings, and use of pseudonyms in qualitative data. In any event, consulting your university’s research ethics office is advisable. If you expect your research data to become commercially valuable or exploitable by a third party, you may want to speak with your university’s intellectual property expert.
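As an illustration of the pseudonym and aggregation techniques mentioned above, here is a minimal Python sketch. The records, field names, key, and the "P-" prefix are all invented for this example. A keyed hash is only one possible way to generate pseudonyms, and replacing names does not by itself make a data set anonymous, so treat this as a starting point to discuss with your research ethics office rather than a complete procedure.

```python
import hashlib
import hmac
from statistics import mean

# Hypothetical secret key; it would need to be stored separately from the shared data.
SECRET_KEY = b"replace-with-a-project-secret"


def pseudonymize(participant_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a participant identifier using a keyed hash."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:10]


def aggregate_by_group(records, group_field, value_field):
    """Replace individual numeric values with a per-group count and mean."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_field], []).append(record[value_field])
    return {g: {"n": len(v), "mean": mean(v)} for g, v in groups.items()}


# Invented example records, for illustration only.
records = [
    {"name": "Participant A", "age_band": "18-29", "score": 12},
    {"name": "Participant B", "age_band": "18-29", "score": 16},
    {"name": "Participant C", "age_band": "30-44", "score": 9},
]

# Row-level version with names replaced by pseudonyms.
shareable = [
    {"id": pseudonymize(r["name"]), "age_band": r["age_band"], "score": r["score"]}
    for r in records
]

# Aggregated version with no row-level values at all.
summary = aggregate_by_group(records, "age_band", "score")
```

The same name always maps to the same pseudonym, so records for one participant can still be linked across files without exposing the name. Very small groups in an aggregated table can still identify individuals, which is one reason to review the output before sharing.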

There are several ways to share research data; the appropriate method will depend on the nature of your research and the content of the data.

  • Informal sharing: provide access to research data upon request
  • Supplemental information: provide research data in support of published articles
  • Institutional repository: deposit research data in local repository
  • Disciplinary data repository: deposit research data in an appropriate community-based repository

Legal Concerns

Sharing data may implicate third-party proprietary rights, including patents, copyrights, database rights, trade secrets or other information protected by license or non-disclosure agreements (NDAs); data subject rights, relating to the subject of the information, such as privacy, defamation, etc.; or content-related legal issues specific to various countries you source or provide the data in, such as import restrictions, obscenity, blasphemy, or other rules. You should assess your project for the types of information involved, and understand if those types of information might cause problems in places where the information would be disseminated.

Legal Concerns - Property Rights

Proprietary rights may crop up in almost any field. You may need to consider the major categories of proprietary rights of patents, copyright, trademarks, and trade secrets and other proprietary rights governed by contracts (NDAs).

Copyrights provide rights holders the right to authorize the reproduction, dissemination, public display and performance, and preparation of derivative works. You should assume that all textual, graphic, audio, and video materials are copyrighted, and maintain rights metadata for these materials, even for works you create and hold the copyright to. This metadata should include the author; dates of creation or publication; and any permissions you have or know of regarding the information, including general licensing (such as Creative Commons), user-specific licensing (such as a signed contract between you and the rights holder), or other legal authorizations (such as fair use).
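One way to keep the rights metadata described above alongside each item is a small structured record. The sketch below is only an illustration: the class name, field names, file name, author, date, and license are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class RightsRecord:
    """Rights metadata for one item; field names are illustrative, not a standard."""
    item: str
    author: str
    date_created_or_published: str
    permissions: list = field(default_factory=list)


record = RightsRecord(
    item="interview_audio_001.wav",          # hypothetical file name
    author="Example Researcher",             # hypothetical author
    date_created_or_published="2020-03-15",  # hypothetical date
    permissions=["general license: CC BY 4.0"],
)

# Store the record as JSON text so it can travel with the file it describes.
rights_json = json.dumps(asdict(record), indent=2)
print(rights_json)
```

Saving the record in a plain-text format such as JSON keeps it readable without special software.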

While copyright will likely touch many materials you collect, you may not need permission to use the works in your project. Many countries provide specific exemptions for certain research and educational uses of copyrighted works. In the United States, the fair use doctrine provides a broad and flexible framework that privileges many educational and scholarly uses of copyrighted works. "Facts" are generally free for reuse in the US. However, if your project involves the wholesale reproduction or dissemination of a third-party collection of data, be aware that some jurisdictions, such as the European Union, have database protection statutes that may apply.

Patents provide strong rights to the patent holder against any unlicensed use of the patented invention. Patents can cover intangible inventions, such as algorithms or business methods. Disseminating information about an invention, however, is not restricted by the patent. If your project includes the dissemination of actual patented inventions, such as software embodying patented algorithms or tangible products such as organisms embodying constructed DNA, then you should investigate the exceptions available for research use, and the licensing available.

Trademarks can include almost any kind of "mark", a logo or slogan or name, that identifies the source of a product to consumers. Trademark law is intended to protect consumers regarding source confusion. So use of trademarks should be done accurately to represent trademark owners and to make it clear that the trademark is the subject of the material, not the sponsor of it. For instance, a project involving assessment of a variety of trademarked substances could use those trademarks to refer to the relevant substances. So long as it is done in a way to minimize any confusion as to sponsorship, trademarks can be used to refer to their relevant products or services.

Legal Concerns - Licenses

Contracts and licenses shape many of the specific relationships under which information is shared. If you have access to information through a licensing arrangement, or an individually negotiated and signed agreement (such as a "non-disclosure agreement" or a "confidentiality agreement"), you are bound to the terms of those agreements. It is important when acquiring access to information that you consider the future uses you may wish to make of it and negotiate for those rights. If you would have rights under fair use or some other general legal exception, then it may be preferable to have your contract not mention a particular right or use at all, rather than have it provided in a very narrow way or excluded altogether.

Legal Concerns - Subject Rights

Rights of individuals whose information is stored in your collection vary across jurisdictions. If you are operating in the European Union, a comprehensive privacy framework governs the kinds of information you can collect and what you can do with it. In the United States, privacy laws are often described as a "patchwork," and if you work with personally identifiable information you should familiarize yourself with the applicable federal and state laws. Prioritize this issue if you work with financial or medical information, or information about children.

Legal Concerns - Funding Agency Requirements

Since a 1999 revision of OMB Circular A-110 extended the Freedom of Information Act (FOIA) to research data produced under federal awards, federal funding agencies have required grant recipients to make data produced during the awarded research available to the public. To ensure the availability of this data, many federal funding agencies have emphasized the importance of data management and storage, encouraging grant applicants to plan for a sustainable model of access. In 2013 the White House Office of Science and Technology Policy issued a directive in support of open access to research, requiring funding agencies "with more than $100M in R&D expenditures to develop plans to make the published results of federally funded research freely available to the public within one year of publication and requiring researchers to better account for and manage the digital data resulting from federally funded scientific research."

In recent years a number of private grant funding agencies have also implemented data sharing requirements. If you have a grant or are applying for one, be sure to check what requirements, if any, the funder has. Here are the requirements for a few common US federal grant funding agencies:

National Science Foundation (NSF)
Requirements:

Data sharing:
"Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants."

National Institutes of Health (NIH)

Requirements:

  • Data sharing plan for grants over $500,000 

Sharing Research Resources

  • Published research papers must be open access according to the NIH Public Access Mandate. 

NIH Policy Statement

Data sharing:

NIH endorses the sharing of final research data to serve these and other important scientific goals and expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers. "Timely release and sharing" is defined as no later than the acceptance for publication of the main findings from the final data set.

Centers for Disease Control and Prevention (CDC)

Requirements:

Data sharing:

CDC requires recipients for projects that involve the collection or generation of data with federal funds to develop, submit and comply with a Data Management Plan (DMP) for each collection or generation of public health data undertaken as part of the award and, to the extent appropriate, provide access to, and archiving/long-term preservation of, collected or generated data.

Legal Concerns - Publisher Requirements

Over the last decade many publishers have adopted policies that support open science, with different levels of open data requirements. These requirements vary by publisher and journal but tend to fall into tiers, from encouraging authors to disclose whether the data is available for sharing to requiring that the data be shared in a publicly accessible online repository. When publishing your work, make sure you know the journal’s data sharing requirements so that you can plan accordingly.

Data Practices that Facilitate Reuse

It's important to store data in as open a format as possible while keeping the characteristics of the data intact. Data held in a proprietary, platform-dependent format is likely to become inaccessible in the future, or to need a costly migration before it is accessible and usable. It is better to convert data to an open, sustainable, platform-independent format such as CSV or PDF/A.

Library of Congress recommendations:

Data type | Format | More information
Data sets | CSV, DBF, CDF, HDF | https://www.loc.gov/preservation/digital/formats/fdd/dataset_fdd.shtml
Text-based documents | PDF, PDF/A, XML | https://www.loc.gov/preservation/digital/formats/fdd/text_fdd.shtml
Still images | TIFF, JP2 (JPEG2000) | https://www.loc.gov/preservation/digital/formats/fdd/still_fdd.shtml
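As a small illustration of converting tabular data to an open format, the following Python sketch writes records to CSV with the standard library and reads them back to check the round trip. The sample rows and column names are invented, and the step of getting the records out of a proprietary tool is assumed to have already happened.

```python
import csv
import io

# Invented measurements standing in for data exported from a proprietary tool.
rows = [
    {"sample_id": "S1", "temperature_c": 21.5, "note": "baseline"},
    {"sample_id": "S2", "temperature_c": 22.1, "note": "after treatment, day 2"},
]


def to_csv_text(rows):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(rows)

# Read the text back to confirm that no fields were lost or split in the round trip.
restored = list(csv.DictReader(io.StringIO(csv_text)))
```

CSV stores every value as text, so numbers come back as strings (here "21.5"), and units or data types need to be recorded in accompanying documentation.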

 

Sharing Within Your Research Team

You may want to share data with other team members working on different aspects of a project. Here are some considerations to help you organize your data and make sharing easier.

Location: where is it stored?

Folder structure: how easily can you locate folders?

File naming standards: are files logically labeled?

Versioning: how do you know if this data set represents the latest data?

Formats: can this file be easily opened?

Responsibility: whose job is it to oversee this organizational structure? Who is documenting and performing quality assurance?

Communication: does everyone know where information is stored and how to access it?

Documentation and Back up: how do you record information and share information in the laboratory notebook? Are paper notebooks scanned? Where are these stored and how are they shared?
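The file naming and versioning questions above can be made concrete with a short Python sketch. The naming pattern used here (project, description, date, two-digit version) is a hypothetical convention chosen for illustration, not a standard, and the project name and file content are invented. The checksum is one possible way for team members to confirm that they hold identical copies of a file.

```python
import hashlib
import re
from datetime import date


def build_file_name(project, description, day, version, extension):
    """Compose a name like 'project_description_YYYYMMDD_v02.csv' (hypothetical convention)."""
    def clean(part):
        # Lowercase the text and replace runs of other characters with hyphens.
        return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
    return f"{clean(project)}_{clean(description)}_{day:%Y%m%d}_v{version:02d}.{extension}"


def checksum(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the file content changes."""
    return hashlib.sha256(content).hexdigest()


name = build_file_name("Soil Survey", "pH readings", date(2024, 5, 17), 2, "csv")
digest = checksum(b"sample_id,ph\nS1,6.8\n")
print(name)
```

Putting the date in year-month-day order means that files sort chronologically in a folder listing, and the zero-padded version number keeps v02 ahead of v10.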

Sharing Outside Your Research Team

There are reasons, requirements, and advantages to sharing your research data outside of your team, but before doing so you should consult the following groups and policies.

  • Consult your university data policy/intellectual property policy
  • Consult funders' policies
  • Consult your PI
  • Consult your IRB office

Other things to consider when sharing your data:

Be aware of patient confidentiality - does the data set need to be anonymized?

Be aware of data ownership - do you have the right to share these data?

Timing - At which point do you share data? What is your purpose for sharing data? Do you share raw data or analyzed data? Should data be "peer reviewed"? How do you version this data? Which data set will you submit to a repository? What type/how much to share for different steps: publication, archival and preservation, retention requirements by funders, institution, IRB, etc.?

Traditionally, research papers have been published as a single package that contains both the presentation and analysis of ideas and the supporting data. More recently, requirements have shifted toward making all relevant data, including unpublished preliminary data, publicly available at the time of publication or earlier in the research phase of the grant cycle. Some funding agencies require preliminary data to be released for projects where the data can more rapidly advance research. Researchers face an increasingly competitive funding and publication environment, and data produced from competitive grants is highly valued. The question of whether to share preliminary data can create tension between the data-producing scientists, who expect the right to publish first, and the scientists who wish to publish their own analysis of that data. Data producers may also seek to keep their data private because these valuable data sets can become a stepping-stone to the next funding opportunity.

Citing Data

Why is it Important to Cite Data?

Acknowledgement of the use of someone else's information or work is a long-accepted practice in scholarly communication. It is important to cite not only the literature consulted but also the data files used, including your own. Plagiarism takes place when you use someone else's data or words and do not acknowledge that you have done so.

Citing data files in publications based on those data serves several purposes:

  • Provides appropriate credit to the data producers and publishers
  • Enables other researchers to access the data for their own use or to replicate research findings
  • Assists in measuring the impact of a data set by tracking references to it in the scientific literature and in online conversations ("altmetrics")
  • Helps data producers know how their data is being utilized

How to Cite Data?

Most citation formats have guidelines for how to cite data that typically will include:

  • Author/Creator(s): the creators of the data
  • Title: the title of the data set
  • Version: the exact version or edition of the data set
  • Publication Date: the date when the data set was published or released
  • Publisher/Archive: the data center or repository that is archiving and distributing the data
  • Identifier/Locator: URL or other locator for the data (e.g. a persistent URL such as a DOI or a handle)

For specifics, refer to the relevant citation style manual. More detailed information about data citation is available from the Digital Curation Centre: https://www.dcc.ac.uk/guidance/how-guides/cite-datasets
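To show how the elements listed above fit together, here is a minimal Python sketch that stores them in a record and joins them into one citation string. The class and field names are invented for this example, and the output order is a generic layout rather than the format of any particular style manual. The values come from the ICPSR data set used in the citation examples in this guide.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The common data citation elements; names are illustrative only."""
    creators: str
    title: str
    version: str
    publication_year: str
    publisher: str
    identifier: str

    def format(self) -> str:
        """Join the elements in a generic order, not a specific style manual's layout."""
        return (
            f"{self.creators} ({self.publication_year}). {self.title} "
            f"({self.version}). {self.publisher}. {self.identifier}"
        )


citation = DataCitation(
    creators="Milberger, S.",
    title="Evaluation of violence against women with physical disabilities in Michigan, 2000-2001",
    version="ICPSR version",
    publication_year="2002",
    publisher="Inter-university Consortium for Political and Social Research",
    identifier="doi:10.3886/ICPSR03414",
)
print(citation.format())
```

Keeping the elements as separate fields makes it easier to reformat the same citation for a different style later.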

Data Citation Examples

Common Citation Style Examples

Here are a few examples of data citations in APA, MLA, and Chicago format, created by the University of Michigan.

APA (6th edition)

Minimum requirements based on instructions and example for dataset reference:

Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. doi:10.3886/ICPSR03414

With optional elements:

Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. Detroit: Wayne State University [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi:10.3886/ICPSR03414

MLA (7th edition)

Minimum requirements based on instructions and examples for books and web publications:

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Inter-university Consortium for Political and Social Research, 2002. Web. 19 May 2011.

With optional elements:

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State U [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2002. Web. 19 May 2011. doi:10.3886/ICPSR03414

Chicago (16th edition)

Bibliography style (based on documentation for books):

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University, 2002. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research, 2002. doi:10.3886/ICPSR03414.

Author-Date style:

Milberger, Sharon. 2002. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research. doi:10.3886/ICPSR03414.

License and Attributions