
Research Data Management

Who Owns Data?

It depends…

  • Who is funding your research?
  • What is your institution’s data ownership policy?
  • Are you working under a grant or contract?
  • Are you the PI or a student working under a PI in a lab?

Data ownership can be very complex. Institutional policies can help determine who owns data produced at an institution and under what circumstances someone may take, share, or publish data. Data are seen as assets, so research stakeholders (institutions, funders, scientists, etc.) want to protect their intellectual property rights, and issues such as licensing need attention. Before starting a project, have an idea of the ownership issues related to any resulting research products, including data. The question of "who owns data?" pertains to the individual or entity that has the legal rights to the data and can retain the data after the completion of the project.

Data ownership can also depend on who funds the research. Funders sponsor research for a variety of reasons:

  • Government agencies fund research to improve the general health and welfare of society
  • Philanthropic organizations are interested in advancing particular causes
  • Private funders are interested in profits, along with benefits to society

These different reasons often determine who claims ownership of research data.

Federally funded grants

In most cases with federally funded research, the government gives the research institution the right to use data collected with public funds as an incentive to put research to use for the common good. Thus the research institution owns the data but allows the principal investigator on the grant to be the steward of the data. The PI may control the course, publication, and copyright of any research, subject to institutional review. Graduate students, postdocs, or faculty involved in performing research on a particular grant would be wrong to assume that they own the data that they are collecting. The PI takes responsibility for the collection, recording, storage, retention and disposal of data.

Data and lab notebooks collected by students and research fellows for a research project belong to the grantee institution. Students should not take the data with them when they leave unless they have made appropriate arrangements with the project PI.

Retaining copies of data might be allowed with permission. When the PI faculty member leaves the grantee institution, they must negotiate with the institution to keep their grants and data. Many universities have offices and policies in place to ensure that such transfers respect both the rights of the researcher and of the institution(s).

Is it a grant or a contract?

With government funding, researchers should distinguish between grants and contracts. Under grants, researchers must carry out the research and submit reports, but control of the data remains with the institution that received the funds. With contracts, the researcher is required to deliver a product or service, which usually is then controlled by the government. If research is supported with government funds, make sure to know whether it is a grant or a contract as this significant difference will determine who can publish and use your data.

Private funding companies

Private funders may seek to retain the rights for commercial use of the data.

Philanthropic organizations

Policies of philanthropic organizations vary. They may retain or give away ownership rights depending on their interests.

Legal Concerns Related to Data

Intellectual property is an intangible asset. Intellectual property generated in an academic setting usually involves copyrights, trade secrets, and patents.

The creation of intellectual property is an outcome of research conducted at universities. Because it benefits both the university and society, universities develop policies and procedures relating to the ownership, use, management, and compensation for intellectual property created with their resources. Because policies vary by institution, be sure to familiarize yourself with your institution's policies. In cases where government-funded research data is protected by intellectual property rights, rights holders should facilitate data access for the benefit of public research.

Data can be licensed so you need to think about data licensure both as a creator of data and as a user of others' data. For your research project, you will need to articulate how you will be providing permissions or licensing to your data or copyrighted works from your project. For others' data, you need to obtain permissions and cite the data appropriately.

Copyright

A copyrightable work is an original creative work set in a tangible format that is covered by the copyright laws of the United States or other countries. An important point to consider is that in the United States, while data and facts cannot be copyrighted, creative expressions of data, such as a chart or a table in a publication, ARE copyrightable. In addition, be aware that in certain foreign jurisdictions such as the European Union, database compilations, including factual data, ARE protected by law. Under copyright law, databases are generally treated as "compilations": collections of data that are selected or arranged in such a way that the resulting work as a whole constitutes an original work of authorship. The individual data within the database may or may not be protected by copyright; however, the selection and/or arrangement of data as a whole will be protected by copyright if it contains enough creative, original expression. Because copyright law offers only limited protection, database developers generally protect their databases with a legal contract that requires users to comply with the wishes of the copyright owner as to how the data may be accessed and used.

For more information on copyright and intellectual property see our other guides:

Creative Commons (CC)

Creative Commons licenses are legal tools that authors can use to set the default legal permissions for using their data. By easing some of the restrictive language of traditional copyright, Creative Commons licenses have facilitated open sharing and reuse of data. CC0 (the Creative Commons Zero public domain dedication) is strongly recommended for scientific data and has been widely adopted for releasing data into the public domain. Scientists who apply CC0 to their databases agree to waive any copyright interests in the work and place it as completely as possible in the public domain, enabling others to freely build upon, enhance, or reuse the work for any purpose without restriction. For more information on Creative Commons see the linked guides above.

Data Retention Legalities

Data retention refers to the length of time one needs to keep a project's data before it may be securely destroyed, according to institutional (IRB), funder, state, and/or federal guidelines. Data retention can be very complex, so researchers should consult their IRB and institutional policies regarding the retention and secure destruction of data. Retention requirements also depend on a number of overlapping agencies and on factors such as whether the data contains patient health information (HIPAA) or other identifiers. From a legal and ethical perspective, researchers need to know the minimum retention period and how long they must retain data documentation for any audits or misconduct investigations.

Ethical Considerations for Sharing

Any research institution that accepts federal funding is required by law to have policies in place to oversee its research programs. These policies include monitoring conflicts of interest, reporting misconduct, and ensuring safety regulations are followed. They also establish standing committees to review human (Institutional Review Board or IRB) and animal (Institutional Animal Care and Use Committee) research protocols.

The purpose of an IRB is to protect the rights and welfare of those individuals who contribute to the research process by participating as subjects. The IRB also protects the institution and the researcher by ensuring that those individuals considering being part of a research study are adequately informed before consenting to participate, and that participants are not exposed to excessive risk.

In the context of data management the IRB has three roles. First, since funders now often ask to see data management plans, members of the IRB look more closely at these plans to see whether adequate thought has been given to the plan and whether what is written is feasible. Second, the IRB reviews data collection forms to limit the amount of personally identifiable information that is being collected. Third, the IRB reviews the research protocol to see how the data will be safeguarded. This includes documenting who will have access to the data collected, and under what conditions, sometimes called the privacy or confidentiality rules. These rules need to consider who will have access to the data technically, physically, and for administrative purposes.

There are federal and state rules and regulations regarding data security for specific types of data. For instance, personally identifiable data, such as names and social security numbers, are protected by many state and federal laws. At the federal level, health data are protected by the Health Insurance Portability and Accountability Act (HIPAA), student data are protected by the Family Educational Rights and Privacy Act (FERPA), and financial data are protected by the Financial Services Modernization Act (FSMA), also known as the Gramm-Leach-Bliley Act (GLBA).

As researchers work to collect and analyze data, they must ask themselves whether each piece of data is necessary to address the original research question or hypothesis and whether the data element, in combination with other data, could identify an individual. For example, age alone may not identify a person, but age in conjunction with zip code and medical condition may lead to identification. To protect confidentiality in these instances, researchers should not collect the data at all or, if it is crucial, should replace the actual data with codes known only to the primary researcher.
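The two checks described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an institutional or IRB-approved procedure: the records, the field names (`name`, `age`, `zip`, `condition`), the `min_group_size` threshold, and the helper names `risky_combinations` and `pseudonymize` are all assumptions made up for the example. The first helper flags combinations of age, zip code, and condition that are shared by too few records; the second replaces the direct identifier with a random code and returns the key table that only the primary researcher would keep.

```python
import secrets
from collections import Counter

# Hypothetical records; every field name and value here is invented for illustration.
records = [
    {"name": "A. Jones", "age": 34, "zip": "52240", "condition": "asthma"},
    {"name": "B. Smith", "age": 34, "zip": "52240", "condition": "asthma"},
    {"name": "C. Lee", "age": 71, "zip": "52245", "condition": "diabetes"},
]

# Fields that are harmless alone but may identify a person in combination.
QUASI_IDENTIFIERS = ("age", "zip", "condition")


def risky_combinations(rows, keys=QUASI_IDENTIFIERS, min_group_size=2):
    """Return combinations of quasi-identifiers shared by fewer than min_group_size rows."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n < min_group_size]


def pseudonymize(rows, id_field="name"):
    """Replace the direct identifier with a random code.

    Returns the coded rows and the key table (identifier -> code), which
    should be stored separately and seen only by the primary researcher.
    """
    key_table = {}
    coded = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in key_table:
            key_table[identifier] = "P-" + secrets.token_hex(4)
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["participant_code"] = key_table[identifier]
        coded.append(new_row)
    return coded, key_table


risky = risky_combinations(records)
coded_records, key_table = pseudonymize(records)
print(risky)  # combinations that occur only once in this sample
```

In this made-up sample, the single 71-year-old with diabetes in zip code 52245 is the only record flagged, because no other record shares that combination. A real project would choose the threshold and the list of quasi-identifiers in consultation with its IRB.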

Privacy levels required by funding agencies and publishers

Each funding agency and publisher has guidelines for maintaining privacy regarding human and animal subjects, as exemplified in this guideline from the National Institutes of Health (NIH):

Data should be redacted to strip all individual identifiers, and effective strategies should be adopted to minimize risk of disclosing a participant's identity. Options to protect privacy include: withholding part of the data, statistically altering the data in ways that will not compromise secondary analyses, requiring researchers who seek data to commit to protect privacy and confidentiality, and providing data access in a controlled site, sometimes referred to as a data enclave. Some investigators use hybrid methods, releasing a redacted dataset for general use but providing access to more sensitive data through a user contract or data enclave. In most instances, sharing data is possible without compromising participant confidentiality and privacy.
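As a rough illustration of the "redacted dataset for general use" idea in the guideline above, the following Python sketch drops direct identifiers and coarsens two fields. It is not NIH's method and does not by itself satisfy any legal de-identification standard. The field names, the ten-year age bands, the three-digit zip prefix, and the function name `redact_for_general_release` are all assumptions chosen for the example.

```python
def redact_for_general_release(row, direct_identifiers=("name", "ssn")):
    """Return a copy of row with direct identifiers removed and detail reduced.

    Assumed rules for this sketch only: age becomes a ten-year band and
    the zip code keeps its first three digits.
    """
    # Withhold part of the data: drop fields that identify a person directly.
    out = {k: v for k, v in row.items() if k not in direct_identifiers}
    # Reduce precision of fields that could identify someone in combination.
    if "age" in out:
        low = (out["age"] // 10) * 10
        out["age"] = f"{low}-{low + 9}"
    if "zip" in out:
        out["zip"] = str(out["zip"])[:3] + "XX"
    return out


# Hypothetical record with invented values.
sample = {
    "name": "A. Jones",
    "ssn": "000-00-0000",
    "age": 34,
    "zip": "52240",
    "condition": "asthma",
}
public_row = redact_for_general_release(sample)
print(public_row)
```

The full-detail record would stay behind a user contract or in a data enclave, in line with the hybrid approach the guideline mentions, while only rows like `public_row` are released for general use.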
