Research Data Management

Storage, Backup, and Security

Storage, backup, and security are fundamental, interrelated components of a data management strategy, and together they ensure the ongoing integrity of research data. During the early planning stages of a project, researchers should plan these three elements together. For example, the hardware chosen for storage must be compatible with the backup method chosen later, and both primary and backup storage need adequate security mechanisms. These elements form a continuum, from hard drives to automatic backup to encryption, and project planning and data management must account for all of it.

Storage

Data storage is fundamental to any research project. Without safe, reliable, and accessible storage, your research data will not have a home. Storage refers to the media on which you save your data files and software. Planning for data storage is important because all storage media are vulnerable to failure or loss and will likely become obsolete over time.

There are several different options for storing data, each with benefits and cautions to consider.

Personal Computers and Laptops

Personal computers and laptops use an internal hard drive to store your data and system files. The internal hard drive is the most immediate storage option and, when it is functioning normally, can access your files quickly and reliably. Internal drives are convenient for storing data locally while you are working on it, but they should not be your only storage system: local drives can fail, and PCs and laptops can be lost or stolen, taking your data with them.

Network Storage

Network storage drives are typically managed by IT staff within your School and accessed over LAN or internet connections. Networked storage provides backup and security protocols that may be difficult to implement individually. At the same time, networked storage may have restrictions on access or file size that could affect your research. If you are working on a cross-institutional project, networked storage may not be a good solution for you, since collaborators at other institutions may not be able to access it.

External Storage Devices

An external hard drive sits outside your computer and connects to it through a data cable. It stores data just as an internal drive does, and most external drives let you schedule automatic backups. External hard drives are convenient for storing old files, backing up important data, and copying and transferring files. They also provide security, both through encryption and simply because the drive can be detached.

USB flash drives are small, removable, and rewritable. Compared with older removable media such as floppy disks and optical discs, they are more compact, faster, and hold much more data, and because they have no moving parts they can be durable and reliable. As with any removable storage media, physical labeling is important to identify what data each device contains.

Cloud Storage (Remote Storage)

Cloud storage services provide users with an online system for storing and backing up computer files. Using banks of servers located around the globe, these services store and synchronize data files and offer redundant backup. Remote storage solutions are extremely convenient but should not be the only storage solution in a data strategy. As third-party providers, commercial cloud storage and sharing services may have size, cost, or privacy limitations that could pose a risk to your data. Be sure to read the fine print, and do not rely solely on commercial options for storing your data. Commercial web applications can be discontinued unexpectedly, so find out what would happen to your data in that scenario. You should also know the service's privacy terms and how much storage you have, for how long, and at what cost.

Physical Storage

Just as you would name any digital data files according to a standard naming convention, labeling is critical for physical storage of data. Any analog materials, from paper hardcopies of survey data to refrigerated lab specimens, should be appropriately labeled with the minimum metadata required to correctly identify the item. This could include creator, date created, associated project and project files, and ownership information.

If moving, shipping, or storing analog materials, appropriate identification of any containers as well as the items within the containers will help to avert confusion if materials are misplaced or mishandled. Maintain a manifest of contents and label data down to the smallest discrete item.
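A manifest can be kept as a simple spreadsheet. The sketch below assumes a hypothetical CSV layout in which every item carries the minimum metadata suggested above plus the ID of the container it sits in; the class, field names, and functions are illustrative, not a required standard.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class ItemLabel:
    # Hypothetical fields mirroring the minimum metadata suggested above.
    item_id: str        # the label physically attached to the item
    container_id: str   # the label on the box, folder, or freezer rack
    description: str
    creator: str
    date_created: str   # ISO 8601 date, e.g. "2024-03-15"
    project: str
    owner: str

def manifest_csv(items):
    """Return a CSV manifest with one row per labeled item."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ItemLabel)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buf.getvalue()

def items_in(items, container_id):
    """List the item IDs recorded for one container, e.g. to check a shipment."""
    return [item.item_id for item in items if item.container_id == container_id]
```

Printing or saving the manifest alongside the shipment lets the recipient check each container against the listed item IDs.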

Backup

Backup is an essential component of data management, mitigating the risks of accidental or malicious data loss. Backup allows you to restore your data in the event of a loss. Backup is important for all data, but particularly for research data that is unique or difficult to reproduce.

Examples of data loss:

  • Disasters (floods, fires)
  • Theft
  • Hardware or software malfunctions
  • Unauthorized access

A backup strategy is a plan for ensuring the accessibility of research data during the life of a project. Your strategy will depend on the amount of data you are working with, how often your data changes, and the system requirements for storing and rendering it. Consider your data storage and backup strategy before you start collecting and creating your data.

Store at Least Two Copies of Your Original Data

Best practice recommends that you store at least two copies of your original or master data files, an external locally-held copy and an external remote copy. Redundant storage kept in different geographic locations ensures that if a disaster occurs in one place, a copy of your data still exists.
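Making the two copies can be scripted so that each one is checked against the master. This is a minimal sketch using only the standard library; the destination folders are hypothetical stand-ins for, say, a mounted external drive and a mounted remote share, and SHA-256 checksums are used to confirm each copy is identical to the original.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def make_copies(master: Path, destinations) -> dict:
    """Copy master into each destination folder and verify every copy."""
    expected = sha256_of(master)
    copies = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the file system allows it.
        copy_path = Path(shutil.copy2(master, dest_dir / master.name))
        actual = sha256_of(copy_path)
        if actual != expected:
            raise IOError(f"checksum mismatch for {copy_path}")
        copies[copy_path] = actual
    return copies
```

Keeping the returned checksums with your records also gives you a reference for checking the copies again later.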

Create an Appropriate Backup Routine for Your Project

Consider what kind of backup procedure to use. One way to decide is to consider what would be required to restore research data in the event of data loss: would you need just the data files themselves, the software that created them, or customized scripts written for data analysis? Depending on your research project, you may want to perform full, differential incremental, or cumulative incremental backups.

  • Full: A full backup will replicate all the files on your computer. Full backups take a long time and require the most storage space, but are also the most complete and can restore data quickly.
  • Differential Incremental: A differential incremental backup copies only those files that have changed since the last incremental or full backup. To run differential incremental backups you must first create a full backup as a point of reference. Incremental backups are fast and require the least storage space. However, restoring data using incremental backups is time consuming and requires each differential incremental backup made since the last full backup.
  • Cumulative Incremental: A cumulative incremental backup copies only those files that have changed since the last full backup. If no previous backup exists, a complete backup is created. Using a cumulative incremental procedure, you need only two backup sets to restore your files: your last full backup and your last cumulative incremental backup. Because each cumulative backup includes every change since the last full backup, it grows larger over time and needs more space than a differential incremental backup.
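The selection and restore rules in the list above can be sketched in Python. This is a simplified model, not the behavior of any particular backup tool: it assumes a file's last-modified time reliably indicates a change, ignores deleted files, and the `Backup` record and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Backup:
    kind: str    # "full", "differential", or "cumulative"
    time: float  # when the backup ran, e.g. a Unix timestamp
    files: set

def files_to_copy(kind, mtimes, history):
    """Choose files for the next backup.

    mtimes:  {filename: last-modified time}
    history: earlier Backup records, oldest first
    """
    last_full = next((b for b in reversed(history) if b.kind == "full"), None)
    if kind == "full" or last_full is None:
        # Full backup, or no reference point yet: copy everything
        # (record such a run as a full backup).
        return set(mtimes)
    if kind == "differential":
        since = history[-1].time   # changed since the last backup of any kind
    elif kind == "cumulative":
        since = last_full.time     # changed since the last full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return {name for name, t in mtimes.items() if t > since}

def restore_chain(history):
    """Backup sets needed to restore the latest state, oldest first."""
    idx = max(i for i, b in enumerate(history) if b.kind == "full")
    chain, later = [history[idx]], history[idx + 1:]
    # The newest cumulative backup supersedes everything before it.
    cumulative = [i for i, b in enumerate(later) if b.kind == "cumulative"]
    start = 0
    if cumulative:
        chain.append(later[cumulative[-1]])
        start = cumulative[-1] + 1
    # Every differential backup after that point must be applied in order.
    chain.extend(b for b in later[start:] if b.kind == "differential")
    return chain
```

With only cumulative backups, the chain is always two sets long; with only differential backups, it grows by one set per backup since the last full backup.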

Another important part of a backup strategy is how often you run backups. If you are making frequent or important changes to your data, you should back up your files daily. If you modify your data files less often, a less frequent backup schedule may be sufficient. If you are working from a networked computer, your central IT division may already have backup protocols in place. It is important to estimate how long your data needs to be accessed and preserved and how much data you will need to store over that time. These variables will determine your best choices of storage media and backup strategy.
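A rough storage estimate can be worked out before choosing media. The function below is a back-of-the-envelope sketch that assumes linear data growth and that every copy holds the full dataset; the function name and example figures are hypothetical.

```python
def estimate_storage_gb(initial_gb, growth_gb_per_month, months, copies):
    """Rough end-of-project storage estimate across all copies.

    Assumes linear growth and that every copy holds the full dataset;
    ignores versioned backups, compression, and deduplication.
    """
    final_dataset_gb = initial_gb + growth_gb_per_month * months
    return final_dataset_gb * copies

# Example: 50 GB today, growing 10 GB a month for 24 months,
# kept as a working copy plus two backups (3 copies in total).
print(estimate_storage_gb(50, 10, 24, 3))  # 870
```

If your backup routine keeps several older versions, multiply accordingly; the estimate above is a lower bound.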

Create Digital Surrogates of Analog Materials

If you are working with analog materials, consider making digital surrogates as backup copies of your original documents. Scanning paper lab notebooks, survey results, notes, or other printed material helps ensure that you can restore the data in the event of data loss.

Test Your System

Always test your backup system. Once you have a storage and backup routine in place, go through the exercise of accessing the backup files to be sure that your procedure works and that you will be able to restore your data if you need to.
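One way to run this test is to restore a backup into a separate folder and compare it file by file with the originals. The sketch below assumes that layout; the function names are hypothetical, and it reads each file whole, which is fine for modest datasets.

```python
import hashlib
from pathlib import Path

def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests

def verify_restore(original: Path, restored: Path) -> list:
    """Return a list of problems; an empty list means the restore matches."""
    want, got = tree_digests(original), tree_digests(restored)
    problems = [f"missing: {p}" for p in sorted(want.keys() - got.keys())]
    problems += [f"unexpected: {p}" for p in sorted(got.keys() - want.keys())]
    problems += [f"changed: {p}" for p in sorted(want.keys() & got.keys())
                 if want[p] != got[p]]
    return problems
```

An empty result means every original file came back intact; anything else tells you exactly which files your backup procedure failed to capture.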

File Formatting

The formats in which your data are stored are as important as the media on which you store them. Researchers should choose file formats that are non-proprietary, based on an open, documented standard, and in common use. The data within those files should also use standard representations (for example, a standard character encoding and date format). Consider these questions:

  • Who might be using it?
  • How will it be used differently in the future?
  • Is there a risk of data corruption, missing data or data loss?
  • Will there be application performance issues?
  • Will there be technical compatibility issues?
  • Does migration require downtime?
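As an illustration, tabular data can be saved as plain-text CSV, an open and widely supported format, using UTF-8 text and ISO 8601 dates. This is a minimal sketch; the column names and the `to_open_csv` function are hypothetical.

```python
import csv
import io
from datetime import date

def to_open_csv(records):
    """Serialize (site, sampled_on, temperature_c) records as UTF-8 CSV bytes.

    Dates are written in ISO 8601 form (YYYY-MM-DD) so any tool can parse them.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)  # the default dialect ends rows with \r\n
    writer.writerow(["site", "sampled_on", "temperature_c"])
    for site, sampled_on, temperature_c in records:
        writer.writerow([site, sampled_on.isoformat(), f"{temperature_c:.1f}"])
    return buf.getvalue().encode("utf-8")

data = to_open_csv([("Zürich", date(2024, 3, 5), 4.3)])
```

Because the result is plain text with an explicit encoding, it can be opened by spreadsheet programs, statistics packages, and scripting languages long after the software that produced it is gone.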

Data Migration

The rapid pace of change in software and hardware raises compatibility issues, even within a few years. Data migration is the process of translating data from one format to another, either to use a new computing system or to preserve data for the very long term.
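A small but common migration is converting text files from a legacy character encoding to UTF-8. The sketch below assumes the source files are Latin-1 (a hypothetical choice you must confirm for your own files) and checks that converting back reproduces the original bytes exactly.

```python
def migrate_to_utf8(legacy: bytes, source_encoding: str = "latin-1") -> bytes:
    """Translate text bytes from a legacy encoding to UTF-8.

    source_encoding is an assumption you must confirm for your own files:
    Latin-1 accepts every byte value, so a wrong guess will not raise an error.
    """
    text = legacy.decode(source_encoding)
    migrated = text.encode("utf-8")
    # Round-trip check: converting back must reproduce the original bytes.
    if migrated.decode("utf-8").encode(source_encoding) != legacy:
        raise ValueError("migration did not preserve the original content")
    return migrated
```

Keep the original files until the migrated copies have been checked, so that a mistaken assumption about the source format can be corrected.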

Assigning Responsibility for Data Storage and Back Up

Responsibility for backup and storage of data will often be guided by hardware and software decisions. If a central service is selected, that service should communicate frequently and openly with all parties. Backups should follow a regular schedule that specifies when, by whom, and how they will be performed; data can be lost if these practices are not followed consistently. If researchers and their hardware are more dispersed, each individual is responsible for his or her own backup and storage. Spreading responsibility across many people can lead to unclear backup and storage plans, so large cross-institutional or cross-departmental projects with multiple partners creating and managing data benefit from shared storage and backup strategies with defined roles for all partners.
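A backup plan that records when, who, and how for each dataset can be checked for gaps. The structure below is a hypothetical sketch, not a standard format; it flags datasets with no named owner and datasets claimed by more than one owner, which may be intentional but is worth reviewing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BackupTask:
    dataset: str
    owner: Optional[str]  # who: a named person or service; None if unassigned
    frequency: str        # when: e.g. "daily", "weekly"
    method: str           # how: e.g. "central IT nightly backup"

def unassigned(tasks):
    """Datasets with at least one backup task that nobody owns."""
    return sorted({t.dataset for t in tasks if not t.owner})

def conflicting_owners(tasks):
    """Datasets claimed by more than one owner, a possible sign of unclear roles."""
    owners = {}
    for t in tasks:
        if t.owner:
            owners.setdefault(t.dataset, set()).add(t.owner)
    return sorted(d for d, names in owners.items() if len(names) > 1)
```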

Security

There are different levels of security to consider for your research data.

  • Access: This refers to the mechanisms for limiting the availability of your data
  • Systems: This covers protecting your hardware and software systems
  • Data Integrity: This refers to the mechanisms for ensuring that your data is not manipulated in an unauthorized way

Protect Access to Your Data

Unique User ID/Password

Unique user IDs ensure that activity can be traced to specific individuals and can be used to authorize access to a server and its data. A resource manager program uses unique user IDs for auditing and for checking authorization. Each user ID and password should be assigned to one person only. Follow standard password best practices, as you would with other important computer systems. How a system stores passwords is critically important to security. Passwords should never be stored as plain text; they should be kept in protected form, such as encrypted or, preferably, as salted one-way hashes. That way, if the server itself is compromised, there is a better chance that the passwords stored on it will not be exposed.
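Password storage in practice usually relies on a salted, deliberately slow one-way hash rather than reversible encryption, so the stored value cannot simply be decrypted. This sketch uses PBKDF2 from Python's standard `hashlib`; the iteration count is an assumed work factor that you should check against current guidance, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and current guidance

def hash_password(password):
    """Return (salt, digest) to store instead of the password itself."""
    salt = os.urandom(16)  # a unique random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the hash for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because each password gets its own salt, two users with the same password end up with different stored digests, and an attacker who copies the database must attack each one separately.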

Access Through a Centralized System

Modern centralized storage solutions are often more reliable than dispersed individual machines. Centralized storage helps overcome the consequences of damaged hard disks by saving data on several drives. Such services can provide backup, physical and virtual security, environmental controls, restoration, and scalability. Centralized data centers have dedicated staff who troubleshoot hardware and connectivity issues. If your institution provides centralized computing and storage, determine whether its services can meet your needs.
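One common way of spreading data across several drives so that a single failure loses nothing is parity, as used in RAID 5. The document does not say which scheme a given service uses, so treat this as an illustrative sketch: it keeps the parity on one drive for simplicity (real RAID 5 rotates parity across drives), and the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_with_parity(data_blocks):
    """Return one block per 'drive': the data blocks plus a parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(drives, failed_index):
    """Reconstruct the contents of one failed drive from the survivors."""
    survivors = [blk for i, blk in enumerate(drives) if i != failed_index]
    return xor_blocks(survivors)

drives = stripe_with_parity([b"ABCD", b"EFGH", b"IJKL"])
```

XOR works here because any block equals the XOR of all the others, so whichever single drive fails, its contents can be recomputed. Losing two drives at once, however, is not recoverable with a single parity block.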

Role-based access rights

Assign role-specific access levels that grant only the privileges each person needs, and use a limited-privilege account for your daily work. An administrator account grants full privileges to make changes, so if an unauthorized person or a virus attacks at the administrator level, your computer is vulnerable to much more damage. Separate user accounts with different levels of access limit what an attacker can do.
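Role-based access can be expressed as a mapping from roles to permissions, with each user assigned a role. The roles, users, and actions below are hypothetical; real systems configure this in their own access-control settings.

```python
# Hypothetical roles: each grants only the actions that role needs.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "analyst": {"read", "write"},
    "admin": {"read", "write", "delete", "manage_users"},
}

# Hypothetical assignments: everyday accounts get limited roles.
USER_ROLES = {"alice": "analyst", "bob": "viewer"}

def is_allowed(user, action):
    """Deny by default: unknown users and unlisted actions are refused."""
    role = USER_ROLES.get(user)
    return role is not None and action in ROLE_PERMISSIONS[role]
```

If the "alice" account were compromised, an attacker could read and write data but could not delete it or create new accounts.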

Limitations of wireless devices to protect access

Mobile devices raise different security issues than computers. One of the greatest hazards to tablet and phone security is loss or theft of the device itself, so it is important to set a password for your device. Wi-Fi networks, particularly public Wi-Fi, can also pose security risks: try to use an encrypted network, and consider using a Virtual Private Network (VPN). Download apps only from reputable or official app stores.

Protect your computer systems

Updated Anti-virus Software

Virus protection software should be updated daily and should be running in the background continuously.

Up-to-Date Versions of Software and Media Storage Devices

All key software, including the operating system, browsers, and productivity packages, should be updated regularly. Up-to-date software works together with virus protection software to limit your computer's vulnerability.

Use of a Firewall

The function of a firewall is to block unwanted network traffic from reaching your computer or server, which reduces the threat of malevolent intrusion.
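Many firewalls are configured as an allow-list: traffic matching an explicit rule is permitted, and everything else is blocked by default. The sketch below models that idea only; it is not how any real firewall is configured, and the rules and network range are hypothetical.

```python
import ipaddress

# Hypothetical allow-list of (source network, destination port) pairs.
ALLOW_RULES = [
    (ipaddress.ip_network("10.0.0.0/8"), 22),   # SSH only from an internal network
    (ipaddress.ip_network("0.0.0.0/0"), 443),   # HTTPS from anywhere
]

def permit(src_ip: str, dst_port: int) -> bool:
    """Default deny: allow a connection only if some rule matches it."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net and dst_port == port for net, port in ALLOW_RULES)
```

Starting from "deny everything" and opening only what is needed keeps forgotten or unexpected services unreachable from outside.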

Use of Intrusion Detection Software

Whereas a firewall blocks unwanted traffic while permitting legitimate communications, intrusion detection software detects intrusion attempts and alerts administrators to them; it does not block these attempts itself.
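A very simple detection rule is to flag any source with many failed logins. The sketch below assumes login events have already been parsed into (source IP, succeeded) pairs; the function name and threshold are hypothetical. In keeping with intrusion detection, it only reports suspicious sources and blocks nothing.

```python
from collections import Counter

def detect_bruteforce(events, threshold=5):
    """Return sources with at least `threshold` failed logins.

    events: iterable of (source_ip, succeeded) pairs.
    Detection only: the caller decides whether to alert an administrator.
    """
    failures = Counter(ip for ip, succeeded in events if not succeeded)
    return sorted(ip for ip, count in failures.items() if count >= threshold)

events = [("198.51.100.7", False)] * 6 + [("192.0.2.10", False)] * 2 + [("192.0.2.10", True)]
for ip in detect_bruteforce(events):
    print(f"ALERT: repeated failed logins from {ip}")
```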

Physical Access

Take standard physical security measures with your equipment. If possible, confine servers to climate-controlled, locked rooms accessible only to trusted staff. Lock all physical equipment, materials, and spaces. Lock your machines' screens when they are not in use, and set automatic time-outs so they lock themselves.

Ensure Your Data's Integrity

Encryption

Encryption encodes information so that it cannot be read or deciphered without the decryption key. Encryption can protect data in transit as well as data at rest on a storage medium.
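For data at rest, a file's contents can be encrypted with a symmetric key. This sketch assumes the third-party `cryptography` package is installed (`pip install cryptography`) and uses its Fernet recipe, which combines AES encryption with an authentication code so that altered ciphertext is rejected; the wrapper function names are hypothetical.

```python
from cryptography.fernet import Fernet, InvalidToken

def encrypt_bytes(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt data with a Fernet key; the result is safe to store or send."""
    return Fernet(key).encrypt(plaintext)

def decrypt_bytes(token: bytes, key: bytes) -> bytes:
    """Decrypt data; raises InvalidToken if the key is wrong or data was altered."""
    return Fernet(key).decrypt(token)

key = Fernet.generate_key()  # store this separately from the encrypted data
token = encrypt_bytes(b"participant 17, score 42", key)
```

Whoever holds the key can read the data, so the key needs the same care as the data itself; losing the key means losing access to the data.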

Electronic Signature

An electronic signature is the electronic equivalent of paper-based signing authority. The strongest electronic signature is called the standard electronic signature or digital signature. It is meant to ensure the authenticity of both the signer and the document. Any change made to a document after it has been signed invalidates the signature, thereby protecting the document from tampering.
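The tamper-evidence property can be demonstrated with Python's standard library. Note that this sketch swaps in HMAC, a shared-key message authentication code, which is not a true digital signature: anyone holding the shared key could have produced the tag, so it cannot prove to a third party who signed. Real digital signatures use a private/public key pair. The key and function names are hypothetical.

```python
import hashlib
import hmac

def sign(document: bytes, key: bytes) -> str:
    """Compute an authentication tag over the whole document."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()

def verify(document: bytes, key: bytes, tag: str) -> bool:
    """True only if the document is byte-for-byte unchanged since signing."""
    return hmac.compare_digest(sign(document, key), tag)

key = b"hypothetical shared secret"
doc = b"Final results table, version 3"
tag = sign(doc, key)
```

Changing even one character of `doc` produces a different tag, so verification fails, which mirrors how a later edit invalidates a digital signature.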

Watermarking

Watermarking embeds a digital marker for authorship verification and can alert someone to alterations made to data files. It is most often used with media files and images. Watermarking software exists both for detecting tampering and for embedding metadata.
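A simple (and easily removed) technique is a least-significant-bit watermark, which hides a bit pattern in the lowest bit of pixel values. The sketch below works on a hypothetical list of 8-bit grayscale pixel values; real watermarking tools use far more robust schemes. Because the mark is fragile, editing a marked pixel breaks it, which is what makes it useful for tamper detection.

```python
def embed(pixels, mark_bits):
    """Hide mark_bits in the least-significant bits of the first pixels."""
    if len(mark_bits) > len(pixels):
        raise ValueError("image too small for watermark")
    out = list(pixels)
    for i, bit in enumerate(mark_bits):
        out[i] = (out[i] & ~1) | bit  # clear the lowest bit, then set it to the mark bit
    return out

def extract(pixels, n_bits):
    """Read the hidden bits back out."""
    return [p & 1 for p in pixels[:n_bits]]

pixels = [200, 13, 77, 128, 255, 0]
mark = [1, 0, 1, 1]  # hypothetical bits, e.g. part of an author ID
marked = embed(pixels, mark)
```

Each pixel changes by at most one intensity level, so the mark is invisible, yet `extract(marked, 4) != mark` after any edit to those pixels signals that the file was altered.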

License and Attributions