Data is often thought of in quantitative terms. Much research data is indeed quantitative, but 'number' and 'data' are not synonymous. Data is typically considered in two broad categories: quantitative and qualitative.
Quantitative data can include experimental measurements, e.g. lab instrument data, sensor readings, survey results, and test/simulation models. Qualitative data can include text, audio, images, and video. Some definitions of data are quite broad, and include objects such as laboratory specimens.
Some types of research data and data files are common across disciplines:
Research data in Social Sciences can include:
Research data in Hard Sciences can include:
Stages of Data Related to Research Data Life Cycle
These file format characteristics ensure the best chances for long-term access:
Here are some examples of preferred formats for various data types (from https://lib.stanford.edu/data-management-services/file-formats):
For a list of common file formats and evaluations of format quality and long-term sustainability see http://www.digitalpreservation.gov/formats/fdd/browse_list.shtml
Some archives specify the optimal data formats they use for long-term preservation of data.
Proprietary systems and file formats can resist attempts at data integration, reuse, and sharing. These barriers can often be addressed by converting proprietary formats to open formats. The protocols and solutions for doing so can be discipline-specific (e.g. https://docs.openmicroscopy.org/bio-formats/6.9.0/supported-formats.html), but some general guidelines apply.
Information can be lost when converting file formats. When data is converted from one format to another, whether through export or with data translation software, certain changes may occur to the data:
After conversion, data should be checked for errors or changes.
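One way to carry out such a check is to read both the original and the converted file back into memory and compare them row by row. The sketch below is only an illustration under stated assumptions: it uses a small, made-up tab-separated table, converts it to CSV with Python's standard csv module, and reports any cell that changed. The function names (convert_tsv_to_csv, read_rows, compare_tables) and the sample data are hypothetical and not part of any tool mentioned in this guide.

```python
import csv
import io


def convert_tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text."""
    rows = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def read_rows(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


def compare_tables(original, converted):
    """Return human-readable descriptions of differences between two tables."""
    problems = []
    if len(original) != len(converted):
        problems.append(f"row count changed: {len(original)} -> {len(converted)}")
    for i, (before, after) in enumerate(zip(original, converted), start=1):
        if before != after:
            problems.append(f"row {i} differs: {before!r} -> {after!r}")
    return problems


# Made-up sample data; the second row contains a comma inside a cell,
# which is a common source of trouble when moving to a comma-separated format.
tsv_text = "sample_id\tnote\tvalue\nS1\tcontrol, batch A\t0.50\nS2\tTreated\t1.25\n"

csv_text = convert_tsv_to_csv(tsv_text)
problems = compare_tables(read_rows(tsv_text, "\t"), read_rows(csv_text, ","))
print(problems or "no differences found")
```

Comparing values as text, as done here, also catches silent changes such as 0.50 becoming 0.5. For real projects the comparison would need to be adapted to the formats involved, and discipline-specific tools may offer their own validation.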
To mitigate the risk of lost information:
Data documentation explains the who/what/where/when/why of data:
Good data documentation helps you, the researcher. Clear documentation makes it easier to interpret your findings later, helps facilitate collaboration, sharing, and reuse, and can also help ensure successful long-term preservation of your research findings.
Data documentation practices vary by discipline. Common methods include lab notebooks, data dictionaries and codebooks in the social sciences, and well-documented, commented code in computer science (or in any project that uses code or scripting).
While formats and methods for documentation differ, the general idea is always to describe:
For collaborative research projects, it is important that the project team agree on documentation practices so that all members document data consistently.
No matter what, you need to have:
Why Use File Naming Conventions?
Naming conventions make life easier.
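One practical benefit is that consistently named files can be checked, sorted, and parsed by a script. The sketch below assumes a purely hypothetical convention of the form project_YYYYMMDD_description_vNN.ext; this pattern, the parse_name helper, and the example file names are illustrations only, not a convention prescribed by this guide or by any research group.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)"
    r"_(?P<date>\d{8})"
    r"_(?P<description>[a-z0-9-]+)"
    r"_v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    raw = parts["date"]
    try:
        # Reject impossible dates such as month 13.
        parts["date"] = date(int(raw[:4]), int(raw[4:6]), int(raw[6:]))
    except ValueError:
        return None
    parts["version"] = int(parts["version"])
    return parts


# Made-up file names for illustration.
names = [
    "soilsurvey_20230114_ph-readings_v02.csv",
    "soilsurvey_20221130_ph-readings_v01.csv",
    "final data (new).xlsx",
]

# With the date written as YYYYMMDD, plain alphabetical sorting is also chronological.
conforming = sorted(n for n in names if parse_name(n) is not None)
nonconforming = [n for n in names if parse_name(n) is None]

print("conforming:", conforming)
print("needs renaming:", nonconforming)
```

Whatever convention a group adopts, a small check like this can flag files that drift from it before they are shared or archived.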
What File Naming Convention Should I Use?
Has your research group established a convention?
If not, general guidelines include:
Much of this guide was adapted from the New England Collaborative Data Management Curriculum.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.