Library guides: Open and Responsible Research Guide: Research Data

Contact information

researchdata@oulu.fi

Research data management - what and why?

In this guide, research data refers to (digital) data that is used to answer research questions and on which the research results are based. Research datasets can consist of various types of files, for example text, images, videos, numerical tables of measurements or even databases. However, charts, reports or publications derived from the data are not counted as research data here unless they have themselves served as material for the research, as in literature reviews or meta-analyses.

Research data management means taking care of the data. It also includes taking care of aspects related to data security, data protection, research ethics and law at all stages.

In the Declaration on Responsible Research, the University of Oulu has committed to promoting responsible management of research data. Several research funders and publishers also have requirements related to research data management. More detailed information can be found in the funding calls and publishers' instructions.

Good practices in research data management are summarized in the so-called FAIR principles, according to which research data should be findable, accessible, interoperable and re-usable. When these principles are followed:

the data on which the research results are based, or its metadata, is easy to access
data is easier to understand - and re-use, if allowed
the rights of everyone involved in the research data have been taken into account

CSC - IT Center for Science provides a set of services (Fairdata) based on the FAIR principles. Under certain conditions, the services are free of charge to researchers affiliated with Finnish research organizations (find out more here).

To put the FAIR principles into practice, research data should be cited using its persistent identifier, for example a DOI. Citations should be included both in the text and in the bibliography, if possible. The exact format of the citation depends on the publication channel, but in its simplest form it can be for example:
Doe, J. & Smith, J. (2020). Dataset that we collected. Version 1. [Data set]. Zenodo. doi:10.5284/1234567.
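
As a minimal sketch, a citation in roughly this form could be assembled from metadata fields in Python. The function name and parameter names are illustrative assumptions, and publishers may require a different format.

```python
def format_data_citation(creators, year, title, version, repository, doi):
    """Build a simple dataset citation string from metadata fields.

    All parameter names are hypothetical; adapt the layout to the
    publication channel's own citation style.
    """
    authors = " & ".join(creators)
    return (
        f"{authors} ({year}). {title}. Version {version}. "
        f"[Data set]. {repository}. doi:{doi}."
    )


citation = format_data_citation(
    creators=["Doe, J.", "Smith, J."],
    year=2020,
    title="Dataset that we collected",
    version="1",
    repository="Zenodo",
    doi="10.5284/1234567",
)
print(citation)
```

Keeping the fields separate like this makes it easier to reuse the same metadata in the text, the bibliography and the data availability statement.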

In research publications, a Data Availability Statement (DAS) lists the data sets used, how to access them (preferably using links with persistent identifiers) and their availability or possible restrictions. Publishers can have more detailed instructions, but a statement can be formatted for example like this:

Data availability:

The following dataset created for this work can be found in [name of the repository] at [persistent identifier of the data].

Research Data Management in a Nutshell

Planning

Create a data management plan and update it later if needed. The university's data support can help you (researchdata@oulu.fi).

Personal data and confidential information

Find out as early as possible whether your research data will contain personal data or other confidential information. Please note that the subjects must be informed about the processing of personal data before data collection begins, and the data must be stored in a secure environment at all stages of your research. You can get more information from the Data Protection Stewards at each faculty, the University of Oulu's information security team (tietoturva@oulu.fi) and Campus ICT (ict@oulu.fi).

Rights

Agree on ownership, re-use and other rights for your data beforehand. If you use data created by others, you also need to consider rights: review the license information that defines the re-use conditions and make sure you cite the data properly.

Storage location

Make sure your research data is stored in a secure location, limiting access rights if necessary. Campus ICT (ict@oulu.fi) can help with selecting the storage location. CSC - IT Center for Science also offers storage solutions that are free of charge for researchers who are affiliated with Finnish higher education institutions.

Files and folders

Use a logical folder structure, name files informatively, and try to make use of standards and commonly used file formats.

Processing

Before processing your research data, make sure that you keep the original so-called raw version so that you can return to it if necessary. If you need to edit your raw data, make notes of all steps. Editing must always be justified and should not distort the original data. Personal data can be anonymized (a person cannot be identified) or pseudonymized (a person cannot be identified without additional information) if necessary.
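
As an illustration of pseudonymization, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256). This is one possible technique, not a method prescribed by this guide: the function name, the sample rows and the key are hypothetical, and whether the result is sufficient for a given project should be checked with the Data Protection Stewards. The secret key is the "additional information" and must be stored separately from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the identifier cannot be recomputed without the key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability (an assumption)


# Hypothetical key: in practice, generate it randomly and store it securely.
key = b"replace-with-a-securely-stored-random-key"

# Hypothetical raw rows containing a direct identifier.
rows = [
    {"name": "Jane Doe", "score": 42},
    {"name": "John Smith", "score": 37},
]

# The pseudonymized copy drops the name column entirely.
pseudonymized = [
    {"participant": pseudonymize(row["name"], key), "score": row["score"]}
    for row in rows
]
```

The raw rows stay untouched, in line with keeping the original version of the data and saving modified versions separately.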

Metadata

As your research work progresses, metadata accumulates automatically. Make sure to keep organized notes of it. In its simplest form, metadata can be stored in a so-called "read me" text file. In some cases metadata can be obtained directly from research instruments. If you have a lot of metadata or you plan to publish it, it is recommended to store it in a data repository. Open access to metadata makes your data visible not only to the scientific community but also to the general public. It is often possible to publish your metadata even when the actual research data cannot be shared. For storing and publishing metadata, you can use for example CSC's Qvain service (tutorial video), which also creates a persistent identifier (a DOI or URN) for your metadata that you can use for unambiguous citations.

Archiving and sharing open data

Your research data may be needed later or you may want to share it with others. In this case you should consider storing your data in a data archive, or repository. Include metadata and sufficient documentation with descriptions and explanations, so that your data can be understood. Granting open access to research data or parts of it is recommended if there are no reasons not to. You can specify the re-use terms with a license in the associated metadata. Research data can be archived, for example, in the CSC IDA service or another reliable repository.

More information:

Finnish Social Science Data Archive's Data Management Guidelines (in English)

Improve the quality and impact of your research through data management - A guide for making your data FAIR (in Finnish, Swedish and English)

CSC's services for research (in English)

CSC's Fairdata-services for researchers (in English)

CSC's Data management checklist (in English)

CSC's research data management self-study course (in English)

CSC's video on FAIR principles (English audio)

Open research data and methods: National policy and executive plan by the higher education and research community (in English)

Finnish Social Science Data Archive's guide on citing archival data (in English)

DCC's guide on how to cite datasets (in English)

Preparing a data management plan (DMP)

A data management plan covers:

what kind of data will be used
how the data will be obtained
whether new data will be created or already collected data will be re-used
how the data will be stored
what parts of the data can be made openly available, for example in a repository
whether there are any ethical or legal concerns (such as GDPR) and how they will be taken into account

By making a data management plan, you can prepare in advance for potential problems and risks that you may encounter during a research project. The plan can also be updated as the project progresses. The data management plan does not cover methods for the scientific analysis of the data; those are part of the research plan.

Check the more detailed guidelines for a data management plan in each funding call. If you need help with preparing the plan, the university's data support can assist you (researchdata@oulu.fi).

IMPORTANT: Some funders require a pre-approved data management plan. If you get a positive funding decision that requires pre-approval of your data management plan by our Data Stewards, please contact data support: researchdata@oulu.fi.

Those participating in creating research data have many kinds of rights and obligations and it is important to pay attention to them as early as possible when planning the project. This includes everyone involved in the creation or collection of the data. It is a good idea to agree on ownership, conditions for using the data and the possible sharing of the data for further use, preferably even before the data has been collected. Authors (or 'creators' in case of research data) need to be listed in a similar manner as for research publications, so everyone who contributed to producing the data should be mentioned appropriately. Roles can be discussed at the planning stage if needed.

Special attention should be paid to rights and obligations related to research data, for example considering

data protection
informing research participants
other discipline-specific obligations such as those in medical or human sciences
rights of indigenous peoples and CARE principles
spatial data on endangered species
company secrets etc.

If you plan to use data collected by others, make sure to check the license information so that you know possible limitations for re-using the data. Also make sure to give credit to the creators of the data by citing the data appropriately.

Research data can also be published as peer-reviewed journal articles (data papers), in which the contents of the research data and the way it was created and processed are described in detail. Writing a data article requires detailed notes, so it is good to prepare for this already when making a data management plan. A detailed and thorough description of the data can greatly improve its re-use value and understandability.

A data publication does not include scientific analysis, discussion or conclusions, but it might be possible to publish them in a separate research publication. In this case the research publication itself does not need to contain a thorough description of the data, but should instead have a citation to the data publication where the details can be found. Publishers have different policies regarding data articles and publishing separate research publications using those data, so detailed guidelines need to be checked from each publisher.

More information:

Finnish Social Science Data Archive's Data Management Guide: Data Management Planning (in English)

Instructions for planning the management of sensitive and confidential data working group's additional instructions for planning the management of confidential and personal data (in English)

Data protection guide by the Office of the Data Protection Ombudsman (in English)

Finnish Social Science Data Archive's Data Management Guide: Informing research participants about the processing of their personal data (in English)

Data Protection in Research (Patio intranet) (in English)

University of Oulu Data Protection Policy (in English)

Research integrity and Ethics at the University of Oulu (in English)

Ethical Guidelines for Research Involving the Sámi People in Finland (in Finnish, Sámi and English)

Finnish Social Science Data Archive's Data Management Guide: copyright and agreements (in English)

Guidelines by the Finnish National Board on Research Integrity TENK (in English)

Finnish Social and Health Data Permit Authority Findata (in English)

Storing and further use of research data containing personal data (in English; Patio intranet)

A guide on re-using research data by OpenAIRE (in English)

Choosing where to store, organizing and naming files

Storage space for research data should be chosen carefully:

avoid storage media that can easily be lost or broken, such as USB sticks or laptop hard drives
make sure your data is backed up
if you move data between storage locations, you can use checksums to ensure integrity of the data
if your research involves personal data or otherwise confidential information, take it into account when choosing the storage solution and restrict access to the data if necessary
use commonly used open file formats whenever possible, so that no specific device or software is required to open them
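
The checksum idea mentioned in the list above can be sketched in Python with the standard library. SHA-256 and the function names are illustrative choices; a repository or transfer tool may use a different algorithm.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks
    so that large data files do not have to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def copies_match(source, destination):
    """Return True if both files have the same checksum, meaning the
    copy is bit-for-bit identical to the source."""
    return sha256_of_file(source) == sha256_of_file(destination)
```

A typical use is to record the checksum before a transfer and call `copies_match(source, destination)` afterwards; a mismatch means the file changed or was corrupted on the way.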

Here you can download more specific instructions about information processing at University of Oulu.

Name and organize folders and files logically. Here are some basic guidelines:

organize files into folders but take into account when naming the files that the location of the files may change later (e.g. do not use the same file name within folders with different names)
avoid special characters and spaces in the names of folders and files (for example, a space can be replaced by _)
if you are using dates, it is recommended to use the year-month-day format (YYYY-MM-DD), as this makes the files sort in chronological order
avoid using confidential information in names of files or folders
always keep the so-called raw version of the data and save the modified versions separately (name the versions informatively or use a version control tool)
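
The naming guidelines above could be applied with a small helper like the following sketch. The function name, the lowercase convention and the `_v01` version suffix are assumptions for illustration, not rules from this guide.

```python
import re
from datetime import date


def safe_file_name(description: str, day: date, version: int, extension: str) -> str:
    """Build a file name with a YYYY-MM-DD date prefix, no spaces or
    special characters, and an explicit version number."""
    # Replace every run of characters other than ASCII letters and digits
    # with a single underscore (this also replaces letters such as ä or ö).
    slug = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_").lower()
    return f"{day.isoformat()}_{slug}_v{version:02d}.{extension}"


print(safe_file_name("Soil samples (site A)", date(2024, 3, 5), 1, "csv"))
# prints: 2024-03-05_soil_samples_site_a_v01.csv
```

Because the date comes first and is zero-padded, an alphabetical listing of such files is also chronological.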

More information:

Finnish Social Science Data Archive's Data Management Guide: file formats and software (in English)

Finnish Social Science Data Archive's Data Management Guide: physical data storage (in English)

CSC's guidelines for digital preservation video about checksums (in English)

Guide for choosing storage space at University of Oulu (in English; Patio intranet)

Metadata of research datasets and additional documentation

Metadata gives additional information about data and makes it more understandable. It is recommended that at least the following information is published as metadata in a data repository even if the data itself cannot be shared:

persistent identifier (such as DOI or URN) for the data
title and description (please note that if you can only publish your metadata, make sure to include enough content in the description field to explain what your data is about)
list of creators (also recommended to use ORCID identifiers)
when the data or metadata was published

It is recommended to also include:

descriptive subject headings and keywords
license that defines conditions for re-use
whether access to the data is open or restricted
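
As a rough sketch, the minimum and recommended fields listed above can be written down as a structured record and checked before submission. The field names here are hypothetical and do not follow the schema of Qvain or any other repository; the record values are placeholders.

```python
# Hypothetical field names mirroring the two lists above.
REQUIRED_FIELDS = (
    "persistent_identifier",
    "title",
    "description",
    "creators",
    "publication_date",
)
RECOMMENDED_FIELDS = ("keywords", "license", "access_type")


def check_metadata(record: dict) -> dict:
    """Report which required and recommended fields are missing or empty."""
    return {
        "missing_required": [f for f in REQUIRED_FIELDS if not record.get(f)],
        "missing_recommended": [f for f in RECOMMENDED_FIELDS if not record.get(f)],
    }


# Placeholder record; every value is illustrative only.
record = {
    "persistent_identifier": "doi:10.5284/1234567",
    "title": "Dataset that we collected",
    "description": "Measurements collected for the example project.",
    "creators": [{"name": "Doe, J.", "orcid": None}],
    "publication_date": "2020-01-01",
    "license": "CC-BY-4.0",
}

report = check_metadata(record)
print(report)
```

Here the report shows no missing required fields, while `keywords` and `access_type` are flagged as missing recommended fields.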

If the data is later deleted, metadata records will remain available in the repository to provide information for previous citations to the data.

Datasets are easier to understand if they are accompanied by additional documentation. Publishing additional documentation related to the collection of research data can greatly improve reproducibility of research and transparency of the research process. This documentation can accompany the actual data files, for example in the form of a "readme" text file that can include information on

order of the data files or folder structure
explanations of variables (e.g. abbreviations), units of measurement, how missing data should be interpreted etc.
laboratory or field work diaries

When sharing additional documentation, please note that it can also contain confidential information.
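
A "readme" file of this kind can be generated from structured notes, as in the minimal sketch below. The section headings, the function name and the example values are all assumptions; there is no required layout.

```python
def build_readme(title, folder_structure, variables, missing_value_note):
    """Assemble a plain-text readme from structured documentation notes.

    folder_structure maps a path to a short description;
    variables maps a variable name to a (unit, explanation) pair.
    """
    lines = [title, "=" * len(title), "", "Folder structure", "----------------"]
    lines += [f"{path}: {desc}" for path, desc in folder_structure.items()]
    lines += ["", "Variables", "---------"]
    lines += [f"{name} ({unit}): {desc}" for name, (unit, desc) in variables.items()]
    lines += ["", "Missing data", "------------", missing_value_note]
    return "\n".join(lines) + "\n"


# Hypothetical example content.
readme_text = build_readme(
    title="Example dataset",
    folder_structure={"raw/": "original measurement files", "processed/": "cleaned tables"},
    variables={"temp_c": ("degrees Celsius", "air temperature at 2 m height")},
    missing_value_note="Missing values are coded as -999.",
)
print(readme_text)
```

The resulting text can be saved as a readme file next to the data files; check it for confidential information before sharing.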

More information:

Guidelines for describing research data (in English)

Finnish Social Science Data Archive's Data Management Guide: Data description and metadata (in English)

University of Jyväskylä's guide to data documentation (in English)

Making a research project understandable: guide for data documentation (in English)

Metadata Standards Catalog by subject by the Research Data Alliance (in English)

Finto - Finnish Ontology and Vocabulary Service (in English)

BARTOC - Basic Register of Thesauri, Ontologies & Classifications (in English)

ELSST – European Language Social Science Thesaurus (in English)

LOV - Linked Open Vocabularies (in English)

OLS - Ontology Lookup Service for Biomedical Ontologies (in English)

Data repositories and long-term preservation

Open access to research metadata, and whenever possible to the research data itself or parts of it, is important for reproducibility and transparency. Conditions for re-use of research data can be defined in a license that is included in the metadata record. Furthermore, metadata is key for improving findability of research data. A public metadata record also shows that the data exists and can provide valuable information on how the research was conducted even if the data itself cannot be shared.

Research data can be shared using for example the IDA service of CSC - IT Center for Science or another trustworthy repository. It is important that the repository can create a persistent identifier for the data (such as a DOI) that can be used for citations, and that others can easily access the data. Some repositories can assign a persistent identifier before the data is made available to others, making it possible to add the identifier to the publication at the manuscript stage. A persistent identifier should also be used for software code. For example in GitHub, it is possible to get a persistent identifier through its connection with the Zenodo repository (see guidelines here).

For publishing only metadata you can use for example CSC's Qvain service. Using this service you can store metadata for research data stored in the IDA service, or for a data set located elsewhere (including research data that cannot be shared). Instead of using the graphical user interface, you can also use the Metax-metadata API to share your metadata.

Whether you use the Qvain service or the end user API for Metax, the metadata is stored in the same location, from where you can also link it to your national researcher profile.

Repositories can differ in the following ways:

general or discipline-specific
geographical location of the research data storage
support for versioning
backup
options for restricting access
metadata structure

In data repositories, research data can be searched using search functions that utilize the content of the metadata records. Metadata is entered into a repository using an input form (or in some cases directly using an API), so that it is stored in a structured format that computers can utilize. This format is the so-called metadata standard, or schema.

If there are discipline-specific metadata associated with research data that are not supported by the metadata standard of general repositories, it is recommended to search for a repository that has been developed specifically for the research discipline in question. Such repositories can be browsed for example using the re3data.org service. Especially trustworthy discipline-specific repositories have earned a CoreTrustSeal certificate. Alternatively, discipline-specific metadata can be included as additional documentation with the data files in a general-purpose repository, but those metadata are not findable by the repository search functions.

Examples of international general repository services:

Some research datasets can have especially long-term re-use value even for decades or hundreds of years. In this case, the data needs to remain usable and to be securely preserved, for example by making sure file formats are accessible and that the data remains intact.

The Ministry of Education and Culture offers a Digital Preservation Service for research data. The service is produced by CSC and it is free of charge to researchers affiliated with Finnish research organizations. For long-term preservation to be possible, the researcher's home organization needs to have rights to manage the data. The research data must be accompanied by sufficient metadata and other documentation so that the data becomes as self-explanatory as possible. Data can be made openly accessible but it is not mandatory.

The suitability of the data for the service will be checked together with the data support team (researchdata@oulu.fi).

More information:

National services for digital preservation (in English)

Digital Preservation Service for Research Data (in English)