
Research Data Management

University of Oulu Library's Guide to Research Data Management

What is research data management?


In this guide, research data refers to (digital) data that is used to answer research questions and on which the research results are based. Research datasets can consist of various types of files, for example text, images, videos, numerical tables of measurements or even databases. However, charts, reports or publications derived from the data are not counted as research data here unless they have themselves served as material for the actual research, as in literature reviews or meta-analyses.

Research data management means taking care of the data. It includes:

  • planning
  • obtaining new data or re-using data that has already been collected
  • storing, organizing and naming files
  • processing prior to analysis
  • archiving and sharing research data and its associated metadata
  • deciding how long the data needs to be stored
  • consideration of research ethics and legal aspects at all stages

Why is it worth investing in research data management?


In the Declaration on Responsible Research, the University of Oulu has committed to promoting responsible management of research data. Several research funders and publishers also have requirements related to research data management; more detailed information can be found in the funding calls and publishers' instructions.

Good practices in research data management are summarized in the so-called FAIR principles, according to which research data should be findable, accessible, interoperable and re-usable.

Figure: the FAIR principles (F = findable, A = accessible, I = interoperable, R = re-usable)

When these principles are followed:

  • the data on which the research results are based, or its metadata, is easy to access
  • the data is easier to understand and, where possible, to re-use
  • the rights of everyone involved in creating the research data have been taken into account

CSC - IT Center for Science provides a set of services (Fairdata) based on the FAIR principles, which are free of charge to researchers affiliated with Finnish research organizations under certain conditions (find out more here).

Research data management in a nutshell:


Planning

Create a data management plan and update it later if needed. The university's data support (researchdata@oulu.fi) can help you.

Personal data and other confidential information

Find out as early as possible whether your research data will contain personal data or other confidential information. Note that research subjects must be informed about the processing of their personal data before data collection begins, and that the data must be stored in a secure environment at all stages of the research. More information is available from the Data Protection Stewards of each faculty, the University of Oulu's information security team (tietoturva@oulu.fi) and Campus ICT (ict@oulu.fi).

Rights

Agree on ownership, re-use and other rights for your data in advance. Rights also need to be considered when you use data created by others: review the license information that defines the re-use conditions, and make sure you cite the data properly.

Storage location

Make sure your research data is stored in a secure location, and limit access rights if necessary. Campus ICT (ict@oulu.fi) can help you select a storage location. CSC - IT Center for Science also offers storage solutions that are free of charge for researchers affiliated with Finnish higher education institutions.

Files and folders

Use a logical folder structure, name files informatively, and try to make use of standards and commonly used file formats.

Processing

Before processing your research data, make sure you keep the original, so-called raw version so that you can return to it if necessary. If you need to edit the data, document every step. Editing must always be justified and should not distort the original data. If necessary, personal data can be anonymized (a person cannot be identified) or pseudonymized (a person cannot be identified without additional information).
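One common way to pseudonymize is to replace each direct identifier with a keyed hash: the same person always gets the same code, but the code cannot be traced back without a secret key. The sketch below assumes Python's standard `hmac`, `hashlib` and `secrets` modules; the function names, the `P-` prefix and the 16-character code length are illustrative choices, not a university standard. Keep the key apart from the data, and remember that pseudonymized data is still personal data under the GDPR.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a random 256-bit secret key; store it apart from the data."""
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Map a direct identifier to a stable pseudonym using HMAC-SHA256.

    The identifier is trimmed and lower-cased first (an assumption), so
    'Anna@Example.org ' and 'anna@example.org' get the same code.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return "P-" + digest[:16]  # hypothetical prefix and code length


def pseudonymize_rows(rows: list[dict], column: str, key: bytes) -> list[dict]:
    """Return copies of the rows with `column` pseudonymized; input rows are not modified."""
    result = []
    for row in rows:
        new_row = dict(row)  # shallow copy keeps the raw row intact
        new_row[column] = pseudonymize(str(row[column]), key)
        result.append(new_row)
    return result


# Usage: the raw rows stay unchanged, the copy can be used for analysis.
key = make_key()
raw = [{"participant": "anna@example.org", "score": 7}]
processed = pseudonymize_rows(raw, "participant", key)
print(processed)
```

Because the function returns new rows instead of editing the originals, the raw version is preserved, in line with the advice above.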

Metadata

Metadata accumulates as your research work progresses, so keep organized notes of it. In its simplest form, metadata can be stored in a so-called "read me" text file. In some cases metadata can be obtained directly from research instruments. If you have a lot of metadata or plan to publish it, it is recommended to store it in a data repository. Open access to metadata makes your data visible not only to the scientific community but also to the general public, and it is often possible to publish your metadata even when the actual research data cannot be shared. For storing and publishing metadata you can use, for example, CSC's Qvain service (tutorial video), which also creates a persistent identifier (a DOI or URN) for your metadata that you can use for unambiguous citations.

Archiving and open access

Your research data may be needed later, or you may want to share it with others. In that case, consider storing your data in a data archive or repository. Include metadata and sufficient documentation, with descriptions and explanations, so that your data can be understood. Granting open access to the research data, or parts of it, is recommended unless there are reasons not to. You can specify the re-use terms with a license in the associated metadata. Research data can be archived, for example, in the CSC IDA service or another reliable repository.

Preparing a data management plan (DMP)


A data management plan covers planning of:

  • what kind of data will be used
  • how the data will be obtained
  • whether new data will be created or data that has already been collected will be re-used
  • how the data will be stored
  • what parts of the data can be made openly available, for example in a repository
  • whether there are any ethical or legal concerns (such as the GDPR) and how they will be taken into account

By making a data management plan, you can prepare in advance for potential problems and risks that you may encounter during a research project. The plan can also be updated as the project progresses. The data management plan does not cover methods for the scientific analysis of the data; those belong in the research plan.

Check the more detailed guidelines for the data management plan in each funding call. If you need help with preparing the plan, the university's data support can assist you (researchdata@oulu.fi).

IMPORTANT: Some funders require a pre-approved data management plan. If you get a positive funding decision that requires pre-approval of your data management plan by our Data Stewards, please contact data support: researchdata@oulu.fi.

 

Rights and ethical considerations regarding research data


Everyone who participates in creating or collecting research data has rights of many kinds, and it is important to pay attention to them as early as possible when planning the project. It is a good idea to agree on ownership, the conditions for using the data and the possible sharing of the data for further use, preferably before the data has been collected. Authors (or 'creators' in the case of research data) need to be listed in a similar manner as for research publications. Note, however, that the author list may differ between a research publication and its associated research data.

Special attention should be paid to rights and obligations related to research data, for example regarding:

  • data protection
  • informing research participants
  • other discipline-specific obligations such as those in medical or human sciences
  • rights of indigenous peoples
  • spatial data on endangered species
  • company secrets etc.

Choosing storage space


Storage space for research data should be chosen carefully:

  • avoid storage media that can easily be lost or broken, such as USB sticks or laptop hard drives
  • make sure your data is backed up
  • if you move data between storage locations, you can use checksums to ensure integrity of the data
  • if your research involves personal data or otherwise confidential information, take it into account when choosing the storage solution and restrict access to the data if necessary
  • use common open file formats whenever possible, so that no specific device or software is required to open them
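Checksums can be computed with standard tools or with a few lines of code. The sketch below assumes only Python's standard library: it computes SHA-256 checksums, verifies a copied file, and writes a manifest in the `<checksum>  <file>` layout used by GNU `sha256sum`. The function names and the manifest file name `MANIFEST.sha256` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_verified(src: Path, dst: Path) -> str:
    """Copy src to dst and raise an error if the copy differs from the original."""
    expected = sha256sum(src)
    shutil.copy2(src, dst)  # copies content and file metadata such as timestamps
    actual = sha256sum(dst)
    if actual != expected:
        raise OSError(f"checksum mismatch after copying {src} to {dst}")
    return actual


def write_manifest(folder: Path, manifest_name: str = "MANIFEST.sha256") -> Path:
    """Write '<checksum>  <relative path>' lines for every file under folder."""
    manifest = folder / manifest_name
    lines = [
        f"{sha256sum(p)}  {p.relative_to(folder).as_posix()}"
        for p in sorted(folder.rglob("*"))
        if p.is_file() and p != manifest  # never list the manifest itself
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

After moving the folder, running `sha256sum -c MANIFEST.sha256` inside it on Linux reports any file whose content no longer matches.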

Here you can download more specific instructions about information processing at the University of Oulu.

Organizing and naming files


Name and organize folders and files logically. Here are some basic guidelines:

  • organize files into folders, but keep in mind when naming the files that their location may change later (e.g. do not give files the same name in differently named folders)
  • avoid special characters and spaces in the names of folders and files (for example, a space can be replaced by _)
  • if you use dates, the year-month-day format (YYYY-MM-DD) is recommended, as it sorts the files in chronological order
  • avoid using confidential information in names of files or folders
  • always keep the so-called raw version of the data and save the modified versions separately (name the versions informatively or use a version control tool)
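The naming guidelines above can be applied automatically. This sketch assumes a hypothetical `project_content_YYYY-MM-DD_vNN.ext` scheme, to be adapted to your own project, and uses only Python's standard library. Accented letters such as ä are reduced to their base letter, and any other run of characters that are not letters, digits, hyphens or full stops becomes a single `_`.

```python
import re
import unicodedata
from datetime import date


def safe_name(text: str) -> str:
    """Turn free text into a file-name part without spaces or special characters."""
    # Decompose accented letters (ä -> a + diaeresis) and drop non-ASCII parts;
    # letters with no ASCII base form (e.g. ø) are dropped entirely.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace each run of other characters with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9.-]+", "_", ascii_text)
    return cleaned.strip("_")


def data_file_name(project: str, content: str, day: date, version: int, ext: str) -> str:
    """Build a name following the hypothetical scheme project_content_YYYY-MM-DD_vNN.ext."""
    return f"{safe_name(project)}_{safe_name(content)}_{day.isoformat()}_v{version:02d}.{ext}"


print(data_file_name("bird survey", "counts (raw)", date(2023, 5, 14), 1, "csv"))
# prints bird_survey_counts_raw_2023-05-14_v01.csv
```

Because `date.isoformat()` produces YYYY-MM-DD, names that differ only by date sort chronologically in an ordinary alphabetical file listing.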

Re-use of research data and licensing


You can search for available research data in data repositories, for example by using CSC's Etsin service, which also includes metadata imported from the Language Bank of Finland and the Finnish Social Science Data Archive. If you plan to use data collected by others, check the license information so that you know of any limitations on re-using the data. Also make sure to give credit to the creators of the data by citing the data appropriately. The license is chosen by those who created the data, and the license information is included in the metadata record. CC BY is the recommended license for research data, as it supports the principles of open science.

Research metadata and documentation


Metadata gives additional information about data and makes it more understandable. It is recommended that at least the following information is published as metadata in a data repository even if the data itself cannot be shared:

  • persistent identifier (such as DOI or URN) for the data
  • title and description
  • list of creators (ORCID identifiers are also recommended)
  • date when the data or metadata was published

It is also recommended to include:

  • descriptive subject headings and keywords
  • license that defines conditions for re-use
  • whether access to the data is open or restricted
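The minimum and recommended fields above can be checked before submission with a short script. This sketch assumes a metadata record stored as a Python dictionary with hypothetical field names; real repositories such as Qvain use their own schemas. ORCID identifiers end in a check digit computed with the ISO 7064 MOD 11-2 algorithm, which detects, for example, a single mistyped digit.

```python
import re

# Hypothetical field names for the information listed above.
MINIMUM = ("identifier", "title", "description", "creators", "publication_date")
RECOMMENDED = ("keywords", "license", "access")

ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_is_valid(orcid: str) -> bool:
    """Check the format and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:  # the first 15 digits
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    check = "X" if result == 10 else str(result)
    return digits[-1] == check


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was found."""
    problems = [f"missing minimum field: {f}" for f in MINIMUM if not record.get(f)]
    problems += [f"missing recommended field: {f}" for f in RECOMMENDED if not record.get(f)]
    for creator in record.get("creators", []):
        orcid = creator.get("orcid")
        if orcid and not orcid_is_valid(orcid):
            problems.append(f"invalid ORCID for {creator.get('name', '?')}: {orcid}")
    return problems


record = {
    "identifier": "doi:10.5284/1234567",
    "title": "Dataset that we collected",
    "description": "Example description.",
    "creators": [{"name": "Doe, J.", "orcid": "0000-0002-1825-0097"}],
    "publication_date": "2020-01-01",
    "keywords": ["example"],
}
for problem in check_record(record):
    print(problem)  # reports the missing license and access fields
```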

If the data is later deleted, the metadata record remains available in the repository to provide information for earlier citations of the data.

Including other documentation with the data makes it easier to understand. This documentation can be archived with the actual data files, for example in the form of a "readme" text file, which can include information on:

  • order of the data files or folder structure
  • explanations of variables (e.g. abbreviations), units of measurement, how missing data should be interpreted, etc.
  • laboratory or field work diaries
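Part of a readme file can be generated from the data folder itself. The sketch below uses only Python's standard library to list the folder structure and write plain-text sections for variables, missing-data coding and diaries; the section names, layout and example values are assumptions to adapt, and the explanations themselves still have to be written by the researcher.

```python
from pathlib import Path


def heading(text: str) -> list[str]:
    """Plain-text section heading underlined with hyphens, preceded by a blank line."""
    return ["", text, "-" * len(text)]


def build_readme(title: str, folder: Path, variables: dict[str, tuple[str, str]],
                 missing_code: str, diaries: list[str]) -> str:
    """Assemble readme text: folder structure, variables, missing data, diaries."""
    lines = [title, "=" * len(title)]
    lines += heading("Folder structure")
    for p in sorted(folder.rglob("*")):
        rel = p.relative_to(folder)
        indent = "  " * (len(rel.parts) - 1)  # two spaces per folder level
        lines.append(indent + rel.name + ("/" if p.is_dir() else ""))
    lines += heading("Variables")
    for name, (meaning, unit) in variables.items():
        lines.append(f"{name}: {meaning} (unit: {unit})")
    lines += heading("Missing data")
    lines.append(f"Missing values are coded as {missing_code}.")
    lines += heading("Laboratory and field work diaries")
    lines += [f"- {d}" for d in diaries] or ["- none"]
    return "\n".join(lines) + "\n"


# Example with a hypothetical folder and variable:
# text = build_readme("Bird counts 2023", Path("bird_survey"),
#                     {"temp_c": ("air temperature at observation time", "°C")},
#                     "NA", ["diaries/field_diary_2023.pdf"])
# Path("bird_survey", "README.txt").write_text(text, encoding="utf-8")
```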

Sharing research data and associated metadata and storing in a repository


Open access to research metadata, and to the research data itself or parts of it whenever possible, is important for reproducibility and transparency. Conditions for re-use of research data can be defined in a license that is included in the metadata record. Furthermore, metadata is key to improving the findability of research data. A public metadata record also shows that the data exists and can provide valuable information on how the research was conducted, even if the data itself cannot be shared.

Research data can be stored, for example, in the IDA service of CSC - IT Center for Science or in another trustworthy repository. It is important that the repository can create a persistent identifier for the data (such as a DOI or URN) that can be used for citations, and that others can easily access the data. For storing and publishing metadata you can use, for example, CSC's Qvain service. With this service you can store metadata for research data stored in the IDA service or for a dataset located elsewhere (including research data that cannot be shared). Instead of the graphical user interface, you can also use the Metax metadata API to store your metadata.

Metadata stored through the Qvain service and through the Metax end-user API ends up in the same place, from where you can also link it to your national researcher profile.

 

Choosing a repository


Repositories can differ in the following ways:

  • general or discipline-specific
  • geographical location of the research data storage
  • support for versioning
  • backup
  • options for restricting access
  • metadata structure

In data repositories, research data can be searched using search functions that make use of the content of the metadata records. Metadata is entered into a repository through an input form (or in some cases directly through an API) and stored in a structured format that computers can process. This format is defined by a so-called metadata standard, or schema.

If your research data has discipline-specific metadata that the metadata standards of general repositories do not support, it is recommended to look for a repository developed specifically for your research discipline. Such repositories can be browsed, for example, in the re3data.org service. Especially trustworthy discipline-specific repositories have earned CoreTrustSeal certification. Alternatively, the discipline-specific metadata can be included as additional documentation with the data files in a general-purpose repository, but such metadata cannot be found through the repository's search functions.
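Why schema fields matter for findability can be illustrated with a toy search over machine-readable records. This sketch assumes JSON records with hypothetical identifiers and field names, and a search that, like a repository's search function, only looks at the schema fields `title`, `description` and `keywords`. Anything kept elsewhere (here `readme_notes`, standing in for free-form documentation) is not found.

```python
import json

# Toy records with hypothetical identifiers; "readme_notes" stands for
# discipline-specific details kept only in free-form documentation.
RECORDS_JSON = """
[
  {"identifier": "example-1", "title": "Bird counts in Oulu",
   "keywords": ["ornithology", "census"]},
  {"identifier": "example-2", "title": "Soil samples",
   "description": "Laboratory measurements of soil moisture",
   "readme_notes": "Samples taken near ornithology survey plots"}
]
"""

SEARCHABLE_FIELDS = ("title", "description", "keywords")


def search(records: list[dict], term: str) -> list[str]:
    """Return identifiers of records whose schema fields contain the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for record in records:
        for field in SEARCHABLE_FIELDS:
            value = record.get(field, "")
            text = " ".join(value) if isinstance(value, list) else str(value)
            if term in text.lower():
                hits.append(record["identifier"])
                break  # one matching field per record is enough
    return hits


records = json.loads(RECORDS_JSON)
print(search(records, "ornithology"))  # prints ['example-1']
```

The second record mentions ornithology only in its notes, so it is not returned, just as metadata kept only in a readme file is invisible to a repository's search.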

Examples of international general repository services:

 

Publishing additional documentation


Publishing additional documentation related to the collection of research data can greatly improve the reproducibility of research and the transparency of the research process. This documentation can often be published in data repositories as a so-called "readme" file alongside the data files, where the research data can be described in a more free-form manner. When sharing additional documentation, note that it might also contain confidential information.

Digital preservation service for research data (DPS)


Some research datasets can have re-use value over an especially long term, even for decades or centuries. In such cases, the data needs to remain usable and be securely preserved, for example by making sure that the file formats remain accessible and that the data remains intact.

The Ministry of Education and Culture offers a Digital Preservation Service for research data. The service is produced by CSC and is free of charge to researchers affiliated with Finnish research organizations. For long-term preservation to be possible, the researcher's home organization needs to have the right to manage the data. The research data must be accompanied by sufficient metadata and other documentation so that the data becomes as self-explanatory as possible.

The suitability of the data for the service will be checked together with the data support team (researchdata@oulu.fi).

Data publications (data articles)


Research data can also be published as peer-reviewed journal articles that describe in detail the contents of the research data and how it was created and processed. Writing a data article requires detailed notes, so it is good to prepare for this already when making the data management plan. A detailed and thorough description of the data can greatly improve its re-use value and understandability.

A data publication does not include scientific analysis, discussion or conclusions, but it might be possible to publish them in a separate research publication. In this case the research publication itself does not need to contain a thorough description of the data, but should instead cite the data publication where the details can be found. Publishers have different policies regarding data articles and separate research publications based on the same data, so check the detailed guidelines of each publisher.

Citing research data and using persistent identifiers


When you use research data (or software code), cite it appropriately. Data (or code) should have a persistent identifier (PID), for example a DOI. Using this identifier in citations is recommended, because it makes the exact dataset and version easier to find. Some data repositories allow reserving an identifier before the data is made available, so that it can be included in the research publication while the manuscript is still being prepared.

A persistent identifier should also be used when citing software code. For example, code hosted on GitHub can get a persistent identifier through GitHub's integration with the Zenodo repository (see guidelines here).

Citations should be included both in the text and in the bibliography, if possible. The exact format of the citation depends on the publication channel, but in its simplest form it can be, for example:
    Doe, J. & Smith, J. (2020). Dataset that we collected. [Data set]. Zenodo. doi:10.5284/1234567.
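A citation in the simple format above can be generated from metadata. This sketch assumes the creator names are already in "Surname, I." form and accepts the DOI bare, with a `doi:` prefix, or as a `https://doi.org/` link. The DOI check is a loose pattern (the `10.` prefix, a registrant code, a slash and a suffix), not a full validation, and the joining of three or more names follows no particular citation standard.

```python
import re

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.[0-9]{4,9}/\S+")
PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Return the bare DOI, e.g. '10.5284/1234567', or raise ValueError."""
    doi = doi.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.fullmatch(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return doi


def join_creators(creators: list[str]) -> str:
    """Join 'Surname, I.' names as 'A', 'A & B' or 'A, B & C'."""
    if len(creators) == 1:
        return creators[0]
    return ", ".join(creators[:-1]) + " & " + creators[-1]


def cite_dataset(creators: list[str], year: int, title: str, repository: str, doi: str) -> str:
    """Format a dataset citation in the simple style shown above."""
    return (f"{join_creators(creators)} ({year}). {title}. [Data set]. "
            f"{repository}. doi:{normalize_doi(doi)}.")


print(cite_dataset(["Doe, J.", "Smith, J."], 2020, "Dataset that we collected",
                   "Zenodo", "https://doi.org/10.5284/1234567"))
```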

A Data Availability Statement (DAS) describes the datasets used, their location and their availability. Publishers can have more detailed instructions, but the statement can be formatted, for example, like this:

         Data availability:

         The following dataset created for this work can be found in [name of the repository] at [persistent identifier of the data].