Research data support: researchdata@oulu.fi
Data Stewards at faculties.
In this guide, research data refers to (digital) data that is used to answer research questions and on which the research results are based. Research datasets can consist of various types of files, for example text, images, videos, numerical tables of measurements or even databases. However, charts, reports or publications derived from the data are not counted here as research data unless they have themselves served as material for the research, as in literature reviews or meta-analyses.
Research data management means taking care of the data. It includes:
In the Declaration on Responsible Research, the University of Oulu has committed to promoting responsible management of research data. Several research funders and publishers also have requirements related to research data management. More detailed information can be found in the funding calls and publishers' instructions.
Good practices in research data management are summarized in the so-called FAIR principles, according to which research data should be findable, accessible, interoperable and re-usable.
CSC - IT Center for Science provides a set of services (Fairdata) that are based on the FAIR principles and are free of charge to researchers affiliated with Finnish research organizations under certain conditions (find out more here).
Finnish Social Science Data Archive's Data Management Guidelines (in English)
Improve the quality and impact of your research through data management - A guide for making your data FAIR (in Finnish, Swedish and English)
CSC's services for research (in English)
CSC's Fairdata-services for researchers (in English)
CSC's Data management checklist (in English)
CSC's research data management self-study course (in English)
CSC's video on FAIR principles (English audio)
Open research data and methods: National policy and executive plan by the higher education and research community (in English)
Create a data management plan and update it later if needed. The university's data support can help you (researchdata@oulu.fi).
Find out whether your research data will contain personal data or other confidential information as early as possible. Please note that the subjects must be informed about the processing of personal data even before data collection and the data must be stored in a secure environment at all stages of your research. You can get more information from Data Protection Stewards at each faculty, University of Oulu's information security team (tietoturva@oulu.fi) and Campus ICT (ict@oulu.fi).
Agree on ownership, re-use and other rights for your data beforehand. You also need to consider rights if you use data created by others: review the license information that defines the re-use conditions and make sure you cite the data properly.
Make sure your research data is stored in a secure location, limiting access rights if necessary. If you need help, Campus ICT (ict@oulu.fi) can help with selecting the storage location. CSC - IT Center for Science also offers storage solutions that are free of charge for researchers who are affiliated with Finnish higher education institutions.
Use a logical folder structure, name files informatively, and try to make use of standards and commonly used file formats.
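As an illustration of informative file naming, the following minimal Python sketch builds file names from a few descriptive parts. The naming pattern (project, description, ISO date, version) and the function name build_filename are assumptions made for this example, not a convention prescribed by this guide.

```python
import re
from datetime import date


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name such as 'projectx_soil-samples_2024-03-15_v02.csv'."""

    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when sorted by name.
    return (
        f"{slug(project)}_{slug(description)}_{day.isoformat()}"
        f"_v{version:02d}.{ext.lstrip('.').lower()}"
    )


print(build_filename("ProjectX", "Soil samples", date(2024, 3, 15), 2, ".CSV"))
# prints: projectx_soil-samples_2024-03-15_v02.csv
```

Avoiding spaces and special characters in file names keeps them usable across operating systems and software.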
Before processing your research data, make sure that you keep the original so-called raw version so that you can return to it if necessary. If you need to edit your raw data, make notes of all steps. Editing must always be justified and should not distort the original data. Personal data can be anonymized (a person cannot be identified) or pseudonymized (a person cannot be identified without additional information) if necessary.
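One possible way to pseudonymize identifiers is keyed hashing. The sketch below uses HMAC-SHA256 from the Python standard library. It is only an example under stated assumptions: the function name pseudonymize, the example identifiers and the truncation length are hypothetical, and whether this technique is sufficient for your data must be assessed case by case. The secret key plays the role of the "additional information" and must be stored separately from the data in a secure location.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation keeps pseudonyms short; use a longer value for large datasets.
    return digest[:length]


# Fixed key for illustration only. In practice a random key could be
# generated, for example with secrets.token_bytes(32), and kept secret.
secret_key = b"replace-with-a-randomly-generated-secret"

# The same identifier always maps to the same pseudonym, so records
# belonging to one person can still be linked within the dataset.
print(pseudonymize("participant-001", secret_key))
print(pseudonymize("participant-002", secret_key))
```

Apply such a step to a working copy so that the raw version remains unchanged, and note it down as one of the editing steps.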
As your research work progresses, metadata accumulates automatically. Make sure to keep organized notes of it. In its simplest form, metadata can be stored in a so-called "read me" text file. In some cases metadata can be obtained directly from research instruments. If you have a lot of metadata or you plan to publish it, it is recommended to store it in a data repository. Open access to metadata makes your data visible not only to the scientific community but also to the general public. It is often possible to publish your metadata even when the actual research data cannot be shared. For storing and publishing metadata, you can use for example CSC's Qvain service, which also creates a persistent identifier (a DOI or URN) for your metadata that you can use for unambiguous citations.
Your research data may be needed later or you may want to share it with others. In this case you should consider storing your data in a data archive, or repository. Include metadata and sufficient documentation with description and explanations, so that your data can be understood. Granting open access to research data or parts of it is recommended if there are no reasons not to. You can specify the re-use terms with a license in the associated metadata. Research data can be archived, for example, in the CSC IDA service or other reliable repository.
A data management plan includes planning of:
By making a data management plan, you can prepare in advance for potential problems and risks that you may encounter during a research project. The plan can also be updated as the project progresses. The data management plan does not cover methods related to the scientific analysis of the data; these are part of the research plan.
Check more detailed guidelines for a data management plan in each funding call. If you need help with preparing the plan, the university's data support can assist you (researchdata@oulu.fi).
IMPORTANT: Some funders require a pre-approved data management plan. If you get a positive funding decision that requires pre-approval of your data management plan by our Data Stewards, please contact data support: researchdata@oulu.fi.
Finnish Social Science Data Archive's Data Management Guide: Data Management Planning (in English)
Everyone involved in creating or collecting research data has rights of many kinds, and it is important to pay attention to them as early as possible when planning the project. It is a good idea to agree on ownership, conditions for using the data and the possible sharing of the data for further use, preferably before the data has been collected. Authors (or 'creators' in the case of research data) need to be listed in a similar manner as for research publications. Note, however, that the author list may differ between the research publication and its associated research data.
Special attention should be paid to rights and obligations related to research data, for example considering
Data protection guide by the Office of the Data Protection Ombudsman (in English)
Data Protection in Research (Patio intranet) (in English)
Research integrity and Ethics at the University of Oulu (in English)
Ethical Guidelines for Research Involving the Sámi People in Finland (in Finnish, Sámi and English)
Finnish Social Science Data Archive's Data Management Guide: copyright and agreements (in English)
Guidelines by the Finnish National Board on Research Integrity TENK (in English)
Finnish Social and Health Data Permit Authority Findata (in English)
Storage space for research data should be chosen carefully:
Here you can download more specific instructions about information processing at University of Oulu.
Name and organize folders and files logically. Here are some basic guidelines:
Finnish Social Science Data Archive's Data Management Guide: file formats and software (in English)
Finnish Social Science Data Archive's Data Management Guide: physical data storage (in English)
CSC's guidelines for digital preservation video about checksums (in English)
Guide for choosing storage space at University of Oulu (in English; Patio-intranet)
You can search data repositories for available research data, for example by using CSC's Etsin service. There you can also find metadata imported from The Language Bank of Finland and the Finnish Social Science Data Archive. If you plan to use data collected by others, check the license information so that you know any limitations on re-using the data. Also make sure to give credit to the creators of the data by citing the data appropriately. The license type is chosen by those who have created the data, and the information is included in the metadata record. CC BY is the recommended license type for research data, as it takes the principles of open science into account.
A guide on how to choose a license for research data by OpenAIRE (in English)
A guide on re-using research data by OpenAIRE (in English)
Metadata gives additional information about data and makes it more understandable. It is recommended that at least the following information is published as metadata in a data repository even if the data itself cannot be shared:
It is recommended to also include:
If the data is later deleted, metadata records will remain available in the repository to provide information for previous citations to the data.
Including other documentation with the data makes it easier to understand. This documentation can be archived with the actual data files, for example in the form of a "readme" text file that can include for example information on
Finnish Social Science Data Archive's Data Management Guide: Data description and metadata (in English)
University of Jyväskylä's guide to data documentation (in English)
Making a research project understandable: guide for data documentation (in English)
Metadata Standards Catalog by subject by the Research Data Alliance (in English)
Finto - Finnish Ontology and Vocabulary Service (in English)
BARTOC - Basic Register of Thesauri, Ontologies & Classifications (in English)
ELSST – European Language Social Science Thesaurus (in English)
LOV - Linked Open Vocabularies (in English)
OLS - Ontology Lookup Service for Biomedical Ontologies (in English)
Open access to research metadata, as well as to the research data itself or parts of it whenever possible, is important for reproducibility and transparency. Conditions for re-use of research data can be defined in a license that is included in the metadata record. Furthermore, metadata is key for improving the findability of research data. A public metadata record also shows that the data exists and can provide valuable information on how the research was conducted even if the data itself cannot be shared.
Research data can be stored for example in the IDA service of CSC - IT Center for Science or in another trustworthy repository. It is important that the repository can create a persistent identifier for the data (such as a DOI or URN) that can be used for citations and that others can easily access the data. For storing and publishing metadata you can use for example CSC's Qvain service. Using this service you can store metadata for research data stored in the IDA service, or for a dataset located elsewhere (including research data that cannot be shared). Instead of using the graphical user interface, you can also use the Metax metadata API to store your metadata.
Metadata is stored in the same location whether you use the Qvain service or the end user API for Metax, and from there you can also link it to your national researcher profile.
Repositories can differ in the following ways:
In data repositories, research data can be searched using search functions that utilize the content of the metadata records. Metadata is stored in a repository using an input form (or in some cases directly using an API), so that it is stored in a structured format that computers can utilize. This format is the so-called metadata standard, or schema.
If there are discipline-specific metadata associated with research data that are not supported by the metadata standard of general repositories, it is recommended to search for a repository that has been developed specifically for the research discipline in question. Such repositories can be browsed for example using the re3data.org service. Especially trustworthy discipline-specific repositories have earned a CoreTrustSeal certificate. Alternatively, additional discipline-specific metadata can be included as additional documentation with the data files in a general purpose repository, but those metadata are not findable by the repository search functions.
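To illustrate what a structured, machine-readable metadata record can look like, here is a minimal Python sketch that holds a record as a dictionary, checks it for empty fields and serializes it as JSON. The field names and the list of required fields are hypothetical and do not follow the schema of any particular repository; the example values are taken from the example citation in this guide.

```python
import json

# Hypothetical set of fields; a real repository defines its own schema.
REQUIRED_FIELDS = ("title", "creators", "description", "license", "identifier")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "title": "Dataset that we collected",
    "creators": ["Doe, J.", "Smith, J."],
    "description": "",  # left empty on purpose to show the check
    "license": "CC BY 4.0",
    "identifier": "doi:10.5284/1234567",
}

print(missing_fields(record))  # prints: ['description']
print(json.dumps(record, indent=2))
```

Because the record is structured, a computer can validate, search and exchange it, which is not possible with free-form notes.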
Examples of international general repository services:
Publishing additional documentation related to the collection of research data can greatly improve the reproducibility of research and the transparency of the research process. This additional documentation can often be published in data repositories by including a so-called "readme" file alongside the data files, in which the research data can be described in a more free-form manner. When sharing additional documentation, note that the documentation might also contain confidential information.
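A "readme" file can be written by hand, but it can also be generated from notes kept during the project. The following Python sketch assembles a plain-text readme from a title and a set of sections. The function name render_readme and the section headings are assumptions made for this example; choose headings that suit your data.

```python
def render_readme(title: str, sections: dict[str, str]) -> str:
    """Render a plain-text readme with an underlined title and headed sections."""
    lines = [title, "=" * len(title), ""]
    for heading, body in sections.items():
        # Each section is an upper-case heading, its text and a blank line.
        lines += [heading.upper(), body.strip(), ""]
    return "\n".join(lines)


text = render_readme(
    "Dataset that we collected",
    {
        "Files": "measurements.csv: one row per measurement",
        "Variables": "id: running number; value: measured value",
        "Collection": "Describe here how, when and with what instruments the data was collected.",
    },
)
print(text)
```

The resulting text can be saved as, for example, README.txt next to the data files.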
Some research datasets can have especially long-term re-use value even for decades or hundreds of years. In this case, the data needs to remain usable and to be securely preserved, for example by making sure file formats are accessible and that the data remains intact.
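One common way to check that files remain intact is to record a checksum for each file and compare it later. The Python sketch below computes SHA-256 checksums with the standard library and reports files whose content has changed. The function names and the temporary example file are assumptions made for this illustration; preservation services typically run their own integrity checks.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of_file(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were added, removed or modified since the manifest."""
    current = build_manifest(data_dir)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))


# Demonstration with a temporary folder standing in for a data folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    data_file = data_dir / "measurements.csv"
    data_file.write_text("id,value\n1,0.5\n", encoding="utf-8")
    manifest = build_manifest(data_dir)
    unchanged = changed_files(data_dir, manifest)
    data_file.write_text("id,value\n1,0.7\n", encoding="utf-8")
    modified = changed_files(data_dir, manifest)

print(unchanged)  # prints: []
print(modified)   # prints: ['measurements.csv']
```

Storing the manifest together with the data makes it possible to verify the files again after every transfer or after long periods of storage.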
The Ministry of Education and Culture offers a Digital Preservation Service for research data. The service is produced by CSC and it is free of charge to researchers affiliated with Finnish research organizations. For long-term preservation to be possible, the researcher's home organization needs to have rights to manage the data. The research data must be accompanied by sufficient metadata and other documentation so that the data becomes as self-explanatory as possible.
The suitability of the data for the service will be checked together with the data support team (researchdata@oulu.fi).
Research data can also be published as peer-reviewed journal articles, in which the contents of the research data, as well as how it was created and processed, are described in detail. Writing a data article requires detailed notes, so it is good to prepare for this in advance when making a data management plan. A detailed and thorough description of the data can greatly improve the re-use value and understandability of the data.
A data publication does not include scientific analysis, discussion or conclusions, but it might be possible to publish them in a separate research publication. In this case the research publication itself does not need to contain a thorough description of the data, but should instead have a citation to the data publication where the details can be found. Publishers have different policies regarding data articles and publishing separate research publications using those data, so detailed guidelines need to be checked from each publisher.
National services for digital preservation (in English)
Digital Preservation Service for Research Data (in English)
When utilizing research data (or software code), it should be cited appropriately. Data (or code) should get a persistent identifier (PID) that can be for example a DOI. It is recommended to use this identifier in citations, because then the exact dataset and version are easier to find. Some data repositories allow reserving an identifier already before the data is made available so that it can be included in the research publication already when preparing the manuscript.
A persistent identifier should also be used when citing software code. For example in GitHub, it is possible to get a persistent identifier for software code using their connection with the Zenodo repository (see guidelines here).
Citations should be included both in the text and in the bibliography, if possible. The exact format of the citation depends on the publication channel, but in its simplest form it can be for example:
Doe, J. & Smith, J. (2020). Dataset that we collected. [Data set]. Zenodo. doi:10.5284/1234567.
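If you cite many datasets, the simple citation format shown above can be produced from metadata fields. The Python sketch below is an illustration only: the function name format_data_citation is hypothetical, and the format follows the example above rather than any particular publisher's style.

```python
def format_data_citation(
    creators: list[str], year: int, title: str, repository: str, identifier: str
) -> str:
    """Format a dataset citation in the simple style of the example above."""
    authors = " & ".join(creators)
    return f"{authors} ({year}). {title}. [Data set]. {repository}. {identifier}."


citation = format_data_citation(
    creators=["Doe, J.", "Smith, J."],
    year=2020,
    title="Dataset that we collected",
    repository="Zenodo",
    identifier="doi:10.5284/1234567",
)
print(citation)
# prints: Doe, J. & Smith, J. (2020). Dataset that we collected. [Data set]. Zenodo. doi:10.5284/1234567.
```

Always check the citation style required by the publication channel before using a generated citation.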
A publication can also include a Data Availability Statement (DAS), in which the utilized datasets and their location and availability are described. Publishers can have more detailed instructions, but it can be formatted for example like this:
Data availability:
The following dataset created for this work can be found in [name of the repository] at [persistent identifier of the data].
Finnish Social Science Data Archive's guide on citing archival data (in English)
DCC's guide on how to cite datasets (in English)