Research Data Guide

Managing research data

Metadata in brief

If someone is unfamiliar with your research data, what would they need in order to find, evaluate, understand and reuse it?

Metadata is structured information about information, for example about research data. Every data file, in any format, should have metadata, because without it a dataset is difficult or impossible for others to interpret.

Metadata can broadly be divided into three types:

  • Descriptive metadata: Enables indexing, discovery and retrieval
  • Technical metadata: Describes how a dataset was produced and structured, and how it should be used
  • Administrative metadata: Enables access to and management of data

Metadata describes:

  • who the responsible researcher is
  • when, where and why the data were collected
  • which research methods were used
  • how the research data should be cited
  • how to access the research data
  • what limitations apply to its use
  • which file formats are used
  • what the datasets are called
  • etc.

The main purpose of metadata is to make research data easier to find. It should therefore be standardized, structured, and both machine- and human-readable. Metadata should be collected during the research process, and the researcher is responsible for it.
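As a minimal sketch of what "structured and machine-readable" can mean in practice, the Python example below groups a few of the items listed above into descriptive, technical and administrative sections and serializes them as JSON, a text format that both people and programs can read. The field names and values are hypothetical illustrations, not a metadata standard; in real projects, prefer an established schema from the metadata standards listed in this guide.

```python
import json

# Hypothetical example record: field names and values are illustrative only
# and do not follow any particular metadata standard.
record = {
    "descriptive": {
        "title": "Example survey dataset",
        "creator": "Example Research Group",
        "keywords": ["example", "survey"],
    },
    "technical": {
        "methods": "Online questionnaire",
        "file_formats": ["text/csv"],
    },
    "administrative": {
        "access": "Available on request",
        "rights": "CC BY 4.0",
    },
}

# Indented output keeps the file easy for people to read,
# while remaining trivial for programs to parse.
text = json.dumps(record, indent=2, ensure_ascii=False)
print(text)

# Round trip: parsing the text gives back an identical structure.
assert json.loads(text) == record
```

Because the record round-trips without loss, the same file can serve both as documentation and as input to indexing or search tools.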

Metadata sources

Metadata standards

Vocabularies, ontologies and classifications

  • ELSST: a multilingual thesaurus for the social sciences

  • Finto: Finnish service for the publication and utilization of vocabularies, ontologies and classifications.

General standards

Discipline-specific standards

Climate and Forecast (CF) metadata conventions

Ecology and evolution: Standard file formats in ecology and evolution

  • GenBank files for nucleotide or peptide sequences
  • Nexus files for phylogenetic trees

Software engineering and interfaces, data presentation, data communication, data interchange and archiving by physical media, access systems and interconnection, wireless proximity systems, multimedia

Statistical and social science data


"readme" style metadata

Create a readme file that documents a data file and ensures that the data can be interpreted correctly. Use standards-based metadata where available. Consider the following:

  • Create one readme file for each data file, whenever possible. It is also appropriate to describe a "dataset" that has multiple, related, identically formatted files, or files that are logically grouped together for use (e.g. a collection of Matlab scripts). When appropriate, also describe the file structure that holds the related data files.
  • Name the readme so that it is easily associated with the data file(s) it describes.
  • Write your readme document as a plain text file. Avoid proprietary formats such as MS Word whenever possible.
  • Format multiple readme files identically. Present information in the same order and with the same terminology.
  • Write dates in a standardized format, e.g. ISO 8601 (YYYY-MM-DD).
  • Follow the scientific conventions for your discipline for taxonomic, geospatial and geologic names and keywords. Whenever possible, use terms from standardized taxonomies and vocabularies.
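Several of these recommendations (plain text, identical formatting across files, a readme named after its data file, standardized dates) can be applied mechanically. The Python sketch below renders a readme from a fixed field order and writes dates in ISO 8601 form. The field list, the values and the file names `measurements.csv` and `measurements_readme.txt` are hypothetical examples; a real project would use the full set of fields described below or a disciplinary standard.

```python
from datetime import date

# Hypothetical field order: reuse the same list for every readme in a
# project so that all files present information in the same order.
FIELD_ORDER = ["Title", "Creator", "Date created", "File", "Format", "Description"]


def make_readme(fields: dict) -> str:
    """Render a plain-text readme with a fixed field order.

    Date values are written in ISO 8601 form (YYYY-MM-DD).
    Missing fields are marked explicitly rather than silently skipped.
    """
    lines = []
    for name in FIELD_ORDER:
        value = fields.get(name, "(not specified)")
        if isinstance(value, date):
            value = value.isoformat()  # e.g. 2023-05-17
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


readme = make_readme({
    "Title": "Example measurements",
    "Creator": "Example Research Group",
    "Date created": date(2023, 5, 17),
    "File": "measurements.csv",
    "Format": "CSV (UTF-8)",
})
print(readme)

# Save as plain text, named so it is easily associated with its data file.
with open("measurements_readme.txt", "w", encoding="utf-8") as f:
    f.write(readme)
```

With the inputs shown, the third line reads `Date created: 2023-05-17`, and the missing Description field appears as `Description: (not specified)` instead of being dropped, which keeps every readme in the project structurally identical.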

Details of data description

Title: Name of the dataset or research project that produced it.
Creator: Names and addresses of the organization or people who created the data.
Identifier: Number used to identify the data, even if it is just an internal project reference number.
Subject: Keywords or phrases describing the subject or content of the data.
Funders: Organizations or agencies who funded the research.
Rights: Any known intellectual property rights held for the data.
Access information: Where and how your data can be accessed by other researchers. For more information, see e.g. the Open Science and Research Initiative's report on metadata related to rights management (Avoin tiede ja tutkimus -hanke: Oikeuksien hallintaan liittyvät metatiedot -selvitys, in Finnish only).
Language: Language(s) of the intellectual content of the resource, when applicable.
Dates: Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule.
Location: Where the data relates to a physical location, record information about its spatial coverage.
Methodology: How the data was generated, including equipment or software used, experimental protocol, and other things one might include in a lab notebook.
Data processing: Along the way, record any information on how the data has been altered or processed.
Sources: Citations to material for data derived from other sources, including details of where the source data is held and how it was accessed.
List of file names: List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', '').
File formats: Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data.
File structure: Organization of the data file(s) and the layout of the variables, when applicable.
Variable list: List of variables in the data files, when applicable.
Code lists: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data').
Versions: Date/time stamp for each file, and use a separate ID for each version (see file organization). For an example, see the Language Bank of Finland's lifecycle and description model for its datasets (Kielipankin aineistojen elinkaari- ja kuvailumalli).
Checksums: Used to test whether a file has changed over time (see data backup).

Source: MIT Libraries
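The "List of file names" and "Checksums" fields can be produced together as a simple manifest. The Python sketch below assumes the data files sit directly in one folder (subfolders are not scanned); it records each file name, including its extension, with a SHA-256 checksum and later reports files whose contents have changed or that have gone missing. SHA-256 is one common checksum algorithm, not one prescribed by this guide, and the helper names and the file `measurements.csv` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(folder: Path) -> dict:
    """Map each file name (with extension) directly in `folder` to its checksum."""
    return {p.name: sha256_of(p) for p in sorted(folder.iterdir()) if p.is_file()}


def changed_files(folder: Path, manifest: dict) -> list:
    """List recorded files that are now missing or whose checksum has changed."""
    current = make_manifest(folder)
    return [name for name, checksum in manifest.items()
            if current.get(name) != checksum]


# Demonstration in a temporary folder with one hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    data = folder / "measurements.csv"
    data.write_text("id,value\n1,42\n", encoding="utf-8")

    manifest = make_manifest(folder)
    for name, checksum in manifest.items():
        print(f"{checksum}  {name}")

    print(changed_files(folder, manifest))  # [] (nothing has changed yet)
    data.write_text("id,value\n1,43\n", encoding="utf-8")
    print(changed_files(folder, manifest))  # ['measurements.csv']
```

Storing the printed manifest next to the readme lets anyone rerun the comparison after copying, archiving or restoring the data.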



Oulu University Library (Oulun yliopiston kirjasto)
P.O. Box 7500
Switchboard: 0294 480000