Research Data Guide

Managing research data

Metadata in brief

Reflect: If someone is unfamiliar with your research data, what would they need in order to find, evaluate, understand and reuse it?

Metadata is structured, machine-readable information about information, for example about research data. Every data file, whatever its format, should have metadata, because a dataset without metadata is difficult or impossible to interpret.

Metadata can be broadly divided into:

  • Descriptive metadata: Enables indexing, discovery and retrieval (e.g. title, subject, keywords)
  • Technical metadata: Describes how a dataset was produced and structured, and how it should be used
  • Administrative metadata: Enables access and management of data (e.g. rights, timestamp of transaction)
  • Structural metadata: Describes how the parts of a dataset are organized (e.g. data directory)

Metadata describes, for example:

  • who the responsible researcher is
  • when, where and why the data was collected
  • which research methods were used
  • how the research data should be cited
  • how to access the research data
  • what the limitations on use are
  • what the file formats are
  • what the research datasets are named

The main purpose of metadata is to make research data easier to find. It should therefore be standardized, structured, and both machine- and human-readable. Metadata should be collected during the research process, and the researcher is responsible for it.
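
As an illustration of metadata that is both machine- and human-readable, the items listed above could be recorded as a small structured record. This is only a minimal sketch: the field names, the example values and the helper functions `to_json` and `missing_fields` are hypothetical and do not follow any particular metadata standard.

```python
import json

# Hypothetical example record; field names and values are illustrative only.
record = {
    "title": "Lake water temperature measurements",
    "responsible_researcher": "Jane Doe",
    "collected": {"start": "2021-06-01", "end": "2021-08-31"},
    "location": "Example Lake",
    "methods": "Hourly sensor readings at one metre depth",
    "citation": "Doe, J. (2021). Lake water temperature measurements.",
    "access": "Available on request from the responsible researcher",
    "use_limitations": "CC BY 4.0",
    "file_formats": ["text/csv"],
    "dataset_names": ["lake_temperature_2021.csv"],
}


def to_json(rec):
    """Serialize the record as indented JSON that people and programs can read."""
    return json.dumps(rec, indent=2, ensure_ascii=False, sort_keys=True)


def missing_fields(rec, required):
    """Return the required field names that are absent or empty in the record."""
    return [name for name in required if not rec.get(name)]


print(to_json(record))
print(missing_fields(record, ["title", "citation", "purpose"]))
```

A check such as `missing_fields` can be run while the research is still in progress, which fits the advice to collect metadata during the project rather than afterwards.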

Making a research project understandable - Guide for data documentation

"readme" style metadata

Create a readme.txt file that describes a data file and ensures that the data can be interpreted correctly. Prefer standards-based metadata if available. Consider the following:

  • Create one readme.txt file for each data file, whenever possible. It is also appropriate to describe a "dataset" that has multiple, related, identically formatted files, or files that are logically grouped together for use (e.g. a collection of Matlab scripts). Avoid proprietary formats such as MS Word whenever possible.
  • When appropriate, also describe the file structure that holds the related data files.
  • Name the readme so that it is easily associated with the data file(s) it describes.
  • Format multiple readme files identically. Use the same order and terminology for information.
  • Write dates using a standardized format, such as ISO 8601 (YYYY-MM-DD).
  • Follow the scientific conventions for your discipline for taxonomic, geospatial and geologic names and keywords. Whenever possible, use terms from standardized taxonomies and vocabularies.
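
The points above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed template: the function names `readme_name` and `build_readme`, the `_readme.txt` suffix and the chosen field labels are hypothetical. It derives the readme name from the data file name, writes the fields in a fixed order so that multiple readme files are formatted identically, and writes the date in ISO 8601 form.

```python
from datetime import date
from pathlib import Path


def readme_name(data_file):
    """Derive a readme name that is easy to associate with the data file."""
    return f"{Path(data_file).stem}_readme.txt"


def build_readme(data_file, title, creator, collected_on, description):
    """Build plain-text readme content with a fixed field order."""
    fields = [
        ("Data file", data_file),
        ("Title", title),
        ("Creator", creator),
        # date.isoformat() yields the ISO 8601 form YYYY-MM-DD.
        ("Date collected", collected_on.isoformat()),
        ("Description", description),
    ]
    return "\n".join(f"{label}: {value}" for label, value in fields) + "\n"


# Hypothetical example values.
name = readme_name("lake_temperature_2021.csv")
text = build_readme(
    "lake_temperature_2021.csv",
    "Lake water temperature measurements",
    "Jane Doe",
    date(2021, 6, 1),
    "Hourly sensor readings at one metre depth.",
)
print(name)
print(text)
```

The content can then be saved next to the data file, for example with `Path(name).write_text(text, encoding="utf-8")`.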

Discipline specific standards

Climate and forecast

Ecology and evolution: Standard file formats in ecology and evolution

  • GenBank files for nucleotide or peptide sequences
  • NEXUS files for phylogenetic trees

Language and linguistics

Software engineering and interfaces, data presentation, data communication, data interchange and archiving by physical media, access systems and interconnection, wireless proximity systems, multimedia

Statistical and social science data

Metadata sources

Metadata standards

  • Dublin Core metadata standard: Originally fifteen generic, widely used elements (Creator, Contributor, Publisher, Title, Date, Language, Format, Subject, Description, Identifier, Relation, Source, Type, Coverage, and Rights)

  • DataCite Metadata Schema: Closely connected to the DOI system, this is a list of core metadata properties chosen for the identification of a resource. It includes, for example, relation types for describing relations between research datasets (e.g. supplement to, version of, part of, identical to).

  • Disciplinary metadata

  • Metadata Standards by Subject / Research Data Alliance RDA
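
To make the first two standards in the list above more concrete, the following Python sketch writes a few Dublin Core elements as XML and builds a DataCite-style related identifier. It rests on several assumptions: the wrapper element name `metadata`, the function names, the example values and the placeholder DOI are hypothetical, and the relation type names are only a small subset that should be checked against the current DataCite schema documentation.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen-element Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = {
    "creator", "contributor", "publisher", "title", "date", "language",
    "format", "subject", "description", "identifier", "relation", "source",
    "type", "coverage", "rights",
}

# Subset of DataCite relationType values matching the relations named above.
RELATION_TYPES = {"IsSupplementTo", "IsNewVersionOf", "IsPartOf", "IsIdenticalTo"}


def dublin_core_xml(fields):
    """Serialize (element, value) pairs as Dublin Core XML elements."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def related_identifier(identifier, relation_type, identifier_type="DOI"):
    """Describe how this dataset relates to another resource."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"relation type not in this sketch: {relation_type}")
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,
        "relationType": relation_type,
    }


# Hypothetical example values; the DOI is a placeholder.
xml_record = dublin_core_xml([
    ("title", "Lake water temperature measurements"),
    ("creator", "Doe, Jane"),
    ("date", "2021-08-31"),
    ("subject", "limnology"),
])
relation = related_identifier("10.1234/example-article", "IsSupplementTo")
print(xml_record)
print(relation)
```

Passing the fields as a list of pairs rather than a dictionary allows an element such as subject to be repeated, which Dublin Core permits.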

Vocabularies, ontologies and classifications

  • ELSST - a multilingual thesaurus

  • Finto: Finnish service for the publication and utilization of vocabularies, ontologies and classifications.

General standards