Metadata and documentation


Researchers should make sure that sufficient documentation or metadata (i.e. information about the data) is created and maintained to enable research data to be found, used and managed throughout its lifecycle.

Documentation and metadata requirements will differ depending on the discipline and the nature of the research. They should be identified during data management planning and adopted for use by all the researchers working on the project.

What is data documentation?

Data documentation explains how data has been created or digitised, what the data means, what its content and structure are, and any manipulations the data may have undergone. It provides provenance and context so that the data can still be understood in the long term.

It may include information such as:

  • why the data was collected
    e.g. information about the research project aims and objectives
  • how data was collected
    e.g. instruments and processes used, hardware and software used
  • how data is structured
    e.g. names, labels and descriptions for data elements, and any rules relating to the values that are in them (coding schemes, classification schemes)
  • quality control measures, and any modifications to the data over time
    e.g. standardised methods and protocols for capturing observations, recording forms with clear instructions, and a record of changes made to the data
  • confidentiality and consent agreements
    e.g. details of restrictions relating to confidentiality and consent and how these affect data re-use
  • listings of data objects
  • any other information aimed at helping data users to analyse and interpret the data
    e.g. user guides or manuals
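One common way to record how data is structured is a data dictionary that gives each data element a label, a description and any rules on its values, such as a coding scheme. The Python sketch below assumes a hypothetical survey with three made-up elements (`pid`, `age_yrs`, `smk`). The dictionary layout and the `validate` helper are illustrative assumptions, not a standard format. Checking records against the dictionary also doubles as a simple quality control measure, and the labels keep abbreviated names like `smk` understandable after the project ends.

```python
# Hypothetical data dictionary: one entry per data element (column).
# Element names, labels, ranges and codes are illustrative only.
DATA_DICTIONARY = {
    "pid": {
        "label": "Participant ID",
        "description": "Anonymised identifier assigned at recruitment",
        "type": str,
    },
    "age_yrs": {
        "label": "Age (years)",
        "description": "Age at interview, in completed years",
        "type": int,
        "range": (18, 99),
    },
    "smk": {
        "label": "Smoking status",
        "description": "Self-reported smoking status",
        "type": int,
        # Coding scheme: stored value -> meaning
        "codes": {0: "never", 1: "former", 2: "current", 9: "missing"},
    },
}


def validate(record: dict) -> list[str]:
    """Return the problems found when checking one record against the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "range" in spec:
            low, high = spec["range"]
            if not low <= value <= high:
                problems.append(f"{name}: {value} outside {low}-{high}")
        if "codes" in spec and value not in spec["codes"]:
            problems.append(f"{name}: {value} is not a defined code")
    # Flag columns that exist in the data but are not documented
    for name in record:
        if name not in DATA_DICTIONARY:
            problems.append(f"{name}: not documented")
    return problems


print(validate({"pid": "P001", "age_yrs": 34, "smk": 2}))
# → []
print(validate({"pid": "P002", "age_yrs": 17, "smk": 5, "bmi": 22.1}))
# → ['age_yrs: 17 outside 18-99', 'smk: 5 is not a defined code', 'bmi: not documented']
```

In practice the same information is often kept as a spreadsheet or text codebook alongside the data; the point is that names, meanings and value rules are written down in one place.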

Examples of data documentation:

  • laboratory notebooks
  • experimental protocols
  • codebooks
  • data dictionaries (definitions of the variables or terms used in the data and what each one describes)
  • software syntax and output files
  • information about equipment settings and instrument calibration
  • database schemas
  • readme files
  • methodology reports
  • provenance information about the sources of derived data

Why document data?

People may want to examine or use research data for many reasons: to understand or verify research findings, to review submitted publications, to replicate research results, to design a similar study, or to archive data for access and re-use.

Data documentation assists with:

  • making data discoverable
  • identifying the data
  • associating the data with its owners and creators
  • creating links between the data and other related data or publications
  • providing context for the data, e.g. by placing its creation at a certain time and place
  • enabling assessment of data quality and validation of research results
  • enabling a research group to re-use its own data efficiently in the future, e.g. abbreviations in a column header may make sense at the time but are easily forgotten once the project ends
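Several of the purposes above (discovery, identification, linking data to its creators and to related publications, placing it in time and place) are usually served by a small descriptive metadata record kept with the dataset. The Python sketch below borrows element names from the Dublin Core element set (title, creator, date, identifier, relation, coverage, rights). The `build_metadata` function, the choice of which fields are required, and all example values are hypothetical. A repository or discipline standard would define its own required fields.

```python
import json
from datetime import date

# Fields treated as mandatory in this sketch (an assumption, not a standard rule).
REQUIRED = ("title", "creator", "date", "identifier")


def build_metadata(title, creators, created, identifier,
                   related=(), coverage=None, rights=None):
    """Assemble a descriptive metadata record for a dataset (hypothetical helper).

    Element names are borrowed from Dublin Core; the structure is illustrative.
    """
    record = {
        "title": title,               # helps make the data discoverable
        "creator": list(creators),    # associates the data with its creators
        "date": created.isoformat(),  # places its creation in time
        "identifier": identifier,     # identifies the data
        "relation": list(related),    # links to related data or publications
    }
    if coverage is not None:
        record["coverage"] = coverage  # spatial or temporal context
    if rights is not None:
        record["rights"] = rights      # conditions affecting re-use
    missing = [field for field in REQUIRED if not record.get(field)]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    return record


# Example values are made up for illustration.
meta = build_metadata(
    title="Example survey dataset",
    creators=["Researcher A", "Researcher B"],
    created=date(2024, 3, 1),
    identifier="example-dataset-001",
    related=["Hypothetical article describing the survey"],
    coverage="Example region, 2023-2024",
)
print(json.dumps(meta, indent=2))
```

Writing the record as JSON keeps it machine-readable, so it can be stored next to a readme file and harvested by a catalogue; raising an error on missing fields is one simple way to stop incomplete records being saved.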