Metadata

Metadata is data about data: in a broad sense, metadata is all the information that you provide about your project, dataset, variables, code, etc. Read some nice examples here. Providing metadata is incredibly important, since metadata makes data:

Without metadata, a lot of data are just numbers that cannot be interpreted.

Source: https://dataedo.com/kb/data-glossary/what-is-metadata

Example research metadata

  1. A project readme, often a readme.txt file, containing the information below. Find an example template here or use the following fields:
    1. Creator (PI): name and affiliation of PI
    2. Title: project title
    3. Funding sources: names of funders, incl. grant numbers and related acknowledgements
    4. Data collector/producer: who is responsible for data collection + date and location of data production
    5. Description: project description, incl. relevant publications
    6. Sample and sampling procedures: target population and methods to sample it (or link to document describing this), retention rates for longitudinal studies
    7. Coverage: topics, time period and location covered
    8. Source: if relevant, citations to original source from which data were obtained
  2. Metadata for a specific data file, containing, for example, a file description, the data format, the relationship with other files, the date of creation and versioning information. This can be a readme.txt or another file type, such as nameofdatafile.json or nameofdatafile.xml.
  3. A codebook (data dictionary), which specifies what each variable in your dataset means (see the codebook chapter for more information). It can include:
    1. Question wording or meaning
    2. Variable text: question text or item number
    3. Respondent: who was asked the question?
    4. Meaning of codes: interpretation of the codes assigned to each variable
    5. Missing data codes, e.g., 999
    6. Summary statistics for both valid and missing cases
    7. Imputation and editing: identify data that have been estimated or extensively edited
    8. Constructed and weight variables: how were they constructed
    9. Location in the data file: field or column location, if relevant
    10. Variable groupings: if you categorize variables into conceptual groupings
  4. Metadata in systems, such as a data repository. The system usually enforces this type of metadata and stores it in an interoperable format, so you do not have to create it manually.
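
To illustrate item 2 above, here is a minimal Python sketch of a machine-readable metadata file that sits next to a data file (for example survey_wave1.json next to survey_wave1.csv). The field names, the file names and the helper functions build_file_metadata and write_sidecar are assumptions made for this example; they do not come from any particular standard or repository.

```python
import json
from datetime import date
from pathlib import Path


def build_file_metadata(data_file, description, related_files, version):
    """Collect file-level metadata in a plain dictionary.

    The keys are illustrative only; adapt them to what your
    project or repository actually asks for.
    """
    path = Path(data_file)
    return {
        "file_name": path.name,
        "description": description,
        # Derive the data format from the file extension, e.g. "csv".
        "data_format": path.suffix.lstrip(".") or "unknown",
        "related_files": list(related_files),
        "date_created": date.today().isoformat(),
        "version": version,
    }


def write_sidecar(metadata, data_file):
    """Write the metadata as JSON next to the data file and return its path."""
    sidecar = Path(data_file).with_suffix(".json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Usage: build the record and print it (nothing is written to disk here).
example = build_file_metadata(
    "survey_wave1.csv",
    description="Responses from the first survey wave (hypothetical file).",
    related_files=["readme.txt", "codebook.txt"],
    version="1.0",
)
print(json.dumps(example, indent=2))
```

Calling write_sidecar(example, "survey_wave1.csv") would store the same record as survey_wave1.json, so the description travels with the data file.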

Interoperable metadata

Metadata standards

Metadata standards are frameworks for metadata fields. They describe how metadata fields should be formatted, so that they are machine-readable and therefore interoperable. A large number of metadata standards are available, and they differ per discipline; the best-known metadata standards for the social sciences are:

As an individual researcher, you are often not directly confronted with these standards. It is just good to know that different repositories can use different standards. See more standards here.
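
To give a feel for what "describing how fields should be formatted" means in practice, the sketch below checks a metadata record against a small set of rules. The rule set (SCHEMA), its field names and the validate function are invented for this example and are far simpler than any real metadata standard; repositories normally run such checks for you.

```python
from datetime import date


def _is_nonempty_text(value):
    """True if the value is a string with visible characters."""
    return isinstance(value, str) and value.strip() != ""


def _is_iso_date(value):
    """True if the value is a string that parses as an ISO 8601 date."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical mini "standard": field name -> (required?, format check).
SCHEMA = {
    "title": (True, _is_nonempty_text),
    "creator": (True, _is_nonempty_text),
    "date_created": (True, _is_iso_date),
    "description": (False, _is_nonempty_text),
}


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, (required, check) in SCHEMA.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
        elif not check(record[field]):
            problems.append(f"badly formatted field: {field}")
    return problems


# Usage: the date is written as day/month/year and the creator is missing.
print(validate({"title": "My project", "date_created": "01/05/2023"}))
```

Because every record that passes uses the same field names and formats, software can read and exchange the records without human interpretation, which is the point of interoperability.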

Controlled vocabularies

Where metadata standards tell us what to call the metadata fields, controlled vocabularies come in handy when we have to fill in those fields. Using controlled vocabularies enables machines to identify identical values, instead of everyone using a different term for the same thing.
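
As a small illustration, the sketch below maps free-text entries for a country field onto one controlled term each. The two-letter codes are ISO 3166-1 country codes; the synonym table and the function to_controlled_term are assumptions made for this example, not part of any existing tool.

```python
# Illustrative synonym table: free-text spelling -> controlled term.
SYNONYMS = {
    "netherlands": "NL",
    "the netherlands": "NL",
    "holland": "NL",
    "germany": "DE",
    "deutschland": "DE",
    "united kingdom": "GB",
    "uk": "GB",
}

# The controlled vocabulary itself: the only values allowed in the field.
CONTROLLED_TERMS = set(SYNONYMS.values())


def to_controlled_term(value):
    """Return the controlled term for a free-text value.

    Raises ValueError for unknown values, so that they are reviewed
    by a person instead of being stored as yet another spelling.
    """
    cleaned = value.strip()
    if cleaned.upper() in CONTROLLED_TERMS:
        return cleaned.upper()
    key = cleaned.lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    raise ValueError(f"no controlled term known for: {value!r}")


# Usage: three different spellings end up as the same value.
entries = ["Holland", "the Netherlands", "nl"]
print([to_controlled_term(entry) for entry in entries])
```

Once all records use the same term, a search for that term finds every matching dataset, regardless of how the depositor originally spelled it.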

Whereas some fields have very extensive controlled vocabularies, psychology does not have many. A few links: