Data retrieval
Data management coordinators
Specifies who has access to or works with the data (data creation, data entry, data analysis or other ways of working with data).
Backup
Indicates whether a copy of the data is stored together with, or separate from, the original data file. This could be remote storage (cloud-based), or storage on a local external hard drive.
Description/metadata
The process of creating cohesion between various data objects and/or of describing different data objects. This includes overview objects (tables of contents, indexes), established or developed metadata schemas, descriptive documents (README files) or standards/ontologies. Metadata thus describe data in a structured way, and may include the name of the data collector, the date of collection, place etc.
Protection of the right to privacy
Focuses on the protection of test subjects’ right to privacy. This may include HIPAA compliance (US rules for the protection of health information), anonymising or other ways of protecting the right to privacy.
Biobank
A structured collection of human biological material that is available according to specific criteria, and where the information linked to the biological material is attributable to specific individuals. According to Danish legal practice, biobanks are seen as manual registers and therefore subject to both EU and Danish data protection legislation.
Cloud solution
Cloud refers to ‘cloud computing’. A cloud solution is software and services made available via the internet by an external provider; Microsoft OneDrive is one example.
Data Processing Agreement (DPA)
The transfer of personal data from a data controller to a data processor must be protected by a data processing agreement. This must fulfil certain minimum requirements set out in article 28 of the General Data Protection Regulation and article 29 of Regulation (EU) 2018/1725.
The Danish Data Protection Act (Databeskyttelsesloven)
Danish act on data protection that supplements the EU regulation (GDPR). See https://www.retsinformation.dk/eli/lta/2018/502
DataCite
An international non-profit organisation that issues Digital Object Identifiers (DOIs) for research data and objects (e.g. physical, digital or abstract objects). A DOI is persistent (it never changes), making it ideal for references in articles etc. A DOI is usually shown as a clickable link which the reader can use to directly access the object (e.g. an article or dataset) referred to.
Data format
Data are stored in files, and files can be in a number of formats. Textual files are often stored with the suffix .doc or .txt. Excel files are usually saved as .xls or .xlsx. Files containing images can be saved as .jpg, .tif or .bmp, or one of the many other formats. The file extension indicates what type of data the file contains, and in some cases, which software you need to open it.
Data classification
Categorisation into data classes based on what impact it would have on the university and researcher if the data were lost or compromised. Data classification is part of a risk assessment, and is used to make decisions about appropriate measures to safeguard data.
Data corruption
Data corruption is when files contain errors after copying, moving or opening a file, working on it, and saving it again. Sometimes an error is added to a file such that the file either has erroneous content, or cannot even be opened/read by the computer. In other cases, data corruption occurs over time due to the impact of radiation, magnetism, high temperatures etc. on the physical medium containing the file. This process is also known as ‘bit rot’. To protect against this, various technical measures must be taken, especially in connection with long-term data preservation (see ‘Long-term preservation’ in this glossary).
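One common technical measure against silent corruption is to record a checksum when a file is stored and to recompute it later. The sketch below is only an illustration using Python's standard hashlib module; the helper name file_checksum, the choice of SHA-256 and the sample file are assumptions, not something prescribed by this glossary.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "measurements.csv")
    with open(path, "wb") as handle:
        handle.write(b"sample_id,value\n1,0.42\n")

    stored = file_checksum(path)  # recorded when the file is archived

    # Later check: an untouched file gives the same checksum.
    unchanged = file_checksum(path) == stored

    # Simulate corruption by overwriting a single byte.
    with open(path, "r+b") as handle:
        handle.seek(0)
        handle.write(b"X")
    corrupted = file_checksum(path) != stored
```

A mismatch only shows that the file has changed; repairing it still requires an intact copy, which is why checksums are usually combined with backups.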
Data licence
A legal agreement that regulates the terms and conditions for the reuse of data by others. Examples are Creative Commons licences or Open Source Software licences.
Data management plan (DMP)
A plan, typically prepared at the start of the project, which describes the actions to be taken to collect, process, store, protect, share, preserve and possibly reuse research data in a research project. A DMP is a good tool for aligning expectations between researchers, and is increasingly required by funders and institutions. Researchers can create their own plan or use existing templates, e.g. from their funder (research foundation) or institution.
Data repository
A place for researchers to store datasets and metadata related to their research, typically with a view to exchanging data with other researchers – where data are stored in accessible formats for the long term.
Data size
The size of the data generated, either individually or in total. This can be stated either in analogue terms in connection with laboratory notes or samples, or digitally in MB, GB etc.
Dataset
A structured collection of research data.
Data sharing
Indicates whether the data are shared internally (e.g. with other members of the research group), externally with other researchers, externally to meet the requirements of funders (research foundations), openly with the public, or a mix of the above.
DMPonline
Software for creating a data management plan (DMP). DMPonline offers various DMP templates. The service is provided by the Danish e-Infrastructure Cooperation (DeiC) and managed jointly by the Royal Library and the Technical University of Denmark (DTU Library). It can be found at https://dmponline.deic.dk/ and is available to all employees at Danish research institutions and their partners.
DOI
DOI is an abbreviation for Digital Object Identifier – a system designed to identify and rediscover online materials. A DOI can be viewed as a permanent online fingerprint, assigned to an article so it can always be found via the internet. A DOI is also persistent (it never changes), making it ideal for references in articles etc. A DOI is usually shown as a clickable link which the reader can use to directly access the object (e.g. an article or dataset) referred to.
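As a small illustration of how a DOI becomes a clickable link, the sketch below prefixes a DOI with the https://doi.org/ resolver address. The function name doi_to_url and the simplified validation pattern are assumptions made for this example; real DOIs may contain characters that need URL encoding, which is not handled here.

```python
import re

# Simplified pattern: "10.", a registrant code of 4-9 digits, "/", then a suffix.
# This is an approximation for illustration, not the full DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Turn a DOI string into a resolvable link (hypothetical helper)."""
    cleaned = doi.strip()
    # Accept the common "DOI: ..." way of writing a DOI in a reference.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.match(cleaned):
        raise ValueError("Not a recognisable DOI: {!r}".format(doi))
    return "https://doi.org/" + cleaned


# The DOI of the FAIR principles article cited in this glossary.
fair_url = doi_to_url("DOI: 10.1038/sdata.2016.18")
```

Because the link is built from the identifier rather than from the publisher's current web address, it keeps working even if the object moves.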
Electronic Research Data Archive (ERDA)
System at UCPH for storing, sharing, analysing and archiving non-sensitive research data. ERDA provides secure centralised storage for researchers’ own and shared files, interactive analysis tools and archiving, with a view to data security and publication. It can be used as a secure network drive, no matter where you are. A separate file synchronisation service, similar to Dropbox, is also offered, but with data stored locally at UCPH. For sensitive data, the Sensitive Information Facility (SIF) at UCPH must be used (see the entry for SIF in this glossary).
FAIR principles
A number of guiding principles to make research data Findable, Accessible, Interoperable and Reusable (Wilkinson et al., 2016, DOI: 10.1038/sdata.2016.18). Researchers must follow the FAIR principles when data are to be shared with others within their research areas (‘as FAIR as possible’). The aim is to help maximise data reuse across technical, geographical and disciplinary boundaries, promote research collaboration and have a positive impact on how well research is utilised.
- F for Findable: ensure there is searchable evidence that a dataset exists, even if the dataset is not directly accessible.
- A for Accessible: provide information on how access can be granted to datasets being shared through Open Access repositories or in some other way.
- I for Interoperable: as far as possible, use common standards and/or vocabulary for file formats, metadata and data documentation, so others can open the datasets, work with them and combine them with data from similar projects.
- R for Reusable: clearly state the context in which a dataset has been prepared, and the conditions for reuse.
Funding source
Agency that has granted funding for the research to be carried out.
Researcher
Anyone who performs or supports research activities at the University of Copenhagen, including academic staff, PhD students, visiting researchers and affiliated researchers.
Research data
Physical material and digital data that are collected, observed, generated, created or reused as part of research activities at UCPH. Covers all the material and data the research is based on, such as samples, lab notes, interviews, texts and literature, digital raw data, audio and video recordings, computer code and precise catalogues of these materials, as well as data upon which the analysis and results are based, e.g. clinical records, sequence data, spreadsheets, interview files etc.
Research data management
A collective term covering the planning, collection, processing, storage, securing, sharing and archiving of primary material and research data.
Research integrity
High integrity in research means that research has been conducted honestly, transparently and responsibly. Read more about integrity at ku.dk: https://forskning.ku.dk/integritet/kodeks-og-retningslinjer/
Head of research
A head of research is defined as a researcher who is the principal investigator (PI) on a research project and/or who manages a research unit and/or has been delegated similar responsibilities.
Research project
A project where a researcher/student or a team of researchers/students seeks to answer research questions by collecting information, analysing it and drawing conclusions based on this. Not all research takes place as activities in separate projects. In some cases, research takes the form of ongoing activities that gradually increase the researcher’s knowledge in a particular area.
Research results
Conclusions based on research data.
Confidentiality agreement
An agreement with the company or organisation supplying data. The agreement states what students or researchers may do with the data provided.
Confidential data
Information which – by law or agreement – must be protected against unauthorised access, use, disclosure, alteration or destruction. This includes personal data, confidential company information and classified information.
Retrieving data
Localising data for reuse, e.g. in academic repositories, on websites of colleagues or in other relevant places.
Sensitive data
The term ‘sensitive data’ is often used in the context of GDPR, and means sensitive personal data in this context. This is a special category of personal data and covers information about a person’s race or ethnic origin, political, religious or philosophical beliefs, trade union membership, genetic data, biometric data processed for the purpose of unique identification, health, sexual relationships or sexual orientation. Sensitive personal data must be protected in line with stricter guidelines than ordinary personal data (name, address etc.).
GDPR
The General Data Protection Regulation is an EU regulation governing how companies and public authorities process personal data.
Reuse
Making data available for reproduction, education or the like.
Visiting researcher
A researcher employed at another institution or company who visits UCPH for a limited time. When visiting researchers carry out research projects and/or handle research data at UCPH, they must observe UCPH’s Policy for Research Data Management, and any other university policy, legislation or agreement that applies to research at UCPH (e.g. legislation on personal data).
High Performance Computing (HPC)
Calculations performed using many computer servers (sometimes called ‘nodes’), connected in parallel to a high-speed network. HPC systems make it possible to handle large volumes of data and solve problems requiring extensive processing. HPC is used in many research fields, ranging from life sciences, physics and mathematics to medicine, linguistics and social sciences.
Intellectual property rights
Legal rights that exist or are assigned to protect intellectual creations. Examples include copyright, patent rights, design rights and trademark rights.
Information security
Processes and methods designed and implemented to protect sensitive data from unauthorised access, use, misuse, disclosure, destruction, alteration or disruption. This applies to digital data and physical items such as museum specimens, biobanks, herbariums, geological collections, audio and video recordings, paper-based administrative files, patent registers etc.
ISO
ISO is the International Organization for Standardization, which publishes international standards. Companies and authorities can apply for certification against an ISO standard, and certification is attained by complying with the requirements set out in that standard. There are ISO standards for almost everything, from products to services.
Contracts and agreements
It is often necessary to sign an agreement or contract with a company in order to work together. Such an agreement states the parties’ obligations and the conditions that apply to their collaboration.
Data management requirements
Institutions, funders and journals that the faculty uses sometimes impose data management requirements. In these cases, special data management requirements apply within the relevant subject area. For example, some funders (research foundations) require researchers to complete and submit a data management plan (see ‘DMP’ in this glossary). Journals and scientific publishers are increasingly requiring that the data on which an article is based must be published, and that the article must include a link to the dataset.
Encrypted/unencrypted
Information that is encrypted cannot be read by anyone who does not have the encryption code or key. During encryption, letters and numbers are replaced with other letters and numbers which look random to the human eye. An algorithm encodes the information, and it can only be decoded or decrypted by someone with the right key. This protects data against unauthorised access.
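The idea that the same key both scrambles and restores the information can be shown with a deliberately simple toy example. The sketch below combines each byte of a message with a random key byte (XOR); the function name xor_bytes and the sample message are invented for illustration. It is not how research data should be encrypted in practice, where established, vetted encryption tools should be used.

```python
import secrets


def xor_bytes(data, key):
    """Combine data and key byte by byte with XOR (toy illustration only)."""
    if len(key) != len(data):
        raise ValueError("In this toy example the key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Made-up message and a random key of the same length.
message = "Participant 17: blood pressure 120/80".encode("utf-8")
key = secrets.token_bytes(len(message))

# Encrypting: the result looks like random bytes.
ciphertext = xor_bytes(message, key)

# Decrypting with the right key restores the original text.
decrypted = xor_bytes(ciphertext, key).decode("utf-8")

# Decrypting with a different key yields unreadable bytes.
wrong_key = secrets.token_bytes(len(message))
garbled = xor_bytes(ciphertext, wrong_key)
```

The example also shows why key management matters: anyone holding the key can read the data, and losing the key makes the data unrecoverable.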
Curation
Curating data involves adding metadata, creating and managing different versions of data, aggregating data and managing data collections. Curation also means ensuring data quality and that data management and processing are in line with laws, rules and guidelines.
Long-term preservation
Long-term data preservation is an activity that goes beyond storage on a hard drive or magnetic tape, and in principle means preservation forever. In practice, any preservation of data for more than 5-10 years will often be called long-term preservation. It includes continuously monitoring that the bits the files consist of do not change over time (‘bit rot’); this is known as ‘bit preservation’. It also entails keeping file formats up to date, or retaining the software used to create the files, so that they can always be opened and worked with; this is known as ‘logical preservation’.
Storage infrastructure
The place you store your data. This can be your laptop hard drive, a cloud solution like OneDrive or a network drive at the university.
Data life cycle
The data life cycle provides a high-level overview of the steps involved in successful data management and data preservation for use and reuse. The life cycle describes the steps through which data pass, from generation to archiving or deletion.
Metadata
Information describing the properties of an item or dataset, which allows future identification, retrieval and handling of the given item or dataset, e.g. name, unit of measure, date, contact information etc. Metadata can take many different forms, from free text to structured machine-readable content. Some subject areas or repositories have specific requirements for metadata format and content, perhaps in the form of a formal standard.
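To make ‘structured machine-readable content’ concrete, the sketch below stores a metadata record as JSON and checks that a few fields are filled in. The field names, the values and the list of required fields are all hypothetical; a real project would follow the metadata standard of its subject area or repository.

```python
import json

# Hypothetical metadata record for a dataset; all values are invented.
metadata = {
    "title": "Example interview dataset",
    "creator": "Jane Doe",
    "date_collected": "2022-04-01",
    "unit_of_measure": "mmol/L",
    "contact": "jane.doe@example.org",
}

# Assumed minimum set of fields for this illustration.
REQUIRED_FIELDS = ("title", "creator", "date_collected", "contact")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in required if not record.get(field)]


# Serialise to JSON text (e.g. to save next to the data) and read it back.
serialised = json.dumps(metadata, indent=2, sort_keys=True)
restored = json.loads(serialised)

problems = missing_fields(restored)
incomplete = missing_fields({"title": "Untitled"})
```

Unlike a free-text description, a record like this can be validated and searched by software, which supports later identification and retrieval.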
Network drive
A network drive is often referred to by a letter, such as ‘S drive’ or ‘L drive’, and is a drive on a server to which you have access via the network. If you want to have access to a network drive without being on the network, you must install a VPN connection.
Cost
The financial resources required to perform data management tasks. This includes paying for personnel, storage, software etc.
Storage
Reporting where the digital files or analogue specimens are kept either during or after the project (examples: local computer, department shared drive, cloud-based storage).
Retention period
Specifies how long data are kept after a project is completed, at the end of a grant, or the like.
Open Access
Free, unrestricted online access to research outputs such as journal articles, books and data sets. In UCPH’s Policy for Research Data Management, Open Access refers to datasets only (Open Data).
Open Data
Datasets that can be freely used, re-used and redistributed by anyone. Open Data are typically deposited in online data repositories where they can be accessed without restrictions on reuse, possibly subject only to requirements to attribute (cite/provide credit to the dataset creators) or share alike. The conditions attached to a dataset will often be described in the Creative Commons licence or some other licence.
Copyright
A legal right giving the creator the exclusive rights to control their work, for example to make copies of the work or to publish, distribute, reproduce, modify, adapt, transform, publicly display or perform the work. In order to obtain copyright protection, the work must be original and in a fixed form. Examples of research outputs that can be protected by copyright:
- Writings and texts such as articles, monographs, contribution to books and anthologies
- Images and visuals such as figures, graphs, diagrams, drawings, photographs, maps, PowerPoint presentations, software, videos
- Audio and sound such as music, recordings of interviews, sound recordings
Training/education
Formal and informal education and training for members of the research team. You will often be asked to describe this, e.g. in applications for research funding from a foundation, or in connection with clinical research, where you must follow a special set of rules called Good Clinical Practice (GCP).
Data organisation
Activities that include the use of file naming conventions and folder structures that make it easier to find your way around a dataset. It can also involve structuring data in a spreadsheet in such a way that they can be read in by certain software.
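A file naming convention is easier to follow when the names are generated rather than typed by hand. The sketch below builds names of the form project_description_date_version.extension; this particular pattern and the function name build_filename are assumptions chosen for the example, not a UCPH convention.

```python
import re
from datetime import date


def build_filename(project, description, collected, version, extension):
    """Build a consistent file name (hypothetical convention for illustration)."""

    def slug(text):
        # Lower-case the text and replace anything that is not a letter
        # or digit with a hyphen, so names are safe on most systems.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return "{}_{}_{}_v{:02d}.{}".format(
        slug(project),
        slug(description),
        collected.isoformat(),  # ISO dates (YYYY-MM-DD) sort chronologically
        version,
        extension.lstrip("."),
    )


name = build_filename("Soil Survey", "pH measurements", date(2022, 4, 5), 3, ".csv")
# name is "soil-survey_ph-measurements_2022-04-05_v03.csv"
```

Zero-padded version numbers and ISO-formatted dates mean that an alphabetical listing of a folder is also a chronological one.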
Persistent identifier (PID)
A long-lasting reference to a document, file, web page, or other object. In the context of FAIR data, a persistent identifier is an unbreakable and actionable link associated with a digital object on the internet. Examples of persistent identifiers are Digital Object Identifiers (DOIs) typically used for journal articles and datasets, and Open Researcher and Contributor IDs (ORCIDs) used to identify authors of scholarly work. See the entry for DOI in this glossary.
Personal data
Data relating to persons, who can be identified directly or indirectly using those data. Examples are images, names or references to CPR numbers or economic, social, cultural, physical, physiological or mental characteristics. Recordings of a person’s voice, e.g. in connection with an interview, are also regarded as personal data, as identification based on the person’s voice cannot be ruled out. Personal data are also referred to as personal information or personally identifiable data.
General Data Protection Regulation (GDPR)
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC.
Policy
A set of rules and guidelines for data management that UCPH wants all students and employees to observe.
Primary data
Primary data are essentially data collected directly from the source, e.g. through observations, questionnaires, interviews or experiments. Primary data are often specific to a given study and not available from other sources.
Principal Investigator (PI)
The lead researcher on a research project. See the ‘Head of research’ entry in this glossary.
Project or research statement
A general description of the research project or a faculty member’s overall work.
Project members
Researchers and students who contribute to the research conducted in a project.
Cleaning
A dataset might contain information that is not identical in structure or not consistent. For example, if you collect information about which country a group of people come from, it can be entered in different ways: DK, Danmark, Denmark, Dnemark, USA, the US, America etc. In order to work with these inconsistent data, they must be ‘cleaned’ so that the country names are stated in the same way. It may also mean removing erroneous information or making sure that measurements are expressed using the same unit (e.g. cm, litre or mmol).
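The country example above can be sketched in a few lines of code. The lookup table below maps the spellings mentioned in the entry to one agreed form; the table, the function name clean_country and the extra value ‘Atlantis’ (added to show an unrecognised entry) are assumptions for illustration.

```python
# Hypothetical lookup table: every known spelling maps to one agreed name.
COUNTRY_ALIASES = {
    "dk": "Denmark",
    "danmark": "Denmark",
    "denmark": "Denmark",
    "dnemark": "Denmark",  # known typo
    "usa": "United States",
    "the us": "United States",
    "america": "United States",
}


def clean_country(raw):
    """Return the agreed country name, or None if the value is not recognised."""
    # Normalise case and surrounding/repeated whitespace before looking up.
    key = " ".join(raw.strip().lower().split())
    return COUNTRY_ALIASES.get(key)


raw_values = ["DK", "Danmark", "Denmark", "Dnemark", "USA", "the US", "America", "Atlantis"]
cleaned = [clean_country(value) for value in raw_values]

# Values that could not be resolved are kept for manual review
# rather than silently guessed.
unresolved = [value for value, result in zip(raw_values, cleaned) if result is None]
```

Keeping the lookup table together with the data documents exactly which changes were made during cleaning.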
Reproducibility
Reproducibility in research means that others can repeat your research project. This means that the project must be adequately described, and there must be access to data, software and hardware (if applicable).
Danish National Archives
The purpose of the Danish National Archives is to collect original physical documents and digital files of historical value and preserve them indefinitely, to ensure this information remains available. In 2020, the Danish National Archives drafted an executive order covering registration of research data produced at Danish universities, in order to assess which research data to transfer to and preserve in the Danish National Archives.
Risk assessment
An analysis to assess risks to data confidentiality, integrity and accessibility. The risk assessment can be used to map which safety requirements must be complied with and which precautions must be taken to prevent breaches in confidentiality and loss of data (integrity). Integrity in the context of data means that data cannot be changed by unauthorised persons or as a result of technical errors. For personal data, a GDPR risk assessment assesses the risks to the rights of data subjects. If the GDPR risk assessment reveals a high risk for the data subjects, a Data Protection Impact Assessment (DPIA) must also be completed along with the GDPR risk assessment.
Secondary data
Secondary data are data collected in a different context to the current study. Secondary data have often already been processed and organised in a meaningful way and are available from books, articles, reports, statistics, public databases etc.
Sensitive Information Facility (SIF)
Facility at UCPH for storing and sharing sensitive data, including personally identifiable data classified as sensitive under the EU’s General Data Protection Regulation (GDPR). SIF makes it possible to share data with project members from UCPH as well as external partners, but only if there are legal agreements governing data transfers. SIF can be accessed from anywhere via an internet connection, and uses compulsory multi-factor authentication for secure login. The Electronic Research Data Archive (ERDA) at UCPH can be used to store, share, analyse and archive non-sensitive research data (see the entry for ERDA in this glossary).
Server
A server is a computer that is always turned on, and can be accessed via the internet or an internal network. Websites are stored on servers that are accessed via the internet. A server can also be local, for example providing storage space for employees’ files. Your private emails are stored on an email server with your ISP or a dedicated service, such as Gmail, while your UCPH emails are stored on a UCPH email server.
Security
Identifying physical and electronic security of the data. May include encryption, password access, or something similar, but may also include physical security, such as a lock on the door or on the freezer containing biological specimens.
Time spent
Time associated with performing data management tasks. This can be very extensive, especially when working with data that are unstructured and need cleaning. See the entry for ‘Cleaning’ in this glossary.
Affiliate researcher
A researcher not paid by the university, but engaged by the university or a research group to perform certain duties or functions.
Third party
An individual, company, other university etc. that collaborates on a research project at UCPH without being employed at UCPH and without having entered into a collaboration agreement with UCPH.
Supervisor
An experienced researcher providing guidance to a less experienced researcher or student.
Company data
Data originating from a company. These are often covered by an agreement stating how to handle them and who they may be shared with.
This glossary is drawn from:
Goben, A. and Griffen, T. (2019) In Aggregate: Trends, Needs, and Opportunities from Research Data Management Surveys. College & Research Libraries: 903-924
n.n (2021) Ordliste. Forskningsmanagement og GDPR. https://kunet.ku.dk/arbejdsomraader/forskning/data/ordliste/Sider/default.aspx . Accessed in April 2022.
University of Copenhagen. (2022). Policy for Research Data Management (Version 1). Copenhagen: https://kunet.ku.dk/work-areas/research/data/Documents/UCPHPolicyforResearchDataManagement2022-EN.pdf
The glossary was edited by Lorna Wildgaard and Asger Væring Larsen.