7. Sharing data openly
Learning objectives
When you have completed this lesson, you will be able to:
- Explain how sharing data benefits both you and others
- Explain what 'Open Data' is
- Describe how to share your project in your data management plan
____________________________________________________________
What is data sharing?
Data sharing is the practice of allowing others to access and use your data, code, protocols, documentation and the like. Data sharing can occur within organizations, between different organizations, or among individuals. In lesson 6 we addressed storage tools for sharing active data with project members while the project is ongoing (in a collaboration). In this lesson, we will focus on the sharing of finalized datasets by uploading them to data repositories to make them openly available to others. This is also called data publication.
Before we dive into why this is important and what you should think about when sharing data, let us first introduce some key concepts:
Data repository
A data repository is a storage facility where researchers and students can deposit (upload) digital data sets and other digital research objects, as well as metadata associated with their project, for the purpose of sharing data with others.
Open data
Open data, also known as Open Access to data, refers to free, unrestricted online access to research data. To openly share/provide Open Access to your data you typically deposit them in data repositories without setting restrictions on or conditions for data access. This means that others can simply download a copy of your data set without asking your permission first.
Persistent identifiers
A persistent identifier (PID) is a long-lasting, unbreakable internet link to a resource such as a document, dataset, web page, piece of software or research article. When you upload your data to a data repository, the repository will in many cases create a persistent identifier for your data sets. The most common persistent identifier for articles and datasets is a Digital Object Identifier (DOI).
Data license
A data license is a bit of legal text that specifies a standard set of terms and conditions for the reuse of data sets by others. For example, if you want others to cite you when they use your data, you should attach a license specifying that you want to be cited, such as a CC-BY (‘by attribution’) license. Many data repositories will give you the option to choose between license types when you upload your data there.
____________________________________________________________
Why share your project's data?
Watch the video below to see why you should consider sharing your data.
Sharing data in repositories…
- means your work can contribute to other projects and reduce duplication.
- gives others the opportunity to validate and build on your work.
- requires you to present your data in an organised and well-described way, so that others (including your future-self) understand what the data show, and how you collected, analysed and processed the data.
- can help you preserve data beyond project end, ensuring that the data remain accessible to you in the future (see lesson 8. Data preservation).
____________________________________________________________
What to share?
Before you share your data via a data repository, you will have to check whether you are actually allowed to do so. Certain rules may prohibit data sharing while others may mandate it. Here are some examples:
- If you work with personal data, you will need to protect the privacy of the persons you obtained the data from. In most cases you will not be able to share personal data and will have to destroy the data when your project ends. You can only share data openly using data repositories if the data are completely anonymized (you can no longer identify the persons). In some very specific cases, you may be able to share personal data in non-anonymized form if the participants in your study provided informed consent for this, but in this case there should also be a legal agreement in place (a data sharing agreement) and you should use a secure infrastructure (see lesson 6. Data storage and security) for sharing instead of a data repository.
- If you collaborated with external parties to generate the data, such as researchers at other universities or companies, you will have to determine who holds the rights to the data and can determine whether the data can be shared. This is often described in a contract with the external party.
- If your results are based on existing data obtained from data providers, such as Statistics Denmark or the Danish National Archives, it is unlikely that you are allowed to pass these data on to others. You should investigate the data providers’ policies.
- If you plan to publish results in a journal article, you will have to check the policy of the publisher. Publishers increasingly require data sets associated with the research article to be shared using data repositories.
! IMPORTANT Always talk to your supervisor to decide: 1) whether the data can be shared or not, and 2) whether the data you want to share can be made openly available via a repository or need to be shared via more restrictive sharing methods.
You could for example decide that your data are preserved internally at UCPH and only shared from individual to individual upon request (see lesson 8. Data preservation).
In the remainder of this lesson, we will only address situations where data can be shared openly via a data repository.
____________________________________________________________
How to share your project's data openly
Once you have determined that you are allowed to share your data openly, you will need to decide on the best way to do so.
Below are some tips to consider:
Investigate which data repository to use
In general, there are three different types of repositories.
Choosing a repository to share your data can be complicated. Here is a list of considerations that will help you choose:
Make sure you include documentation and metadata
It does not make sense to share datasets that no one can understand. Therefore, it is important that a description of the data is included in the repository record. There are two main ways to do this:
- By uploading documents that describe the data along with the data set. Besides uploading a ReadMe file, you can also consider uploading other relevant information that can provide context to the data, such as a project description, a protocol, an interview guide, a literature list, PowerPoint slides presenting your project, etc.
- By filling in information about the dataset in the upload form.
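As an illustration of the first approach, here is a minimal Python sketch that generates a plain-text ReadMe file to upload alongside a dataset. The section headings, the function names and the file name `README.txt` are assumptions made for this example; they are not an official UCPH template, so adapt them to what your repository and supervisor expect.

```python
from pathlib import Path

# Hypothetical section headings, chosen for illustration only.
README_SECTIONS = [
    "Title",
    "Authors",
    "Description",
    "Collection methods",
    "File overview",
    "License",
]


def build_readme(fields: dict[str, str]) -> str:
    """Return ReadMe text with one block per section.

    Sections that are missing or empty are marked as TODO so that
    gaps are visible before the dataset is uploaded.
    """
    blocks = []
    for section in README_SECTIONS:
        value = fields.get(section, "").strip() or "TODO: fill in"
        blocks.append(f"{section}:\n{value}\n")
    return "\n".join(blocks)


def write_readme(folder: Path, fields: dict[str, str]) -> Path:
    """Write README.txt into the dataset folder and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "README.txt"
    path.write_text(build_readme(fields), encoding="utf-8")
    return path
```

Calling `write_readme(Path("my_dataset"), {"Title": "Interview transcripts"})` would create the file with the remaining sections marked as TODO, which makes it easy to see what still needs to be described.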
Get a persistent identifier for your data
Many repositories create a persistent identifier for your dataset when you upload your data. These identifiers are long lasting links to your data and an example is a DOI, a digital object identifier. Persistent identifiers are important for various reasons.
First of all, like with journal articles, you (and others) can use the persistent identifier to cite your data, for example when you want to refer to your data in your thesis, website or future journal article. You simply copy the persistent identifier that the repository has generated for you and add it to your citation. Here are examples of what a dataset citation can look like:
- Clarke, Harry (2022). A provisional checklist of European butterfly larval foodplants [Dataset]. Dryad. https://doi.org/10.5061/dryad.6hdr7sr35
- Meyers, Charlène, 2022, "Transcription of 20 sight translations by translation students coded with voiced pauses and silent pauses", https://doi.org/10.34934/DVN/KHQE0P, Social Sciences and Digital Humanities Archive – SODHA, V1
Secondly, the persistent identifier will make it much easier for others to find and access your data. They can simply click the persistent identifier from any online document or website that cites the data and it will lead them directly to your dataset record in the repository.
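The Dryad-style citation shown above can be assembled from a few metadata fields. The Python sketch below is only an illustration: the function names and the citation layout are assumptions based on that one example, and repositories and journals may prescribe a different format. It relies on two facts about DOIs: they start with `10.` and they resolve through `https://doi.org/`.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI written in any common form into a clickable resolver link."""
    doi = doi.strip()
    # Accept "doi:10.x/..." and full resolver URLs as well as bare DOIs.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def format_dataset_citation(author: str, year: int, title: str,
                            repository: str, doi: str) -> str:
    """Format a dataset citation in the style of the Dryad example."""
    return f"{author} ({year}). {title} [Dataset]. {repository}. {doi_to_url(doi)}"


citation = format_dataset_citation(
    author="Clarke, Harry",
    year=2022,
    title="A provisional checklist of European butterfly larval foodplants",
    repository="Dryad",
    doi="doi:10.5061/dryad.6hdr7sr35",
)
print(citation)
```

In practice most repositories display a ready-made citation on the dataset page, so a helper like this is mainly useful when you need to cite many datasets consistently.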
Include information about how your data can be reused by others
A last step to take when sharing your data openly is to communicate how others can reuse your data. Can others do whatever they want with the data? Or are there restrictions? For example, can the data be used for commercial purposes? And do you want others to cite your data when they reuse them? Talk to your supervisor and determine together whether you should have any requirements for reuse.
There are various ways to communicate how your data can be reused. First of all, you can simply describe your requirements in any documentation you upload along with the dataset (e.g. the ReadMe file) or in the metadata record the repository asks you to fill in. Alternatively, you can use a license to communicate your terms.
More about reuse licenses
The Creative Commons licenses are the most common form of generic licenses. Creative Commons offers six licenses, built from combinations of four conditions (attribution, non-commercial, no derivatives and share-alike), alongside the CC0 public domain dedication. Together they let you define how others can use and share your data: you can decide if and how the user must cite you when using the data, if the data can be used for commercial purposes, and if you allow the data to be altered or adapted in any way.
There are type-specific licenses too. GNU licenses, for example, are for software and other types of practical works, such as programmes, while Unsplash licenses are for photographs that can be freely shared.
When choosing a license and sharing the data and materials, make sure you are the rightsholder of the data. Only the rightsholder can license the data. Once you have licensed your data you cannot revoke the license.
! IMPORTANT Using licenses goes two ways. You can use licenses to communicate how others can use your data, but it is equally important that you check whether there are licenses associated with existing data that you yourself wish to use in your project. See lesson 3. Rights, requirements and responsibilities.
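To make the reuse questions above concrete, the Python sketch below maps three yes/no answers to the matching Creative Commons license code. The function and parameter names are made up for this illustration. The codes follow Creative Commons' own spelling (for example CC BY, which this lesson writes as CC-BY). The sketch covers only the attribution-based licenses and is no substitute for discussing the choice with your supervisor.

```python
def cc_license(allow_commercial: bool = True,
               allow_adaptations: bool = True,
               share_alike: bool = False) -> str:
    """Return the Creative Commons license code for the given reuse terms.

    Attribution (BY) is part of every license covered here.
    Share-alike (SA) obliges others to license adaptations under the same
    terms, so it cannot be combined with forbidding adaptations (ND).
    """
    if share_alike and not allow_adaptations:
        raise ValueError("share-alike only applies when adaptations are allowed")
    parts = ["BY"]
    if not allow_commercial:
        parts.append("NC")  # non-commercial use only
    if not allow_adaptations:
        parts.append("ND")  # no derivatives
    elif share_alike:
        parts.append("SA")  # adaptations must carry the same license
    return "CC " + "-".join(parts)


print(cc_license())                        # others may reuse freely if they cite you
print(cc_license(allow_commercial=False))  # citation required, no commercial use
```

Writing the choices out this way shows that the licenses are combinations of a small number of conditions rather than unrelated legal texts.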
___________________________________________________________
How to share data
Check out the decision tree below. It can help you decide when and how to share your data. The decision tree is also available for download.
____________________________________________________________
Data sharing in practice
Morten Arendt Rasmussen, supervisor at the Faculty of Science, explains how students can share raw data and code using either university repositories or external data and code sharing services.
____________________________________________________________
Test yourself
Check whether you captured the main points of this lesson:
____________________________________________________________
Continue with your DMP
Please continue working on your data management plan (DMP) by filling in the last questions under Section 7. Sharing data openly.
7.a Will any of the data/material in the project be shared openly with others?
If yes, describe which datasets.
7.b If yes, how will the data/material be shared openly? Address what repository you expect to use and what documentation will be sent along with the data/material.
If you haven't begun filling out your DMP yet, you can find the DMP template here: UCPH DMP Template for Students.
Remember to discuss the data management plan with your supervisor at the start of your project. Keep the DMP stored along with your data.
____________________________________________________________
Practical tips for data sharing
- Discuss with your supervisor whether you have any data in your project that can or should be shared openly. Use the decision tree in this lesson.
- As a starting point, use a discipline or data type specific repository. Discuss with your supervisor what repository is typically used for your type of data. Have a look at the repository browsers re3data.org and FAIRsharing to search for available discipline specific repositories.
- If no obvious discipline specific repository exists, go for a generalist repository. A recommendation is Zenodo, because it is easy to use and provides documentation and user support. When you save your project or metadata describing your project in Zenodo, your project will be visible on Google and other search engines for about the next twenty years.
- You could also use UCPH’s institutional repository ERDA and the service Data DOI. Data DOI is a service offered to researchers at UCPH, who mainly wish to archive their data long term and provide research data sets with a DOI.
- Make sure you include information that explains the data when you upload them, for example by uploading an associated ReadMe file. Find a template here.
- If the repository creates a persistent identifier for you (e.g. a DOI), cite the identifier in your thesis or future research article.
- Communicate any conditions you have for the reuse of data by others, for example by using the repository option to attach a license. A Creative Commons Attribution (CC-BY) license will be a good choice for most (but not all) datasets. It asks others to cite you (and your collaborators, if you have any) when reusing your data. If you have any questions about licenses contact your faculty’s library.
- If your bachelor's or master's project does not contain personal or confidential data and is not restricted in any other way, you can deposit a copy of your project at the university library via Digital Exam. After your project has been graded and you have passed, your project will be publicly available through the Royal Library (soeg.kb.dk).
Find guidance on KUnet > Study information > [Choose your study portal] > Master's thesis and other projects > Submission and assessment > How can I make my master's thesis available for loan?
Find published theses, bachelor projects and other student assignments from UCPH, Aarhus University and Roskilde University, i.e. projects that students have given permission to share, here.
- Look up terms related to research data management in the RDM Glossary.
____________________________________________________________
Learn more
Below, you will find external resources for optional further reading on topics mentioned in this lesson.
About the licenses. Creative Commons. (n.d.). Retrieved August 23, 2024, from: https://creativecommons.org/licenses/
GNU operating system. (2007). Retrieved August 23, 2024, from: https://www.gnu.org/licenses/licenses.html
____________________________________________________________
Published in 2024