8. Data preservation

Learning objectives

When you have completed this lesson, you will be able to: 

  • Explain the advantages of preserving data after project end. 
  • Describe what should be considered when determining what data will be preserved and how to preserve them.
  • Draft your own data preservation plan. 

____________________________________________________________

 

Why preserve data after your project?

When you reach the end of your project, you will have finalized the collection, processing and analysis of your data. You will likely capture your main conclusions in a document, such as a Bachelor’s or Master’s thesis. However, research data management does not end with the submission of your thesis. There is one last step to take: the preservation (also called ‘archiving’) of your digital data, physical material and relevant documentation. Data preservation is defined as the actions you take to ensure that your research data remain available and usable, possibly for several years after your project. Data preservation is important because it allows your future self, other project members and your supervisor to revisit the material and understand how you conducted the project. This could provide insights useful for similar studies, and possibly allow the research data from your project to be reused in new projects.

____________________________________________________________

 

What to preserve

You do not necessarily have to preserve everything you produced in your project. You will have to decide what to keep and what to discard. For example, you could consider keeping:

  • The data sets on which you base the conclusions described in your thesis. Perhaps you need to consult them again in the future when questions arise about your thesis?
  • Any data that may end up being the basis of a scientific publication.
  • Data that you assess to be valuable to others. Perhaps they can be used in another project?
  • Data that were based on rare samples or specimens. These data are not easily recreated.

There may also be some data that you cannot preserve, for example:

  • Personal data, if the participants have not consented to you keeping their data after project end.
  • Confidential data, such as business information, if the contract with the company you worked with requires data destruction or a return of the data to the company.
  • Very large data files or physical material, if there is insufficient storage space or funding to keep them.
  • Any physical material that will deteriorate in quality over time so that it cannot be used for new projects.

Always discuss your plan for data preservation with your supervisor. What data do they want to be kept after the project? Are there any requirements or rules for data preservation in your research group, at your institute, or at the place where you conducted your internship? Is it costly to preserve the digital data and material, and if so, are there sufficient resources available to cover these costs? Don’t limit your discussion to digital data only; also think about what to do with code and codebooks, (lab) notebooks, specimens, samples, artifacts and any documentation in your project.

____________________________________________________________

 

Making a data preservation plan

It is a good idea to make a plan for the preservation of the data, materials and documentation that you decide to keep after your project. You could describe your plans in your project’s data management plan. Consider the following questions:

 Where should I preserve my data and material?

Where you should preserve your data after project end depends on a number of decisions:

  • Who should have access to the data after the project?
    Is it only you, or should your supervisor or other project members also have access? If it is the latter, you will have to store the data in a location that remains accessible to others (and perhaps yourself) when your enrolment at the university ends. This will require moving your data, for example from your personal university network (T) drive to the research group’s common network drive. Please note, though, that you will not have access to this group drive yourself after your enrolment ends.

  • How long should your data be preserved?
    The longer you want to keep your data, the more important it is to check whether that length of storage can be guaranteed. Your personal T-drive will close down three months after your enrolment ends, so it is not a good place to preserve data after the project. DeiC Storage (see lesson 6) guarantees 10 years of storage, which may be sufficient for your data. If you end up publishing your data in a data repository (see lesson 7), this could also be a place to preserve data, but you will need to check the repository’s policy to see how long it guarantees to store the data. If you work with normal data (that is, not personal or confidential data; see lesson 6), you could also consider keeping a copy of the data on a portable hard drive stored at home.

Data repository vs. Data archive 

In lesson 7 you were introduced to the concept ‘data repository’.

  • Data repositories are online database infrastructures in which you can deposit completed datasets. Their main purpose is to publish and share data with others, and not necessarily to preserve the data long term.

  • Data archives are database infrastructures in which you can also deposit completed datasets, but here the main aim is long-term data storage (possibly indefinitely) and not necessarily data sharing.

This difference is reflected in how the databases are set up and what functions they offer. However, some data repositories also guarantee a minimum storage period (e.g. 10 years), and some data archives also offer sharing functionalities. In other words, you may also be able to use a data repository to preserve your data. In the end, your choice of data repository or archive depends on your needs. Always check the policies of the data repositories and archives to see what services they provide.

 What data format should my data be preserved in? 

One consideration is the format in which you save your digital data. Some file formats require specific software to be opened. Is it reasonable to expect that this software will still be accessible 5 years from now? If the answer is ‘no’, it is best to preserve the data in an ‘open format’: a format that is not tied to one specific program and can be opened on any operating system with widely available software. Saving your data in open, unencrypted and uncompressed formats will make your data more usable in the future. If you can’t save your data in an open format, make sure to include the name of the software needed to open the file in your project’s documentation (for example in a ReadMe file saved with your data, see lesson 5).

Here are some examples of common open formats:

Text

  • Plain text (.txt)
  • Portable Document Format (.pdf)
  • OpenDocument Text (.odt)

Tables, spreadsheets

  • Comma-separated tables (.csv or .txt)
  • OpenDocument Spreadsheet (.ods)

Images

  • Tag Image File Format (.tiff or .tif)
  • Joint Photographic Experts Group (.jpg)
  • Graphics Interchange Format (.gif)

Audio

  • Waveform Audio File Format (.wav)
  • Free Lossless Audio Codec (.flac)

Video

  • Moving Picture Experts Group 4 (.mp4)
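
As a rough aid when preparing files for preservation, the following Python sketch lists the files in a project folder whose extension is not among the open formats named in this lesson. The function name `find_non_open_files` and the folder name `my_project` are hypothetical, and the extension set is only based on the examples here; it is not a complete list of open formats.

```python
from pathlib import Path

# Extensions based on the examples in this lesson (not an exhaustive list).
OPEN_EXTENSIONS = {
    ".txt", ".pdf", ".odt",           # text
    ".csv", ".ods",                   # tables, spreadsheets
    ".tiff", ".tif", ".jpg", ".gif",  # images
    ".wav", ".flac",                  # audio
    ".mp4",                           # video
}


def find_non_open_files(folder):
    """Return sorted relative paths of files whose extension is not in OPEN_EXTENSIONS."""
    root = Path(folder)
    if not root.is_dir():
        return []
    flagged = []
    for path in root.rglob("*"):
        # Compare in lower case so that ".CSV" and ".csv" are treated alike.
        if path.is_file() and path.suffix.lower() not in OPEN_EXTENSIONS:
            flagged.append(path.relative_to(root).as_posix())
    return sorted(flagged)


if __name__ == "__main__":
    # "my_project" is a placeholder; replace it with your own project folder.
    for name in find_non_open_files("my_project"):
        print("Consider converting or documenting:", name)
```

A file that appears in this list is not necessarily a problem. It is a prompt to either convert the file to an open format or to note the required software in your documentation.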

 What documentation should accompany my preserved data and material?

Preservation of data and material only makes sense if they can be found and understood in the future, by yourself and others. Therefore, make sure that documentation about the content of data files and physical material is easy to find and access. Ensure that:

  • You properly label any physical objects with a date, name and keywords describing the project, and with information about any digital data associated with the material.
  • You use informative filenames for the files you store.
  • You preserve information and metadata describing how you collected and processed the data and material along with the data, such as a project plan or protocol, your data management plan, or a ReadMe file.
  • You tell your supervisor where the documentation can be found.

Read lesson 5 for more tips on data documentation.
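
One way to make the contents of a preserved folder easier to understand is to add a file overview to its documentation. The Python sketch below builds such an overview as plain text. The function name `build_file_overview` and the layout of the output are assumptions made for illustration; they are not part of any official ReadMe template.

```python
from pathlib import Path


def build_file_overview(folder):
    """Return a plain-text listing of every file in `folder` with its size in bytes."""
    root = Path(folder)
    lines = ["File overview for: " + root.name, ""]
    # Sort the paths so that the listing is the same on every run.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for path in files:
        rel = path.relative_to(root).as_posix()
        lines.append(f"{rel}\t{path.stat().st_size} bytes")
    return "\n".join(lines)
```

Calling `print(build_file_overview("my_project"))`, where `my_project` is a placeholder for your own folder, prints one line per file. You could paste the result into your ReadMe file and add a short description next to each entry.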


Example demonstrating the importance of proper documentation for data preservation

The skeleton of an enormous dinosaur went unnoticed in a museum’s own storage for decades because the bones had not been catalogued, there were no real records of them, and some bones had been used in other displays with no indication that they were part of a whole.
Read the story here: Royal Ontario Museum discovers long-lost dinosaur skeleton in own archives, City News, Nov 14, 2007.

Image: A part of the long-lost Barosaurus skeleton found in the archives of the Royal Ontario Museum.
Photo: Stacie DaPonte from Toronto, ON, Canada, CC BY-SA 2.0, via Wikimedia Commons

____________________________________________________________

 

Data preservation in practice

Morten Arendt Rasmussen, supervisor at the Faculty of Science, explains which types of data he believes should be preserved after project end. 


 

Nicole Schmitt, supervisor at the Faculty of Health and Medical Sciences, addresses the long-term preservation of data from student projects.

____________________________________________________________

 

Test yourself

Check whether you captured the main points of this lesson:

Quiz: Data Preservation

____________________________________________________________

 

Finish your DMP

Complete your data management plan (DMP) by filling in the last questions under section 8. Preservation:

 

8.a Describe what data/material/project documentation should be kept once your project is over.

8.b Describe where the data/material/project documentation will be stored after project end, and how a copy of the data will be made available to your supervisor(s).

 

If you haven’t begun filling out your DMP yet, you can find the DMP template here: UCPH DMP Template for Students.

Remember to discuss the data management plan with your supervisor at the start of your project. Keep the DMP stored along with your data.

____________________________________________________________

 

Practical tips for preserving data

  1. Always start by checking whether there are any rules for the preservation or destruction of your data. The following tips give some hints on where to find these rules.

  2. Check UCPH’s general rules for data preservation in UCPH’s Policy for Research Data Management. For example, according to the policy, a copy of all digital datasets underlying research publications must be kept at UCPH for a minimum of 5 years after the end of the project or the date of publication, whichever comes last. Bachelor’s and Master’s theses are not strictly considered research publications. However, it is recommended that you, by default, preserve a copy of the data underlying the results presented in your thesis for a minimum of 5 years after the end of your project, unless otherwise determined by your supervisor.

  3. If you work with personal data, check the guidelines for preserving personal data on the study information pages of your study programme under Planning your studies > Rules and dispensations > How to collect and process personal data.

  4. If a contract has been set up in your project, for example because you work with a company, check the terms and conditions for data preservation in the contract.

  5. Determine what IT infrastructure you should use to preserve your digital data. Here are some suggestions for data classified as normal data (see lesson 5) where a maximum of 10 years of storage after the project is sufficient:

Requirement                                         UCPH Group Drive   DeiC Storage   A data repository (see lesson 7)
You will need access to data after enrolment ends   Not suitable       Suitable       Suitable*
Other UCPH students/employees need access           Suitable           Suitable       Suitable*
People outside UCPH need access                     Not suitable       Suitable       Suitable*

* You should still check the conditions of the repository you pick, to ensure, for example, that it guarantees at least 5 years of storage.

Please note that other solutions are necessary for personal data and confidential data. Other solutions may also be necessary for data classified as normal data, for example when dealing with very large data volumes, specific file formats or other specific requirements. Lastly, research groups may have their own databases in which data are preserved. Always ask your supervisor first and consult KU-IT if you need help.

  6. Preserve your digital data along with a ReadMe file that explains the data. Here is a template that you could use: ReadMe template

  7. Look up terms related to research data management in the RDM Glossary.
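
The 5-year minimum described in the tips above can be worked out as a small date calculation. The Python sketch below takes the later of the project end date and the publication date and adds the retention period. The function name `earliest_disposal_date` and its parameters are hypothetical, and the calculation only illustrates the rule as summarised in this lesson; your supervisor, a contract or rules on personal data may require something different.

```python
from datetime import date


def earliest_disposal_date(project_end, publication=None, years=5):
    """Return the first date on which the minimum retention period has passed."""
    # "Whichever comes last": start counting from the later of the two dates.
    start = project_end if publication is None else max(project_end, publication)
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 1 March instead.
        return date(start.year + years, 3, 1)
```

For a thesis without a publication, you would pass only the project end date, for example `earliest_disposal_date(date(2024, 6, 30))`. Treat the result as the earliest point at which disposal could be considered, not as a deadline for deleting data.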

____________________________________________________________

 Published in 2024