Take care of your data – data classification and risks
Time
The lesson is expected to take about 25 minutes to complete. There is also a reflection exercise, which is best done together with your fellow students.
About the lesson ‘Take care of your data – data classification and risks’
In this lesson, you will be introduced to data, data classification and risk assessment. Data comes in many different shapes and sizes. Data may be publicly accessible, but data may also be confidential. What you need to think about is: What kind of data am I working with? Where should my data be stored to be secure? What risks are associated with the way I work with my data? How can I reduce these risks?
Data classification and risk assessment can be slightly complex, but you will be greatly helped by the tools and methods presented in this lesson – and by using your common sense.
The lesson will mention several concepts that may be new to you. You can find definitions of these concepts in the glossary.
Learning outcomes
When you have completed the lesson, we expect that you:
can explain why it is important to classify the types of data you are using for your assignments and projects.
can perform simple data classification by identifying the data types you are working with.
can perform a risk assessment in relation to data security.
can cite methods for handling data securely to prevent data leaks and loss.
know where to find further information and guidance on secure data management on the UCPH intranet.
Source
The lesson was produced by the University of Copenhagen as part of its learning resources for digital literacy, 2023.
Course directors:
Lorna Wildgaard (PhD), Specialist Consultant, Copenhagen University Library (KUB), Research Support Services
Asger Væring Larsen, Specialist Consultant, Copenhagen University Library (KUB), Research Support Services
What are data – and which data should you take extra care of?
You must take extra care of the following types of data:
Confidential data: This might be information about identifiable persons, or information that must be kept secret for research or business reasons. Please note that anonymised or pseudonymised personal data can often be traced back to specific individuals if it is possible to combine the data with other data. Read more in the section ‘Compensatory measures’ below.
Data that is time-consuming, expensive or impossible to recreate.
You must take good care of personal data – both sensitive and non-sensitive data
You must protect all personal data that can be attributed to a physical person – both sensitive and non-sensitive data. It is a question of respecting people by protecting their data and identity so that the data cannot be misused by others.
The Danish Data Protection Agency describes personal data as follows:
Personal data is any kind of information that can be attributed to an identifiable person, even if the individual can only be identified through combining the information with additional data.
Examples of personal data are civil registration (CPR) numbers, vehicle registration numbers, a picture, a fingerprint, a voice, medical records or biological material when it is practically possible to identify a person on the basis of the information or when combined with other information. The information is said to be ‘personally identifiable’.
The General Data Protection Regulation divides personal data into three types:
Personal data (non-sensitive data)
Special categories of personal data (sensitive data)
Information about criminal convictions and offences or security measures
The division is based on the fact that different conditions and procedures apply to the processing of personal data, depending on the sensitivity of the data.
[...]
Personal data (non-sensitive personal data)
Personal data covers all information that is not classified as special categories of data (sensitive personal data). Examples include ID information such as name, address, age and education, and also financial matters such as tax and debt. Personal data also covers significant social problems, other private matters, illness or official matters. Information in the form of photos, family relationships, housing, car, qualifications, job applications, CV, date of employment and position, work area and work phone are also deemed personal data.
Special categories of personal data (sensitive personal data)
Sensitive personal data is expressly defined in the General Data Protection Regulation, and access to processing such data is more restrictive than for ordinary personal data.
Sensitive personal data is information relating to:
Race and ethnic origin
Political beliefs
Religious or philosophical beliefs
Trade union membership
Genetic data
Biometric data for the purpose of uniquely identifying a person
Health data
Sex life or sexual orientation
In addition, special rules apply to information about criminal offences.
If you work with the above types of data, you should carry out a risk assessment and classification of your data to establish how thorough you should be.
The two types of analysis have been developed to ensure that workplaces, organisations and the authorities can handle other people’s data in a proper manner. For example, UCPH must process information about its students and employees without the risk of data loss or leaks. This is not a process that individuals normally perform on their own data in such a structured manner, but it is important that you are aware of how the methods work, and that you are able to carry out the risk assessment and the data classification if, one day, you need to handle personal data or other confidential data as part of your job or as part of a research project.
If you do not work with personal data, sensitive personal information or other confidential data, you do not need to take any special precautions – apart from using your common sense.
You just need to make sure that:
your data are stored in such a way that you can find them again.
you have a backup of your data – save a copy of the data somewhere else, for example on an external hard drive or on OneDrive.
By having two or three copies of your data, you can be sure that you will not lose them if your computer breaks down or is stolen.
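If you are comfortable with a little scripting, the backup step can be sketched as below. This is only a minimal illustration in Python: the function names `sha256_of` and `back_up` and the example file paths are made up for this sketch, and copying files by hand works just as well.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and check that the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies the content and the timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target


# Hypothetical usage: both paths are placeholders, not real UCPH locations.
# back_up(Path("survey_results.csv"), Path("E:/backup"))
```

Comparing checksums is one simple way to confirm that the second copy really is identical to the first.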
In doing a risk assessment, you assess the degree of security that is needed to protect the data you are going to work with. In this way, you can avoid over-protecting all your data, and instead concentrate on protecting the data that really needs to be protected.
When working with personal data or other confidential data, you should carefully consider:
which information you want to gather – avoid collecting more data than you need.
how you are going to collect the data.
where you are going to store the data.
who you need to share the data with both during and after the project, and whether you are permitted to share the data in the first place.
what is going to happen with the data once the project/assignment is finished.
The risk assessment consists of the following three parts:
Data classification: You identify which data you are working with, and how you should protect them against misuse and loss.
Assessment of how your solution complies with security requirements: In this context, ‘solution’ means where you plan to save your data, but also, for example, how you collect data, transfer the data for storage, and how you might want to share the data with others securely – i.e. the actual data management.
Assessment of compensatory measures: What can you do to rectify a situation with a built-in risk, for example if only one student in your study group has a copy of your data on a USB flash drive? A compensatory measure could be to have a backup of your data on a shared drive.
What you do in practice depends, among other things, on how confidential the data are, and the type of data in question.
As mentioned, a data classification should only be performed if there is reason to do so – i.e. if you need to handle personal data or data which for other reasons must be kept confidential. In most cases, you will be told by a teacher or supervisor where the data should be stored, and who should have access to the data, but it is good to know how these decisions have been arrived at so that you will be able to make similar decisions at a later point in time. The idea is to protect data in a systematic and appropriate way.
You should carefully consider the following:
Confidentiality
How confidential are the data you are processing? Is it health information about identifiable persons, or the height and weight of 100 tobacco plants? Sensitive personal data should be considered confidential, while the height and weight of tobacco plants need not. One usually operates on a scale of 1 to 4, where 1 is the most confidential, or restricted (for example the health information of named persons). Data which you have received from a business or public authority may also be confidential, even though the data are not personal data.
Integrity
Integrity is about who has access to change the data. Is it just you? Or do cooperation partners or teachers also have access to the data?
Accessibility
Accessibility is about the stability of access to your data. For example, is it OK that a server is not available every so often because of maintenance, or is it important for you to be able to access your data 24/7? This might be the case, for example, if a device is measuring something continuously for three days and stores the data on a particular server. In this case, the server needs to be running all the time, otherwise data will be lost.
What you can do: Make a table of the three elements and their classification on a scale from 1 to 4. It might, for example, look as follows:
Data classification | 1-4 where 1 is highest | Note (example text)
Confidentiality | 2 | Data are not sensitive personal data, but ordinary personal data
Integrity | 1 | Data may only be changed by me and not by anyone else
Accessibility | 3 | Data can be inaccessible in special cases, for example in the case of server breakdowns or maintenance
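If you prefer to keep the classification together with your project files, the same table can be written down as a small data structure. The sketch below is only an illustration: the class name `Classification` and its fields are invented here and are not part of any UCPH tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    element: str  # Confidentiality, Integrity or Accessibility
    level: int    # 1-4, where 1 is the highest
    note: str

    def __post_init__(self) -> None:
        # Reject values outside the 1-4 scale used in the lesson.
        if not 1 <= self.level <= 4:
            raise ValueError("level must be between 1 and 4")


table = [
    Classification("Confidentiality", 2,
                   "Data are not sensitive personal data, but ordinary personal data"),
    Classification("Integrity", 1,
                   "Data may only be changed by me and not by anyone else"),
    Classification("Accessibility", 3,
                   "Data can be inaccessible in special cases"),
]

# The element with the lowest number is the one that needs the most protection.
strictest = min(table, key=lambda row: row.level)
print(strictest.element)  # Integrity
```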
The video below shows how University of Copenhagen employees are required to work with data classification.
How do I assess whether my solution complies with the security requirements?
Once you have classified your data, you need to find a suitable storage solution. One solution, for example, could be to save your data on your computer with a backup on an external hard drive, your T drive, OneDrive etc.
Take a close look at the solution to assess whether it meets the security requirements:
Is the solution suitable for personal data?
Does the solution allow you to easily share data with your teacher, supervisor or others?
Are backups made of the data?
Do you need a code to log on and access the data?
Can you access your data when you need to?
Is the solution good enough given the risk of data being lost, changed or leaked?
Avoid USB flash drives
If your data are not confidential, no harm will be done if, for example, they are stored on an unencrypted USB flash drive, whereas it is a no-go for personal data or other data that must not fall into other people’s hands.
In general, you should avoid using USB flash drives as they are small and can therefore easily be lost. However, the risk of something unfortunate happening must be assessed in conjunction with the consequences. If the consequence is very minor, you might be able to live with a risk. You can probably live with the risk of losing a USB flash drive with data on it if you keep a copy on your computer or network drive, but not if the data are confidential. If saving personal data on a USB flash drive is your only option, you must ensure that the data are encrypted so that only people who have the code can read the data.
Use a risk/consequence table to assess your solution
We have prepared a table to help you think through your solutions and procedures. You can use the table to assess the risk of ‘something’ happening to your data in connection with your chosen solutions and procedures.
Risk/Consequence table
Columns (consequence): Unimportant, Of some importance, Severe, Critical
Rows (risk): Will occur, Very likely, Quite likely, Unlikely
Each cell in the table is coloured green, yellow or red:
Green: No changes necessary
Yellow: Introduce compensatory measures (see ‘Assessment of compensatory measures’ below)
Red: Find another solution
An example of how to use the table: Assess the risk of someone seeing personal data in hard copy format if you leave your work on your desk at home after finishing work for the day.
If the risk is ‘very likely’, because you live with others who have access to your desk, look at the row labelled ‘Very likely’. You then need to assess the consequence. Here it is severe, as you cannot be certain that the people in question won’t use the information for something (e.g. share the information with others). You will now see that the cell in the ‘Severe’ column is red.
Red means that you have to find another solution. Another solution might be (1) to only ever work with pseudonymised data on paper, so that the people in question are not able to see who the health information belongs to; (2) to scan the papers and only work with the data digitally; or (3) to lock the papers in a cabinet when you are not working with them.
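The table lookup described above can also be sketched in code. Please note that the scoring below is an assumption made purely for illustration: the lesson only confirms that ‘Very likely’ combined with ‘Severe’ is red, so the thresholds (8 and 4) and the function name `assess` are hypothetical and do not reproduce the official UCPH table.

```python
# Row and column labels are taken from the table; the numbers are assumed.
LIKELIHOOD = {"Unlikely": 1, "Quite likely": 2, "Very likely": 3, "Will occur": 4}
CONSEQUENCE = {"Unimportant": 1, "Of some importance": 2, "Severe": 3, "Critical": 4}


def assess(likelihood: str, consequence: str) -> str:
    """Return a colour and the matching action for one risk."""
    score = LIKELIHOOD[likelihood] * CONSEQUENCE[consequence]
    if score >= 8:   # assumed threshold, not from the lesson
        return "Red: find another solution"
    if score >= 4:   # assumed threshold, not from the lesson
        return "Yellow: introduce compensatory measures"
    return "Green: no changes necessary"


# Papers with personal data left on a desk in a shared home:
print(assess("Very likely", "Severe"))  # Red: find another solution
```

Always use the real table for an actual assessment. The sketch only shows the idea of weighing how likely something is against how bad it would be.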
Compensatory measures are the things you can do to modify a solution with a built-in risk – typically those marked yellow in the above table.
If, for example, you work with sensitive personal data in paper format, the papers should not be left on your desk overnight. Not even at home. This is because of the risk of someone who has nothing to do with the assignment or the project being able to see the information. Even transporting the documents from the university to your home involves a risk of loss if, for example, your bag is stolen or you accidentally leave it on a bus.
Compensatory measure 1: Do not transport your data anywhere, but keep the data in a locked cupboard in your teacher’s office when you are not working with them. Alternatively, scan the documents and keep the digital versions in a safe place so that you can work with them at home, for example.
Compensatory measure 2: Anonymise or pseudonymise your data. The identity of the data subjects will then be better protected.
Anonymisation and pseudonymisation are effective methods for protecting the persons whose data you are processing. GDPR and the Danish Data Protection Act apply only to data which have not been anonymised, and where the persons involved are thus identifiable. In other words, if you anonymise your data, they are no longer covered by the GDPR and the associated storage requirements. However, you need to be careful, because pseudonymisation is not the same as anonymisation, and pseudonymised data ARE covered by GDPR.
But what is the difference?
Pseudonymisation
In case of pseudonymisation, you remove direct identifiers from your data. This might, for example, be a name or civil registration (CPR) number, which you then replace with a pseudonym – i.e. a number or code. On a list which is kept in a different location to the data, you link the name and the pseudonym so that you will always be able to identify the person who appears in the dataset if, for example, you need to contact them.
Pseudonymisation is recommended as an extra precaution while working with your data and possibly sharing the data with other project group members, but you still have to regard the data as being confidential, and they must be protected against unauthorised access. Only after complete anonymisation can you share data freely without the consent of the data subjects.
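As a rough sketch of how pseudonymisation can work in practice, the Python example below replaces a name with a random code and returns the key list separately. The function name `pseudonymise`, the `P-` code format and the example records are all invented for illustration, and the key list must still be stored in a different, secure location.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the direct identifier in each record with a random code.

    Returns the pseudonymised records and the key list that links
    each code back to the original identifier.
    """
    key_list = {}   # code -> original identifier (store this separately)
    code_for = {}   # original identifier -> code, so repeats get the same code
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = "P-" + secrets.token_hex(4)
            while code in key_list:  # very unlikely, but avoid duplicate codes
                code = "P-" + secrets.token_hex(4)
            code_for[identifier] = code
            key_list[code] = identifier
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["pseudonym"] = code_for[identifier]
        pseudonymised.append(cleaned)
    return pseudonymised, key_list


# Fictitious example data.
interviews = [
    {"name": "Anna Example", "answer": "yes"},
    {"name": "Bo Example", "answer": "no"},
]
data, key_list = pseudonymise(interviews)
```

Only `data` should be shared with the project group. `key_list` is what makes it possible to contact a participant again, so it needs the same protection as the original data.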
Anonymisation
In the case of anonymisation, you typically also replace the direct identifier with a code, but without keeping a list linking codes and names. This means that it is NOT possible to identify the person(s) who appear in the dataset. In an anonymised dataset, it must not be possible for anyone to identify the data subjects – not even the person who created the dataset, and not even through unlawful means. As a rule of thumb, it must not be possible for you to recognise yourself if you feature in the dataset. However, it is not always enough to just replace names and civil registration numbers with codes. To prevent data from being identifiable when combined with other available information, you may also need to make the values in the data less precise. For example, exact ages can be replaced with age ranges (0-9, 10-19, 20-29 etc.). This removes some information from the dataset, but it makes identification harder.
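Replacing exact ages with age ranges can be sketched in a few lines of Python. The function name `age_range` and the default width of 10 years are assumptions chosen to match the ranges mentioned above. Grouping ages like this is only one step and does not by itself make a dataset anonymous.

```python
def age_range(age: int, width: int = 10) -> str:
    """Return the range an exact age falls into, e.g. 27 -> '20-29'."""
    if age < 0:
        raise ValueError("age cannot be negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


ages = [7, 19, 20, 34]
print([age_range(a) for a in ages])  # ['0-9', '10-19', '20-29', '30-39']
```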
Anonymising data correctly can be tricky, so ask your teacher or supervisor for advice.
Watch this short video which explains the difference between pseudonymisation and anonymisation.
The video mentions “Trump protection methods” (2:30), which refers to a law on protecting personal data against unauthorised or unlawful processing and disclosure, adopted in the USA during the Trump presidency.
How would you classify your own data, and what risks are associated with handling them?
How would it affect your choice of data storage if you collect sensitive personal data, e.g. in the form of interviews stored as audio files?
What do you need to be aware of?
Assignments on KUnet
Go to your study page on KUnet, open ‘study information’ and answer the following questions:
Where can you find information about collecting and handling personal data?
Open the page on collecting and handling personal data. Where can you find a consent form?
Read the consent form. When would you use the form in a project?
Imagine you are working in a group and will be handling personal data. In order to handle personal data in a group project, you need to fill in a form for joint data controllership. Where on KUnet can you find the template for joint data controllership?
When you start writing your bachelor project or master’s thesis, you will need to learn more about research data management. Where on KUnet can you find more information about research data management for students?
KUnet > Study Information > [Select your study portal] > Planning your studies > Rules and exemptions > How to collect and process personal data
The study information pages contain:
The rules for collecting and storing personal data.
Consent forms informing data subjects how you intend to work with their data.
The form you must sign if you collect personal data as part of a group project.
Permission forms that your supervisor must sign if you will be working with data collected from a hospital etc.
A guide to setting up OneDrive, which is where we recommend that you store non-confidential data, forms and agreements.
Information about your personal T-drive, which is where we recommend that you store confidential data.
How should you process personal data?
As a UCPH student, it is your responsibility, in collaboration with your supervisor, to collect and store personal data correctly. In order for you to complete a project involving personal data without violating applicable data protection law, it is important that you understand how to comply with the rules and work responsibly with other people’s data.