Qualitative data

For questions, contact your faculty’s privacy officer (PO). Contact details for each faculty’s PO are listed on the Privacy Office page on My EUR.

Things to think about before data collection

One of the best ways to protect the privacy of research participants is not to collect certain identifiable information at all. While planning your research, consider data minimisation: limit the collection of personal information to data that are directly relevant and necessary to the purposes of your study. For example, if your study allows it, you can ask participants before data collection to anonymise their experiences by avoiding mention of full personal names, exact dates, employment locations, or detailed information about third persons.

Planning anonymisation at an early stage of the research (for instance, in the data management plan) will help you to identify the resources needed in the different stages of the research life cycle.

In the absence of consent, the data you disclose must be anonymous. Anonymisation is best planned early in the research process, which helps to reduce anonymisation costs. Note that anonymisation of qualitative data means balancing two priorities: protecting the identities of participants and maintaining the value and integrity of the data. Removing too much information from qualitative data such as text or audio/video recordings can distort the data, making them unusable, unreliable or misleading. To balance privacy protection against keeping the data useful, anonymisation should be considered alongside informed consent and access controls.

Planning ahead and agreeing with participants during the consent process on what may and may not be recorded or transcribed can be a much more effective way of creating data that accurately represent the research process and the contribution of participants. For example, if an employer’s name cannot be disclosed, it should be agreed in advance that it will not be mentioned during an interview. This is easier than spending time later removing it from a recording or transcript.

Personal data contains information that directly or indirectly identifies a natural person (for definitions and examples see this link). Generally speaking, direct identifiers and strong indirect identifiers need to be removed or replaced with pseudonyms. Other indirect identifiers can either be removed or categorised. In qualitative data, categorising means coarsening identifying information, which is the better choice when the indirect identifier is essential for understanding the data. For example, instead of mentioning the exact age of a participant, use a category such as [20-25 years old]. This applies to indirect identifiers such as: postal code, district/part of town, municipality of residence, region, municipality type, year of birth, age, household composition, occupation, education, mother tongue, nationality, workplace/employer, crime or punishment, and position of trust or membership, as well as all special category information.
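
The categorising step described above can be sketched in a few lines of Python. This is only an illustration: the function name `age_to_category` and the default band width of five years are assumptions made for this example, not a prescribed standard. It uses non-overlapping bands, so an age of 23 falls in [20-24 years old].

```python
def age_to_category(age: int, width: int = 5) -> str:
    """Coarsen an exact age into a bracketed range such as [20-24 years old].

    The band width is an assumption for this sketch; choose it to fit
    your own study and the size of your participant group.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if width < 1:
        raise ValueError("width must be at least 1")
    lower = (age // width) * width   # start of the band the age falls in
    upper = lower + width - 1        # inclusive end of that band
    return f"[{lower}-{upper} years old]"


print(age_to_category(23))
print(age_to_category(37, width=10))
```

Wider bands give more protection but less detail, so the width is a judgement call that depends on how essential the age is for understanding the data.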

Best practices for pseudonymisation/anonymisation of qualitative data

Audio-visual data

Anonymisation of audio-visual data, such as editing of digital images or audio recordings, should be done sensitively. Bleeping out real names or place names is acceptable, but disguising voices by altering the pitch in a recording, or obscuring faces by pixelating sections of a video image, significantly reduces the usefulness of the data. These processes are also highly labour-intensive and expensive.

If confidentiality of audio-visual data is an issue, it is better to obtain the participant’s consent to use and share the data unaltered. Where anonymisation would result in too much loss of data content, regulating access to the data may be a better strategy.

Interview transcripts

Plan anonymisation and experiment with a couple of files at the time of transcription or initial write-up. Longitudinal studies may be an exception if relationships between waves of interviews need special attention for harmonised editing.
Use pseudonyms or generic descriptors to edit identifying information, rather than blanking out that information.
Use pseudonyms or replacements that are consistent throughout the research team and the project. For example, use the same pseudonyms in publications and follow-up research.
Identify replacements in text clearly, for example with [brackets] or using XML tags such as <seg>word to be anonymised</seg>.
Use 'search and replace' techniques carefully so that unintended changes are not made, and misspelled words are not missed.
Create a copy of the files to be anonymised and anonymise the copied files. This way, possible errors in anonymisation can still be fixed.
Back up the original unedited version of the files (but store them separately) for use within the research team and for preservation. For persons who have both the unedited version and the anonymised version, the data is pseudonymised.
Create a pseudonymisation key (also known as an anonymisation log) of all replacements, aggregations or removals made and store such a log securely and separately from the anonymised data files.
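
Several of the practices above (bracketed replacements, consistent pseudonyms, careful search and replace, and a log of changes) can be combined in a short script. The following is a minimal sketch using only Python's standard `re` module. The function name `pseudonymise`, the sample transcript and the replacement table are all hypothetical and exist only for illustration.

```python
import re


def pseudonymise(text, replacements):
    """Replace each identifier with a bracketed replacement.

    Returns the edited text and a log of what was replaced and how often.
    `replacements` maps an original identifier to its pseudonym or
    generic descriptor (hypothetical values in this sketch).
    """
    log = []
    # Handle the longest identifiers first, so "Anna Jansen" is replaced
    # before the shorter "Anna".
    for original in sorted(replacements, key=len, reverse=True):
        replacement = f"[{replacements[original]}]"
        # \b word boundaries prevent changing "Annabel" when replacing "Anna".
        pattern = re.compile(rf"\b{re.escape(original)}\b", flags=re.IGNORECASE)
        # A function is used as the replacement so it is inserted literally.
        text, count = pattern.subn(lambda match: replacement, text)
        log.append({"original": original, "replacement": replacement, "count": count})
    return text, log


# Work on a copy of the transcript text, never on the original file.
transcript = "Anna Jansen said that Anna and Annabel both work at Acme Corp."
table = {"Anna Jansen": "Participant 1", "Anna": "Participant 1", "Acme Corp": "Employer A"}

edited, log = pseudonymise(transcript, table)
print(edited)
for entry in log:
    print(entry)
```

Exact matching like this does not catch misspelled names, so reading the edited transcript remains necessary. The log contains the original identifiers, so it is itself personal data and should be stored securely and separately from the anonymised files.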

Step by step anonymising qualitative data

Find and highlight direct identifiers by reading the transcript.
Assess indirect identifiers:
- Can the identity of a participant be known from information in the data file?
- Can a third party be disclosed or harmed from information in the data file?
Assess the wider picture:
- Which identifying information about an individual participant can be noted from all the data and documentation available to a user?
Remove (or pseudonymise) direct identifiers.
Decide which indirect identifiers are essential for understanding the data, then redact or categorise the indirect identifiers.
Re-assess any remaining disclosure risk.
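
Part of the final re-assessment can be supported by a simple check that scans the anonymised text for identifiers that should no longer be there. The sketch below is an assumption-laden illustration, not a complete risk assessment: the function name `find_leftovers` and the sample data are hypothetical, and it only finds identifiers you already know about.

```python
import re


def find_leftovers(anonymised_text, known_identifiers):
    """Return (identifier, line_number) pairs for identifiers still present.

    Matching is case-insensitive and deliberately loose (no word
    boundaries), so near matches are flagged for a human to review.
    """
    hits = []
    for line_number, line in enumerate(anonymised_text.splitlines(), start=1):
        for identifier in known_identifiers:
            if re.search(re.escape(identifier), line, flags=re.IGNORECASE):
                hits.append((identifier, line_number))
    return hits


sample = "[Participant 1] works at [Employer A].\nLater, anna mentioned her manager."
print(find_leftovers(sample, ["Anna", "Acme Corp"]))
```

A check like this cannot judge whether a combination of indirect identifiers still points to one person, so it complements rather than replaces a careful read of the data and its documentation.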
