LibGuides: Research Data Management Strategy: Secure and Organize

De-Identification

Articles

Current recommendations/practices for anonymising data from clinical trials in order to make it available for sharing: A scoping review
Exchanging words: Engaging the challenges of sharing qualitative research data

Access Control

Open Access - data can be accessed online by any user.
Registered Users - data are accessible only to users who have registered with the repository. Suitable when there may be a risk of linking indirect identifiers.
Restricted Access - data are accessible on request. Suitable for sensitive data.
Embargo - access to data is temporarily restricted.

Data Protection Terminology

Some kinds of data are sensitive and cannot be shared for legal or ethical reasons. These can include:

Personal identifiers
Sensitive ecological data
Sacred or protected cultural practices

De-identification means removing identifying data from a dataset. Once a dataset has been de-identified, the dataset can be shared without disclosing identifying information.

Removing identifiers is important for protecting the confidentiality of research participants. However, there is always some risk that data can be re-identified, and changing technology introduces new ways to do so. Managing that risk is an important part of sharing research data.

There are several ways of approaching de-identification:

Anonymization

Anonymization refers to the processing of personal data in a manner that makes it impossible to identify individuals from them. For example, the data can be rendered down to a general level (aggregated) or converted into statistics so that individuals can no longer be identified from them. The prevention of identification must be permanent and make it impossible for the controller or a third party to convert the data back into identifiable form with the information held by them.

Example: Anonymization | Research Data Management (ubc.ca)
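
As a minimal sketch of the aggregation approach described above, the following Python snippet drops the direct identifier and reports only counts per age band. The records, the field names, and the age_band and aggregate helpers are hypothetical and chosen purely for illustration; they are not taken from the linked UBC example. Aggregation alone does not guarantee anonymity, since very small groups may still point to an individual.

```python
from collections import Counter

# Hypothetical participant records; names and fields are illustrative only.
records = [
    {"name": "Participant One", "age": 34},
    {"name": "Participant Two", "age": 37},
    {"name": "Participant Three", "age": 52},
    {"name": "Participant Four", "age": 58},
    {"name": "Participant Five", "age": 31},
]


def age_band(age, width=10):
    """Generalize an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def aggregate(rows, width=10):
    """Return counts per age band; names and exact ages are not carried over."""
    return dict(Counter(age_band(row["age"], width) for row in rows))


summary = aggregate(records)
print(summary)
```

Only the summary would be shared in this sketch; the original records stay with the research team or are destroyed according to the project's data management plan.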

Pseudonymization

Pseudonymization means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific person without the use of additional information. Such additional information must be kept carefully separate from personal data. Pseudonymized data can still be used to single individuals out and combine their data from different records.

Example: Pseudonymization | Research Data Management (ubc.ca)
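
One way to picture pseudonymization is the sketch below, which replaces a direct identifier with a random code and returns the linking table as a separate object. The pseudonymize function, the "P-" code format, and the sample visits are assumptions made for illustration, not a prescribed method. In practice the key table would need to be stored separately from the coded data, with its own access controls.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace id_field with a random code.

    Returns (coded_records, key_table). The key table maps each original
    identifier to its code and must be kept apart from the coded records.
    """
    key_table = {}
    used_codes = set()
    coded = []
    for record in records:
        identifier = record[id_field]
        if identifier not in key_table:
            code = "P-" + secrets.token_hex(4)
            while code in used_codes:  # regenerate on the rare collision
                code = "P-" + secrets.token_hex(4)
            used_codes.add(code)
            key_table[identifier] = code
        # Copy every field except the direct identifier.
        row = {k: v for k, v in record.items() if k != id_field}
        row["participant_code"] = key_table[identifier]
        coded.append(row)
    return coded, key_table


# Hypothetical data: the same participant appears in two visits.
visits = [
    {"name": "Participant One", "visit": 1, "score": 12},
    {"name": "Participant Two", "visit": 1, "score": 15},
    {"name": "Participant One", "visit": 2, "score": 14},
]
coded_visits, key_table = pseudonymize(visits)
```

Both visits by the same participant receive the same code, which shows why pseudonymized data can still be used to single individuals out and combine their records.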

Types of Information

TCPS2 (2022) provides the following categories as guidance for assessing the extent to which information could be used to identify an individual:

Directly identifying information – the information identifies a specific individual through direct identifiers (e.g., name, social insurance number, personal health number).
Indirectly identifying information – the information can reasonably be expected to identify an individual through a combination of indirect identifiers (e.g., date of birth, place of residence or unique personal characteristic).
Coded information – direct identifiers are removed from the information and replaced with a code. Depending on access to the code, it may be possible to re-identify specific participants (e.g., the principal investigator retains a list that links the participants' code names with their actual names so data can be re-linked if necessary).
Anonymized information – the information is irrevocably stripped of direct identifiers, a code is not kept to allow future re-linkage, and risk of re-identification of individuals from remaining indirect identifiers is low or very low.
Anonymous information – the information never had identifiers associated with it (e.g., anonymous surveys) and risk of identification of individuals is low or very low.
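
To illustrate how a combination of indirect identifiers can point to one person, the sketch below counts how many records share each combination and flags combinations that occur only once. This check is not part of TCPS2; the function names, the field names, and the sample data are hypothetical, and the code does not decide what counts as a "low" risk.

```python
from collections import Counter


def group_sizes(records, indirect_identifiers):
    """Count how many records share each combination of indirect identifiers."""
    return Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )


def unique_combinations(records, indirect_identifiers):
    """Return the combinations that appear in exactly one record."""
    sizes = group_sizes(records, indirect_identifiers)
    return [combo for combo, count in sizes.items() if count == 1]


# Hypothetical survey rows with direct identifiers already removed.
sample = [
    {"birth_year": 1980, "city": "Kelowna", "response": "yes"},
    {"birth_year": 1980, "city": "Kelowna", "response": "no"},
    {"birth_year": 1975, "city": "Kelowna", "response": "yes"},
]
flagged = unique_combinations(sample, ["birth_year", "city"])
print(flagged)
```

A flagged combination suggests that the record may need further generalization or restricted access before sharing; the appropriate response depends on the dataset and the applicable ethics guidance.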
