Some kinds of data are sensitive, and cannot be shared for legal or ethical reasons. This can include:
De-identification means removing identifying data from a dataset. Once a dataset has been de-identified, the dataset can be shared without disclosing identifying information.
Removing identifiers is important to protect the confidentiality of research participants. But there is always a risk of re-identifying data, and changing technology introduces new ways to re-identify data. Managing that risk is an important part of sharing research data.
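One rough way to gauge re-identification risk is to count how many records share each combination of indirect identifiers. If a combination belongs to only one record, that record points to a single person even though names have been removed. The sketch below is illustrative only: the field names (`age_band`, `postal_prefix`), the sample records, and the helper name `smallest_group_size` are assumptions made for this example, not part of any standard or policy.

```python
from collections import Counter


def smallest_group_size(records, indirect_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given indirect identifiers."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return min(groups.values())


# Made-up records; direct identifiers such as names are already removed.
records = [
    {"age_band": "20-29", "postal_prefix": "V5T", "diagnosis": "A"},
    {"age_band": "20-29", "postal_prefix": "V5T", "diagnosis": "B"},
    {"age_band": "30-39", "postal_prefix": "V6B", "diagnosis": "A"},
]

smallest = smallest_group_size(records, ["age_band", "postal_prefix"])
print(smallest)
```

A result of 1 means at least one record is unique on those fields and may need further generalization or suppression before sharing. A larger value lowers the risk but does not remove it.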
There are several ways of approaching de-identification:
Anonymization
Anonymization refers to the processing of personal data in a manner that makes it impossible to identify individuals from them. For example, the data can be rendered down to a general level (aggregated) or converted into statistics so that individuals can no longer be identified from them. The prevention of identification must be permanent and make it impossible for the controller or a third party to convert the data back into identifiable form with the information held by them.
Pseudonymization
Pseudonymization means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific person without the use of additional information. Such additional information must be kept carefully separate from personal data. Pseudonymized data can still be used to single individuals out and combine their data from different records. Example: "Pseudonymization | Research Data Management" (ubc.ca)
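The two approaches described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended procedure: the function names `pseudonymize` and `aggregate`, the `P-` code format, and the sample records are all hypothetical. Real projects should follow their ethics board and data management plan.

```python
import secrets
from collections import Counter


def pseudonymize(records, id_field):
    """Replace each value of id_field with a random code.

    Returns (new_records, key_table). The key table is the "additional
    information" and must be stored separately from the shared data.
    """
    used = set()      # codes already handed out
    key_table = {}    # original identifier -> pseudonym
    output = []
    for record in records:
        original = record[id_field]
        if original not in key_table:
            code = "P-" + secrets.token_hex(4)
            while code in used:  # regenerate on the rare collision
                code = "P-" + secrets.token_hex(4)
            used.add(code)
            key_table[original] = code
        new_record = dict(record)  # copy so the input is not modified
        new_record[id_field] = key_table[original]
        output.append(new_record)
    return output, key_table


def aggregate(records, group_field):
    """Reduce individual records to a count per group."""
    return dict(Counter(record[group_field] for record in records))


# Made-up sample data for illustration only.
records = [
    {"name": "Alex Example", "program": "Nursing", "score": 82},
    {"name": "Sam Sample", "program": "Nursing", "score": 74},
    {"name": "Alex Example", "program": "Culinary Arts", "score": 90},
]

pseudonymized, key_table = pseudonymize(records, "name")
counts = aggregate(records, "program")
print(counts)
```

In the pseudonymized output, both "Alex Example" rows carry the same code, which shows how pseudonymized data can still single a person out and link their records. Anyone holding `key_table` can reverse the process. The aggregated counts contain no individual rows, although very small groups can still reveal individuals, so aggregation alone does not guarantee anonymity.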
TCPS2 (2022) provides the following categories as guidance for assessing the extent to which information could be used to identify an individual:
Content by Vancouver Community College Library is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.