The internet has been described as a social phenomenon, a tool and also a (field) site for research. When processing large amounts of data in your research, bear in mind that all big data research on social, medical, psychological, and economic phenomena engages with human subjects; all these data are people. As a researcher, you have the ethical responsibility to minimise potential harm to them.
A number of issues have been raised in research that is internet-mediated and/or uses social media data. These relate, among others, to:
- whether all data that are available are also public and whether it is fair to use them in research;
- meeting conditions of free and voluntary informed consent in the context of social media research;
- anonymity;
- risk of harm through tracing or exposing the social media user’s identity and profile;
- uncertainty about whether some users being studied are children or belong to other vulnerable groups.
In using social media data in your research, bear in mind that even data sets comprising thousands of tweets involve human beings who could be directly or indirectly affected by research. There is considerable evidence that even anonymised data sets may make individuals identifiable if they contain enough personal information. Research with anonymised data sets may also cause harm to a group through, for instance, discrimination against or stigmatisation of entire populations. Consider the ‘mosaic effect’ if you plan to combine large amounts of data from various sources that, in isolation, appear not to be attributable to particular individuals. While such data may look relatively harmless in their own right, they may cause a breach of privacy when combined.
Because government data sets are available and machine-readable, the mosaic effect is also relevant in social research: disparate threads can easily be combined in a way that yields private information, or information that could be harmful to individuals if placed in a new context. Linking diverse sources of social media data can produce the same effect.
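As a rough illustration of the mosaic effect, the Python sketch below counts how many records share the same combination of attributes before and after two sources are linked. The records are made up and the helper names (`smallest_group_size`, `link`) are hypothetical. A smallest group size of 1 means that at least one record is unique on those attributes and is therefore easier to re-identify. This is a simplified, k-anonymity-style check under stated assumptions, not a complete risk assessment.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same values
    for the given attributes (the 'k' in k-anonymity)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def link(left, right, key):
    """Inner join of two lists of dicts on a shared key."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical, made-up records: neither source names anyone.
posts = [
    {"user": "u1", "city": "Ghent"},
    {"user": "u2", "city": "Ghent"},
    {"user": "u3", "city": "Lyon"},
    {"user": "u4", "city": "Lyon"},
]
profiles = [
    {"user": "u1", "birth_year": 1990},
    {"user": "u2", "birth_year": 1985},
    {"user": "u3", "birth_year": 1990},
    {"user": "u4", "birth_year": 1990},
]

# Each city appears twice, so no record is unique on 'city' alone.
k_before = smallest_group_size(posts, ["city"])

# After linking, some (city, birth_year) pairs occur only once.
combined = link(posts, profiles, "user")
k_after = smallest_group_size(combined, ["city", "birth_year"])

print(k_before, k_after)
```

A drop in the smallest group size after linking is a signal that the combined data set deserves a closer look before it is analysed or shared.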
If, on assessing the risks, you anticipate any risk of harm to individuals whose data you are using, you must:
- paraphrase all data that will be republished (to prevent others being led to the individual’s online profile);
- seek informed consent from people whose data you intend to use in its original form in research outputs; or
- consider a more traditional research approach that better ensures consent and confidentiality.
Remember that the fact that data are publicly accessible does not mean that they can be processed by anyone for any purpose. When ascertaining whether data are open for use or are to be considered private, bear in mind the online environment where they are posted and the reasonable expectations of privacy which the user may have. Password-protected profiles and closed group discussions are obviously intended by their users to be private.
When processing social media platform data:
- make sure you are sensitive to the issues raised;
- comply with the EU General Data Protection Regulation (GDPR);
- consult your host institution's data protection officer and/or ethics advisor;
- find out if you need to obtain ethical approval for collecting data.
For more information, please refer to the document Ethics and Data Protection (Section: The use of previously collected data (‘secondary use’)). The GDPR includes specific safeguards related to automated processing or profiling of personal data.
References to more literature on social media and big data research
- Annette Markham & Elizabeth Buchanan. Ethical Decision-Making and Internet Research. Recommendations from the AoIR Ethics Working Committee (Version 2), p.3. Available at: http://www.aoir.org/reports/ethics2.pdf
- Matthew Zook et al. (2017). Ten simple rules for responsible big data research. Editorial. PLOS Computational Biology, March 30, 2017, p.1. Available at: http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005399&type=printable
- Leanne Townsend & Claire Wallace. Social Media Research: A Guide to Ethics. University of Aberdeen. Available at: https://www.gla.ac.uk/media/media_487729_en.pdf
- David E. Pozen (2005). The Mosaic Theory, National Security, and the Freedom of Information Act. The Yale Law Journal 115:3, Dec 2005, pp. 628-679. Available at: https://www.yalelawjournal.org/note/the-mosaictheory-national-security-and-the-freedom-of-information-act
- Adam Mazmanian (2014). The mosaic effect and big data. The Business of Federal Technology, May 13, 2014. Available at: https://fcw.com/articles/2014/05/13/fose-mosaic.aspx