Reproducibility vs Ethics
In the past few years, public conversation about the collection and use of personal data by governments and tech companies has focused largely on the political and legal aspects of big data. A parallel but distinct debate arises in scientific research that uses sensitive geographic data, where growing calls for reproducibility and transparency must be balanced against calls for stronger ethical protection. Mei-Po Kwan’s recent talk for the American Geographical Society highlights these issues in the context of research and policy related to Covid-19. The pandemic has upended the previous landscape of data collection and publishing, as individuals, governments, and institutions weigh the risks of publishing potentially sensitive data against the rewards of improved public health through measures like contact tracing. Crucially, these changes have differed drastically across social, political, and geographic contexts. While there has been a general increase in willingness to collect and share personal data, this has not occurred in the same way or for the same reasons everywhere, and there has also been a strong reaction against any increased data use. The risks and rewards of using sensitive data also vary greatly according to the population and process being studied. Researchers have made important advances, in both technology and methodology, in their ability to mitigate the risks of using sensitive data, and these techniques are themselves dependent on the data and research question at hand. All of this is to say that the ethical concerns of doing reproducible research, as well as the methods used to address those concerns, are highly context-specific and demand ethical policies that can adapt to the requirements of each study. The same is true of the research itself and the aspects of study design that make it reproducible.
Making research reproducible can also be an ethical obligation in itself, since many of the potential benefits to those whose data is used in research are lost if the insights produced from that data are not supported by reproduction or replication. Ultimately, it is public policy and culture that have the most power to solve public issues and improve conditions for vulnerable populations, and reproducibility is essential for guiding policy and building public trust in scientific research. Improving ethics and privacy in big data research absolutely deserves time, energy, and money, but ethical research does no good if it is lost to the ether of poorly documented methodology and the proverbial ‘file drawer’. This does not mean that reproducibility should take precedence over ethics in research, but rather that questions of ethics and reproducibility are tightly bound and should be addressed together.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.