By Marcus Durand on Thursday, May 3rd, 2018 in Blog: Collaboration and Knowledge Management, Health IT, Latest Updates.
This piece is the fifth installment of a six-part series called “Open to potential: How embracing open data can advance public health practice, governance, and research.” See links to the previous Parts 1-4 below, at the base of this post.
As a longtime member of the International Health Section of the American Public Health Association, I have led a team of volunteer researchers over the past two years to collect and analyze data for the Global Health Jobs Analysis project. The project grew out of a pair of conversations I had at the 2015 Annual Meeting in Chicago. We assembled a team to study the global health job market for graduates of U.S.-based MPH programs, with the aim of giving the Section’s students and aspiring global health professionals insight into this highly competitive field. The project team was committed to making both the results and the underlying data available to our members. We conducted every step of the process – data collection, analysis design, and statistical computing – with the goal of making it as transparent as possible, so that anyone who wanted to could view the data and retrace our steps. We also chose to submit the manuscript only to journals that would make that possible.
We were thrilled when our paper was accepted and eventually published by BMC Public Health, an open access journal that makes its peer and editorial review process visible to the public. One of the submission requirements is that the data be placed in an open data repository, if possible, to allow peer reviewers and other researchers to better scrutinize the analysis and test for reproducibility. Our team had already committed to doing that, so there was no resistance to setting up our data repository. Nonetheless, doing so gave us a window into the amount of extra effort involved – as well as a sense of the tectonic shift happening in the research community as advocates try to make data sharing the norm. Publisher Springer Nature has produced an infographic showing the benefits of open research data.
Making research data open (within the boundaries of ethical obligations such as ensuring patient confidentiality) has gained significant traction over the past five years. This is due in part to policies established by the U.S. and European governments to make the data produced by publicly funded research open. As we mentioned in the previous installment of this series, open government advocates have made a compelling case that the public should have access to the data that their money goes toward creating. Proponents of open research have also argued that opening research data strengthens the scientific process. In addition to holding researchers accountable to donors, making the data available helps to ensure integrity and reproducibility.
However, most things worth doing take a lot of work and encounter their own unique challenges along the way. Open research data is no exception. As valuable as a data repository is to the research community at large, setting one up – and keeping it current – involves a lot of extra work. Government programs can integrate open data into their operations systematically; in research, by contrast, the burden of maintaining a data repository falls to individual scientists or research teams. This can be a daunting task, particularly when study authors are already stretched to capacity with the laborious processes of manuscript submission and peer review.
Furthermore, disincentives to share research data are baked into the research industry. A highly vocal minority in the community has pushed back against the open research data movement, arguing that requiring scientists to make their data available will rob them of the chance to publish multiple studies using the data that they initially prepared, and will put them at a competitive disadvantage for grants:
“A key motivation for investigators to conduct RCTs is the ability to publish not only the primary trial report, but also major secondary articles based on the trial data,” the researchers wrote. “The original investigators almost always intend to undertake additional analyses of the data and explore new hypotheses. … Once the investigators who have conducted the trial no longer have exclusive access to the data, they will effectively be competing with people who have not contributed to the substantial efforts and often years of work required to conduct the trial.”
While such arguments are not necessarily mainstream thinking, they are common enough to discourage junior researchers who might otherwise be willing to share their data.
Rather than relying on individual researchers to move open research data forward, journals and funders can address these issues by requiring scientists to make their data publicly available. This would go a long way toward creating a culture of open data, in much the same way that organizations must take a holistic approach to establishing knowledge management as a norm in their structure and operations. A handful of high-profile publications blazed this trail, and many others are now following suit. The PLOS journal series has chronicled its own transition, explaining how it integrated open data principles into every step of the publication and review process – and the changes seen as the research community has adapted. Interestingly, observers have drawn connections between data management and research information management (RIM) for research libraries, noting how essential these are for reproducibility:
In the years before the National Science Foundation (NSF) released its data management plan (DMP) requirement, libraries and library organizations were building socio‐technical infrastructure for data management services, and more broadly, E‐Science support, in the information science profession. […] Ideologically, studies have argued, data management is similar to information management and is something libraries and librarians know much about.
See part 1 of this series: Open Data: What is it and Why is it Important to Public Health?
See part 2 of this series: Open Data: Why Knowledge Management is Critical to its Sustainability
See part 3 of this series: Open Data: Inspiring Action and Improving Practice in Public Health
See part 4 of this series: Open Data in Government: Transparency, Accountability, & Improving Service Delivery