By Marcus Durand on Monday, January 22nd, 2018 in Blog Posts, Health IT, Latest Updates.
This piece is the third installment of a six-part series called “Open to potential: How embracing open data can advance public health practice, governance, and research.” Links to Parts 1 and 2 appear at the end of this post.
Open data holds potential for public health on a number of different levels. It is a relatively new way of sharing and using data and it can be difficult to distinguish between optimism and hype, because that potential remains largely unexplored. However, a number of different initiatives have been testing the waters and proofs of concept are not difficult to find. One example is a project undertaken by Allegheny County to better understand cardiovascular disease mortality in Pittsburgh. The health department, together with the University of Pittsburgh, is combining housing, environmental, restaurant and Medicaid data to “deepen Allegheny County’s understanding of where — and why — people are dying from cardiovascular disease”:
There’s census-tract-level information on where people are taking anxiety medication and data on establishments that have Clean Indoor Air Act exemptions, meaning people can smoke inside. Fast food establishment info includes where there’s a dollar menu. There’s also data on obesity, poor housing conditions, traffic counts and walk scores.
…
With the data, the University of Pittsburgh is using its platform, the Framework for Reconstructing Epidemic Dynamics (FRED), to “model how different census tracts look with regard to different cardiovascular risk factors as well as outcomes in terms of mortality,” Hacker said. “Then we’re going to look to see if any of these social determinants may actually contribute to the differences between those different census tracts. That will help us in terms of planning for interventions.”
The Allegheny County project and others like it underscore the wide variety of data types that can be useful for public health research, policy, and practice. “Classic” public health data, such as CDC NCHS Vital Statistics and The DHS Program Health Survey Data, are incredibly valuable to public health work done by researchers both inside and outside government. For example, data from the CDC’s National Health and Nutrition Examination Survey (NHANES), one of the largest and most comprehensive health survey programs in the world, led to a policy mandating the removal of lead from gasoline. Publicly available data from the survey has allowed researchers to monitor declines in blood lead levels since then and demonstrate the effectiveness of the regulation. However, as the Allegheny County project shows, many other kinds of publicly available data can help to answer public health questions, including data sources related to demographics and segregation, housing, land use, disability, education, employment, environment, and public safety. Even social media can be useful, as demonstrated by its use in predicting county health outcomes.
One of the ways that open data can contribute to public health, and the one generating the most excitement among open health data enthusiasts, is by helping practitioners identify problems they did not know about. One study that did exactly that, published in JAMA and discussed on NPR, was a county-level analysis of life expectancy by a group of researchers at the Institute for Health Metrics and Evaluation. Using CDC mortality data and population estimates from the US Census Bureau to calculate life expectancy by county, they discovered that the gap between the counties with the highest and lowest life expectancies at birth was a whopping 20 years, and that the gap has widened over the last three decades. Similarly, the CDC 500 Cities project hopes to help local-level public health jurisdictions identify public health challenges (and better understand the risk factors that contribute to them) by providing 27 measures of chronic disease collected by the CDC Behavioral Risk Factor Surveillance System (BRFSS) on the census-tract level. In addition to being available to researchers for download and analysis, the program will also make the data available to policy makers, journalists, and the public by allowing users to visualize it using an interactive data portal.
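For readers curious how death counts and population estimates turn into a life expectancy figure, here is a minimal sketch of a textbook abridged period life table. It is not the method used in the JAMA study, which is not described in this post and was very likely more sophisticated. The function name, the age groups, and all of the county numbers below are made up for illustration.

```python
def life_expectancy_at_birth(widths, deaths, population):
    """Estimate period life expectancy at birth from age-grouped data.

    widths      -- width in years of each age group; the last group is
                   open-ended (e.g. 85+), so its width is ignored
    deaths      -- deaths observed in each age group over one year
    population  -- mid-year population estimate for each age group

    Assumption: within each closed age group, people who die live through
    half of the interval on average.
    """
    if not widths or not (len(widths) == len(deaths) == len(population)):
        raise ValueError("inputs must be non-empty and of equal length")

    alive = 1.0          # fraction of a hypothetical cohort still alive
    person_years = 0.0   # years lived by the cohort, per person born
    last = len(widths) - 1

    for i in range(len(widths)):
        rate = deaths[i] / population[i]  # age-specific death rate
        if i == last:
            # Open-ended group: everyone remaining eventually dies, so the
            # expected remaining years per survivor is 1 / rate.
            if rate <= 0:
                raise ValueError("open-ended group needs a positive death rate")
            person_years += alive / rate
        else:
            n = widths[i]
            # Convert the death rate into a probability of dying in the group.
            q = min(1.0, n * rate / (1 + 0.5 * n * rate))
            dying = alive * q
            # Survivors live all n years; those who die live n / 2 on average.
            person_years += n * (alive - dying) + 0.5 * n * dying
            alive -= dying

    return person_years


# Made-up data for two hypothetical counties.
# Age groups: 0-14, 15-44, 45-64, 65-84, 85+
widths = [15, 30, 20, 20, None]
pop = [20000, 40000, 25000, 12000, 2000]
county_a_deaths = [10, 40, 150, 480, 300]
county_b_deaths = [20, 100, 300, 720, 400]

e_a = life_expectancy_at_birth(widths, county_a_deaths, pop)
e_b = life_expectancy_at_birth(widths, county_b_deaths, pop)
print(f"Gap between the two counties: {e_a - e_b:.1f} years")
```

Real analyses use much finer age groups and have to deal with small counties where a handful of deaths can swing the rates, which is one reason carefully collected and cleaned public data matters so much.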
Using open data to better address the problems we already know about is somewhat less flashy, but no less important. The Allegheny County project is a great example, as county health officials can use the results of the analysis conducted by their University of Pittsburgh colleagues to better target policies (such as zoning or housing laws) that can have a positive impact on cardiovascular disease rates. Another study in the AJPH that demonstrates the potential of social media data recently came out of the University of Maryland. The researchers took a sample of 80 million geotagged tweets, analyzed them for indicators of happiness, food, and physical activity, and compared them to county-level health indicators using data from CDC WONDER. According to the lead author, “Twitter may be useful as an additional source of information on the health of communities by helping to detect health concerns or evaluate the success of health interventions. Moreover, Twitter can be potentially used to watch changes in real time.” Open data maps are also frequently used with social media in disaster response to direct first responders to where help is most urgently needed.
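To make the idea of comparing tweet-derived indicators with county-level health data a little more concrete, here is a minimal sketch that joins two data sets by county identifier and computes a simple Pearson correlation. This is not the analysis from the University of Maryland study, whose methods are not described in this post. The function name, the county identifiers, and every number below are hypothetical.

```python
from math import sqrt


def pearson_by_county(tweet_scores, health_rates):
    """Correlate two county-level measures keyed by county identifier.

    Only counties present in both dictionaries are used, which mirrors the
    join step needed whenever two open data sets are combined.
    """
    shared = sorted(set(tweet_scores) & set(health_rates))
    if len(shared) < 2:
        raise ValueError("need at least two counties present in both data sets")

    xs = [tweet_scores[c] for c in shared]
    ys = [health_rates[c] for c in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a measure with no variation cannot be correlated")

    return cov / sqrt(var_x * var_y)


# Made-up example: share of tweets mentioning physical activity (percent)
# and an obesity rate (percent) for four hypothetical counties.
activity_tweets = {"county_1": 2.1, "county_2": 3.4, "county_3": 1.2, "county_4": 4.0}
obesity_rate = {"county_1": 31.0, "county_2": 27.5, "county_3": 35.2, "county_4": 25.1}

r = pearson_by_county(activity_tweets, obesity_rate)
print(f"Correlation across {len(activity_tweets)} counties: {r:.2f}")
```

A correlation like this only hints at a relationship. It says nothing about cause, and it inherits every bias in who tweets and where.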
The possibilities are as exciting as they are wide-ranging. However, notice that all of the high-quality analyses referenced here rely on, or are combined with, data collected and cleaned by publicly funded government agencies and their contract service providers. Any data set is subject to the limitations that come from its sources and the methods used to collect it, and user-generated data in particular (like social media) comes with its own set of asterisks. There is simply no substitute for high-quality public health survey and surveillance data. Google Flu Trends, which used search data to try to predict flu outbreaks, was not open data per se, but its abject failure, as reported in Harvard Business Review, brought into sharp focus the limits of user-generated data, which, unlike data systematically collected by public health agencies, has no quality checks. As we look at ways to effectively leverage open data sources, it is important to keep in mind that there will always be things that non-governmental open data cannot do.
This piece is the third installment of a six-part series: “Open to potential: How embracing open data can advance public health practice, governance, and research.”
See part 1 of this series: Open Data: What is it and Why is it Important to Public Health?
See part 2 of this series: Open Data: Why Knowledge Management is Critical to its Sustainability