Understanding novel and traditional data source trade-offs in quality, utility, and governance
Special Track Chairs: Leigh Anderson, Francisco Santamarina and Ananditha Raghunath (University of Washington)
Most countries conduct a census at least once every ten years, and several multi-country household surveys, including the Living Standards Measurement Study and the Demographic and Health Surveys, collect data every three or so years on a variety of welfare and other measures used for country and global purposes, such as tracking the Sustainable Development Goals. These surveys generally provide rich, nationally representative household data that are more valid and reliable than many administrative estimates and can be disaggregated by sub-population, such as female-headed households. But gathering these data is expensive. Novel data sources that have emerged – satellite imagery, social media, the internet – tend to be less resource intensive, as the infrastructure is already in place and data are collected with use, almost incidentally rather than intentionally. But less is documented about the quality and potential biases of novel data relative to traditional data sources such as administrative data and household surveys.
The utility of data includes their technical accessibility and use. Two issues are how raw or micro-data are used to construct summary indicators and to train machine learning algorithms. The construction decisions that translate data into summary statistics, including what unit of analysis and aggregation rule are used, what appears in a ratio’s denominator, how nominal values are treated, and what time frame is chosen, can change both the magnitude and the direction of estimated change. These issues are particularly salient for low- and middle-income countries (LMICs), whose “progress” is tracked and evaluated against more than a hundred indicators underlying the Sustainable Development Goals. To the extent that these numbers influence decision-making and resource allocation, they matter. As noted in a recent Nature editorial (Mathieu, 2022): “Over and over, I’ve seen governments emphasize making dashboards look good when the priority should be making data available.”
Additionally, new machine learning methods offer potentially cost-effective ways to measure, track, and evaluate multiple indicators and outcomes. Presently these methods are largely trained on traditional data. One challenge will be to exploit cost-effective new sources and methods while maintaining valid, accessible, and inclusive measures, and while resisting the opportunistic use of data and algorithms that exclude sub-populations and can embed other biases.
Ultimately, investments can be made to increase quality and technical capacity, though broad accessibility will likely always remain a challenge for more complex data sources. There are also political and resource issues in the construction, maintenance, and use of official data that will drive governance decisions at the local and global levels. Different national or governing values around security and privacy trade-offs, individual and collective rights, and public versus commercial incentives will influence decisions around collecting, protecting, and presenting national data. Papers in this track will compare novel and traditional data sources, covering quality, utility, or governance considerations.
Papers might cover:
- How advanced technologies might help LMICs “leapfrog” some data challenges, for example through Artificial Intelligence, blockchain, the Internet of Things, data visualization and analytics infrastructures, and cloud and mobile technologies
- How LMICs can own and use their own data for measurement and evaluation
- Data biases of particular concern to LMICs as economies transform
- Use cases of LMIC local and national governance models and frameworks for data
We invite high-quality submissions that employ quantitative, qualitative, and mixed-method research approaches.
Papers may fit into an existing special edition on Data for Development.