Special Track 5

Data and AI: critical global perspectives on the governance of datasets used for artificial intelligence

Special Track Chairs:

Ben Snaith, Open Data Institute
Elena Simperl, Open Data Institute and King's College London
Lee Tiedrich, Duke University
Maroussia Lévesque, Harvard University

Description

All forms of artificial intelligence, including large language models, machine learning and automated decision-making, rely on models trained on collected data in which they identify patterns. These datasets originate from a variety of sources, which differ in how open they are to external scrutiny. Some come from proprietary research studies; more often they are scraped from the web, a practice that is skyrocketing with the rise of generative AI.

Across different sectors, there have been calls for more openness regarding these datasets (where and how they are collected, stored and used) and for increased data skills across the population to help people understand how and where AI is operating and what data is feeding these models. Many of these datasets have been found to contain explicit and implicit biases related to gender, ethnicity, sexuality, class or geography, which can lead to harmful impacts when AI is used in practice. Developments in LLMs and generative AI have led to work by artists being appropriated and their styles imitated, with little recourse for the artists affected. Additionally, global warnings issued by data protection authorities have underscored the privacy concerns presented by data ingestion, and litigation and enforcement on these matters continue to escalate. The calls for governance of these technologies and datasets have not been adequately addressed and now, in 2023, the gap between governance and the advancement of technology feels more substantial. To fill this gap, 2023 has spawned a number of new collaborations, structures and summits in a bid to impose new norms on AI.

This track is designed to provide a platform for understanding potential theories of governance for the datasets underpinning AI and what the future of local, national, regional and international governance might look like. It will also examine how to align laws and policies with promising technical solutions, which is key to operationalizing responses to this pressing need. The research can focus on an individual sector, geography, service, project or dataset to elucidate considerations for dataset governance. It can also discuss the development of governance for AI datasets in relation to other technologies, and whether the present focus on AI is justified. Finally, the research can address how to craft laws and policies that promote responsible AI and data practices, including by protecting privacy and cybersecurity and by helping to ensure that AI is fair, explainable, safe and otherwise aligned with proposed AI principles, such as those put forward by the OECD.

The aim of this special track is to attract research that explores and deepens the understanding of AI datasets, of the theory, design and practice of dataset governance, and of how policymakers should address these designs and practices. Possible topics include, but are not limited to:

Responsible data: how to utilise law and policy to promote responsible data practices regarding the collection, storage and use of data for AI
Institutions for governance: the role of institutions in designing and implementing data-based governance and the literacy required to do so
Regional and place-based dynamics: how geographies and geopolitical dynamics lead to conflict and cooperation around AI datasets between different areas, countries, cultures/languages or regions
Data commons and shared datasets: mechanisms that encourage the responsible and voluntary sharing of datasets among institutions
Assurance and trustworthiness of AI datasets: the role of automated and human-led assurance of datasets to ensure safety, transparency, fairness, accountability and trust
Open source and open data: the effectiveness of using open-source code and models to increase the transparency of AI training datasets and algorithms
Alternative models of data governance for AI: new frameworks for ethics and governance centred on justice, Indigenous values and gender
Power asymmetries within the collection of data for AI: the unequal distribution of power between those who collect data and those who use it to train AI models
Dataset distortion: mechanisms and technical approaches that support compliance with legal requirements and ethical goals, or protect users from harmful practices