Two
<Gathering Data>
Machine learning models are shaped by the datasets that train them. Data is not neutral: it carries the biases, omissions, and ethical concerns of its collection process. Data is primarily human. If AI models are built on human data, then the question arises:
How might we ethically collect data to programme more equitable AI futures?
Extractivist Data Methodologies | Ethical + Feminist Data Methodologies
Dubious or Hidden or False Consent and Permission | Informed Consent
Hidden Use of Data: The Function of Data Sharing is not Defined or Shared | Transparency about Outcomes Generated from Use of Shared Data
Reproduction of Biases through Inappropriate or Biased Labelling | Diverse and Communal Collection Allows for Dilution of Biases
Designing the Workshop Framework
With guidance from Bella Day* and drawing from co-creative and activist-led workshop practices, I structured a three-part workshop aimed at ethically collecting first-hand emotional narratives. Each part was designed to:
Build an environment of trust – Creating spaces where participants could share stories comfortably and authentically.
Encourage co-creation – Positioning participants as active contributors to the dataset, rather than passive sources of data.
Experiment with multiple data forms – Separating the collected data into textual transcripts of spoken narratives and non-verbal sounds, exploring different modalities of emotional expression.
This workshop process was essential to counteract the biases embedded in conventional AI datasets. By prioritizing human-first data collection, I aimed not to extract rigid classifications of emotion, but to gather narratives in a way that remains fluid, subjective, and open-ended, in line with the core philosophy of Machine Yearning.
Below is a fundamental flow for each part, and you can access the blueprint for each workshop here.