Chapter
Two

<G a t h e r i n g
D a t a >











Machine learning models are shaped by the datasets that train them. Data is not neutral—it carries the biases, omissions, and ethical concerns of its collection process. Data is primarily human. If AI models are built on human data, then the question arises,

How might we ethically collect data to programme more equitable AI futures?







The dominant approach to dataset creation within machine learning is extractive, often relying on hidden, dubious, or false consent. In contrast, feminist and ethical data methodologies emphasize transparency, agency, and communal participation. This distinction informed my process -






Extractivist Data Methodologies

Ethical + Feminist Data Methodologies

Dubious or Hidden or False Consent and Permission
Informed Consent


Hidden use of Data -
The Function of Data Sharing is not Defined or Shared
Transparency about Outcomes Generated 
from Use of Shared Data


Reproduction of Biases
through Inappropriate or Biased Labelling
Diverse and Communal Collection
Allows for Dilution of Biases








Given that yearning is a deeply personal and culturally shaped emotion, I needed a data-gathering process that was intentional, participatory, and reflective—one that did not simply extract information, but instead fostered spaces for storytelling, dialogue, and co-creation.










Designing the Workshop Framework
With guidance from Bella Day* and drawing from co-creative and activist-led workshop practices, I structured a three-part workshop aimed at ethically collecting first-hand emotional narratives. Each part was designed to:

Build an environment of trust – Creating spaces where participants could share stories comfortably and authentically.

Encourage co-creation – Positioning participants as active contributors to the dataset, rather than passive sources of data.

Experiment with multiple data forms – Separating the collected data into textual transcripts of spoken narratives and non-verbal sounds, exploring different modalities of emotional expression.

This workshop process was essential to counteract the biases embedded in conventional AI datasets. By prioritizing human-first data collection, the goal was not to extract rigid classifications of emotion, but to gather narratives in a way that remains fluid, subjective, and open-ended—aligning with the core philosophy of Machine Yearning.

Below is a fundamental flow for each part, and you can access the blueprint for each workshop here.








Y’s Website Navigator