A Gender Diverse Perspective of Bias in Large Language Models
Concerns about bias in large language models (LLMs) often focus on technical metrics, yet limited research examines how marginalized communities perceive and interpret model behavior. Through 25 in-depth interviews with participants across gender identities (non-binary, men, and women), we investigate how ChatGPT responds to gendered versus neutral prompts and how users evaluate bias in these responses. Our findings reveal striking differences: non-binary participants identified condescension, stereotyping, and identity erasure in outputs that cisgender participants often found acceptable, demonstrating that bias perception is not universal, but rather shaped by lived experience and social positioning. This differential perception reveals a fundamental limitation in current fairness evaluation: technical metrics and aggregate user feedback systematically miss harms that are visible primarily to marginalized users. Our work demonstrates that evaluating LLM fairness requires centering the perspectives of those most affected by algorithmic bias, rather than relying solely on technical detection or treating all user feedback as equivalent, and makes recommendations for how to go about this.
@inproceedings{Gaba26facct,
author = {Aimen Gaba and Emily Wall and Os Keyes and Kyle Wm Hall and Yuriy Brun and Cindy Xiong Bearfield},
title = {A Gender Diverse Perspective of Bias in Large Language Models},
booktitle = {Proceedings of the 9th ACM Conference on Fairness, Accountability, and Transparency (FAccT)},
venue = {FAccT},
address = {Montreal, QC, Canada},
month = {June},
date = {25--28},
year = {2026},
abstract = {Concerns about bias in large language models (LLMs) often
focus on technical metrics, yet limited research examines how marginalized
communities perceive and interpret model behavior. Through 25 in-depth
interviews with participants across gender identities (non-binary, men, and
women), we investigate how ChatGPT responds to gendered versus neutral
prompts and how users evaluate bias in these responses. Our findings reveal
striking differences: non-binary participants identified condescension,
stereotyping, and identity erasure in outputs that cisgender participants
often found acceptable, demonstrating that bias perception is not
universal, but rather shaped by lived experience and social positioning.
This differential perception reveals a fundamental limitation in current
fairness evaluation: technical metrics and aggregate user feedback
systematically miss harms that are visible primarily to marginalized users.
Our work demonstrates that evaluating LLM fairness requires centering the
perspectives of those most affected by algorithmic bias, rather than
relying solely on technical detection or treating all user feedback as
equivalent, and makes recommendations for how to go about this.}
}