Contextualizing Toxicity: An Annotation Framework for Unveiling Pragmatics in Conversations in Online Discussion Forums
Abstract: The role of context has attracted increasing attention in research on toxicity detection, yet interpreting toxic language remains a complex, multifaceted challenge shaped by linguistic, contextual, and social factors. Current approaches often define “context” narrowly, focusing primarily on surface lexical cues such as hate lexicons, profanity markers, or sentiment polarity. While useful, these features fail to capture the interactional dynamics, user behavior, and intentionality that shape toxic exchanges. To address this gap, this paper introduces a novel, systematic annotation framework, grounded in Speech Act Theory (Austin, 1962), designed to capture the illocutionary and perlocutionary dimensions of conversation, which remain largely unexplored in existing studies. We apply the framework to a new dataset of complete Reddit conversation threads, sampled to include discussions that turn toxic (124 conversations, 1,990 messages). We then evaluate GPT models (GPT-3, GPT-4, and GPT-5) on this challenging annotation task, providing insights into how well large language models capture the pragmatic and contextual dimensions of online toxicity.