AI for e-health: Stress Identification from Multimodal Data
We present StressID, a new dataset specifically designed to support learning pipelines for stress identification from unimodal and multimodal data. It contains videos of facial expressions, audio recordings, and physiological signals. Its experimental setup ensures synchronized, high-quality multimodal data collection. Different stress-inducing stimuli, such as emotional video clips, cognitive tasks including mathematical or comprehension exercises, and public speaking scenarios, are designed to trigger a diverse range of emotional responses. The final dataset consists of recordings from 65 participants, representing more than 39 hours of annotated data in total, and is one of the largest datasets for stress identification. In addition to the dataset, we provide several baseline models for stress recognition, including multimodal predictive baselines that combine video, audio, and physiological inputs; these highlight the clear advantage of multimodal learning. Using StressID, we investigate how to build robust and reliable models for stress identification, focusing on aspects such as learning with unevenly represented modalities.
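As a minimal illustration of the kind of pipeline this enables (not the authors' released baselines), the sketch below shows one common way to combine video, audio, and physiological inputs while tolerating samples whose modalities are unevenly available: a late-fusion classifier with per-modality presence masks. The modality names, feature dimensions, and fusion scheme are all assumptions made for the example, operating on hypothetical pre-extracted feature vectors.

```python
# Hypothetical late-fusion stress classifier; assumes pre-extracted
# per-modality feature vectors and binary stress labels.
import torch
import torch.nn as nn

MODALITIES = {"video": 512, "audio": 128, "physio": 64}  # assumed feature sizes


class LateFusionStressClassifier(nn.Module):
    def __init__(self, hidden_dim: int = 64, num_classes: int = 2):
        super().__init__()
        # One small encoder per modality, mapping features to a shared space.
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, hidden_dim), nn.ReLU())
            for name, dim in MODALITIES.items()
        })
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, inputs, masks):
        # inputs[name]: (batch, feature_dim); masks[name]: (batch,) with 1 if
        # the modality is present for that sample, 0 if it is missing.
        fused, weight = 0.0, 0.0
        for name, encoder in self.encoders.items():
            m = masks[name].unsqueeze(-1).float()      # (batch, 1)
            fused = fused + m * encoder(inputs[name])  # zero out missing modalities
            weight = weight + m
        fused = fused / weight.clamp(min=1.0)          # masked average over present modalities
        return self.head(fused)


if __name__ == "__main__":
    batch = 4
    x = {name: torch.randn(batch, dim) for name, dim in MODALITIES.items()}
    # Example: the last sample is missing its physiological recording.
    m = {name: torch.ones(batch) for name in MODALITIES}
    m["physio"][-1] = 0.0
    logits = LateFusionStressClassifier()(x, m)
    print(logits.shape)  # torch.Size([4, 2])
```

Because missing modalities are masked out before the average, the same model can be trained and evaluated on samples with any subset of the three modalities, which is one simple way to handle unevenly represented modalities.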