3IA PhD/Postdoc Seminar #22

Published on January 26, 2023 Updated on January 27, 2023

on February 3, 2023

from 10:30am to 12:00pm
Nice, Valrose (Laboratoire J.A. Dieudonné)


10:30 - 11:00
Valerya Strizhkova (Chair of F. Bremond)
Inria, STARS team

Multi-View Video Masked Autoencoder for Emotion Recognition

A masking-and-reconstruction strategy is an efficient solution for self-supervised video pre-training. The video masked autoencoder (VideoMAE) has shown state-of-the-art results in action recognition on both large and small datasets. Here we extend VideoMAE by proposing a more challenging pretext task that reconstructs different views of a masked input: the Multi-View Video Masked (MVVM) strategy. As the downstream task we select emotion recognition, and we use MEAD as the enabling dataset, in which subjects are recorded from different angles (e.g. front, top, down, lateral). Reconstructing different views allows the model to learn more powerful representations for each frame. Even small facial expressions, visible only from some views, are encoded in the latent space. With this approach we show, for the first time, end-to-end video emotion classification with the large ViT-B network. We increase the recognition of low-intensity (subtle) emotions by around 8% compared with state-of-the-art methods. Moreover, the MEAD dataset contains both high- and low-intensity emotions, which, thanks to the multi-view approach, enables fine-grained classification. The capability of classifying sub-categories is tested on a very small (200 videos) in-the-wild dataset in which multiple shades of anger are represented (the MFA dataset). The MVVM autoencoder is able to transfer knowledge and reach state-of-the-art emotion recognition accuracy.

11:00 - 11:30 
Lucie Cadorel (invited outside 3iA)
I3S / Inria Wimmics team

Geospatial Knowledge in Real Estate Listings: Extracting and Localizing Uncertain Spatial Information from Text

Spatial information is found in numerous unstructured (textual) documents such as travel blogs, social media posts during emergencies, or Real Estate advertisements, and can be very difficult to extract and localize. Usually, digital gazetteers are used to match geospatial objects to their boundaries, but they may be incomplete. Indeed, humans often use spatial expressions with toponyms (e.g., "West of Nice, France", "Nearby the Promenade des Anglais"), place types instead of toponyms (e.g., "Next to the beach"), or local and unofficial place names (e.g., "La Banane in Cannes") that are not found in official gazetteers. For example, Real Estate professionals often exaggerate the boundaries of a place that is popular and well-reputed, since location is one of the most valuable factors in a purchase. Thus, a number of studies have proposed enriching gazetteers by estimating and representing vernacular places. However, only a few approaches have taken into account vague spatial expressions such as "nearby" and places without toponyms (e.g., "the university"). In our work, we propose an automatic workflow to extract spatial information from Real Estate advertisements and retrieve an approximate location for the uncertain places in order to enrich geographic gazetteers.

11:30 - 12:00

Open discussion on the two contributions
