
Apprentissage en ligne de signatures audiovisuelles pour la reconnaissance et le suivi de personnes au sein d’un réseau de capteurs ambiants (Online learning of audiovisual signatures for recognizing and tracking people within a network of ambient sensors)

François-Xavier Decroix 1
1 LAAS-RAP - Équipe Robotique, Action et Perception
LAAS - Laboratoire d'analyse et d'architecture des systèmes
Abstract : The neOCampus operation, started in 2013 by Paul Sabatier University in Toulouse, aims to create a connected, innovative, intelligent and sustainable campus by drawing on the skills of 11 laboratories and several industrial partners. These multidisciplinary skills are combined to improve the daily comfort of users (students, teachers, administrative staff) and to reduce the ecological footprint of the campus. The intelligence we want to bring to the campus of the future requires giving its buildings a perception of their internal activity. Indeed, optimizing energy resources requires characterizing user activity so that the building can automatically adapt itself to it. Since human activity is open to multiple levels of interpretation, our work focuses on extracting people's trajectories, its most elementary component. Characterizing user activity in terms of movement relies on data extracted from cameras and microphones distributed in a room, forming a sparse network of heterogeneous sensors. From these data, we then seek to extract audiovisual signatures and rough localizations of the people transiting through this network of sensors. While protecting personal privacy, signatures must be discriminative, to distinguish one person from another, and compact, to optimize computational costs and enable the building to adapt itself. Given these constraints, the characteristics we model are the speaker's timbre and their appearance, in terms of colorimetric distribution. The scientific contributions of this thesis thus lie at the intersection of the fields of speech processing and computer vision, introducing new methods for fusing the audio and visual signatures of individuals. To achieve this fusion, new sound source localization indices as well as an audiovisual adaptation of a multi-target tracking method were introduced, representing the main contributions of this work.
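The abstract describes the visual signature only as a compact, discriminative colorimetric distribution, without fixing a representation. As one illustrative sketch (not the method actually developed in the thesis), such a signature could be a quantized joint RGB histogram compared with a Bhattacharyya-style distance; the function and parameter names below are hypothetical:

```python
import numpy as np

def color_signature(pixels, bins=8):
    """Compact appearance signature: a normalized joint RGB histogram.
    `pixels` is an (N, 3) uint8 array of RGB values."""
    # Quantize each channel into `bins` levels, then build one joint histogram.
    q = (pixels.astype(np.int64) * bins) // 256           # (N, 3), values in [0, bins)
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]     # flatten to a single bin index
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()                              # L1-normalize

def signature_distance(h1, h2):
    """Bhattacharyya distance between two normalized histograms (0 = identical)."""
    bc = np.sum(np.sqrt(h1 * h2))
    return np.sqrt(max(0.0, 1.0 - bc))

# Example: two reddish patches are closer to each other than to a bluish one.
rng = np.random.default_rng(0)
red_a = rng.integers(200, 256, (500, 3)); red_a[:, 1:] //= 8
red_b = rng.integers(200, 256, (500, 3)); red_b[:, 1:] //= 8
blue = rng.integers(200, 256, (500, 3)); blue[:, :2] //= 8
ha, hb, hc = map(color_signature, (red_a, red_b, blue))
```

With `bins=8` the signature is a 512-dimensional vector, which keeps comparison cheap, in line with the compactness constraint stated above.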
The thesis is structured in four chapters. The first presents the state of the art on visual re-identification of persons and speaker recognition. Since the acoustic and visual modalities are not correlated, two signatures are computed separately, one for video and one for audio, using existing methods from the literature; the details of their computation are explored in chapter 2. The fusion of the signatures is then treated as a matching problem between audio and video observations whose corresponding detections are spatially coherent and compatible. Two novel association strategies are introduced in chapter 3. The spatio-temporal coherence of the bimodal observations is then discussed in chapter 4, in a context of multi-target tracking.
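The fusion step described above casts audio/video matching as an association problem. The thesis introduces its own association strategies; purely as a generic illustration of the problem shape, a small optimal assignment between audio sources and visual detections can be sketched with an exhaustive search (all names below are hypothetical):

```python
from itertools import permutations

def best_association(cost):
    """Optimal assignment of audio sources (rows) to visual detections
    (columns) minimizing total cost; exhaustive search, fine for the
    handful of people in one room. `cost` is an n x n list of lists;
    cost[i][j] could mix signature distance and spatial incompatibility
    (use a large value to forbid a spatially incoherent pair)."""
    n = len(cost)
    best = None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best is None or total < best[0]:
            best = (total, perm)
    return best[1]

# Example: 3 audio sources vs 3 visual detections.
cost = [
    [0.1, 0.9, 0.8],   # audio 0 resembles video 0
    [0.7, 0.2, 0.9],   # audio 1 resembles video 1
    [0.8, 0.9, 0.3],   # audio 2 resembles video 2
]
print(best_association(cost))  # → (0, 1, 2)
```

For larger instances the same problem is usually solved in polynomial time with the Hungarian algorithm rather than by enumeration.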

Cited literature: 165 references
Contributor : Christine Fourcade
Submitted on : Thursday, December 6, 2018 - 2:34:09 PM
Last modification on : Thursday, June 10, 2021 - 3:07:00 AM
Long-term archiving on : Thursday, March 7, 2019 - 1:55:12 PM


DECROIX François Xavier.pdf
Files produced by the author(s)


  • HAL Id : tel-01946899, version 1


François-Xavier Decroix. Apprentissage en ligne de signatures audiovisuelles pour la reconnaissance et le suivi de personnes au sein d’un réseau de capteurs ambiants. Automatique / Robotique. Université Toulouse 3 Paul Sabatier (UT3 Paul Sabatier), 2017. Français. ⟨tel-01946899⟩


