Thesis Defense of Yu Su
Yu Su, a PhD candidate in the SYNAPSE team, will defend her thesis on November 11, 2019, in the RT Amphitheater at the Vitry-sur-Seine campus of UPEC (120 rue Paul Armangot, 94400 Vitry-sur-Seine).
Title: A bio-inspired smart perception system based on human’s cognitive auditory skills
Abstract:
Developing a machine capable of conscious perception of its environment, alongside humans, is one of the goals of bio-inspired artificial intelligence (BIA). The AI and BIA research communities generally agree that endowing a machine with an artificial capability for "aware" or "conscious" processing of information would lead to technology far more powerful and advanced than that based on conventional AI.
Hearing is one of the main sensory systems of human cognition. The ears transform the myriad stimuli arriving from the ambient environment into nerve impulses generated by different types of nerve cells, and they do so at all times, even during sleep. Alongside vision, the auditory system is thus a fundamental sense of human perception. Motivated by the role hearing plays in complementing human perception and characterizing the surrounding environment, and given the current limitations in simulating the human cognitive auditory mechanism, the primary objective of this doctoral work is to provide machines with artificial cognitive auditory capabilities that give them an augmented, adaptive perception of the environment, similar to that developed in humans.
To achieve this goal, we surveyed recent research covering auditory attention models, environmental sound classification techniques, deep learning-based methods, and human auditory response mechanisms, in order to establish the state of the art and gauge the complexity of the objectives of this doctoral work. This survey highlighted the inherent shortcomings of existing techniques and directed our investigations toward modeling bio-inspired mechanisms for detecting auditory deviance. These models were combined with convolutional neural networks (CNNs) that categorize the sounds detected in the environment by exploiting a knowledge-based system, as sketched below.
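To illustrate this detection-classification-knowledge pipeline (this is a minimal sketch, not the thesis's implementation), the snippet below routes a detected deviant sound through a classifier stub and a toy knowledge base that maps low-level class labels to environmental categories. All names here (classify_event, KNOWLEDGE_BASE, the label set) are hypothetical.

```python
# Hypothetical sketch of the detection -> CNN classification -> knowledge-base
# pipeline described above; class labels and mappings are illustrative only.

# A toy "knowledge base": maps low-level sound classes to an environmental
# category and an urgency flag that a downstream system could act on.
KNOWLEDGE_BASE = {
    "siren":        {"category": "alert",   "urgent": True},
    "car_horn":     {"category": "traffic", "urgent": True},
    "dog_bark":     {"category": "animal",  "urgent": False},
    "street_music": {"category": "ambient", "urgent": False},
}

def classify_event(audio_clip):
    """Placeholder for a trained CNN classifier returning a class label.

    In the thesis this role is played by a CNN trained on acoustic
    features; here it is stubbed out so the pipeline is runnable.
    """
    return "siren"  # dummy prediction

def interpret(audio_clip):
    """Categorize a detected deviant sound using the knowledge base."""
    label = classify_event(audio_clip)
    info = KNOWLEDGE_BASE.get(label, {"category": "unknown", "urgent": False})
    return {"label": label, **info}

if __name__ == "__main__":
    print(interpret(audio_clip=None))
    # -> {'label': 'siren', 'category': 'alert', 'urgent': True}
```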
Subsequently, the work led to the implementation of a model for detecting auditory deviance that uses both temporal and spatial characteristics of the perceived sound, together with a proposed approach for extracting those characteristics. These characteristics drive the detection of deviance and auditory salience in each domain separately; the two domains are then combined to enhance the detection and categorization of sounds perceived in a real environment. Experimental results demonstrate the viability of the proposed model for detecting salient deviant sounds in an audio clip, as well as its robustness and accuracy.
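The following is a minimal NumPy sketch of the two-domain idea, assuming per-frame magnitude spectra as input: temporal deviance is measured against a slowly adapting energy background, spatial deviance against the average spectral shape, and the two scores are fused by a weighted sum. The weighting, background decay, and distance measures are illustrative assumptions, not the actual features of the thesis.

```python
import numpy as np

def deviance_scores(spectrogram, alpha=0.5, bg_decay=0.95):
    """Fuse per-frame temporal and spatial deviance scores.

    spectrogram: array of shape (n_frames, n_bins), magnitude spectra.
    alpha: weight of the temporal score in the fused output (assumed).
    bg_decay: decay rate of the exponential background model (assumed).
    """
    n_frames, _ = spectrogram.shape
    bg_energy = spectrogram[0].sum()                            # energy background
    bg_shape = spectrogram[0] / (spectrogram[0].sum() + 1e-9)   # spectral-shape background
    fused = np.zeros(n_frames)
    for t in range(n_frames):
        frame = spectrogram[t]
        energy = frame.sum()
        shape = frame / (energy + 1e-9)
        # Temporal deviance: how far the frame energy departs from background.
        temporal = abs(energy - bg_energy) / (bg_energy + 1e-9)
        # Spatial deviance: total-variation distance of the spectral shape
        # from the slowly adapting background shape, in [0, 1].
        spatial = np.abs(shape - bg_shape).sum() / 2.0
        fused[t] = alpha * temporal + (1 - alpha) * spatial
        # Adapt the background slowly so sustained sounds stop being deviant.
        bg_energy = bg_decay * bg_energy + (1 - bg_decay) * energy
        bg_shape = bg_decay * bg_shape + (1 - bg_decay) * shape
    return fused

# Example: a synthetic spectrogram with a loud burst at frame 50
# scores highest at that frame.
spec = np.random.rand(100, 64) * 0.1
spec[50] += 5.0
scores = deviance_scores(spec)
print(int(scores.argmax()))  # -> 50
```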
Finally, the work resulted in the development of a powerful model for detecting and characterizing environmental sounds, obtained by fusing two 4-layer CNNs. The two types of aggregated acoustic features proposed and evaluated in Chapter 4 were used to train each CNN separately, and the fusion is performed on the softmax outputs of the two models. Experimental results show excellent performance in detecting and classifying sound events: 97.2% accuracy on the UrbanSound8K dataset, which is 4.2% higher than the most effective methods in the field.
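A hedged PyTorch sketch of this late-fusion scheme follows: two small 4-convolutional-layer CNNs, each consuming one feature representation, with their softmax outputs averaged into the final prediction. Only the "two 4-layer CNNs fused at the softmax values" structure comes from the abstract; the channel counts, input shapes, and equal-weight averaging are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """A 4-convolutional-layer CNN over a 2-D acoustic feature map."""
    def __init__(self, n_classes=10, in_channels=1):  # UrbanSound8K has 10 classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)  # raw logits

def fused_prediction(cnn_a, cnn_b, feat_a, feat_b):
    """Late fusion at the softmax level: average the class posteriors."""
    p_a = F.softmax(cnn_a(feat_a), dim=1)
    p_b = F.softmax(cnn_b(feat_b), dim=1)
    return (p_a + p_b) / 2  # equal weights assumed

# Example with dummy feature maps (batch of 2 clips).
cnn_a, cnn_b = SmallCNN(), SmallCNN()
feat_a = torch.randn(2, 1, 64, 64)  # first aggregated feature type (assumed shape)
feat_b = torch.randn(2, 1, 64, 64)  # second aggregated feature type (assumed shape)
probs = fused_prediction(cnn_a, cnn_b, feat_a, feat_b)
print(probs.argmax(dim=1))  # predicted class index per clip
```

Fusing at the softmax level rather than at the feature level keeps the two networks independent, so each can be trained on its own feature type before their posteriors are combined.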


