T03: Deep Learning for Multimodal and Multisensorial Interaction

Friday, 26 July 2019, 08:30 – 12:30


Nicholas Cummins

ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany


Björn W. Schuller

ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany
GLAM – Group on Language, Audio & Music, Imperial College London, UK



The objective of this tutorial is to introduce recent deep learning methods for the optimal and efficient fusion, processing, analysis, and synthesis of multimodal and multisensorial interaction data for next-generation intelligent interfaces.

Human-Computer Interaction is becoming increasingly multimodal and multisensorial. Everyday speech interaction and interaction based on video and depth information are spreading, and even physiological sensor data analysis is becoming a daily standard, as in current smartwatches, alongside more traditional and modern haptic interaction. Deep learning methods currently offer the best means of exploiting the vast amount of interaction data collected in this way.


Content and benefits:

In this tutorial, methods for the optimal deep fusion, analysis, and synthesis of such data for tomorrow's intelligent interaction are presented. This includes deep fusion at various levels, from early via intermediate to late fusion. Further, participants are introduced to asynchronous cross-sensorial and cross-modal data fusion with deep algorithms such as Deep Canonical Time Warping, which addresses one of the major obstacles in this field. A major focus is then placed on unsupervised representation learning, including end-to-end learning from raw sensor signals. This includes exploiting convolutional and recurrent network topologies with memory, which are well suited to the time-series data typical of interaction. Alongside signal-type data, the handling of symbolic information such as text or events is also covered.
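The three fusion levels named above can be illustrated with a minimal sketch. It assumes two hypothetical modalities (an audio and a video feature vector) and uses single logistic units and one small dense layer with hand-set weights as stand-ins for trained deep networks. All function names are illustrative and are not taken from any of the tutorial's toolkits.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Single logistic unit, a stand-in for a trained classifier head."""
    if len(x) != len(w):
        raise ValueError("feature and weight lengths differ")
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

def dense_relu(x: Sequence[float], W: Sequence[Sequence[float]],
               b: Sequence[float]) -> List[float]:
    """Fully connected layer with ReLU; W holds one weight row per output unit."""
    return [max(0.0, sum(xi * wij for xi, wij in zip(x, row)) + bj)
            for row, bj in zip(W, b)]

def early_fusion(audio, video, w, b):
    # Feature level: concatenate the modality features, then apply one joint model.
    return logistic(list(audio) + list(video), w, b)

def intermediate_fusion(audio, video, Wa, ba, Wv, bv, w_joint, b_joint):
    # Representation level: encode each modality separately, concatenate the
    # hidden representations, then apply a joint model on top.
    h_audio = dense_relu(audio, Wa, ba)
    h_video = dense_relu(video, Wv, bv)
    return logistic(h_audio + h_video, w_joint, b_joint)

def late_fusion(audio, video, w_a, b_a, w_v, b_v, alpha=0.5):
    # Decision level: one model per modality, then a weighted average of outputs.
    p_audio = logistic(audio, w_a, b_a)
    p_video = logistic(video, w_v, b_v)
    return alpha * p_audio + (1.0 - alpha) * p_video

audio, video = [0.2, 0.8], [0.5]
print(early_fusion(audio, video, [1.0, -1.0, 2.0], 0.0))
print(late_fusion(audio, video, [1.0, -1.0], 0.0, [2.0], 0.0))
```

Early fusion lets a model exploit low-level cross-modal correlations but requires time-aligned features, whereas late fusion tolerates missing or asynchronous modalities at the cost of ignoring those correlations; intermediate fusion sits in between.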

Participants are further introduced to Automatic Machine Learning, which allows deep networks to self-optimize in this context. In this way, multimodal and multisensorial fusion is increasingly opening up to non-expert interface designers as well, since mainly labelled data is needed to set up a system ready for rich intelligent input and output processing. To enable mobile interaction, methods for reducing model complexity on restricted hardware are also presented. Finally, transfer learning and Generative Adversarial Networks are shown to help cope with the low availability of user data, enabling learning from few examples as well as the generation of interaction data by deep methods.
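One common way to reduce model complexity for restricted hardware is weight quantization, optionally combined with magnitude pruning. The sketch below is an illustrative assumption rather than the specific method covered in the tutorial: it applies symmetric per-tensor 8-bit quantization and simple magnitude pruning to a plain list of weights, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One float scale per tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes and the scale."""
    return [qi * scale for qi in q]

def prune_by_magnitude(weights: Sequence[float], sparsity: float) -> List[float]:
    """Set the smallest-magnitude fraction `sparsity` of the weights to zero."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

weights = [0.5, -1.27, 0.01, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Rounding to the nearest code bounds the per-weight error by scale / 2.
max_err = max(abs(a - w) for a, w in zip(approx, weights))
print(q, scale, max_err)
print(prune_by_magnitude(weights, 0.5))
```

Storing 8-bit codes plus one float scale takes roughly a quarter of the memory of 32-bit float weights; in practice, deep learning frameworks ship their own quantization and pruning tooling.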

The tutorial is based on open-source toolkits so that attendees have the tools needed to apply the methods described above right away. These include auDeep, End2You, and openXBOW, which are also accessible to participants who are not Python-savvy. At the same time, experts in deep fusion will find the latest methods and the outlook presented after the general introduction and practical parts of interest.


Target Audience:

The tutorial targets the broad audience of HCI International. It starts with an introduction to the general principles of deep learning and then moves on to recent approaches for fusion by deep learning, synchronization by suitable deep warping methods, and analysis and synthesis. It therefore also targets intermediate to advanced participants, ranging from general HCI and intelligent interaction researchers to deep learning and machine learning experts.

Bio Sketches of Presenters:

Nicholas Cummins is a habilitation candidate at the ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany, where he is involved in large European Horizon 2020 projects centered around multimodal and multisensorial interaction and deep learning, such as DE-ENIGMA, RADAR-CNS, and TAPAS. His current research includes behavioural signal processing, with a focus on the automatic multisensory analysis and understanding of different health states. Dr. Cummins received his PhD in Electrical Engineering from UNSW Australia in 2016. He has published regularly in the field since 2011, and these papers have attracted considerable attention and citations. He completed his undergraduate degree at UNSW as a mature student, graduating with first-class honors in 2011. He has repeatedly given tutorials at leading international conferences such as IEEE EMBC, Interspeech, and IEEE SAM.

Björn Schuller received the Diploma in 1999, the Doctoral degree in 2006, and the Habilitation and Adjunct Teaching Professorship in the subject area of signal processing and machine intelligence in 2012, all in electrical engineering and information technology from TUM in Munich, Germany. He is Professor of Artificial Intelligence in the Department of Computing, Imperial College London, UK, and a Full Professor and head of the ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, as well as the co-founding CEO and current CSO of the audio intelligence company audEERING. He has (co-)authored five books and more than 700 publications in peer-reviewed books, journals, and conference proceedings in the fields of Affective Computing, HCI, and deep learning, leading to more than 20000 citations (h-index = 68). He is a Fellow of the IEEE, President Emeritus of the AAAC, and a Senior Member of the ACM. His service to the community includes serving as Editor in Chief of the IEEE Transactions on Affective Computing and as past and present General Chair and Program Chair of conferences in the field such as ACM ICMI, IEEE ACII, and Interspeech. He has so far given 15 tutorials at leading conferences such as ACII, ACM Multimedia, EMBC, ICASSP, IJCAI, Interspeech, SAM, IUI, and UMAP.