
Emotional control and visual representation using advanced audiovisual interaction

Published Online: pp 480-498

Modern interactive means, combined with new digital media processing and representation technologies, can provide a robust framework for enhancing user experience in multimedia entertainment systems and audiovisual artistic installations through non-traditional interaction/feedback paths based on the user's affective state. In this work, the ‘Elevator’ interactive audiovisual platform prototype is presented. It aims to provide a framework for signalling and expressing human behaviour related to emotions (such as anger) and, ultimately, to produce a visual outcome of this behaviour, defined here as the emotional ‘thumbnail’ of the user. Optimised, real-time audio signal processing techniques are employed to monitor the achieved anger-like behaviour, while emotional elevation is attempted using appropriately selected combined audio/visual content, reproduced with state-of-the-art audiovisual playback technologies that allow the creation of a realistic, immersive audiovisual environment. The demonstration of the proposed prototype has shown that affective interaction is possible, allowing the further development of related artistic and technological applications.


Keywords: affective interaction, voice anger detection, audio and emotions, emotional control, audiovisual arts

