Publications

Conference Papers and Journal Articles

Florian Stimberg, Alex Narest, Alessio Bazzica, Lennart Kolmodin, Pablo Barrera González, Olga Sharonova, Henrik Lundin, Thomas C. Walters (2020). “WaveNetEQ - Packet Loss Concealment with WaveRNN,” IEEE Asilomar Conference on Signals, Systems, and Computers, 1-5 November 2020.
Archit Gupta, Brendan Shillingford, Yannis Assael, Thomas C. Walters (2019). “Speech bandwidth extension with WaveNet,” IEEE WASPAA 2019. arXiv:1907.04927 Conference Poster
Cristina Garbacea, Aaron van den Oord, Yazhe Li, Felicia Lim, Alejandro Luebs, Oriol Vinyals, Thomas C. Walters (2019) “Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder,” ICASSP 2019 arXiv Conference Poster
Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis (2018). “Parallel WaveNet: Fast High-Fidelity Speech Synthesis,” Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3918-3926. PDF
W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters (2018). “Wavenet based low rate speech coding,” ICASSP 2018. arXiv:1712.01120
Thomas C. Walters, David A. Ross, Richard F. Lyon (2013). “The Intervalgram: An Audio Feature for Large-Scale Cover-Song Recognition”, From Sounds to Music and Emotions: 9th International Symposium, CMMR 2012, London, UK, June 19-22, 2012, Revised Selected Papers, Springer Berlin Heidelberg, pp. 197-213. PDF
Roy D. Patterson, Timothy Ives, Thomas C. Walters, Richard F. Lyon (2012). “Modelling the Distortion Produced by Cochlear Compression.” 16th International Symposium on Hearing
Richard F. Lyon, Martin Rehn, Samy Bengio, Thomas C. Walters and Gal Chechik (2010). “Sound retrieval and ranking using auditory sparse-code representations,” Neural Computation, 22, 2390-2416. PDF
Roy D. Patterson, Thomas C. Walters, Jessica J. M. Monaghan, Etienne Gaudrain (2010). “Reviewing the Definition of Timbre as It Pertains to the Perception of Speech and Musical Sounds,” In: The Neurophysiological Bases of Auditory Perception; proceedings of the 15th International Symposium on Hearing, edited by Lopez-Poveda E.A., Palmer A.R., Meddis R. (Springer, New York) pp. 223–233.
Roy D. Patterson, Thomas C. Walters, Jessica Monaghan, Christian Feldbauer, Toshio Irino (2010). “Auditory Speech Processing for Scale-Shift Covariance and its Evaluation in Automatic Speech Recognition”, IEEE International Symposium on Circuits and Systems, Paris, France. DOI:10.1109/ISCAS.2010.5537725
Martin Rehn, Richard F. Lyon, Samy Bengio, Thomas C. Walters and Gal Chechik (2009). “Sound Ranking Using Auditory Sparse-Code Representations”. International Conference on Machine Learning 2009, Workshop: Sparse Methods for Music Audio, Montréal, Canada
Richard E. Turner, Thomas C. Walters, Jessica J. M. Monaghan, and Roy D. Patterson (2009). “A statistical formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data,” Journal of the Acoustical Society of America 125(4), 2374-2386 PDF
Thomas C. Walters, Phil A. Gomersall, Richard E. Turner and Roy D. Patterson (2008). “Comparison of relative and absolute judgments of speaker size based on vowel sounds,” Proceedings of Meetings on Acoustics 1, 050003
Roy D. Patterson, Thomas C. Walters and Jessica J.M. Monaghan (2008). “Revising the definition of timbre to make it useful for speech and musical sounds”. British Society of Audiology short papers meeting, York, UK
Jessica J. M. Monaghan, Christian Feldbauer, Thomas C. Walters and Roy D. Patterson (2008). “Low-Dimensional, Auditory Feature Vectors that Improve VTL Normalization in Automatic Speech Recognition”. Journal of the Acoustical Society of America 123, 3066
Toshio Irino, Tom C. Walters and Roy D. Patterson (2007). “A computational auditory model with a nonlinear cochlea and acoustic scale normalization,” 19th International Congress on Acoustics, Madrid PDF
David R. R. Smith, Thomas C. Walters and Roy D. Patterson (2007). “Discrimination of speaker sex and size when glottal-pulse rate and vocal-tract length are controlled,” Journal of the Acoustical Society of America 122(6), 3628-3639 PDF
David R. R. Smith, Thomas C. Walters and Roy D. Patterson (2007). “Judging sex and age: Effect of glottal-pulse rate, vocal-tract length and original talker,” 19th International Congress on Acoustics, Madrid PDF
Thomas C. Walters, Phil A. Gomersall, Richard Turner and Roy D. Patterson (2007). “Comparison of relative and absolute judgments of speaker size based on vowel sounds,” The Journal of the Acoustical Society of America 121(5), 3119
Roy D. Patterson, Thomas C. Walters and Toshio Irino (2005). “Extracting a carrier-independent version of the syllabic message: The principles,” The Journal of the Acoustical Society of America 117(4), 2373

Patents (Applied for and / or granted)

Ioannis Alexandros Assael, Thomas Chadwick Walters, Archit Gupta, Brendan Shillingford (2020). Bandwidth extension of incoming data using neural networks.
Cristina Garbacea, Aaron Gerard Antonius van den Oord, Yazhe Li, Sze Chie Lim, Alejandro Luebs, Oriol Vinyals, Thomas Chadwick Walters (2020). Speech Coding Using Discrete Latent Representations.
Benjamin Kenneth Coppin, Mustafa Suleyman, Thomas Chadwick Walters, Timothy Mann, Chia-Yueh Carlton Chu, Martin Szummer, Luis Carlos Cobo Rus, Jean-Francois Crespo (2018). Selecting content items using reinforcement learning.
Luciano Sbaiz, Jay Yagnik, King Hong Thomas Leung, Hanna Pasula, Thomas Chadwick Walters, Thomas Bugnon, Matthias Rochus Konrad (2018). Automatic learning of a video matching system.
Thomas C. Walters, Douglas Eck, Ryan M. Rifkin (2017). Generating a playlist based on input acoustic information.
Thomas Chadwick Walters, Gertjan Pieter Halkes, Matthias Rochus Konrad, Gheorghe Postelnicu (2017). Audio and video matching using a hybrid of fingerprinting and content based classification.
Richard Francis Lyon, Ron Weiss, Thomas Chadwick Walters (2016). Systems and methods facilitating selective removal of content from a mixed audio recording.
Christopher Russell LaRosa, Sam Kvaalen, Thomas Chadwick Walters, Richard Francis Lyon, Robert Steven Glickstein, Rushabh Ashok Doshi, Molly Castle Nix, Jason Matthew Toff (2015). System and method for selective removal of audio content from a mixed audio recording.
Jay Yagnik, Richard Francis Lyon, Thomas Chadwick Walters, Douglas Eck (2015). Sound representation via winner-take-all coding of auditory spectra.
Richard F. Lyon, Martin Rehn, Thomas Walters, Samy Bengio, Gal Chechik (2013). Audio classification for information retrieval using sparse features.
Geremy A. Heitz III, Adam Berenzweig, Jason E. Weston, Ron J. Weiss, Sally A. Goldman, Thomas Walters, Samy Bengio, Douglas Eck, Jay M. Ponte, Ryan M. Rifkin (2012). Generating a playlist.
Richard F. Lyon, Thomas C. Walters, David Ross (2012). Intervalgram representation of audio for melody recognition.

Book Chapters

Steven R. Ness, Thomas Walters and Richard F. Lyon (2011). “Auditory Sparse Coding,” In: Music Data Mining, CRC Press/Chapman Hall.
Roy D. Patterson, Etienne Gaudrain, and Thomas C. Walters (2010). “The perception of family and register in musical tones,” In: Music Perception. Jones, M. R., Popper, A. N., and Fay, R. R. (Eds). Springer-Verlag, New York. DOI:10.1007/978-1-4419-6114-3_2
Roy D. Patterson, David R. R. Smith, Ralph van Dinther and Thomas C. Walters (2007). “Size Information in the Production and Perception of Communication Sounds,” In: Auditory Perception of Sound Sources. Yost, W. A., Popper, A. N., and Fay, R. R. (Eds). Springer Science+Business Media, LLC, New York. Google Books

PhD Thesis

Thomas C. Walters (2011). Auditory-Based Processing of Communication Sounds. Ph.D. thesis, University of Cambridge.