
INTERSPEECH 2020

from Sunday, 25 October 2020 (00:00) to Thursday, 29 October 2020 (23:59)
Shanghai International Convention Center

16:30
Tutorial 1 (until 18:00) (room1)
16:30 Efficient and flexible implementation of machine learning for ASR and MT - Nick Rossenbach André Merboldt Ralf Schlüter Albert Zeyer Parnia Bahar   (room1)
16:30
Tutorial 2 (until 18:00) (room2)
16:30 Spoken dialogue for social robots - Tatsuya Kawahara Kristiina Jokinen   (room2)
16:30
Tutorial 3 (until 18:00) (room3)
16:30 Meta learning and its applications to human language processing - Ngoc Thang Vu Hung-yi Lee Shang-Wen Li   (room3)
16:30
Tutorial 4 (until 18:00) (room4)
16:30 Intelligibility evaluation and speech enhancement based on deep learning - Yu Tsao Fei Chen   (room4)
18:00
Coffee Break (until 18:15)
18:15
Tutorial 1 (until 19:45) (room1)
18:15 Efficient and flexible implementation of machine learning for ASR and MT - Nick Rossenbach Ralf Schlüter André Merboldt Parnia Bahar Albert Zeyer   (room1)
18:15
Tutorial 2 (until 19:45) (room2)
18:15 Spoken dialogue for social robots - Tatsuya Kawahara Kristiina Jokinen   (room2)
18:15
Tutorial 3 (until 19:45) (room3)
18:15 Meta learning and its applications to human language processing - Ngoc Thang Vu Hung-yi Lee Shang-Wen Li   (room3)
18:15
Tutorial 4 (until 19:45) (room4)
18:15 Intelligibility evaluation and speech enhancement based on deep learning - Fei Chen Yu Tsao   (room4)
19:45
Coffee Break (until 20:00)
20:00
Tutorial 5 (until 21:30) (room1)
20:00 'Speech 101' - What everyone working on spoken language processing needs to know about spoken language - Roger K. Moore   (room1)
20:00
Tutorial 6 (until 21:30) (room2)
20:00 Neural approaches to conversational information retrieval - Jianfeng Gao Paul Bennett Chenyan Xiong   (room2)
20:00
Tutorial 7 (until 21:30) (room3)
20:00 Neural models for speaker diarization in the context of speech recognition - Tae Jin Park Dimitrios Dimitriadis Kyu J. Han   (room3)
20:00
Tutorial 8 (until 21:30) (room4)
20:00 Spoken language processing for language learning and assessment - Klaus Zechner Keelan Evanini Vikram Ramanarayanan   (room4)
21:30
Coffee Break (until 21:45)
21:45
Tutorial 5 (until 23:15) (room1)
21:45 'Speech 101' - What everyone working on spoken language processing needs to know about spoken language - Roger K. Moore   (room1)
21:45
Tutorial 6 (until 23:15) (room2)
21:45 Neural approaches to conversational information retrieval - Jianfeng Gao Chenyan Xiong Paul Bennett   (room2)
21:45
Tutorial 7 (until 23:15) (room3)
21:45 Neural models for speaker diarization in the context of speech recognition - Tae Jin Park Dimitrios Dimitriadis Kyu J. Han   (room3)
21:45
Tutorial 8 (until 23:15) (room4)
21:45 Spoken language processing for language learning and assessment - Vikram Ramanarayanan Klaus Zechner Keelan Evanini   (room4)
17:00
Opening session Keynote 1: Janet B. Pierrehumbert, The cognitive status of simple and complex models (until 19:00) (room1)
18:00 The cognitive status of simple and complex models   (room1)
19:00
Coffee Break (until 19:15)
19:15
Mon-1-1 ASR neural network architectures I - Ralf Schlüter (until 20:15) (room1)
19:15 Mon-1-1-1 On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition - Yu Wu (Microsoft Research Asia) Shujie Liu (Microsoft Research Asia) Rui Zhao (Microsoft) Jinyu Li (Microsoft) Yashesh Gaur (Microsoft) Chengyi Wang (Microsoft Research Asia)   (room1)
19:15 Mon-1-1-10 Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition - Jinhwan Park (Seoul National University) Wonyong Sung (Seoul National University)   (room1)
19:15 Mon-1-1-2 SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition - Ian McLoughlin (ICT Cluster, Singapore Institute of Technology) Zhifu Gao (Alibaba Group) Ming Lei (Alibaba Group) ShiLiang Zhang (Alibaba Group)   (room1)
19:15 Mon-1-1-3 Contextual RNN-T for Open Domain ASR - Mahaveer Jain (Facebook) Florian Metze (Facebook) Geoffrey Zweig (Facebook) Gil Keren (Facebook) Jay Mahadeokar (Facebook) Yatharth Saraf (Facebook)   (room1)
19:15 Mon-1-1-4 ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition - Kyu Han (ASAPP) Joshua Shapiro (ASAPP) Tao Ma (ASAPP) Tao Lei (ASAPP) Jeremy Wohlwend (ASAPP) Jing Pan (ASAPP)   (room1)
19:15 Mon-1-1-5 Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity - Jae-sun Seo (Arizona State University) Visar Berisha (Arizona State University) Jian Meng (Arizona State University) Chaitali Chakrabarti (Arizona State University) Deepak Kadetotad (Arizona State University / Starkey Hearing Technologies)   (room1)
19:15 Mon-1-1-6 BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example - Timo Lohrenz (Technische Universität Braunschweig) Tim Fingscheidt (Technische Universität Braunschweig)   (room1)
19:15 Mon-1-1-7 Relative Positional Encoding for Speech Recognition and Direct Translation - Jan Niehues (Maastricht University) Tuan Nam Nguyen (Karlsruhe Institute of Technology) Ngoc-Quan Pham (Karlsruhe Institute of Technology) Thanh-Le Ha (Karlsruhe Institute of Technology) Elizabeth Salesky (Johns Hopkins University) Alexander Waibel (Carnegie Mellon) Sebastian Stüker (Karlsruhe Institute of Technology) Thai Son Nguyen (Karlsruhe Institute of Technology)   (room1)
19:15 Mon-1-1-8 Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers - Tianyan Zhou (Microsoft) Takuya Yoshioka (Microsoft) Yashesh Gaur (Microsoft) Zhuo Chen (Microsoft) Zhong Meng (Microsoft) Xiaofei Wang (Microsoft) Naoyuki Kanda (Microsoft)   (room1)
19:15 Mon-1-1-9 Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework - Takashi Fukuda (IBM Research) Samuel Thomas (IBM Research AI)   (room1)
19:15
Mon-1-10 Speech, Language, and Multimodal Resources (until 20:15) (room10)
19:15 Mon-1-10-1 ATCSpeech: a Multilingual pilot-controller Speech Corpus from Real Air Traffic Control Environment - Yi LIN (Sichuan University) Dan Li (Southwest Air Traffic Management Bureau, Civil Aviation Administration of China) Bo Yang (Sichuan University) Zhongping Yang (Wisesoft Co. Ltd.) Bing Wang (Southwest Air Traffic Management Bureau, Civil Aviation Administration of China) Min Ruan (Southwest Air Traffic Management Bureau, Civil Aviation Administration of China) Zhengmao Chen (Sichuan University) Xianlong Tan (Southwest Air Traffic Management Bureau, Civil Aviation Administration of China) Xiping Wu (Sichuan University)   (room10)
19:15 Mon-1-10-10 FT Speech: Danish Parliament Speech Corpus - Andreas Søeborg Kirkedal (Interactions) Marija Stepanović (IT University of Copenhagen) Barbara Plank (IT University of Copenhagen)   (room10)
19:15 Mon-1-10-2 Developing an Open-Source Corpus of Yoruba Speech - Alexander Gutkin (Google) Kọ́lá Túbọ̀sún (Chevening Research Fellow at British Library) Oddur Kjartansson (Google Research) Clara Rivera (Google Research) Isin Demirsahin (Google Research)   (room10)
19:15 Mon-1-10-3 ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers - Hyun Ah Kim Hyunhoon Jung (Clova AI, NAVER Corp.) Sohee Yang (Clova AI, NAVER Corp.) Jung-Woo Ha (Clova AI, NAVER Corp.) Nako Sung (Clova AI, NAVER Corp.) Jin Gu Kang (Clova AI, NAVER Corp.) Soojin Kim (Clova AI, NAVER Corp.) Sang-Woo Lee (Clova AI, NAVER Corp.) Eunmi Kim (Clova AI, NAVER Corp.) Sunghun Kim (Clova AI, NAVER Corp.; The Hong Kong University of Science and Technology) Chan Kyu Lee (Clova AI, NAVER Corp.) Kyoungtae Doh (Clova AI, NAVER Corp.) Hyeji Kim (Clova AI, NAVER Corp.) Kihyun Nam (Hankuk University of Foreign Studies)   (room10)
19:15 Mon-1-10-4 LAIX Corpus of Chinese Learner English: Towards a Benchmark for L2 English ASR - Hui Lin (LAIX) Huan Luan (LAIX) Yanhong Wang (LAIX) Jiahong Yuan (LAIX)   (room10)
19:15 Mon-1-10-5 Design and Development of a Human-Machine Dialog Corpus for the Automated Assessment of Conversational English Proficiency - Vikram Ramanarayanan (Educational Testing Service R&D)   (room10)
19:15 Mon-1-10-6 CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment - Cymie Wing-Yee Ng (The Chinese University of Hong Kong) Jiarui Wang (The Chinese University of Hong Kong) Michael Chi-Fai Tong (The Chinese University of Hong Kong) Kathy Yuet-Sheung Lee (The Chinese University of Hong Kong) Si-Ioi Ng (The Chinese University of Hong Kong) Tan Lee (The Chinese University of Hong Kong)   (room10)
19:15 Mon-1-10-7 FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics - Sami Virpioja (University of Helsinki) Juho Leinonen (Aalto University) Mittul Singh (Aalto University) Katri Leino (Aalto University) Mikko Kurimo (Aalto University)   (room10)
19:15 Mon-1-10-8 DiPCo - Dinner Party Corpus - Cirenia Huerta (Amazon) Jan Trmal (Johns Hopkins University) Maurizio Omologo (Fondazione Bruno Kessler - irst) Xuewen Luo (Amazon) Maarten Van Segbroeck (Amazon) Ksenia Kutsenko (Amazon) Roland Maas (Amazon.com) Bjorn Hoffmeister (Apple) Ahmed Zaid (Apple) Tinh Nguyen (Amazon)   (room10)
19:15 Mon-1-10-9 Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews - Alejo J Nevado-Holgado (University of Oxford) Yue Wu (University of Oxford) Terry Lyons (University of Oxford) Niall Taylor (University of Oxford) Maria Liakata (The Alan Turing Institute) Bo Wang (University of Oxford) Kate Saunders (University of Oxford)   (room10)
19:15
Mon-1-11 Language Recognition (until 20:15) (room11)
19:15 Mon-1-11-1 Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition - Denis Jouvet (LORIA - INRIA) Irina Illina (LORIA/INRIA) Raphaël Duroselle (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy)   (room11)
19:15 Mon-1-11-2 The XMUSPEECH System for AP19-OLR Challenge - Miao Zhao (Xiamen University) Lin Li (Xiamen University) Yiming Zhi (Xiamen University) Qingyang Hong (Xiamen University) Jing Li (Xiamen University) Zheng Li (Xiamen University)   (room11)
19:15 Mon-1-11-3 On the Usage of Multi-feature Integration for Speaker Verification and Language Identification - Lin Li (Xiamen University) Zheng Li (Xiamen University) Qingyang Hong (Xiamen University) Miao Zhao (Xiamen University) Jing Li (Xiamen University)   (room11)
19:15 Mon-1-11-4 What does an End-to-End Dialect Identification Model Learn about Non-dialectal Information? - Shammur Absar Chowdhury (University of Trento) Ahmed Ali (Qatar Computing Research Institute) Suwon Shon (Massachusetts Institute of Technology) James Glass (Massachusetts Institute of Technology)   (room11)
19:15 Mon-1-11-5 Releasing a toolkit and comparing the performance of language embeddings across various spoken language identification datasets - Mikko Kurimo (Aalto University) Tommi Jauhiainen (University of Helsinki) Matias Lindgren (Aalto University)   (room11)
19:15 Mon-1-11-6 Learning Intonation Pattern Embeddings for Arabic Dialect Identification - Elsayed Issa (University of Arizona) Aitor Arronte Alvarez (Center for Language and Technology, University of Hawaii. Technical University of Madrid)   (room11)
19:15 Mon-1-11-7 Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages - Dietrich Klakow (dietrich.klakow@lsv.uni-saarland.de) Bernd Möbius (Saarland University) Tania Avgustinova (Saarland University) Badr Abdullah (Saarland University)   (room11)
19:15
Mon-1-2 Multi-channel speech enhancement (until 20:15) (room2)
19:15 Mon-1-2-1 Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-channel Speech Recognition - Shuai Nie (NLPR, Institute of Automation, Chinese Academy of Sciences) Zhanlei Yang (Huawei Technologies) Shan Liang (NLPR, Institute of Automation, Chinese Academy of Sciences) Guanjun Li (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Wenju Liu (NLPR, Institute of Automation, Chinese Academy of Sciences) Longshuai Xiao (NLPR, Institute of Automation, Chinese Academy of Sciences)   (room2)
19:15 Mon-1-2-10 A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-channel Speech Recognition in the CHiME-6 Challenge - Jia Pan (University of Science and Technology of China) Jun Du (University of Science and Technology of China) Lei Sun (University of Science and Technology of China) Chin-Hui Lee (Georgia Institute of Technology) Yan-Hui Tu (University of Science and Technology of China) Feng Ma (University of Science and Technology of China)   (room2)
19:15 Mon-1-2-2 Neural Spatio-Temporal Beamformer for Target Speech Separation - Shi-Xiong Zhang (Tencent AI lab) Lianwu Chen (Tencent AI lab) Meng Yu (Tencent AI lab) Jianming Liu (Tencent AI lab) Chao Weng (Tencent AI lab) Dong Yu (Tencent AI lab) Yong Xu (Tencent AI lab)   (room2)
19:15 Mon-1-2-3 Online directional speech enhancement using geometrically constrained independent vector analysis - Shoji Makino (University of Tsukuba) Kazuhito Koishida (Microsoft Corporation) Li Li (University of Tsukuba)   (room2)
19:15 Mon-1-2-4 End-to-End Multi-Look Keyword Spotting - Dan Su (Tencent AI Lab) Xuan Ji (Tencent AI Lab) Meng Yu (Tencent AI Lab) Bo Wu (Tencent AI Lab) Dong Yu (Tencent AI Lab)   (room2)
19:15 Mon-1-2-5 Differential Beamforming for Uniform Circular Array with Directional Microphones - Jinwei Feng (Alibaba group) Weilong Huang (Alibaba group)   (room2)
19:15 Mon-1-2-6 Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement - Hu Hu (Georgia Institute of Technology) Chin-Hui Lee (Georgia Institute of Technology) Jun Qi (Georgia Institute of Technology) Sabato Marco Siniscalchi (University of Enna) Chao-Han Huck Yang (Georgia Institute of Technology) Yannan Wang (Tencent Corporation)   (room2)
19:15 Mon-1-2-7 An End-to-end Architecture of Online Multi-channel Speech Separation - Zhuo Chen (Microsoft, One Microsoft Way, Redmond, WA, USA) Jian Wu (Northwestern Polytechnical University) Jinyu Li (Microsoft, One Microsoft Way, Redmond, WA, USA) Yi Luo (Microsoft, One Microsoft Way, Redmond, WA, USA) Takuya Yoshioka (Microsoft, One Microsoft Way, Redmond, WA, USA) Lei Xie (School of Computer Science, Northwestern Polytechnical University) Zhili Tan (Microsoft, STCA, Beijing) Ed Lin (Microsoft, STCA)   (room2)
19:15 Mon-1-2-8 Mentoring-Reverse Mentoring for Unsupervised Multi-channel Speech Source Separation - Masahito Togami (Line Corporation) Yu Nakagome (Waseda Univ.) Tetsuji Ogawa (Waseda University) Tetsunori Kobayashi (Waseda University)   (room2)
19:15 Mon-1-2-9 Computationally efficient and versatile framework for joint optimization of blind speech separation and dereverberation - Tomohiro Nakatani (NTT Corporation) Keisuke Kinoshita (NTT) Hiroshi Sawada (NTT Corporation) Shoko Araki (NTT Communication Science Laboratories) Rintaro Ikeshita (NTT Corporation)   (room2)
19:15
Mon-1-3 Speech processing in the brain (until 20:15) (room3)
19:15 Mon-1-3-1 Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation - Youssef Hmamouche (Aix Marseille University) Laurent Prévot (Aix Marseille Université & CNRS) Magalie Ochs (LIS) Thierry Chaminade (INT, Aix Marseille Université)   (room3)
19:15 Mon-1-3-2 Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals - Gaoyan Zhang (Tianjin University) Zhuo Zhang (Tianjin University) Shuang Wu (Tianjin University) Di Zhou (Japan Advanced Institute of Science and Technology) Jianwu Dang (JAIST)   (room3)
19:15 Mon-1-3-3 Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell - Chongyuan Lian (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) Lan Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) Feiqi Zhu (Shenzhen Luohu People’s Hospital) Tianqi Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) Nan Yan (Shenzhen Institutes of Advanced Technology) Manwa Lawrence Ng (The University of Hong Kong) Mingxiao Gu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)   (room3)
19:15 Mon-1-3-4 Congruent Audiovisual Speech Enhances Cortical Envelope Tracking during Auditory Selective Attention - Zhen Fu (Peking University) Jing Chen (Peking University)   (room3)
19:15 Mon-1-3-5 Contribution of RMS-level-based speech segments to target speech decoding under noisy conditions - Fei Chen (Southern University of Science and Technology) Ed X. Wu (The University of Hong Kong) Lei Wang (Southern University of Science and Technology)   (room3)
19:15 Mon-1-3-6 Cortical Oscillatory Hierarchy for Natural Sentence Processing - Bin Zhao (Tianjin University) Masashi Unoki (JAIST) Gaoyan Zhang (Tianjin University) Jianwu Dang (JAIST)   (room3)
19:15 Mon-1-3-7 Comparing EEG analyses with different epoch alignments in an auditory lexical decision experiment - Kimberley Mulder (Center for Language Studies, Radboud University, Nijmegen) Louis ten Bosch (Radboud University Nijmegen) Lou Boves (Centre for Language and Speech Technology, Radboud University Nijmegen)   (room3)
19:15 Mon-1-3-8 Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait - Tanya Talkar (Harvard University) Anne O'Brien (Spaulding Rehabilitation Hospital) Jeffrey Palmer (MIT Lincoln Laboratory) Adam Lammert (Worcester Polytechnic Institute) Hrishikesh Rao (MIT Lincoln Laboratory) Gloria Vergara-Diaz (Spaulding Rehabilitation Hospital) Daniel Hannon (MIT Lincoln Laboratory) Douglas Sturim (MIT) Sophia Yuditskaya (MIT Lincoln Laboratory) Gregory Ciccarelli (MIT Lincoln Laboratory) Paolo Bonato (Spaulding Rehabilitation Hospital) Richard DeLaura (MIT Lincoln Laboratory) Ross Zafonte (Spaulding Rehabilitation Hospital) Thomas Quatieri (MIT Lincoln Laboratory) James Williamson (MIT Lincoln Laboratory)   (room3)
19:15
Mon-1-4 Speech Signal Representation (until 20:15) (room4)
19:15 Mon-1-4-1 Towards Learning a Universal Non-Semantic Representation of Speech - Yinnon Haviv (Google) Omry Tuval (Google) Aren Jansen (Google) Félix de Chaumont Quitry (Google) Oran Lang (Google) Ira Shavitt (Google) Ronnie Maor (Google) Dotan Emanuel (Google) Marco Tagliasacchi (Google) Joel Shor (Google)   (room4)
19:15 Mon-1-4-10 Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals - Hiroshi Saruwatari (The University of Tokyo) Yasunori Ohishi (NTT Corporation) Hirotoshi Takeuchi (University of Tokyo) Kunio Kashino (NTT Corporation)   (room4)
19:15 Mon-1-4-2 Poetic Meter Classification Using i-vector-MTF Fusion - Ben P. Babu (RIT Kottayam) Aiswarya Vinod (College of Engineering, Trivandrum) Rajeev Rajan (College of Engineering, Trivandrum)   (room4)
19:15 Mon-1-4-3 Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism - Jinsong Zhang (Beijing Language and Culture University) Wang Dai (Beijing Language and Culture University) Yingming Gao (Institute of Acoustics and Speech Communication, Technische Universität Dresden) Yanlu Xie (Beijing Language and Culture University) Dengfeng Ke (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Binghuai Lin (MIG, Tencent Science and Technology Ltd., Beijing) Wei Wei (Beijing Language and Culture University)   (room4)
19:15 Mon-1-4-4 Automatic Analysis of Speech Prosody in Dutch - Judith Hanssen (Avans University of Applied Sciences) Aoju Chen (Utrecht University) Berit Janssen (Utrecht University) Na Hu (Utrecht University) Carlos Gussenhoven (Radboud University)   (room4)
19:15 Mon-1-4-5 Learning Voice Representation Using Knowledge Distillation For Automatic Voice Casting - Mathias Quillot (LIA - Avignon University) Adrien Gresse (LIA - Avignon University) Richard Dufour (LIA - Avignon University) Jean-Francois Bonastre (Avignon University, LIA)   (room4)
19:15 Mon-1-4-6 Enhancing formant information in spectrographic display of speech - Anand Medabalimi (IIIT Hyderabad) Bayya Yegnanarayana (International Institute of Information Technology at Hyderabad) Vishala Pannala (International Institute of Information Technology Hyderabad)   (room4)
19:15 Mon-1-4-7 Unsupervised Methods for Evaluating Speech Representations - Wei-Ning Hsu (Massachusetts Institute of Technology) James Glass (Massachusetts Institute of Technology) Michael Gump (MIT)   (room4)
19:15 Mon-1-4-8 Robust pitch regression with voiced/unvoiced classification in nonstationary noise environments - Dung Tran (Microsoft) Uros Batricevic (Microsoft) Kazuhito Koishida (Microsoft)   (room4)
19:15 Mon-1-4-9 Nonlinear ISA with Auxiliary Variables for Learning Speech Representations - Amrith Setlur (CMU) Alan W Black (Carnegie Mellon University) Barnabas Poczos (Carnegie Mellon University)   (room4)
19:15
Mon-1-5 Speech Synthesis: Neural Waveform Generation I (until 20:15) (room5)
19:15 Mon-1-5-1 Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders - Zhenhua Ling (University of Science and Technology of China) Yang Ai (University of Science and Technology of China)   (room5)
19:15 Mon-1-5-10 Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions - Dipjyoti Paul (Computer Science Department, University of Crete, Greece) Yannis Pantazis (Institute of Applied and Computational Mathematics, FORTH) Yannis Stylianou (Univ of Crete)   (room5)
19:15 Mon-1-5-11 Neural Homomorphic Vocoder - Zhijun Liu (Shanghai Jiao Tong University) Kai Yu (Shanghai Jiao Tong University) Kuan Chen (Shanghai Jiao Tong University)   (room5)
19:15 Mon-1-5-2 FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction - Shan Liu (Tencent) Ling-Hui Chen (Tencent) Heng Lu (Tencent) Zewang Zhang (Tencent) Qiao Tian (Tencent)   (room5)
19:15 Mon-1-5-3 VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network - Hoon-Young Cho (NCSOFT, AI Center, Speech Lab) Young-Ik Kim (Researcher) Jinhyeok Yang (NCSOFT) Junmo Lee (NCSOFT) Injung Kim (Handong Global University)   (room5)
19:15 Mon-1-5-4 Lightweight LPCNet-based Neural Vocoder with Tensor Decomposition - Hiroki Kanagawa (NTT Corporation) Yusuke Ijima (NTT corporation)   (room5)
19:15 Mon-1-5-5 WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU - Hung-yi Lee (National Taiwan University (NTU)) Po-chun Hsu (College of Electrical Engineering and Computer Science, National Taiwan University)   (room5)
19:15 Mon-1-5-6 What the future brings: investigating the impact of lookahead for incremental neural TTS - Brooke Stephenson (Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble and LIG, UGA, G-INP, CNRS, INRIA, Grenoble, France) Thomas Hueber (CNRS / GIPSA-lab) Laurent Girin (GIPSA-lab / University of Grenoble) Laurent Besacier (LIG)   (room5)
19:15 Mon-1-5-7 Fast and lightweight on-device TTS with Tacotron2 and LPCNet - Vladimir Kryzhanovskiy (Huawei Technologies Co. Ltd.) Denis Parkhomenko (Huawei Technologies Co. Ltd.) Vitalii Bushaev (Huawei Technologies Co. Ltd.) Vadim Popov (Huawei Technologies Co. Ltd.) Sergey Repyevsky (Huawei Technologies Co. Ltd.) Stanislav Kamenev (Huawei Technologies Co. Ltd.) Tasnima Sadekova (Huawei Technologies Co. Ltd.) Mikhail Kudinov (Huawei Technologies Co. Ltd.)   (room5)
19:15 Mon-1-5-8 Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed - Wei Song (JD AI Research) Chao Zhang (University of Cambridge) Xiaodong He (JD AI Research) Guanghui Xu (JD AI Research) Zhengchen Zhang (JD.com) Bowen Zhou (JD AI Research)   (room5)
19:15 Mon-1-5-9 Can Auditory Nerve models tell us what’s different about WaveNet vocoded speech? - Naomi Harte (Trinity College Dublin) Sébastien Le Maguer (Adapt Centre / Trinity College Dublin)   (room5)
19:15
Mon-1-7 Speaker Diarization (until 20:15) (room7)
19:15 Mon-1-7-1 End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors - Yusuke Fujita (Hitachi, Ltd.) Kenji Nagamatsu (Hitachi, Ltd.) Yawen Xue (Hitachi, Ltd.) Shinji Watanabe (Johns Hopkins University) Shota Horiguchi (Hitachi, Ltd.)   (room7)
19:15 Mon-1-7-2 Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario - Mariya Korenevskaya (STC-innovations Ltd) Tatiana Timofeeva (STC-innovations Ltd) Anton Mitrofanov (STC-innovations Ltd) Tatiana Prisyach (STC-innovations Ltd) Aleksei Romanenko (ITMO University) Ivan Podluzhny (STC-innovations Ltd) Maxim Korenevsky (Speech Technology Center) Andrei Andrusenko (ITMO University) Ivan Sorokin (STC) Aleksandr Laptev (ITMO University) Yuri Khokhlov (STC-innovations Ltd) Ivan Medennikov (STC-innovations Ltd)   (room7)
19:15 Mon-1-7-4 New advances in speaker diarization - Masayuki Suzuki (IBM Research) Ron Hoory (IBM Haifa Research Lab) Weizhong Zhu (IBM T.J. Watson Research Center) Gakuto Kurata (IBM Research) Hagai Aronowitz (IBM Research - Haifa)   (room7)
19:15 Mon-1-7-5 Self-Attentive Similarity Measurement Strategies in Speaker Diarization - Qingjian Lin (SEIT, Sun Yat-sen University) Ming Li (Duke Kunshan University) Yu Hou (Duke Kunshan University)   (room7)
19:15 Mon-1-7-6 Speaker attribution with voice profiles by graph-based semi-supervised learning - Michael Brudno (University of Toronto) Xiong Xiao (Microsoft) Jixuan Wang (University of Toronto) Frank Rudzicz (University of Toronto) Ranjani Ramamurthy (Microsoft) Jian Wu (Microsoft)   (room7)
19:15 Mon-1-7-7 Deep Self-Supervised Hierarchical Clustering for Speaker Diarization - Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012) Prachi Singh (Indian Institute of Science, Bangalore)   (room7)
19:15 Mon-1-7-8 Spot the conversation: speaker diarisation in the wild - Joon Son Chung (University of Oxford) Jaesung Huh (Naver Corporation) Triantafyllos Afouras (University of Oxford) Andrew Zisserman (University of Oxford) Arsha Nagrani (University of Oxford)   (room7)
19:15
Mon-1-8 Noise robust and distant speech recognition (until 20:15) (room8)
19:15 Mon-1-8-1 Learning Contextual Language Embeddings for Monaural Multi-talker Speech Recognition - Wangyou Zhang (Shanghai Jiao Tong University) Yanmin Qian (Shanghai Jiao Tong University)   (room8)
19:15 Mon-1-8-10 Simulating realistically-spatialised simultaneous speech using video-driven speaker detection and the CHiME-5 dataset - Jack Deadman (University of Sheffield) Jon Barker (University of Sheffield)   (room8)
19:15 Mon-1-8-2 Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition - Jiqing Han (Harbin Institute of Technology) Zhihao Du (Harbin Institute of Technology) Xueliang Zhang (Inner Mongolia University)   (room8)
19:15 Mon-1-8-3 Anti-aliasing regularization in stacking layers - Ananya Misra (Google) Antoine Bruguier (Google) Rohit Prabhavalkar (Google) Arun Narayanan (Google Inc.)   (room8)
19:15 Mon-1-8-4 Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription - Andrei Andrusenko (ITMO University) Aleksandr Laptev (ITMO University) Ivan Medennikov (STC-innovations Ltd)   (room8)
19:15 Mon-1-8-5 End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming - Yanmin Qian (Shanghai Jiao Tong University) Shinji Watanabe (Johns Hopkins University) Xuankai Chang (Johns Hopkins University) Aswin Shanmugam Subramanian (Johns Hopkins University) Wangyou Zhang (Shanghai Jiao Tong University)   (room8)
19:15 Mon-1-8-6 Quaternion Neural Networks for Multi-channel Distant Speech Recognition - Mohamed Morchid (University of Avignon) Titouan Parcollet (University of Oxford) Xinchi Qiu (University of Oxford) Mirco Ravanelli (Université de Montréal) Nicholas Lane (University of Oxford)   (room8)
19:15 Mon-1-8-7 Improved Guided Source Separation Integrated with a Strong Back-end for the CHiME-6 Dinner Party Scenario - Qian Shi (Institute of Acoustics, Chinese Academy of Sciences) Zuozhen Liu (Institute of Acoustics, Chinese Academy of Sciences) Hangting Chen (Institute of Acoustics, Chinese Academy of Sciences) Pengyuan Zhang (Institute of Acoustics, Chinese Academy of Sciences)   (room8)
19:15 Mon-1-8-8 Neural Speech Separation Using Spatially Distributed Microphones - Zhuo Chen (Microsoft) Dongmei Wang (Microsoft) Takuya Yoshioka (Microsoft)   (room8)
19:15 Mon-1-8-9 Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones - Kenji Nagamatsu (Hitachi, Ltd.) Shota Horiguchi (Hitachi, Ltd.) Yusuke Fujita (Hitachi, Ltd.)   (room8)
19:15
Mon-1-9 Speech in Multimodality (MULTIMODAL) (until 20:15) (room9)
19:15 Mon-1-9-1 Toward Silent Paralinguistics: Speech-to-EMG – Retrieving Articulatory Muscle Activity from Speech - Isabel Trancoso (INESC-ID / IST Univ. Lisbon) Lorenz Diener (University of Bremen) Kevin Scheck (Cognitive Systems Lab (CSL), University of Bremen) Alberto Abad (INESC-ID/IST) Tanja Schultz (Universität Bremen) Catarina Botelho (INESC-ID/Instituto Superior Técnico, University of Lisbon, Portugal) Dennis Küster (Cognitive Systems Lab (CSL), University of Bremen) Shahin Amiriparian (University of Augsburg) Björn Schuller (University of Augsburg / Imperial College London)   (room9)
19:15 Mon-1-9-2 Multimodal Deception Detection using Automatically Extracted Acoustic, Visual, and Lexical Features - Jiaxuan Zhang (Columbia University) Julia Hirschberg (Columbia University) Sarah Ita Levitan (Columbia University)   (room9)
19:15 Mon-1-9-3 Multi-modal Attention for Speech Emotion Recognition - Zexu Pan (National University of Singapore) Zhaojie Luo (Osaka University) Haizhou Li (National University of Singapore) Jichen Yang (National University of Singapore)   (room9)
19:15 Mon-1-9-4 WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition - Guang Shen (Harbin Engineering University) Rui Chen (Harbin Engineering University) Kejia Zhang (Harbin Engineering University) Qilong Han (Harbin Engineering University) Yu Zhang (Southern University of Science and Technology) Riwei Lai (Harbin Engineering University) Hongtao Song (Harbin Engineering University)   (room9)
19:15 Mon-1-9-5 A Multi-scale Fusion Framework for Bimodal Speech Emotion Recognition - Ming Chen (Zhejiang University) Xudong Zhao (Hithink RoyalFlush Information Network Co., Ltd.)   (room9)
19:15 Mon-1-9-6 Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition - Helen Meng (The Chinese University of Hong Kong) Kun Li (SpeechX Limited) Pengfei Liu (SpeechX Limited)   (room9)
19:15 Mon-1-9-7 Multi-modal embeddings using multi-task learning for emotion recognition - Srinivas Parthasarathy (Amazon) Aparna Khare (Amazon.com) Shiva Sundaram (Amazon)   (room9)
19:15 Mon-1-9-8 Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network - Jeng-Lin Li (Department of Electrical Engineering, National Tsing Hua University) Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)   (room9)
19:15 Mon-1-9-9 Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition - Zheng Lian (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Rongjun Li (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Zhanlei Yang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Jian Huang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Bin Liu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)   (room9)
19:15
Mon-S&T 1 Speech processing and analysis Mon-S&T 2 Speech annotation and speech assessment (until 21:30) (room12)
19:15 A Dynamic 3D Pronunciation Teaching Model based on Pronunciation Attributes and Anatomy - Yanlu Xie (Beijing Language and Culture University) Xiaoli Feng (Beijing Language and Culture University, Yunfan Hailiang (Beijing) technology co., LTD) Yayue Deng (Beijing Language and Culture University) Boxue Li (Yunfan Hailiang (Beijing) technology co., LTD)   (room12)
19:15 A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback - Jinsong Zhang (Advanced Innovation Center for Language Resource and Intelligence, Beijing Language and Culture University) Xiaoli Feng (Advanced Innovation Center for Language Resource and Intelligence, Beijing Language and Culture University) Yujia Jin (Advanced Innovation Center for Language Resource and Intelligence, Beijing Language and Culture University) Yanlu Xie (Advanced Innovation Center for Language Resource and Intelligence, Beijing Language and Culture University) Boxue Li (Yunfan Hailiang (Beijing) Technology co., LTD)   (room12)
19:15 CATOTRON–A Neural Text-to-Speech System in Catalan - Alp Öktem (Col·lectivaT) Alex Peiró-Lilja (Universitat Pompeu Fabra) Mireia Farrús (Universitat Pompeu Fabra) Santiago Pascual (Universitat Politècnica de Catalunya) Baybars Külebi (Col·lectivaT)   (room12)
19:15 Computer-Assisted Language Learning System: Automatic Speech Evaluation for Singaporean Children Learning Malay and Tamil - Siti Umairah Md Salleh (Institute for Infocomm Research, A*STAR, Singapore) Thai Ngoc Thuy Huong Helen (Institute for Infocomm Research, A*STAR, Singapore) Ke Shi (Institute for Infocomm Research, A*STAR, Singapore) Nur Farah Ain Binte Suhaimi (Institute for Infocomm Research, A*STAR, Singapore) Nancy F. Chen (Institute for Infocomm Research, A*STAR, Singapore) Richeng Duan (Institute for Infocomm Research, A*STAR, Singapore) Kye Min Tan (Institute for Infocomm Research, A*STAR, Singapore) Rajan s/o Vellu (Institute for Infocomm Research, A*STAR, Singapore)   (room12)
19:15 End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge - Zixiong Su (The University of Tokyo) Takaaki Saeki (The University of Tokyo) Naoki Kimura (The University of Tokyo)   (room12)
19:15 ICE-Talk: an Interface for a Controllable Expressive Talking Machine - Kevin El Haddad (Numediart Institute, University of Mons) Noé Tits (Numediart Institute, University of Mons) Thierry Dutoit (Numediart Institute, University of Mons)   (room12)
19:15 Kaldi-web: An installation-free, on-device speech recognition system - Emmanuel Vincent (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France) Laurent Pierron (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France) Mathieu Hu (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France) Denis Jouvet (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France)   (room12)
19:15 Rapid Enhancement of NLP systems by Acquisition of Data in Correlated Domains - Mayuresh Sanjay Oak (Samsung Research and Development Institute) Tejas Udayakumar (Samsung Research and Development Institute) Kinnera Saranu (Samsung Research and Development Institute) Ajit Ashok Saunshikar (Samsung Research and Development Institute) Sandip Shriram Bapat (Samsung Research and Development Institute)   (room12)
19:15 Real-time, full-band, online DNN-based voice conversion system using a single CPU - Shinnosuke Takamichi (Graduate School of Information Science and Technology, The University of Tokyo, Japan) Takaaki Saeki (Graduate School of Information Science and Technology, The University of Tokyo, Japan) Yuki Saito (Graduate School of Information Science and Technology, The University of Tokyo, Japan) Hiroshi Saruwatari (Graduate School of Information Science and Technology, The University of Tokyo, Japan)   (room12)
19:15 Smart Tube: A Biofeedback System for Vocal Training and Therapy through Tube Phonation - Tatsuya Kitamura (Konan University) Naoko Kawamura (Himeji Dokkyo University) Kenta Hamada (Konan University)   (room12)
19:15 SoapBox Labs Fluency Assessment Platform for child speech - Robert O’Regan (SoapBox Labs, Dublin, Ireland) Adrian Hempel (SoapBox Labs, Dublin, Ireland) Agape Deng (SoapBox Labs, Dublin, Ireland) Nicholas Parslow (SoapBox Labs, Dublin, Ireland) Amelia C. Kelly (SoapBox Labs, Dublin, Ireland) Armin Saeb (SoapBox Labs, Dublin, Ireland) Eleni Karamichali (SoapBox Labs, Dublin, Ireland) Niall Mullally (SoapBox Labs, Dublin, Ireland) Gloria Montoya Gomez (SoapBox Labs, Dublin, Ireland) Karel Veselý (SoapBox Labs, Dublin, Ireland) Arnaud Letondor (SoapBox Labs, Dublin, Ireland) Qiru Zhou (SoapBox Labs, Dublin, Ireland)   (room12)
19:15 Soapbox Labs Verification Platform for child speech - Qiru Zhou (SoapBox Labs, Dublin, Ireland) Robert O’Regan (SoapBox Labs, Dublin, Ireland) Armin Saeb (SoapBox Labs, Dublin, Ireland) Arnaud Letondor (SoapBox Labs, Dublin, Ireland) Karel Veselý (SoapBox Labs, Dublin, Ireland) Agape Deng (SoapBox Labs, Dublin, Ireland) Amelia C. Kelly (SoapBox Labs, Dublin, Ireland) Nicholas Parslow (SoapBox Labs, Dublin, Ireland) Eleni Karamichali   (room12)
19:15 Toward Remote Patient Monitoring of Speech, Video, Cognitive and Respiratory Biomarkers Using Multimodal Dialog Technology - David Suendermann-Oeft (Modality.ai, Inc.) Vignesh Murali (Modality.ai, Inc.) Vikram Ramanarayanan (Modality.ai, Inc. University of California, San Francisco) Hardik Kothare (Modality.ai, Inc. University of California, San Francisco) Oliver Roesler (Modality.ai, Inc.) Doug Habberstad (Modality.ai, Inc.) Jackson Liscombe (Modality.ai, Inc.) Patrick Lange (Modality.ai, Inc.) David Pautler (Modality.ai, Inc.) Dirk Schnelle-Walka (Modality.ai, Inc.) Michael Neumann (Modality.ai, Inc.) Andrew Cornish (Modality.ai, Inc.)   (room12)
19:15 VCTUBE: A Library for Automatic Speech Data Annotation - Seong Choi (Sungkyunkwan University) Eunil Park (Sungkyunkwan University) Jeewoo Yoon (Sungkyunkwan University) Jinyoung Han (Sungkyunkwan University) Seonghee Lee (Electronics and Telecommunications Research Institute) Minsam Ko (Hanyang University) Munyoung Lee (Electronics and Telecommunications Research Institute) Seunghoon Jeong (Hanyang University) Migyeong Yang (Sungkyunkwan University)   (room12)
19:15 VoiceID on the fly: A Speaker Recognition System that Learns from Scratch - Xinxin Zhang (Department of Electrical & Computer Engineering, University of Washington, Seattle, USA) Baihan Lin (Department of Applied Mathematics, University of Washington, Seattle, USA)   (room12)
19:15
Mon-SS-1-6 Automatic Speech Recognition for Non-Native Children's Speech (until 20:15) (room6)
19:15 Mon-SS-1-6-1 Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children’s Speech - Marco Matassoni (Fondazione Bruno Kessler) Chee Wee (Ben) Leong (Educational Testing Service) Roberto Gretter (FBK) Falavigna Daniele (Fondazione Bruno Kessler) Keelan Evanini (Educational Testing Service)   (room6)
19:15 Mon-SS-1-6-2 The NTNU System at the Interspeech 2020 Non-Native Children’s Speech ASR Challenge - Fu-An Chao (National Taiwan Normal University) Tien-Hong Lo (National Taiwan Normal University) Berlin Chen (National Taiwan Normal University) Shi-Yan Weng (National Taiwan Normal University)   (room6)
19:15 Mon-SS-1-6-3 Non-Native Children's Automatic Speech Recognition: the INTERSPEECH 2020 Shared Task ALTA Systems - Kate Knill (University of Cambridge) Yu Wang (University of Cambridge) Xixin Wu (University of Cambridge) Mark Gales (Cambridge University) Linlin Wang (Cambridge University Engineering Department)   (room6)
19:15 Mon-SS-1-6-4 Data augmentation using prosody and false starts to recognize non-native children's speech - Tamás Grósz (Department of Signal Processing and Acoustics, Aalto University) Mittul Singh (Aalto University) Hemant Kathania (Aalto University) Mikko Kurimo (Aalto University)   (room6)
19:15 Mon-SS-1-6-5 UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech - Renée Lu (University of New South Wales) Julien Epps (University of New South Wales) Beena Ahmed (University of New South Wales) Mostafa Shahin (University of New South Wales)   (room6)
20:15
Coffee Break (until 20:30) ()
20:30
Mon-2-1 Speech Emotion Recognition I (SER I) (until 21:30) (room1)
20:30 Mon-2-1-1 Enhancing Transferability of Black-box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models - Björn Schuller (University of Augsburg / Imperial College London) Zhao Ren (University of Augsburg) Jing Han (University of Augsburg) Nicholas Cummins (University of Augsburg)   (room1)
20:30 Mon-2-1-2 End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model - Tatsuya Kawahara (Kyoto University) Han Feng (Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto, Japan) Sei Ueno (Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto, Japan)   (room1)
20:30 Mon-2-1-3 Improving Speech Emotion Recognition Using Graph Attentive Bi-directional Gated Recurrent Unit Network - Bo-Hao Su (Department of Electrical Engineering, National Tsing Hua University) Yun-Shao Lin (Department of Electrical Engineering, National Tsing Hua University) Chun-Min Chang (Department of Electrical Engineering, National Tsing Hua University) Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)   (room1)
20:30 Mon-2-1-4 An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition - Adria Mallol-Ragolta (University of Augsburg) Nicholas Cummins (University of Augsburg) Björn Schuller (University of Augsburg / Imperial College London)   (room1)
20:30 Mon-2-1-5 Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition - Carlos Busso (The University of Texas at Dallas) Kusha Sridhar (The University of Texas at Dallas)   (room1)
20:30 Mon-2-1-6 Augmenting Generative Adversarial Networks for Speech Emotion Recognition - Sara Khalifa (Distributed Sensing Systems Group, Data61, CSIRO Australia) Siddique Latif (University of Southern Queensland Australia/Distributed Sensing Systems Group, Data61, CSIRO Australia) Raja Jurdak (Queensland University of Technology (QUT)) Björn Schuller (University of Augsburg / Imperial College London) Rajib Rana (University of Southern Queensland) Muhammad Asim (Information Technology University, Lahore)   (room1)
20:30 Mon-2-1-7 Speech Emotion Recognition ‘in the wild’ Using an Autoencoder - Haimo Zhang (University of Auckland) Vipula Dissanayake (University of Auckland) Mark Billinghurst (University of Auckland) Suranga Nanayakkara (University of Auckland)   (room1)
20:30 Mon-2-1-8 Emotion Profile Refinery for Speech Emotion Classification - P. C. Ching (The Chinese University of Hong Kong) Shuiyang Mao (The Chinese University of Hong Kong) Tan Lee (The Chinese University of Hong Kong)   (room1)
20:30 Mon-2-1-9 Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation - Yun-Shao Lin (Department of Electrical Engineering) Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University) Sung-Lin Yeh (Department of Electrical Engineering, National Tsing Hua University)   (room1)
20:30
Mon-2-10 DNN architectures for Speaker Recognition (until 21:30) (room10)
20:30 Mon-2-10-1 AutoSpeech: Neural Architecture Search for Speaker Recognition - Shaojin Ding (Texas A&M University) Xinyu Gong (Texas A&M University) Zhangyang Wang (Texas A&M University) Weiwei Zha (University of Science and Technology of China) Tianlong Chen (Texas A&M University)   (room10)
20:30 Mon-2-10-10 Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification - Xiaoyang Qu (Ping An Technology (Shenzhen) Co., Ltd.) Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd.) Jing Xiao (Ping An Technology (Shenzhen) Co., Ltd.)   (room10)
20:30 Mon-2-10-2 Densely Connected Time Delay Neural Network for Speaker Verification - Ya-Qi Yu (Nanjing University) Wu-Jun Li (Nanjing University)   (room10)
20:30 Mon-2-10-3 Phonetically-Aware Coupled Network For Short Duration Text-independent Speaker Verification - Yun Lei (Alibaba Group) Hongbin Suo (Alibaba Group) Siqi Zheng (Alibaba)   (room10)
20:30 Mon-2-10-4 Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention - Youngmoon Jung (KAIST) Hoi Rin Kim (KAIST) Myunghun Jung (KAIST) Jahyun Goo (KAIST)   (room10)
20:30 Mon-2-10-5 Vector-based attentive pooling for text-independent speaker verification - Jing Xu (Nankai University) Yanfeng Wu (Nankai University) Xiaolei Hou (Nankai University) Chenkai Guo (Nankai University) Hongcan Gao (Nankai University)   (room10)
20:30 Mon-2-10-6 self-attention encoding and pooling for speaker recognition - pooyan safari (TALP research center, BarcelonaTech) Javier Hernando (Universitat Politecnica de Catalunya) Miquel India (Universitat Politecnica de Catalunya)   (room10)
20:30 Mon-2-10-7 ARET: Aggregated Residual Extended Time-delay Neural Networks for Speaker Verification - Ruiteng Zhang (Tianjin University) Lin Zhang (Tianjin University) Junhai Xu (Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University) Wenhuan Lu (Tianjin University) Meng Liu (Tianjin University) Jianguo Wei (Tianjin University) Jiayu Jin (Tianjin University) Longbiao Wang (Tianjin University)   (room10)
20:30 Mon-2-10-8 Adversarial Separation Network for Speaker Recognition - Yunchun Zhang (Yunnan University) Kong Aik Lee (Biometrics Research Laboratories, NEC Corporation) Meng Liu (Tianjin University) Hanyi Zhang (Yunnan University) Longbiao Wang (Tianjin University) Jianguo Wei (Tianjin University)   (room10)
20:30 Mon-2-10-9 Text-Independent Speaker Verification with Dual Attention Network - Jingyu Li (The Chinese University of Hong Kong) Tan Lee (The Chinese University of Hong Kong)   (room10)
20:30
Mon-2-11 ASR model training and strategies (until 21:30) (room11)
20:30 Mon-2-11-1 Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition - Chengzhu Yu (Tencent) Chao Weng (Tencent AI Lab) Jia Cui (Tencent) Chunlei Zhang (Tencent AI Lab) Dong Yu (Tencent AI Lab)   (room11)
20:30 Mon-2-11-2 Semantic Mask for Transformer based End-to-End Speech Recognition - Shujie Liu (Microsoft Research Asia) Liang Lu (Microsoft) Jinyu Li (Microsoft) Guoli Ye (Microsoft) Yu Wu (Microsoft Research Asia) Chengyi Wang (Nankai University) Yujiao Du (Alibaba Corporation) Shuo Ren (Beihang University) Ming Zhou (microsoft research asia) Sheng Zhao (Microsoft)   (room11)
20:30 Mon-2-11-3 Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces - Frank Zhang (Facebook AI, USA) Yongqiang Wang (Facebook AI, USA) Chunxi Liu (Facebook AI, USA) Geoffrey Zweig (Facebook AI, USA) Yatharth Saraf (Facebook AI, USA) Xiaohui Zhang (Facebook AI, USA)   (room11)
20:30 Mon-2-11-4 A Federated Approach in Training Acoustic Models - Kenichi Kumatani (Amazon Inc.) Dimitrios Dimitriadis (Microsoft) Yashesh Gaur (Microsoft) Sefik Emre Eskimez (Microsoft) Robert Gmyr (Microsoft)   (room11)
20:30 Mon-2-11-5 On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data - Imran Sheikh (Inria) Irina Illina (LORIA/INRIA) Emmanuel Vincent (Inria)   (room11)
20:30 Mon-2-11-6 On Front-end Gain Invariant Modeling for Wake Word Spotting - Ming Sun (Amazon) Noah D. Stein (Amazon) Yunliang Cai (Amazon) Tao Zhang (Amazon) Shiv Vitaladevuni (Amazon) Yixin Gao (Amazon) Chieh-Chi Kao (Amazon)   (room11)
20:30 Mon-2-11-7 Unsupervised Regularization-Based Adaptive Training for Speech Recognition - Jun Du (University of Science and Technology of China) Fenglin Ding (University of Science and Technology of China) Wu Guo (University of Science and Technology of China) Bin Gu (University of Science and Technology of China) Zhenhua Ling (University of Science and Technology of China)   (room11)
20:30 Mon-2-11-8 On the Robustness and Training Dynamics of Raw Waveform Models - Steve Renals (University of Edinburgh) Erfan Loweimi (The University of Edinburgh) Peter Bell (University of Edinburgh)   (room11)
20:30 Mon-2-11-9 Iterative Pseudo-Labeling for Speech Recognition - Gabriel Synnaeve (Facebook AI Research) Jacob Kahn (Facebook AI Research) Tatiana Likhomanenko (Facebook AI Research) Qiantong Xu (Facebook) Ronan Collobert (Facebook AI Research) Awni Hannun (Facebook AI Research)   (room11)
20:30
Mon-2-2 ASR neural network architectures and training I (until 21:30) (room2)
20:30 Mon-2-2-1 FAST AND SLOW ACOUSTIC MODEL - Emilian Stoimenov (Microsoft Corp) Kshitiz Kumar (Microsoft Corp) Hosam Khalil (Microsoft Corp) Jian Wu (Microsoft Corp)   (room2)
20:30 Mon-2-2-2 Self-Distillation for Improving CTC-Transformer-based ASR Systems - Takafumi Moriya (NTT Corporation) Ryo Masumura (NTT Corporation) Takanori Ashihara (NTT Corporation) Tomohiro Tanaka (NTT Corporation) Tsubasa Ochiai (NTT Communication Science Laboratories) Hiroshi Sato (NTT media intelligent laboratory) Marc Delcroix (NTT Communication Science Laboratories) Shigeki Karita (NTT Communication Science Laboratories) Yusuke Shinohara (NTT Corporation)   (room2)
20:30 Mon-2-2-3 Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard - Kartik Audhkhasi (IBM Research) Zoltán Tüske (IBM Research) Brian Kingsbury (IBM Research) George Saon (IBM)   (room2)
20:30 Mon-2-2-4 Improving Speech Recognition using GAN-based Speech Synthesis and Contrastive Unspoken Text Selection - Pedro Moreno (google inc.) Zhehuai Chen (Google) Andrew Rosenberg (Google LLC) Gary Wang (Simon Fraser University) Bhuvana Ramabhadran (Google) Yu Zhang (Google)   (room2)
20:30 Mon-2-2-5 PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR - Dan Povey (Xiaomi, Inc.) Yiwen Shao (Center for Language and Speech Processing,Johns Hopkins University) Yiming Wang (Johns Hopkins University) Sanjeev Khudanpur (Johns Hopkins University)   (room2)
20:30 Mon-2-2-6 CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency - Hongyu Xiang (Tsinghua University) Zhijian Ou (Department of Electronic Engineering, Tsinghua University) keyu An (Tsinghua University)   (room2)
20:30 Mon-2-2-7 CTC-synchronous Training for Monotonic Attention Model - Hirofumi Inaguma (Kyoto University) Masato Mimura (Kyoto University) Tatsuya Kawahara (Kyoto University)   (room2)
20:30 Mon-2-2-8 Continual Learning for Multi-Dialect Acoustic Models - Brady Houston (Amazon) Katrin Kirchhoff (Amazon)   (room2)
20:30 Mon-2-2-9 SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition - Helen Meng (The Chinese University of Hong Kong) Zhiyong Wu (Tsinghua University) Dan Su (Tencent AILab Shenzhen) Yiheng Huang (Tencent AI Lab) Xingchen Song (Tsinghua University)   (room2)
20:30
Mon-2-3 Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation (until 21:30) (room3)
20:30 Mon-2-3-1 RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications - Adriana Stan (Communications Department, Technical University of Cluj-Napoca)   (room3)
20:30 Mon-2-3-2 Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer - Yanzhang He (Google) Katie Knister (Google) Francoise Beaufays (Google) Yuan Shangguan (facebook) Ian McGraw (Google)   (room3)
20:30 Mon-2-3-3 Statistical Testing on ASR Performance via Blockwise Bootstrap - Fuchun Peng (Facebook) Zhe Liu (Facebook, Inc)   (room3)
20:30 Mon-2-3-4 SENTENCE LEVEL ESTIMATION OF PSYCHOLINGUISTIC NORMS USING JOINT MULTIDIMENSIONAL ANNOTATIONS - Shrikanth Narayanan (University of Southern California) Anil Ramakrishna (Amazon)   (room3)
20:30 Mon-2-3-5 Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System - Zhi-Jie Yan (Microsoft Research Asia) Kai Fan (Alibaba Group) Boxing Chen (Alibaba) Jiayi Wang (Alibaba Group) Shiliang Zhang (Alibaba Group) Niyu Ge (IBM Research) Bo Li (Alibaba Group)   (room3)
20:30 Mon-2-3-6 Confidence measures in encoder-decoder models for speech recognition - Elisenda Bou-Balust (Vilynx) Daivid Varas (Vilynx) Issey Masuda (Vilynx) Alejandro Woodward (Universitat Politècnica de Catalunya) Clara Bonnín (Vilynx) Juan Carlos Riveiro (Vilynx)   (room3)
20:30 Mon-2-3-7 Word Error Rate Estimation Without ASR Output: e-WER2 - Ahmed Ali (Qatar Computing Research Institute) Steve Renals (University of Edinburgh)   (room3)
20:30 Mon-2-3-8 An evaluation of manual and semi-automatic laughter annotation - Bogdan Ludusan (Bielefeld University) Petra Wagner (Universität Bielefeld)   (room3)
20:30 Mon-2-3-9 Understanding Racial Disparities in Automatic Speech Recognition: the case of habitual "be" - Kevin Tang (University of Florida) Joshua Martin (University of Florida)   (room3)
20:30
Mon-2-4 Phonetics and Phonology (until 21:30) (room4)
20:30 Mon-2-4-1 Secondary phonetic cues in the production of the nasal short-a system in California English - Renee Kemp (UC Davis) Georgia Zellou (UC Davis) Rebecca Scarborough (University of Colorado)   (room4)
20:30 Mon-2-4-2 Acoustic properties of strident fricatives at the edges: implications for consonant discrimination - Maria Giavazzi (Ecole Normale Supérieure) Leo Varnet (ENS) lorenzo maselli (Scuola normale Superiore di Pisa)   (room4)
20:30 Mon-2-4-3 Processes and Consequences of Co-articulation in Mandarin V1N.(C2)V2 Context: Phonology and Phonetics - Mingqiong Luo (Shanghai International Studies University)   (room4)
20:30 Mon-2-4-4 Voicing Distinction of Obstruents in the Hangzhou Wu Chinese Dialect - Yang Yue (University of Chinese Academy of Social Sciences) Fang Hu (Institute of Linguistics, Chinese Academy of Social Sciences)   (room4)
20:30 Mon-2-4-5 The phonology and phonetics of Kaifeng Mandarin vowels - Lei Wang (East China University of Science and Technology)   (room4)
20:30 Mon-2-4-6 Microprosodic variability in plosives in German and Austrian German - Margaret Zellers (University of Kiel) Barbara Schuppler (SPSC Laboratory, Graz University of Technology)   (room4)
20:30 Mon-2-4-7 Er-suffixation in Southwestern Mandarin: An EMA and ultrasound study - Feng-fan Hsieh (National Tsing Hua University) Jing Huang (National Tsing Hua University) Yueh-chin Chang (National Tsing Hua University)   (room4)
20:30 Mon-2-4-8 Electroglottographic-Phonetic Study on Korean Phonation Induced by Tripartite Plosives in Yanbian Korean - Jinghua Zhang (Yanbian University) Yinghao Li (Yanbian University)   (room4)
20:30 Mon-2-4-9 Modeling Global Body Configurations in American Sign Language - Beck Cordes Galbraith (Sign-Speak) Nicholas Wilkins (Rochester Institute of Technology) Ifeoma Nwogu (Rochester Institute of Technology)   (room4)
20:30
Mon-2-5 Topics in ASR I (until 21:30) (room5)
20:30 Mon-2-5-1 Augmenting Turn-taking Prediction with Wearable Eye Activity During Conversation - Siyuan Chen (University of New South Wales) Hang Li (UNSW) Julien Epps (School of Electrical Engineering and Telecommunications, UNSW Australia)   (room5)
20:30 Mon-2-5-10 Focal Loss for Punctuation Prediction - Ye Bai (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Cunhang Fan (Institute of Automation, Chinese Academy of Sciences) Zhengkun Tian (Institute of Automation, Chinese Academy of Sciences) Jiangyan Yi (Institute of Automation Chinese Academy of Sciences) Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)   (room5)
20:30 Mon-2-5-2 CAM: Uninteresting Speech Detector - Peng Yang (Amazon) Belinda Zeng (Amazon) Yi Xu (Amazon) Weiyi Lu (Amazon)   (room5)
20:30 Mon-2-5-3 Mixed Case Contextual ASR Using Capitalization Masks - Pat Rondon (Google Inc.) Diamantino Caseiro (Google Inc.) Quoc-Nam Le The (Google Inc.) Petar Aleksic (Google Inc.)   (room5)
20:30 Mon-2-5-4 Speech Recognition and Multi-Speaker Diarization of Long Conversations - Henry Mao (University of California, San Diego) Shuyang Li (University of California, San Diego) Garrison Cottrell (University of California, San Diego) Julian McAuley (University of California, San Diego)   (room5)
20:30 Mon-2-5-5 Investigation of Data Augmentation Techniques for Disordered Speech Recognition - Helen Meng (Chinese University of Hong Kong) SHANSONG LIU (Chinese University of Hong Kong) Jianwei Yu (Chinese University of Hong Kong) shoukang hu (Chinese University of Hong Kong) Xurong Xie (Chinese University of Hong Kong) Xunying Liu (Chinese University of Hong Kong) Mengzhe Geng (Chinese University of Hong Kong)   (room5)
20:30 Mon-2-5-6 A Real-time Robot-based Auxiliary System for Risk Evaluation of COVID-19 Infection - Jing Xiao (Ping An Technology (Shenzhen) Co., Ltd.) Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd.) Wenqi Wei (Ping An Technology (Shenzhen) Co., Ltd.) Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd.) Jiteng Ma (Ping An Technology (Shenzhen) Co., Ltd.)   (room5)
20:30 Mon-2-5-7 An Utterance Verification System for Word Naming Therapy in Aphasia - Victoria Fleming (Speech, Hearing and Phonetic Sciences, University College London) Henry Coley-Fisher (Institute of Cognitive Neuroscience, University College London) William Latham (Goldsmiths College University of London) Mark Huckvale (Speech, Hearing and Phonetic Sciences, University College London) Jenny Crinion (Institute of Cognitive Neuroscience, University College London) David Barbera (University College London) Ian Shaw (Technical Consultant at SoftV) Emily Upton (Institute of Cognitive Neuroscience, University College London) Alexander Paul Leff (Institute of Cognitive Neuroscience, University College London)   (room5)
20:30 Mon-2-5-8 Exploiting Cross Domain Visual Feature Generation for Disordered Speech Recognition - Jianwei Yu (The Chinese University of Hong Kong) Xurong Xie (Chinese University of Hong Kong) Rongfeng Su (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences.) shoukang hu (Chinese University of Hong Kong) Helen Meng (The Chinese University of Hong Kong) SHANSONG LIU (The Chinese University of Hong Kong) Xunying Liu (Chinese University of Hong Kong) Shi-Xiong ZHANG (Tencent AI Lab) Mengzhe Geng (The Chinese University of Hong Kong)   (room5)
20:30 Mon-2-5-9 Joint prediction of punctuation and disfluency in speech transcripts - Liyuan Wang (Tencent Technology Co., Ltd) Binghuai Lin (Tencent Technology Co., Ltd)   (room5)
20:30
Mon-2-7 Voice Conversion and Adaptation I (until 21:30) (room7)
20:30 Mon-2-7-1 Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning - Zhen-Hua Ling (University of Science and Technology of China) Li-Rong Dai (University of Science and Technology of China) Jing-Xuan Zhang (University of Science and Technology of China)   (room7)
20:30 Mon-2-7-2 Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition - Ricardo Gutierrez-Osuna (Texas A&M University) Guanlong Zhao (Texas A&M University) Shaojin Ding (Texas A&M University)   (room7)
20:30 Mon-2-7-3 Non-parallel Many-to-many Voice Conversion with PSR-StarGAN - Binbin Chen (vivo AI Lab) Yang Wang (vivo AI Lab) Dongxiang Xu (Nanjing University of Posts and Telecommunications) Yanping Li (Nanjing University of Posts and Telecommunications) Yan Zhang (JIT)   (room7)
20:30 Mon-2-7-4 TTS Skins: Speaker Conversion via ASR - Lior Wolf (Tel Aviv University) Adam Polyak (Facebook) Yaniv Taigman (Facebook)   (room7)
20:30 Mon-2-7-5 GAZEV: GAN-Based Zero Shot Voice Conversion over Non-parallel Speech Corpus - Bingsheng He (National University of Singapore) zining zhang (National University of Singapore) Zhenjie Zhang (Yitu)   (room7)
20:30 Mon-2-7-6 Spoken Content and Voice Factorization for Few-shot Speaker Adaptation - Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Jiangyan Yi (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Rongxiu Zhong (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Zhengqi Wen (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Ruibo Fu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) Tao Wang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)   (room7)
20:30 Mon-2-7-7 Unsupervised Cross-Domain Singing Voice Conversion - Yossi Adi (Facebook AI Research) Adam Polyak (Facebook) Yaniv Taigman (Facebook) Lior Wolf (Tel Aviv University)   (room7)
20:30 Mon-2-7-8 Attention-Based Speaker Embeddings for One-Shot Voice Conversion - Daisuke Saito (The University of Tokyo) Tatsuma Ishihara (GREE Inc.)   (room7)
20:30 Mon-2-7-9 Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training - Lei Xie (Northwestern Polytechnical University) Jian Cong (Northwestern Polytechnical University) Guanglu Wan (Meituan-Dianping Group) Shan Yang (Northwestern Polytechnical University) Guoqiao Yu (Meituan-Dianping Group)   (room7)
20:30
Mon-2-8 Acoustic Event Detection (until 21:30) (room8)
20:30 Mon-2-8-1 Gated Multi-head Attention Pooling for Weakly Labelled Audio Tagging - Wenwu Wang (Center for Vision, Speech and Signal Processing, University of Surrey, UK) Sixin Hong (Peking University) Yuexian Zou (ADSPLAB, School of ECE, Peking University, Shenzhen)   (room8)
20:30 Mon-2-8-10 SpeechMix - Augmenting Deep Sound Recognition using Hidden Space Interpolations - Amit Jindal (Manipal Institute of Technology) Rajiv Ratn Shah (IIIT Delhi) Aniket Didolkar (Manipal Institute of Technology) Arijit Ghosh Chowdhury (Manipal Institute of Technology) Ramit Sawhney (Netaji Subhas Institute of Technology) Narayanan Elavathur Ranganatha (Manipal Academy of Higher Education) Di Jin (MIT)   (room8)
20:30 Mon-2-8-2 Environmental Sound Classification with Parallel Temporal-spectral Attention - Helin Wang (Peking University) Yuexian Zou (Peking University Shenzhen Graduate School) dading chong (Peking University ShenZhen Graduate School) Wenwu Wang (University of Surrey)   (room8)
20:30 Mon-2-8-3 Contrastive Predictive Coding of Audio with an Adversary - Kazuya Kawakami (DeepMind) Aaron van den Oord (DeepMind) Luyu Wang (DeepMind)   (room8)
20:30 Mon-2-8-4 Memory Controlled Sequential Self Attention for Sound Recognition - Vinod Subramanian (Queen Mary University of London) Helen L. Bear (Queen Mary University of London) Arjun Pankajakshan (Queen Mary University of London) Emmanouil Benetos (Queen Mary University of London)   (room8)
20:30 Mon-2-8-5 Dual Stage Learning based Dynamic Time-Frequency Mask Generation for Audio Event Classification - David Han (US Army Research Laboratory) Jaihyun Park (Korea University) Donghyeon Kim (Korea university) Hanseok Ko (Korea University)   (room8)
20:30 Mon-2-8-6 An Effective Perturbation based Semi-Supervised Learning Method for Sound Event Detection - Yan Song (National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China) Jie Yan (National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China) Lin Liu (iFLYTEK Research, iFLYTEK CO., LTD, Hefei) Li-Rong Dai (National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China) Ian McLoughlin (ICT Cluster, Singapore Institute of Technology) Xu Zheng (National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China)   (room8)
20:30 Mon-2-8-7 A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling - Ming Sun (Amazon.com) Bowen Shi (Toyota Technological Institute at Chicago) Chao Wang (Amazon.com) Chieh-Chi Kao (Amazon.com)   (room8)
20:30 Mon-2-8-8 Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging - Ming Sun (Amazon.com) Chieh-Chi Kao (Amazon.com) Chao Wang (Amazon.com) Chun-Chieh Chang (Johns Hopkins University)   (room8)
20:30 Mon-2-8-9 Two-stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-token Connectionist Temporal Classification - Hong Kook Kim (Professor) Inyoung Park (Ph. D. Student)   (room8)
20:30
Mon-2-9 Spoken Language Understanding I (until 21:30) (room9)
20:30 Mon-2-9-1 End-to-End Neural Transformer Based Spoken Language Understanding - Martin Radfar (Amazon Inc) Athanasios Mouchtaris (Amazon Inc) Jimmy Kunnzmann (Amazon Inc)   (room9)
20:30 Mon-2-9-10 Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study - Longshaokan Wang (Amazon) Behnam Hedayatnia (Amazon) Yang Liu (Amazon) Karthik Gopalakrishnan (Amazon Alexa AI) Dilek Hakkani-Tur (Amazon Alexa AI)   (room9)
20:30 Mon-2-9-2 Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding - Zijian Zhao (Shanghai Jiao Tong University) Lu Chen (Shanghai Jiao Tong University) Su Zhu (Shanghai Jiao Tong University) Ruisheng Cao (Shanghai Jiao Tong University) Chen Liu (Shanghai Jiao Tong University) Kai Yu (Shanghai Jiao Tong University)   (room9)
20:30 Mon-2-9-3 Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces - Ariya Rastrow (Amazon.com) Milind Rao (Applied Scientist) Anirudh Raju (Amazon) Pranav Dheram (Amazon Alexa) Bach Bui (Amazon Alexa)   (room9)
20:30 Mon-2-9-4 Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning - Ngoc Thang Vu (University of Stuttgart) Pavel Denisov (University of Stuttgart)   (room9)
20:30 Mon-2-9-5 Context Dependent RNNLM for Automatic Transcription of Conversations - Sriram Ganapathy (Indian Institute of Science, Bangalore, India) Srikanth Raj Chetupalli (Indian Institute of Science, Bangalore)   (room9)
20:30 Mon-2-9-6 Improving End-to-End Speech-to-Intent Classification with Reptile - Philip John Gorinski (Huawei Noah's Ark Lab) Yusheng Tian (Huawei Noah’s Ark Lab, London)   (room9)
20:30 Mon-2-9-7 Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation - Donghyun Kwak (Search Solution Inc.) Jiwon Yoon (Department of Electrical and Computer Engineering and INMC, Seoul National University) Won Ik Cho (Department of Electrical and Computer Engineering and INMC, Seoul National University) Nam Soo Kim (Seoul National University)   (room9)
20:30 Mon-2-9-8 Towards an ASR error robust Spoken Language Understanding System - Yaroslav Nechaev (Amazon Alexa) Imre Kiss (Amazon Alexa) Weitong Ruan (Amazon Alexa) Chengwei Su (Amazon Alexa) Luoxin Chen (Amazon Alexa)   (room9)
20:30 Mon-2-9-9 End-to-End Spoken Language Understanding Without Full Transcripts - Hong-Kwang Kuo (IBM T. J. Watson Research Center) Gakuto Kurata (IBM Research) Zvi Kons (IBM Haifa research lab) Yinghui Huang (IBM) Zoltán Tüske (IBM Research) Luis Lastras (IBM Research AI) Brian Kingsbury (IBM Research) Kartik Audhkhasi (IBM Research) Ron Hoory (IBM Haifa Research Lab) Samuel Thomas (IBM Research AI)   (room9)
20:30
Mon-SS-2-6 Large-Scale Evaluation of Short-Duration Speaker Verification (SdSV) (until 21:30) (room6)
20:30 Mon-SS-2-6-1 Improving X-vector and PLDA for Text-dependent Speaker Verification - Yue Lin (NetEase Games AI Lab) Zhuxin Chen (NetEase Games AI Lab)   (room6)
20:30 Mon-SS-2-6-2 SdSV Challenge 2020: Large-Scale Evaluation of Short‐Duration Speaker Verification - Hossein Zeinali (Amirkabir University of Technology) Md Jahangir Alam (Computer Research Institute of Montreal (CRIM)) Lukas Burget (Brno University of Technology) Kong Aik Lee (Biometrics Research Laboratories, NEC Corporation)   (room6)
20:30 Mon-SS-2-6-3 The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020 - Tao Jiang (School of Informatics, Xiamen University) Lin Li (Xiamen University) Qingyang Hong (Xiamen University) Miao Zhao (School of Informatics, Xiamen University)   (room6)
20:30 Mon-SS-2-6-4 Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020 - Min Hyun Han (Seoul National University) Woo Hyun Kang (Department of Electrical and Computer Engineering and INMC, Seoul National University) Sung Hwan Mun (Seoul National University) Nam Soo Kim (Seoul National University)   (room6)
20:30 Mon-SS-2-6-5 The TalTech Systems for the Short-duration Speaker Verification Challenge 2020 - Jörgen Valk (Tallinn University of Technology) Tanel Alumäe (Tallinn University of Technology)   (room6)
20:30 Mon-SS-2-6-6 Investigation of NICT submission for short-duration speaker verification challenge 2020 - Xugang Lu (NICT) Hisashi Kawai (NICT) Peng Shen (NICT)   (room6)
20:30 Mon-SS-2-6-7 Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization - Jenthe Thienpondt (IDLab, Department of Electronics and Information Systems, Ghent University - imec, Belgium) Kris Demuynck (Ghent University) Brecht Desplanques (Ghent University - imec, IDLab, Department of Electronics and Information Systems)   (room6)
20:30 Mon-SS-2-6-8 BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020 - Ondrej Novotny (Brno University of Technology) Alicia Lozano-Diez (Brno University of Technology) Lukas Burget (Brno University of Technology) Karel Vesely (Brno University of Technology) Pavel Matejka (Brno University of Technology) Anna Silnova (Brno University of Technology) Bhargav Pulugundla (Brno University of Technology) Johan Rohdin (Brno University of Technology) Ondrej Glembek (Brno University of Technology) Oldrich Plchot (Brno University of Technology)   (room6)
20:30 Mon-SS-2-6-9 Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification - Vijay Ravi (Ph.D. Student, UCLA) Abeer Alwan (UCLA) Ruchao Fan (University of California, Los Angeles) Huanhua Lu (UCLA) Amber Afshan (University of California, Los Angeles)   (room6)
21:30
Coffee Break (until 21:45) ()
21:45
Diversity Meeting (until 22:45) (room12)
21:45
ISCA-SAC "2nd Mentoring" (until 22:45) (room6)
21:45 ISCA-SAC "2nd Mentoring"   (room6)
21:45
Mon-3-1 Cross/multi-lingual and code-switched speech recognition (until 22:45) (room1)
21:45 Mon-3-1-1 Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? - Jialu Li (University of Illinois at Urbana-Champaign) Mark Hasegawa-Johnson (University of Illinois)   (room1)
21:45 Mon-3-1-10 Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition - Yeha Lee (VUNO) Sangki Kim (VUNO) Jisung Wang (VUNO) Kim Jihwan (VUNO)   (room1)
21:45 Mon-3-1-2 Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages - Martha Yifiru Tachbelie (Addis Ababa University) Solomon Teferra Abate (Addis Ababa University) Tanja Schultz (Universität Bremen)   (room1)
21:45 Mon-3-1-3 Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning - Longfei Yang (Tokyo Institute of Technology) Takahiro Shinozaki (Tokyo Institute of Technology) Wenxin Hou (Tokyo Institute of Technology) Yue Dong (Tokyo Institute of Technology) Bairong Zhuang (Tokyo Institute of Technology) Jiatong Shi (Johns Hopkins University)   (room1)
21:45 Mon-3-1-4 Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition - Haizhou Li (National University of Singapore) Yanhua Long (Shanghai Normal University) Yijie Li (Unisound AI Technology Co., Ltd.) Emre Yilmaz (National University of Singapore) Xinyuan Zhou (Shanghai Normal University)   (room1)
21:45 Mon-3-1-5 Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages - Solomon Teferra Abate (Addis Ababa University) Martha Yifiru Tachbelie (Addis Ababa University) Tanja Schultz (Universität Bremen)   (room1)
21:45 Mon-3-1-6 Multilingual Jointly Trained Acoustic and Written Word Embeddings - Karen Livescu (TTI-Chicago) Shane Settle (Toyota Technological Institute at Chicago) Yushi Hu (University of Chicago)   (room1)
21:45 Mon-3-1-7 Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks - Chia-Yu Li (Institute of Natural Language Processing, University of Stuttgart, Germany) Ngoc Thang Vu (University of Stuttgart)   (room1)
21:45 Mon-3-1-8 Data Augmentation for Code-switch Language Modeling by Fusing Multiple Text Generation Methods - Xinhui Hu (Hithink Flush Information Network Co Ltd) Lei Yang (Hithink RoyaFlush AI Research Institute) Binbin Gu (Hithink RoyaFlush AI Research Institute) Xinkang Xu (Hithink RoyaFlush AI Research Institute) Qi Zhang (Hithink RoyaFlush AI Research Institute)   (room1)
21:45 Mon-3-1-9 A 43 Language Multilingual Punctuation Prediction Neural Network Model - Edward Lin (Microsoft) Xinxing Li (Microsoft)   (room1)
21:45
Mon-3-10 Human speech production I (until 22:45) (room10)
21:45 Mon-3-10-1 Two different mechanisms of movable mandible for vocal-tract model with flexible tongue - Takayuki Arai (Sophia University)   (room10)
21:45 Mon-3-10-2 Improve the performance of acoustic-to-articulatory inversion by dynamically removing the training loss of noncritical portions of articulatory channels - Qiang Fang (Phonetics Lab., Institute of Linguistics, Chinese Academy of Social Sciences)   (room10)
21:45 Mon-3-10-3 Speaker conditioned acoustic-to-articulatory inversion using x-vectors - Prasanta Ghosh (Assistant Professor, EE, IISc) Aravind Illa (PhD Student, Indian Institute of Science, Bangalore)   (room10)
21:45 Mon-3-10-4 Coarticulation as synchronised sequential target approximation: An EMA study - Zirui Liu (University College London) Yi Xu (University College London) Feng-fan Hsieh (National Tsing Hua University)   (room10)
21:45 Mon-3-10-5 Improved Model for Vocal Folds with a Polyp with Potential Application - Jônatas Santos (Federal University of Sergipe) Israel Santos (Federal University of Sergipe) Jugurta Montalvão (Federal University of Sergipe)   (room10)
21:45 Mon-3-10-6 Regional Resonance of the Lower Vocal Tract and its Contribution to Speaker Characteristics - Lin Zhang (Tianjin University) Kiyoshi Honda (Tianjin University) Seiji Adachi (Fraunhofer Institute for Building Physics) Jianguo Wei (Tianjin University)   (room10)
21:45 Mon-3-10-7 Air-tissue boundary segmentation in real time Magnetic Resonance Imaging video using 3-D convolutional neural network - Renuka Mannem (Indian Institute of Science) Navaneetha Gaddam (Rajiv Gandhi University of Knowledge Technologies, Kadapa) Prasanta Ghosh (Assistant Professor, EE, IISc)   (room10)
21:45 Mon-3-10-8 An investigation of the virtual lip trajectories during the production of bilabial stops and nasal at different speaking rates - Tilak Purohit (International Institute of Information Technology - Bangalore (IIIT-B), Bangalore, India) Prasanta Ghosh (Assistant Professor, EE, IISc)   (room10)
21:45
Mon-3-11 Targeted Source Separation (until 22:45) (room11)
21:45 Mon-3-11-1 SpEx+: A Complete Time Domain Speaker Extraction Network - Eng Siong Chng (Nanyang Technological University) Chenglin Xu (Nanyang Technological University) Longbiao Wang (Tianjin University) Jianwu Dang (JAIST) Haizhou Li (National University of Singapore) Meng Ge (Tianjin University)   (room11)
21:45 Mon-3-11-10 Speaker-Aware Monaural Speech Separation - Kun Hu (The University of Sydney) Zhiyong Wang (The University of Sydney) Chang Xu (The University of Sydney) Duc Chung Tran (Computing Fundamental Department, FPT University) Jiahao Xu (The University of Sydney)   (room11)
21:45 Mon-3-11-2 Atss-Net: Target Speaker Separation via Attention-based Neural Network - Tingle Li (Duke Kunshan University) Yuanyuan Bao (Duke Kunshan University) Qingjian Lin (SEIT, Sun Yat-sen University) Ming Li (Duke Kunshan University)   (room11)
21:45 Mon-3-11-3 Multimodal Target Speech Separation with Voice and Face References - Stefan Wermter (Wermter) Leyuan Qu (University of Hamburg) Cornelius Weber (University of Hamburg)   (room11)
21:45 Mon-3-11-4 X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network - Zhenjie Zhang (Yitu) Bingsheng He (National University of Singapore) Zining Zhang (National University of Singapore)   (room11)
21:45 Mon-3-11-5 Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation - Yanmin Qian (Shanghai Jiao Tong University) Chenda Li (Shanghai Jiao Tong University)   (room11)
21:45 Mon-3-11-6 A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments - Jing Shi (Institute of Automation, Chinese Academy of Sciences) Bo Xu (Institute of Automation, Chinese Academy of Sciences) Jiaming Xu (Institute of Automation, Chinese Academy of Sciences) Yunzhe Hao (Institute of Automation, Chinese Academy of Sciences) Lei Qin (Huawei Consumer Business Group) Peng Zhang (Institute of Automation, Chinese Academy of Sciences)   (room11)
21:45 Mon-3-11-7 Time-Domain Target-Speaker Speech Separation With Waveform-Based Speaker Embedding - Takahiro Shinozaki (Tokyo Institute of Technology) Shengzhou Gao (Tokyo Institute of Technology) Jianshu Zhao (Tokyo Institute of Technology)   (room11)
21:45 Mon-3-11-8 Listen to What You Want: Neural Network-based Universal Sound Selector - Yuma Koizumi (NTT Media Intelligence Laboratories) Shoko Araki (NTT Communication Science Laboratories) Keisuke Kinoshita (NTT) Hiroaki Itou (NTT Media Intelligence Laboratories) Marc Delcroix (NTT Communication Science Laboratories) Tsubasa Ochiai (NTT Communication Science Laboratories)   (room11)
21:45 Mon-3-11-9 Crossmodal Sound Retrieval based on Specific Target Co-occurrence Denoted with Weak Labels - Noboru Harada (NTT Corporation) Masahiro Yasuda (NTT Corporation) Yasunori Ohishi (NTT corporation) Yuma Koizumi (NTT Media Intelligence Laboratories)   (room11)
21:45
Mon-3-2 Anti-spoofing and Liveness Detection (until 22:45) (room2)
21:45 Mon-3-2-1 Multi-Task Siamese Neural Network for Improving Replay Attack Detection - Gokhan Tur (Amazon Alexa AI) Fei Tao (Uber AI) Patrick von Platen (University of Cambridge)   (room2)
21:45 Mon-3-2-2 POCO: a Voice Spoofing and Liveness Detection Corpus based on Pop Noise - Kong Aik Lee (Biometrics Research Laboratories, NEC Corporation) Seng Pei Liew (NEC) Sakiko Mishima (Data Science Research Laboratories, NEC Corporation) Ryo Mizushima (Security Research Laboratories, NEC Corporation) Kosuke Akimoto (Data Science Research Laboratories, NEC Corporation)   (room2)
21:45 Mon-3-2-3 Dual-adversarial domain adaptation for generalized replay attack detection - Hongji Wang (Shanghai Jiao Tong University) Yanmin Qian (Shanghai Jiao Tong University) Shuai Wang (Shanghai Jiao Tong University) Heinrich Dinkel (Shanghai Jiao Tong University) Kai Yu (Shanghai Jiao Tong University)   (room2)
21:45 Mon-3-2-4 Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection - Ha-Jin Yu (University of Seoul) Hye-jin Shim (University of Seoul) Jee-weon Jung (University of Seoul) Hee-Soo Heo (School of Computer Science, University of Seoul, Korea)   (room2)
21:45 Mon-3-2-5 Competency Evaluation in Voice Mimicking Using Acoustic Cues - Adarsh S. (College of Engineering,Trivandrum) Rajeev Rajan (College of Engineering,Trivandrum) Abhijith G. (College of Engineering,Trivandrum) Akshay Prasannan (College of Engineering,Trivandrum)   (room2)
21:45 Mon-3-2-6 Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks - Rohan Kumar Das (National University of Singapore) Zhenzong Wu (National University of Singapore) Haizhou Li (National University of Singapore) Jichen Yang (National University of Singapore)   (room2)
21:45 Mon-3-2-7 Spoofing Attack Detection using the Non-linear Fusion of Sub-band Classifiers - Hemlata Tak (EURECOM) Massimiliano Todisco (EURECOM - School of Engineering & Research Center - Digital Security Department) Nicholas Evans (EURECOM) Andreas Nautsch (EURECOM) Jose Patino (EURECOM)   (room2)
21:45 Mon-3-2-8 Investigating Light-ResNet Architecture for Spoofing Detection under Mismatched Conditions - Gajan Suthokumar (The University of New South Wales) Prasanth Parasu (The University of New South Wales) Kaavya Sriskandaraja (The University of New South Wales) Julien Epps (School of Electrical Engineering and Telecommunications, UNSW Australia)   (room2)
21:45 Mon-3-2-9 Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection - Changhong Liu (School of Computer and Information Engineering, Jiangxi Normal University) Zhenchun Lei (School of Computer and Information Engineering, Jiangxi Normal University, Nanchang) Jihua Ye (School of Computer and Information Engineering, Jiangxi Normal University) Yingen Yang (School of Computer and Information Engineering, Jiangxi Normal University, Nanchang)   (room2)
21:45
Mon-3-3 Noise reduction and intelligibility (until 22:45) (room3)
21:45 Mon-3-3-1 Lightweight Online Noise Reduction on Embedded Devices using Hierarchical Recurrent Neural Networks - Tobias Rosenkranz (Sivantos GmbH, Erlangen) Alberto N. Escalante Banuelos (Sivantos GmbH, Erlangen) Pascal Zobel (Friedrich-Alexander-Universität Erlangen-Nürnberg) Andreas Maier (Friedrich-Alexander-Universität Erlangen-Nürnberg) Hendrik Schröeter (Friedrich-Alexander-Universität Erlangen-Nürnberg)   (room3)
21:45 Mon-3-3-10 Large scale evaluation of importance maps in automatic speech recognition - Michael Mandel (Brooklyn College, CUNY) Viet Anh Trinh (The Graduate Center, CUNY, New York, USA)   (room3)
21:45 Mon-3-3-2 SEANet: A Multi-modal Speech Enhancement Network - Karolis Misiunas (Google Research) Dominik Roblek (Google Research) Yunpeng Li (Google Research) Marco Tagliasacchi (Google Research)   (room3)
21:45 Mon-3-3-3 Lite Audio-Visual Speech Enhancement - Yu Tsao (Academia Sinica) Shang-Yi Chuang (Academia Sinica) Chen-Chou Lo (Academia Sinica) Hsin-Min Wang (Academia Sinica)   (room3)
21:45 Mon-3-3-4 ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication - Simeon Smeele (Max Planck Institute of Animal Behavior, Cognitive and Cultural Ecology Lab and Max Planck Institute for Evolutionary Anthropology, Department for Human Behavior, Ecology and Culture) Christian Bergler (Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab) Volker Barth (Anthro-Media) Elmar Nöth (Friedrich-Alexander-University Erlangen-Nuremberg) Manuel Schmitt (Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab) Andreas Maier (University Erlangen-Nuremberg)   (room3)
21:45 Mon-3-3-5 A Deep Learning Approach to Active Noise Control - Hao Zhang (The Ohio State University, USA) DeLiang Wang (Ohio State University)   (room3)
21:45 Mon-3-3-6 Improving Speech Intelligibility through Speaker Dependent and Independent Spectral Style Conversion - Alexander Kain (OHSU) Kris Tjaden (University at Buffalo) Tuan Dinh (OHSU)   (room3)
21:45 Mon-3-3-7 End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks - Søren Holdt Jensen (Aalborg University, Denmark) Mathias Bach Pedersen (Aalborg University, Denmark) Morten Kolbæk (Aalborg University, Denmark) Asger Heidemann Andersen (Oticon A/S) Jesper Jensen (Oticon A/S and Aalborg University)   (room3)
21:45 Mon-3-3-8 Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-based ASR System - Atsunori Ogawa (NTT Communication Science Laboratories) Keisuke Kinoshita (NTT) Kenichi Arai (NTT Communication Science Laboratories) Toshio Irino (Wakayama University) Tomohiro Nakatani (NTT Corporation) Shoko Araki (NTT Communication Science Laboratories)   (room3)
21:45 Mon-3-3-9 Automatic Estimation of Intelligibility Measure for Consonants in Speech - Ali Abavisani (University of Illinois Urbana-Champaign) Mark Hasegawa-Johnson (University of Illinois)   (room3)
21:45
Mon-3-4 Acoustic Scene Classification (until 22:45) (room4)
21:45 Mon-3-4-1 Neural Architecture Search on Acoustic Scene Classification - Chuming Liang (Xiaomi) Xiangxiang Chu (Xiaomi) Bo Zhang (Xiaomi) Jixiang Li (Xiaomi) Fei Xiang (Xiaomi) Zhao Wang (Xiaomi)   (room4)
21:45 Mon-3-4-10 Deep Learning Based Open Set Acoustic Scene Classification - Michał Kośmider (Samsung R&D Institute Poland) Beniamin Kalinowski (Samsung R&D Institute Poland) Zuzanna Kwiatkowska (Samsung R&D Institute Poland) Krzysztof Rykaczewski (Samsung R&D Institute Poland)   (room4)
21:45 Mon-3-4-2 Acoustic Scene Classification using Audio Tagging - Ju-ho Kim (University of Seoul) Hye-jin Shim (University of Seoul) Ha-Jin Yu (University of Seoul) Seung-bin Kim (University of Seoul) Jee-weon Jung (University of Seoul)   (room4)
21:45 Mon-3-4-3 ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification - Jiqing Han (Harbin Institute of Technology) Liwen Zhang (Harbin Institute of Technology) Ziqiang Shi (Fujitsu Research and Development Center)   (room4)
21:45 Mon-3-4-4 Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network - Jivitesh Sharma (University of Agder) Morten Goodwin (University of Agder) Ole-Christoffer Granmo (University of Agder)   (room4)
21:45 Mon-3-4-5 Acoustic Scene Analysis with Multi-head Attention Networks - Weiran Wang (Amazon) Ming Sun (Amazon) Chao Wang (Amazon) Weimin Wang (Amazon)   (room4)
21:45 Mon-3-4-6 Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification - Yannan Wang (Tencent Technology (Shenzhen) Co., Ltd) Sabato Marco Siniscalchi (University of Enna Kore) Chin-Hui Lee (Georgia Institute of Technology) Hu Hu (Georgia Institute of Technology)   (room4)
21:45 Mon-3-4-7 An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances - Bai Xue (Institute of Software Chinese Academy of Sciences) Chin-Hui Lee (Georgia Institute of Technology) Hu Hu (Georgia Institute of Technology) Jun Du (University of Science and Technology of China) Sabato Marco Siniscalchi (University of Enna Kore) Yannan Wang (Tencent Technology (Shenzhen) Co., Ltd)   (room4)
21:45 Mon-3-4-8 Attention-Driven Projections for Soundscape Classification - Muralikrishna H (Indian Institute of Technology, Mandi) Dhanunjaya Varma Devalraju (Indian Institute of Technology, Mandi) Padmanabhan Rajan (Indian Institute of Technology, Mandi) Dileep Aroor Dinesh (Indian Institute of Technology, Mandi)   (room4)
21:45 Mon-3-4-9 Computer Audition for Continuous Rainforest Occupancy Monitoring: The Case of Bornean Gibbons' Call Detection - Robert Ewers (Imperial College London) Alexander Shiarella (Imperial College London) Panagiotis Tzirakis (Imperial College London) Björn Schuller (University of Augsburg / Imperial College London)   (room4)
21:45
Mon-3-5 Singing Voice Computing and Processing in Music (until 22:45) (room5)
21:45 Mon-3-5-1 SINGING SYNTHESIS: WITH A LITTLE HELP FROM MY ATTENTION. - Alexis Moinet (Amazon) Orazio Angelini (Amazon Research Cambridge) Thomas Drugman (Amazon) Kayoko Yanagisawa (Amazon)   (room5)
21:45 Mon-3-5-2 Peking Opera Synthesis via Duration Informed Attention Network - Chao Weng (Tencent AI Lab) Chengzhu Yu (Tencent) Yusong Wu (Beijing University of Posts and Telecommunications) Dong Yu (Tencent) Heng Lu (Tencent American) Shengchen Li (Beijing University of Posts and Telecommunications) Liqiang Zhang (Beijing Institute of Technology)   (room5)
21:45 Mon-3-5-3 DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System - Chunlei Zhang (Tencent AI Lab) Chengzhu Yu (Tencent) Yusong Wu (Beijing University of Posts and Telecommunications) Zijin Li (China Conservatory of Music) Dong Yu (Tencent AI Lab) Xiang Xie (Beijing Institute of Technology) Liqiang Zhang (Beijing Institute of Technology) Heng Lu (Tencent American) Chao Weng (Tencent)   (room5)
21:45 Mon-3-5-4 Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music - Shengchen Li (Beijing University of Posts and Telecommunications) Frank Soong (Microsoft Research Asia) Yuanbo Hou (Beijing University of Posts and Telecommunications) Jian Luan (Microsoft Search Technology Center Asia)   (room5)
21:45 Mon-3-5-5 Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music - Geng Yang (School of Computer Science, Northwestern Polytechnical University) Jian Wu (Northwestern Polytechnical University) Lei Xie (School of Computer Science, Northwestern Polytechnical University) Haohe Liu (Northwestern Polytechnical University)   (room5)
21:45
Mon-3-7 Acoustic model adaptation for ASR (until 22:45) (room7)
21:45 Mon-3-7-1 Continual Learning in Automatic Speech Recognition - Hynek Hermansky (JHU) Samik Sadhu (Johns Hopkins University)   (room7)
21:45 Mon-3-7-10 Frame-wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering - Kazunori Komatani (Osaka University) Ryu Takeda (Osaka University)   (room7)
21:45 Mon-3-7-2 Speaker Adaptive Training for Speech Recognition Based on Attention-over-Attention Mechanism - Zhongfu Ye (University of Science and Technology of China) Jia Pan (University of Science and Technology of China) Qingran Wang (iFlytek Research, iFlytek Co., Ltd.) Genshun Wan (University of Science and Technology of China) Jianqing Gao (iFlytek Research, iFlytek Co., Ltd.)   (room7)
21:45 Mon-3-7-3 Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator - Wenning Wei (Microsoft) Jinyu Li (Microsoft) Yan Huang (Microsoft) Yifan Gong (Microsoft) William Gale (Microsoft) Lei He (Microsoft)   (room7)
21:45 Mon-3-7-4 Speech Transformer with Speaker Aware Persistent Memory - Bin Ma (Alibaba Inc.) Yingzhu Zhao (Nanyang Technological University) Shafiq Joty (Nanyang Technological University; Salesforce AI Research) Chongjia Ni (I2R) Eng Siong Chng (Nanyang Technological University) Cheung-Chi LEUNG (Alibaba Group)   (room7)
21:45 Mon-3-7-5 Adaptive Speaker Normalization for CTC-Based Speech Recognition - Jun Du (University of Science and Technology of China) Bin Gu (University of Science and Technology of China) Zhenhua Ling (University of Science and Technology of China) Wu Guo (University of Science and Technology of China) Fenglin Ding (University of Science and Technology of China)   (room7)
21:45 Mon-3-7-6 Unsupervised Domain Adaptation Under Label Space Mismatch for Speech Classification - Akhil Mathur (University College London) Nadia Berthouze (University College London) Nicholas D. Lane (University of Cambridge)   (room7)
21:45 Mon-3-7-7 Learning Fast Adaptation on Cross-Accented Speech Recognition - Zihan Liu (The Hong Kong University Of Science and Technology) Pascale Fung (The Hong Kong University Of Science and Technology) Samuel Cahyawijaya (HKUST) Andrea Madotto (The Hong Kong University Of Science and Technology) Genta Indra Winata (The Hong Kong University Of Science and Technology) Zhaojiang Lin (The Hong Kong University Of Science and Technology) Peng Xu (The Hong Kong University Of Science and Technology)   (room7)
21:45 Mon-3-7-8 Black-box Adaptation of ASR for Accented Speech - Preethi Jyothi (Indian Institute of Technology Bombay) Sunita Sarawagi (Indian Institute of Technology Bombay) Abhijeet Awasthi (Indian Institute of Technology Bombay) Kartik Khandelwal (Indian Institute of Technology Bombay)   (room7)
21:45 Mon-3-7-9 Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation - Denis Jouvet (LORIA - INRIA) Emmanuel Vincent (Inria) Mehmet Ali Tugtekin Turan (INRIA)   (room7)
21:45
Mon-3-8 Singing and Multimodal Synthesis (until 22:45) (room8)
21:45 Mon-3-8-1 Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer - Jie Wu (Xiaoice, Software Technology Center Asia, Microsoft) Jian Luan (Microsoft)   (room8)
21:45 Mon-3-8-2 PREDICTION OF HEAD MOTION FROM SPEECH WAVEFORMS WITH A CANONICAL-CORRELATION-CONSTRAINED AUTOENCODER - Jinhong Lu (University of Edinburgh) Hiroshi Shimodaira (University of Edinburgh)   (room8)
21:45 Mon-3-8-3 XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System - Jie Wu (Microsoft) Peiling Lu (Microsoft) Li Zhou (Microsoft) Xu Tan (Microsoft) Jian Luan (Microsoft)   (room8)
21:45 Mon-3-8-4 Stochastic Talking Face Generation Using Latent Distribution Matching - Rajesh Hegde (Indian Institute of Technology Kanpur) Vinay Namboodiri (IIT Kanpur) Ravindra Yadav (Indian Institute of Technology Kanpur) Ashish Sardana (NVIDIA)   (room8)
21:45 Mon-3-8-5 Speech-to-singing Conversion based on Boundary Equilibrium GAN - Da-Yi Wu (National Taiwan University) Yi-Hsuan Yang (Academia Sinica)   (room8)
21:45 Mon-3-8-6 Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image - Koichiro Mori (DeNA Co., Ltd) Kentaro Tachibana (DeNA Co., Ltd) Yuki Saito (The University of Tokyo) Kotaro Onishi (The University of Electro-Communications) Shunsuke Goto (The University of Tokyo)   (room8)
21:45 Mon-3-8-7 Speech Driven Talking Head Generation via Attentional Landmarks Based Representation - Li Teng (Anhui University) Wang Wentao (Anhui University) Wang Yan (Anhui University) Liu Qiongsong (Unisound) Liang Jiaen (Unisound) Sun Jianqing (Unisound)   (room8)
21:45
Mon-3-9 Intelligibility-enhancing Speech Modification (until 22:45) (room9)
21:45 Mon-3-9-1 Optimization and evaluation of an intelligibility-improving signal processing approach (IISPA) for the Hurricane Challenge 2.0 with FADE - Marc René Schädler (Carl von Ossietzky University Oldenburg)   (room9)
21:45 Mon-3-9-2 iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning - Junichi Yamagishi (National Institute of Informatics) Szu-wei Fu (Research Center for Information Technology Innovation, Academia Sinica) Yu Tsao (Academia Sinica) Haoyu Li (National Institute of Informatics)   (room9)
21:45 Mon-3-9-3 Intelligibility-enhancing speech modifications – The Hurricane Challenge 2.0 - Jan Rennies (Fraunhofer IDMT, Hearing, Speech and Audio Technology) Henning Schepker (University of Oldenburg, Signal Processing Group, Oldenburg) Cassia Valentini-Botinhao (The Centre for Speech Technology Research, University of Edinburgh) Martin Cooke (Basque Foundation for Science, Bilbao)   (room9)
21:45 Mon-3-9-4 Exploring listeners' speech rate preferences - Martin Cooke (Ikerbasque) Olympia Simantiraki (Language and Speech Laboratory, Universidad del Pais Vasco)   (room9)
21:45 Mon-3-9-5 Adaptive compressive onset-enhancement for improved speech intelligibility in noise and reverberation - Simon Doclo (Signal Processing Group, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg) Jörg Bitzer (Institute of Hearing Technology and Audiology (IHA), Jade-University of Applied Sciences Wilhelmshaven / Oldenburg / Elsfleth) Felicitas Bederna (Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, and Cluster of Excellence Hearing4all, Oldenburg) Christian Rollwage (Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, and Cluster of Excellence Hearing4all, Oldenburg) Henning Schepker (Signal Processing Group, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg) Arne Pusch (Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, and Cluster of Excellence Hearing4all, Oldenburg) Jan Rennies (Fraunhofer IDMT, Hearing, Speech and Audio Technology)   (room9)
21:45 Mon-3-9-6 A Sound Engineering Approach to Near End Listening Enhancement - Simon King (University of Edinburgh) Carol Chermaz (The Centre for Speech Technology Research, The University of Edinburgh)   (room9)
21:45 Mon-3-9-7 Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion - Muhammed Shifas PV (Speech Signal Processing Lab, University of Crete) Yannis Pantazis (Institute of Applied and Computational Mathematics, FORTH) Yannis Stylianou (Univ of Crete) Dipjyoti Paul (Computer Science Department, University of Crete, Greece)   (room9)
18:00
Keynote 2 (until 19:00) (room1)
18:00 Brain networks enabling speech perception in everyday settings - Barbara Shinn-Cunningham   (room1)
19:00
Coffee Break (until 19:15) ()
19:15
Tue-1-1 Speech Translation and multilingual/multimodal learning (until 20:15) (room1)
19:15 Tue-1-1-1 A DNN-HMM-DNN Hybrid Model for Discovering Word-like Units from Spoken Captions and Image Regions - Liming Wang (University of Illinois, Urbana Champaign) Mark Hasegawa-Johnson (University of Illinois)   (room1)
19:15 Tue-1-1-2 Efficient Wait-k Models for Simultaneous Machine Translation - Laurent Besacier (LIG) Jakob Verbeek (INRIA) Maha Elbayad (INRIA / LIG)   (room1)
19:15 Tue-1-1-3 Investigating Self-supervised Pre-training for End-to-end Speech Translation - Natalia Tomashenko (LIA, University of Avignon) Fethi Bougares (LIUM - Le Mans Université) Yannick Estève (LIA - Avignon University) Laurent Besacier (LIG) Ha Nguyen (LIG - Grenoble Alpes University, LIA - Avignon University)   (room1)
19:15 Tue-1-1-4 Contextualized Translation of Automatically Segmented Speech - Mattia A. Di Gangi (Fondazione Bruno Kessler, University of Trento) Marco Gaido (Fondazione Bruno Kessler, University of Trento) Marco Turchi (Fondazione Bruno Kessler) Mauro Cettolo (FBK) Matteo Negri (Fondazione Bruno Kessler)   (room1)
19:15 Tue-1-1-5 Self-Training for End-to-End Speech Translation - Qiantong Xu (Facebook AI Research) Xutai Ma (Johns Hopkins University) Juan Pino (Facebook) Yun Tang (Facebook) Mohammad Javad Dousti (Facebook)   (room1)
19:15 Tue-1-1-6 Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing - Robert Enyedi (Amazon) Yogesh Virkar (Amazon) Marcello Federico (Amazon AI) Roberto Barra-Chicote (Amazon)   (room1)
19:15 Tue-1-1-7 Pair Expansion for Learning Multilingual Semantic Embeddings using Disjoint Visually-grounded Speech Audio Datasets - David Harwath (Massachusetts Institute of Technology) Yasunori Ohishi (NTT Corporation) Kunio Kashino (NTT Corporation) Takahito Kawanishi (NTT Corporation) Akisato Kimura (NTT Corporation) James Glass (Massachusetts Institute of Technology)   (room1)
19:15 Tue-1-1-8 Self-Supervised Representations Improve End-to-End Speech Translation - Juan Pino (Facebook) Anne Wu (Facebook) Changhan Wang (Facebook AI Research) Jiatao Gu (Facebook AI Research)   (room1)
19:15
Tue-1-10 Acoustic Phonetics and Prosody (until 20:15) (room10)
19:15 Tue-1-10-1 Correlating cepstra with formant frequencies: implications for phonetically-informed forensic voice comparison - Philip Harrison (Department of Language and Linguistic Science, University of York) Frantz Clermont (School of Culture, History and Language, Australian National University) Vincent Hughes (Department of Language and Linguistic Science, University of York)   (room10)
19:15 Tue-1-10-2 Prosody and breathing: A comparison between rhetorical and information-seeking questions in German and Brazilian Portuguese - Oliver Niebuhr (University of Southern Denmark) Jana Neitsch (University of Southern Denmark) Plinio Barbosa (University of Campinas)   (room10)
19:15 Tue-1-10-3 Scaling processes of clause chains in Pitjantjatjara - Hywel Stoakes (University of Melbourne) Catalina Torres (University of Melbourne) Rebecca Defina (University of Melbourne)   (room10)
19:15 Tue-1-10-4 Neutralization of voicing distinction of stops in Tohoku dialects of Japanese: a field work and acoustic measurements - Setsuko Imatomi (Mejiro University) Ayako Hashimoto (Tokyo Kasei-gakuin College) Mafuyu Kitahara (Sophia University) Sanae Matsui (Sophia University) Ai Mizoguchi (Maebashi Institute of Technology) Ryunosuke Kobayashi (Sophia University)   (room10)
19:15 Tue-1-10-5 Correlation between prosody and pragmatics: case study of discourse markers in French and English - Yvon Keromnes (Atilf - Université de Lorraine) Lou Lee (Université de Lorraine) Mathilde Dargnat (Atilf - Université de Lorraine) Denis Jouvet (LORIA - INRIA) Katarina Bartkova (Atilf - Université de Lorraine)   (room10)
19:15 Tue-1-10-6 An analysis of prosodic prominence cues to information structure in Egyptian Arabic - Dina ElZarka (University of Graz)   (room10)
19:15 Tue-1-10-7 Lexical Stress in Urdu - Miriam Butt (University of Konstanz) Benazir Mumtaz (University of Konstanz) Tina Bögel (University of Konstanz)   (room10)
19:15 Tue-1-10-8 Vocal markers from sustained phonation in Huntington's Disease - Hadrien Titeux (LSCP - EHESS / ENS / PSL Research University / CNRS / INRIA) Laurie Lemoine (NPI/ENS/INSERM/UPEC/PSL Research University) Justine Montillot (NPI/ENS/INSERM/UPEC/PSL Research University) Rachid Riad (LSCP/NPI/ENS/CNRS/EHESS/INRIA/UPEC/PSL Research University) Xuan-Nga Cao (LSCP - EHESS / ENS / PSL Research University / CNRS / INRIA) Emmanuel Dupoux (Ecole des Hautes Etudes en Sciences Sociales) Jennifer Hamet Bagnou (NPI/ENS/INSERM/UPEC/PSL Research University) Anne-Catherine Bachoud-Lévi (NPI/ENS/INSERM/UPEC/PSL Research University)   (room10)
19:15 Tue-1-10-9 How Rhythm and Timbre encode Mooré language in Bendré drummed speech - Julien Meyer (Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France) Laure Dentel (Human languages have the flexibility to be acoustically adapted to the context of communication, such as in shouting or whispering. Drummed forms of languages represent one of the most extreme natural expressions of such speech adaptability. A large amount of research has been conducted on drummed languages in anthropology or linguistics, particularly in West African societies. However, in spite of the clearly rhythmic nature of drumming, previous studies have largely neglected to explore systematically the role of speech rhythm. Here, we explore a unique corpus of the Bendré drummed speech form of the Mossi people, transcribed and published in the 1980s by the anthropologist Kawada Junzo. The analysis of this large database in Mooré language reveals that the rhythmic units encoded in the length of pauses between drumbeats match more closely with vowel-to-vowel intervals than with syllable parsing. Meanwhile, we confirm for the first time a result found recently on the drummed speech tradition of the Bora Amazonian language. However, the complex acoustic structure of the Bendré skin drum required much more attention than the simple two-pitch hollow log drum of the Bora. Thus, we also present here results on how drummed Bendré timbre encodes tones of Mooré language.)   (room10)
19:15
Tue-1-2 Speaker Recognition I (until 20:15) (room2)
19:15 Tue-1-2-1 Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms - Hye-jin Shim (University of Seoul) Seung-bin Kim (University of Seoul) Ju-ho Kim (University of Seoul) Ha-Jin Yu (University of Seoul) Jee-weon Jung (University of Seoul)   (room2)
19:15 Tue-1-2-10 Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification - Xixin Wu (University of Cambridge) Dong Yu (Tencent AI Lab) Helen Meng (Chinese University of Hong Kong) Jinghua Zhong (The Chinese University of Hong Kong) Xu Li (The Chinese University of Hong Kong) Dan Su (Tencent AILab Shenzhen) Xunying Liu (Chinese University of Hong Kong) Na Li (Tencent)   (room2)
19:15 Tue-1-2-2 Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances - Youngmoon Jung (KAIST) Hoi Rin Kim (KAIST) Yeunju Choi (KAIST) Myunghun Jung (KAIST) Seong Min Kye (KAIST)   (room2)
19:15 Tue-1-2-3 An Adaptive X-vector Model for Text-independent Speaker Verification - Zhenhua Ling (University of Science and Technology of China) Jun Du (University of Science and Technology of China) Wu Guo (University of Science and Technology of China) Bin Gu (University of Science and Technology of China) Fenglin Ding (University of Science and Technology of China)   (room2)
19:15 Tue-1-2-4 Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions - Santi Prieto (das-nano)   (room2)
19:15 Tue-1-2-5 Sum-Product Networks for Robust Automatic Speaker Identification - Aaron Nicolson (Griffith University) Kuldip K. Paliwal (Griffith University)   (room2)
19:15 Tue-1-2-6 Segment Aggregation for short utterances speaker verification using raw waveforms - Jee-weon Jung (University of Seoul) Ju-ho Kim (University of Seoul) Ha-Jin Yu (University of Seoul) Seung-bin Kim (University of Seoul) Hye-jin Shim (University of Seoul)   (room2)
19:15 Tue-1-2-7 Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition - Ron Hoory (IBM Haifa Research Lab) Hagai Aronowitz (IBM Research - Haifa) Shai Rozenberg (IBM)   (room2)
19:15 Tue-1-2-8 Speaker Re-identification with Speaker Dependent Speech Enhancement - Qiang Huang (University of Sheffield) Yanpei Shi (University of Sheffield) Thomas Hain (University of Sheffield)   (room2)
19:15 Tue-1-2-9 Blind speech signal quality estimation for speaker verification systems - Artem Ivanov (STC-innovations Ltd.) Alexandr Kozlov (Speech Technology Center Ltd.) Tseren Andzukaev (STC-innovations Ltd.) Sergey Novoselov (ITMO University, Speech Technology Center) Galina Lavrentyeva (ITMO University, STC-innovations) Artem Gorlanov (STC-innovations Ltd.) Anastasia Avdeeva (STC-innovations Ltd.) Marina Volkova (ITMO University, STC-innovations Ltd.)   (room2)
19:15
Tue-1-3 Spoken Language Understanding II (until 20:15) (room3)
19:15 Tue-1-3-1 Modeling ASR Ambiguity for Neural Dialogue State Tracking - Manish Shrivastava (IIIT Hyderabad) Laurent Besacier (LIG) Vaishali Pal (IIIT Hyderabad) Fabien Guillot (Naver Labs Europe) Jean-Michel Renders (Naver Labs Europe)   (room3)
19:15 Tue-1-3-10 Confidence measure for speech-to-concept end-to-end spoken language understanding - Antoine LAURENT (LIUM - Laboratoire Informatique Université du Mans) Yannick Estève (LIA - Avignon University) Antoine Caubrière (LIUM, University of Le Mans) Emmanuel Morin (LS2N UMR CNRS 6004)   (room3)
19:15 Tue-1-3-2 ASR Error Correction with Augmented Transformer for Entity Retrieval - Ashish Kumar Agrawal (Amazon) Haoyu Wang (Amazon) Yang Liu (Amazon) Shuyan Dong (Amazon) James Logan (Amazon) Yue Liu (Amazon)   (room3)
19:15 Tue-1-3-3 Large-Scale Transfer Learning for Low-resource Spoken Language Understanding - Zhiyong Zhang (Ping An Technology (Shenzhen) Co., Ltd.) Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd.) Xueli Jia (Ping An Technology (Shenzhen) Co., Ltd.) Jing Xiao (Ping An Technology) Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd.)   (room3)
19:15 Tue-1-3-4 Data balancing for boosting performance of low-frequency classes in Spoken Language Understanding - Fabian Triefenbach (Amazon) Quynh Do (Amazon AI) Judith Gaspers (Amazon)   (room3)
19:15 Tue-1-3-5 An Interactive Adversarial Reward Learning-based Spoken Language Understanding System - Hongxia Jin (Samsung Research America) Yu Wang (Samsung Research America) Yilin Shen (Samsung Research America)   (room3)
19:15 Tue-1-3-6 Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding - Jin Cao (Amazon) Jun Wang (Amazon) Wael Hamza (Amazon) Shang-Wen Li (Amazon AWS AI) Kelly Vanee (Amazon)   (room3)
19:15 Tue-1-3-7 Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training - Shota Orihashi (NTT Corporation) Mana Ihori (NTT Corporation) Tomohiro Tanaka (NTT Corporation) Ryo Masumura (NTT Corporation)   (room3)
19:15 Tue-1-3-8 Deep F-measure Maximization for End-to-End Speech Understanding - Leda Sari (University of Illinois at Urbana-Champaign) Mark Hasegawa-Johnson (University of Illinois)   (room3)
19:15 Tue-1-3-9 An Effective Domain Adaptive Post-Training Method for BERT in Response Selection - Chanhee Lee (Korea University) Kisu Yang (Korea University) Dongsuk Oh (Department of Computer Science and Engineering, Korea University) Dongyub Lee (Kakao Corp) Taesun Whang (Korea University) Heuiseok Lim (Korea University)   (room3)
19:15
Tue-1-4 Human speech processing (until 20:15) (room4)
19:15 Tue-1-4-1 Attention to indexical information improves voice recall - Grant McGuire (University of California Santa Cruz) Molly Babel (University of British Columbia)   (room4)
19:15 Tue-1-4-10 Identifying Important Time-frequency Locations in Continuous Speech Utterances - Hassan Salami Kavaki (The Graduate Center, CUNY, New York) Michael Mandel (Brooklyn College, CUNY, New York)   (room4)
19:15 Tue-1-4-2 Categorization of Whistled Consonants by French Speakers - Anaïs Tran Ngoc (Université Côte d'Azur, CNRS, BCL, France) Fanny Meunier (CNRS) Julien Meyer (Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France)   (room4)
19:15 Tue-1-4-3 Whistled vowel identification by French listeners - Anaïs Tran Ngoc (Université Côte d'Azur, CNRS, BCL, France) Fanny Meunier (CNRS) Julien Meyer (Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France)   (room4)
19:15 Tue-1-4-4 F0 slope as a cue to speech segmentation in French - Maria del Mar Cordero (Université Côte d’Azur, CNRS, BCL) Fanny Meunier (CNRS) Nicolas Grimault (CNRS, UMR 5292, INSERM, U1028, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, Lyon) Elsa Spinelli (Université Grenoble Alpes, CNRS, LPNC, Grenoble) Stéphane Pota (Université Grenoble Alpes, CNRS, LPNC, Grenoble)   (room4)
19:15 Tue-1-4-5 Does French listeners’ ability to use accentual information at the word level depend on the ear of presentation? - Amandine Michelas (Aix-Marseille université, CNRS, LPL, UMR 7309, Aix-en-Provence) Sophie Dufour (Aix-Marseille université, CNRS, LPL, UMR 7309, Aix-en-Provence)   (room4)
19:15 Tue-1-4-6 A perceptual study of the five level tones in Hmu (Xinzhai variety) - Wen Liu (Shandong University)   (room4)
19:15 Tue-1-4-7 Mandarin and English Adults’ Cue-weighting of Lexical Stress - Liquan Liu (Western Sydney University and University of Oslo) Feng-Ming Tsao (National Taiwan University) Varghese Peter (School of Psychology, Western Sydney University) Zhen Zeng (MARCS Institute, Western Sydney University) Karen Mattock (School of Psychology) Alba Tuninetti (Bilkent University, Turkey)   (room4)
19:15 Tue-1-4-8 Age-related differences of tone perception in Mandarin-speaking seniors - William Shi-Yuan WANG (Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR) Gang PENG (Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR) Yan FENG (Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR)   (room4)
19:15 Tue-1-4-9 Social and functional pressures in vocal alignment: Differences for human and voice-AI interlocutors - Georgia Zellou (UC Davis) Michelle Cohn (University of California, Davis)   (room4)
19:15
Tue-1-5 Feature extraction and distant ASR (until 20:15) (room5)
19:15 Tue-1-5-1 Raw Sign and Magnitude Spectra for Multi-head Acoustic Modelling - Erfan Loweimi (The University of Edinburgh) Peter Bell (University of Edinburgh) Steve Renals (University of Edinburgh)   (room5)
19:15 Tue-1-5-10 Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition - Rohit Kumar (IISc Bangalore) Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012) Anirudh Sreeram (IISc) Anurenjan Purushothaman (IISc)   (room5)
19:15 Tue-1-5-2 Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations - Purvi Agrawal (PhD Student, Indian Institute of Science, Bangalore-560012, India) Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012)   (room5)
19:15 Tue-1-5-3 A Deep 2D Convolutional Network for Waveform-based Speech Recognition - Steve Renals (University of Edinburgh) Zoran Cvetkovic (King's College London) Dino Oglic (King's College London) Peter Bell (University of Edinburgh)   (room5)
19:15 Tue-1-5-4 Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions - Gerhard Rigoll (Technical University Munich) Nicolas Lindae (Technical University Munich) Palle Klewitz (Technical University Munich) Ludwig Kürzinger (Technical University Munich)   (room5)
19:15 Tue-1-5-5 An alternative to MFCCs for ASR - Sanjeev Khudanpur (Johns Hopkins University) Hynek Hermansky (JHU) Dan Povey (Johns Hopkins University) Pegah Ghahremani (Johns Hopkins University) Hossein Hadian (Department of Computer Engineering, Sharif University of Technology, Tehran, Iran)   (room5)
19:15 Tue-1-5-6 Phase based spectro-temporal features for building a robust ASR system - Gudmalwar Ashishkumar (National Institute of Technology Meghalaya) Ch. V. Rama Rao (National Institute of Technology, Meghalaya) Anirban Dutta (National Institute of Technology Meghalaya)   (room5)
19:15 Tue-1-5-7 Deep Scattering Power Spectrum Features for Robust Speech Recognition - Dino Oglic (King's College London) Zoran Cvetkovic (King's College London) Neethu Mariam Joy (King's College London) Steve Renals (University of Edinburgh) Peter Bell (University of Edinburgh)   (room5)
19:15 Tue-1-5-8 FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition - Nicholas Lane (University of Oxford) Titouan Parcollet (University of Oxford) Xinchi Qiu (University of Oxford)   (room5)
19:15 Tue-1-5-9 Bandpass Noise Generation and Augmentation for Unified ASR - Bo Ren (Microsoft China) Yifan Gong (Microsoft Corp) Jian Wu (Microsoft Corp) Kshitiz Kumar (Microsoft Corporation)   (room5)
19:15
Tue-1-7 Speech Synthesis: Text Processing, Data and Evaluation (until 20:15) (room7)
19:15 Tue-1-7-1 g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset - Kyubyong Park (Kakao Brain) Seanie Lee (KAIST)   (room7)
19:15 Tue-1-7-2 A Mask-based Model for Mandarin Chinese Polyphone Disambiguation - Haiteng Zhang (Databaker (Beijing) Technology Co., Ltd.)   (room7)
19:15 Tue-1-7-3 Perception of concatenative vs. neural text-to-speech (TTS): Differences in intelligibility in noise and language attitudes - Georgia Zellou (UC Davis) Michelle Cohn (University of California, Davis)   (room7)
19:15 Tue-1-7-4 Enhancing Sequence-to-Sequence Text-to-Speech with Morphology - Korin Richmond (University of Edinburgh) Jason Taylor (University of Edinburgh)   (room7)
19:15 Tue-1-7-5 Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling - Youngmoon Jung (KAIST) Hoi Rin Kim (KAIST) Yeunju Choi (KAIST)   (room7)
19:15 Tue-1-7-6 Deep Learning Based Assessment of Synthetic Speech Naturalness - Gabriel Mittag (Technische Universität Berlin) Sebastian Möller (Quality and Usability Lab, TU Berlin)   (room7)
19:15 Tue-1-7-7 Distant Supervision for Polyphone Disambiguation in Mandarin Chinese - Yuanyuan Zhao (Kwai) Jiaqi Zhu (Institute of Software, Chinese Academy of Science) Jinba Xiao (Kwai) Jiawen Zhang (University of Chinese Academy of Sciences)   (room7)
19:15 Tue-1-7-8 An unsupervised method to select a speaker subset from large multi-speaker speech synthesis datasets - Simon King (University of Edinburgh) Joanna Rownicka (University of Edinburgh) Jennifer Williams (University of Edinburgh) Pilar Oplustil (University of Edinburgh)   (room7)
19:15 Tue-1-7-9 Understanding the Effect of Voice Quality and Accent on Talker Similarity - Anurag Das (Texas A&M University) Ricardo Gutierrez-Osuna (Texas A&M University) John Levis (Iowa State University) Guanlong Zhao (Texas A&M University) Evgeny Chukharev-Hudilainen (Iowa State University)   (room7)
19:15
Tue-1-8 Search for speech recognition (until 20:15) (room8)