INTERSPEECH is the world’s largest conference on spoken language processing, covering both speech science and technology, from theory to practical applications. The theme of this year’s conference is “Speech everywhere!”, reflecting how Automatic Speech Recognition (ASR) models are being integrated into large AI systems that combine speech, vision, language and more, so as to adhere better to human speech communication principles and optimize performance.
The conference program includes a wide array of papers and posters by the world’s leading scientists in the field, presenting cutting-edge developments. It will be opened by Prof. Dr.-Ing. Hermann Ney, AppTek’s very own Director of Science and Professor at RWTH Aachen University. Prof. Ney is also the winner of the 2021 ISCA Medal for Scientific Achievements, among many other awards honoring his lifelong contributions to the field of natural language processing, which have resulted in multiple operational research prototypes and commercial systems. He will present his view on the historical development of research on speech and language processing, with emphasis on the framework of Bayes decision rule.
Nine of AppTek’s speech scientists will be presenting papers on a variety of topics, starting with sampling-based training criteria for language modeling with large vocabularies, a hot topic for modern word-based language models, which are becoming increasingly large. Their research shows that all sampling methods perform equally well when model outputs are corrected for the intended class posterior probabilities, while delivering the expected speedups. Their claims are supported by experimental evidence in language modeling and ASR on LibriSpeech and Switchboard.
Y. Gao, D. Thulke, A. Gerstenberger, V. A. K. Tran, R. Schlüter, H. Ney:
"On Sampling-Based Training Criteria for Neural Language Modeling"
Novel end-to-end ASR architectures do not distinguish between acoustic and language models. While these architectures enable more efficient search, they can be challenging in cases of domain mismatch. The next two papers are concerned with methods for language model prior correction and for integrating language models trained separately on large text-only resources.
A. Zeyer, A. Merboldt, W. Michel, R. Schlüter, H. Ney:
"Librispeech Transducer Model with Internal Language Model Prior Correction"
M. Zeineldeen, A. Glushko, W. Michel, A. Zeyer, R. Schlüter, H. Ney:
"Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models"
Other papers make important contributions to ASR research by comparing different modeling architectures, and by proposing a novel, fully acoustic-oriented subword modeling approach that combines the advantages of several methods into a single pipeline. The latter results in better word segmentation and more balanced sequence lengths, both of which are particularly pertinent for streaming ASR output, as used in live captioning scenarios.
W. Zhou, A. Zeyer, R. Schlüter, H. Ney:
"Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept"
W. Zhou, M. Zeineldeen, Z. Zheng, R. Schlüter, H. Ney:
"Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition"
The last paper is of a different nature and is the result of a collaboration with the English Studies Department at RWTH Aachen University. It focuses on the linguistic complexities present in non-native spontaneous speech, a frequent problem in speech recognition applications, given that so much of today’s English audio content is produced by non-native speakers. The significance of this contribution also lies in the lack of publicly available databases and benchmark datasets for spontaneously produced non-native speech, as well as in the considerable inter-individual variability of such speech.
Y. Qiao, W. Zhou, E. Kerz, R. Schlüter:
"The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech"
The conference takes place in a hybrid format this year, in Brno, Czech Republic, from 30th August to 3rd September. You can find more information about the conference program and how to register here.
AppTek provides an artificial intelligence and machine learning-based automatic speech recognition, machine translation and natural language understanding platform for organizations in a variety of markets, such as media and entertainment, call centers, government, enterprise business and others across the globe. Available via the cloud or on-premise, AppTek delivers the highest quality real-time streaming and batch speech technology solutions in the industry. Featuring scientists and research engineers who are recognized amongst the best and most experienced in the world, the company’s solutions cover a wide array of languages, dialects, and channels.