AppTek Scientists Present at ICASSP 2023 in Rhodes, Greece

June 6, 2023

Members of the AppTek Science Team are on the Greek island of Rhodes this week participating in the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023). ICASSP features a comprehensive technical program with more than 300 oral and poster sessions, complemented by plenaries and perspective talks, tutorials, panels, exhibitions and demonstrations, and industry workshops. These sessions showcase the latest developments in research and technology for signal processing and its applications.

On June 9, from 8:15 to 9:45 AM, AppTek scientists will present the following papers at Poster Session SLT P40:

Enhancing and Adversarial: Improve ASR with Speaker Labels

Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph M. Lüscher, Ralf Schlüter, Hermann Ney

ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL w.r.t. cleaner dataset and weaker ASR NN.
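The gradient reversal layer at the core of adversarial training acts as the identity in the forward pass and negates (and scales) the gradient in the backward pass, so a speaker classifier learns to predict speakers while the shared encoder is simultaneously pushed toward speaker-agnostic features. A minimal numpy sketch of the mechanism (the paper's adaptive scaling is simplified here to a fixed `lam`, and all names are illustrative):

```python
import numpy as np

def grl_forward(x):
    """Identity in the forward pass."""
    return x

def grl_backward(grad_output, lam):
    """Negate and scale the gradient in the backward pass."""
    return -lam * grad_output

# Toy demo: a shared encoder feature h feeds a speaker classifier via the GRL.
h = np.array([0.5, -1.2, 0.3])    # encoder output for one frame
w = np.array([0.4, 0.1, -0.7])    # speaker-classifier weight vector
lam = 0.5                         # reversal scale (adaptive in the paper)

logit = w @ grl_forward(h)        # forward pass: GRL is a no-op
grad_h_from_classifier = w * 1.0  # d(logit)/dh with upstream gradient 1
grad_h_to_encoder = grl_backward(grad_h_from_classifier, lam)
```

The encoder thus receives the classifier's gradient with its sign flipped, ascending the speaker-classification loss while the classifier itself descends it.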

Lattice-free Sequence Discriminative Training for Phoneme-based Neural Transducers

Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypotheses generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain 40%–70% relative training time speedup with a small degradation in performance.
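The lattice-free idea can be illustrated with a toy maximum-mutual-information objective: rather than summing the denominator over an N-best list of decoded hypotheses, it sums over all label sequences, which needs no decoding step at all. A simplified sketch under a frame-wise toy model (not the paper's transducer formulation; all names and values are illustrative):

```python
import itertools
import math

def seq_log_score(frame_logits, seq):
    """Log score of one label sequence under frame-wise logits (toy model)."""
    return sum(frame_logits[t][label] for t, label in enumerate(seq))

def lattice_free_mmi(frame_logits, ref_seq, vocab_size):
    """log score(ref) minus logsumexp over ALL sequences (no N-best decoding)."""
    T = len(frame_logits)
    num = seq_log_score(frame_logits, ref_seq)
    all_scores = [seq_log_score(frame_logits, seq)
                  for seq in itertools.product(range(vocab_size), repeat=T)]
    m = max(all_scores)
    den = m + math.log(sum(math.exp(s - m) for s in all_scores))
    return num - den

# 2 frames, vocabulary of 3 labels; reference sequence [0, 2]
logits = [[2.0, 0.1, -0.5],
          [0.3, -1.0, 1.5]]
loss = -lattice_free_mmi(logits, [0, 2], 3)
```

For this frame-independent toy model the full-sum denominator factorizes into per-frame logsumexps, which is what makes the lattice-free computation tractable in practice; real systems replace the brute-force enumeration with such a factorized forward computation.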

Stop by and visit the team if you are at the conference. Looking forward to connecting with our friends and colleagues!

AI and ML Technologies to Bridge the Language Gap
AppTek is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U), large language models (LLMs) and text-to-speech (TTS) technologies. The AppTek platform delivers industry-leading solutions for organizations across a breadth of global markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek’s solutions cover a wide array of languages/dialects, channels, domains and demographics.
