Company Overview


About AppTek

AppTek develops engines and solutions for cross-lingual communication. AppTek is a leader in automatic speech recognition (ASR), neural machine translation (MT), machine learning (ML), natural language understanding (NLU) and artificial intelligence (AI). Founded in 1990, AppTek employs one of the most agile, talented teams of ASR, MT, and NLU PhD scientists and research engineers in the world. Through our advanced research in speech recognition, machine translation and artificial intelligence, we have solved many challenging problems, improving human-quality transcription, language understanding and translation accuracy. Our people are among the language technology and machine learning industry’s premier experts. Our long-standing affiliations with the world’s leading human language technology universities are central to our continuous introduction of new theories and solutions for automating recognition, translation and communication. Our 30-year history of achieving performance goals with our customers across government, global commerce, call centers and media comes from our understanding of their problems and the best application of technology solutions.

Company History and Timeline

Company founded to enable global multilingual commerce applications and communications
Began providing critical support to USG with advanced text analytics and machine translation software
Patent filed for Automatic Speech Recognition method
USG awards AppTek first multi-speaker, multilingual auto speech and translation system
Patent for Keyword Speech Recognition
eBay acquires AppTek's Hybrid Machine Translation platform for cross-border trade
Launch of new AI platform for ASR and NMT
Two Patents for Deep Neural Network Model Advancements
Patent for Audio Recognition of Keywords
AppTek Wins Two 2019 SpeechTEK People’s Choice Awards
Hermann Ney, Science Director, granted IEEE's James L Flanagan award for pioneering life-long advancements in speech technology

Recent Academic Research and Publications

A New Training Pipeline for an Improved Neural Transducer

May 2020
Albert Zeyer | André Merboldt | Ralf Schlüter | Hermann Ney

The RNN transducer is a promising end-to-end model candidate. We compare the original training criterion with the full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training. We also generalize from the original neural network model and study more powerful models, made possible due to the maximum approximation. We further generalize the output label topology to cover RNN-T, RNA and CTC. We perform several studies among all these aspects, including a study on the effect of external alignments. We find that the transducer model generalizes much better on longer sequences than the attention model. Our final transducer model outperforms our attention model on Switchboard 300h by over 6% relative WER.
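The difference between full marginalization and the maximum approximation mentioned in this abstract can be illustrated with a minimal sketch. It runs the standard transducer forward recursion over a tiny alignment lattice, once summing over all alignments and once keeping only the best one. The function names, the lattice layout (`blank[t][u]`, `label[t][u]`) and the toy probabilities are assumptions made for illustration; this is not the authors' training pipeline.

```python
import math

NEG_INF = float("-inf")

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def transducer_score(blank, label, combine):
    """Forward recursion over a T x (U+1) transducer lattice.

    blank[t][u]: log-prob of emitting blank at node (t, u), advancing t.
    label[t][u]: log-prob of emitting the (u+1)-th label at (t, u), advancing u.
    combine: log_add for full marginalization, max for the maximum approximation.
    """
    T = len(blank)
    U = len(blank[0]) - 1
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = alpha[t - 1][u] + blank[t - 1][u] if t > 0 else NEG_INF
            from_label = alpha[t][u - 1] + label[t][u - 1] if u > 0 else NEG_INF
            alpha[t][u] = combine(from_blank, from_label)
    # A final blank terminates the sequence.
    return alpha[T - 1][U] + blank[T - 1][U]

# Toy lattice (hypothetical values): 2 frames, 1 output label, all probabilities 0.5.
half = math.log(0.5)
blank = [[half, half], [half, half]]
label = [[half], [half]]

full_sum = transducer_score(blank, label, log_add)  # sums over both alignments
best_path = transducer_score(blank, label, max)     # keeps only the best alignment
```

In this toy lattice there are two alignments of equal probability, so the full sum is twice the best-path score in the probability domain. The maximum approximation is always a lower bound on the full sum.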


Early Stage LM Integration Using Local and Global Log-Linear Combination

May 2020
Wilfried Michel | Ralf Schlüter | Hermann Ney

Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM) for the task of automatic speech recognition. One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora. Language model integration is straightforward with the clear separation of acoustic model and language model in classical HMM-based modeling. In contrast, multiple integration schemes have been proposed for attention models. In this work, we present a novel method for language model integration into implicit-alignment based sequence-to-sequence models. Log-linear model combination of acoustic and language model is performed with a per-token renormalization. This allows us to compute the full normalization term efficiently both in training and in testing. This is compared to a global renormalization scheme which is equivalent to applying shallow fusion in training. The proposed methods show good improvements over standard model combination (shallow fusion) on our state-of-the-art Librispeech system. Furthermore, the improvements are persistent even if the LM is exchanged for a more powerful one after training.
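As a rough illustration of the per-token renormalization described in this abstract, the sketch below combines acoustic-model and language-model log-probabilities log-linearly for a single decoding step, first without normalization (shallow fusion) and then renormalized over the vocabulary. The three-token vocabulary, the scores and the `lm_scale` value are hypothetical, and the code is only a sketch of the general idea, not the authors' implementation.

```python
import math

def log_softmax(scores):
    """Normalize a list of log-scores so their exponentials sum to 1."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def shallow_fusion(am_logp, lm_logp, lm_scale):
    """Unnormalized log-linear combination of the two models' token scores."""
    return [a + lm_scale * l for a, l in zip(am_logp, lm_logp)]

def local_renorm_fusion(am_logp, lm_logp, lm_scale):
    """Log-linear combination renormalized over the vocabulary at this step."""
    return log_softmax(shallow_fusion(am_logp, lm_logp, lm_scale))

# Hypothetical per-token distributions over a 3-token vocabulary.
am_logp = log_softmax([2.0, 1.0, 0.1])
lm_logp = log_softmax([0.5, 1.5, 0.2])

fused = shallow_fusion(am_logp, lm_logp, lm_scale=0.3)
renormed = local_renorm_fusion(am_logp, lm_logp, lm_scale=0.3)
```

Within a single step the renormalized scores differ from the shallow-fusion scores only by a constant, so the token ranking is unchanged. The constant depends on the decoding context, however, so scores accumulated over a whole sequence can rank hypotheses differently.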


Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition without Length Bias

May 2020
Wei Zhou | Ralf Schlüter | Hermann Ney

As one popular modeling approach for end-to-end speech recognition, attention-based encoder-decoder models are known to suffer the length bias and corresponding beam problem. Different approaches have been applied in simple beam search to ease the problem, most of which are heuristic-based and require considerable tuning. We show that heuristics are not proper modeling refinement, which results in severe performance degradation with largely increased beam sizes. We propose a novel beam search derived from reinterpreting the sequence posterior with an explicit length modeling. By applying the reinterpreted probability together with beam pruning, the obtained final probability leads to a robust model modification, which allows reliable comparison among output sequences of different lengths. Experimental verification on the LibriSpeech corpus shows that the proposed approach solves the length bias problem without heuristics or additional tuning effort. It provides robust decision making and consistently good performance under both small and very large beam sizes. Compared with the best results of the heuristic baseline, the proposed approach achieves the same WER on the 'clean' sets and 4% relative improvement on the 'other' sets. We also show that it is more efficient with the additional derived early stopping criterion.


LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

July 2019
Eugen Beck | Wei Zhou | Ralf Schlüter | Hermann Ney

LSTM based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform d...


Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies

July 2019
Y. Kim | Y. Gao | H. Ney

Transfer learning or multilingual model is essential for low-resource neural machine translation (NMT), but the applicability is limited to cognate languages by sharing their vocabularies. This paper shows effective techniques to transfer a pre-trained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embedding, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pre-training data without back-translation...


Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

June 2019
Yunsu Kim | Hendrik Rosendahl | Nick Rossenbach | Hermann Ney

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation....


Language Modeling with Deep Transformers

May 2019
Kazuki Irie | Albert Zeyer | Ralf Schlüter | Hermann Ney

We explore multi-layer autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well configured Transformer models outperform our baseline models based on the shallow stack of LSTM recurrent neural network layers....


Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

May 2019
Tobias Menne | Ralf Schlüter | Hermann Ney

Significant performance degradation of automatic speech recognition (ASR) systems is observed when the audio signal contains cross-talk. One of the recently proposed approaches to solve the problem of multi-speaker ASR is the deep clustering (DPCL) approach. Combining DPCL with a state-of-the-art hybrid acoustic model, we obtain a word...


Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos

April 2019
Oscar Koller | Necati Cihan Camgoz | Hermann Ney | Richard Bowden

In this work we present a new approach to the field of weakly supervised learning in the video domain. Our method is relevant to sequence learning problems which can be split up into sub-problems that occur in parallel. Here, we experiment with sign language data. The approach exploits sequence constraints within each independent stream and combines them...

30-Year Leaders in Speech Technology

AppTek provides an artificial intelligence and machine learning-based automatic speech recognition, machine translation and natural language understanding platform for organizations in a variety of markets, such as media and entertainment, call centers, government, enterprise business and others across the globe. AppTek delivers the highest quality real-time streaming and batch speech technology solutions in the industry, available via the cloud or on-premises. Featuring scientists and research engineers who are recognized amongst the best and most experienced in the world, the company’s solutions cover a wide array of languages, dialects, and channels.

Copyright 2021 AppTek