SEQCLAS, which stands for “A Statistical Classification Framework for Human Language Technology”, is a project funded by the European Research Council under the European Union’s Horizon 2020 research and innovation program. The project has been running since August 2016 under the coordination of Prof. Dr.-Ing. Hermann Ney, AppTek’s Director of Science, with the participation of many AppTek scientists, and comes to a close in July 2021.
The goal of the project has been to develop a unifying framework of novel methods for sequence classification used in human language technologies (HLT), such as automatic speech recognition (ASR) and machine translation (MT). It is expected to pave the way for a new generation of algorithms in these scientific fields, and thus lay the basis for accelerating progress and pushing the state of the art forward.
Huge amounts of unstructured speech and text data are available on the web and in large private or public digital archives for various languages, and are commonly used for AI applications such as speech recognition, machine translation and text image recognition. The challenge lies in processing such data automatically so as to enable applications that support humans in communication. The statistical classification techniques developed so far need to be improved in order to better meet the needs of real-world applications and, ultimately, to match or exceed human performance.
From a scientific perspective, challenges as different as translation (from text in one natural language to text in another language) and speech transcription (where audio in one language is transcribed as text in the same language) bear some underlying similarity: in both cases, one needs to map a sequence of data to a sequence of symbols. The fundamental difficulty is that this cannot be done piecemeal: the source sequence, whether audio or text, needs to be considered as a whole, or at least each piece of data needs to be considered in context. For example, in the phrase “Honey, please pass me the honey jar”, the two occurrences of the word “honey” have very different meanings: the first is translated into German as “Liebling”, whereas the second is translated as “Honig”.
The Bayes decision rule, which is the topic of Prof. Ney’s keynote address at this year’s INTERSPEECH conference, lies at the heart of all successful approaches to statistical classification. Given the true probability distributions, it describes how to achieve the lowest possible error rate. However, different modeling assumptions have been made over the decades, including simplifications and approximations, without it being clear how they affect final performance, which is what ultimately matters in a sequence classification system.
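To make the idea concrete, here is a minimal sketch of the Bayes decision rule under 0-1 loss, applied to the first “honey” in the example above. The function names and the probability values are illustrative assumptions and are not taken from the SEQCLAS project; real systems have to estimate these probabilities from data, which is exactly where the modeling assumptions come in.

```python
from typing import Dict


def bayes_decision(posterior: Dict[str, float]) -> str:
    """Pick the class with the highest posterior probability p(c | x)."""
    return max(posterior, key=posterior.get)


def expected_error(posterior: Dict[str, float], decision: str) -> float:
    """Probability of being wrong (0-1 loss) when choosing `decision`."""
    return 1.0 - posterior[decision]


# Hypothetical posterior for translating the first "honey" in context.
# The numbers are made up for illustration only.
posterior = {"Liebling": 0.7, "Honig": 0.3}

choice = bayes_decision(posterior)
errors = {c: expected_error(posterior, c) for c in posterior}
```

Choosing the most probable class gives the smallest expected error among all possible decisions, provided the posterior is the true distribution. With an approximate posterior, that guarantee no longer holds automatically.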
Science is often not as straightforward and systematic as people assume. In a novel field, when exploring the unknown, decisions are made that seem sound at the time but may later turn out to produce an inconsistent and less-than-clean approach that is nonetheless accepted and used by the scientific community. SEQCLAS looks into some of the choices that have been made over the years, so as to determine whether they are good enough to keep building on or whether they should be replaced.
Instead of developing specific solutions for different human language technology tasks independently, SEQCLAS has taken the approach of developing a unifying framework across all three HLT tasks (speech recognition, machine translation, text image recognition). Further adaptation of the hidden Markov model (HMM) framework to neural network-based acoustic models has increased the recognition accuracy of state-of-the-art models, while other breakthroughs on the ASR front include feature extraction directly from the speech signal.
In terms of machine translation, the project team experimented with building MT systems in unsupervised training scenarios. The latter involve the use of monolingual data only for system training, where no explicit parallel sentence pairs are available for the language pair in question. One could say that this is a new frontier reached in machine translation science, as you can now build MT systems using only large monolingual corpora in any two languages you can imagine, even if you have no body of translation memory data between said languages.
The work completed in the project has resulted in high-performing research prototype systems whose effectiveness has been evaluated and corroborated on public international benchmarks. This is great news for low-resource language pairs: it means that it is now possible to build MT systems between uncommon language pairs more easily, which is particularly important given the growth of non-English source content that the localization sector has been witnessing in recent years. It could offer a solution to talent sourcing issues for language pairs as diverse as Korean to Icelandic, or Greek to Japanese, opening up new possibilities for the distribution of, say, popular entertainment content to viewers who have not been able to access it easily until now, while also boosting the content creation markets of the respective countries.
More than 60 papers and articles have been presented at scientific conferences and published to date as a result of the SEQCLAS project, over half of them with the participation of AppTek scientists. These also include five out of six papers due to be presented by AppTek’s scientists at this year’s INTERSPEECH conference.
AppTek provides an artificial intelligence and machine learning-based automatic speech recognition, machine translation and natural language understanding platform for organizations in a variety of markets, such as media and entertainment, call centers, government, enterprise business and others across the globe. Available via the cloud or on-premise, AppTek delivers the highest quality real-time streaming and batch speech technology solutions in the industry. Featuring scientists and research engineers who are recognized amongst the best and most experienced in the world, the company’s solutions cover a wide array of languages, dialects, and channels.