4D Human Language Technologies to Fuel the Metaverse

February 28, 2022

The “metaverse” is captivating mainstream audiences with its promise of realistic, spatially aware and fully immersive digital worlds that users can inhabit in parallel to our physical world, akin to the fictional world of the Ernest Cline novel Ready Player One. Inside the metaverse, people will be able to seamlessly engage in new virtual experiences, transact via new virtual economies, and communicate with real-world users globally as well as their virtual counterparts within the virtual world.

At the heart of the metaverse lie the many technological advancements needed to power these virtual realities, including innovations in machine learning (ML) and artificial intelligence (AI) to manage every dimension of user interaction.

AI-enabled human language technology (HLT) in particular will serve as a critical, foundational component of the metaverse. It enables voice-based interactions that range from command-driven, hands-free navigation of virtual worlds, to conversations with virtual AI entities that understand, process and respond to spoken input, to seamless connections between real-world users around the globe, regardless of the language they speak.

Today, through its 4D approach, AppTek applies its scientific expertise in 80+ languages and hundreds of dialects, covering both common and low-resource languages, to extend automatic speech recognition (ASR) and neural machine translation (NMT) and enable global communication between humans and machines. These languages and dialects are further “sliced and diced” by topic or industry with domain-specific modeling (e.g. commerce, sports, gaming) and corresponding sub-domains (e.g. accounting, credit reporting, lending) to further improve performance. Modeling involves training systems on a variety of acoustic environments, including multi-array microphones, 8 kHz telephony, and 16 kHz broadcast media and entertainment. The models can also be used to capture emotion from both acoustic and text signals, using cues from voice volume, pace, and prosody as well as natural language processing and understanding to pick up cues in text such as intent and sentiment.

The 4D approach also reduces the potential for bias through advanced training, diverse resources and its demographic-minded approach to data compilation and curation.  AppTek’s Director of Data Services and former Deloitte expert Kelly Zhang stated, “Inclusivity and representation are critical to an effective and fair metaverse.  AppTek’s 4D modeling takes into consideration age, accents, dialects, education levels, gender, and more, as well as deaf and hard-of-hearing users and text-to-speech for the blind, to make sure every voice counts.”

The AppTek Science Team continues to forge advancements in 4D AI/ML for automatic translation of spoken content within the metaverse. Today, scientists are working on several flavors of speech translation: cascaded systems that chain speech-to-text (ASR) with text-to-text translation (MT); end-to-end spoken translation systems that are trained to map spoken audio in one language directly to translated text in another; and truly textless, direct end-to-end speech translation systems that translate a source speech signal into (artificially created) target speech.
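To make the difference between these three flavors concrete, the following is a minimal Python sketch of how each one is wired together. It is an illustration only and does not reflect AppTek's actual systems or APIs: every class and function name (CascadedTranslator, EndToEndTranslator, DirectSpeechTranslator, and the toy_* stand-ins) is hypothetical, audio is modeled as a plain list of samples, and the "models" are trivial placeholders where real systems would use trained neural networks.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption for illustration: audio is a plain list of float samples.
Audio = List[float]


@dataclass
class CascadedTranslator:
    """Chains two separate models: ASR first, then text-to-text MT."""
    asr: Callable[[Audio], str]  # speech -> source-language text
    mt: Callable[[str], str]     # source-language text -> target-language text

    def translate(self, audio: Audio) -> str:
        transcript = self.asr(audio)  # intermediate transcript exists
        return self.mt(transcript)


@dataclass
class EndToEndTranslator:
    """One model maps source speech straight to target-language text."""
    model: Callable[[Audio], str]

    def translate(self, audio: Audio) -> str:
        return self.model(audio)  # no intermediate transcript


@dataclass
class DirectSpeechTranslator:
    """Textless variant: source speech in, synthesized target speech out."""
    model: Callable[[Audio], Audio]

    def translate(self, audio: Audio) -> Audio:
        return self.model(audio)


# Toy stand-ins (hypothetical) so the sketch runs without any trained models.
TOY_LEXICON = {"hello": "hallo", "world": "welt"}


def toy_asr(audio: Audio) -> str:
    return "hello world"  # pretend this was recognized from the audio


def toy_mt(text: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_speech_to_text(audio: Audio) -> str:
    return "hallo welt"  # pretend a single model produced this directly


def toy_speech_to_speech(audio: Audio) -> Audio:
    return list(audio)  # placeholder: a real model would synthesize new audio


cascaded = CascadedTranslator(asr=toy_asr, mt=toy_mt)
end_to_end = EndToEndTranslator(model=toy_speech_to_text)
direct = DirectSpeechTranslator(model=toy_speech_to_speech)

sample = [0.0, 0.1, -0.1]
print(cascaded.translate(sample))
print(end_to_end.translate(sample))
```

The point of the sketch is structural: the cascaded design exposes an intermediate transcript that can be inspected or corrected, while the end-to-end and direct designs hide that step inside a single model, which removes one place where recognition errors can be passed on to translation.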

The team has achieved state-of-the-art results, including a top ranking at the 2021 IWSLT workshop, where AppTek’s end-to-end spoken translation methodology outperformed other approaches in the end-to-end categories. Headed by Dr. Zoltán Tüske, AppTek’s lead speech scientist and former scientist at IBM, the team also continues the research and development of supervised and unsupervised HLT approaches, as well as “end-to-end” speech recognition that maps a sequence of acoustic features to a sequence of words for large-vocabulary continuous speech recognition tasks.

Katie Nguyen, Senior Vice President of Data Operations at AppTek, who previously managed data services at SAIC, stated, “AppTek’s mission is to break down language barriers and enable global communication at scale. We see 4D HLT as the future for cognitive AI within the metaverse to further enable accurate and inclusive cross-lingual human-machine interactions.”

As the work surrounding the metaverse continues to grow, we are excited to see the future of what AI, including human language and cognition technologies, will bring to the metaverse and our global community!

AI and ML Technologies to Bridge the Language Gap
AppTek is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U) and text-to-speech (TTS). The AppTek platform delivers industry-leading solutions for organizations across a breadth of global markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek’s solutions cover a wide array of languages/dialects, channels, domains and demographics.

