An insider’s perspective on data-driven technology and the audiovisual translation industry

December 6, 2021
By Dr Volker Steinbiss, Managing Director, AppTek GmbH

One of the most popular discussion topics among translators is what machine translation (MT) and other technologies mean for their future. In this context, I have time and again come across popular narratives that I would like to challenge, having worked with language technologies throughout my career. I recently presented these thoughts at the Languages & the Media conference where they were well received by the translators in attendance, so I wanted to put them in writing as well in order to continue feeding into the ongoing discussion.

How does MT work? Why is it often perfect and sometimes crap?

Every machine translation system, before it is used in production, goes through a learning or training phase. It processes a huge amount of parallel data, i.e., many sentences in two languages, and this is how it derives all its knowledge about these languages and is able to translate one into the other. In other words, it learns from examples, a bit like how a child picks up language – it doesn't know about grammar and collocations, it just listens and talks.

The scientific approach formulates the translation problem the engineering way: build a machine that, for each input sentence, produces a given output sentence as often as possible. This is formulated in the language of mathematics, with probabilities. The machine used to do the work is a neural network, which can be set up in different architectures, depending on the latest fashion in the scientific community. But whatever the architecture, we are still talking about machinery, algorithms, probabilities, parameter estimation and optimization: in other words, pure number crunching.

In this approach, the words are taken as tokens. If one were to replace the thousands of words in the training data with a unique number for each word, the machine would still handle the data in the same manner, looking out for tokens and the relations between them. Translation would still work, even though a human would be completely lost. Nothing in this approach has to do with what we know and love about language, and it is completely different from how humans process language. The term "neural network" can fool us, because any similarity to the brain is superficial. In fact, the way humans and machines translate is as similar as the way birds and airplanes fly: both achieve a comparable result, but through very different mechanisms.
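To make the "words as numbers" point concrete, here is a minimal sketch in Python. The toy sentences and the helper names `build_vocab` and `encode` are invented for illustration, and production systems usually split text into subword units rather than whole words, so this shows the idea rather than any particular system's preprocessing.

```python
def build_vocab(sentences):
    """Assign each distinct word a unique integer ID, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Replace every word in a sentence with its integer ID."""
    return [vocab[word] for word in sentence.split()]


# Toy parallel data (hypothetical, two sentence pairs only).
source = ["good morning", "good night"]
target = ["buenos días", "buenas noches"]

src_vocab = build_vocab(source)
tgt_vocab = build_vocab(target)

# This is all the machine ever sees: sequences of numbers on both sides.
src_ids = [encode(s, src_vocab) for s in source]
tgt_ids = [encode(t, tgt_vocab) for t in target]

print(src_ids)  # [[0, 1], [0, 2]]
print(tgt_ids)  # [[0, 1], [2, 3]]
```

A human reader can no longer tell what the sentences say, but the regularities a learning algorithm relies on, such as token 0 appearing at the start of both source sentences, are fully preserved.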

How good will MT get in the future?

It is hard to say how good MT will get someday and I think nobody really knows, but everybody has an opinion. MT is currently not good enough to rely on without professional quality control and post-editing. But it is too good not to use. In order to showcase the potential of MT for use in the subtitling domain, we recently launched a quiz where people were asked to guess whether the English subtitles of a Spanish-language telenovela had been generated by AppTek’s MT system or by a professional subtitler; many got the answers wrong.

I often hear that MT will never be able to translate a certain kind of text, say humor, because it would need to understand language in context and not just juggle meaningless tokens to do so. While I agree we might not be there any time soon, the argument used to support such statements is often flawed: just because humans have a certain way of doing things, it does not mean machines need to follow the same process to achieve the same result. For example, today's neural MT typically generates well-formed, grammatically correct sentences although there is no explicit grammar-handling mechanism built into the models to do so.

Needless to say, the context MT has access to might be limited as a result of its implementation. If, for instance, a technical restriction means MT only translates one sentence at a time, it cannot consider context beyond the sentence level because it does not have access to that information. However, the next version of MT might be less restricted and able to perform better on that front. Document-level translation is already moving from the research lab into practical implementations, at least in its crudest form, so some context errors can now be avoided, for instance gender errors across subtitles where the speaker's gender is obvious from the text of previous subtitles.

“Computers cannot be creative”

This is one of the most popular myths about machines that most people seem to accept as a theorem. The idea is that computers are dull and deterministic and cannot be original, inventive, or creative, because they are limited to what their human designers built into them or because their mode of operation just doesn't allow for creativity.

Yet, here are a few examples where computers are creative and inventive indeed:

  • (1) The latest Airbus cabin, more lightweight yet stable, was created via a process called generative design.
  • (2) Go champion Lee Sedol was beaten by the AI AlphaGo in 2016. After one game against AlphaGo he stated: "I felt powerless." AlphaGo had crafted original moves and a style of its own, somewhat alien, that subsequently provided inspiration to Go players.
  • (3) The same goes for video games. The AI AlphaStar attained Grandmaster status in StarCraft II in 2019. It used "unimaginably unusual" strategies never considered by pro players before.
  • (4) Technically closer to MT, a student used an AI called GPT-3 to create a fake blog post that reached the top spot on Hacker News, and the large majority of readers took it for real.
  • (5) DALL-E creates artificial images from text, such as an “armchair in the shape of an avocado”, which seems to be as creative as it gets.

It is not easy for us to accept that a piece of software can do things it has not been programmed for. Yet, it can. Why is our intuition failing us in this case? My explanation is that we are limited in our imagination and, indeed, fooled by simple explanatory examples. When we imagine how a neural network learns from examples, we typically think of something within our sphere of experience: say, a few pages of translated sentences, or a travel guide with a hundred pages. An average-sized novel is approximately 90,000 words, or roughly 5,000 sentences. Decent MT systems are trained on 100 million sentence pairs, 20,000 times as many. It would take a lifetime to read that much. When we go beyond simple everyday phrases like “good morning” to a complex sentence, and then beyond sentences to a full book, to a library of books, such a progression in magnitude is beyond our imagination. We understand the principles, but nobody understands any more what it is that the AI learns.
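The scale comparison above can be checked with a few lines of arithmetic. The novel and training-set figures are the ones quoted in the text; the reading pace of one novel per day is an assumption added here purely to put the "lifetime" remark into numbers.

```python
# Figures quoted in the text.
WORDS_PER_NOVEL = 90_000
SENTENCES_PER_NOVEL = 5_000
TRAINING_SENTENCE_PAIRS = 100_000_000

# Assumption for illustration only: a very fast reader finishing one novel a day.
NOVELS_PER_DAY = 1

# How many novels' worth of sentences is the training set?
novels_equivalent = TRAINING_SENTENCE_PAIRS // SENTENCES_PER_NOVEL

# Rough word count on one language side of the training data.
total_words = novels_equivalent * WORDS_PER_NOVEL

# Years needed to read it all at the assumed pace.
years_to_read = novels_equivalent / (NOVELS_PER_DAY * 365)

print(novels_equivalent)        # 20000
print(total_words)              # 1800000000
print(round(years_to_read, 1))  # 54.8
```

Even at that unrealistic pace, reading one side of the training data would take more than half a century.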

I can’t predict the future or where we will be a few years, or twenty years, from now, but I believe that language technology like MT will improve substantially and that there will still be plenty of cases that can only be handled by a skilled professional. I also believe that there is a lot of benefit in partial automation, with the machine supporting the professional: for example, by helping with the time alignment of subtitles or by checking for consistency across a series. Anything that takes some of the burden off translators allows them to focus on high-level work instead.

What is certain is that MT and other language technologies will have an impact on workplaces.

Will we all be replaced by machines one day?

I don’t think the answer to this question is ‘yes’ when it comes to translators, but the problem lies in the way the question is phrased. Even if automation does not replace translators but does 30% of the work, this could still translate into 30% fewer jobs for translators, right?

It’s not as simple as that of course, as workplaces evolve. Historically, we have seen workplaces vanishing or just changing. In the case of translating marketing material, for example, the deep added value of the translator is to go beyond pure translation into a language and consult about the appropriate messaging in the respective culture. The same can be true for translating entertainment content into another language. The importance of translation as a means of transferring cultural knowledge to an audience speaking a different language was recently highlighted by the extensive debate in the press about the subtitles and dubs of the latest Netflix hit “Squid Game”.

As the volume of content that requires localization continues to increase, partial automation can help language service providers cope with peaks in volume and speed up production. And translators’ working conditions are being shaped as we speak.

The future is still to be written

Regarding the impact of language technology on translators' workplaces, it is important that we get it right and there is certainly a way to get it wrong. Translators have already considered the nightmare of a race to the bottom regarding quality and cost: production pipelines broken by people who don't know how to manage change; creative intellectuals chained to machines, fixing stupid errors, as another cog in the wheel; results that lack artistic quality; subtitles or dubs that don’t feel right. We definitely want to avoid this.

Let us think of a good version of the future together: the machine does what it's good at, handling some of the boring tasks. It identifies low-quality translations and does not even display them. The translator goes through the less challenging part of the text quickly, almost in proofreading mode, and spends the bulk of their time on the more creative work, the part that entertains us as viewers.

I see the machine as a powerful toolbox translators can leverage. I believe that technology improves lives, if used properly. It certainly improves businesses. But not all technology that is relevant for translators is a priority in scientific research. Speaker diarization, for example, would be useful in subtitling, but isn’t a priority in research because nobody is pushing for it. It is not widely known that one can influence the priorities of the international scientific community simply by putting one’s topics on the agenda. I am an advocate for binding quality standards and objective, reproducible benchmarks with minimum quality levels, driven by the needs of the sector. With such benchmarks, one can specialize the output of machines to a specific domain, so that the tools are as good as possible and can truly be of service to translators.

When it comes to the commercial use of machine output, it would make sense to make the generation process transparent. A simple label of the type “fully automatic”, “post-edited” or “certified translation” would manage user expectations regarding the quality of a certain product, e.g. subtitles, and even give users control over the choice of product, while protecting human creative work at the same time. We have all eaten junk food at some point in our lives, and we may even like it on occasion, but we are also willing to spend more on organic products which we believe to be of higher quality and beneficial to our health.

Introducing MT technology in the localization production chain is of course a challenge and a learning process in itself. Ergonomics is a crucial aspect. The UI should support the needs of translators, who should be able to easily take advantage of new developments in the field. For the time being, MT systems often don't understand whether actors in a film are male or female, yet this is required knowledge for accurate translation in some target languages. On the language technology side, we can use metadata to get the translation right, but there must be a way for this meta-information to be provided to the machine either directly from the script, from an image recognition system, or fed into the system manually by the user so that the MT engine can take the actor’s gender into account.
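As a rough sketch of how such meta-information might travel with a subtitle, consider the Python below. Everything in it is hypothetical: the `Subtitle` class, the `resolve_gender` and `build_request` helpers, the request layout and the priority order (manual input first, then the script, then image recognition) are assumptions made for illustration and do not describe AppTek's or any other vendor's actual interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subtitle:
    """One subtitle plus optional speaker metadata (hypothetical structure)."""
    text: str
    speaker_gender: Optional[str] = None  # e.g. "female", "male"; None if unknown


def resolve_gender(script: Optional[str],
                   vision: Optional[str],
                   manual: Optional[str]) -> Optional[str]:
    """Pick the first available source of gender information.

    Assumed priority: the user's manual input, then the script,
    then an image recognition system.
    """
    for candidate in (manual, script, vision):
        if candidate is not None:
            return candidate
    return None


def build_request(sub: Subtitle) -> dict:
    """Build a request for an imaginary MT engine that accepts metadata."""
    request = {"text": sub.text}
    if sub.speaker_gender is not None:
        request["metadata"] = {"speaker_gender": sub.speaker_gender}
    return request


# The script has no gender annotation, image recognition suggests "female".
gender = resolve_gender(script=None, vision="female", manual=None)
request = build_request(Subtitle("I am tired.", speaker_gender=gender))
print(request)
# {'text': 'I am tired.', 'metadata': {'speaker_gender': 'female'}}
```

When no source provides the information, the metadata field is simply left out, and the engine falls back to whatever it can infer from the text alone.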

All in all, when considering the impact of machine translation on the future of translators, I believe that, if we all work on it together, we can collectively create modern workplaces that provide professionals with more power than ever before.

Together we can influence the future – this is the time to set the course!


