Communication is critical to the exchange of news and information, the development and growth of interpersonal relationships, and ultimately building and sustaining a healthy community, in a democratic environment where everyone’s voices and ideas may be heard. Accessibility is a core component of ensuring that communication among all members of the community, including the deaf and hard of hearing (HOH), takes place on an equal footing.
The World Health Organization estimates that over 466 million people (5% of the global population) suffer from disabling hearing loss, and expects that number to nearly double in the next 30 years. Those people need and deserve the same access as everyone else to all forms of communication, including one-to-one personal interactions, business meetings, classrooms, lectures, movies and television shows, radio broadcasts and live global broadcasts.
In this new Accessibility Series, AppTek will address in-depth the imperative of closed captioning (CC) as a critical tool for equality. Closed captioning refers to text that is added to the bottom of the screen in television broadcasts (or any type of video display) that provides a transcription of the dialogue (and some non-speech auditory elements) in sync with the program’s audio.
To begin, we’ll evaluate and expand upon a recent advocacy Petition that puts a stake in the ground for modern rules surrounding captioning in the United States. On July 31st of this year, a distinguished group of Associations and experts on the needs of the deaf and HOH community filed a Petition with the U.S. Federal Communications Commission (FCC), asking for definitive rule-making on objective, technology-neutral metrics for live captioning quality. Reflecting on the state of live captioning in the USA today, the Petition claims that:
· Problems and inconsistencies with captioning of live programming still widely persist;
· The current “best practices” approach to captioning doesn’t assess or ensure the quality of captions;
· The FCC needs to adopt a rigorous, substantive procedural framework to assess captioning quality that focuses on actual consumer experience, and then adopt technology- and methodology-neutral metrics that program providers must meet;
· And, the FCC must provide immediate guidance about the use of Automatic Speech Recognition technology.
Let’s explore these claims in more detail.
The Petition seeks to promote equal access to video programming for the more than 48 million Americans who are deaf or hard-of-hearing. The closed captioning provisions of the 1996 Telecommunications Act require the FCC to “ensure” that video programming is fully accessible through the provision of closed captions. The petitioners assert that, in the 23 years since this act was passed, consumers have continued to experience poor-quality captions on live programming, and that, in some cases, captions have become even worse. They claim that the “best practices” approach previously adopted by the 2014 Caption Quality Order lacks objective quality levels and “leaves cost, not quality, as the primary driver for broadcasters”. The Petition offers examples of the proliferation of poor captions including missing captions for sports and weather programming, missing speaker identification, captions that are out of sync with speakers, programs that are incompletely captioned, the omission of content and more. The petitioners thus urge the FCC to set objective, technology-neutral standards for live captioning which are critical to “prevent a proliferation of low quality captioning services that might result otherwise.”
Such claims are contested by broadcasters and cable program operators, who state that while closed captioning is not always perfect and every workflow is still prone to errors, the “best practices” approach adopted in 2014 has largely helped improve the quality of live captioning with no evidence to the contrary. They argue that the enforcement of objective standards will create burdensome mandatory auditing processes that will make it more difficult and expensive to monitor caption quality, and will ultimately impede advancements that improve the future of captioning.
In its final assertion, the Petition states that the Commission must provide guidance for the use of automatic speech recognition (ASR) technology, as this was not previously done. It states that rules are unclear as to how ASR technologies fit into existing “best practices”, which have been primarily tailored to human and ENT (Electronic Newsroom Technique) captioning workflows. Ultimately, the Petition seeks to halt the use of poor-quality ASR captions and instead seeks to offer standards and metrics for such use of ASR.
In the AppTek Accessibility Series, we seek to uncover the issues surrounding captioning as it relates to this Petition, highlight the different viewpoints of both the providers of captioning services and the end users of such services, and interview experts from the field of captioning and ASR so as to deepen our understanding of the challenges involved.
In our next post, we will discuss the history of captioning and outline the progress that has been made over the years. We will then provide an overview of the workflows implemented for live captioning production, the challenges such workflows pose, and how ASR fits into them. We will talk to members of the Deaf and HOH community for their perspectives, interview industry experts on the state of modern live captioning workflows and proposed metrics to measure their effectiveness, and also talk with AppTek’s scientists for a deeper understanding of ASR technologies and how they can best serve the live captioning industry. We will conclude by looking into current research trends and trying to outline where we can expect technology to lead us in a (hopefully) more accessible future.
Noted organizational behavior expert Margaret Wheatley said that “There is no power for change greater than a community discovering what it cares about.” As we continue to tackle the issues surrounding captioning, with a view to deepening our understanding from the perspective of all stakeholders, the ultimate goal for our technology is to drive change that has a positive impact on the lives of our global community.
AppTek is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U) and text-to-speech (TTS) technologies. The AppTek platform delivers industry-leading solutions for organizations across a breadth of global markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek’s solutions cover a wide array of languages/dialects, channels, domains and demographics.