Automated Speech Recognition (ASR) in general, and AppTek in particular, have been around for decades. Only in the last few years, however, with the widespread adoption of deep learning techniques such as deep neural networks (DNN), recurrent neural networks (RNN) and convolutional neural networks (CNN), have the applications of ASR truly exploded.
Although there are many potential applications, at AppTek we are most excited about three areas: media, telephony and conversational interfaces for the internet of things (IoT). I will go much deeper into these verticals in future posts, but as McKinsey points out in "Where Machines could Replace Humans, and Where They Can’t", repetitive activities with a moderate level of analysis or expertise are prime candidates for automation through artificial intelligence. Media, telephony and IoT interfaces each have myriad such activities in their current workflows.
The global media market is valued at ~US$1.8 trillion, with a healthy forecast CAGR and an increasingly large portion of that value coming from digital. As search increasingly drives discovery of media content, ASR technology will increase the value of every media asset by making it discoverable through search. Closed captioning requirements are also becoming more stringent, not only in the US but in most developed and developing countries. The current manual approach is neither scalable nor cost-effective. Here, too, ASR can help.
Call centers have millions of hours of recorded audio that currently costs money to store and provides no value because it cannot be searched or analyzed. An estimated $41B is lost every year to poor customer service and, unless they can analyze the interactions they are already having, call centers usually find out too late. It is too time- and capital-intensive to have human talent review every call. By automatically producing a full transcript of every call, speech recognition can transform a recorded call into a source of assured compliance, sales rep training or customer service analytics.
Every year, Mary Meeker delivers her internet trends report, one of the most widely read, tweeted, forwarded and shared pieces of research anywhere. This year, a large portion of the 100+ page presentation was dedicated to voice and voice search. The idea that users will continue to type or tap as their main interface as IoT and mobile technology become ubiquitous is ludicrous. The main interface will be voice.
As I mentioned, ASR and speech technology are very exciting. They sit at the intersection of big data, artificial intelligence and IoT. There is a lot more to talk about.
Until next time…