By TOM DOYLE, CEO & Founder, Aspinity
The seemingly simple act of commanding consumer devices by voice is a choice that nearly 118 million Americans now make every day, according to a recent report from eMarketer, the digital marketing research firm.
But while the voice interface is convenient for users, that convenience can come at the cost of individual privacy. The reason? Always-on, always-connected voice-first devices such as Amazon Alexa and Google Home require a wall plug and an internet connection to powerful cloud processors, making it possible for cloud companies, however benignly, to collect data on personal habits, location and conversations that were never intended for sharing.
Move processing to the edge
To address concerns over user privacy, device designers are attempting to do more of the audio processing within the consumer device, rather than sending users’ voices into the cloud. Moving more processing to the edge is a trend across the IoT industry, and not just for voice data but for other types of sensitive or proprietary data as well, e.g., acoustic events and vibration.
Yet designers have had limited success because the conventional approach to always-listening edge processing is notoriously inefficient: It digitizes and processes 100% of the incoming sound data even though up to 90% of that data is irrelevant noise. This “digitize-first” approach wastes vast amounts of system power digitizing and analyzing the audio signal in search of a wake word even when no speech is present, making it impractical for small, battery-operated devices.
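The scale of that waste is easy to see with a back-of-the-envelope calculation. The sketch below combines the roughly 10% speech fraction described above with an assumed always-on power draw (the 2 mW figure is purely illustrative, not a measured value for any real device):

```python
# Back-of-the-envelope sketch of digitize-first waste.
# The power figure is an illustrative assumption, not a measurement.

ALWAYS_ON_POWER_MW = 2.0   # assumed draw of an always-on digitize-and-analyze chain
SPEECH_FRACTION = 0.10     # up to 90% of captured audio is irrelevant noise

hours = 24
total_energy_mwh = ALWAYS_ON_POWER_MW * hours
useful_energy_mwh = total_energy_mwh * SPEECH_FRACTION
wasted_energy_mwh = total_energy_mwh - useful_energy_mwh

print(f"Energy spent per day:    {total_energy_mwh:.1f} mWh")
print(f"Spent on actual speech:  {useful_energy_mwh:.1f} mWh")
print(f"Wasted on noise/silence: {wasted_energy_mwh:.1f} mWh")
```

Under these assumed numbers, roughly 43 of every 48 mWh spent each day goes to digitizing and analyzing audio that contains no speech at all.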
Workarounds don’t work
Tackling this power issue is critical to keeping private data secure. Unfortunately, it’s also exceptionally difficult. Design engineers have tried workarounds to decrease power consumption in an always-listening system, including duty cycling and reducing the power of each individual component in the audio signal chain that handles the data. The reality is that these kinds of approaches don’t address the root cause of the problem: too much data.
To truly tackle the problem, we need to change our approach to a system solution, not a component solution. We must move to a more efficient edge architecture that intelligently minimizes the amount of data that moves through the system, focusing the system’s energy resources on analyzing voice rather than on searching for a wake word in irrelevant noise.
Analyze, THEN digitize
It’s time to move away from the digitize-first approach that has dominated voice wake-up device architecture since the invention of voice-first applications.
Inspired by the way the human brain efficiently filters incoming information, differentiating, for example, a dog bark from a baby’s cry, a new ultra-low-power analog machine learning technology called Reconfigurable Analog Modular Processor (RAMP) is changing this paradigm. For the first time, device designers can use low-power analog machine learning to detect which data are important for further processing and analysis prior to data digitization.
Through its “analyze-first” architecture, the RAMP chip allows the higher-power-processing components in the system to stay asleep until voice has actually been detected, and only then does it wake them to “listen” for a possible wake word.
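The gating idea can be sketched in a few lines of code. The snippet below is a simplified software analogy, not Aspinity’s implementation: a cheap energy threshold stands in for the analog detection stage, and a placeholder function stands in for the high-power digital wake-word engine that stays asleep until the gate fires. All names and threshold values are illustrative assumptions.

```python
import random

# Assumed threshold; a real analog detector is far more sophisticated
# than a simple energy test.
VOICE_ENERGY_THRESHOLD = 0.01

def cheap_voice_gate(frame):
    """Stand-in for the low-power analog stage: signal a wake-up only
    when the frame's average energy suggests voice may be present."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > VOICE_ENERGY_THRESHOLD

def expensive_wake_word_detector(frame):
    """Placeholder for the high-power digital wake-word engine."""
    return False  # real DSP/neural-network inference would run here

def process_stream(frames):
    """Count how often the expensive stage is actually woken."""
    wake_ups = 0
    for frame in frames:
        if cheap_voice_gate(frame):              # analyze first...
            wake_ups += 1
            expensive_wake_word_detector(frame)  # ...process only then
    return wake_ups

# Illustrative stream: mostly quiet background, a few loud bursts.
random.seed(0)
quiet = [[random.gauss(0, 0.01) for _ in range(160)] for _ in range(90)]
loud = [[random.gauss(0, 0.5) for _ in range(160)] for _ in range(10)]
print(process_stream(quiet + loud), "wake-ups out of 100 frames")
```

In this toy stream, the expensive stage wakes for only the handful of loud frames; the other 90% of the audio never reaches it, which is the essence of the analyze-first idea.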
The analyze-first architecture is so efficient that it uses up to 10x less power than a digitize-first approach to always-on listening. That’s the difference between a portable voice-first device that runs for weeks or months instead of hours or days on a single battery charge. More importantly, it’s the difference between an always-listening device that indiscriminately records and sends all sound data to the cloud, and one that has the localized intelligence to select and send only the relevant data, reducing the user’s vulnerability to the loss of private data.
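To see what a 10x power reduction means for runtime, here is a rough calculation with assumed numbers; the 500 mWh battery capacity and 2 mW digitize-first draw are illustrative figures, not specifications for any particular product:

```python
# Illustrative battery-life comparison; all figures are assumptions.
BATTERY_CAPACITY_MWH = 500.0               # assumed small battery
DIGITIZE_FIRST_MW = 2.0                    # assumed always-listening draw
ANALYZE_FIRST_MW = DIGITIZE_FIRST_MW / 10  # up to 10x less power

def runtime_days(power_mw):
    """Hours of runtime at a constant draw, converted to days."""
    return BATTERY_CAPACITY_MWH / power_mw / 24

print(f"digitize-first: {runtime_days(DIGITIZE_FIRST_MW):5.1f} days")
print(f"analyze-first:  {runtime_days(ANALYZE_FIRST_MW):5.1f} days")
```

With these assumed figures, the same battery stretches from roughly ten days to over three months, which is the days-versus-months difference described above.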
Balance convenience with privacy
The trade-off between making our lives easier and keeping our personal information private is a choice that we are asked to make many times per day in a hundred different ways. Bringing more audio processing capability to the mobile device without draining the battery is the first step toward delivering more secure voice-first solutions. But to succeed in this effort, we must shift to a bio-inspired architecture that determines which data are important and require further processing at the earliest point in the signal chain. Once we move to the analyze-first approach, only a small fraction of the tens of zettabytes of data collected by the forthcoming generation of always-on IoT devices will require further processing in the device and in the cloud.
A better balance between cloud and edge processing is a better balance between convenience and privacy, and that’s a win for everyone.
To learn about the analyze-first architectural approach, visit: https://www.aspinity.com/Technology