Voice Technology: As Google Duplex Wows and Scares, a Post-Screen World Emerges with Questions that the Smart Speakers Cannot Answer
By Michael Martinez and Lori Cameron
Such is computing’s future—to each of our voices. For search. Command. And even identity recognition.
Voice assistants hang on every word we say, when prompted. Their genesis has created an entire family named Siri, Alexa, Cortana, Pepper, Watson, and, most recently, Duplex. One wonders when smart speakers will replace what you’re looking at now—this screen.
The promise of voice interfaces was demonstrated again in May when Google rolled out Duplex, an AI-driven voice assistant so lifelike and sophisticated that some found it astounding, while others deemed it unsettling and wondered whether a new era of robocall abuse had just dawned.
Then, in August, Amazon and Microsoft announced that Alexa and Cortana were officially talking to one another, with each assistant able to access the other’s features, a move meant to give both leverage in a competitive market.
Google CEO Sundar Pichai demos Google Duplex, which calls a local business to make an appointment.
This new world of conversational interfaces tantalizes us and yet raises new questions about humanity’s interactions with AI-driven robots. Do they have First Amendment rights? Who’s culpable if they commit a crime?
These quandaries—as well as the foundations of voice tech—have inspired the works of developers and researchers alike. Indeed, voice assistants touch upon almost half of the Top 10 Tech Trends predicted by the IEEE Computer Society for 2018: deep learning; robotics; artificial intelligence; and ethics, laws, and policies for privacy, security, and liability.
Consumer use is rising: 32 percent of consumers already own a smart speaker as of late 2018, and projections call for almost half (48 percent) to own one after the upcoming holidays, according to an Adobe Digital Insights survey of more than 1,000 consumers.
Here’s a summary of exclusive content about how experts delve into, and struggle with, the technology often just called “voice,” drawn from the Computer Society Digital Library and placed in front of the paywall for a limited time.
But the fact is, we should be less fearful of world domination by robot armies and more concerned about how much AI can control our perception of reality, invade our privacy, and make life-changing decisions for us without our explicit consent.
Technological Singularity and AI Doomsday Scenarios – Zeng describes the technological singularity as what might happen when artificial intelligence exceeds human intellectual capacity and control. Tech heavyweights like Bill Gates, Elon Musk, and the late Stephen Hawking have stated that AI could be an existential threat to humans. Other influential thinkers, such as Eric Horvitz, director of Microsoft Research Lab, reject the AI doomsday scenarios, though Horvitz cautions that concerns and risks must be addressed as we grow more reliant on AI.
Impact of Automation on Economy and Employment – This discussion pits the economic benefits for consumers when prices are lowered through automated production against the jobs that are lost when robots replace people. Zeng says that, regardless, “a new group of people who can innovate, design, and develop new products, services, and business models might emerge as winners in the new knowledge economy.”
Legal Ramifications and Accountability – The clearest example used in this discussion involves driverless cars and who is liable in the event of an accident. Zeng offers another example when he says, “In the medical domain, who will be responsible for accidents or errors made by robotic surgical systems or automatic diagnostic systems?”
Privacy Considerations and Human Rights – In addition to invasion of privacy issues when, for example, drones conduct surveillance of unsuspecting individuals, this discussion would have to address autonomous weaponry and “killer robots,” as they were called in multilateral talks at the United Nations back in 2014, and how they can rob us of our rights and even our lives.
Human-AI Relations – As more robots assume the role of caretaker—to children, the elderly, and the disabled—what are the implications when humans develop emotional attachments to robots? Zeng says that the more we interact with robots, the more likely we will form some kind of attachment.
Robot Rights – “Humans have human rights. Animals have animal rights. Should robots have robot rights? Should they be treated as conscious beings? Does the First Amendment protect robots’ speech? Do we need laws to protect robots from being abused?” writes Zeng. The emerging field of roboethics seeks to address these and other questions.
Alexa and friends, can we trust you? The outrageous tales
The stories are mind-boggling and hilarious.
“In January 2017, a 6-year-old Dallas girl sharing her love of dollhouses and cookies with the family’s new Amazon Echo Dot prompted Alexa to order—much to her parents’ surprise—a $160 KidKraft Sparkle Mansion and four pounds of sugar cookies. After reporting the story, the anchor of a San Diego TV morning show remarked, ‘I love the little girl saying “Alexa ordered me a dollhouse.”’ Several Echo owners watching the broadcast reported that, after hearing the anchor’s comment, their own devices also tried to order pricey dollhouses,” said researchers at Korea University and the National Institute of Standards and Technology.
The authors relate another story: “During the Super Bowl, a Google Home ad using the system’s voice-search-activation phrase ‘OK, Google’ reportedly set off many viewers’ own devices. Capitalizing on the incident, in April, Burger King ran an ad for the Whopper in which an actor playing an employee at one of its restaurants says that 15 seconds isn’t enough time to describe the sandwich and instead asks Google, which cites the definition from Wikipedia—prompting viewers’ devices to repeat the question and thus essentially extend the ad. Ironically, after publicly exploiting the system’s vulnerability, the marketing stunt backfired—someone altered the Wikipedia entry for the product to say that it contained cyanide and caused cancer—and became a sobering lesson that a hijacked IVA could cause real harm.”
Because of this, Hyunji Chung, Michaela Iorga, Jeffrey Voas, and Sangjin Lee explore the nature of intelligent virtual assistants (IVAs) by asking some important questions: Are IVAs secure? Are they recording our conversations? If so, where is this voice data stored?
“The presence of IVAs in homes makes this a public-facing challenge, and one that attracts instant—and unwelcome—media attention when problems arise,” they say.
Is it possible for cognitive assistants to become so integrated into our daily routines that we aren’t even aware of their presence anymore?
“Cognitive assistants have the potential to become profound technologies that disappear and weave themselves into our everyday lives. They still have quite a way to go, and we face many challenges in achieving their promise, but they are well on the path along this journey,” says Maria R. Ebling of IBM’s Thomas J. Watson Research Center.
As data becomes the new oil, concerns about eavesdropping and privacy
If the rooms in our homes knew they were empty and adjusted lighting and air conditioning accordingly, the gains in energy conservation and lower energy bills could be huge.
However, the benefits of artificial intelligence and smart spaces rely on how willing we are to be invaded by third parties in exchange for the convenience and efficiency their services offer.
“We’re shifting into a world of pervasive measurement and of socializing with increasingly human-like machine intelligences. The benefits could be enormous if our homes were more sensitive and responsive to activity and environmental conditions,” says Chris Arkenberg, principal analyst at Bespoke Futures.
As with any artificial intelligence service, the better the service, the more personal data must be collected about you.
And as companies position themselves as the most efficient yet secure option, Apple is taking the lead by trying to keep all data collection and processing in our hands rather than its cloud.
“However, this sets up a dynamic in which privacy goes to those who can afford it, while the rest of us get access to the cheap seats in exchange for giving up the minutia of our lives,” says Arkenberg.
An open infrastructure for deep neural networks (DNNs)
Asking an intelligent virtual assistant a question might not get you an intelligent answer.
Answering our questions can be tough on an IVA. First, your question is transcribed to text. Then it is analyzed for semantic meaning and searched against a database, and an answer is sent back to you. Depending on the question, the IVA must be able to handle a variety of tasks, including classifying images, recognizing faces, decoding speech, and analyzing text.
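To make those steps concrete, here is a minimal Python sketch of the transcribe, analyze, search, and answer flow. It is only an illustration: the function names, the stopword list, and the tiny `KNOWLEDGE_BASE` are hypothetical, the "audio" is plain text standing in for real speech recognition, and no vendor's assistant is claimed to work this way.

```python
import re

# Hypothetical toy knowledge base; a real assistant would query large backends.
KNOWLEDGE_BASE = {
    ("weather", "today"): "Sunny with a high of 75.",
    ("capital", "france"): "Paris.",
}

# Hypothetical stopword list used for the crude "semantic" step below.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "in", "like"}


def transcribe(audio: str) -> str:
    """Stand-in for speech recognition: the 'audio' here is already text."""
    return audio.strip().lower()


def extract_keywords(text: str) -> tuple:
    """Crude semantic analysis: keep content words, drop stopwords."""
    words = re.findall(r"[a-z]+", text)
    return tuple(w for w in words if w not in STOPWORDS)


def lookup(keywords: tuple) -> str:
    """Return the first entry whose key terms all appear in the query."""
    for key, answer in KNOWLEDGE_BASE.items():
        if all(term in keywords for term in key):
            return answer
    return "Sorry, I don't know."


def answer_question(audio: str) -> str:
    """Chain the stages: transcribe, analyze, search, answer."""
    return lookup(extract_keywords(transcribe(audio)))


if __name__ == "__main__":
    print(answer_question("What is the capital of France?"))
    print(answer_question("Play some jazz"))
```

Each stage in this toy version is a few lines of string handling; in a production assistant, each one is a hard machine learning problem in its own right, which is the point the researchers quoted below make.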
As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo gain traction, web service companies are hoping to solve these challenges by using large deep neural networks (DNNs) to tackle huge, complex jobs.
“These are challenging machine learning problems that require powerful algorithms to provide a satisfactory experience for users. One such machine learning algorithm, deep neural networks (DNNs), has recently gained popularity in solving this wide range of challenges,” say researchers from the University of Michigan.
The researchers propose DjiNN, an open infrastructure for DNN as a service in warehouse-scale computers, and Tonic Suite, a suite of seven end-to-end applications that span image, speech, and language processing.
Without question, it is much easier for us humans to hear and respond to the emotion in someone’s voice than it is for a robot.
However, if virtual assistants are going to serve us even better, they will have to listen to what we say—and don’t say.
Our words combined with the sound of our voice can reveal enough information to allow even more accurate and perhaps life-saving responses, whether we find ourselves frightened, stressed, or hurt.
“Besides the text information of users’ queries, the acoustic information and query attributes are very important in inferring emotions in voice dialogue applications,” say researchers from Tsinghua University and Sichuan University.
The researchers propose a Hybrid Emotion Inference Model (HEIM), which uses Latent Dirichlet Allocation (LDA) to extract text features and a Long Short-Term Memory (LSTM) network to model the acoustic features.
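As a rough illustration of the hybrid idea, the Python sketch below runs a standard LSTM cell over a short sequence of acoustic frames and joins the resulting vector with a text topic vector. It is not the researchers' implementation: the weights are random and untrained, the topic vector is a hand-written placeholder for what LDA would produce, and all names and sizes are assumptions chosen for the example.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size: int, hidden_size: int, seed: int = 0):
    """Random, untrained weights for the four LSTM gates (illustrative only)."""
    rng = random.Random(seed)
    width = input_size + hidden_size
    W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
             for _ in range(hidden_size)] for g in "ifog"}
    b = {g: [0.0] * hidden_size for g in "ifog"}
    return W, b


def lstm_step(x, h, c, W, b):
    """One standard LSTM step over input frame x and previous state (h, c)."""
    z = list(x) + list(h)  # concatenate input and previous hidden state

    def gate(name, act):
        return [act(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(W[name], b[name])]

    i = gate("i", sigmoid)     # input gate
    f = gate("f", sigmoid)     # forget gate
    o = gate("o", sigmoid)     # output gate
    g = gate("g", math.tanh)   # candidate cell state
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def encode_acoustics(frames, W, b, hidden_size: int):
    """Run the LSTM over all frames and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for frame in frames:
        h, c = lstm_step(frame, h, c, W, b)
    return h


def fuse(topic_vector, acoustic_vector):
    """Join text and acoustic features into one vector for a classifier."""
    return list(topic_vector) + list(acoustic_vector)


if __name__ == "__main__":
    # Placeholder inputs: three acoustic frames with two features each,
    # and a three-topic distribution standing in for LDA output.
    frames = [[0.2, 0.1], [0.5, 0.4], [0.9, 0.7]]
    topics = [0.7, 0.2, 0.1]
    W, b = init_lstm(input_size=2, hidden_size=4)
    features = fuse(topics, encode_acoustics(frames, W, b, hidden_size=4))
    print(len(features))
```

In a trained system, the fused vector would feed an emotion classifier; here it only shows how the two kinds of evidence, what was said and how it sounded, can end up in a single representation.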
“While they’re a tool for accessing and integrating digital devices and functionality within the home, smart speakers highlight how services are becoming increasingly reliant on always-on, high-speed Internet connections, and storage and processing in datacenters far outside the home,” write A. J. Brush of Microsoft Research, Mike Hazas of Lancaster University, and Jeannie Albrecht of Williams College.
They are skeptical about whether smart speakers like Amazon’s Echo or Google Home will become permanent fixtures in our homes.
“It remains to be seen whether these are just nifty voice-activated gadgets that replicate what smartphones already do—one of which is serving as if-this-then-that ‘glue’ between sensors and appliances—or whether they’ll serve a lasting, meaningful role working with inhabitants of the home,” they write.
Michael Martinez, the editor of the Computer Society’s Computer.Org website and its social media, has covered technology as well as global events while on the staff at CNN, Tribune Co. (based at the Los Angeles Times), and the Washington Post. He welcomes email feedback, and you can also follow him on LinkedIn.
About Lori Cameron
Lori Cameron is a Senior Writer for the IEEE Computer Society and currently writes regular features for Computer magazine, Computing Edge, and the Computing Now and Magazine Roundup websites. Contact her at firstname.lastname@example.org. Follow her on LinkedIn.