If you’ve ever dialed an 800 number to ask or complain about something you bought or to make an inquiry about something you’re thinking of buying, there is a decent chance you were profiled, by the arrangement of your words and the tone of your voice, without knowing it. My research suggests many customer contact centers now approach and manage callers based on what they think the person’s voice or syntax reveals about the individual’s emotions, sentiments, and personality, often in real time.
Businesses devoted to personalized selling, including some name-brand favorites, are also preparing to link what your vocal cords supposedly reveal about your emotional state to more traditional demographic, psychographic, and behavioral information.
If during a call with a customer agent this biometric technology tags you as “tense,” you may be offered a discount on your purchase, especially if the company’s records also indicate that you’re a big spender. Being identified as a certain type can also get you routed to a customer service representative whom the company believes works best with your presumed personality: maybe “logical and responsible” or “creative and playful,” two such categories.
Company executives claim they are fulfilling their responsibility to make callers aware of these voice analyses by introducing the customer service interactions with an ambiguous sentence such as, “This call may be recorded for training and quality control purposes.” But this legal turn of phrase is evidence of a growing threat that could turn our very voices into insidious tools for corporate profit.
It’s not just call centers. Devices such as smart speakers and smartphones are now capturing both our words and the timbre of our voices.
Rohit Prasad, Amazon’s chief Alexa scientist, told the online technology publication OneZero that “when she recognizes you’re frustrated with her, Alexa can now try to adjust, just like you or I would do.”
Soon companies may also draw conclusions about your weight, height, age, ethnicity, and more — all characteristics that some scientists believe are revealed by the human voice.
Amazon and Google, the highest-profile forces in voice surveillance today, are not yet using the maximum potential of these tools, seemingly because they are worried about inflaming social fears. The technology is based on the idea that voice is biometric: a part of the body that can be used to identify and evaluate us instantly and permanently. Businesses using this voice technology to offer us better pricing sounds great, unless you’re in the camp that loses the discount. What if you end up being refused insurance or having to pay much more for it? What if you find yourself turned away during early job screenings or have your cultural tastes prejudged as you surf the internet?
On Jan. 12, Spotify received an extraordinary patent that claims the ability to pinpoint the emotional state, gender, age, accent, and “numerous other characterizations” of an individual, with the aim of recommending music based on its analysis of those factors. In May, a coalition of over 180 musicians, human rights organizations, and concerned individuals sent Spotify a letter demanding that it never use or monetize the patent. Spotify claims it has “no plans” to do so, but the coalition wants a stronger disavowal.
I signed that letter but am also acutely aware that Spotify’s patent is just a tiny outcropping in the emerging voice intelligence industry. One of Google’s patents claims it can analyze the patterns of household movement via special microphones placed throughout the home and identify which resident is in which room.
Based on voice signatures, patented Google circuitry infers gender and age. A parent can program the system to turn electronic devices on or off as a way to control children’s activities. Amazon already claims that its Halo wristband is able to identify your emotional state during your conversations with others. (The company assures device owners that it cannot use that information.) Many hotels have added Amazon or Google devices in their rooms. Construction firms are building Amazon’s Alexa and Google’s Assistant into the walls of new homes.
Major advertisers and ad agencies are already preparing for a not-too-distant future when extracting competitive value from older forms of audience data (demographics, psychographics, internet behavior) will, as one business executive told me, “start to plateau.” They too will turn to voice profiling “to create value.”
Ad executives I’ve interviewed also expressed annoyance that Amazon and Google do not allow them to analyze the words or voices of people who speak to the companies’ apps in Echo and Nest smart speakers. Some advertisers, without hard proof, worry that Amazon and Google are appropriating the voiceprints for their own use. Those concerns have led advertisers to start exploring their own ways to exploit customers’ voice signatures.
All these players recognize that we could be entering a voice-first era, where people will speak their instructions and thoughts to their digital companions rather than type them.
Because of recent major advances in natural language processing and machine learning, individuals will soon be able to speak conversationally not just to their phone assistant or smart speaker but to their dedicated bank assistant, kitchen equipment, restaurant menu, hotel room console, homework assignment, or car.
In a way, much of this sounds incredibly cool, like we may finally be reaching the age of the Jetsons. These head-turning developments sound all the more exciting when some physicians and health care firms argue that a person’s sounds may betray diseases such as Alzheimer’s and Parkinson’s. But these technologies are also worrisome because we step onto a slippery slope whenever we start allowing the sounds of our voice and the syntax of our words to personalize ads and offers based on profit motives.
VICE reported that Cerence’s chief technology officer told investors, “What we’re looking at is sharing this data back with” automakers, then “helping them monetize it.”
It could all seem like a small price to pay until you project out the use of this tech into the near future. An apparel store clerk uses an analysis of your voice to determine the likelihood of whether you can be sold certain clothing. You call a fancy restaurant for a reservation, but its voice analysis system concludes that you don’t meet its definition of an acceptable diner and are refused. A school denies a student enrollment in a special course after voice analysis determines that the student was insincere about their interest in it.
How would such a future materialize? It all starts with users giving companies permission.
Existing laws don’t go far enough to stop voice profiling. Companies will gain customers’ approval by promoting the seductive value of voice-first technologies and exploiting people’s habit-forming tendencies, and by stopping short of explaining how voice analysis will actually work.
Many people don’t tend to think of nice-sounding humanoids as threatening or discriminatory, but they can be both. We’re in a new world of biometrics, and we need to be aware of the dangers it can bring — even to the point of outlawing its use in marketing.
Joseph Turow is a professor of media systems and industries at the University of Pennsylvania. He is the author of “The Voice Catchers: How Marketers Listen in to Exploit Your Feelings, Your Privacy, and Your Wallet.”
The Times is committed to publishing a diversity of letters to the editor. We’d like to hear what you think about this or any of our articles.