Communication, Language and Behavior
It could be argued that all technologies require the user to adapt to them in a certain way. We need to learn how to operate a smartphone with touch gestures; that is not necessarily a natural, given thing. In the case of voice assistants, we sometimes have to go to extreme lengths to accommodate the technology.
With a voice assistant, we are given another interface to the cloud and its services, with the promise of an interaction that is natural for humans. After all, what is more human than our spoken language? In reality, communication with the device, or the assistant, is rather difficult. It starts with the limitations on the language one can use and ends with erratic and haunted response patterns across several devices in the home.
The way we have to communicate with the assistants was one of the problems my participants rated highest.
The device is always listening. It waits for a keyword, after which it starts recording and sends that piece of audio to the servers to be taken apart, “understood”, and turned into an adequate response. This process introduces an element of delay. I observed more than once that users are highly irritated by and distrustful of the device's hearing. Did the device not hear me? Did the assistant not understand me? What follows are awkward moments of waiting, moving closer to the device, or trying to issue a command again just as the assistant starts speaking as well.
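The listen, record, and respond loop described above can be sketched roughly as follows. The sketch is illustrative only: the function names are my own, and real assistants detect the wake word with an on-device acoustic model rather than by matching transcribed text.

```python
from typing import Optional

# Hypothetical wake word; real devices support a small fixed set of them.
WAKE_WORD = "alexa"

def process_audio(transcript: str) -> Optional[str]:
    """Ignore everything until the wake word appears, then treat the rest
    of the utterance as a command and hand it off to the cloud."""
    words = transcript.lower().split()
    if WAKE_WORD not in words:
        return None  # device stays silent: nothing was "heard"
    # Everything after the wake word is recorded as the command.
    command = " ".join(words[words.index(WAKE_WORD) + 1:])
    return send_to_cloud(command)

def send_to_cloud(command: str) -> str:
    # Stand-in for the round trip to the server. This round trip is where
    # the delay users perceive is introduced.
    if "stop the music" in command:
        return "Stopping the music."
    return "Sorry, I didn't understand that."
```

If the wake word is missed, the whole utterance is discarded, which is one plausible account of the "did it not hear me?" moments described above.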
“Exactly, and now when I say ‘Alexa, stop the music’…” The participant hurls the command towards the other room, where Alexa resides, waits, and then repeats louder: “Alexa! Stop the music!”
In the context of my research, I interviewed and observed Swiss-German speakers, and for them this is a big issue. The assistant doesn't understand dialects, and especially not the Swiss-German ones. Users have to fall back on standard German or another language, like French or English. And even if you speak a language the assistant can understand, plenty of misunderstandings still happen: for example, it sets an alarm for 01:00 at night instead of a one-hour timer.
“Ok Google, play LA Salami.” The answer from the device is unintelligible from where I sit in the other room. It seems to make the participant uncomfortable; she laughs it off. The device appears to have misunderstood “Salami”. “Ok Google, play music by LA Salami.” This time the device reacts correctly, but oddly pronounces “LA” in German, not knowing it is part of a name.
More than once, the assistant falls into the uncanny valley when it works with data from other services; the announcement of the Spotify Christmas playlist can be hilarious. In those moments, the users become acutely aware of the machine behind it all. Their communication turns artificial and code-like, so as not to upset the assistant.
Simple instructions are the key to a successful relationship with this device. After a relatively short honeymoon phase, in which the user is enthusiastic about or enchanted by this magical little thing, asking for facts or offloading some simple mental tasks is all that is left. The assistant is perceived as single-minded, as it cannot learn through the interaction. The user was promised more, but the assistant just cannot live up to those expectations. Unlike smartphones and computers, where you can at least change the software, the voice assistant device is static.
Stuff like that, so it’s like, she doesn’t learn anything. You can say “shutters up” 100 times and the next time she says “I don’t know any device called shutters”.
The absence of any real physicality does not help in creating bonds between user and machine. After the unboxing, which my participants generally loved, the device is placed strategically within the apartment. It is not omnipresent and cannot listen into every single corner of the apartment; one alternative is to buy additional devices to serve a multi-room home. The device allows for simple touch gestures and includes three buttons and a switch. Curiously enough, the designers were mindful enough to add a switch to turn off the microphone, to address privacy concerns. My participants were suspicious: why would you want a switch for that if you have nothing to hide? Two of the buttons control the volume; the third generally stops the device from talking.
I had only two cases where physicality was mentioned specifically. One participant accidentally dropped the device, for which she was very, very sorry. Another participant, in anger, kicks the device now and then…