Isn't there an inaudible tone that you can use to disable the assistant? I recall reading somewhere that Amazon used it in their commercials for Alexa so that everyone's Echos weren't lighting up during the commercials. I know when a commercial for the Echo comes on and the voices repeat "Alexa, do X" the Echo I have near the TV speaker doesn't light up.
TV channels already contain inaudible-sound identifiers. Nielsen has panelists put listening devices in their homes and uses the identifiers to track which channels are being played.
You can't count on a TV being able to reproduce sounds below about 100Hz or above about 16kHz. The position of the speakers on the back of the TV means you're likely to get a lot of weird phase effects and many TVs have quite heavy audio DSP to compensate for the inadequacy of their speakers. Any hidden signals will need to be in-band and low bit rate with a high level of redundancy.
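To make those constraints concrete, here's a toy watermark: two FSK tones well inside the 100 Hz - 16 kHz band, a very low bit rate, and heavy per-bit redundancy from long tone bursts. This is purely an illustrative sketch, nothing like any real broadcast watermark:

```python
import numpy as np

# Toy in-band, low-bit-rate, redundant signal (illustrative only).
fs = 16000
bit_len = fs // 10              # 100 ms of samples per bit -> 10 bits/s
f0, f1 = 1500, 2500             # tone for "0" and tone for "1", both in-band

def encode(bits):
    t = np.arange(bit_len) / fs
    return np.concatenate(
        [np.sin(2 * np.pi * (f1 if b else f0) * t) for b in bits])

def decode(sig):
    bits = []
    for i in range(0, len(sig), bit_len):
        spec = np.abs(np.fft.rfft(sig[i:i + bit_len]))
        # 100 ms chunks give 10 Hz bin spacing, so tone f sits in bin f//10
        bits.append(int(spec[f1 // 10] > spec[f0 // 10]))
    return bits

payload = [1, 0, 1, 1, 0]
noisy = encode(payload) + 0.5 * np.random.default_rng(0).normal(size=5 * bit_len)
print(decode(noisy) == payload)  # the redundancy rides out heavy noise
```

Even with noise at half the tone amplitude per sample, 1600 samples per bit concentrate the tone energy into one FFT bin, which is the whole point of the redundancy.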
Even less can you expect all the compression in the pipeline, which is designed to throw away everything people can't hear, to preserve a signal people can't hear.
To scare you further, multiple brands and models of “SmartTVs” have the ability to fingerprint what is displayed on the screen and report back to a cloud service. Said services are also, not surprisingly, poorly secured.
> and uses the identifiers to track which channels are getting played.
I don't think they're using infrasound for that. I think they use a technology similar to Shazam where it just analyzes the sound to determine what's on.
They are, it's one of the reasons they [Nielsen] bought Arbitron back in 2013. The PPM devices use low-frequency tones encoded into the audio stream at the time of broadcast.
Sometimes they're fairly audible; try listening closely to re-runs of Arrested Development as an example. There are scenes with fairly loud high-pitched noises which I believe were listened for by some phone apps to try to estimate viewership.
Interestingly, though, if psychoacoustic compression was used when reuploading a video, that 3000-6000Hz range might actually be restored—it’s less information if it’s there than if it’s not.
Similarly, if the video is heard through a phone line, the call might be using a symbolic voice codec, which would also restore the range (in the sense that it’s not even storing sound, just phonemes.)
The interesting thing is that Google Assistant has the same problem; it's right there in the subtitle. It's interesting that it was omitted from the main title.
Rather than journalistic oversight, I think this confirms what people have commented many times: the fact that GA does not have a personalized name makes it hard to refer to. So much so that a very distant third product is named in the title rather than GA.
Does it really? As far as I was aware, it is still possible to perform a black box attack without knowing the weights of a network. Using specially crafted input, it's even possible to "steal" weights from a network!
"We evaluate these attacks under two different threat models. In the black-box model, an attacker uses the speech recognition system as an opaque oracle. We show that the adversary can produce difficult to understand commands that are effective against existing systems in the black-box model.
Under the white-box model, the attacker has full knowledge of the internals of the speech recognition system and uses it to create attack commands that we demonstrate through user testing are not understandable by humans."
That's less clickbaity than "This Hidden Command" and "You can't." that are specifically designed to hack your psychology and make you feel uncomfortable and threatened by not knowing what they are referring to.
I can't recommend highly enough that everyone at least turn on the audible sound when their assistants are listening. At a minimum you should know the kinds of things that end up triggering the device. There's a really wide area of detection and it's interesting to see where that is.
I would also recommend changing your default wake word at a minimum. Then again, I might also recommend ditching the device entirely, but I happen to have one in my kitchen that I like OK sometimes.
With the Echo/Alexa, I use timers, the clock, morning alarms, Spotify control, and calls to my mother and grandparents.
The last is the most important for me; I live hundreds of miles from both, and the key is that the Echo is a super easy device for them to use.
I can send a chatty audio message telling them I'm looking to call, or asking a question, which is asynchronous in nature, and when they are ready they can call or message me.
It's conference calling by default, which is great for family stuff; they wouldn't meet my girlfriend otherwise.
I got them the echo show (I have the normal echo) and it is fabulous. Video calling with no effort. I use my phone to do video on my side.
For this, the echo is worth every damn penny and then some.
I've started using the device when I lose my FireTV remote. It's a good fallback so the kids don't accidentally start another episode of whatever they're watching and melt down haha.
Come now, on a tech and hacking website of all places we should all be better informed than this. There is a voice activation only chip tied directly into the lights that only listens for commands. Once the activation is heard it sends it off to the servers to get processed etc.
With everyone scrutinizing the web traffic around these devices, trying to prove NSA/Google spying, we'd definitely have found something by now.
So it's probably just yours that is sending off to the NSA. I'd send it back to them for repairs, see if they don't give you a free home mini in compensation!
I can't find the article, but there was a post where someone turned on the audible alert for when it was listening, and it turned out to be triggering far more often than expected.
They also said they found a lot of recordings in their Google History that really shouldn't have been there.
Are you talking about the Google Home Minis? Because that was a hardware defect with its capacitive sensors that caused them to completely remove the feature:
The real problem is that Alexa and Siri increase your attack surface area to include every speaker in your home--including any cheapo Bluetooth or internet-connected speakers that could get hacked to produce these human-inaudible sounds.
(To be fair, I recall this specific point coming up in an HN thread about Alexa being able to open your door for Amazon deliveries, but I thought it was worth reiterating here.)
Hidden audio is simply too easy. Hidden audio is the knife that kills desire for any financial services access through a voice assistant - for those smart enough to not follow the horde.
For me it kills all desire to have a voice assistant. I'm already in the camp of taping over the cameras on my computers; now will I need to worry about the microphone, or about what comes out of the speakers?
So the question is: shouldn't they be able to detect the frequency content of what they're processing and weed out some of the more obvious tricks? And with voice recognition, couldn't it also be limited to a voice it has been trained to recognize?
I don't bother taping over stuff. If you think about it, there are probably 10+ microphones in your room (Samsung TVs, phones, laptops, tablets, etc.)
I run 3rd party roms, Linux on all my dev/tv machines, disable Cortana on my gaming laptop, and hope there isn't something listening in all that trusted, untrusted and oss code I'm running.
I told my roommate I'd move out if he ever got an Alexa or Google home device. I do want to run Jarvis, or one of the OSS alternatives. Many of them send your data to Google/Amazon as well if you enable using their Speech-to-Text services, but they also have options for using local OSS decoders as well (and typically enable those by default).
Our phones are so powerful today there is no reason to send your speech to the cloud (someone else's computer). It should just be done locally; and tech should be improved so accuracy is improved locally without needing the larger datasets that Google/Amazon/Apple use.
More devs need to use the OSS assistants instead, and maybe that will push other engineers to not go the easy route and opt to protect their privacy instead.
Without commenting on anything else you said, sending voice to the cloud isn’t necessarily for processing power reasons as much as it is for access to a dynamically tuned ML model that is constantly changing and improving based off of the samples it receives on a daily basis. In theory, anyway.
Can't that model be pushed out to each local device on a regular basis, and sending back the dynamic enhancements of the local copy (from your own usage) could be opt-in? The master wouldn't grow nearly as quickly, but it could be a decent compromise. Or maybe it would grow almost as quickly, if its owner also had a fully hosted option that enough people used.
So maybe not on each of your devices, but on your home server. Something fewer people even have these days, but the ones who want this level of privacy might go for it.
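The opt-in scheme suggested above resembles federated averaging. Here's a minimal sketch of the idea: every device gets the current master model, trains locally, and only opted-in devices send their weight deltas back. Everything here (names, shapes, the stand-in "training" step) is illustrative, not any vendor's actual system:

```python
import numpy as np

# Sketch of opt-in federated averaging (all details illustrative).
rng = np.random.default_rng(1)
master = np.zeros(4)                    # current shared model weights

def local_update(opted_in):
    # stand-in for local training on the device's own audio
    delta = rng.normal(scale=0.1, size=master.shape)
    return delta if opted_in else None  # non-opt-in devices share nothing

updates = [local_update(opted_in) for opted_in in (True, False, True)]
shared = [d for d in updates if d is not None]

if shared:                              # master only moves if someone opted in
    master = master + np.mean(shared, axis=0)

print(len(shared), "of", len(updates), "devices contributed")
```

The trade-off is exactly as described: the master model grows only from the opted-in fraction, but nobody's raw audio ever leaves the device.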
There is a reason I backed mycroft.ai on kickstarter. I also know of snips.ai doing a similar thing. (I picked one at random) I want OSS to succeed and I'm willing to put something into it.
The way I look at it, if someone has hijacked my MacBook Pro's camera to the degree that they can turn off the light, my entire computer has been rooted and I have far bigger problems than someone seeing me in a towel.
As for IoT cameras, though, it's the opposite. I assume all of them have already been pwned, so I never buy them in the first place.
As of now, I don’t believe any OS’s AV framework allows for multiple video sinks with the default stack and drivers; ie if you are able to use the camera in an app, it can be taken as a sign that no other app is using it at the same time. Which can be a source of some consolation/reassurance.
The article notes that both Amazon and Google speaker-based assistants do this for sensitive operations (Google Assistant on Android does it for everything.) A hidden command that can't mimic your voice could, say, play media with Google Home, but wouldn't have full access.
If someone (or something) is watching me through my laptop camera, then they are going to get a very boring show that cannot possibly be an efficient use of their available time and risk.
iOS now has a “Type to Siri” feature that can disable the spoken interface but retain the “smart” capabilities of the digital assistant.
Not that digital assistants are worth the risk they bring. Until now I can’t get Siri to do anything useful that isn’t very artificially and carefully phrased.
Why do people keep repeating this tired and offensive myth? I bought my iphone for the hardware and software capabilities that I judged to be the best for my use cases. And I am not the only one that actually had a non-trite reason, I am sure.
Why do you think the iPhone has become a status symbol? I would guess it's partially because Android has had so many issues with fake apps and malware in the Google Play Store that Android has become somewhat laughable. Also, other than the antenna, every iPhone is essentially the same whether you buy it from AT&T or Verizon or the Apple Store. There is no shitty Samsung or LG skin on the UI and no bloatware (other than maybe a single preinstalled carrier app, such as the AT&T app, which you can easily uninstall in a matter of seconds). It integrates with other Apple products without needing much, if any, configuration.
Also, tons of people choose certain companies to buy from because of their status and reputation. That's how most industries work. Some people only buy American made cars, or only Chevy or only Honda.
There's nothing wrong with brand loyalty, especially when the brand is consistently delivering quality products to its customers. The iPhone isn't an arbitrary status symbol; Apple put years and years of effort into building up the reputation they have.
There are? What's the evidence for that, other than a subjective impression of the iPhone's inferiority and a lack of imagination as to other people's motivations?
As for what the iPhone does better: privacy, OS updates, integration with my Mac, App Store apps, and a bunch of other things. But all that is completely beside the point. Even if there were nothing at all iPhones do better, it's a bit absurd to go from 'well, Apple is not better than Android' to 'people therefore only buy Apple because they are shallow'.
I just saw someone on HN saying how amazing Google Photos was because of its learning in the cloud.
Having not used Photos on macOS in years beyond ensuring it actually imported my photos, I opened it up and was surprised at the level of analysis it had. It made an album for each city I visited in Mexico. "Puerto Vallarta 2017" and such. Even had a "Furry Friends in Mexico" album that was all the furred beasts I met along the way.
Really well done, and all done locally on my computer.
This is the sort of thing I have no problem voting for with my dollars.
Accessibility. I can pick up any Apple device created in the last 9 years and be guaranteed that as a totally blind person I can use it. Phone, iPad, Watch, whatever. Just works. This is very very far from being the case in Android land.
Their face unlock is better than anything I saw on my Pixel or my Windows Hello laptop, to the point of being usable and preferable to fingerprints rather than a pointless extra.
I switched back because of that + Google Assistant's unwillingness to work without Google tracking my location history constantly (assuming that the "off" switch there even actually stops them).
Standby battery life. I switched because I got tired of pulling out my Android phone and seeing it had lost an appreciable amount of battery in the last 45 minutes just sitting in my pocket. When I'm not using my iPhone, the battery drain is minimal.
>No widgets
Widgets suck and slow down Android. Apple puts them a swipe away so they don't refresh constantly.
>constant reminders to sign-in
Never had this problem
>constant reminders to update
Constant? You mean, they tell you when they have an update.
>no double tap/settings seem harder to find
Because it's force touch.
>Finger print scanner sucks so bad.
Laughable, as Apple makes the best one on the market; you should just move your finger around more when enrolling, or enroll the same finger twice.
>Little annoyances like the animations to change screen take 0.5 seconds too long.
This is 100% why I hate Android, weird lag in every device.
>I'm not sure what I'm supposed to be enjoying on my iphone
Maybe that Apple isn't selling your privacy for money.
This is demonstrably false. iPhone's second-gen Touch ID sensor is one of, if not the best in the business. It's fast and ridiculously accurate, and is the reason people were disappointed in Face ID when it was first released.
I'm sure there are some people who buy an iPhone for status (gotta have those blue iMessage bubbles), but I don't think it's any kind of majority. I've had an iPhone for the last 5 years or so because I've found there are better-quality apps in the App Store, and in the past their camera was way better than any other mainstream smartphone's (not so much today, though; the camera on the Pixel 2 looks way better IMO).
Also, Siri is a joke. If your priority in a smartphone is to make use of its virtual assistant, then an iPhone is not for you.
Just out of curiosity, what's the frequency range that a typical mic can pick up the signal from? The article didn't specifically mention the range; it just said "inaudible."
Every model will be different, but the important thing is that the boundaries represent the frequencies within which the signal will stay above a particular threshold of amplitude. A good spec sheet will tell you that threshold, and I've seen things like -3, -6, or -10 dB. It will still pass audio outside of the range but at an undisclosed attenuation.
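For reference, those spec-sheet thresholds translate to amplitude ratios via 20·log10 (for signal levels, not power). A quick conversion:

```python
# Converting the -3 / -6 / -10 dB thresholds mentioned above into
# amplitude ratios, using dB = 20 * log10(ratio) for signal levels:
for db in (-3, -6, -10):
    print(f"{db} dB -> amplitude x{10 ** (db / 20):.3f}")
# -3 dB is roughly 0.708x, -6 dB is roughly half amplitude,
# and -10 dB is roughly 0.316x
```

So a mic spec'd to -10 dB at its band edges still passes nearly a third of the amplitude out there, which matters for these attacks.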
Yes, typical mics tend to pick up the typical human frequency range, though cheaper mics may have some really poor characteristics at the edges. Usually in the speech range they'll be pretty solid.
However, there's a lot of play within that space. Microphones record the sound waves fairly directly, but what we hear is quite distorted compared to the "real" sound by the nature of our ears. One of the big differences is that if there is a very loud 4000Hz sound, we can't hear a soft 4005Hz sound near it very well, but the microphone "hears" it just fine.

So, for instance, you could play a loud sound for the user but embed a very quiet command at frequencies the human can't pick out, and if the listening model doesn't account for that (and there are reasons it wouldn't necessarily want to, because it wants to hear commands even in the presence of significant background noise), you could get commands into the system. See https://en.wikipedia.org/wiki/Psychoacoustics for discussion of how our ears fail to pick up the "real" audio signal, and how much we've exploited that in music compression.
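A quick numeric sketch of that masking example: a loud 4000 Hz tone next to a 4005 Hz tone 40 dB quieter. A microphone/FFT resolves both cleanly, even though a human ear largely masks the quiet one. The amplitudes and frequencies are just illustrative:

```python
import numpy as np

# Loud masker tone plus a much quieter neighbor (illustrative values).
fs = 48000
t = np.arange(fs) / fs                         # exactly 1 second

masker = 1.00 * np.sin(2 * np.pi * 4000 * t)   # loud tone
hidden = 0.01 * np.sin(2 * np.pi * 4005 * t)   # "payload" tone, 40 dB down
mix = masker + hidden

# With a 1 s window the FFT bins are 1 Hz apart, so each tone gets its
# own bin; normalize magnitudes back to amplitudes.
spectrum = np.abs(np.fft.rfft(mix)) / (fs / 2)
print(spectrum[4000], spectrum[4005])          # both tones clearly measurable
```

That's the asymmetry the comment describes: to the FFT (and to a recognition model fed by it) the quiet tone is perfectly visible; to a human it mostly isn't.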
Now, that was a very brute force example. It sounds to me like what this article is talking about are called "adversarial examples" (https://blog.acolyer.org/2017/02/28/when-dnns-go-wrong-adver... ). Voice recognition doesn't listen the same way we do, it doesn't necessarily take a holistic view of the signal, but is looking for specific frequency patterns and changes and turning that into phonemes, into words, etc. (There's a lot of ways of doing this and I don't specifically know what Alexa and Siri are doing, so that's a really vague overview.) If you know what they are looking for, you can use filters to very, very selectively remove the patterns from a bit of music or something that Alexa might trigger on, and then insert just the bare minimum skeleton of the sounds that it is really recognizing. A human won't be able to hear the difference (most likely; depends on how badly the original is mangled but even if it is audible it is almost certainly not audible without an A/B test and very good ears), but the probably-neural-nets monitoring for sounds will end up superstimulated and interpret the adversarial example as words.
While adversarial examples work best when tuned to the target network, widely shared networks like Alexa or Siri mean such tuning is practical in a way that attacking some custom-trained model used by one person isn't. And experiments have shown that adversarial examples transfer between separately trained nets, and even to non-neural-net models, to a much greater degree than my own intuition would have suggested beforehand. (See the previous link and look for the discussion of "Practical black-box attacks against deep learning systems using adversarial examples". It is extremely counter-intuitive to me how easy this is.)
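The core mechanism is easier to see with a toy model. Below is a minimal FGSM-style sketch against a linear "detector" standing in for the real speech network; the model, features, and numbers are all illustrative assumptions, not anyone's actual system:

```python
import numpy as np

# Fast-gradient-sign-style adversarial nudge against a toy linear model.
rng = np.random.default_rng(0)

w = rng.normal(size=1000)            # stand-in model weights
x = 0.1 * rng.normal(size=1000)      # stand-in "benign audio" features

def score(x):
    return float(x @ w)              # positive -> "command detected"

# For a linear model, the gradient of the score w.r.t. x is just w,
# so nudge every sample by +/- eps in the gradient's sign direction.
eps = 0.05
x_adv = x + eps * np.sign(w)

# Each sample moves by at most eps (imperceptible per-sample), but the
# score moves by eps * sum(|w|), which is huge in aggregate.
print(score(x_adv) - score(x))
```

Real attacks differentiate through an actual recognition network rather than a linear stand-in, but the shape of the trick (many tiny, coordinated perturbations) is the same.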
Hmm... the big idea MP3 figured out was that you can catalog all these "if there is a very loud 4000Hz sound, we can't hear a soft 4005Hz sound near it very well" psychoacoustic phenomena and just throw away all that "can't hear it very well" information, resulting in a vastly smaller file size that still sounds reasonable (yeah yeah, it's not FLAC and the purists need their gold-plated Monster cables; let's not go there, that's not the point).
So this attack is kinda a "reverse MP3" that adds those lossy bits back in, but shaped into an attack payload. Or at least it adds enough pieces of the attack payload that the neural net's pattern recognition triggers, while the humans say "Doesn't sound like anything to me".
Is that a close-enough explain-like-im-a-freshman?
I primarily brought up psychoacoustics as an example of the way we don't hear the way microphones do. While you could abuse them, it would be more obvious. In this case what we're getting is the audio equivalent of adversarial examples; see the link I gave for some visual examples. What's interesting there is that they are basically invisible to us, but surprisingly robust.
(As another sort of philosophical sidebar, this either proves, or provides very strong evidence, that whatever it is our brains are doing, it is not what deep learning nets are doing, nor anything else vulnerable to such trivial adversarial examples. I've seen adversarial examples against another technique that do seem to work against humans as well, but it requires such a distortion to the image that "I can't tell if that's a dog or a toaster" actually makes sense; it's not just some sort of attack against human vision or something, it's a fancy morphed thing halfway between the two that would probably confuse anything and anybody.)
Hey Berkeley researchers, if you're reading this and want to make a demo that will really freak people out, embed an alexa activation command into this clip: https://youtu.be/iyXtGo418TY?t=1m11s
Well, it can read your schedule, tell you your location, and then there is the HomeKit stuff. It could potentially disable certain security features you might have installed at your house, or possibly perform a very expensive modification to your HVAC configuration; in an extreme case that could maybe be fatal (disable heating in the winter at an older person's home or something like that). It can also read your messages, which are used for MFA in some situations. My wife and I have our accounts hooked together and I can ask Siri where she is and it uses Find My Friends; it can also kick off Find My iPhone, which shows my wife's presumed location on a map.
I think it can do Apple Pay actions too.
Degrees of dangerous. I don't have a homepod but presumably it couldn't do anything with Apple pay or your messages. Having Alexa or Siri control home automation stuff seems like something you might want to think about a little, leaving the lights on all day and burning some energy is a very different thing than re-configuring your HVAC or a security camera.
At least on your phone, almost everything it can do that'd be 'dangerous' requires your device to be unlocked or has a confirmation button (or both). Examples would be unlocking a door, opening your garage, sending an email/text message, sending Apple Pay Cash, etc.
Send emails is one example. I don't use Siri, but apparently it can send emails including to multiple recipients. The "danger" is only limited by your imagination in the scenario where a malicious stranger has access to your email client.
> Send an email to my mom that says, "I have an emergency and I need $2000. Here's the account number to send it to: 12345. Mom, please don't ask questions. This is urgent. Send the money now."
In practice this will not work - assuming you only have a single email address for 'mom', it then will prompt you to unlock your phone, then show a confirmation screen with a send button on it.
There are way too many interaction steps required by the device owner to make this specific one a feasible attack.
I have no idea, but a "smarter" email would be "mum can I borrow $100, pay you back ASAP, just a bit short today sorry and thanks!"... Mum would be less inclined to phone you in a panic.
If you have previously identified one of your contacts as your mother, yes. If not, Siri will ask who your mother is and if Siri should remember that piece of information.
I hadn't thought of this, it's quite concerning. I don't see how they can safeguard against this without reducing the effectiveness of the voice recognition.
A secret command to "paste clipboard into new email, send to [address]" is a shiny new attack vector without any apparent straightforward way to plug the security hole.
Sure, but your phone is sometimes already unlocked because you used it 30 seconds ago and it now sits on the table. Or it's playing music, or your kid has it etc. I don't think I was thinking about a phone anyway, more the dedicated devices that sit there listening all the time.
So lock your phone every time you set it down. Never leave it unlocked.
I used to have my iPhone lock 5min after I pressed the sleep button. Now that TouchID makes it very easy to unlock, I have it locking immediately.
When I let my friend's 4yo use my iPad, I triple tap the home button and press "Guided Access", which can prevent the user from accessing other apps until I disable it. (I do this because I'm worried about what he may accidentally search on the web, not because I'm worried he'll steal my data!)
Siri is supposed to be tailored to your own voice and not accept commands from anyone else. Sounds like they need to improve that voice fingerprinting. (Or is this different on the HomePod, since it's supposed to be used by multiple people?)
The only thing I can think of is to either not be heard by others nearby or to mess with people who have these devices. 1 can be done by just typing to something that can actually natively store what you want and 2 is just for fun I guess.
Usually when a comment starts with "am I the only one", the answer is NO. I think this is the first time I have ever seen a possible exception. Well done.
I don't know anything about it, but based on the name, I'll venture a guess.
The audio system has an A/D converter which samples audio at a specific rate -- say 48 kHz. Aliasing occurs when the input to the A/D converter is above 1/2 the sample rate. A 24001 Hz signal is indistinguishable from a 23999 Hz signal, a 25000 Hz signal is indistinguishable from a 23000 Hz signal, etc.
To eliminate these types of problems, there will be an analog lowpass filter before the sampling circuit. There is a gradual rolloff of signal sensitivity. Aliasing still occurs, but the energy of the aliased signals is significantly reduced.
My guess is you take a voice command, even if it is constrained to, say, 200 Hz to 2 kHz, then invert the spectrum and shift it up to the 46-48 kHz range. When this high-frequency signal is played back, due to aliasing, the software after the A/D converter sees it as a 0-2 kHz signal, though greatly attenuated. To overcome that, the source audio can be tremendously loud. Humans can't hear it, so it remains stealthy.
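That aliasing guess is easy to sanity-check numerically. A sketch (frequencies chosen to match the numbers above; this deliberately ignores the anti-aliasing filter):

```python
import numpy as np

# A 46.5 kHz tone sampled at 48 kHz lands on exactly the same sample
# values as a 1.5 kHz tone (48000 - 46500 = 1500), so the sampler
# cannot tell them apart.
fs = 48000
t = np.arange(fs) / fs                        # one second of sample times

ultrasonic = np.sin(2 * np.pi * 46500 * t)    # what the attacker would play
alias = -np.sin(2 * np.pi * 1500 * t)         # what the sampler "hears"

print(np.max(np.abs(ultrasonic - alias)))     # essentially zero
```

The sign flip on the alias is the "inverted spectrum" part: frequencies just below the sample rate fold back mirrored around Nyquist.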
That's clever but that's so many dB down with any sane anti-aliasing filter that it would require quite the sound source.
Based on flipping through the pages of the paper (https://arxiv.org/pdf/1708.09537.pdf), it looks like they're taking advantage of the non-linearity in the response at high frequencies to effectively demodulate a lower-frequency signal that was mixed up to ~22 kHz.
Which, if that's what they're doing, is totally awesome!
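If that reading is right, the effect reproduces with a toy nonlinearity. In this sketch the carrier, command frequency, and quadratic mic model are all illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

# AM-modulate a 2 kHz "command" onto a 22 kHz carrier, then pass it
# through a mildly nonlinear microphone model (y = x + 0.1*x^2).
# The quadratic term demodulates the command back down to baseband.
fs = 96000
t = np.arange(fs) / fs

command = np.cos(2 * np.pi * 2000 * t)
carrier = np.cos(2 * np.pi * 22000 * t)
tx = (1 + command) * carrier          # emitted signal: energy only at
                                      # 20, 22, and 24 kHz -- inaudible-ish

rx = tx + 0.1 * tx ** 2               # nonlinear microphone response

def amp_at(x, f):
    # amplitude at f Hz (1 s window -> bin index equals frequency in Hz)
    return np.abs(np.fft.rfft(x))[f] / (fs / 2)

print(amp_at(tx, 2000), amp_at(rx, 2000))   # ~0 before, ~0.1 after
```

A perfectly linear mic would hear nothing at 2 kHz; the squared term is what folds the envelope back into the audible band, which is why this attack targets the hardware's imperfection rather than the software.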
While that’s really difficult it would also mean you’d have to train the assistant before you’d be able to use it, which is a big hurdle most customers probably don’t want.
It is beyond me why people would want to put a live mic in their home. Every dystopian story, real or fiction, features some element of constant observation, and here we go, happily placing these devices in our homes. Insane
It's obliquely linked at "More recently, Mr. Carlini and his colleagues at Berkeley have [LINK: incorporated commands] into audio recognized by Mozilla’s DeepSpeech voice-to-text translation software, an open-source platform."
https://nicholas.carlini.com/code/audio_adversarial_examples...
It makes me think of old text based adventure games where you type in "open door" or "draw gun". All these complex "A.I." based voice assistants still break down to the vocabulary problem. They try to solve the general case by not giving humans/customers the language spec.
There is probably more than just an AI that does speech to text and then a second phase interpreter. I suspect there is some AI in the first layer of Siri/OKGoogle/Alexa that uses context clues to narrow down what you're asking, but who knows for sure. It's a big black box.
Eventually it's like the 90s again, where you type "Get ye flask" and get a box saying "You cannot get ye flask," and you're left playing Peasant's Quest asking, "Why in the world can I not 'get ye flask'?!"
I have a Sonos setup in my house, it's connected to Spotify Premium (as I suspect most Sonos systems are). I recently added a Sonos/Alexa hybrid thing because I liked the idea of being able to play whatever I fancied while cooking.
"Alexa play <whatever> in the kitchen from spotify"
No other combination of words works. I'm not sure why it needs me to say "kitchen", all my Sonos systems are connected together, but if I say anything else it'll either play on just the one speaker or not work at all. I'm not sure why I must say "from spotify", but apparently I do or it ends up playing some random radio station from some other service.
I find things like this quite the mouthful:
"Alexa play Black Sabbath by Black Sabbath in the kitchen from spotify"
..and with a statement so complicated it often misunderstands and starts doing something random.
I would MUCH prefer a simple voice based API. Attempting to understand conversational speech properly rarely seems to work effectively and often just ends up with users memorising a command just to get it to understand.
This doesn't address your question but might be interesting. As someone who has used Android forever and never tried Siri, those anecdotes are mind blowing to me. For me with Google Assistant, media commands work about 80% and successful call initiation is about 90%. And I haven't looked for documentation either. YMMV of course.
Sometimes it works great. Sometimes I can't get Siri to do anything right. Anecdotally I've found that Google's voice assistant is quite a lot better. Unfortunately I am unwilling to accept the rest of Google's terms and conditions so I am stuck with Siri for the foreseeable future.
I disagree with the terms of plenty of things that I use anyway, and I ain't dead yet. I do realize that it could jeopardize me in the future, but the convenience outweighs most of my cares. It's horrible, really. I imagine it's how smokers with no intention of quitting feel.
But really I don’t think anyone deeply believes that these companies are good or evil in the personal human sense, rather it’s a question of incentives and interests.
Google makes money by selling me to advertisers. I understand the business value but I’m personally not comfortable with it.
Amazon makes money by selling me other people’s stuff. I’m comfortable with the business, but sometimes I’m concerned that what’s good for Amazon isn’t what’s good for the people who make the stuff I like.
Apple makes money by selling me stuff that they make. This is the business model that I like best, because when they make stuff I don’t like I don’t buy it, and when they make stuff I love I’m happy to give them my money in exchange.
Buying from the maker is the best win-win virtuous cycle, in my opinion.
I do not perceive corporations as evil or not; I look at how their interests intersect with mine and select the most comfortable fit.
Google makes money by knowing everything they can figure out about me, and they're not especially forthcoming about what they know (or worse, what they think they know). Weirdly, I actually sort of trust them at some level, so if they offered me an option to pay up for a guarantee they won't track me or sell my information to the highest bidder, I would be more interested in their services.
Apple is unapologetically interested in getting the largest capital investment from me while being sufficiently committed to keeping my stuff private that the FBI periodically tries to use law to force them to provide a backdoor. At this time I feel that my data is safer with them than any other viable provider. Also, please note the gov't does not seem too concerned about Android devices. That tells me what I need to know, even if the constant security holes and utter lack of updates for devices more than a year old weren't obvious enough (I've had a bunch of Android phones, I'm not an Apple fanboy)
You are absolutely welcome to trust whichever corporation makes you most comfortable, no quibbles from me :). It's still a mostly free country.
As someone who has used both, the problem with Siri is when it works it's great, but it will fail on the exact same command the next time you try it. Consistency is key in these things working.
Well, I try new things all the time. For example, sometimes it finds an artist or song whose name contains the genre I was trying to play. I could approach 100% with "hey google, play" when something is paused (but not "hey google, stop" when it's playing, because of the ambient noise of whatever is playing making the wake word fail).
Is there a web page one can refer to instead? Seems like it could be far more efficient way to learn, and would also be able to easily highlight newly released features.
I probably should have specified I'm interested in the Google one.
As far as I can tell, Google doesn't post a comprehensive reference, based on:
google now command reference site:google.com
Most any list that's been published is from 3rd party sites, and usually from 2016.
Google's documentation that I've found tends to be of a form of a random list of various different scenarios you can do, but nothing comprehensive.
And besides, my sense is that new development is on Google Assistant, which (I think) requires web search history to be turned on, which in my opinion is stepping over the line. I'm getting tired enough of Google's invasiveness that I'd like to switch to iOS, but I can't stand the UI, and the hardware is all too expensive for my tastes.
"What can I ask Google Assistant" also works. But I assume what you really mean is something out of band. Google Assistant actually does finish with a recommendation to see more in the app.
The main concern is that these voice assistants are designed to auto-activate on that audio and can do everything from making purchases for you to activating devices in your home.