Isn't there an inaudible tone that you can use to disable the assistant? I recall reading somewhere that Amazon used it in their commercials for Alexa so that everyone's Echos weren't lighting up during the commercials. I know when a commercial for the Echo comes on and the voices repeat "Alexa, do X" the Echo I have near the TV speaker doesn't light up.
TV channels already contain inaudible-sound identifiers. Nielsen has panelists put listening devices in their homes and uses the identifiers to track which channels are being played.
You can't count on a TV being able to reproduce sounds below about 100Hz or above about 16kHz. The position of the speakers on the back of the TV means you're likely to get a lot of weird phase effects and many TVs have quite heavy audio DSP to compensate for the inadequacy of their speakers. Any hidden signals will need to be in-band and low bit rate with a high level of redundancy.
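To make those constraints concrete, here's a toy watermark: two FSK tones well inside the 100 Hz - 16 kHz band, a very low bit rate, and heavy per-bit redundancy from long tone bursts. This is purely an illustrative sketch, nothing like any real broadcast watermark:

```python
import numpy as np

# Toy in-band, low-bit-rate, redundant signal (illustrative only).
fs = 16000
bit_len = fs // 10              # 100 ms of samples per bit -> 10 bits/s
f0, f1 = 1500, 2500             # tone for "0" and tone for "1", both in-band

def encode(bits):
    t = np.arange(bit_len) / fs
    return np.concatenate(
        [np.sin(2 * np.pi * (f1 if b else f0) * t) for b in bits])

def decode(sig):
    bits = []
    for i in range(0, len(sig), bit_len):
        spec = np.abs(np.fft.rfft(sig[i:i + bit_len]))
        # 100 ms chunks give 10 Hz bin spacing, so tone f sits in bin f//10
        bits.append(int(spec[f1 // 10] > spec[f0 // 10]))
    return bits

payload = [1, 0, 1, 1, 0]
noisy = encode(payload) + 0.5 * np.random.default_rng(0).normal(size=5 * bit_len)
print(decode(noisy) == payload)  # the redundancy rides out heavy noise
```

Even with noise at half the tone amplitude per sample, 1600 samples per bit concentrate the tone energy into one FFT bin, which is the whole point of the redundancy.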
Even less can you expect all the compression in the pipeline, which is designed to throw away everything people can't hear, to preserve a signal people can't hear.
To scare you further, multiple brands and models of “SmartTVs” have the ability to fingerprint what is displayed on the screen and report back to a cloud service. Said services are also, not surprisingly, poorly secured.
> and uses the identifiers to track which channels are getting played.
I don't think they're using infrasound for that. I think they use a technology similar to Shazam where it just analyzes the sound to determine what's on.
They are, it's one of the reasons they [Nielsen] bought Arbitron back in 2013. The PPM devices use low-frequency tones encoded into the audio stream at the time of broadcast.
Sometimes they're fairly audible; try listening closely to re-runs of Arrested Development as an example. There are scenes with fairly loud high-pitched noises which I believe were listened for by some phone apps to try to estimate viewership.
Interestingly, though, if psychoacoustic compression was used when reuploading a video, that 3000-6000Hz range might actually be restored—it’s less information if it’s there than if it’s not.
Similarly, if the video is heard through a phone line, the call might be using a symbolic voice codec, which would also restore the range (in the sense that it’s not even storing sound, just phonemes.)
The interesting thing is that Google Assistant has the same problem; it's right there in the subtitle. It's interesting that it was omitted from the main title.
Rather than journalistic oversight, I think this confirms what people have commented many times: the fact that GA does not have a personalized name makes it hard to refer to. So much so that a very distant third product is named in the title rather than GA.
Does it really? As far as I was aware, it is still possible to perform a black box attack without knowing the weights of a network. Using specially crafted input, it's even possible to "steal" weights from a network!
"We evaluate these attacks under two different threat models. In the black-box model, an attacker uses the speech recognition system as an opaque oracle. We show that the adversary can produce difficult to understand commands that are effective against existing systems in the black-box model.
Under the white-box model, the attacker has full knowledge of the internals of the speech recognition system and uses it to create attack commands that we demonstrate through user testing are not understandable by humans."
That's less clickbaity than "This Hidden Command" and "You can't." that are specifically designed to hack your psychology and make you feel uncomfortable and threatened by not knowing what they are referring to.
I can't recommend highly enough that everyone at least turn on the audible sound when their assistants are listening. At a minimum you should know the kinds of things that end up triggering the device. There's a really wide area of detection and it's interesting to see where that is.
I would also recommend changing your default wake word at a minimum. Then again, I might also recommend ditching the device entirely, but I happen to have one in my kitchen that I like OK sometimes.
With the Echo/Alexa, I use timers, the clock, morning alarms, Spotify control, and calls to my mother and grandparents.
The last is the most important for me; I live hundreds of miles from both, and the key is that the Echo is a super easy device for them to use.
I can send a chatty audio message telling them I'm looking to call, or asking a question, which is asynchronous in nature, and when they are ready they can call or message me.
It's conference calling by default, which is great for family stuff; they wouldn't meet my girlfriend otherwise.
I got them the echo show (I have the normal echo) and it is fabulous. Video calling with no effort. I use my phone to do video on my side.
For this, the echo is worth every damn penny and then some.
I've started using the device when I lose my FireTV remote. It's a good fallback so the kids don't accidentally start another episode of whatever they're watching and melt down haha.
Come now, on a tech and hacking website of all places we should all be better informed than this. There is a voice activation only chip tied directly into the lights that only listens for commands. Once the activation is heard it sends it off to the servers to get processed etc.
With everyone scrutinizing the web traffic around these devices, trying to prove NSA/Google spying, we'd definitely have found something by now.
So it's probably just yours that is sending off to the NSA. I'd send it back to them for repairs, see if they don't give you a free home mini in compensation!
I can't find the article, but there was a post where someone turned on the audible alert for when it was listening, and it turned out to be triggering far more often than expected.
They also said they found a lot of recordings in their Google History that really shouldn't have been there.
Are you talking about the Google Home Minis? Because that was a hardware defect with its capacitive sensors that caused them to completely remove the feature:
The real problem is that Alexa and Siri increase your attack surface area to include every speaker in your home--including any cheapo Bluetooth or internet-connected speakers that could get hacked to produce these human-inaudible sounds.
(To be fair, I recall this specific point coming up in an HN thread about Alexa being able to open your door for Amazon deliveries, but I thought it was worth reiterating here.)
Hidden audio is simply too easy. Hidden audio is the knife that kills desire for any financial services access through a voice assistant - for those smart enough to not follow the horde.
For me it kills all desire to have a voice assistant. I'm already in the camp of taping over the cameras on my computers; now will I need to worry about the microphone, or about what comes out of the speakers?
So the question is: shouldn't they be able to detect the frequency content of what they're processing and weed out some of the more obvious tricks? And with voice recognition, couldn't it also be limited to a voice it has been trained to recognize?
I don't bother taping over stuff. If you think about it, there are probably 10+ microphones in your room (Samsung TVs, phones, laptops, tablets, etc.)
I run 3rd party roms, Linux on all my dev/tv machines, disable Cortana on my gaming laptop, and hope there isn't something listening in all that trusted, untrusted and oss code I'm running.
I told my roommate I'd move out if he ever got an Alexa or Google home device. I do want to run Jarvis, or one of the OSS alternatives. Many of them send your data to Google/Amazon as well if you enable using their Speech-to-Text services, but they also have options for using local OSS decoders as well (and typically enable those by default).
Our phones are so powerful today there is no reason to send your speech to the cloud (someone else's computer). It should just be done locally; and tech should be improved so accuracy is improved locally without needing the larger datasets that Google/Amazon/Apple use.
More devs need to use the OSS assistants instead, and maybe that will push other engineers to not go the easy route and opt to protect their privacy instead.
Without commenting on anything else you said, sending voice to the cloud isn’t necessarily for processing power reasons as much as it is for access to a dynamically tuned ML model that is constantly changing and improving based off of the samples it receives on a daily basis. In theory, anyway.
Can't that model be pushed out to each local device on a regular basis, and sending back the dynamic enhancements of the local copy (from your own usage) could be opt-in? The master wouldn't grow nearly as quickly, but it could be a decent compromise. Or maybe it would grow almost as quickly, if its owner also had a fully hosted option that enough people used.
So maybe not on each of your devices, but on your home server. Something fewer people even have these days, but the ones who want this level of privacy might go for it.
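The opt-in scheme suggested above resembles federated averaging. Here's a minimal sketch of the idea: every device gets the current master model, trains locally, and only opted-in devices send their weight deltas back. Everything here (names, shapes, the stand-in "training" step) is illustrative, not any vendor's actual system:

```python
import numpy as np

# Sketch of opt-in federated averaging (all details illustrative).
rng = np.random.default_rng(1)
master = np.zeros(4)                    # current shared model weights

def local_update(opted_in):
    # stand-in for local training on the device's own audio
    delta = rng.normal(scale=0.1, size=master.shape)
    return delta if opted_in else None  # non-opt-in devices share nothing

updates = [local_update(opted_in) for opted_in in (True, False, True)]
shared = [d for d in updates if d is not None]

if shared:                              # master only moves if someone opted in
    master = master + np.mean(shared, axis=0)

print(len(shared), "of", len(updates), "devices contributed")
```

The trade-off is exactly as described: the master model grows only from the opted-in fraction, but nobody's raw audio ever leaves the device.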
There is a reason I backed mycroft.ai on kickstarter. I also know of snips.ai doing a similar thing. (I picked one at random) I want OSS to succeed and I'm willing to put something into it.
The way I look at it, if someone has hijacked my MacBook Pro's camera to the degree that they can turn off the light, my entire computer has been rooted and I have far bigger problems than someone seeing me in a towel.
As for IoT cameras, though, it's the opposite. I assume all of them have already been pwned, so I never buy them in the first place.
As of now, I don’t believe any OS’s AV framework allows for multiple video sinks with the default stack and drivers; ie if you are able to use the camera in an app, it can be taken as a sign that no other app is using it at the same time. Which can be a source of some consolation/reassurance.
The article notes that both Amazon and Google speaker-based assistants do this for sensitive operations (Google Assistant on Android does it for everything.) A hidden command that can't mimic your voice could, say, play media with Google Home, but wouldn't have full access.
If someone (or something) is watching me through my laptop camera, then they are going to get a very boring show that cannot possibly be an efficient use of their available time and risk.
iOS now has a “Type to Siri” feature that can disable the spoken interface but retain the “smart” capabilities of the digital assistant.
Not that digital assistants are worth the risk they bring. Until now I can’t get Siri to do anything useful that isn’t very artificially and carefully phrased.
Why do people keep repeating this tired and offensive myth? I bought my iphone for the hardware and software capabilities that I judged to be the best for my use cases. And I am not the only one that actually had a non-trite reason, I am sure.
Why do you think the iPhone has become a status symbol? I would guess it's partially because Android has had so many issues with fake apps and malware in the Google Play Store that Android has become somewhat laughable. Also, other than the antenna, every iPhone is essentially the same whether you buy it from AT&T or Verizon or the Apple Store. There is no shitty Samsung or LG skin on the UI and no bloatware (other than maybe a single preinstalled carrier app, such as the AT&T app, which you can easily uninstall in a matter of seconds). It integrates with other Apple products without needing much, if any, configuration.
Also, tons of people choose certain companies to buy from because of their status and reputation. That's how most industries work. Some people only buy American made cars, or only Chevy or only Honda.
There's nothing wrong with brand loyalty, especially when the brand is consistently delivering quality products to its customers. The iPhone isn't an arbitrary status symbol; Apple put years and years of effort into building up the reputation they have.
There are? What's the evidence for that, other than a subjective impression of the iPhone's inferiority and a lack of imagination as to other people's motivations?
As for what the iPhone does better: privacy, OS updates, integration with my Mac, App Store apps, and a bunch of other things. But all that is completely beside the point. Even if there were nothing at all iPhones do better, it's a bit absurd to go from 'well, Apple is not better than Android' to 'people therefore only buy Apple because they are shallow'.
I just saw someone on HN saying how amazing Google Photos was because of its learning in the cloud.
Having not used Photos on macOS in years beyond ensuring it actually imported my photos, I opened it up and was surprised at the level of analysis it had. It made an album for each city I visited in Mexico. "Puerto Vallarta 2017" and such. Even had a "Furry Friends in Mexico" album that was all the furred beasts I met along the way.
Really well done, and all done locally on my computer.
This is the sort of thing I have no problem voting for with my dollars.
Accessibility. I can pick up any Apple device created in the last 9 years and be guaranteed that as a totally blind person I can use it. Phone, iPad, Watch, whatever. Just works. This is very very far from being the case in Android land.
Their face unlock is better than anything I saw on my Pixel or my Windows Hello laptop, to the point of being usable and preferable to fingerprints rather than a pointless extra.
I switched back because of that + Google Assistant's unwillingness to work without Google tracking my location history constantly (assuming that the "off" switch there even actually stops them).
Standby battery life. I switched because I got tired of pulling out my Android phone and seeing it had lost an appreciable amount of battery in the last 45 minutes just sitting in my pocket. When I'm not using my iPhone, the battery drain is minimal.
>No widgets
Widgets suck and slow down Android. Apple puts them a swipe away so they don't refresh constantly.
>constant reminders to sign-in
Never had this problem
>constant reminders to update
Constant? You mean, they tell you when they have an update.
>no double tap/settings seem harder to find
Because it's force touch.
>Finger print scanner sucks so bad.
Laughable, as Apple makes the best one on the market; you should just move your finger around more when enrolling, or enroll the same finger twice.
>Little annoyances like the animations to change screen take 0.5 seconds too long.
This is 100% why I hate Android, weird lag in every device.
>I'm not sure what I'm supposed to be enjoying on my iphone
Maybe that Apple isn't selling your privacy for money.
This is demonstrably false. iPhone's second-gen Touch ID sensor is one of, if not the best in the business. It's fast and ridiculously accurate, and is the reason people were disappointed in Face ID when it was first released.
I'm sure there are some people who buy an iPhone for status (gotta have those blue iMessage bubbles), but I don't think it's any kind of majority. I've had an iPhone for the last 5 years or so because I've found there are better-quality apps in the App Store, and in the past their camera was way better than any other mainstream smartphone's (not so much today, though; the camera on the Pixel 2 looks way better IMO).
Also, Siri is a joke. If your priority in a smartphone is to make use of its virtual assistant, then an iPhone is not for you.
Just out of curiosity, what's the frequency range that a typical mic can pick up the signal from? The article didn't specifically mention the range; it just said "inaudible."
Every model will be different, but the important thing is that the boundaries represent the frequencies within which the signal will stay above a particular threshold of amplitude. A good spec sheet will tell you that threshold, and I've seen things like -3, -6, or -10 dB. It will still pass audio outside of the range but at an undisclosed attenuation.
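For reference, those spec-sheet thresholds translate to amplitude ratios via 20·log10 (for signal levels, not power). A quick conversion:

```python
# Converting the -3 / -6 / -10 dB thresholds mentioned above into
# amplitude ratios, using dB = 20 * log10(ratio) for signal levels:
for db in (-3, -6, -10):
    print(f"{db} dB -> amplitude x{10 ** (db / 20):.3f}")
# -3 dB is roughly 0.708x, -6 dB is roughly half amplitude,
# and -10 dB is roughly 0.316x
```

So a mic spec'd to -10 dB at its band edges still passes nearly a third of the amplitude out there, which matters for these attacks.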
Yes, typical mics tend to pick up the typical human frequency range, though cheaper mics may have some really poor characteristics at the edges. Usually in the speech range they'll be pretty solid.
However, there's a lot of play within that space. Microphones record the sound waves fairly directly, but what we hear is quite distorted compared to the "real" sound by the nature of our ears. One of the big differences is that if there is a very loud 4000Hz sound, we can't hear a soft 4005Hz sound near it very well, but the microphone "hears" it just fine.

So, for instance, you could play a loud sound for the user but embed a very quiet command at frequencies the human can't pick out, and if the listening model doesn't account for that (and there are reasons it wouldn't necessarily want to, because it wants to hear commands even in the presence of significant background noise), you could get commands into the system. See https://en.wikipedia.org/wiki/Psychoacoustics for discussion of how our ears fail to pick up the "real" audio signal, and how much we've exploited that in music compression.
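A quick numeric sketch of that masking example: a loud 4000 Hz tone next to a 4005 Hz tone 40 dB quieter. A microphone/FFT resolves both cleanly, even though a human ear largely masks the quiet one. The amplitudes and frequencies are just illustrative:

```python
import numpy as np

# Loud masker tone plus a much quieter neighbor (illustrative values).
fs = 48000
t = np.arange(fs) / fs                         # exactly 1 second

masker = 1.00 * np.sin(2 * np.pi * 4000 * t)   # loud tone
hidden = 0.01 * np.sin(2 * np.pi * 4005 * t)   # "payload" tone, 40 dB down
mix = masker + hidden

# With a 1 s window the FFT bins are 1 Hz apart, so each tone gets its
# own bin; normalize magnitudes back to amplitudes.
spectrum = np.abs(np.fft.rfft(mix)) / (fs / 2)
print(spectrum[4000], spectrum[4005])          # both tones clearly measurable
```

That's the asymmetry the comment describes: to the FFT (and to a recognition model fed by it) the quiet tone is perfectly visible; to a human it mostly isn't.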
Now, that was a very brute force example. It sounds to me like what this article is talking about are called "adversarial examples" (https://blog.acolyer.org/2017/02/28/when-dnns-go-wrong-adver... ). Voice recognition doesn't listen the same way we do, it doesn't necessarily take a holistic view of the signal, but is looking for specific frequency patterns and changes and turning that into phonemes, into words, etc. (There's a lot of ways of doing this and I don't specifically know what Alexa and Siri are doing, so that's a really vague overview.) If you know what they are looking for, you can use filters to very, very selectively remove the patterns from a bit of music or something that Alexa might trigger on, and then insert just the bare minimum skeleton of the sounds that it is really recognizing. A human won't be able to hear the difference (most likely; depends on how badly the original is mangled but even if it is audible it is almost certainly not audible without an A/B test and very good ears), but the probably-neural-nets monitoring for sounds will end up superstimulated and interpret the adversarial example as words.
While adversarial examples work best when tuned to the target network, widely shared networks like Alexa or Siri mean such tuning is practical in a way that attacking some custom-trained model used by one person isn't. And experiments have shown that adversarial examples transfer between separately trained nets, and even to non-neural-net models, to a much greater degree than my own intuition would have suggested beforehand. (See the previous link and look for the discussion of "Practical black-box attacks against deep learning systems using adversarial examples". It is extremely counter-intuitive to me how easy this is.)
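The core mechanism is easier to see with a toy model. Below is a minimal FGSM-style sketch against a linear "detector" standing in for the real speech network; the model, features, and numbers are all illustrative assumptions, not anyone's actual system:

```python
import numpy as np

# Fast-gradient-sign-style adversarial nudge against a toy linear model.
rng = np.random.default_rng(0)

w = rng.normal(size=1000)            # stand-in model weights
x = 0.1 * rng.normal(size=1000)      # stand-in "benign audio" features

def score(x):
    return float(x @ w)              # positive -> "command detected"

# For a linear model, the gradient of the score w.r.t. x is just w,
# so nudge every sample by +/- eps in the gradient's sign direction.
eps = 0.05
x_adv = x + eps * np.sign(w)

# Each sample moves by at most eps (imperceptible per-sample), but the
# score moves by eps * sum(|w|), which is huge in aggregate.
print(score(x_adv) - score(x))
```

Real attacks differentiate through an actual recognition network rather than a linear stand-in, but the shape of the trick (many tiny, coordinated perturbations) is the same.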
Hmm... the big idea MP3 figured out was that you can catalog all these "if there is a very loud 4000Hz sound, we can't hear a soft 4005Hz sound near it very well" psychoacoustic phenomena and just throw away all that "can't hear it very well" information, resulting in a vastly smaller file size that still sounds reasonable (yeah yeah, it's not FLAC and the purists need their gold-plated Monster cables; let's not go there, that's not the point).
So this attack is kinda a "reverse MP3" that adds those lossy bits back in, but shaped into an attack payload. Or at least it adds enough pieces of the attack payload that the neural net's pattern recognition triggers, while the humans say "Doesn't sound like anything to me".
Is that a close-enough explain-like-im-a-freshman?
I primarily brought up psychoacoustics as an example of the way we don't hear the way microphones do. While you could abuse them, it would be more obvious. In this case what we're getting is the audio equivalent of adversarial examples; see the link I gave for some visual examples. What's interesting there is that they are basically invisible to us, but surprisingly robust.
(As another sort of philosophical sidebar, this either proves, or provides very strong evidence, that whatever it is our brains are doing, it is not what deep learning nets are doing, nor anything else vulnerable to such trivial adversarial examples. I've seen adversarial examples against another technique that do seem to work against humans as well, but it requires such a distortion to the image that "I can't tell if that's a dog or a toaster" actually makes sense; it's not just some sort of attack against human vision or something, it's a fancy morphed thing halfway between the two that would probably confuse anything and anybody.)
Hey Berkeley researchers, if you're reading this and want to make a demo that will really freak people out, embed an alexa activation command into this clip: https://youtu.be/iyXtGo418TY?t=1m11s
Well, it can read your schedule, tell you your location, and then there is the HomeKit stuff. It could potentially disable certain security features you might have installed at your house, or possibly perform a very expensive modification to your HVAC configuration; in an extreme case that could maybe be fatal (disable heating in the winter at an older person's home or something like that). It can also read your messages, which are used for MFA in some situations. My wife and I have our accounts hooked together and I can ask Siri where she is and it uses Find My Friends; it can also kick off Find My iPhone, which shows my wife's presumed location on a map.
I think it can do Apple Pay actions too.
Degrees of dangerous. I don't have a homepod but presumably it couldn't do anything with Apple pay or your messages. Having Alexa or Siri control home automation stuff seems like something you might want to think about a little, leaving the lights on all day and burning some energy is a very different thing than re-configuring your HVAC or a security camera.
At least on your phone, almost everything it can do that'd be 'dangerous' requires your device to be unlocked or has a confirmation button (or both). Examples would be unlocking a door, opening your garage, sending an email/text message, sending Apple Pay Cash, etc.
Send emails is one example. I don't use Siri, but apparently it can send emails including to multiple recipients. The "danger" is only limited by your imagination in the scenario where a malicious stranger has access to your email client.
> Send an email to my mom that says, "I have an emergency and I need $2000. Here's the account number to send it to: 12345. Mom, please don't ask questions. This is urgent. Send the money now."
In practice this will not work - assuming you only have a single email address for 'mom', it then will prompt you to unlock your phone, then show a confirmation screen with a send button on it.
There are way too many interaction steps required by the device owner to make this specific one a feasible attack.
I have no idea, but a "smarter" email would be "mum can I borrow $100, pay you back ASAP, just a bit short today sorry and thanks!"... Mum would be less inclined to phone you in a panic.
If you have previously identified one of your contacts as your mother, yes. If not, Siri will ask who your mother is and if Siri should remember that piece of information.
I hadn't thought of this, it's quite concerning. I don't see how they can safeguard against this without reducing the effectiveness of the voice recognition.
A secret command to "paste clipboard into new email, send to [address]" is a shiny new attack vector without any apparent straightforward way to plug the security hole.
Sure, but your phone is sometimes already unlocked because you used it 30 seconds ago and it now sits on the table. Or it's playing music, or your kid has it etc. I don't think I was thinking about a phone anyway, more the dedicated devices that sit there listening all the time.
So lock your phone every time you set it down. Never leave it unlocked.
I used to have my iPhone lock 5min after I pressed the sleep button. Now that TouchID makes it very easy to unlock, I have it locking immediately.
When I let my friend's 4yo use my iPad, I triple tap the home button and press "Guided Access", which can prevent the user from accessing other apps until I disable it. (I do this because I'm worried about what he may accidentally search on the web, not because I'm worried he'll steal my data!)
Siri is supposed to be tailored to your own voice and not accept commands from anyone else. Sounds like they need to improve that voice fingerprinting. (Or is this different on the HomePod, since it's supposed to be used by multiple people?)
The only thing I can think of is to either not be heard by others nearby or to mess with people who have these devices. 1 can be done by just typing to something that can actually natively store what you want and 2 is just for fun I guess.
Usually when a comment starts with "am I the only one", the answer is NO. I think this is the first time I have ever seen a possible exception. Well done.
I don't know anything about it, but based on the name, I'll venture a guess.
The audio system has an A/D converter which samples audio at a specific rate -- say 48 kHz. Aliasing occurs when the input to the A/D converter is above 1/2 the sample rate. A 24001 Hz signal is indistinguishable from a 23999 Hz signal, a 25000 Hz signal is indistinguishable from a 23000 Hz signal, etc.
To eliminate these types of problems, there will be an analog lowpass filter before the sampling circuit. There is a gradual rolloff of signal sensitivity. Aliasing still occurs, but the energy of the aliased signals is significantly reduced.
My guess is you take a voice command, even if it is constrained to, say, 200 Hz to 2 kHz, then invert the spectrum and shift it up to the 46-48 kHz range. When this high-frequency signal is played back, due to aliasing, the software after the A/D converter sees it as a 0-2 kHz signal, though greatly attenuated. To overcome that, the source audio can be tremendously loud. Humans can't hear it, so it remains stealthy.
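That aliasing guess is easy to sanity-check numerically. A sketch (frequencies chosen to match the numbers above; this deliberately ignores the anti-aliasing filter):

```python
import numpy as np

# A 46.5 kHz tone sampled at 48 kHz lands on exactly the same sample
# values as a 1.5 kHz tone (48000 - 46500 = 1500), so the sampler
# cannot tell them apart.
fs = 48000
t = np.arange(fs) / fs                        # one second of sample times

ultrasonic = np.sin(2 * np.pi * 46500 * t)    # what the attacker would play
alias = -np.sin(2 * np.pi * 1500 * t)         # what the sampler "hears"

print(np.max(np.abs(ultrasonic - alias)))     # essentially zero
```

The sign flip on the alias is the "inverted spectrum" part: frequencies just below the sample rate fold back mirrored around Nyquist.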
That's clever but that's so many dB down with any sane anti-aliasing filter that it would require quite the sound source.
Based on flipping through the pages of the paper (https://arxiv.org/pdf/1708.09537.pdf), it looks like they're taking advantage of the non-linearity in the response at high frequencies to effectively demodulate a lower-frequency signal that was mixed up to ~22 kHz.
Which, if that's what they're doing, is totally awesome!
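If that reading is right, the effect reproduces with a toy nonlinearity. In this sketch the carrier, command frequency, and quadratic mic model are all illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

# AM-modulate a 2 kHz "command" onto a 22 kHz carrier, then pass it
# through a mildly nonlinear microphone model (y = x + 0.1*x^2).
# The quadratic term demodulates the command back down to baseband.
fs = 96000
t = np.arange(fs) / fs

command = np.cos(2 * np.pi * 2000 * t)
carrier = np.cos(2 * np.pi * 22000 * t)
tx = (1 + command) * carrier          # emitted signal: energy only at
                                      # 20, 22, and 24 kHz -- inaudible-ish

rx = tx + 0.1 * tx ** 2               # nonlinear microphone response

def amp_at(x, f):
    # amplitude at f Hz (1 s window -> bin index equals frequency in Hz)
    return np.abs(np.fft.rfft(x))[f] / (fs / 2)

print(amp_at(tx, 2000), amp_at(rx, 2000))   # ~0 before, ~0.1 after
```

A perfectly linear mic would hear nothing at 2 kHz; the squared term is what folds the envelope back into the audible band, which is why this attack targets the hardware's imperfection rather than the software.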
While that’s really difficult it would also mean you’d have to train the assistant before you’d be able to use it, which is a big hurdle most customers probably don’t want.
It is beyond me why people would want to put a live mic in their home. Every dystopian story, real or fiction, features some element of constant observation, and here we go, happily placing these devices in our homes. Insane
It's obliquely linked at "More recently, Mr. Carlini and his colleagues at Berkeley have [LINK: incorporated commands] into audio recognized by Mozilla’s DeepSpeech voice-to-text translation software, an open-source platform."
https://nicholas.carlini.com/code/audio_adversarial_examples...
It makes me think of old text based adventure games where you type in "open door" or "draw gun". All these complex "A.I." based voice assistants still break down to the vocabulary problem. They try to solve the general case by not giving humans/customers the language spec.
There is probably more than just an AI that does speech to text and then a second phase interpreter. I suspect there is some AI in the first layer of Siri/OKGoogle/Alexa that uses context clues to narrow down what you're asking, but who knows for sure. It's a big black box.
Eventually it's like the 90s again, where you type "Get ye flask" and get a box saying "You cannot get ye flask," and you're left playing Peasant's Quest asking, "Why in the world can I not 'get ye flask'?!"
I have a Sonos setup in my house, it's connected to Spotify Premium (as I suspect most Sonos systems are). I recently added a Sonos/Alexa hybrid thing because I liked the idea of being able to play whatever I fancied while cooking.
"Alexa play <whatever> in the kitchen from spotify"
No other combination of words works. I'm not sure why it needs me to say "kitchen", all my Sonos systems are connected together, but if I say anything else it'll either play on just the one speaker or not work at all. I'm not sure why I must say "from spotify", but apparently I do or it ends up playing some random radio station from some other service.
I find things like this quite the mouthful:
"Alexa play Black Sabbath by Black Sabbath in the kitchen from spotify"
..and with a statement so complicated it often misunderstands and starts doing something random.
I would MUCH prefer a simple voice based API. Attempting to understand conversational speech properly rarely seems to work effectively and often just ends up with users memorising a command just to get it to understand.
This doesn't address your question but might be interesting. As someone who has used Android forever and never tried Siri, those anecdotes are mind blowing to me. For me with Google Assistant, media commands work about 80% and successful call initiation is about 90%. And I haven't looked for documentation either. YMMV of course.
Sometimes it works great. Sometimes I can't get Siri to do anything right. Anecdotally I've found that Google's voice assistant is quite a lot better. Unfortunately I am unwilling to accept the rest of Google's terms and conditions so I am stuck with Siri for the foreseeable future.
I disagree with the terms of plenty of things that I use anyway, and I ain't dead yet. I do realize that it could jeopardize me in the future, but the convenience outweighs most of my cares. It's horrible, really. I imagine it's how smokers with no intention of quitting feel.
But really I don’t think anyone deeply believes that these companies are good or evil in the personal human sense, rather it’s a question of incentives and interests.
Google makes money by selling me to advertisers. I understand the business value but I’m personally not comfortable with it.
Amazon makes money by selling me other people’s stuff. I’m comfortable with the business, but sometimes I’m concerned that what’s good for Amazon isn’t what’s good for the people who make the stuff I like.
Apple makes money by selling me stuff that they make. This is the business model that I like best, because when they make stuff I don’t like I don’t buy it, and when they make stuff I love I’m happy to give them my money in exchange.
Buying from the maker is the best win-win virtuous cycle, in my opinion.
I do not perceive corporations as evil or not; I look at how their interests intersect with mine and select the most comfortable fit.
Google makes money by knowing everything they can figure out about me, and they're not especially forthcoming about what they know (or worse, what they think they know). Weirdly, I actually sort of trust them at some level, so if they offered me an option to pay up for a guarantee they won't track me or sell my information to the highest bidder, I would be more interested in their services.
Apple is unapologetically interested in getting the largest capital investment from me while being sufficiently committed to keeping my stuff private that the FBI periodically tries to use law to force them to provide a backdoor. At this time I feel that my data is safer with them than any other viable provider. Also, please note the gov't does not seem too concerned about Android devices. That tells me what I need to know, even if the constant security holes and utter lack of updates for devices more than a year old weren't obvious enough (I've had a bunch of Android phones, I'm not an Apple fanboy)
You are absolutely welcome to trust whichever corporation makes you most comfortable, no quibbles from me :). It's still a mostly free country.
As someone who has used both, the problem with Siri is when it works it's great, but it will fail on the exact same command the next time you try it. Consistency is key in these things working.
Well, I try new things all the time. For example, sometimes it finds an artist or song whose name contains the genre I was trying to play. I could approach 100% with "hey google, play" when something is paused (but not "hey google, stop" when it's playing, because of the ambient noise of whatever is playing making the wake word fail).
Is there a web page one can refer to instead? Seems like it could be far more efficient way to learn, and would also be able to easily highlight newly released features.
I probably should have specified I'm interested in the Google one.
As far as I can tell, Google doesn't post a comprehensive reference, based on:
google now command reference site:google.com
Most any list that's been published is from 3rd party sites, and usually from 2016.
Google's documentation that I've found tends to be of a form of a random list of various different scenarios you can do, but nothing comprehensive.
And besides, my sense is that new development is on Google Assistant, which (I think) requires web search history to be turned on, which in my opinion is stepping over the line. I'm getting tired enough of Google's invasiveness that I'd like to switch to iOS, but I can't stand the UI, and the hardware is all too expensive for my tastes.
"What can I ask Google Assistant" also works. But I assume what you really mean is something out of band. Google Assistant actually does finish with a recommendation to see more in the app.
The main concern is that these voice assistants are designed to auto-activate on that audio and can do everything from making purchases for you to activating devices in your home.