The AI on display, Surgical Robot Transformer[1], builds on Action Chunking with Transformers[2]. Both are transformer models, which means they are fundamentally token-based. The whitepapers go into more detail on how tokenization occurs (the tokens aren't text, as in an LLM; they are patches of video/sensor data and sequences of actions).
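To make "tokenization" concrete, here's a minimal sketch of the ViT-style patch-embedding step such models typically use. Everything here (patch size, embedding width, names like image_to_tokens) is illustrative, not taken from the SRT/ACT papers:

    # Sketch: turn one camera frame into a sequence of tokens.
    # Hypothetical constants; real models learn w_embed and add
    # positional embeddings plus action-chunk tokens on top.
    import numpy as np

    PATCH = 16      # patch edge length in pixels (assumed)
    D_MODEL = 256   # token embedding width (assumed)

    def image_to_tokens(frame, w_embed):
        """Split an HxWx3 frame into PATCHxPATCH patches and project
        each flattened patch to a D_MODEL-dim token."""
        h, w, _ = frame.shape
        tokens = []
        for y in range(0, h - h % PATCH, PATCH):
            for x in range(0, w - w % PATCH, PATCH):
                patch = frame[y:y+PATCH, x:x+PATCH].reshape(-1)
                tokens.append(patch @ w_embed)  # linear embedding
        return np.stack(tokens)  # (num_patches, D_MODEL)

    rng = np.random.default_rng(0)
    frame = rng.random((224, 224, 3), dtype=np.float32)
    w_embed = rng.random((PATCH * PATCH * 3, D_MODEL), dtype=np.float32)
    print(image_to_tokens(frame, w_embed).shape)  # (196, 256)

So a 224x224 frame becomes a 196-token sequence the transformer attends over; in ACT the model additionally predicts a "chunk" of several future actions at once rather than one action per step.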
Why wouldn't you look this up before stating it so confidently? The link is at the top of this very page.

EDIT: I looked it up because I was curious. For your chosen example, Waymo, they also use (token-based) transformer models for their state tracking.[3]
[1]: https://surgical-robot-transformer.github.io/
[2]: https://tonyzhaozh.github.io/aloha/
[3]: https://waymo.com/research/stt-stateful-tracking-with-transf...