The AI on display, Surgical Robot Transformer[1], builds on Action Chunking with Transformers[2]. Both are transformer models, which means they are fundamentally token-based. The whitepapers go into more detail on how tokenization occurs (the tokens aren't text, as in an LLM; they are patches of video/sensor data and sequences of actions).
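To make "tokenization" concrete, here's a minimal sketch of the ViT-style patch-embedding step such models typically use. Everything here (patch size, embedding width, names like image_to_tokens) is illustrative, not taken from the SRT/ACT papers:

    # Sketch: turn one camera frame into a sequence of tokens.
    # Hypothetical constants; real models learn w_embed and add
    # positional embeddings plus action-chunk tokens on top.
    import numpy as np

    PATCH = 16      # patch edge length in pixels (assumed)
    D_MODEL = 256   # token embedding width (assumed)

    def image_to_tokens(frame, w_embed):
        """Split an HxWx3 frame into PATCHxPATCH patches and project
        each flattened patch to a D_MODEL-dim token."""
        h, w, _ = frame.shape
        tokens = []
        for y in range(0, h - h % PATCH, PATCH):
            for x in range(0, w - w % PATCH, PATCH):
                patch = frame[y:y+PATCH, x:x+PATCH].reshape(-1)
                tokens.append(patch @ w_embed)  # linear embedding
        return np.stack(tokens)  # (num_patches, D_MODEL)

    rng = np.random.default_rng(0)
    frame = rng.random((224, 224, 3), dtype=np.float32)
    w_embed = rng.random((PATCH * PATCH * 3, D_MODEL), dtype=np.float32)
    print(image_to_tokens(frame, w_embed).shape)  # (196, 256)

So a 224x224 frame becomes a 196-token sequence the transformer attends over; in ACT the model additionally predicts a "chunk" of several future actions at once rather than one action per step.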
Why wouldn't you look this up before stating it so confidently? The link is at the top of this very page.

EDIT: I looked it up because I was curious. For your chosen example, Waymo, they also use (token-based) transformer models for their state tracking.[3]
[1]: https://surgical-robot-transformer.github.io/
[2]: https://tonyzhaozh.github.io/aloha/
[3]: https://waymo.com/research/stt-stateful-tracking-with-transf...