Great find. I see they're using PocketSphinx as well.
Their mic module listens for 1 second to establish a threshold volume, then spends the next 9 seconds listening for a disturbance above a weighted version of that threshold. They're sampling at a reasonably high rate.
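Here's a rough Python sketch of that calibrate-then-listen idea. It is not their code. It assumes audio arrives as a list of fixed-size frames of integer PCM samples, and that "weighted threshold" means the average calibration volume (RMS) multiplied by a constant. The names `rms`, `first_disturbance` and `calib_frames`, and the default weight of 1.5, are all made up for illustration.

```python
import math
from typing import List, Optional, Sequence

Frame = Sequence[int]  # one block of PCM samples (hypothetical frame layout)


def rms(frame: Frame) -> float:
    """Root-mean-square amplitude of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def first_disturbance(frames: List[Frame], calib_frames: int,
                      weight: float = 1.5) -> Optional[int]:
    """Hypothetical sketch, not the mic module's actual code.

    The first `calib_frames` frames set a baseline volume (average RMS).
    Return the index of the first later frame whose RMS exceeds
    `weight * baseline`, or None if nothing does.
    """
    if not 0 < calib_frames <= len(frames):
        raise ValueError("need 0 < calib_frames <= len(frames)")
    # Calibration period: average loudness of the ambient noise.
    baseline = sum(rms(f) for f in frames[:calib_frames]) / calib_frames
    threshold = weight * baseline  # assumed meaning of "weighted threshold"
    # Listening period: scan for the first frame louder than the threshold.
    for i in range(calib_frames, len(frames)):
        if rms(frames[i]) > threshold:
            return i
    return None
```

With 100 ms frames (an assumed frame size), the 1 second calibration would be `calib_frames=10` and a full 10 second window would be 100 frames. Note that a perfectly silent calibration second gives a threshold of 0, so in practice you'd probably want a minimum floor.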
Either way, I'm going to try their code and see how it performs. Their approach should be faster than mine, but I didn't see how they handle the edge case where a command straddles the 10 second boundary, when the module restarts listening.
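One common way to deal with that boundary case is overlapping windows. I'm swapping this technique in myself; I don't know if their code does anything like it. The idea is to carry the tail of each window into the start of the next, so a command that starts just before a restart still shows up whole in the next buffer. The function name and parameters below are hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def overlapping_windows(frames: Iterable[T], window: int,
                        overlap: int) -> Iterator[List[T]]:
    """Hypothetical sketch: group a frame stream into windows of `window`
    frames, where each window after the first begins with the last
    `overlap` frames of the previous one. Any utterance of at most
    `overlap` frames that straddles a boundary then lies entirely
    inside at least one window."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    buf: List[T] = []
    fresh = 0  # frames added since the last window was emitted
    for frame in frames:
        buf.append(frame)
        fresh += 1
        if len(buf) == window:
            yield buf
            buf = buf[window - overlap:]  # carry the tail into the next window
            fresh = 0
    if fresh:
        yield buf  # flush a final partial window holding frames not yet emitted
```

For example, `overlapping_windows(range(10), 4, 2)` yields `[0, 1, 2, 3]`, `[2, 3, 4, 5]`, `[4, 5, 6, 7]`, `[6, 7, 8, 9]`. The trade-off is that the overlapped audio gets decoded twice, which costs time, and a command sitting in the overlap could be recognized twice, so you'd need some de-duplication.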
It's all open source, so hopefully you can find what you're looking for.