
It's definitely possible to focus on the CUDA/GPU side without diving deep into the math. Understanding how work is split across threads and how memory accesses are laid out matters more than the math itself. Picking a specific use case, like optimizing inference, is a good way to learn. On that note, you might find https://github.com/codelion/optillm useful – it optimizes LLM inference and could give you practical experience with GPU utilization. What kind of AI applications are you most interested in optimizing?
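
To make those two ideas concrete, here's a minimal CUDA sketch (a plain SAXPY kernel, my own illustration, not taken from optillm): each thread handles one element, and because consecutive threads read consecutive addresses the loads and stores coalesce into wide memory transactions, which is the basic access pattern inference kernels are built around.

  #include <cuda_runtime.h>
  #include <cstdio>

  // One thread per element; consecutive threads touch consecutive
  // addresses, so global memory accesses are coalesced.
  __global__ void saxpy(int n, float a, const float *x, float *y) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
          y[i] = a * x[i] + y[i];
  }

  int main() {
      const int n = 1 << 20;
      float *x, *y;
      // Unified memory keeps the example short; real inference code
      // usually manages device buffers and streams explicitly.
      cudaMallocManaged(&x, n * sizeof(float));
      cudaMallocManaged(&y, n * sizeof(float));
      for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

      int block = 256;                      // threads per block
      int grid  = (n + block - 1) / block;  // enough blocks to cover n
      saxpy<<<grid, block>>>(n, 2.0f, x, y);
      cudaDeviceSynchronize();

      printf("y[0] = %f\n", y[0]);          // expect 4.0
      cudaFree(x);
      cudaFree(y);
      return 0;
  }

Once something like this runs, profiling it with Nsight Compute and comparing a strided (uncoalesced) variant is a quick way to see why memory layout dominates performance on GPUs.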

