The wasm security model is a lot more complex than your CPU's. It was never a virtual CPU architecture.
And because it's not a virtual CPU architecture, it needs those extra complexities. Your CPU is designed for running high-level languages without any of those extra features, but wasm can't do it (well enough).
I think you could really speed up WASM a lot if CPUs supported the WASM sandbox better. For example, it would be nice to have modern segmented memory support where you could set a base offset and a max address such that all other pointer operations worked off that mini-TLB, generating an interrupt if an access falls outside the bounds.
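To make the base-and-limit idea concrete, here is a minimal software model of what such a register pair would do on every access. `Segment`, `BoundsTrap`, and `translate` are hypothetical names for illustration, not an existing CPU feature or runtime API; real hardware would do this check in the address-generation path and raise an interrupt instead of returning an error.

```rust
/// Hypothetical base-and-limit register pair describing one sandbox.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    /// Host address where the sandbox's memory starts.
    pub base: usize,
    /// Size of the sandbox's memory in bytes.
    pub limit: usize,
}

/// Stands in for the interrupt the hardware would raise.
#[derive(Debug, PartialEq, Eq)]
pub struct BoundsTrap {
    pub offset: usize,
    pub len: usize,
}

impl Segment {
    /// Translate a sandbox-relative offset into a host address,
    /// or trap if any byte of the access lies outside the segment.
    pub fn translate(&self, offset: usize, len: usize) -> Result<usize, BoundsTrap> {
        let trap = BoundsTrap { offset, len };
        // checked_add guards against offset + len wrapping around.
        match offset.checked_add(len) {
            Some(end) if end <= self.limit => self.base.checked_add(offset).ok_or(trap),
            _ => Err(trap),
        }
    }
}
```

With `Segment { base: 0x1000, limit: 64 }`, a 4-byte access at offset 60 translates to `0x103c`, while the same access at offset 61 traps because its last byte would fall past the limit.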
More complex designs tend to go for a full-blown MMU, and I wonder if the presence of such a feature would be warranted when you could go for either full-blown process isolation (which afaik is not that expensive on modern CPUs), or just go with static and dynamic checks (basically an if statement that checks if the address is valid, which can be optimized away like 90% of the time, e.g. when iterating through known arrays).
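A rough sketch of the "if statement" approach, and of how the check can be hoisted out of a loop over a known range. `LinearMemory`, `OutOfBounds`, and the method names are made up for this example and are not taken from any particular runtime; a real JIT would do the equivalent on generated machine code.

```rust
/// Error standing in for a wasm out-of-bounds trap.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBounds;

/// Hypothetical sandboxed linear memory backed by a plain byte vector.
pub struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    pub fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    /// Dynamic check: one comparison on every single store.
    pub fn store_u8(&mut self, addr: u32, value: u8) -> Result<(), OutOfBounds> {
        match self.bytes.get_mut(addr as usize) {
            Some(slot) => {
                *slot = value;
                Ok(())
            }
            None => Err(OutOfBounds),
        }
    }

    /// Dynamic check: one comparison on every single load.
    pub fn load_u8(&self, addr: u32) -> Result<u8, OutOfBounds> {
        if (addr as usize) < self.bytes.len() {
            Ok(self.bytes[addr as usize])
        } else {
            Err(OutOfBounds)
        }
    }

    /// Hoisted check: validate the whole range once, then iterate
    /// over the resulting slice with no per-element comparison.
    pub fn sum_range(&self, start: u32, len: u32) -> Result<u64, OutOfBounds> {
        let start = start as usize;
        let end = start.checked_add(len as usize).ok_or(OutOfBounds)?;
        let slice = self.bytes.get(start..end).ok_or(OutOfBounds)?;
        let total: u64 = slice.iter().map(|&b| u64::from(b)).sum();
        Ok(total)
    }
}
```

`load_u8` pays for a branch on each access, whereas `sum_range` pays for one range check no matter how long the array is, which is the case where the check is effectively optimized away.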
The part that would be nice to bypass is TLB switching and cache invalidation. WASM doesn't need full page translation because it largely assumes the containing process already does that and because it doesn't support allocating non-contiguous memory without using multiple linear memories. Even with multiple linear memories, it still doesn't require (or even want) page translation because these memories each have their own address space.
The issue with if statements is that the stats/bits branch predictors use are a finite resource. You really need these checks to be inlined as well because otherwise you'll thrash the instruction cache. If it just had some special register with an interrupt, the CPU could just always assume the index is valid for the purposes of speculative execution and branch prediction.
This is sorta what plenty of runtimes do, but on a larger scale, using pages with deliberately restrictive permissions. I don't think interrupts going through the OS would be fast enough at the scale of typical array sizes; a properly branch-predicted conditional will probably be faster.
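For context on why the page-permission trick can cover a whole 32-bit memory: a wasm32 access forms its effective address from a 32-bit index plus a 32-bit static offset, added without wrapping, so the result is bounded. The sketch below only does that arithmetic; the function names are hypothetical, and it does not claim any specific runtime reserves exactly this much.

```rust
/// Largest effective address a wasm32 load or store can form:
/// a u32 index plus a u32 static offset, added without wrapping.
pub const fn max_effective_address() -> u64 {
    u32::MAX as u64 + u32::MAX as u64
}

/// Hypothetical sizing rule: address space to reserve so that every
/// possible access of up to `max_access_size` bytes lands on a page
/// that is either accessible or an inaccessible guard page.
pub const fn reservation_bytes(max_access_size: u64) -> u64 {
    max_effective_address() + max_access_size
}

/// True if the access touches bytes beyond the accessible region,
/// meaning it would hit a guard page and fault instead of needing
/// an explicit comparison in the generated code.
pub fn hits_guard(index: u32, offset: u32, access_size: u64, accessible: u64) -> bool {
    let effective = index as u64 + offset as u64;
    effective + access_size > accessible
}
```

The maximum effective address is just under 8 GiB, so reserving a little over 8 GiB of virtual address space (most of it never backed by physical memory) is enough for the fault to replace the explicit check entirely. The cost moves to the fault path, which only matters when an access is actually out of bounds.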
WASM has a very simple security model compared to almost any modern CPU. No processor has a linear memory map with no segmentation or permissions these days.