I mention the reasons for using SunSpider: its tests are effectively micro-benchmarks that exercise various implementation features on their respective platforms, and they're easy to port and simple to understand.
If we treat them appropriately and carefully (i.e. we don't look at two scores and use them to make an absolute judgement, but simply as a starting point for investigating and thinking about what they imply about the platform), they are of some use.
Even with the transcendental-heavy benches, it might still have been the case that there was some other implementation issue that slowed down asm.js on those benches. It's good to confirm that there isn't.
NBodies suggests asm.js costs associated with double-indirection. NSieve suggests that asm.js ARM codegen could be improved relative to x86. Binary-trees suggests that Dalvik may have an issue with highly recursive code. Binary-trees also suggests that there may be a perf issue with the default NDK libc's malloc (or free, or both) implementation.
All of these things are useful to think about, as long as we avoid the pitfall of using the benchmark as a judgement tool, and remember to use it as a starting point for analysis.
Lastly, I felt the exercise was useful in confirming that across a set of commonly accepted microbenches, asm.js was generally able to hold its own. It's good to confirm these things empirically, instead of assuming them.
Yes, I understand what VM implementors can derive from individual results of each and every micro-benchmark.
[I usually go as far as saying that only VM implementors and people with deep enough understanding of VMs should ever micro-benchmark and pay attention to micro-benchmarking results]
I would however argue that you could just run each microbenchmark separately and report results without even mentioning that those benchmarks (in their short running forms) constitute SunSpider.
Another thing you could have done is either disable the transcendental cache for IonMonkey-generated code altogether or add a transcendental cache to the Java / C++ code, and then report the pure JavaScript results on the main, prominently visible graph.
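For anyone following along who hasn't met the term: a transcendental (math) cache memoizes the results of functions like sin and cos, so a benchmark that keeps calling them with the same inputs skips the actual computation. Here is a minimal sketch of the idea, assuming a direct-mapped table keyed on the bit pattern of the argument. `makeCachedMath` and its layout are my own hypothetical illustration, not SpiderMonkey's actual code.

```javascript
// Hypothetical sketch of a direct-mapped cache for a one-argument math
// function. Illustrative only; not taken from any real engine.
function makeCachedMath(fn, size) {
  var keyLo = new Uint32Array(size);    // low 32 bits of the cached argument
  var keyHi = new Uint32Array(size);    // high 32 bits of the cached argument
  var values = new Float64Array(size);  // cached results
  var valid = new Uint8Array(size);     // 1 if the slot holds an entry
  var scratch = new Float64Array(1);    // used to read the raw bits of a double
  var words = new Uint32Array(scratch.buffer);
  var stats = { hits: 0, misses: 0 };

  function cached(x) {
    scratch[0] = x;
    var lo = words[0];
    var hi = words[1];
    // Cheap hash: mix the two halves and map them onto a slot.
    var slot = ((lo ^ hi) >>> 0) % size;
    // Compare bit patterns, so that 0 and -0 are kept apart.
    if (valid[slot] && keyLo[slot] === lo && keyHi[slot] === hi) {
      stats.hits++;
      return values[slot];
    }
    stats.misses++;
    var result = fn(x);
    keyLo[slot] = lo;
    keyHi[slot] = hi;
    values[slot] = result;
    valid[slot] = 1;
    return result;
  }

  cached.stats = stats;
  return cached;
}

// Usage: the second call with the same argument is served from the table.
var cachedSin = makeCachedMath(Math.sin, 4096);
var first = cachedSin(0.5);
var second = cachedSin(0.5);
```

A Java or C++ port of a benchmark calls its math library directly and has no such table unless you add one, which is why the cache matters for the comparison.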
> asm.js was generally able to hold its own
I am sure that you wanted to say OdinMonkey here instead of asm.js. asm.js is a set of syntactic and typing rules; it does not have any performance per se. OdinMonkey is an implementation.
I have seen people conflate these two things again and again.
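To make the distinction concrete, here is a tiny module written in the asm.js style. It is a hypothetical example of my own (not from SunSpider), and I have not run it through a validator, so treat it as a sketch. The point is that the `"use asm"` directive and the `|0` / unary `+` annotations are only syntax and typing rules: an engine with an ahead-of-time asm.js compiler may compile it specially, while any other engine runs the very same text as ordinary JavaScript and gets the same answer.

```javascript
// Hypothetical asm.js-style module: sums the first n doubles in a heap.
function SumModule(stdlib, foreign, heap) {
  "use asm";
  var f64 = new stdlib.Float64Array(heap);

  function sum(n) {
    n = n | 0;          // annotation: n is an int
    var i = 0;          // int local
    var total = 0.0;    // double local
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      // Byte offset is i * 8; the >> 3 converts it back to an element index.
      total = total + +f64[(i << 3) >> 3];
    }
    return +total;      // annotation: returns a double
  }

  return { sum: sum };
}

// Usage: the same source runs whether or not the engine treats it specially.
var heap = new ArrayBuffer(0x10000);
var view = new Float64Array(heap);
view[0] = 1.5;
view[1] = 2.5;
view[2] = 4.0;
var sumModule = SumModule(globalThis, {}, heap);
var total = sumModule.sum(3);
```

How fast `sum` runs is entirely a property of whichever engine executes it, which is the sense in which asm.js itself has no performance.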
I disagree that only VM implementors should think about these things. It suggests a degree of overspecialization that I think is good to avoid. Even if one is not a VM implementor, it's always good to have a good grasp on critical evaluation of results. That's something I tried to promote with my article: to encourage people to think more deeply about benchmarks than simply noting the final number and proceeding.
I considered disabling the math cache in SpiderMonkey and using those scores, but it seemed inappropriate. I don't know to what extent the tuning and other optimizations are otherwise targeted at SunSpider. The math cache is obvious, but there may be many non-obvious tunings. Let's face it, JS engine devs have been trying to improve their SunSpider scores for a long time now. To what degree has the tuning of GC, of when-to-jit, of what-to-jit, of when-to-inline, of what-to-inline, of the numerous thresholds... been targeted at SunSpider?
I didn't feel comfortable just removing the math cache and simply stating "well now we've leveled the playing field so native JS is not at an advantage".
Also, you're entirely correct about OdinMonkey vs asm.js. It's too easy to use "asm.js" as a shorthand for particular implementations. It's something people tend to do (e.g. "java" to mean "the Sun JVM"), but I should definitely try to avoid that.