> it became necessary to see how our python version compared to the high performance reference implementation in Java.
It sounds like the Java version was also optimized - though it's hard to say to what degree. In general, if you're doing a lot of numerical computing and matrix operations, the basic Java builtins are not going to cut it anyway; you'll end up using something like ND4j or JBLAS to get performance comparable to numpy, in which case I have a hard time imagining it could possibly be /easier/ than numpy.
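For context, here's the kind of thing I mean - a dense least-squares solve is a few lines of plain numpy (the specific problem is just an illustration), whereas on the JVM you'd be reaching for a third-party matrix library before you even start:

```python
import numpy as np

# Build a small synthetic regression problem: A @ x_true == b exactly.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

# One call recovers the coefficients via least squares.
x_est, *_ = np.linalg.lstsq(A, b, rcond=None)
```

The whole pipeline - data generation, the matrix product, the solve - reads almost like the math notation.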
Granted, the cython stuff is more "advanced", and you probably would struggle a bit (or just wouldn't care for the task) if you were a researcher with only a surface understanding of what's really going on, but it's no big deal for a programmer. The other thing is that the defaults usually work well enough that you rarely have to pull out the big guns. You can also optimize progressively: "let me rewrite this part in cython" is a much quicker win than "let me rewrite all of this to run on top of platform X" or something.
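To sketch what that progressive step looks like (the function and types here are illustrative, not from any particular project), you take one hot loop and add C types so cython can drop the Python overhead, leaving the rest of the codebase untouched:

```cython
# Illustrative hot loop: sum of squared differences over two arrays.
# The cdef declarations and memoryviews let cython compile the loop
# body down to plain C, skipping Python object overhead per element.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def sum_sq_diff(double[:] a, double[:] b):
    cdef Py_ssize_t i, n = a.shape[0]
    cdef double total = 0.0, d
    for i in range(n):
        d = a[i] - b[i]
        total += d * d
    return total
```

Callers don't change at all - you compile this with cythonize and import it like any other module, which is exactly why it's such a quick win.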
Also, it definitely depends on the kind of problem. For example, the JVM has awesome tools for distributed computing and streams, like http://akka.io/, where python is more lacking.
In conclusion though, I've always been pleasantly surprised by how far I can get by just dumping larger and larger datasets (my current record is in the hundreds of GB) into the python / pandas / scipy / numpy stack, and by how much of a pleasure it is to use compared to anything else. Before tossing that away, I'd want to have tried and failed at a problem with the python stack first.