Can you elaborate more on what you mean by "lackluster performance"? What is the use case? If you're looking for top speed in terms of C/assembly performance - I'd say yes, probably the JVM will get in your way. However, I spent months building a database/key-value store in Clojure and it's quite doable to write very high performance code in Clojure as long as you put the right type hints everywhere. Tools like YourKit can help you identify the bottlenecks in your program. Again, it depends on the use case, but Clojure makes it very, very idiomatic to write code that makes full use of a multicore system - something possible in the likes of Java, C, et al., but definitely not idiomatic at all.
I'm not the author, but looking through the sibling comments I feel I should point out the obvious: startup time.
It can take several seconds for a Clojure REPL to start up. No other dynamic language I've had contact with is that bad on that front. Even Erlang and Elixir boot up faster!
Of course, I know about various ways of setting up a daemon process in the background, but then there is a need for reloading/hot-swapping code, which Clojure also fails to do well. Both Erlang and Common Lisp (also Emacs... even Emacs!) are better in this regard, both for different reasons.
Anyway, Clojure's overall performance impression - how fast it feels to users - is bad and will remain bad unless the startup problem is fixed.
Not so much the JVM, but Clojure and how it uses the JVM (although, as you say, it's possible to get fairly close to top JVM performance with Clojure). It's fairly easy to get C performance (and even beat it in concurrent code) for the same amount of effort on the JVM. Currently, the main handicap the JVM has is the lack of arrays-of-structs, which can cause lots of cache misses and requires less-than-elegant code to overcome. This, thankfully, is being addressed by the addition of value types.
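A quick sketch of that less-than-elegant workaround (the names here are illustrative, not from any particular codebase): instead of an array of small objects, whose fields end up scattered across the heap, you keep each field in its own primitive array so the data is laid out contiguously, the way a C array of structs would be.

```java
// An array of Point objects scatters x/y across the heap and
// iterating over it can miss the cache on every element.
final class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

// The common workaround: "structure of arrays". Each field lives in
// its own primitive array, so a scan walks contiguous memory,
// roughly what a C array of structs gives you for free.
final class Points {
    final double[] xs, ys;
    Points(int n) { xs = new double[n]; ys = new double[n]; }

    double sumOfDistancesFromOrigin() {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += Math.sqrt(xs[i] * xs[i] + ys[i] * ys[i]);
        }
        return sum;
    }
}

public class SoADemo {
    public static void main(String[] args) {
        Points p = new Points(3);
        p.xs[0] = 3; p.ys[0] = 4;  // distance 5
        p.xs[1] = 6; p.ys[1] = 8;  // distance 10
        System.out.println(p.sumOfDistancesFromOrigin()); // prints 15.0
    }
}
```

Value types aim to make the straightforward `Point[]` version perform like the manual layout above.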
Yup, the JVM is something that people (esp. startups & new-co's) are, all things being equal, vastly underusing or over-complaining about :).
Before I started doing Clojure, the JVM ecosystem was this "scary" thing, but 4 years later I've learned to appreciate it a lot - stable AND sound libraries, high performance out of the box, and very, very high performance if you really need it.
There's absolutely nothing about native languages that makes them a priori faster than the JVM. On the contrary, the JVM's compiler is likely to emit better-optimized machine code than their compilers. Some low-level languages (like C/C++/Rust) can (and do in many cases) offer better performance, but that's not because they're "native" but because they allow (and require) finer control over every operation. This level of control comes at a significant cost, though.
What Clojure and even Scala do with the JVM scares me. The most concrete example I know of is using objects for everything. One of the fundamental problems I have with many of the higher-level JVM languages is how much they hide from you. That being said, most people won't notice. I do numerical software in Java, so I might be a bit more sensitive to some of this stuff.
It can though. We are a java shop that does HPC and deep learning. Our work is open source. We've been around 2.5 going on 3 years now. Over the years we've surpassed bottleneck after bottleneck in the JVM.
The speed gains we've seen are massive. Java can't compete with good ole SIMD and the like for numerical computing.
We are big advocates of the JVM as a platform, but let's be clear about its weaknesses. You need to use Unsafe (see: netty, aeron, and any other low-level database written in Java, or using Java as the core language with some C++) and other tricks out of the box for real performance.
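For readers unfamiliar with the off-heap idea those libraries build on, here's a tame sketch using only the standard java.nio API (netty and aeron go much further with sun.misc.Unsafe, which this deliberately avoids):

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // allocateDirect places the buffer outside the Java heap, so
        // the GC never scans or copies its contents. This is the mild,
        // standard-API version of the trick netty and aeron push much
        // further with sun.misc.Unsafe.
        ByteBuffer buf = ByteBuffer.allocateDirect(16);
        buf.putLong(0, 42L);                // absolute put: no allocation
        System.out.println(buf.getLong(0)); // prints 42
        System.out.println(buf.isDirect()); // prints true
    }
}
```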
I'm willing to bet that 99% of the performance difference will be solved by value types. Anyway, Cliff Click, who wrote H2O[1], a large machine learning platform, reports achieving Fortran speeds (i.e. maximum throughput) with pure Java[2]. This means that a C application won't even be 1% faster.
> Java can't compete with good ole SIMD and the like for numerical computing.
Oh, sure, for some specialized use cases of course that's true, but you could use OpenCL in Java, too. With 10+ years with C/C++ and 10+ years with Java, I'd bet on Java when it comes to performance bang-for-the-buck almost in every case (given that it's a large app), and even more than that the more concurrent, complex and unpredictable the app is (but this requires carefully looking at the design).
Value types will make Java competitive in absolute terms in more and more domains. Also, with the new JIT (Graal) you can control machine-code generation at whatever level of detail you want.
> You need to use Unsafe (see: netty, aeron, and any other low-level database written in Java, or using Java as the core language with some C++) and other tricks out of the box for real performance.
They're not using unsafe for throughput but (mostly) for latency. That's a whole other matter. Also, some of those use "mechanical sympathy" as a driver of performance instead of algorithms that the JVM makes easier. I've built a concurrent DB in pure Java that relies on synchronization that would require at least double the effort if I were to write it in C (hazard pointers, etc.), and may not even have better performance.
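As a sketch of the kind of synchronization the GC makes easier (this is the classic Treiber stack, an illustration rather than code from the DB mentioned above): in Java the lock-free algorithm is a few lines, because the collector reclaims popped nodes safely; a C version needs hazard pointers or epoch-based reclamation to avoid use-after-free and ABA problems.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal Treiber stack. It is this short in Java precisely because
// the GC guarantees a popped node can't be freed while another thread
// still holds a reference to it; no hazard pointers required.
public class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> n = new Node<>(value);
        do {
            n.next = head.get();                  // snapshot current top
        } while (!head.compareAndSet(n.next, n)); // retry on contention
    }

    public T pop() {
        Node<T> h;
        do {
            h = head.get();
            if (h == null) return null;           // empty stack
        } while (!head.compareAndSet(h, h.next)); // retry on contention
        return h.value;
    }

    public static void main(String[] args) {
        TreiberStack<Integer> s = new TreiberStack<>();
        s.push(1); s.push(2);
        System.out.println(s.pop()); // prints 2
        System.out.println(s.pop()); // prints 1
    }
}
```

Note that ABA on `head` can't bite here: each `Node` is a fresh object, and the GC won't recycle its address while any thread can still compare against it.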
>> I'm willing to bet that 99% of the performance difference will be solved by value types. Anyway, Cliff Click, who wrote H2O[1], a large machine learning platform, reports achieving Fortran speeds (i.e. maximum throughput) with pure Java[2]. This means that a C application won't even be 1% faster.
I've had a personal conversation with Cliff himself. Java, no matter what you do, can't deal with hardware acceleration and GPUs. We agreed on that. Numerical software is a different beast. I also kept mentioning SIMD instructions, as well as things like OpenMP.
You are talking about systems software. Unfortunately, that matters a lot for machine learning. The axis along which you can get equivalent speeds should be specified here. "Value types" != "runs on faster chips".
You aren't likely to beat Intel's or Nvidia's compilers at their own game here. Java will always be playing catch-up to last gen's tech there.
Disclosure: I'm more than aware of what's going on in the space. We compete with them for customers and have a very clear understanding of their offerings. H2O has a great k/v store based on the exact mechanics you're talking about. That's about it, though. Also of note: Cliff doesn't work on H2O anymore: https://twitter.com/cliff_click/status/700817408110399492
>> Oh, sure, for some specialized use cases of course that's true, but you could use OpenCL in Java, too....
OpenCL isn't exactly the industry standard for this stuff. You always end up using CUDA, and you always end up dropping down to C. There's just no way to avoid that if you want the fastest out there.
I agree with you on the last part, but I keep mentioning "numerical software" for a reason. There are certain things the JVM is good at; writing a database and systems software is one of those things. There are still bits of HDFS in C++, though. I don't think you'll be able to get around having bits of your code in C, which is what I emphasize here.
I've also had a personal conversation with Cliff about this (I know he's no longer with H2O), and he isn't (or, at least wasn't when I spoke to him about a year ago) aware of what's going on in Graal. Now, I completely agree that for specialized stuff, it's likely you'll get the most out of specialized compilers, but I'd closely watch the Graal space for specialized code generation. I had a talk with John Rose about how best to combine Graal with the work on VarHandles[1] and Panama[2], precisely to address things like SIMD. There's also interesting work done using Graal to directly emit GPU code (from Java code) for streams, but that's largely experimental at this stage (I think AMD did some work on that and then abandoned it).
We wrote a lot of that stack ourselves. And our compute code is in C++. There is a lot of stuff going on in big-data land (including us) wrapping a lot of the C++ in Java and enabling people to use the JVM for what it's good at (data access). We are pushing this with Nvidia:
https://blogs.nvidia.com/blog/2016/10/06/how-skymind-nvidia-...
I also work pretty closely with a lot of the spark/gpu folks at IBM.
While I do largely agree with you, we have our own JNI compiler called javacpp that alleviates a lot of those concerns already:
https://github.com/bytedeco/javacpp
Having our own pointer class and doing our own memory management has helped a lot.
What you're talking about is using Java for the compute.
There's no reason you can't wrap that in a runtime that most people know how to use. A lot of Python folks do that now, but then you have to deal with Python's limitations, at which point the majority of your code (way more than needs be) will end up being in C anyway, vs. Java, where you can write a significant part of your app in Java and have it be fast out of the box.
I am mostly a line-of-business developer doing enterprise consultancy and that is how we use C++, just as infrastructure language when either JVM or .NET stacks need a bit of outside help.
Just wondering whether sometimes a 100% C++ solution would be a better approach than the added integration effort it requires. On the other hand, without people like you guys we wouldn't have access to nice tooling in Java for similar work, so congratulations on the work thus far and all the best for the project.
> It's fairly easy to get C performance (and even beat it in concurrent code) for the same amount of effort on the JVM.
Ha.
That's what we were promised back in the mid-1990s. It's no more true today than it was then. Well written Java will always be about 2-4x slower than similar C code.
JVM applications have many compensating advantages, including the aforementioned ease of exploiting concurrency. That doesn't erase the reality of single thread performance where Java has never, in twenty years of strong expert effort, caught up.
Agreed, in that the performance benefits you can get out of a higher-level language often show up where the limits of human cognition are the real performance boundary. As computing gets faster in general, the practical considerations should shift more towards the performance you can get out of an entire system than unit-for-unit performance. On that level, human comprehension often seems to be what needs to be accounted for.
I think of it as a problem akin to making traffic run better in an entire city rather than optimising the performance of the engine in a single car. It doesn't invalidate making more performant engines, but it's a different level of consideration.
When saying, "JVM is as performant as C", it's easy to run the numbers and see whether it's true or not, objectively. The caveat that GP throws in is, for the same amount of effort. Then you need to specify which human and under what circumstances, and the entire thing gets hairier.
> As computing gets faster in general, the practical considerations should shift more towards the performance you can get out of an entire system than unit-for-unit performance.
Those days are over, unfortunately. Computing stopped getting faster in general around 2010. And that also changes the calculus around single thread performance. You can no longer count on hardware eventually to solve performance problems.
Which is precisely why it matters that I can write functionally pure code that is trivial to parallelize. At $FORMER_WORK, I even wrote (the same!) code as reasonably idiomatic-looking Clojure (essentially, reducer fns) that transparently runs locally single-threadedly, multi-threadedly, and on Hadoop.
That C program might be fast, but it's not a great tool for processing petabytes of data on a stampeding herd of angry elephants. So, I think performance arguments need a little nuance about what you're doing.
(My experiments include soft real time with deadline scheduling. Clojure's fine.)
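For Java readers, a rough analogue of those reducer fns (a sketch of the idea, not the Clojure code itself): a stream pipeline whose reduction logic is identical whether it runs single-threaded or across cores; only the execution strategy changes.

```java
import java.util.stream.LongStream;

public class SumDemo {
    public static void main(String[] args) {
        // Identical pipeline, two execution strategies: the reduction
        // doesn't change when we go parallel, only the scheduling does.
        long sequential = LongStream.rangeClosed(1, 1_000_000).sum();
        long parallel   = LongStream.rangeClosed(1, 1_000_000)
                                    .parallel()
                                    .sum();
        System.out.println(sequential == parallel); // prints true
        System.out.println(sequential);             // prints 500000500000
    }
}
```

The Clojure/Hadoop version extends the same principle one tier further, to a distributed cluster.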
I can't remember the source, but I remember reading about a case where someone replaced a large hadoop cluster with one node running highly tuned C. Distributed computing comes with a lot of overhead and you might be surprised at just how much you can get out of a highly tuned C application.
And this is why parallel/concurrent computing and applications are necessary to achieve speedups. And that's what the new languages promise to let you get right. For the same effort.
> Well written Java will always be about 2-4x slower than similar C code.
Nope. A large, concurrent app is likely to be faster in Java given similar effort (of course, given enough effort -- which may be double -- C will eventually surpass that, sometimes even significantly, depending on usage). Currently the main bottleneck, which makes the above statement very dependent on application type, is lack of value types, and that's being addressed. In small sequential apps, there will be a significant advantage to C, which diminishes with the size of the app. The reason is that as the app grows, it gets harder to write manual optimizations while keeping the code modular and maintainable, while HotSpot can do all sorts of optimizations even with nice abstractions.
I "kinda" agree with you. Java has the right stability and speed trade offs and can be wicked fast. It beats the crap out of most garbage collected languages save maybe .net and the CLR.
For real applications, I agree with you that C is the way to go. Java with C and off-heap memory gets you a pretty long way.
> This, thankfully, is being addressed by the addition of value types.
Last I heard (JLS 2016 keynote by Brian Goetz) they have yet to commit/guarantee when value types will land. Java 10 is planned; if they're completed by then we're looking at probably 2020 given the delays with Java 9 release.
That's 3 years for other languages/platforms to evolve while the JVM unboxes itself.
You don't need to specify what type a function takes in Clojure, which is often nice. If you do, though, the compiler can sometimes optimize and make things much faster. Basically, the compiler does not have to
1. look up the type of the thing, then
2. run the action on it,
but can instead skip straight to step 2.
If you run (set! *warn-on-reflection* true) somewhere in your Clojure program (or REPL), it will print to the console whenever you are doing reflection.
I think it's a good idea to put it at the beginning of your core namespace, so that it is always on. Why not always wear the seatbelt when driving, you know?
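What that warning is saving you from can be sketched in plain Java: an un-hinted interop call falls back to something like the reflective path below (runtime method lookup plus a boxed return value), while a hinted call compiles to the direct invocation the JIT can inline.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        String s = "hello";

        // Direct call: the type is statically known; the JIT can
        // inline this to nearly nothing.
        int direct = s.length();

        // Reflective call: runtime lookup, access checks, and a boxed
        // Integer return value - roughly what un-hinted interop costs.
        Method m = String.class.getMethod("length");
        int reflective = (Integer) m.invoke(s);

        System.out.println(direct == reflective); // prints true
        System.out.println(direct);               // prints 5
    }
}
```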
Correct, a lot of the average application-level code you see in the wild does not have to worry about this. If we're talking about things like load-balancers or databases or endpoints/functions that will be hit 1000s of times per second - then it's a diff. conversation.
I think you meant to say,
(defn func [o n] (.write ^WriteInterface o ^long n))
This is needed only when calling Java methods, because Java allows overloading. Clojure only does arity-overloading, not type overloading, and hence doesn't need the type info.
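The overloading point, in plain Java (the write methods here are illustrative, not the interface from the snippet above): with two overloads differing only in parameter type, the call site's static type decides which one is invoked; that static type is exactly the information a Clojure type hint supplies.

```java
public class OverloadDemo {
    // Two overloads differing only in parameter type: the compiler must
    // know the argument's static type to resolve which one is meant.
    static String write(long n)   { return "write(long): " + n; }
    static String write(Object o) { return "write(Object): " + o; }

    public static void main(String[] args) {
        long n = 42L;
        Object boxed = n; // same value, different static type
        System.out.println(write(n));     // prints write(long): 42
        System.out.println(write(boxed)); // prints write(Object): 42
    }
}
```

Without a hint, Clojure can't know the static type at the call site, so it has to resolve the overload reflectively at runtime.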
No, it was a convoluted way to say that attempts to measure C programs by how "idiomatic" they are will be unlikely to bear any fruit.
That said, if there is any non-idiomatic C code, then the attempts of 'experts' in other languages to mangle C to look more like their language surely fall into this category. (There are lots of examples where the author starts with 'We can make C look more like Fortran by defining our macros to replace these keywords...'.)