That's how you deal with that problem in Java as well: arrays of primitives. If you need to operate on a million points, you can either use Point[], which will incur something like 16 MB of additional memory, or use two int[] arrays, which won't incur any extra overhead short of a few bytes. Your code won't be pretty, but it'll be fast, and you can always hide this weirdness behind a pretty API.
"Struct of Arrays", to be contrasted with "Array of Structs", because in Java it's actually "Array of pointers to Struct". In languages where "Array of Structs" is actually possible, the decision of which to use is less clear-cut and depends on how big the Struct is, what the access patterns are like, and whether you're trying to perform SIMD operations on multiple Structs at once.
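To make the contrast concrete, here is a rough sketch of the same idea in Python using only the standard library. The `Point` class, the `aos_bytes`/`soa_bytes` helpers and the size `N` are made up for illustration, and `sys.getsizeof` only gives an approximate, CPython-specific picture (it ignores the int objects the points refer to), so treat the comparison as indicative rather than a benchmark.

```python
import sys
from array import array

N = 10_000  # illustrative size, not taken from the thread


class Point:
    """Hypothetical point type; __slots__ keeps each instance as small as possible."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


# "Array of pointers to struct": a list of references to separate objects.
points = [Point(i, 2 * i) for i in range(N)]

# "Struct of arrays": two flat, typed columns of C ints.
xs = array("i", range(N))
ys = array("i", (2 * i for i in range(N)))


def aos_bytes(objs):
    # The list's pointer slots plus every Point object.
    # The int objects referenced by each Point are not counted.
    return sys.getsizeof(objs) + sum(sys.getsizeof(p) for p in objs)


def soa_bytes(*columns):
    # Payload only: element count times element width for each column.
    return sum(len(c) * c.itemsize for c in columns)


print("list of Point objects:", aos_bytes(points), "bytes (approx.)")
print("two int arrays:       ", soa_bytes(xs, ys), "bytes of payload")
```

Both layouts hold the same data (`points[i].x == xs[i]`); only the per-object headers and pointers differ.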
I was bored and had time on my hands, so I played around a bit with some of the suggestions in this thread, but it seems as though a naive ctypes approach is the worst possible approach in the two cases I tested (arrays of integers, arrays of x,y structs).
Is there an approach that you use that produces better results than this? Because my naive approach is unmanageably worse for both performance (not recorded) and impact on Python's memory usage.
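The comment above doesn't say which ctypes approach was tested, so the following is only a guess at one alternative worth trying: a single contiguous ctypes array of structs, rather than one ctypes object per element. The `Point` layout and the size `N` are assumptions for illustration.

```python
import ctypes


class Point(ctypes.Structure):
    # Assumed layout: two 32-bit ints, 8 bytes per point.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


N = 1000  # illustrative size

# One contiguous, zero-initialised block holding N structs.
PointArray = Point * N
pts = PointArray()

for i in range(N):
    # pts[i] is a temporary view onto the shared buffer, so the
    # assignment writes straight into the array's memory.
    pts[i].x = i
    pts[i].y = 2 * i

print(ctypes.sizeof(pts), "bytes for", N, "points")
```

The storage itself is compact, but each `pts[i]` access still creates a temporary Python wrapper object, so element-by-element loops written in Python stay slow. That may be part of what the measurement above ran into.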
Depends on the algorithms that use that code. Two separate arrays is not a bad way to represent a vector-of-tuples. And so if you think of everything as a vectorised operation (and if that makes sense for your use case) then the code can be clean enough.
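As a rough sketch of what "clean enough" could look like, the two arrays can sit behind a small wrapper that only exposes whole-collection operations. `PointVector` and its methods are hypothetical names invented for this example, not an existing library.

```python
from array import array


class PointVector:
    """Hypothetical wrapper: two parallel int arrays behind a point-like API."""

    def __init__(self, pairs=()):
        self._xs = array("i")
        self._ys = array("i")
        for x, y in pairs:
            self.append(x, y)

    def append(self, x, y):
        self._xs.append(x)
        self._ys.append(y)

    def __len__(self):
        return len(self._xs)

    def __getitem__(self, i):
        # Rebuild an (x, y) tuple on demand; nothing per-point is stored.
        return self._xs[i], self._ys[i]

    def translate(self, dx, dy):
        # Whole-column operation: each array is processed independently.
        self._xs = array("i", (x + dx for x in self._xs))
        self._ys = array("i", (y + dy for y in self._ys))

    def bounding_box(self):
        # Returns (min_x, min_y, max_x, max_y).
        return min(self._xs), min(self._ys), max(self._xs), max(self._ys)


pv = PointVector([(1, 5), (4, 2), (3, 9)])
pv.translate(10, 100)
print(pv[0], pv.bounding_box())
```

Callers never see the parallel arrays, so the storage could later be swapped for something else without changing the calling code.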
Efficiency-wise it depends on how the locality of reference falls out. There are cases where it's more cache-efficient to store the pairs next to each other, but again for big vectorised operations you might lose nothing by using two cache-ways instead of one.
Or you can just use a more memory-efficient language like Java. Python and JS use tons of memory compared to statically typed languages with managed runtimes such as C#, Go, and Java. People lament Java memory usage but Python/Node use several times more.
>People lament Java memory usage but Python/Node use several times more
The people who lament Java's memory usage never complain about Python or Ruby because... if you're worried about the JVM you would never consider Python!
I'm mostly referring to those that say the JVM is too "resource heavy" then go on to use JS/Python/Ruby instead, which have much higher runtime overhead.
In my experience, many devs think the scripting-language heritage means these languages have smaller/lighter runtimes.
Not that I want to go down the rabbit hole, but I probably fall into the category of “JVM is heavy, but I use Python”.
Sure, memory consumption and CPU perf are both pretty bad in Python, but latency and memory footprint of the runtime itself are pretty good, so it’s ideal for tooling, crons, lambdas etc. JVM is comparatively efficient once you have the JVM ready, but that sure takes tons of resources.
I’m hoping Graal changes this. I don’t like JVM-based languages, but I do like technological progress.
Python startup time can be significant for non-trivial programs as well, and it's been a big problem for Mercurial and other projects. For example, see these threads:
If your script is expected to be run interactively on a frequent basis, then Go, Rust, C++, or even Bash (for simple stuff) will give you much lower user-perceived latency than Python.
It's fairly easy to use Class Data Sharing (CDS) and include an archive in Java 12 where startup time is important. This reduces startup time significantly.
The JVM starts really quickly (~100ms on my machine) and doesn't use many resources as long as your app is small. That's... uncommon in Java land though. Even simple apps pull in Guava/Apache Commons and a few client libraries. This can easily be thousands of classes. Nobody thinks about it because the runtime cost of loading shitloads of code is so low. But you can improve this a ton by using ProGuard and stripping out stuff you don't need.
If I can do it in numpy, I can probably get a lighter, faster implementation with Python than in Java. More generally, if I can do it with a Python package that's actually a fairly thin wrapper around a C, C++ or Fortran library, then Python also has a decent chance of being the easy winner.
If none of those situations apply, then yeah, typically Java ends up being more efficient.
I have seen completely the opposite in the wild. In fact, so far I've only ever seen it go the other way (Java consuming far more memory than Python), so I have to ask what on Earth you are doing with Python to have found yourself in such a position.
(I've never successfully used Node so I have no idea what that's like in memory use.)