Python keeps growing in users because it's easy to get started and has libraries to load basically any data and perform almost any task. It's frequently only the second-best language for a given job, but it's the second-best language for everything.
By the time a Python programmer has "graduated" to learning a second language, exponential growth has created a bunch of new Python programmers, most of whom don't consider themselves programmers.
There are more non-programmers than programmers in this world, and they don't care about - or even know about - concurrency, memory efficiency, or L2 cache misses due to pointer chasing. These people all use Python. That perspective seems to be missing from most Hacker News discussions, where people work on high-performance big-corp, big-data, web-scale systems.
What worries me, though, is that the features that make Python quite good at prototyping make it rather bad at auditing for safety and security. And we live in a world in which production code is prototyping code, which means that Python code that should have remained a quick experiment - and, more often than not, written by people who are not that good at Python or don't care about code quality - ends up powering safety/security-critical infrastructure. Cue the thousands of developer-hours spent debugging or attempting to scale code that is hostile to the task.
I would claim that the same applies to JavaScript/Node, btw.
I sometimes think about what Python would be like if it were written today, with the hindsight of the last thirty years.
Immutability would be the default, but mutability would be allowed, marked in some concise way so that it was easy to calculate things using imperative-style loops. Pervasive use of immutable instances would make it impossible for libraries to rely on mutating objects a la SQLAlchemy.
The language would be statically type-checked, with optional type annotations and magic support for duck typing (magic because I don't know how that would work.) The type system would prioritize helpful, legible feedback, and it would not support powerful type-level programming, to keep the ecosystem accessible to beginners.
It would still have a REPL, but not everything allowed in the REPL would be allowed when running code from a file.
There would be a strong module system that deterred libraries from relying on global state.
Support for at least one fairly accessible concurrency paradigm would be built in.
I suspect that the error system would be exception-based, so that beginners and busy people could write happy path code without being nagged to handle error values and without worrying that errors could be invisibly suppressed, but there might be another way.
I think free mutability and not really needing to know about types are two things that make the language easier for beginners.
If someone who's not familiar with programming runs into an error like "why can't I change the value of X", it might take them multiple hours to figure out, or they may never figure it out. Even if the error message is clear, total beginners often just don't know how to read them, let alone act on them.
They provide longer-term advantages once your program becomes larger, but for a scripting language the short-term advantages are more important, imo.
The type system I want would just be a type system that tells you that your code will fail, and why. Pretty much the same errors you get at runtime. Hence the need for my hypothetical type system to handle duck typing.
I don't think mutability by default is necessary for beginners. They just need obvious ways of getting things done. There are two places beginners use mutability a lot. The first is gradual transformation of a value:
line = "The best of times, the worst "
line = line.strip()
line = line[:line.find(' ')]
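Rewritten without mutation (a sketch using Python's actual `str.strip`, since that's how Python spells trim), each step just binds a new name:

```python
# same transformation, but each intermediate value gets its own name
line = "The best of times, the worst "
trimmed = line.strip()
first_word = trimmed[:trimmed.find(' ')]
print(first_word)  # The
```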
This is easily handled by using a different name for each value. The second is in loops:
word_count = 0
for line in lines():
    word_count += num_words(line)
I think in a lot of cases beginners will have no problem using a map or list comprehension idiom if they've seen examples:
word_counts = [num_words(line) for line in lines]
# or word_counts = map(num_words, lines)
word_count = sum(word_counts)
But for cases where the immutable idiom is a bit trickier (like a complicated fold) they could use a mutable variable using the mutability marker I mentioned. Let's make the mutability marker @ since it tells you that the value can be different "at" different times, and let's require it everywhere the variable is used:
word_count @= 0
for line in lines():
    word_count @= word_count + num_words(line)
Voila. The important thing is not to mandate immutability, but to ensure that mutability is the exception, and immutability the norm. That ensures that library writers won't assume mutability and rely on it (cough SQLAlchemy cough), and the language will provide good ergonomic support for immutability.
It's a common claim that immutability only pays off in larger programs, but I think the mental tax of mutability starts pretty immediately for beginners. We're just used to it. Consider this example:
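Here is a minimal sketch of the kind of thing I mean - the classic aliasing surprise:

```python
a = [1, 2, 3]
b = a            # b is a second name for the *same* list, not a copy
b.append(4)
print(a)         # [1, 2, 3, 4] -- mutating through b changed a as well

x = 3
y = x
y += 1
print(x)         # 3 -- numbers behave like values, so x is untouched
```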
Beginners shouldn't have to constantly wrestle with the difference between value semantics and reference semantics! This is the simplest possible example, and it's already a mind-bender for beginners. In slightly more complicated guises, it even trips up professional programmers. I inherited a Jupyter notebook from a poor data scientist who printed out the same expression over and over again in different places in the notebook, trying to pinpoint where and why the value changed. (Lesson learned: never try to use application code in a data science calculation... lol.) Reserving mutability for special cases protects beginners from wrestling with strange behavior from mistakes like these.
Julia is both dynamic and fast. It doesn’t solve all issues but uniquely solves the problem of needing 2 languages if you want flexibility and performance.
Exception-based error handling - and its extensive use in the standard library - is the fundamental design mistake that prevented Python from becoming a substantial programming language.
Coupled with dynamic typing and mutability by default, it guarantees Python programs won't scale, relegating the language to the role of a scratchpad for rough drafts and one-off scripts - a toy beginner's language.
I have no idea why you say that it's a scratchpad or a toy language, considering that far more production lines of code are being written in Python nowadays than in practically any other language, with the possible exception of Java.
But that's the same with Excel: massive usage for throwaway projects with loose or nonexistent requirements or performance bounds that end up in production. Python is widely used, but not for substantial programming in large projects - say, projects over 100 kloc. Python hit the "quick and dirty" sweet spot of programming.
This is absolutely not true. I've made my living working with Python, and there's an astounding number of large Python codebases. Instagram and YouTube alone have millions of lines of code. Hedge funds and fintechs base their entire data-processing workflows around Python batch jobs. Django is about as popular as Rails and powers millions of websites and backends.
None of those applications are toys. I have no idea where your misperception is coming from.
I guess I'm more than a little prejudiced from trying to maintain all sorts of CI tools, web applications and other largeish programs somebody initially hacked in Python in an afternoon and which grew to become "vital infrastructure". The lack of typing bites you hard, and the optional typing that has been shoehorned into the language is irrelevant in practice.
All sorts of problems simply would not have existed if the proper language had been used from the beginning, as opposed to the one that anyone can hack in most easily.
We still live in a world where many outward facing networked applications are written in C. Dynamic languages with safe strings are far from the floor for securable tools.
However, I hope that these C applications are written by people who are really good at C. I know that some of these Python applications are written by people who discovered the language as they deployed into production.
That’s a measure of programming prowess, not the actual security concern at hand.
If the masterful C developer still insists on using a language that has so many footguns - and a weird culture of developers pretending they're more capable than they are - then their C mastery may well not have been worth much against someone throwing something together in Python, which at the very least immediately sidesteps the vast majority of vulnerabilities found in C code. Plus, my experience with such software is that the sort of higher-level vulnerabilities you'd still see in Python code aren't ones the C developer has necessarily dealt with.
A popular opinion in game development is that you should write a prototype first to figure out what works and is fun, and once you reach a good solution, throw away that prototype code and write a proper solution with the insights gained. The challenge is that many projects just extend the prototype code to make the final product, and end up with a mess.
Regular software development is a lot like that as well. But you can kind of get around that by having Python as the "prototyping language", where anything that's proven to be useful gets converted to a language that's more useful for production.
What audits need most is some ability to analyze the system discretely and really "take it apart" into pieces to which they can apply metrics of success or failure (e.g. pass/fail for a coding style, numbers of branches and loops, when memory is allocated and released).
Python is designed to be highly dynamic and to allow more code paths to be taken at runtime, through interpreting and reacting to the live data - "late binding" in the lingo, as opposed to the "early binding" of a Rust or Haskell, where you specify as much as you can up front and have the compiler test that specification at build time. Late binding creates an explosion of potential complexity and catastrophic failures because it tends to kick the can down the road - the program fails in one place, but the bug shows up somewhere else because the interpreter is very permissive and assumes what you meant was whatever allows the program to continue running, even if it leads to a crash or bad output later.
Late binding is very useful - we need to assume some of it to have a live, interactive system instead of a punchcard batch process. And writing text and drawing pictures is "late binding" in the sense of the information being parsed by your eyes rather than a machine. But late binding also creates a large surface area where "anything can happen" and you don't know if you're staying in your specification or not.
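A toy sketch of that can-kicking (my own example, not from any real codebase): the typo below is accepted without complaint at definition time, and the failure only surfaces if and when the buggy branch actually runs - possibly far from where the mistake was made.

```python
class User:
    def __init__(self, name, is_admin):
        self.name = name
        self.is_admin = is_admin

def greeting(user):
    if user.is_admin:
        return "Hello, administrator " + user.nmae  # typo: 'nmae', bound late
    return "Hello, " + user.name

# the common path works fine, so tests that never exercise an admin miss the bug
print(greeting(User("Ada", False)))  # Hello, Ada

try:
    greeting(User("Grace", True))    # only now does the typo surface
except AttributeError as exc:
    print(exc)
```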
There are many examples, but let's speak for instance of the fact that Python has privacy by convention and not by semantics.
This is very useful when you're writing unit tests or when you want to monkey-patch a behavior and don't have time for the refactoring that this would deserve.
On the other hand, this means that a module or class, no matter how well tested and documented and annotated with types, could be entirely broken because another piece of code is monkey-patching that class, possibly from another library.
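A contrived sketch of what that can look like (the class and names are mine, not real library code): nothing stops some other module from rebinding an underscore-"private" method at runtime, silently changing behavior for every caller.

```python
class Greeter:
    def _format(self, name):          # 'private' by convention only
        return f"Hello, {name}!"

    def greet(self, name):
        return self._format(name)

print(Greeter().greet("Ada"))         # Hello, Ada!

# any code that can import Greeter can rewrite its internals
Greeter._format = lambda self, name: "something else entirely"
print(Greeter().greet("Ada"))         # something else entirely
```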
Is it the case? Probably not. But how can you be sure?
Another (related) example: PyTorch. Extremely useful library, as we have all witnessed for a few years. But that model you just downloaded (dynamically?) from Hugging Face (or anywhere else) can actually run arbitrary code, possibly monkey-patching your classes (see above).
Is it the case? Probably not. But how can you be sure?
Cue supply chain attacks.
That's what I mean by auditing for safety and security. With Python, you can get quite quickly to the result you're aiming for, or something close. But it's really, really, really hard to be sure that your code is actually safe and secure.
And while I believe that Python is an excellent tool for many tasks, I am also something of an expert in safety, with some experience in security, and I consider that Python is a risky foundation to develop any safety- or security-critical application or service.
There's also the argument that at a certain scale the time of a developer is simply more expensive than time on a server.
If I write something in C++ that does a task in 1 second and it takes me 2 days to write, and I write the same thing in Python that takes 2 seconds but I can write it in 1 day, the 1 day of extra dev time might just pay for throwing a more high performance server against it and calling it a day. And then I don't even take the fact that a lot of applications are mostly waiting for database queries into consideration, nor maintainability of the code and the fact that high performance servers get cheaper over time.
If you work at some big corp where this would mean thousands of high performance servers that's simply not worth it, but in small/medium sized companies it usually is.
Realistically something that takes 1 second in C++ will take 10 seconds (if you write efficient python and lean heavily on fast libraries) to 10 minutes in python. But the rest of your point stands
I spend most of my time waiting on IO, something like C++ isn't going to improve my performance much. If C++ takes 1ms to transform data and my Python code takes 10ms, it's not much of a win for me when I'm waiting 100ms for IO.
With Python I can write and test on a Mac or Windows and easily deploy on Linux. I can iterate quickly and if I really need "performance" I can throw bigger or more VPSes at the problem with little extra cognitive load.
I do not have anywhere near the same flexibility and low cognitive load with C++. The better performance is nice but for almost everything I do day to day completely unnecessary and not worth the effort. My case isn't all cases, C++ (or whatever compiled language you pick) will be a win for some people but not for me.
And how much code is generally written that actually is compute heavy? All the code I've ever written in my job is putting and retrieving data in databases and doing some basic calculations or decisions based on it.
Code is "compute heavy" (could equally be memory heavy or IOPs heavy) if it's deployed into many servers or "the cloud" and many instances of it are running serving a lot of requests to a lot of users.
Then the finance people start to notice how much you are paying for those servers and suddenly serving the same number of users with less hardware becomes very significant for the company's bottom line.
The other big one is reducing notable latency for users of your software.
Damn! Is the rule of thumb really a 10x performance hit between Python/C++? I don’t doubt you’re correct, I’m just thinking of all the unnecessary cycles I put my poor CPU through.
Outside cases where Python is used as a thin wrapper around some C library (simple networking code, numpy, etc) 10x is frankly quite conservative. Depending on the problem space and how aggressively you optimize, it's easily multiple orders of magnitude.
FFI into lean C isn't some perf panacea either, beyond the overhead you're also depriving yourself of interprocedural optimization and other Good Things from the native space.
Of course it depends on what you are doing, but 10x is a pretty good case. I recently re-wrote a C++ tool in python and even though all the data parsing and computing was done by python libraries that wrap high performance C libraries, the program was still 6 or 7 times slower than C++. Had I written the python version in pure python (no numpy, no third party C libraries) it would no doubt have been 1000x slower.
It depends on what you're doing. If you load some data, process it with some Numpy routines (where speed-critical parts are implemented in C) and save a result, you can probably be almost as fast as C++... however if you write your algorithm fully in Python, you might have much worse results than being 10x slower. See for example: https://shvbsle.in/computers-are-fast-but-you-dont-know-it-p... (here they have ~4x speedup from good Python to unoptimized C++, and ~1000x from heavy Python to optimized one...)
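Even without C++ at hand, the gap between interpreted loops and C-backed ones is easy to see with nothing but the standard library (a rough sketch; the exact ratio varies wildly by machine and Python version):

```python
import timeit

data = list(range(100_000))

def py_sum(xs):
    # the whole loop runs through the bytecode interpreter
    total = 0
    for x in xs:
        total += x
    return total

t_loop = timeit.timeit(lambda: py_sum(data), number=50)
t_builtin = timeit.timeit(lambda: sum(data), number=50)  # loop runs in C
print(f"pure-Python loop: {t_loop:.3f}s, built-in sum: {t_builtin:.3f}s")
```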
Last time I checked (which was a few years ago), the performance gain of porting a non-trivial calculation-heavy piece of code from Python to OCaml was actually 25x. I believe that performance of Python has improved quite a lot since then (as has OCaml's), but I doubt it's sufficient to erase this difference.
And OCaml (which offers a productivity comparable to Python) is noticeably slower than Rust or C++.
It really depends on what you're doing, but I don't think it is generally accurate.
What slows Python down is generally the "everything is an object" attitude of the interpreter. I.e. you call a function, the interpreter has to first create an object of the thing you're calling.
In C++, due to zero-cost abstractions, this usually just boils down to a CALL instruction preceded by a bunch of PUSH instructions in assembly, based on the number of parameters (and call convention). This is of course a lot faster than running through the abstractions of creating some Python object.
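For contrast, the standard library's `dis` module shows how much machinery even a trivial Python addition goes through (a sketch; exact opcode names vary across CPython versions, so treat the output as illustrative):

```python
import dis

def add(a, b):
    return a + b

# every '+' dispatches through the interpreter's generic binary-op
# machinery rather than compiling down to a single ADD instruction
dis.dis(add)
```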
> What slows Python down is generally the "everything is an object" attitude of the interpreter
Nah, it's the interpreter itself. Because it lacks JIT compilation, there is a hard performance ceiling it cannot surpass even in theory (as opposed to things like PyPy or GraalPy).
I don't think this is true: Other Python runtimes and compilers (e.g. Nuitka) won't magically speed up your code to the level of C++.
Python is primarily slowed down because of the fact that each attribute and method access results in multiple CALL instructions since it's dictionaries and magic methods all the way down.
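A rough illustration (plain Python, nothing exotic): instance attributes really do live in a per-object dictionary, and the dot is sugar for a magic-method call.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
print(p.__dict__)                          # {'x': 3, 'y': 4}
# 'p.x' is roughly sugar for this magic-method call:
print(type(p).__getattribute__(p, 'x'))    # 3
```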
Which can be inlined/speculated away easily. It won’t be as fast as well-optimized C++ (mostly due to memory layout), but there is no reason why it couldn’t get arbitrarily close to that.
How so? Python is dynamically typed after all and even type annotations are merely bolted on – they don't tell you anything about the "actual" type of an object, they merely restrict your view on that object (i.e. what operations you can do on the variable without causing a type error). For instance, if you add additional properties to an object of type A via monkey-patching, you can still pass it around as object of type A.
A function or piece of code is executed, say, a thousand times; the runtime collects statistics showing that object 'a' was always an integer, so it may be worthwhile to compile that code block to native code with a guard checking whether 'a' really is an integer (the check is very cheap). The speedup comes from skipping interpretation: the common case is made natively fast, and in the slow branch the complex case - "the + operator has been redefined", for example - can be handled by simply falling back to the interpreter. Python is no more dynamic than JavaScript (hell, Python is even strongly typed), and JavaScript hovers around the impressive 2x-native performance mark.
Also, if you are interested, “shapes” are the primitives of both Javascript and python jit compilers instead of regular types.
> it's a VM reading and parsing your code as a string at runtime.
Commonly it creates the .pyc files, so it doesn't really re-parse your code as a string every time. But it does check the source file's timestamp to make sure that the .pyc file is up to date.
On debian (and I guess most distributions) the .pyc files get created when you install the package, because generally they go in /usr and that's only writeable by root.
It does include the full parser in the runtime, but I'd expect most code to not be re-parsed entirely at every start.
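You can watch that caching happen explicitly with the stdlib's `py_compile` (a sketch; the module name and its contents are made up):

```python
import pathlib
import py_compile
import tempfile

# write a tiny module, then byte-compile it the way the interpreter
# does implicitly on first import
src = pathlib.Path(tempfile.mkdtemp()) / "mymod.py"
src.write_text("ANSWER = 42\n")
pyc_path = py_compile.compile(str(src))
print(pyc_path)  # a .pyc file under a __pycache__ directory
```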
The import thing is really slow anyway. People writing command-line tools have to defer imports to avoid huge startup times from loading libraries that are perhaps needed only by some functions that might not even be used in that particular run.
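The usual workaround looks something like this (a sketch; I use `json` as a stand-in for a genuinely heavy library like matplotlib):

```python
def main(args):
    if args and args[0] == "--plot":
        # deferred import: the heavy dependency is only paid for
        # when this code path is actually taken
        import json
        return json.dumps({"plotting": True})
    # the common path never pays the import cost
    return "fast path"

print(main([]))          # fast path
print(main(["--plot"]))  # {"plotting": true}
```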
That is true, but there are relatively few real-world applications that consist of only those operations. In the example I mentioned below, there were actually some parts of my Python rewrite that ended up faster than the original C++ code, but once everything was strung together into a complete application, those parts were swamped by the slow parts.
Most of the time these are arithmetic tight loops that require optimisations, and it's easy to extract those into separate compiled Cython modules without losing overall cohesion within the same Python ecosystem.
At some point, every engineer has heard this same argument but in favor of all kinds of dubious things such as emailing zip files of source code, not having tests, not having a build system, not doing IaC, not using the type system, etc.
I'm sure Rust was the wrong tool for the job in your case but I find this type of get shit done argument unpersuasive in general. It overestimates the value of short-term delivery and underestimates how quickly an investment in doing things "the right way" pays off.
If you're dealing in areas with short time limits then Python is great, because you can't sell a ticket for a ship that has sailed. And I've seen "the right way" which, again, depending on the business may result in a well-designed product that is not what's actually needed (because people are really bad at defining what they want).
What's brilliant with Python compared to other hacky solutions is that it does support tests, type hints, version control and other things. It just doesn't force you to work that way. But if you want to write stable, maintainable code, you can do it.
That means you can write your code without types and add them later. Or add tests later, once your prototype has been accepted. Or whenever something goes wrong in production, fix it and then write a test against that.
Oh, and I totally agree you should certainly try to "do things the right way", if the business allows it.
It is hard to believe that Python is objectively that much more productive than other languages. I know Python moderately well (with much more real world experience in C#). I like Python very much but I don't think it is significantly more productive than C#.
This. C#, Java, or even newcomers such as Kotlin and Go are in the same ballpark, thanks to REPL/Jupyter support alone - let alone when you consider the ecosystem.
If you are in a lab (natural science lab) or anywhere close to data, I bet you it is much more productive, even more so when you have to factor in that the code might be exposed to non-technical individuals.
The thing is that the short term is much easier to predict what you're going to need and where the value is, and in the long term you might not even work on this codebase anymore. Lot of incentives to get things done in the short term.
The business owner (whoever writes the checks) prefers "get shit done" over "the right way". Time to completion is a key factor in the payoff function of the dev's work.
The entire point of doing things the right way is that you end up delivering more value in the long term, and "long term" can be as soon as weeks or even days in some cases.
Business owners definitely prefer fewer bugs, fewer customer complaints, less support burden, fewer outages, fewer headaches. Corner cutting doesn't make economic sense for most businesses, and good engineering leadership doesn't have much trouble communicating this up the chain. The only environment where I've seen corner cutting make business sense is turd-polishing agencies whose business model involves dumping their mistakes on their clients and running away so the next guy can take the blame.
Try the travel/event booking business (where I'm in) - and no, people don't dump their mistakes on the next guy here. To the contrary, the "hacky" Python solutions are supported for years and teams stay for decades (although a decade ago we had not discovered how great Python was).
What business owners actually don't like at all is how long it takes traditional software development to actually solve problems - which then don't really fit the business after wasting a few years of resources... and the dumping and running away is worse in Java and other compiled software. With Python you can at least read the source in production if the team ran away...
> the dumping and running away is worse in Java and other compiled software. With Python you can at least read the source in production if the team ran away...
Java (and .NET, the two big "VM" platforms) is somewhat of a strange example for that; JVM bytecode is surprisingly stable, and reverse engineering it is reasonably easy unless the code was purposely obfuscated - a bad sign in any language anyway.
> underestimates how quickly an investment in doing things "the right way" pays off.
What time horizon should a startup optimize delivery for? Minutes, hours, days, weeks? Say you're a startup dev in a maximalist "get shit done now" mindset so you're skipping types, tests, any forethought or planning so you can get the feature of the week done as fast as possible. This makes you faster for one week but slower the week after, and the week after, and the week after that.
Say a seed stage startup aims for 12 months runway to achieve some key outcomes. That's still a marathon. It still doesn't make sense to sprint the first 200 meters.
> coworkers who churn out shiny new things at 10x the speed
Sounds like a classic web-dev perspective. My customers hate when we ship broken tools because it ruins their work, new-feature velocity be damned. We love our borrow checker because initially you run at 0.5x velocity but post-25kSLOC you get to run at 2x velocity, which continues to mystify managers worldwide.
With Python, testing, good hygiene and a bit of luck you can write code that is maybe 99% reliable. It is very, very hard to get to (100-eps)% for eps < 0.1% or so. Rust seems better suited to that.
Anything else, especially if there isn't a huge premium on speed, meh - Python is almost always sufficient, and not in the way.
I use the same combo: lots of Python to analyse problems, test algos, process data, etc. Then, once I settle on a solution but still need more performance (outside GPU's), I go to rust.
I'm simulating an audio speaker in real time. So I do the data crunching, model fitting, etc. in Python, and this gives me a good theoretical model of the speaker. But to be able to run the simulation in real time, I need lots of speed, so Rust makes sense there (moreover, the code I have to plug into is Rust too, so one more reason :-)). (Now, tbh, my real-time needs are not super hard, so I can avoid a DSP and a real-time OS :-))
I don't need rust specifically. It's just that its memory and thread management really help me to continue what I do in python: focusing on my core business instead of technical stuff.
My most successful career epiphany was realizing that everyone -- my customers, my boss, etc -- was happier if I shipped code when I thought it was 80% ready. That long tail from 80-100% generates a lot of frustration.
It's just an application of the Pareto principle. That last 20% of work to make perfect software costs a lot of time. Customers (and by extension, management) do not care how pretty your code is, how perfect your test coverage is (unless your manager is a former developer, then they might have more of an opinion), they care most that you ship it. Bugs are a minor irritation compared to sitting around waiting for functionality they need, as long as you're responsive in fixing the bugs that do come up.
Thanks. I thought that is what you meant but another possible take was that the last 20% is actually important. Getting something 80% finished is fast and then the long tail to get it to 100% is frustrating for everyone because the work, in theory is finished. I think that can happen as well.
Of course there are at least three dimensions to discuss here: internal quality, external quality and product/feature fit. Lower quality internal code eventually leads to slower future development and higher turnover as no one wants to work with the crappy code base. Lower external quality (i.e. bugs) can lead to customers not liking your product. Interestingly the relationship between internal and external quality is not as direct as one might think. Getting features out the door more quickly (at the expense of other things) can help with product fit. Essentially, like most things, this is an ongoing optimization problem and different approaches are appropriate for different problem domains.
That is interesting. I went in the other direction :)
I am tired of having to refactor shiny new things churned out at 10x the speed that keep breaking in production. These days, if given a choice, I prefer writing them in Rust, spending more time writing up front and less time refactoring everything as soon as it breaks or needs to scale.
When the pointer chasing (sometimes) comes in handy, is once you have a successful business with a lot of data and/or users, and suddenly the cost of all those EC2 instances comes to the attention of the CFO.
That's when rewriting the hot path in Go or Rust or Java or C or C++ can pay off and make those skills very valuable to the company. Making contributions to databases, operating systems, queueing systems, interpreters, Kubernetes etc. also falls into that category.
But yeah if you are churning out a MVP for a new business, yeah starting with Python or Ruby or Javascript is a better bet.
(Erlang/Elixir is also an interesting point in the design space, as it's very high level and concise, but also scales better than anything else, although not especially efficient for code executing serially. And Julia offers the concision of Python with much higher performance for numerical computing.)
Or there are programmers who write both. Something that I want to write once, have run on several different platforms, handle multi-threading nicely, and never have to think about again? Rust. Writing something to read in some data to unblock an ML engineer or make plots for management? Definitely not Rust, probably python. Then you can also churn out things at 10x the speed, but by writing the tricky parts in something other than python, you don't get dragged back down by old projects rearing their ugly heads, so you outpace the python-only colleagues in the long-term.
Programming is secondary to my primary duties and only a means for me to get other things done. I'm in constant tension between using Python and Rust.
With Python I can get things up and going very quickly with little boilerplate, but I find that I'm often stumbling on edge cases that I have to debug after the fact and that these instances necessarily happen exactly when I'm focused on another task. I also find that packaging for other users is a major headache.
With Rust, the development time is much higher for me, but I appreciate being able to use the type-system to enforce business logic and therefore find that I rarely have to return to debug some issue once I have it going.
It's a tough trade-off for me, because I appreciate the velocity of Python, but Rust likely saves me more time overall.
If you're 'tired of chasing pointers', Rust's a lot closer to (and I'd argue better than) Python than say Go - it'll tell you where the issue is and usually how to fix it; Go will just blow up at run time. (Python (where applicable) will do something unexpected and wrong but potentially not error (..great!))
I completely agree - but you say that like it's a bad thing. I work as a developer alongside data scientists, who might have strong knowledge of statistics or machine learning frameworks rather than traditional programming chops.
For the most part they don't need to know about concurrency, memory efficiency etc, because they're using a library where those issues have been abstracted away.
I think that's what makes python ideal - its interoperability with other languages and library ecosystem means less technical people can produce good, efficient work without having to take on a whole bunch of the footguns that would come from working directly in a language like C++ or Rust.
But this is a false dichotomy. The space of options isn't C++/Rust or Python. There are languages which attempt to give the best of both worlds, e.g. Julia.
> they're using a library where those issues have been abstracted away.
I work in Python, and while libraries like numpy have certainly abstracted away some of those issues, there's still so much performance left on the table because Python is still Python.
Oh, I'm familiar with numba and while it certainly helps, it has plenty of its own issues. You don't always get a performance gain, and you only find this out at the end of a refactoring. Your code can get less readable if you need to transport data in and out of formats that it's compatible with (looking at you, List()).
To say nothing of adding yet another long dependency chain to the language (python 3.11 is still not supported even though work started in Aug of last year).
I do wonder if the effort put into making this slow language fast could have been put to better use, such as improving a language with python's ease of use but which was built from the beginning with performance in mind.
I've rewritten real world performance critical numpy code in C and easily gotten 2-5x speedup on several occasions, without having to do anything overly clever on the C side (ie no SIMD or multiprocessing C code for example).
Did you rewrite the whole thing or just drop into C for the relevant module(s)? Because the ability to chuck some C into the performance critical sections of your code is another big plus for Python.
But... pretty much any language can interoperate with C; its calling conventions have become the universal standard. I mean, I still remember at $previousJob when I was deprecating a C library and carefully searched for any mention of the include file... only to discover that a whole lot of Fortran code depended on the thing I was changing, and I had just broken all of it (since Fortran doesn't use include files the same way, my search for "#include <my_library" didn't return any hits, but the function calls were there nonetheless).
Julia, to use the great-great-grand-op's example, seems to also have a reasonably easy C interop (I've never written any Julia, so I'm basing this off skimming the docs, dunno, it might actually be much more of a pain than it looks like here).
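That universal C calling convention is exactly what Python's stdlib ctypes leans on. A minimal sketch, assuming a Unix-ish C math library is available (the library name lookup and the "libm.so.6" fallback are platform-dependent):

```python
import ctypes
import ctypes.util

# Locate the C math library; find_library may return None on minimal
# systems, so fall back to the usual glibc soname on Linux.
name = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(name)

# ctypes assumes int arguments/returns by default, so declare the real
# C signature before calling.
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # 1.0
```

The same mechanism works for your own compiled .so/.dll, which is what makes "drop the hot loop into C" such a routine move in Python codebases.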
I’ve done the same but moved from vanilla numpy to numba. The code mostly stayed the same and it took a couple hours vs however long a port to C or Rust would have taken.
For a package whose pitch is "Just apply one of the Numba decorators to your Python function, and Numba does the rest." a few hours of work is a long time.
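For context, the pattern the pitch describes really is a one-line decorator. A sketch, with a fallback so it runs even without numba installed (the fallback decorator is my own addition, not part of numba):

```python
try:
    from numba import njit  # JIT-compiles the function to machine code
except ImportError:
    def njit(func):  # fallback: plain Python, same semantics, no speedup
        return func

@njit
def sum_of_squares(n):
    # A tight numeric loop: exactly the shape of code numba handles well.
    total = 0
    for i in range(n):
        total += i * i
    return total
```

The "few hours" usually go not into adding the decorator but into reshaping data so the decorated function only ever sees types numba supports.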
A 2-5x speedup is not a lot; I would say it's not worth rewriting from Python to C unless you get an order-of-magnitude improvement.
Because if you compare the benefit against the cost of the rewrite, the cost of maintaining/updating the C code, and the possible C footguns like manual memory management, then there is no benefit left.
I highly doubt that numpy can ever be a bottleneck. In a typical Python app there are other things, like I/O, that consume resources and become the bottleneck before you run into numpy's limits and can justify a rewrite in C.
I haven't personally run into IO bottlenecks so I have no idea how you would speed those up in Python.
But there's two schools of thoughts I've heard from people regarding how to think about these bottlenecks:
1. IO/network is such a bottleneck so it doesn't matter if the rest is not as fast as possible.
2. IO/network is a bottleneck so you have to work extra hard on everything else to make up for it as much as possible.
I tend to fall in the second camp. If you can't work on the data as it's being loaded and have to wait till it's fully loaded, then you need to make sure you process it as quickly as possible to make up for the time you spend waiting.
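And when you *can* work on the data as it arrives, even the stdlib lets you overlap IO with processing: a reader thread fills a bounded queue while the main thread consumes it. A sketch (function and parameter names are invented for illustration):

```python
import queue
import threading

def process_stream(chunks, work):
    """Overlap 'IO' (the chunks iterable) with processing: a reader thread
    fills a bounded queue while the main thread applies `work` to each chunk."""
    q = queue.Queue(maxsize=4)   # bounded, so the reader can't run far ahead
    DONE = object()              # sentinel marking end of input

    def reader():
        for chunk in chunks:
            q.put(chunk)
        q.put(DONE)

    threading.Thread(target=reader, daemon=True).start()

    results = []
    while (chunk := q.get()) is not DONE:
        results.append(work(chunk))
    return results
```

For CPU-bound `work` the GIL limits how much this buys you, but for IO-bound reads (network, disk) the overlap is real.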
In my typical python apps, it's 0.1-20 seconds of IO and pre-processing, followed by 30 seconds to 10 hours of number crunching, followed by 0.1-20 seconds of post processing and IO.
2-5x speedup barely seems worth re-writing something for, unless we're talking calculations that take literally days to complete, or you're working on the kernel of some system that is used by millions of people.
> For the most part they don't need to know about concurrency [...]
In my opinion, this is the part that Go got mostly right. Concurrency is handled by the runtime, and held behind a very thin veil. As a programmer you don't really need to know about it, but it's there when you need to poke at it directly. Exposing channels as a uniform communication mechanism has still enough footguns to be unpleasant, though.
In an ideal world, I should be able to decorate a [python] variable and behind the scenes the runtime would automatically shovel all writes to it through an implicitly created channel. Instead of me as a coder having to think about it. Reads could still go through directly because they are safe.
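You can approximate that wish in today's Python with a queue drained by a single writer thread; everything here (the ChannelVar name and its set/get/flush API) is invented for illustration, not an existing library:

```python
import queue
import threading

class ChannelVar:
    """Sketch of the idea: every write goes through a queue drained by one
    writer thread (the implicit 'channel'), so concurrent writers never race
    on the value itself; reads go direct, as the comment suggests."""

    def __init__(self, initial):
        self._value = initial
        self._writes = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            self._value = self._writes.get()   # applied in FIFO order
            self._writes.task_done()

    def set(self, value):
        self._writes.put(value)   # writers never touch _value directly

    def get(self):
        return self._value        # reads are direct

    def flush(self):
        self._writes.join()       # wait until all queued writes are applied
```

The pain point the commenter identifies is real, though: doing this by hand per variable is exactly the boilerplate a decorator-plus-runtime feature would hide.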
If I could have Python syntax and stdlib, with Go's net/http and crypto libraries included, and have concurrency handled transparently in Go-style without having to think about it, that would be pretty close to an all-wishes-come-true systems language. Oh, and "go fmt", "go perf" and "go fuzz" as first-class citizens too.
Someone else in this thread brought up the idea of immutable data structures as a default. I wouldn't mind that. Python used to have frozenset (technically it still does but I haven't seen a performance difference for a while), so extending the idea of freeze()/unfreeze() to all data types certainly has appeal.
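The freeze() idea can already be approximated today with frozenset and frozen dataclasses; a quick sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
# p.x = 3.0   # would raise dataclasses.FrozenInstanceError

s = frozenset({1, 2, 3})
# s.add(4)    # would raise AttributeError: frozensets have no mutators
merged = s | {4}   # "mutation" instead returns a new value
```

What Python lacks is exactly what the comment asks for: a uniform way to get a frozen view of *any* object, rather than per-type immutable siblings.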
In fact, progress in computing has always been built on adding levels of abstraction; just think of assembly language and punch-tape programming. Those days are not so long past.
> without having to take on a whole bunch of the footguns that would come from working directly in a language like c++ or Rust.
Don't forget the footguns of working with developers who do those things. Ask them to do something simple and you get something complex and expensive after months of back and forth about what is wanted. You're likely to get a framework for a one-off SQL query.
I hear it being said already, "You're using software developers wrong!" Well, maybe software developers shouldn't be so hard to use?
> maybe software developers shouldn't be so hard to use?
This whole take assumes bad intention on both sides. Nobody's job is easy in this situation. Leadership's job is to set everyone up for success. If things go off the rails and end up with months of back and forth leading to nobody being happy despite good intentions and honest effort, then the problem lies with leadership.
Sure thing! Footguns might be the wrong word, and I know as a low level language Rust is insanely safe, but for a high level developer its type system is gonna mean spending a lot of time in the compiler figuring out type errors, at least initially. That might not be a traditional footgun, but if you're just trying to, I dunno, build a crud api or something, it's gonna nuke your development time.
Please don't read this as "rust is difficult and bad", I definitely don't think it is! But it's a low level language, and working with it means dealing with complexity that for some tasks just might not be relevant.
I agree, but for something like the CRUD app example I made bringing in pydantic or something would solve that. Rust's type system is a lot stricter because it's solving problems in a space that doesn't touch a lot of Python developers.
>and they don’t care - or know about - concurrency, memory efficiency, L2 cache misses due to pointer chasing.
Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax. So there's not much incentive for me to switch to a more efficient language when as near as I can tell the best tooling for dealing with SIMD or weird gpu bullshit seems to be being created for python developers. If you want to write fast code do it in c/rust/whatever, if you want to write really fast code do it in python with <some-ML-library>.
For a very specific definition of the word "fast" at least.
> Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax. So there's not much incentive for me to switch to a more efficient language when as near as I can tell the best tooling for dealing with SIMD or weird gpu bullshit seems to be being created for python developers. If you want to write fast code do it in c/rust/whatever, if you want to write really fast code do it in python with <some-ML-library>.
Rather unfortunately, my current bugbear is that Pytorch is... slow. On the CPU. One of the most common suggestions for people who want stable diffusion to be faster is, wait for it, "Try getting a recent Intel CPU, you'll see a real uplift in performance".
This despite the system only keeping a single CPU core busy. Of course, that's all you can do in Python most of the time.
(You can also use larger batch sizes. But that only partially papers over the issue, and also it uses more GPU memory.)
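For the single-busy-core problem specifically, the usual stdlib workaround is to sidestep the GIL with processes rather than threads. A minimal sketch (function names are mine; this is not how PyTorch itself parallelizes):

```python
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # CPU-bound work: CPython threads can't parallelize this because of the
    # GIL, but separate processes each get their own interpreter and core.
    return sum(i * i for i in range(n))

def crunch_many(jobs):
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(crunch, jobs))

if __name__ == "__main__":
    print(crunch_many([100_000] * 4))
```

The catch, and why "most of the time" in the comment above is fair: every argument and result is pickled across the process boundary, which can eat the gains for large tensors.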
Your OS, the linear algebra libraries themselves, much of the user-facing software that you use (latency sensitive rather than throughput sensitive), image/video encoding/decoding, most of the language runtimes that you use, high volume webservers, high volume data processing (where your data is not already some nice flat list of numbers you're operating on with tensor operations), for some examples.
Really, for almost any X, somebody somewhere has to do X with strict performance requirements (or at very large scale, so better perf == savings)
Most of these python libraries are only fast for relatively large and relatively standard operations in the first place. If you have a lot of small/weird computations, they come with a ton of overhead. I've personally had to write my own fast linear algebra libraries since our hot loop was a sort of modified tropical algebra once.
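For the curious, plain tropical (min-plus) matrix multiplication is only a few lines of Python; the commenter's "modified" variant is unspecified, so this is the textbook version:

```python
def tropical_matmul(A, B):
    # (min, +) semiring product: C[i][j] = min over k of A[i][k] + B[k][j].
    # It's ordinary matmul with * replaced by + and sum replaced by min.
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[min(A[i][k] + B[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]
```

Each element costs Python-level work, which is exactly the kind of small, nonstandard computation where numpy's per-call overhead can dominate and a hand-rolled native kernel starts to pay off.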
They asked for examples of non-numpy/tf/jax use cases and I gave some, including my own experience? No disagreement that HPC Python in practice is heavily biased towards numpy and friends.
Your comment is super interesting because it suggests Python has evolved in a direction opposite to the Python Paradox - http://www.paulgraham.com/pypar.html
Whereas before you could get smarter programmers using Python, now because of the exponential growth of Python, the median Python programmer is likely someone with little or no software engineering or computer architecture background who is basically just gluing together a lot of libraries.
Neat observation. I wasn't doing much programming in 2004, but, I'm guessing 2004 Python would be like today's Rust. People learn it because they love it.
I think that's even more true of Rust than of Python in 2004, since Rust has a pretty steep learning curve and does require a non-trivial amount of dedication to learn.
> It’s frequently the second best language but it’s the second best language for anything.
This myth wasn't even true many years ago, it certainly isn't true today. You can build a mobile app, game, distributed systems, OS, GUI, Web frontend, "realtime" systems, etc in Python, but it is a weak choice for most of those things (and many others) let alone the second best option.
The saying does not mean that in a rigorous evaluation Python would be second best out of all programming ecosystems for all problems.
The saying means that for any given problem, there is a better choice, but second best is the language you know which has all of the tools to get the job done, so the answer is probably just a bunch of pip installs, imports, and glue code.
It’s kind of like “the best camera is the one you have with you” — it’s a play on the differing definitions of “best” to highlight the value of feasibility over technical perfection.
When I switched from PHP to Python years ago I had the same feeling as the OP, then it became the third best, then the fourth, then situational when object-orientation makes sense, then for just scripting, and now... unsure beyond a personal developer comfort/productivity preference. TUIs and GUIs built on Python on my machine seem to be the first things to have issues during system upgrades because of the package management situation.
Anything that doesn't require high performance that is. Is there any 3D game engine for python yet? I guess Godot has gdscript which is 90% python by syntax, but that doesn't quite count I think.
You won't get high performance out of Python directly, but there are a lot of Python libraries that use C or a powerful low level language underneath. The heavy lifting in so much of machine learning is CUDA, but most people involved in ML are writing Python.
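You can see the "heavy lifting happens in C" effect even inside the stdlib: the builtin sum runs its loop in C, while the equivalent explicit loop pays Python-level interpretation per element. A tiny, machine-dependent illustration (absolute numbers vary; the gap is the point):

```python
import timeit

n = 10_000
# Bytecode interpreted once per iteration:
py_loop = timeit.timeit("t = 0\nfor i in range(n): t += i",
                        globals={"n": n}, number=200)
# The loop itself runs inside the C implementation of sum():
c_loop = timeit.timeit("sum(range(n))",
                       globals={"n": n}, number=200)
print(f"python loop: {py_loop:.3f}s  builtin sum: {c_loop:.3f}s")
```

numpy, CUDA kernels, and friends are the same idea scaled up: the Python layer just dispatches.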
Sure, but that's not really python per se. One could also call C++ libraries from java via JNI and pretend java is super fast.
If people write program logic in python it will run at python speeds. Otherwise you're not really writing python, like nobody says some linux native program is bash because it happens to be launched from a bash script.
> Sure, but that's not really python per se. One could also call C++ libraries from java via JNI and pretend java is super fast.
But that's how every scripting language obtains good-not-just-decent performance. A strong culture of dropping down to C for any halfway-important library is why PHP's so hard to beat in real-world use, speed-wise (whatever its other shortcomings).
Java is super fast though, it almost never uses JNI as it doesn’t need it as opposed to Python. It uses JNI for integrating with the C world (e.g. opengl bindings).
Python isn't a joke either. I'm a full-on programmer who started with C and branched out to several other languages, and I'd still pick Python for a lot of new tasks, even things that aren't little scripts. Or NodeJS, which has similar properties but has particular advantages for web backends.
I’ve been a Python developer for 15 years, and Python might have been the second best language for anything when I started my career, but there are so many better options for just about any domain except maybe data science. Basically for any domain that involves running code in a production environment (as opposed to iterating in a Jupyter notebook) in which you care about reliability or performance or developer velocity, Python is going to be a pretty big liability (maybe it will be manageable if you’re just building a CRUD app atop a database). Common pain points include performance (no you can’t just multiprocess or numpy your way out of performance problems), packaging/deployment, and even setting up development environments that are reasonably representative of a production environment (this depends a lot on how you deploy to production—I’m sure lots of people have solved this for their production environment).