Rust is one of those languages that I just can't see myself ever working with (I mostly do web dev), but it's a language I'm really glad I learned, and one that has made me a better developer overall.
I kinda felt similarly until I found one of my existing web services wasn't quite fast enough (currently serving around 1,000 req/s at peak). I could either go for a third rewrite in yet another framework, hoping for maybe a 30% performance gain, or I could try something entirely new.
My first attempt with Rust more than doubled the performance of the existing system on the same hardware. Now I feel a need to learn more.
The system's peak _usage_ is 60k req/min (i.e., 1k req/s); the Python code it's written in can handle maybe 850 req/s per instance at 100% CPU.
Rust is doing fine: it's managing almost 2k req/s on the same hardware. And that's while I know practically nothing about Rust; I expect I could do better given time.
It's acting as a simple filter and cache over JSON data on Postgres.
I’m shocked that Rust only gave you a 2X improvement over Python (I’ve rewritten a handful of Python services to Go and typically see 100-1000X improvement). What is the bottleneck? Was scaling horizontally an option?
I suspect the reason it's not faster is one of the following:
1. backend DB at its query limit
2. this is basically my first rust program and I've done something stupid
3. my load-testing tools (ab/wrk) aren't scaling properly themselves (I know, not likely, but worth considering)
4. bandwidth/IO limits on the AWS instance types I'm running this on
There are other possibilities too I guess. I was mostly doing this as a means to learn Rust though, I'd only go live with it once I had decent tests and everything else.
actix/r2d2 on the Rust side (I haven't worked out bb8 yet); aiohttp/uvloop/asyncpg/gunicorn on the Python side, with Postgres for the DB. This kinda _is_ the cache at present; it's a distillation of a lot of other data into a given form. Rust's async story re: databases still looks a little up in the air from an outside perspective, though, so I guess I'll wait and see how it pans out.
It's definitely a work in progress and I didn't expect this much interest in my offhand "I tried Rust even though I couldn't see myself using it for real, but..." comment.
Interestingly, I was able to get an amazing speed improvement without going completely to Rust.
I didn't want to port the APM code as well, so I kept node.js/express for routing. A very simple middleware immediately passed the request and body buffer to Rust for handling, and Rust returned a buffer to express for sending the response.
It's easily able to hit 100k RPM on a single instance before hitting a CPU bottleneck in the Rust code (validating an RSA signature). It only needs to handle 90k RPM total.
I have multiple instances to keep latency down overall (and as a safeguard should we lose an availability zone or whatever else). But, as others have mentioned, instances cost money, so I'd like to reduce the count to the minimum if possible.
Yeah, if I was still doing primarily web applications, particularly version 1 apps where you don't know what the final product will actually be, I would probably still be using Rails for that. It's just insanely fast for creating those apps and once you've found the features that are hits that need to scale, you can pull those pieces out into services written in languages like Rust or Go. But since I mostly do lower level stuff now, Rust is looking very compelling.
Rocket requires the nightly Rust compiler, which isn't something I find acceptable for production applications, and it's synchronous, so it's really slow. I have been rather disappointed by what I perceive as the author's unwillingness to work towards stable Rust. They have a GitHub issue where they track all of their dependencies on nightly, and kind of just say they aren't going to do anything about it -- it's up to the Rust core team to just make all the things stable that Rocket is using, or else it will stay on nightly forever.
Rocket's biggest feature is their flashy website, in my opinion. Their website is really nice looking and the self-assured marketing is convincing to readers.
Actix-Web seems like a more mature framework (I don't expect my application to randomly break), it's asynchronous (so it's really fast) and it works on stable Rust. Warp and Tower Web look like promising async frameworks, but they're not very mature yet. Rouille is a pretty stable option for a simple, synchronous web framework.
It's not that he's unwilling to make it run on stable, it's that he has to sacrifice ergonomics to make it run on stable. Despite it not yet being async, Rocket is my go-to because it has the best ergonomics. Something like actix-web may be more performant, but its ergonomics are poor in comparison.
Regardless, Rust is stabilizing procedural macros which just leaves the never type as the last stabilization required for Rocket to be able to run on stable Rust. Additionally, I believe their next release is targeting the rewrite to async. A lot of it has to do with Rust firming up its own story around async.
> It's not that he's unwilling to make it run on stable, it's that he has to sacrifice ergonomics to make it run on stable.
Two sides of the same coin. I clearly disagree about the amount of ergonomics that would have to be "sacrificed" in order to make the library usable in a production environment (on stable). So, it's simply an unwillingness to bring the library to stable, from my point of view.
This has very similar ergonomics to Rocket in that it allows decorating handlers with their route, but it runs on Stable Rust. If you want to use `async fn`, that requires nightly for now since that's literally the nightly syntax for an async function, but the route decorators work on stable. As I previously mentioned, Tower Web is not mature, so I would not recommend it at this stage, but it shows what is possible.
I don't personally think that decorating handlers with routes is significantly more ergonomic than defining a table of contents somewhere else, like Actix does it. Defining routes is usually a very small part of your code that you do once and move on. Beyond that, what ergonomics are we talking about?
Anecdotally, I've been hearing claims that Rocket will run on stable "real soon now" for quite a while now. It just doesn't look like it's going to happen.
The never type dependency can be worked around with a crate. The hygiene and error APIs, though, aren't going to be stable soon.
In theory it looks like Rocket could use that crate, accept worse error messages, and build on stable. Sergio would have to weigh in on that, though; I could be wrong.
That was definitely true before Rust 1.15, but it becomes even less beneficial with each new stable release.
I work for one such prominent early adopter of Rust, and we were very happy to move to stable sometime back. Stable versions are widely used, so issues are much more likely to be noticed, and fixes are backported to the current stable release.
If you pick a random nightly version, you have much less support to begin with, simply because fewer people use it. And you're on nightly because you depend on features that haven't been as thoroughly tested as those on stable, and which could disappear entirely in the next nightly. Each upgrade to a different nightly increases the risk of introducing some hard-to-find bug into production.
Some companies do use nightly Rust in production. You're completely correct. It's just not something that I would personally be willing to do unless I had absolutely no reasonable alternative.
I don't understand the point you're trying to make. In networked stuff, the (a)synchronicity absolutely does make something fast or slow.
Suppose you have a synchronous web server with 4 threads; synchronous naturally means that each thread can only handle one request at a time. Each request can take dozens or hundreds of milliseconds to complete, just by being bottlenecked on latency. If each request takes 100 ms, that server tops out at 40 requests per second.
If that server were asynchronous, each thread could handle thousands of simultaneous connections, making the performance literally thousands of times better at a minimum.
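The arithmetic behind that ceiling can be sketched out. The inputs here (4 threads, 100 ms per request) are just the hypothetical numbers from above, not measurements:

```rust
// Back-of-the-envelope throughput ceiling for a blocking,
// thread-per-request server.

fn max_sync_throughput(threads: u64, request_latency_ms: u64) -> u64 {
    // One blocked thread completes (1000 / latency) requests per second;
    // N threads raise that ceiling N-fold.
    threads * (1000 / request_latency_ms)
}

fn main() {
    println!("{} req/s", max_sync_throughput(4, 100)); // 4 * 10 = 40 req/s
}
```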
If you suggest spinning up an unlimited number of OS threads instead, that quickly runs into problems: most OSes can only handle a couple of thousand threads before you hit limits that you have to adjust, via knobs that aren't always easy to find. Even then, each thread takes a significant amount of time to start and stop, and consumes much larger amounts of RAM.
An alternative solution is green threading, which Go calls Goroutines, for example. In such a system, the asynchronous operations are handled by the underlying runtime and your code can treat them as synchronous and just spawn new "threads".
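To illustrate the core idea, here is a toy sketch (in Rust with only the standard library, since that's the topic): many cheap "tasks" multiplexed over a small, fixed pool of OS threads. A real green-thread runtime like Go's also parks tasks that are blocked on I/O and resumes them later, which this deliberately omits; all names here are made up for the example.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// A "task" is just a boxed closure here; in a real runtime it would be a
// resumable green thread or future.
type Task = Box<dyn FnOnce() + Send>;

// Run all tasks on a fixed number of OS threads, many tasks per thread.
fn run_on_pool(os_threads: usize, tasks: Vec<Task>) {
    let (tx, rx) = mpsc::channel::<Task>();
    let rx = Arc::new(Mutex::new(rx));

    let workers: Vec<_> = (0..os_threads)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // Each worker pulls the next ready task; once the channel
                // is closed and drained, the worker exits.
                let task = match rx.lock().unwrap().recv() {
                    Ok(t) => t,
                    Err(_) => break,
                };
                task();
            })
        })
        .collect();

    for t in tasks {
        tx.send(t).unwrap();
    }
    drop(tx); // close the channel so the workers can finish

    for w in workers {
        w.join().unwrap();
    }
}

fn main() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    let done = Arc::new(AtomicUsize::new(0));
    // 10,000 tasks multiplexed over just 4 OS threads.
    let tasks: Vec<Task> = (0..10_000)
        .map(|_| {
            let done = Arc::clone(&done);
            Box::new(move || {
                done.fetch_add(1, Ordering::Relaxed);
            }) as Task
        })
        .collect();
    run_on_pool(4, tasks);
    println!("completed: {}", done.load(Ordering::Relaxed));
}
```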
If you disagree, I would love to hear a more detailed response, because I can't see how your claim could be true.
All network I/O is inherently asynchronous of course, so the sync vs async debate is more about whether to use the blocking abstraction the kernel provides versus bringing your own or writing async code directly.
Using async I/O or a userland abstraction like green threads necessarily means you're moving the I/O scheduling work into userland. Sometimes this might be the right call, but it's effectively a bet that your userland scheduler can do a better job than the kernel's own scheduler.
This bet might pay off in highly specialised cases where your userland scheduler is well tuned to your workload, but the majority of userland schedulers (to name a few examples: the Go runtime, node.js's libuv) are also aimed at general purpose workloads and in many cases are far less mature than the kernel's scheduler.
There's been a massive amount of engineering work that's gone into the Linux kernel's various scheduling algorithms over the years. Pathological edge cases resulting in starvation or other performance issues have largely been identified and worked out. The various schedulers are also highly tuneable to the specifics of your workload if you need to do that.
These days OS threads are a totally viable option for building highly concurrent network services on Linux. Spawning hundreds of thousands of threads works fine and is fast enough for most applications. While threads will have 8MB of virtual address space reserved for their stack by default, this is just 'on paper' memory use - no pages are actually allocated until you use them.
> These days OS threads are a totally viable option for building highly concurrent network services on Linux. Spawning hundreds of thousands of threads works fine and is fast enough for most applications.
I don't agree. If you're at the point where you need to handle such large numbers of concurrent connections, my own experience and public benchmarks show it is not fast enough. Why are none of the top performers on the TechEmpower benchmarks using a thread per connection? The simplest answer is that they can't, or they would; reimplementing so much scheduling in userland would otherwise be a waste of time.
More anecdotally, I have been unable to get my Fedora laptop to spawn more than a few tens of thousands of threads before the kernel starts telling every process that attempts to spawn a new thread or process "Resource temporarily unavailable" (EAGAIN), until the offending process is stopped. I've googled the issue extensively, tweaked countless knobs, and come up empty handed. I'm certain the Linux kernel can spawn more threads than that, but my point still stands that it is nontrivial.
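For anyone hitting the same wall, these are the kernel limits that most commonly cap thread creation on Linux; a quick diagnostic sketch that reads them from /proc (the paths are standard, but the list is not exhaustive):

```rust
use std::fs;

// Read one kernel limit from /proc, if available.
fn read_limit(path: &str) -> Option<String> {
    fs::read_to_string(path).ok().map(|s| s.trim().to_string())
}

fn main() {
    for path in [
        "/proc/sys/kernel/threads-max", // system-wide thread ceiling
        "/proc/sys/kernel/pid_max",     // every thread consumes a PID
        "/proc/sys/vm/max_map_count",   // each thread's stack is a mapping
    ] {
        match read_limit(path) {
            Some(v) => println!("{path} = {v}"),
            None => println!("{path} unavailable (not Linux?)"),
        }
    }
    // RLIMIT_NPROC (`ulimit -u`) also applies per user, and on systemd
    // desktops the cgroup pids limit (TasksMax) is another frequent
    // culprit behind EAGAIN.
}
```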
In theory, that's a great perspective you presented. However, the async interfaces into the kernel exist for very real reasons.
I would also challenge you (if you're bored) to write a C, C++, or Rust program that spawns 1 or 2 million threads that do nothing but sleep for a very long time, and write a similar program in Go. See how the two compare in how long it takes to launch all of the tasks, and see how much memory they use. Ideally, you would also use, say, Rust futures + tokio and spawn a million sleeping futures to compare there too, but futures and tokio aren't the most intuitive or well documented things yet, in my opinion.
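A scaled-down sketch of that experiment, using only the Rust standard library: 10,000 threads rather than a million or two, so it runs under default OS limits, with a 200 ms sleep standing in for "a very long time". Pushing the count up toward a million is exactly where the limits and memory costs discussed in this thread start to bite.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Spawn n OS threads that do nothing but sleep, and time how long the
// spawning itself takes.
fn time_thread_spawn(n: usize) -> (Duration, Vec<thread::JoinHandle<()>>) {
    let start = Instant::now();
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| thread::sleep(Duration::from_millis(200))))
        .collect();
    (start.elapsed(), handles)
}

fn main() {
    let (elapsed, handles) = time_thread_spawn(10_000);
    println!("spawned {} threads in {:?}", handles.len(), elapsed);
    for h in handles {
        h.join().unwrap();
    }
}
```

Watching the process's RSS while this runs (versus the equivalent goroutine version) is the interesting part of the comparison.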
I've done similar tests in the past, and OS threads are not a lightweight resource.
You don't need a million threads to be really fast; you need enough threads for your load. If you need to handle 1,000 requests per second with each request taking 10 ms on average, only about 10 requests are in flight at any moment, so even a few dozen threads gives you plenty of headroom. That's an absolutely manageable number of threads for the OS. Writing this code in an async style won't gain you anything, because the bottleneck will be the database, another service, or disk I/O. A million threads is a very rare case.
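That estimate is just Little's law (average concurrency = arrival rate * average latency); a trivial sketch with the numbers above:

```rust
// Little's law: average number of requests in flight at once.
fn avg_in_flight(requests_per_sec: f64, latency_secs: f64) -> f64 {
    requests_per_sec * latency_secs
}

fn main() {
    // 1000 req/s at 10 ms each: only ~10 requests are in flight on
    // average, so a modest pool of blocking threads leaves generous
    // headroom before threads become the bottleneck.
    println!(
        "average in-flight requests: {}",
        avg_in_flight(1000.0, 0.010)
    );
}
```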