Rust is one of those languages that I just can't see myself ever working with (I mostly do web dev), but it's a language I'm really glad I learned, and one that has made me a better developer overall.
I kinda felt similarly until I found one of my existing web services wasn't quite fast enough (currently serving around 1,000 req/s at peak). I could either go for a third rewrite in yet another framework, hoping for maybe a 30% performance gain, or I could try something entirely new.
My first attempt with Rust more than doubled the performance of the existing system on the same hardware. Now I feel a need to learn more.
The system's peak _usage_ is 60k req/min (i.e., 1k req/s); the Python code it's written in can handle maybe 850 req/s per instance at 100% CPU.
Rust is doing fine: it's managing almost 2k req/s on the same hardware. And that's while I know practically nothing about Rust; I expect I could do better given time.
It's acting as a simple filter and cache over JSON data on Postgres.
I’m shocked that Rust only gave you a 2X improvement over Python (I’ve rewritten a handful of Python services to Go and typically see 100-1000X improvement). What is the bottleneck? Was scaling horizontally an option?
I suspect the reason it's not faster is one of the following:
1. backend DB at its query limit
2. this is basically my first rust program and I've done something stupid
3. my load-testing tools (ab/wrk) aren't scaling properly themselves (I know, not likely, but worth considering)
4. bandwidth/IO limits on the AWS instance types I'm running this on
There are other possibilities too I guess. I was mostly doing this as a means to learn Rust though, I'd only go live with it once I had decent tests and everything else.
actix/r2d2 on the Rust side (I haven't worked out bb8 yet); aiohttp/uvloop/asyncpg/gunicorn on the Python side, with Postgres for the DB. This kinda _is_ the cache at present; it's a distillation of a lot of other data into a given form. Rust's async story re: databases still looks a little up in the air from an outside perspective, though, so I guess I'll wait and see how it pans out.
It's definitely a work in progress and I didn't expect this much interest in my offhand "I tried Rust even though I couldn't see myself using it for real, but..." comment.
Interestingly, I was able to get an amazing speed improvement without going completely to Rust.
I didn't want to port the APM code as well, so I kept node.js/express for routing. A very simple middleware immediately passed the request and body buffer to Rust for handling, and Rust returned a buffer to express for sending the response.
It's easily able to hit 100k RPM on a single instance before hitting a CPU bottleneck in the Rust code (validating an RSA signature). It only needs to handle 90k RPM total.
I have multiple instances to keep latency down overall (and as a safeguard should we lose an availability zone or whatever else). But, as others have mentioned, instances cost money, so I'd like to reduce the count to the minimum if possible.
Yeah, if I was still doing primarily web applications, particularly version 1 apps where you don't know what the final product will actually be, I would probably still be using Rails for that. It's just insanely fast for creating those apps and once you've found the features that are hits that need to scale, you can pull those pieces out into services written in languages like Rust or Go. But since I mostly do lower level stuff now, Rust is looking very compelling.
Rocket requires the nightly Rust compiler, which isn't something I find acceptable for production applications, and it's synchronous, so it's really slow. I have been rather disappointed by what I perceive as the author's unwillingness to work towards stable Rust. They have a GitHub issue where they track all of their dependencies on nightly, and kind of just say they aren't going to do anything about it -- it's up to the Rust core team to just make all the things stable that Rocket is using, or else it will stay on nightly forever.
Rocket's biggest feature is their flashy website, in my opinion. Their website is really nice looking and the self-assured marketing is convincing to readers.
Actix-Web seems like a more mature framework (I don't expect my application to randomly break), it's asynchronous (so it's really fast) and it works on stable Rust. Warp and Tower Web look like promising async frameworks, but they're not very mature yet. Rouille is a pretty stable option for a simple, synchronous web framework.
It's not that he's unwilling to make it run on stable, it's that he has to sacrifice ergonomics to make it run on stable. Despite it not yet being async, Rocket is my go-to because it has the best ergonomics. Something like actix-web may be more performant, but its ergonomics are poor in comparison.
Regardless, Rust is stabilizing procedural macros which just leaves the never type as the last stabilization required for Rocket to be able to run on stable Rust. Additionally, I believe their next release is targeting the rewrite to async. A lot of it has to do with Rust firming up its own story around async.
> It's not that he's unwilling to make it run on stable, it's that he has to sacrifice ergonomics to make it run on stable.
Two sides of the same coin. I clearly disagree about the amount of ergonomics that would have to be "sacrificed" in order to make the library usable in a production environment (on stable). So, it's simply an unwillingness to bring the library to stable, from my point of view.
This has very similar ergonomics to Rocket in that it allows decorating handlers with their route, but it runs on Stable Rust. If you want to use `async fn`, that requires nightly for now since that's literally the nightly syntax for an async function, but the route decorators work on stable. As I previously mentioned, Tower Web is not mature, so I would not recommend it at this stage, but it shows what is possible.
I don't personally think that decorating handlers with routes is significantly more ergonomic than defining a table of contents somewhere else, like Actix does it. Defining routes is usually a very small part of your code that you do once and move on. Beyond that, what ergonomics are we talking about?
Anecdotally, I've been hearing claims that Rocket will run on stable "real soon now" for quite a while now. It just doesn't look like it's going to happen.
The never type dependency can be worked around with a crate. The hygiene and error APIs, though, aren't going to be stable soon.
In theory it looks like Rocket could use that crate, accept worse error messages, and build on stable. Sergio would have to weigh in on that, though; I could be wrong.
That was definitely true before Rust 1.15, but it becomes even less beneficial with each new stable release.
I work for one such prominent early adopter of Rust, and we were very happy to move to stable sometime back. Stable versions are widely used, so issues are much more likely to be noticed, and fixes are backported to the current stable release.
If you pick a random nightly version, you have much less support to begin with, simply because fewer people use it. And you're on nightly because you depend on features that haven't been as thoroughly tested as those on stable, and which could disappear entirely in the next nightly. Each upgrade to a different nightly increases the risk of introducing some hard-to-find bug into production.
Some companies do use nightly Rust in production. You're completely correct. It's just not something that I would personally be willing to do unless I had absolutely no reasonable alternative.
I don't understand the point you're trying to make. In networked stuff, the (a)synchronicity absolutely does make something fast or slow.
Suppose you have a synchronous web server with 4 threads; synchronous naturally means that each thread can only handle one request at a time. Each request can take dozens or hundreds of milliseconds to complete, just by being bottlenecked on latency. If each request takes 100 ms, that server tops out at 40 requests per second.
If that server were asynchronous, each thread could handle thousands of simultaneous connections, making the performance literally thousands of times better at a minimum.
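The arithmetic behind that ceiling can be sketched out. The inputs here (4 threads, 100 ms per request) are just the hypothetical numbers from above, not measurements:

```rust
// Back-of-the-envelope throughput ceiling for a blocking,
// thread-per-request server.

fn max_sync_throughput(threads: u64, request_latency_ms: u64) -> u64 {
    // One blocked thread completes (1000 / latency) requests per second;
    // N threads raise that ceiling N-fold.
    threads * (1000 / request_latency_ms)
}

fn main() {
    println!("{} req/s", max_sync_throughput(4, 100)); // 4 * 10 = 40 req/s
}
```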
If you suggest spinning up an unlimited number of OS threads instead, that quickly runs into problems: most OSes can only handle a couple of thousand threads before you hit limits that you have to adjust, via knobs that aren't always easy to find. Even then, each thread takes a significant amount of time to start and stop, and consumes much larger amounts of RAM.
An alternative solution is green threading, which Go calls Goroutines, for example. In such a system, the asynchronous operations are handled by the underlying runtime and your code can treat them as synchronous and just spawn new "threads".
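To illustrate the core idea, here is a toy sketch (in Rust with only the standard library, since that's the topic): many cheap "tasks" multiplexed over a small, fixed pool of OS threads. A real green-thread runtime like Go's also parks tasks that are blocked on I/O and resumes them later, which this deliberately omits; all names here are made up for the example.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// A "task" is just a boxed closure here; in a real runtime it would be a
// resumable green thread or future.
type Task = Box<dyn FnOnce() + Send>;

// Run all tasks on a fixed number of OS threads, many tasks per thread.
fn run_on_pool(os_threads: usize, tasks: Vec<Task>) {
    let (tx, rx) = mpsc::channel::<Task>();
    let rx = Arc::new(Mutex::new(rx));

    let workers: Vec<_> = (0..os_threads)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // Each worker pulls the next ready task; once the channel
                // is closed and drained, the worker exits.
                let task = match rx.lock().unwrap().recv() {
                    Ok(t) => t,
                    Err(_) => break,
                };
                task();
            })
        })
        .collect();

    for t in tasks {
        tx.send(t).unwrap();
    }
    drop(tx); // close the channel so the workers can finish

    for w in workers {
        w.join().unwrap();
    }
}

fn main() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    let done = Arc::new(AtomicUsize::new(0));
    // 10,000 tasks multiplexed over just 4 OS threads.
    let tasks: Vec<Task> = (0..10_000)
        .map(|_| {
            let done = Arc::clone(&done);
            Box::new(move || {
                done.fetch_add(1, Ordering::Relaxed);
            }) as Task
        })
        .collect();
    run_on_pool(4, tasks);
    println!("completed: {}", done.load(Ordering::Relaxed));
}
```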
If you disagree, I would love to hear a more detailed response, because I can't see how your claim could be true.
All network I/O is inherently asynchronous of course, so the sync vs async debate is more about whether to use the blocking abstraction the kernel provides versus bringing your own or writing async code directly.
Using async I/O or a userland abstraction like green threads necessarily means you're moving the I/O scheduling work into userland. Sometimes this might be the right call, but it's effectively a bet that your userland scheduler can do a better job than the kernel's own scheduler.
This bet might pay off in highly specialised cases where your userland scheduler is well tuned to your workload, but the majority of userland schedulers (to name a few examples: the Go runtime, node.js's libuv) are also aimed at general purpose workloads and in many cases are far less mature than the kernel's scheduler.
There's been a massive amount of engineering work that's gone into the Linux kernel's various scheduling algorithms over the years. Pathological edge cases resulting in starvation or other performance issues have largely been identified and worked out. The various schedulers are also highly tuneable to the specifics of your workload if you need to do that.
These days OS threads are a totally viable option for building highly concurrent network services on Linux. Spawning hundreds of thousands of threads works fine and is fast enough for most applications. While threads will have 8MB of virtual address space reserved for their stack by default, this is just 'on paper' memory use - no pages are actually allocated until you use them.
> These days OS threads are a totally viable option for building highly concurrent network services on Linux. Spawning hundreds of thousands of threads works fine and is fast enough for most applications.
I don't agree. If you're at the point where you need to handle such large numbers of concurrent connections, my own experience and public benchmarks show it is not fast enough. Why are none of the top performers on the TechEmpower benchmarks using a thread per connection? The simplest answer is that they can't, or they would; reimplementing so much scheduling in userland would otherwise be a waste of time.
More anecdotally, I have been unable to get my Fedora laptop to spawn more than a few tens of thousands of threads before the kernel starts telling every process that attempts to spawn a new thread or process "Resource temporarily unavailable" (EAGAIN), until the offending process is stopped. I've googled the issue extensively, tweaked countless knobs, and come up empty handed. I'm certain the Linux kernel can spawn more threads than that, but my point still stands that it is nontrivial.
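For anyone hitting the same wall, these are the kernel limits that most commonly cap thread creation on Linux; a quick diagnostic sketch that reads them from /proc (the paths are standard, but the list is not exhaustive):

```rust
use std::fs;

// Read one kernel limit from /proc, if available.
fn read_limit(path: &str) -> Option<String> {
    fs::read_to_string(path).ok().map(|s| s.trim().to_string())
}

fn main() {
    for path in [
        "/proc/sys/kernel/threads-max", // system-wide thread ceiling
        "/proc/sys/kernel/pid_max",     // every thread consumes a PID
        "/proc/sys/vm/max_map_count",   // each thread's stack is a mapping
    ] {
        match read_limit(path) {
            Some(v) => println!("{path} = {v}"),
            None => println!("{path} unavailable (not Linux?)"),
        }
    }
    // RLIMIT_NPROC (`ulimit -u`) also applies per user, and on systemd
    // desktops the cgroup pids limit (TasksMax) is another frequent
    // culprit behind EAGAIN.
}
```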
In theory, that's a great perspective you presented. However, the async interfaces into the kernel exist for very real reasons.
I would also challenge you (if you're bored) to write a C, C++, or Rust program that spawns 1 or 2 million threads that do nothing but sleep for a very long time, and write a similar program in Go. See how the two compare in how long it takes to launch all of the tasks, and see how much memory they use. Ideally, you would also use, say, Rust futures + tokio and spawn a million sleeping futures to compare there too, but futures and tokio aren't the most intuitive or well documented things yet, in my opinion.
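A scaled-down sketch of that experiment, using only the Rust standard library: 10,000 threads rather than a million or two, so it runs under default OS limits, with a 200 ms sleep standing in for "a very long time". Pushing the count up toward a million is exactly where the limits and memory costs discussed in this thread start to bite.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Spawn n OS threads that do nothing but sleep, and time how long the
// spawning itself takes.
fn time_thread_spawn(n: usize) -> (Duration, Vec<thread::JoinHandle<()>>) {
    let start = Instant::now();
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| thread::sleep(Duration::from_millis(200))))
        .collect();
    (start.elapsed(), handles)
}

fn main() {
    let (elapsed, handles) = time_thread_spawn(10_000);
    println!("spawned {} threads in {:?}", handles.len(), elapsed);
    for h in handles {
        h.join().unwrap();
    }
}
```

Watching the process's RSS while this runs (versus the equivalent goroutine version) is the interesting part of the comparison.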
I've done similar tests in the past, and OS threads are not a lightweight resource.
You don't need a million threads to be really fast; you need enough threads for your load. If you need to handle 1,000 requests per second with each request taking 10 ms on average, only about 10 requests are in flight at any moment, so even a few dozen threads gives you plenty of headroom. That's an absolutely manageable number of threads for the OS. Writing this code in an async style won't gain you anything, because the bottleneck will be the database, another service, or disk I/O. A million threads is a very rare case.
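That estimate is just Little's law (average concurrency = arrival rate * average latency); a trivial sketch with the numbers above:

```rust
// Little's law: average number of requests in flight at once.
fn avg_in_flight(requests_per_sec: f64, latency_secs: f64) -> f64 {
    requests_per_sec * latency_secs
}

fn main() {
    // 1000 req/s at 10 ms each: only ~10 requests are in flight on
    // average, so a modest pool of blocking threads leaves generous
    // headroom before threads become the bottleneck.
    println!(
        "average in-flight requests: {}",
        avg_in_flight(1000.0, 0.010)
    );
}
```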