
The actual regular expression implementation in Rust is going to be fast, but one thing that caught a Reddit poster out recently is that the Rust regex crate doesn't magically cache compiled patterns. If you sit in a tight loop constructing the same regex over and over, it redoes all that work every time. The Python code might take ten times longer to compile the pattern once, but it then caches the result, so it doesn't take long for Python to end up faster.

Now, if you're going to use RegexSet you're also smart enough to read "For example, it’s a bad idea to compile the same regex repeatedly in a loop" and say "Yeah, makes sense, I will not repeatedly compile the same regex". But some fraction of Python programmers won't read that - and it'll be very slow.
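To make the caching point concrete, here is a minimal sketch of a cache keyed by pattern text, which is roughly the behaviour attributed to Python above. `Compiled` and `PatternCache` are hypothetical stand-ins invented for illustration; they are not types from the regex crate, and the "compile" step here only records that it ran.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled pattern (not the regex crate's type).
#[derive(Debug)]
struct Compiled {
    pattern: String,
}

/// A tiny memoizing cache keyed by the pattern text.
struct PatternCache {
    entries: HashMap<String, Compiled>,
    /// How many times the (pretend) expensive compile step actually ran.
    compilations: usize,
}

impl PatternCache {
    fn new() -> Self {
        PatternCache {
            entries: HashMap::new(),
            compilations: 0,
        }
    }

    /// Returns the cached entry, compiling only on the first request.
    fn get(&mut self, pattern: &str) -> &Compiled {
        if !self.entries.contains_key(pattern) {
            self.compilations += 1;
            self.entries.insert(
                pattern.to_string(),
                Compiled {
                    pattern: pattern.to_string(),
                },
            );
        }
        &self.entries[pattern]
    }
}
```

Asking for the same pattern a thousand times in a loop pays the compile cost once; without such a cache, each iteration pays it again.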



There are plenty of examples like that though. A Python programmer might not know to compile in release mode. They might not use buffering when reading from a file. They might pass around copious copies of Vec<T> instead of &[T]. The list could go on and on.
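Two of those pitfalls can be sketched with the standard library alone. The function names `total` and `count_lines` are made up for this example; the release-mode point is just a matter of building with `cargo build --release`.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Takes a borrowed slice, so the caller's data is neither moved nor cloned.
/// A Vec<u64>, an array, or a sub-range can all be passed as &[u64].
fn total(values: &[u64]) -> u64 {
    values.iter().sum()
}

/// Wraps any reader in a BufReader so that line-by-line reads are served
/// from an in-memory buffer instead of many small reads on the source.
fn count_lines<R: Read>(source: R) -> io::Result<usize> {
    let reader = BufReader::new(source);
    let mut count = 0;
    for line in reader.lines() {
        line?; // propagate I/O or UTF-8 errors
        count += 1;
    }
    Ok(count)
}
```

With a real file you would pass `std::fs::File::open(path)?` to `count_lines`; any other `Read` implementation, such as a byte slice, works the same way.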


Sure, and there would probably be some value in a tool which can walk them through the easy stuff before they show a real human code which it turns out just wasn't tested with release optimisations or whatever.

Still, as I understand it, CTRE means that if you just "use" the same expression over and over in your inner loop in C++, it doesn't matter. The regular expression was compiled at build time as part of the type system, so your expression got turned into machine code once, for the same reason Rust emits machine code for name.contains(char::is_lowercase) once rather than somehow recalculating it each time it's reached. There is no runtime step to repeat.

This is a long way down my "want to have" list, it's below BalancedI8 and the Pattern Types, it's below compile-time for loops, it's below stabilizing Pattern, for an example closer to heart. But it does remind us what's conceivable.


IDK how we jumped to CTRE. Python doesn't do CTRE. It's doing caching. In Rust, you use std::sync::LazyLock for that. I don't get what the problem is to be honest.
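A minimal sketch of the LazyLock approach, assuming a toolchain where std::sync::LazyLock is stable. `Matcher` is a hypothetical stand-in for a compiled regex so that the example needs only the standard library; the `BUILDS` counter exists purely to show that the setup runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the expensive setup ran (illustration only).
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex.
struct Matcher {
    needle: char,
}

impl Matcher {
    /// Pretend this is the expensive compile step.
    fn build(needle: char) -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        Matcher { needle }
    }

    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.needle)
    }
}

/// Initialized on first access, then shared by every later call.
static COMMA: LazyLock<Matcher> = LazyLock::new(|| Matcher::build(','));

/// Safe to call in a hot loop: the matcher is built at most once.
fn has_comma(line: &str) -> bool {
    COMMA.is_match(line)
}
```

With the regex crate, the static would hold a `Regex` built by `Regex::new(...)` in the same position, and the function body stays a plain method call on the static.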

I assume by CTRE you're referring to the CTRE C++ project. That's a totally different can of worms and comes with lots of trade-offs. I wish it were easy to add CTRE to rebar, then I could probably add a lot more color to the trade-offs involved, at least with that specific implementation (but maybe not to "compile time regex" in general).


I jumped to CTRE because it's another way that you can get the better results. The programmer need have no idea why this works, just like with caches.

I agree that there are trade-offs, but nevertheless compile time regex compilation is on my want list, even if a long way down it. I would take compile time arithmetic compilation† much sooner, but since that's an unsolved problem I don't get that choice.

† What I mean here is: you type in the real arithmetic you want, the compiler analyses what you wrote, and it spits out an approximation in machine code which delivers an accuracy and performance trade-off you're OK with, without you needing to be an expert in IEEE floating point and how your target CPU works. Think Herbie (https://herbie.uwplse.org/), but as part of the compiler.



