
Most likely a misuse of pandas. DataFrames are heavy to create, but calculations on them are fast as long as you stay in the NumPy world and stay vectorized.
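A minimal sketch of the difference, assuming a made-up DataFrame with hypothetical `price` and `qty` columns. The DataFrame is built once, then the same total is computed with a row-by-row `iterrows()` loop (every row becomes a Python-level Series) and with one vectorized column multiply and sum that runs inside NumPy.

```python
import numpy as np
import pandas as pd


def row_loop_total(df: pd.DataFrame) -> float:
    # Slow pattern: iterate row by row; each row is materialized as a Series.
    total = 0.0
    for _, row in df.iterrows():
        total += row["price"] * row["qty"]
    return total


def vectorized_total(df: pd.DataFrame) -> float:
    # Fast pattern: a single column-wise multiply and sum executed in NumPy.
    return float((df["price"] * df["qty"]).sum())


# Build the DataFrame once (the expensive part) and reuse it for calculations.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.random(10_000),
    "qty": rng.integers(1, 10, size=10_000),
})

# Same result up to floating-point summation order.
print(np.isclose(row_loop_total(df), vectorized_total(df)))
```

Timing the two functions (e.g. with `timeit`) on your own data is the easiest way to see how much of a slowdown comes from the loop rather than from pandas itself.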


Yeah, I'm a bit suspicious that they made two simultaneous changes:

  1. remove pandas
  2. externalize WEIGHTS and don't generate it every run
Point 2 is likely a huge portion of the runtime.
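To illustrate point 2, here is a minimal sketch of "externalizing" a weights array, assuming the weights are a NumPy array and that `generate_weights` is a hypothetical stand-in for whatever expensive computation the original code ran on every invocation. The array is built once, saved to disk, and loaded on later runs.

```python
import os

import numpy as np


def generate_weights(n: int) -> np.ndarray:
    # Hypothetical stand-in for an expensive per-run weight computation.
    rng = np.random.default_rng(42)
    w = rng.random(n)
    return w / w.sum()


def load_or_build_weights(path: str, n: int) -> np.ndarray:
    # Reuse a previously saved array if it exists and has the expected shape.
    if os.path.exists(path):
        cached = np.load(path)
        if cached.shape == (n,):
            return cached
    weights = generate_weights(n)
    # Writing through a file handle stops np.save from appending ".npy",
    # so the os.path.exists check above matches on the next run.
    with open(path, "wb") as f:
        np.save(f, weights)
    return weights
```

The first call pays for `generate_weights`; later calls only pay for `np.load`. Comparing the two pandas versions fairly would mean applying this caching to both, so the benchmark measures only the pandas change.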


"It's fast so long as you don't use any of the many parts that aren't fast!"

This isn't great.


That's true for everything in computing.

Don't use a hammer as a screwdriver.

I'm not even implying they shouldn't have used pandas for this, I'm suggesting they probably wrote the wrong pandas code for this.

Pandas is typically 3 times faster than raw Python, not 10 times slower.


I used to be a hardcore functional programming weenie, but over time I realized that doing high-performance systems programming in an FP language means writing a lot of non-idiomatic code, to the point where it's worth considering C (or C++, for the STL only, not the OOP stuff) instead, unless you have a good reason (which you might) to use a nonstandard language.

The problem isn't Python itself. Python has come a long way from where it started. The problem is people using Python for modules that end up needing, say, manual memory management or heterogeneous high-performance computing (e.g. Monte Carlo algorithms).


No, I think it is fair to call out mediocrity, even when it tries to pull the "disclaim exactly the set of specific applications it gets called out on" trick.

Sure, pandas often beats raw Python by a bit, but come on, there's so much mediocrity between the two that I doubt they even had to cheat to find a situation the other way around.


Then I wish I could create, at some point in my life, any project that reaches twice this level of mediocrity.


People write accidentally quadratic code all the time. It's even easier in pandas because the feature set is so huge and finding the right way to do something takes some experience (see Stack Overflow for plenty of plain loops over pandas DataFrames).
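One common way to end up accidentally quadratic in pandas (not necessarily what happened in the code under discussion) is growing a DataFrame inside a loop with `pd.concat`: every call copies all rows accumulated so far. A minimal sketch with hypothetical columns `i` and `sq`, compared against building a list first and against a fully vectorized construction:

```python
import numpy as np
import pandas as pd


def build_quadratic(n: int) -> pd.DataFrame:
    # Accidentally quadratic: each concat copies every existing row,
    # so total copying grows roughly with n**2. Assumes n >= 1.
    df = pd.DataFrame({"i": [0], "sq": [0]})
    for i in range(1, n):
        df = pd.concat([df, pd.DataFrame({"i": [i], "sq": [i * i]})],
                       ignore_index=True)
    return df


def build_linear(n: int) -> pd.DataFrame:
    # Linear: accumulate plain Python rows, construct the DataFrame once.
    rows = [{"i": i, "sq": i * i} for i in range(n)]
    return pd.DataFrame(rows)


def build_vectorized(n: int) -> pd.DataFrame:
    # Vectorized: compute whole columns with NumPy, no Python-level loop.
    i = np.arange(n, dtype=np.int64)
    return pd.DataFrame({"i": i, "sq": i * i})
```

All three produce the same frame; only the first one's cost explodes as `n` grows.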



