Why Pandas feels clunky when coming from R (sumsar.net)
20 points by braza on Feb 20, 2024 | 6 comments


For even simpler data management in R, the author may want to have a look at the `data.table` package [1]. His example would then be

    library(data.table)
    dt <- fread("purchases.csv")
    dt[, .(total = sum(amount - discount)), by = "country"]
and it would be much faster to run.

[1] https://rdatatable.gitlab.io/data.table/
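
For comparison, here is a rough pandas sketch of the same aggregation. It assumes `purchases.csv` has `country`, `amount` and `discount` columns, as the `data.table` call above implies; the inline rows are made up so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up stand-in for purchases.csv so the sketch is self-contained.
csv_text = """country,amount,discount
USA,100,10
USA,50,0
Brazil,80,5
"""

df = pd.read_csv(io.StringIO(csv_text))

# Same idea as the data.table one-liner: sum(amount - discount) per country.
totals = (
    df.assign(total=df["amount"] - df["discount"])
      .groupby("country", as_index=False)["total"]
      .sum()
)
print(totals)
```

The extra `assign` step is needed because the grouped sum here operates on an existing column, whereas `data.table` evaluates the expression inside the grouped call.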


Similarly, for me as an R person, `polars` feels a bit more like R than `pandas` does:

    import polars as pl

    q = (
        pl.scan_csv("docs/data/iris.csv")
        .filter(pl.col("sepal_length") > 5)
        .group_by("species")
        .agg(pl.all().sum())
    )

    df = q.collect()


A number of aspects of the design of Pandas make more sense if you know it's background. It originated in a quantitative finance environment, where it's common to work with time series data, and grew from there into a more general-purpose data manipulation and analysis library.


its* background


This is fine and all (although I’m not impressed by the quality of the Python code), but the examples don’t show how to do metaprogramming.

I often don’t want to write out column names by hand; I want to specify them programmatically, and the same goes for many of the other examples.

I haven’t seen examples of that kind of higher-level programming in these R vs. Python comparisons. It’s always tediously manual examples.

The examples usually feel like manual, analyst-style query tasks. Even the tone strongly reinforces that, with text like “oh and Maria asked me to xyz”.
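
To make the complaint concrete, here is a minimal sketch of the kind of programmatic column handling meant above, in pandas. The `summarise` helper, its parameters and the sample data are hypothetical, not taken from the article; the idea is that the aggregation spec is built from the frame's columns rather than typed out.

```python
import pandas as pd

# Made-up data; the column names are placeholders.
df = pd.DataFrame({
    "country": ["USA", "USA", "Brazil"],
    "amount": [100.0, 50.0, 80.0],
    "discount": [10.0, 0.0, 5.0],
})


def summarise(frame, group_cols, agg_funcs=("sum", "mean")):
    """Aggregate every numeric column that is not used for grouping."""
    value_cols = [
        c for c in frame.select_dtypes("number").columns
        if c not in group_cols
    ]
    # Build the named-aggregation spec from the data instead of by hand:
    # output column name -> (input column, aggregation function).
    spec = {
        f"{col}_{fn}": (col, fn)
        for col in value_cols
        for fn in agg_funcs
    }
    return frame.groupby(list(group_cols), as_index=False).agg(**spec)


result = summarise(df, ["country"])
print(result)
```

Adding a new numeric column to `df` would then show up in the output without touching the function, which is the part the hand-written examples never cover.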


Are the "puRHcases" in the headings intentional or a typo?



