Why Pandas feels clunky when coming from R

karencarits · on Feb 20, 2024

For even simpler data management in R, the author may want to have a look at the `data.table` package [1]. His example would then be

    library(data.table)
    dt <- fread("purchases.csv")
    dt[, .(total = sum(amount - discount)), by = "country"]

and it would be much faster to run.

[1] https://rdatatable.gitlab.io/data.table/

a_bonobo · on Feb 21, 2024

Similarly, for me as an R person, `polars` feels a bit more like R than `pandas` does:

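    import polars as pl

    # Build a lazy query: nothing is read or computed until collect()
    q = (
        pl.scan_csv("docs/data/iris.csv")
        .filter(pl.col("sepal_length") > 5)
        .group_by("species")
        .agg(pl.all().sum())
    )

    # Execute the query and materialize the result as a DataFrame
    df = q.collect()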

FateOfNations · on Feb 21, 2024

A number of aspects of the design of Pandas make more sense if you know it's background. It originated in a quantitative finance environment, where it's common to be working with time series data. It grew from that into a more general-purpose data manipulation and analysis library.

efilife · on Feb 21, 2024

its* background

mint2 · on Feb 21, 2024

This is fine and all (although I’m not impressed by the quality of the Python code), but the examples don’t show how to do metaprogramming.

I often don’t want to write out column names by hand; I want to specify them programmatically, and the same goes for many of the other examples. I don’t want to configure them manually.

I haven’t seen examples of that higher-level programming in these various R vs. Python comparisons. The examples are always tediously manual.

The examples usually feel like manual analyst-query tasks. Even the tone strongly reinforces that, with text like “oh and Maria asked me to xyz”.
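To make the request concrete, here is a minimal sketch of programmatic column selection in pandas. It is only an illustration: the `purchases` frame is made-up stand-in data with the `country`, `amount` and `discount` columns from the example upthread, and `sum_numeric_by` is a hypothetical helper, not something from the article or from pandas itself.

```python
import pandas as pd

# Made-up stand-in data (assumption: same column names as the example upthread).
purchases = pd.DataFrame(
    {
        "country": ["US", "US", "FR", "FR"],
        "amount": [100.0, 50.0, 80.0, 20.0],
        "discount": [10.0, 5.0, 8.0, 2.0],
    }
)


def sum_numeric_by(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: group by `key` and sum every numeric column.

    The columns are discovered from the dtypes, so none are named by hand.
    """
    numeric_cols = [
        col for col in df.select_dtypes(include="number").columns if col != key
    ]
    return df.groupby(key)[numeric_cols].sum()


totals = sum_numeric_by(purchases, "country")
print(totals)
```

The same function keeps working if a numeric column is added or renamed, which is the point: the column list is computed rather than typed out.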

efilife · on Feb 21, 2024

Are the "puRHcases" in the headings intentional or a typo?