> It's definitely not vastly more work to do statistical analysis using Python Pandas than in R anymore, perhaps it was several years ago.
Haha, what? What statistics can you do in pandas? You can do some statistics in Python by cobbling together stuff from scipy and statsmodels (maybe I'm out of date; is there more?). I see a few modules for regression and the like in pandas, but they are marked as deprecated. I think perhaps you and I mean different things by "statistics". R provides a vast ecosystem covering, for example:
- Gold standard implementations of simulation, PDFs and quantiles for any probability distribution you can mention (in Python you can find some of this in scipy, not pandas, but scipy is a real mess compared to R and not as comprehensive.)
- Gold standard implementations of any classical hypothesis test you can mention
- Gold standard implementations of computational methods for fitting generalized linear models, mixed models, frameworks for MCMC samplers, graphical models, HMMs, and a vast amount of other stuff I'm not clever enough to name right now let alone understand.
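For concreteness, here is roughly what the first two bullets look like on the Python side. It's a sketch assuming numpy and scipy are installed; the R equivalents in the comments (dnorm, qnorm, rnorm, t.test) are my own pairings, not an exhaustive mapping, and scipy's coverage thins out quickly beyond common cases like these.

```python
import numpy as np
from scipy import stats

# R: dnorm(0), density of the standard normal at 0
density = stats.norm.pdf(0.0)

# R: qnorm(0.975), the 97.5% quantile of the standard normal
q975 = stats.norm.ppf(0.975)

# R: rnorm(100, mean = 0.5) and rnorm(100), simulated draws (seeded here)
rng = np.random.default_rng(42)
x = stats.norm.rvs(loc=0.5, size=100, random_state=rng)
y = stats.norm.rvs(loc=0.0, size=100, random_state=rng)

# R: t.test(x, y), Welch two-sample t-test (R's default assumes unequal variances)
result = stats.ttest_ind(x, y, equal_var=False)

print(f"dnorm(0)     = {density:.4f}")
print(f"qnorm(0.975) = {q975:.4f}")
print(f"Welch t = {result.statistic:.3f}, p = {result.pvalue:.3g}")
```

Note the `equal_var=False`: scipy defaults to the pooled-variance test, whereas R's `t.test` defaults to Welch, so a naive port can silently change the test being run.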
Really any statistical procedure -- whether "classical" or "modern"/"computational statistics" -- in R you will find it, and furthermore it will be basically the reference implementation / gold standard.
And that's without mentioning the plotting tools, the numerical computing, and the clean linear algebra syntax. But that's it, no more: the people who go further and suggest using R for building a web server or web scraping or something mostly haven't used real programming languages.
You're missing the point. The Python ecosystem can't compete with R on the statistics front; it would be crazy to try. That's certainly not the aim of pandas.
> you definitely don't need to use intermediary csv files anymore to move data back and forth between R and Python.
Perhaps not, but doesn't it please you to have a well-defined interface (a serialization format) between the two languages? I haven't tried rpy2 in years. I don't like to have two different languages get their tentacles into each other like that if I can avoid it, but I'm sure it's a good project with its use cases.
EDIT: thanks, I hadn't seen Feather. That looks like the thing to use.