
I do this stuff for my day job, and everything here rings true.

Every part of the "build and use ML in production" workflow is horrible (unless maybe you work at Google).

Firstly, the data science workflow is NOT the same as software engineering. Version control tools don't work properly at any level: git on Jupyter notebooks doesn't work without hacks, versioning data is horrible, and versioning models is horrible.
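To make the "hacks" concrete: the usual workaround is to strip outputs and execution counts before committing, so diffs only show code and markdown changes (tools like nbstripout and jupytext automate this as a git filter). A minimal sketch in plain Python, assuming the standard .ipynb JSON layout:

```python
import json

def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts so git diffs only show real edits."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Example: a minimal notebook with one executed code cell.
nb = {
    "cells": [
        {"cell_type": "code", "execution_count": 3,
         "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result",
                      "data": {"text/plain": ["2"]}}]},
        {"cell_type": "markdown", "source": ["# Notes"]},
    ],
    "nbformat": 4, "nbformat_minor": 5,
}
clean = strip_outputs(nb)
print(json.dumps(clean["cells"][0]["outputs"]))  # prints "[]"
```

Even with this, merge conflicts inside notebook JSON remain painful, which is part of why notebooks and git stay at odds.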

Deployment is horrible. SageMaker (and its equivalents) provides the very base level of functionality needed, but is so separated from the feature engineering side that everyone ends up doing vast amounts of work to get something useful.

Frameworks are horrible. TensorFlow did the upgrade to TF2, so half the examples on the web don't work anymore. The TF data loading abstractions are great if you work at Google, but so complicated that getting even basic examples going is a slog.

PyTorch has a horrible deployment story.

Every other framework is either an experimental research project where progress takes months (JAX) or so far behind modern work that it's useless (MXNet).

But the thing is, dealing with all these issues is worth it, because ultimately it does actually work.



I think that notebooks have been a bit of a sideways step in a data science workflow. They're accessible, but they're tremendously brittle.


Notebooks are so much better than anything else I've used for data science.

I have a traditional SWEng background and came into data science never having used them. I'd never go back.

I'm not saying that they are impossible to improve, but as a general approach they are exactly right.

They are "brittle" when viewed as a software artefact. But that's not really what they are (or should be).


Not to be too self-promotion-y, but I work on an open source ML deployment platform that we built specifically because of how incongruous the data science workflow is with software engineering: https://github.com/cortexlabs/cortex


I'm curious to get your input on the Model Monitor, Debugger, and Experiments features on Amazon SageMaker. Have you had a chance to play around with them?


I've tried Experiments. It's great at the easy part of the ML workflow: optimising a working model. But it doesn't really help with the hard part - the debugging at the interface of the model and the data.

Say you are building a car detector or something. Building the CNN is ML101, and SageMaker experiments helps with optimising the training parameters to get the best out of the model.

But that's not really the hard part. The hard part is working out that your model is failing on cars with reflections of people in the windscreen, or that your dataset's co-ordinate space is "negative = up", so your in-memory data augmentations are teaching the model upside-down cars.
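That second failure mode is easy to reproduce. A toy sketch (hypothetical `vflip_box` helper, pure Python, no real CV library) of a vertical-flip augmentation that silently mangles labels when the dataset stores y as "negative = up":

```python
# Image convention: y = 0 at the top of the image, y grows downward.
# Some datasets instead store "negative = up", i.e. y grows upward.

H = 100  # image height in pixels (assumed)

def vflip_box(y_top: float, y_bottom: float, height: float):
    """Vertical flip of a box, assuming the image convention (y grows downward)."""
    return height - y_bottom, height - y_top

# A car occupying rows 10..30 near the top of the image:
print(vflip_box(10, 30, H))    # (70, 90): correctly moved near the bottom

# The same physical car labelled in a "negative = up" dataset would be
# stored with y in [-30, -10]. Feeding that straight into the flip:
print(vflip_box(-30, -10, H))  # (110, 130): entirely outside the image
```

Nothing crashes; the model just trains against boxes that no longer cover the car, and the only symptom is a loss that won't improve. That's exactly the class of bug that parameter-tuning tools like Experiments don't surface.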

I don't know what Debugger gives me over a notebook, but I've only read the blog post.

I haven't tried Model Monitor but I do think that could be useful.


Any experience with ML in MATLAB?


I'll chime in: I did my thesis in MATLAB (specifically ML for MRI). While MATLAB itself wasn't the most fun to work with, they have honestly built a fantastic suite of tools. For example, the Classifier App is amazing for brute-forcing through a bunch of approaches.

Even went through a couple of hackathons with it and got some SoTA results.

I wouldn't ever go back to it, especially outside of academia. But it's not the worst thing out there.


Yes.

I mean it's fine, but I don't see any reason to use it instead of Python, and lots of reasons not to. But I'm not a mathematician by training.

I do quite like RStudio, though, and I do see places where that is useful. So maybe MATLAB fits somewhere in between: less stats than R, less programming than Python.


I am a mathematician by training (+CS), I've worked in Matlab quite a bit, I've taught hundreds of students in it (having no say in language, but we're about to switch to Python), and I probably see even more reasons not to use it than you do. (I haven't done ML in it, but I'd be astounded if it wasn't terrible compared to Python and its ecosystem.)

I recommend avoiding Matlab for every use case unless they've got you trapped with a huge existing code base or reliance on a proprietary toolbox.


IMO, whenever RStudio decides to support Jupyter notebooks, it's game over for everyone else. It's such a great piece of software for data analysis, and I hope they continue to go broader than just the R language.


What do you think of Julia?


> Julia

The Haskell of Machine Learning.

https://www.jwz.org/doc/worse-is-better.html



