Badly - under the hood, jupyter notebooks are json files that store not only the code but all of the outputs and metadata as well. I know there are tools that help integrate jupyter with git, but I just end up going back and forth between .py files in VSCode and notebooks in JupyterLab, depending on what I'm working on.
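For what it's worth, the usual workaround is a clean-before-commit filter in the spirit of nbstripout. Here's a minimal stdlib-only sketch of the idea (the `strip_notebook` helper and the toy notebook are mine, assuming the nbformat-4 on-disk layout):

```python
import json

def strip_notebook(nb: dict) -> dict:
    """Drop outputs and volatile metadata so git diffs show only code changes."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []          # rendered results, incl. base64 images
            cell["execution_count"] = None  # churns on every re-run
        cell["metadata"] = {}
    nb["metadata"] = {}
    return nb

# A toy notebook as stored on disk, with a noisy output attached.
raw = json.dumps({
    "cells": [{
        "cell_type": "code",
        "execution_count": 7,
        "metadata": {"collapsed": False},
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["4\n"]}],
        "source": ["print(2 + 2)\n"],
    }],
    "metadata": {"kernelspec": {"name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 5,
})

clean = strip_notebook(json.loads(raw))
```

Wired up as a git clean filter, this keeps the committed .ipynb stable across re-executions; the real nbstripout handles more cell types and metadata keys than this sketch does.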
Aye, notebooks truly represent a wrong turn in scientific computing. The
absence of version control alone is a showstopper, but it's generally insane
to launch long-running estimations interactively via a browser-based notebook
interface, with outputs that are not readily greppable. But corporate
philistines continue to ooh and aah over jupyter's visuals, and companies like
Gradient are more than happy to cater to their FOMO.
> Aye, notebooks truly represent a wrong turn in scientific computing.
I agree with that problem, yet somehow I like the idea of notebooks. Maybe the real tragedy here is that jupyter notebooks are saved as json rather than as a valid program with comments that can be run "as is" from the command line.
> write your code as valid programs with comments.
That's what I do! Then I have a script that converts my python program to shitty json so that my colleague--who can only conceive of working inside a notebook--can run it. Finally, another script translates the json back into readable code; and more importantly, into something that can meaningfully be put into git.
I would love it if the jupyter interface allowed saving the notebook directly as a program with comments. Then all this silly sorcery would not be necessary.
But that's beside the point. The jupyter ecosystem, like other widespread and unwieldy formats such as Microsoft Word and PDF, poses a brutal obstruction to Unix workflows.
Yes, my script simply calls the nbconvert library, with some trickery to ensure that the result is idempotent. But I would rather not need this script at all; instead, the jupyter interface should work with valid python files directly (maybe after enabling some option).
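The script itself isn't shown, but the underlying round trip is simple enough to sketch with just the stdlib, using the `# %%` percent markers that editors like VSCode and Spyder recognize as cell delimiters. The function names are hypothetical, and a real converter would also need to preserve markdown cells and metadata:

```python
import json

MARK = "# %%"  # cell delimiter; note this breaks if the marker appears in a string

def py_to_ipynb(src: str) -> dict:
    """Split a plain .py file on `# %%` markers into nbformat-4 code cells."""
    chunks = [c.strip("\n") for c in src.split(MARK)]
    cells = [{
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": chunk.splitlines(keepends=True),
    } for chunk in chunks if chunk]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

def ipynb_to_py(nb: dict) -> str:
    """Join code cells back into a plain script, one marker per cell."""
    return "\n".join(MARK + "\n" + "".join(c["source"]).rstrip("\n")
                     for c in nb["cells"] if c["cell_type"] == "code") + "\n"

script = "# %%\nx = 2 + 2\n# %%\nprint(x)\n"
nb = py_to_ipynb(script)

# The idempotency the comment mentions: converting back and forth again
# reproduces the same script, so git only sees real changes.
assert ipynb_to_py(py_to_ipynb(ipynb_to_py(nb))) == ipynb_to_py(nb)
```

Jupytext takes exactly this approach in production form, pairing a notebook with a text representation so the .py file can be the thing that lives in git.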
> poses a brutal obstruction to Unix workflows.
It's not so much the notebook itself as the file format the notebook chooses by default. If it were a human-editable text file there would be no problem, and there is no practical obstruction to that (other than that young programmers today cannot conceive of a "serialization" format different from json).
I respectfully disagree :) I think that the notebook environment is a nice entry point for lots of applications. We actually use it pretty regularly to launch more sophisticated experiments / multi-node training jobs, etc. using our python SDK.
You are right that versioning is still an issue, and we largely punt on it by using the docker container (with layer commits on each notebook teardown) as the versioning mechanism. Maybe not the best solution, but it does have its advantages.
If I were you, I'd have angrily railed against GP, so I salute the restraint.
I don't approve of what you're doing, but I would also happily trade positions
with you (unemployed versus founding principal of a company -- any company --
even one whose mission is nonsense).
It's possible, but it requires some foresight and strategy. Images are the first problem - they're stored as base64 blobs inside the json, so a small change in a graphic can produce a huge, meaningless diff.
Data is the second problem - you usually can't run a notebook without it, and if the notebook transforms the data over time, that can be a real issue.
An integrated solution that versions data, the notebook, and the computational environment (collaborators will make changes over time) is the ideal collaborative platform.