Badly - under the hood, jupyter notebooks are json files that store not only the code but all of the outputs and metadata as well. I know there are tools that help integrate jupyter with git, but I just end up going back and forth between .py files in VSCode and notebooks in JupyterLab, depending on what I'm working on.
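For what it's worth, the usual workaround is a clean-before-commit filter in the spirit of nbstripout. Here's a minimal stdlib-only sketch of the idea (the `strip_notebook` helper and the toy notebook are mine, assuming the nbformat-4 on-disk layout):

```python
import json

def strip_notebook(nb: dict) -> dict:
    """Drop outputs and volatile metadata so git diffs show only code changes."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []          # rendered results, incl. base64 images
            cell["execution_count"] = None  # churns on every re-run
        cell["metadata"] = {}
    nb["metadata"] = {}
    return nb

# A toy notebook as stored on disk, with a noisy output attached.
raw = json.dumps({
    "cells": [{
        "cell_type": "code",
        "execution_count": 7,
        "metadata": {"collapsed": False},
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["4\n"]}],
        "source": ["print(2 + 2)\n"],
    }],
    "metadata": {"kernelspec": {"name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 5,
})

clean = strip_notebook(json.loads(raw))
```

Wired up as a git clean filter, this keeps the committed .ipynb stable across re-executions; the real nbstripout handles more cell types and metadata keys than this sketch does.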
Aye, notebooks truly represent a wrong turn in scientific computing. The
absence of version control alone is a showstopper, but it's generally insane
to launch long-running estimations interactively via a browser-based notebook
interface, with outputs that are not readily greppable. But corporate
philistines continue to ooh and aah over jupyter's visuals, and companies like
Gradient are more than happy to cater to their FOMO.
> Aye, notebooks truly represent a wrong turn in scientific computing.
I agree with that problem, yet somehow I like the idea of notebooks. Maybe the real tragedy here is that jupyter notebooks are saved as json rather than as a valid program with comments that can be run "as is" from the command line.
> write your code as valid programs with comments.
That's what I do! Then I have a script that converts my python program to shitty json so that my colleague--who can only conceive of working inside a notebook--can run it. Finally, another script translates the json back into readable code; and more importantly, into something that can meaningfully be put into git.
I would love it if the jupyter interface allowed saving the notebook directly as a program with comments. Then all this silly sorcery would not be necessary.
But that's beside the point. The jupyter ecosystem, like other widespread and unwieldy formats such as Microsoft Word and PDF, poses a brutal obstruction to Unix workflows.
Yes, my script simply calls the nbconvert library, with some trickery to ensure that the result is idempotent. But I would rather not need this script at all; instead, the jupyter interface should work with valid python files directly (maybe after enabling some option).
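The script itself isn't shown, but the underlying round trip is simple enough to sketch with just the stdlib, using the `# %%` percent markers that editors like VSCode and Spyder recognize as cell delimiters. The function names are hypothetical, and a real converter would also need to preserve markdown cells and metadata:

```python
import json

MARK = "# %%"  # cell delimiter; note this breaks if the marker appears in a string

def py_to_ipynb(src: str) -> dict:
    """Split a plain .py file on `# %%` markers into nbformat-4 code cells."""
    chunks = [c.strip("\n") for c in src.split(MARK)]
    cells = [{
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": chunk.splitlines(keepends=True),
    } for chunk in chunks if chunk]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

def ipynb_to_py(nb: dict) -> str:
    """Join code cells back into a plain script, one marker per cell."""
    return "\n".join(MARK + "\n" + "".join(c["source"]).rstrip("\n")
                     for c in nb["cells"] if c["cell_type"] == "code") + "\n"

script = "# %%\nx = 2 + 2\n# %%\nprint(x)\n"
nb = py_to_ipynb(script)

# The idempotency the comment mentions: converting back and forth again
# reproduces the same script, so git only sees real changes.
assert ipynb_to_py(py_to_ipynb(ipynb_to_py(nb))) == ipynb_to_py(nb)
```

Jupytext takes exactly this approach in production form, pairing a notebook with a text representation so the .py file can be the thing that lives in git.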
> poses a brutal obstruction to Unix workflows.
It's not so much the notebook itself as the file format the notebook chooses by default. If it were a human-editable text file there would be no problem, and there is no practical obstruction to that (other than that young programmers today cannot conceive of a "serialization" format different from json).
I respectfully disagree :) I think that the notebook environment is a nice entry point for lots of applications. We actually use it pretty regularly to launch more sophisticated experiments / multi-node training jobs, etc. using our python SDK.
You are right that versioning is still an issue, and we largely punt on it by using the docker container (with layer commits on each notebook teardown) as the versioning mechanism. Maybe not the best solution, but it does have its advantages.
If I were you, I'd have angrily railed against GP, so I salute the restraint.
I don't approve of what you're doing, but I would also happily trade positions
with you (unemployed versus founding principal of a company -- any company --
even one whose mission is nonsense).
It's possible, but it requires some foresight and strategy. Images are the first problem - they're stored as base64 blobs inside the json, so a small change in a graphic can produce a huge, meaningless diff.
Data is the second problem - you usually can't run a notebook without it, and if the notebook transforms the data over time, that can be a real issue.
An integrated solution that versions data, the notebook, and the computational environment (collaborators will make changes over time) is the ideal collaborative platform.