Quo vadis, Jupyter?
Computational notebooks – hate them or love them, they are here to stay. In this brief rant, I talk about where this computational medium might be heading, and how we are going to get there. Buckle up and enjoy the ride! 🚀🪐
REPL, but way cooler
Computational notebooks are a powerful tool for data science and scientific computing. They allow users to combine code, text, and visualizations in a single document. This makes them a great tool for exploratory data analysis, but also for reproducible research and teaching. But they are often hard to share, and they are not well suited for version control. Reusing code from other notebooks is difficult; copy & paste is the norm. Automatically generating documentation from notebooks is hard. Refactoring code is almost unheard of. Using Copilot to automate boring boilerplate? Nah. And the list goes on…
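A common mitigation for the version-control pain is to strip outputs before committing. Here is a minimal sketch, assuming the nbformat 4 JSON layout of an .ipynb file. The function name strip_outputs is made up for illustration, and dedicated tools such as nbstripout handle many more cases:

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return notebook JSON with outputs and execution counts removed.

    Assumes the nbformat 4 layout: a top-level "cells" list where code
    cells carry "outputs" and "execution_count" keys.
    """
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep diffs small.
    return json.dumps(nb, indent=1, sort_keys=True)


# A tiny hand-written notebook, just for demonstration.
demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 7,
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["42\n"]}],
         "source": ["print(6 * 7)"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})
cleaned = json.loads(strip_outputs(demo))
```

The source of every cell survives; only the noisy, machine-generated parts are dropped, so the diff shows what a human actually changed.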
A computational notebook is like a REPL (read-eval-print loop) – it is a rapid interactive programming session, where you forget all best practices of software engineering and just have fun. Two of the oldest programming languages – LISP & APL, i.e. (÷(≢)(⌿(+)))) – adopted this model of interaction as the core developer experience. In fact, the REPL was invented for LISP in 1964, almost 60 years ago. To get a feeling for what a REPL session might have looked like in the past, here is an APL demo from 1975:
The whole video is a delight to watch. Put it in your Watch later playlist or somewhere you can find it again. It is really worth watching. You'll learn why APL is still unmatched by current languages.
A computational notebook can be viewed as an editable record of a REPL session. As an analogy: REPL vs. computational notebook is like pen & paper & calculator vs. Excel. Just rerun the notebook with some changes and you'll get different results. Yay! (Automatic rerun, aka reactivity, is shitty, but exists in some forms.)
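To make "reactivity" concrete: a reactive notebook reruns every cell that depends on something you just changed. Below is a toy sketch, assuming each cell declares the one name it defines and the names it reads (real tools infer this from the code). Cell and run_reactive are illustrative names, not any notebook's API:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Cell:
    defines: str                 # the one variable this cell assigns
    reads: tuple[str, ...]       # variables it depends on
    compute: Callable[..., object]


def run_reactive(cells: list[Cell], env: dict, changed: str) -> list[str]:
    """Rerun cells downstream of `changed`, in document order.

    Assumes cells only read names defined earlier in the list, so a
    single top-to-bottom pass is a valid execution order.
    """
    dirty = {changed}
    rerun = []
    for cell in cells:
        if dirty.intersection(cell.reads):
            args = (env[name] for name in cell.reads)
            env[cell.defines] = cell.compute(*args)
            dirty.add(cell.defines)
            rerun.append(cell.defines)
    return rerun


cells = [
    Cell("total", ("prices",), lambda prices: sum(prices)),
    Cell("label", (), lambda: "EUR"),
    Cell("report", ("total", "label"),
         lambda total, label: f"{total} {label}"),
]
env = {"prices": [1, 2, 3], "total": 6, "label": "EUR", "report": "6 EUR"}

# Edit the input, then let the change propagate.
env["prices"] = [10, 20]
rerun = run_reactive(cells, env, changed="prices")
```

Only total and report are recomputed; label is untouched because nothing it reads has changed.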
You have fun with this cool REPL, and after a while you save your work because you wanna have fun again in the future. However, you lose all your context. You have no idea what you were thinking when you started the session, and you have no idea what you were doing when you stopped the session. The execution context of your computer is also gone. Remember that perfectly trained model that wasn't checkpointed? Yeah, it's gone. Forever. This is NOT the image-based Smalltalk system that we were promised back in the day. This is a REPL – it can be saved as a single plain-text file. That's it.
When you live long enough with some code, you'll start to write more prose. Comments are a way of preserving thoughts. They enable you to restore your thinking context – you can jump in and continue where you left off. Documenting your code is an essential way to understand what you were doing. Like journaling, but for computational thoughts. Plain-text journaling. Remember that cool algorithm you visually sketched on the whiteboard? Nah, sorry – only plain-text allowed here.
Don Knuth (the author of The Art of Computer Programming – a multi-volume treatise dedicated to algorithms & data structures) invented literate programming precisely because he saw it as the best way to produce high-quality programs. Interleaving code and prose in a single document is the best way to preserve thinking context and ideally share it with someone else.
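In Knuth's terminology, extracting the runnable program from a literate document is called tangling. Here is a toy sketch, assuming a made-up plain-text convention where code lines start with "> " and everything else is prose (this is not the syntax of Knuth's WEB or of Org mode):

```python
def tangle(document: str, marker: str = "> ") -> str:
    """Collect code lines from a literate document, dropping the prose."""
    code_lines = [
        line[len(marker):]
        for line in document.splitlines()
        if line.startswith(marker)
    ]
    return "\n".join(code_lines)


literate = """\
We want the mean of a list, so first we need its sum.
> def mean(xs):
>     total = sum(xs)
Dividing by the length gives the average.
>     return total / len(xs)
"""

namespace = {}
exec(tangle(literate), namespace)  # run the extracted program
result = namespace["mean"]([2, 4, 6])
```

The prose explains the why, the marked lines carry the what, and one document serves both the reader and the interpreter.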
Org-mode and distant cousins
I am pretty sure (99.9999%) you are NOT an Emacs user and you DON'T know what Org mode is. It's basically an operating system for your documents – be it notes, to-do lists, a personal wiki-like second brain, or you know, computational notebooks. Anything! But plain-text of course.
Chances are, you have heard about (or maybe even used) Notion or Roam. They don't really bother with the "computational" part that much, but they are really good Word-like WYSIWYG notebooks for networked thoughts. Notion is great for collaborative teams, Roam for individuals. The thoughts you can capture with them are essentially "dead", i.e. non-computational, but they are still useful, and they attack the same space of ideas as Emacs Org mode.
I bet you know what :wq! means and also why it's funny. Emacs C-h k is just man in Unix. Ctrl+B is the same as **bold** and that is the same as <b>bold</b>. What about /todo in Notion and - [ ] in Markdown? Or Ctrl+K in Nano and dd in Vim? All these are examples of different uses of a programmer's favorite input device. Clicky clacky!
A lot of professional software is starting to incorporate command palette interaction into its interface. The value proposition is simple – learn just one keyboard shortcut, and contextual full-text search across all commands will find exactly what you want. Bonus points for user-customizable aliases and macros! Vim and Emacs will still have their niche, but for mere mortals the future is the command palette.
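The core of a command palette is small: a flat registry of named commands plus a forgiving search over their names. Here is a minimal sketch, assuming subsequence matching as the fuzzy filter (real palettes rank results with more care). The command names and the alias are invented:

```python
def is_subsequence(query: str, text: str) -> bool:
    """True if the characters of `query` appear in `text` in order."""
    remaining = iter(text.lower())
    return all(ch in remaining for ch in query.lower())


def search(commands: dict, query: str) -> list[str]:
    """Return matching command names, shortest first."""
    hits = [name for name in commands if is_subsequence(query, name)]
    return sorted(hits, key=lambda name: (len(name), name))


commands = {
    "Notebook: Run All Cells": lambda: "ran all",
    "Notebook: Restart Kernel": lambda: "restarted",
    "View: Toggle Sidebar": lambda: "toggled",
}
aliases = {"rk": "Notebook: Restart Kernel"}  # user-defined shortcut

hits = search(commands, "tog")
outcome = commands[aliases["rk"]]()
```

Typing a few characters narrows the list to the command you meant, and an alias is just one more lookup table in front of the same registry.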
Btw Markdown is far more useful as a WYSIWYG shortcut system than as a plain-text file format edited by hand. You can still use the shorthand, e.g. "**hello** world", but it immediately gets converted to richly formatted text, i.e. "hello world" with the first word in bold, which you can further style or edit as you wish.
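The shortcut behavior can be sketched in a few lines. This assumes the editor stores rich text as HTML-like markup and only converts shorthand once the closing ** has been typed. The autoformat function is hypothetical and handles bold only:

```python
import re

# Non-greedy, so two bold spans on one line stay separate.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def autoformat(typed: str) -> str:
    """Replace completed **bold** shorthand with rich-text markup."""
    return BOLD.sub(r"<b>\1</b>", typed)


finished = autoformat("**hello** world")
unfinished = autoformat("**hello world")
```

Here finished becomes rich text, while unfinished is left alone until the user closes the marker.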
Now let's zoom out from the content of the notebook. There is a convergence happening in the computational notebook space. Traditional IDEs, e.g. VSCode or PyCharm, are adopting the notebook interface as a first-class citizen. On the other side of the spectrum, JupyterLab is becoming a full-blown IDE with multiple documents, tabs, panes and of course, the command palette. The OG of computational notebooks – Wolfram Mathematica, which is now more than 30 years old (and light-years ahead in some regards) – resists this trend and still uses a multi-window interface.
Is the IDE approach the right way to go? Doesn't it suck the fun and simplicity out of the whole experience? Won't we lose the joy of explorative data mangling when there are too many distractions? Is there a middle path through this dichotomy?
Git, but no conflicts pls
Sharing is caring, right? But sharing is hard. Collaboration is hard. Git conflicts are a pain. Live collaboration is awesome, until you want to work without distractions. The norm is that sharing a URL to a live editable document is much better than sending a .ipynb file. But some of us like to work offline. To keep the best of both worlds, we might invent a new collaboration model better suited to the needs of computational notebooks. Or should we follow JupyterLab and use CRDTs?
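For readers who haven't met CRDTs (conflict-free replicated data types): replicas edit independently and can always merge, because the merge is commutative, associative and idempotent. Below is a toy sketch of one of the simplest CRDTs, a last-writer-wins map keyed by cell ID. It illustrates the idea only and is not a description of JupyterLab's implementation. The class name and the integer timestamps are assumptions:

```python
class LWWCellMap:
    """Last-writer-wins map: cell_id -> (timestamp, replica, source)."""

    def __init__(self, replica: str):
        self.replica = replica
        self.entries: dict[str, tuple[int, str, str]] = {}

    def set_cell(self, cell_id: str, source: str, timestamp: int) -> None:
        self._keep_latest(cell_id, (timestamp, self.replica, source))

    def merge(self, other: "LWWCellMap") -> None:
        """Fold another replica's state into this one."""
        for cell_id, entry in other.entries.items():
            self._keep_latest(cell_id, entry)

    def _keep_latest(self, cell_id: str,
                     entry: tuple[int, str, str]) -> None:
        # Tuples compare by timestamp first, then replica name, so ties
        # resolve the same way on every replica.
        current = self.entries.get(cell_id)
        if current is None or entry > current:
            self.entries[cell_id] = entry

    def sources(self) -> dict[str, str]:
        return {cell_id: e[2] for cell_id, e in self.entries.items()}


# Two people edit offline, then sync in both directions.
alice, bob = LWWCellMap("alice"), LWWCellMap("bob")
alice.set_cell("c1", "x = 1", timestamp=1)
bob.set_cell("c1", "x = 2", timestamp=2)
bob.set_cell("c2", "print(x)", timestamp=1)
alice.merge(bob)
bob.merge(alice)
```

Both replicas end up with the same cells without anyone resolving a conflict by hand. The price is that the earlier edit to c1 is silently dropped, which is why text-oriented CRDTs merge at a much finer granularity than whole cells.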
Granularity of sharing is a problem of its own. You don't want to share your work-in-progress, but you want to share your results. You want to share your thoughts, but you don't want to share your code. You want to share your code, but you don't want to share your data. You want to share your data, but you don't want to share your models. You want to share your models, but you don't want to share your trade secrets. Can techniques like transclusion help here? Maybe. Should I be able to import some part of an existing notebook into a new one and be sure the dependency won't ever break? Definitely.
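One way such an import could be made unbreakable is content addressing: reference the imported cell by a hash of its source, so the reference can only ever resolve to exactly that text. Here is a sketch under that assumption. CellStore and its methods are hypothetical, not an existing notebook feature:

```python
import hashlib


def digest_of(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class CellStore:
    """Append-only store of cell sources, addressed by SHA-256 digest."""

    def __init__(self):
        self._blobs: dict[str, str] = {}

    def publish(self, source: str) -> str:
        digest = digest_of(source)
        self._blobs[digest] = source  # identical content maps to one key
        return digest

    def transclude(self, digest: str) -> str:
        source = self._blobs[digest]
        # Verify on read, so a corrupted store cannot change meaning.
        if digest_of(source) != digest:
            raise ValueError("stored cell does not match its digest")
        return source


store = CellStore()
v1 = store.publish("def clean(df):\n    return df.dropna()")
# The upstream author later edits the cell; that yields a new digest.
v2 = store.publish("def clean(df):\n    return df.fillna(0)")

pinned = store.transclude(v1)  # the importing notebook still sees v1
```

Upstream edits create new digests instead of mutating old ones, so a notebook that pinned v1 keeps working until its author chooses to upgrade.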
Reiterating the vision
Compiling all these features and wishes into a single coherent tool is not an easy task, so we need some stepping stones. If Notion had first-class support for live code blocks, we would be halfway there. Throw in some integrations with storage & database providers, and we would be even closer.
Or if Observable focused on WYSIWYG editing, that might be an awesome experience. Reactivity is there, but it's not the only thing that matters.
Maybe VSCode is the best platform to deliver such a tool. It's a great platform for code editing in any language, but it comes from a traditional software engineering perspective, which might be a bit too rigid for computational notebooks.
Or should we just port Emacs Org mode to WASM and call it a day?