Introducing Zoose

Zoose is an open-source Docker container image for Jupyter notebooks, pre-loaded with common Python and R packages for data science as well as a Neo4j web server for graph analytics with Cypher.

Containerized notebooks with a standard set of packages make sharing and reproducing notebooks easier: there is no need for virtual environments and brittle configurations that break on Python upgrades. Just run the following command and you have a Python or R notebook environment with lots of packages at your fingertips:

docker run --pull always --rm -it \
  -v "$(pwd):$(pwd)" -w "$(pwd)" \
  -p 8888:8888 -p 7473:7473 -p 7474:7474 -p 7687:7687 \
  "databaseline/zoose"

Because Zoose is versioned, you can re-run any notebook created with it, as long as you know which version of Zoose it was created with. Note that Zoose cannot guarantee that the data you use is still available, especially when it comes from external sources (e.g. BigQuery).
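
One way to keep track of that version is to record it next to the notebooks. The sketch below assumes Zoose versions are published as image tags, and it introduces two names that are not part of Zoose: a helper function zoose_image and a .zoose-version file. Treat both as illustrations rather than project conventions.

```shell
# Hypothetical helper, not part of Zoose: print the image reference to run.
# Priority: explicit argument, then a .zoose-version file in the current
# directory (an assumed convention), then "latest", the tag Docker uses
# when none is given.
# The body runs in a subshell, so the variable does not leak into the caller.
zoose_image() (
  version="${1:-}"
  if [ -z "$version" ] && [ -f .zoose-version ]; then
    # Use the first line of the file, stripped of whitespace.
    version="$(head -n 1 .zoose-version | tr -d '[:space:]')"
  fi
  printf 'databaseline/zoose:%s\n' "${version:-latest}"
)
```

Substituting "$(zoose_image 1.2.3)" for the image name in the docker run command above would then start that release. Here 1.2.3 is a placeholder, not a known Zoose version.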

Zoose has been used internally at Nubank for sharing data-backed research and automated analyses of micro-surveys (i.e. targeted one-question surveys to validate internal product ideas with a single click per user) since August 2021.

Python packages include, but are not limited to, Keras, Matplotlib, NLTK, NumPy, Pandas, py2neo, SciPy, scikit-learn, Seaborn, and spaCy. For R, you have immediate access to everything in tidyverse and tidymodels, plus a few other packages. For the exact list of what is included, see the project on GitHub.

Some noteworthy Jupyter extensions are enabled by default, such as auto-formatting of cells, code folding, easy (un)commenting of lines, an integrated table of contents, a scratchpad for quick calculations that do not clutter the notebook itself, dynamic Markdown cells, and a spelling checker.