Improve JupyterExecutorBasic (#21)
`JupyterExecutorBasic` has been restructured to:

- handle more exceptions
- have a better separation between cache access and notebook execution (may be changed further at a later date)
- capture notebook execution exception tracebacks and store them on the staged record (see README); this moves towards addressing #14
- report a final summary of executed/excepted notebooks on the CLI

Also added an 'auto-generate' function for creating the CLI example section of the README.
chrisjsewell authored Mar 1, 2020
1 parent 66912e3 commit 9a43e29
Showing 11 changed files with 576 additions and 153 deletions.
223 changes: 148 additions & 75 deletions README.md
@@ -41,18 +41,20 @@ to come ...

## Example CLI usage

From checked-out repository folder:
<!-- This section was auto-generated by: tests/make_cli_readme.py -->

From the checked-out repository folder:

```console
$ jcache -h
$ jcache --help
Usage: jcache [OPTIONS] COMMAND [ARGS]...

The command line interface of jupyter-cache.

Options:
-v, --version Show the version and exit.
-p, --cache-path Print the current cache path and exit.
-a, --autocomplete Print the terminal autocompletion command and exit.
-a, --autocomplete Print the autocompletion command and exit.
-h, --help Show this message and exit.

Commands:
@@ -72,13 +74,13 @@ eval "$(_JCACHE_COMPLETE=source jcache)"
### Caching Executed Notebooks

```console
$ jcache cache -h
Usage: jcache cache [OPTIONS] COMMAND [ARGS]...
$ jcache cache --help
Usage: cache [OPTIONS] COMMAND [ARGS]...

Commands for adding to and inspecting the cache.

Options:
-h, --help Show this message and exit.
--help Show this message and exit.

Commands:
add-many Cache notebook(s) that have already been executed.
@@ -90,13 +92,23 @@ Commands:
show Show details of a cached notebook in the cache.
```

You can add notebooks straight into the cache. When caching, a check will be made that the notebooks look to have been executed correctly, i.e. the cell execution counts go sequentially up from 1.
The first time the cache is required, it will be lazily created:

```console
$ jcache cache add-many tests/notebooks/basic.ipynb
Cache path: jupyter-cache/.jupyter_cache
$ jcache cache list
Cache path: ../.jupyter_cache
The cache does not yet exist, do you want to create it? [y/N]: y
Caching: jupyter-cache/tests/notebooks/basic.ipynb
No Cached Notebooks

```

You can add notebooks straight into the cache.
When caching, a check will be made that the notebooks look to have been executed
correctly, i.e. the cell execution counts go sequentially up from 1.
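A minimal sketch of this kind of validation, working on the notebook JSON loaded as a plain dict (e.g. via `json.load`). The function name `check_execution_counts` and its message format are illustrative assumptions, not jupyter-cache's own implementation.

```python
from typing import Any, Dict, Optional


def check_execution_counts(nb: Dict[str, Any]) -> Optional[str]:
    """Return an error message if code cells are not numbered 1, 2, 3, ...

    Returns None when the notebook looks like it was run top to bottom once.
    """
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    for expected, cell in enumerate(code_cells, start=1):
        count = cell.get("execution_count")  # None for cells never executed
        if count != expected:
            return (
                f"Expected code cell {expected} to have "
                f"execution_count {expected} not {count}"
            )
    return None
```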

```console
$ jcache cache add-many tests/notebooks/basic.ipynb
Caching: ../tests/notebooks/basic.ipynb
Validity Error: Expected cell 1 to have execution_count 1 not 2
The notebook may not have been executed, continue caching? [y/N]: y
Success!
@@ -105,34 +117,44 @@ Success!
Or to skip validation:

```console
jcache cache add-many --no-validate tests/notebooks/*.ipynb
Caching: jupyter-cache/tests/notebooks/basic.ipynb
Caching: jupyter-cache/tests/notebooks/basic_failing.ipynb
Caching: jupyter-cache/tests/notebooks/basic_unrun.ipynb
Caching: jupyter-cache/tests/notebooks/complex_outputs.ipynb
Caching: jupyter-cache/tests/notebooks/external_output.ipynb
$ jcache cache add-many --no-validate tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
Caching: ../tests/notebooks/basic.ipynb
Caching: ../tests/notebooks/basic_failing.ipynb
Caching: ../tests/notebooks/basic_unrun.ipynb
Caching: ../tests/notebooks/complex_outputs.ipynb
Caching: ../tests/notebooks/external_output.ipynb
Success!
```

Once you've cached some notebooks, you can look at the 'cache records' for what has been cached.
Once you've cached some notebooks, you can look at the 'cache records'
for what has been cached.

Each notebook is hashed (code cells and kernel spec only), which is used to compare against 'staged' notebooks. Multiple hashes for the same URI can be added (the URI is just there for inspection) and the size of the cache is limited (current default 1000) so that, at this size, the least recently accessed records begin to be deleted. You can remove cached records by their ID.
Each notebook is hashed (code cells and kernel spec only),
which is used to compare against 'staged' notebooks.
Multiple hashes for the same URI can be added
(the URI is just there for inspection) and the size of the cache is limited
(current default 1000) so that, at this size,
the least recently accessed records begin to be deleted.
You can remove cached records by their ID.
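As a rough illustration of hashing only the code cells and kernel spec, the sketch below serializes those two parts and digests them. The name `hash_notebook`, the JSON serialization and the use of md5 are assumptions made for the example; md5 was picked only because it yields a 32-character key like the `Hashkey` values shown in this README.

```python
import hashlib
import json
from typing import Any, Dict


def hash_notebook(nb: Dict[str, Any]) -> str:
    """Hash only the code cell sources and the kernel spec of a notebook.

    Outputs, execution counts and markdown cells are ignored, so re-running
    a notebook or editing its prose leaves the hash unchanged.
    """
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # .ipynb files may store source as a list of lines or as one string
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    kernelspec = nb.get("metadata", {}).get("kernelspec", {})
    payload = json.dumps({"cells": sources, "kernelspec": kernelspec}, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```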

```console
$ jcache cache list
ID URI Created Accessed
---- ------------------------------------- ---------------- ----------------
5 tests/notebooks/external_output.ipynb 2020-02-29 03:17 2020-02-29 03:17
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 03:17 2020-02-29 03:17
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 03:17 2020-02-29 03:17
2 tests/notebooks/basic_failing.ipynb 2020-02-29 03:17 2020-02-29 03:17
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 2020-02-29 17:48
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 17:48 2020-02-29 17:48
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 2020-02-29 17:48
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 2020-02-29 17:48
```
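The size limit behaves like a least-recently-used policy. A minimal in-memory sketch of that eviction rule follows; the class `LRUCacheIndex`, its methods, and the choice to re-touch an existing hash instead of adding a duplicate are hypothetical, whereas the real cache keeps its records on disk.

```python
from collections import OrderedDict
from typing import Optional


class LRUCacheIndex:
    """Map notebook hashes to record IDs, evicting the least recently accessed.

    Hypothetical in-memory model of the cache's size limit.
    """

    def __init__(self, limit: int = 1000) -> None:
        self.limit = limit
        self._records: "OrderedDict[str, int]" = OrderedDict()
        self._next_id = 1

    def add(self, hashkey: str) -> int:
        """Add a record (or touch an existing one) and return its ID."""
        if hashkey in self._records:
            self._records.move_to_end(hashkey)
            return self._records[hashkey]
        record_id = self._next_id
        self._next_id += 1
        self._records[hashkey] = record_id
        # The front of the OrderedDict holds the least recently accessed entry
        while len(self._records) > self.limit:
            self._records.popitem(last=False)
        return record_id

    def get(self, hashkey: str) -> Optional[int]:
        """Return the record ID for a hash, marking it as recently accessed."""
        if hashkey not in self._records:
            return None
        self._records.move_to_end(hashkey)
        return self._records[hashkey]

    def remove(self, record_id: int) -> None:
        """Delete a record by its ID."""
        for hashkey, rid in list(self._records.items()):
            if rid == record_id:
                del self._records[hashkey]
                return
        raise KeyError(record_id)
```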

You can also cache notebooks with artefacts (external outputs of the notebook execution).
You can also cache notebooks with artefacts
(external outputs of the notebook execution).

```console
$ jcache cache add-one -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
Caching: jupyter-cache/tests/notebooks/basic.ipynb
Caching: ../tests/notebooks/basic.ipynb
Validity Error: Expected cell 1 to have execution_count 1 not 2
The notebook may not have been executed, continue caching? [y/N]: y
Success!
```

@@ -141,9 +163,9 @@ Show a full description of a cached notebook by referring to its ID
```console
$ jcache cache show 6
ID: 6
URI: jupyter-cache/tests/notebooks/basic.ipynb
Created: 2020-02-29 03:19
Accessed: 2020-02-29 03:19
URI: ../tests/notebooks/basic.ipynb
Created: 2020-02-29 17:48
Accessed: 2020-02-29 17:48
Hashkey: 818f3412b998fcf4fe9ca3cca11a3fc3
Artifacts:
- artifact_folder/artifact.txt
@@ -153,14 +175,14 @@ Note artefact paths must be 'upstream' of the notebook folder:

```console
$ jcache cache add-one -nb tests/notebooks/basic.ipynb tests/test_db.py
Caching: jupyter-cache/tests/notebooks/basic.ipynb
Artifact Error: Path 'jupyter-cache/tests/test_db.py' is not in folder 'jupyter-cache/tests/notebooks''
Caching: ../tests/notebooks/basic.ipynb
Artifact Error: Path '../tests/test_db.py' is not in folder '../tests/notebooks''
```
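The 'Artifact Error' above amounts to a containment check: each artefact path must resolve to a location inside the notebook's folder. One way to express that with `pathlib` is sketched below, assuming symlinks and `..` segments should be resolved before comparing; the helper name `check_in_folder` is hypothetical.

```python
from pathlib import Path


def check_in_folder(path: str, folder: str) -> Path:
    """Return `path` relative to `folder`, or raise if it lies outside it."""
    # resolve() makes both paths absolute and collapses '..' and symlinks,
    # so 'notebooks/../test_db.py' cannot slip past the check
    resolved = Path(path).resolve()
    root = Path(folder).resolve()
    try:
        return resolved.relative_to(root)
    except ValueError:
        raise ValueError(f"Path '{path}' is not in folder '{folder}'") from None
```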

To view the contents of an execution artefact:

```console
$ jcache cache cat-artifact 1 artifact_folder/artifact.txt
$ jcache cache cat-artifact 6 artifact_folder/artifact.txt
An artifact

```
@@ -179,8 +201,8 @@ You can also diff any of the cached notebooks with any (external) notebook:
$ jcache cache diff-nb 2 tests/notebooks/basic.ipynb
nbdiff
--- cached pk=2
+++ other: sandbox/tests/notebooks/basic.ipynb
## inserted before nb/cells/1:
+++ other: ../tests/notebooks/basic.ipynb
## inserted before nb/cells/0:
+ code cell:
+ execution_count: 2
+ source:
@@ -193,22 +215,25 @@ nbdiff
+ text:
+ 1

## deleted nb/cells/1:
## deleted nb/cells/0:
- code cell:
- source:
- raise Exception('oopsie!')


Success!
```

### Staging Notebooks for execution

```console
$ jcache stage -h
Usage: jcache stage [OPTIONS] COMMAND [ARGS]...
$ jcache stage --help
Usage: stage [OPTIONS] COMMAND [ARGS]...

Commands for staging notebooks to be executed.

Options:
-h, --help Show this message and exit.
--help Show this message and exit.

Commands:
add-many Stage notebook(s) for execution.
@@ -222,29 +247,29 @@ Commands:
Staged notebooks are recorded as pointers to their URI,
i.e. no physical copying takes place until execution time.

If you stage some notebooks for execution,
then you can list them to see which have existing records in the cache (by hash),
If you stage some notebooks for execution, then
you can list them to see which have existing records in the cache (by hash),
and which will require execution:

```console
$ jcache stage add-many tests/notebooks/*.ipynb
Staging: jupyter-cache/tests/notebooks/basic.ipynb
Staging: jupyter-cache/tests/notebooks/basic_failing.ipynb
Staging: jupyter-cache/tests/notebooks/basic_unrun.ipynb
Staging: jupyter-cache/tests/notebooks/complex_outputs.ipynb
Staging: jupyter-cache/tests/notebooks/external_output.ipynb
$ jcache stage add-many tests/notebooks/basic.ipynb tests/notebooks/basic_failing.ipynb tests/notebooks/basic_unrun.ipynb tests/notebooks/complex_outputs.ipynb tests/notebooks/external_output.ipynb
Staging: ../tests/notebooks/basic.ipynb
Staging: ../tests/notebooks/basic_failing.ipynb
Staging: ../tests/notebooks/basic_unrun.ipynb
Staging: ../tests/notebooks/complex_outputs.ipynb
Staging: ../tests/notebooks/external_output.ipynb
Success!
```

```console
$ jcache stage list
ID URI Created Assets Cache ID
---- ------------------------------------- ---------------- -------- ----------
5 tests/notebooks/external_output.ipynb 2020-02-29 03:29 0 5
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 03:29 0
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 03:29 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 03:29 0 2
1 tests/notebooks/basic.ipynb 2020-02-29 03:29 0 6
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 0 5
4 tests/notebooks/complex_outputs.ipynb 2020-02-29 17:48 0
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 0 2
1 tests/notebooks/basic.ipynb 2020-02-29 17:48 0 6
```
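The `Cache ID` column amounts to a lookup of each staged notebook's hash among the cached records; a blank entry means no match, so that notebook will be executed. Since staged records are only pointers, the hash would presumably be computed from the file's current contents at listing time. A minimal sketch, where the tuple layout and the function names `match_staged` and `needs_execution` are assumptions for illustration:

```python
from typing import Dict, List, Optional, Tuple

# (stage ID, URI, hashkey of the notebook's current contents)
StagedRecord = Tuple[int, str, str]


def match_staged(
    staged: List[StagedRecord], cache: Dict[str, int]
) -> List[Tuple[int, str, Optional[int]]]:
    """Pair each staged notebook with the ID of a cached record sharing its hash."""
    return [(stage_id, uri, cache.get(hashkey)) for stage_id, uri, hashkey in staged]


def needs_execution(staged: List[StagedRecord], cache: Dict[str, int]) -> List[str]:
    """URIs whose hash has no cached record, i.e. a blank Cache ID column."""
    return [uri for _, uri, cache_id in match_staged(staged, cache) if cache_id is None]
```

Two staged URIs with identical code cells resolve to the same record, which is consistent with `basic.ipynb` and `basic_unrun.ipynb` both showing cache ID 6 in the listing above.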

You can remove a staged notebook by its URI or ID:
@@ -258,39 +283,91 @@ Success!
You can then run a basic execution of the required notebooks:

```console
$ jcache cache remove 6
$ jcache cache remove 6 2
Removing Cache ID = 6
Removing Cache ID = 2
Success!
```

```console
$ jcache execute
Executing: jupyter-cache/tests/notebooks/basic.ipynb
Success: jupyter-cache/tests/notebooks/basic.ipynb
Executing: jupyter-cache/tests/notebooks/basic_unrun.ipynb
Success: jupyter-cache/tests/notebooks/basic_unrun.ipynb
Executing: ../tests/notebooks/basic.ipynb
Execution Succeeded: ../tests/notebooks/basic.ipynb
Executing: ../tests/notebooks/basic_failing.ipynb
error: Execution Failed: ../tests/notebooks/basic_failing.ipynb
Executing: ../tests/notebooks/basic_unrun.ipynb
Execution Succeeded: ../tests/notebooks/basic_unrun.ipynb
Finished!
succeeded:
- ../tests/notebooks/basic.ipynb
- ../tests/notebooks/basic_unrun.ipynb
excepted:
- ../tests/notebooks/basic_failing.ipynb
errored: []

```

Successfully executed notebooks will be cached to the cache,
along with any 'artefacts' created by the execution, that are inside the notebook folder, and data supplied by the executor.
along with any 'artefacts' created by the execution,
that are inside the notebook folder, and data supplied by the executor.
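The flow described here (run, cache on success along with executor data, keep the traceback on failure, then report a summary) can be sketched as below. The kernel run is replaced by plain Python callables, the function `execute_staged` is hypothetical, and reading `excepted` as "the notebook raised" versus `errored` as "something failed outside the notebook, such as caching" is an assumption drawn from the CLI output above, not from the executor's source.

```python
import time
import traceback
from typing import Callable, Dict, List


def execute_staged(
    notebooks: Dict[str, Callable[[], None]],
    store_in_cache: Callable[[str, Dict[str, float]], None],
    tracebacks: Dict[str, str],
) -> Dict[str, List[str]]:
    """Run each staged notebook and sort the outcomes into a summary.

    `notebooks` maps a URI to a callable standing in for the kernel run;
    `tracebacks` plays the role of the staged records' traceback field.
    """
    summary: Dict[str, List[str]] = {"succeeded": [], "excepted": [], "errored": []}
    for uri, run in notebooks.items():
        start = time.perf_counter()
        try:
            run()
        except Exception:
            # The notebook itself raised: keep the traceback, do not cache
            tracebacks[uri] = traceback.format_exc()
            summary["excepted"].append(uri)
            continue
        data = {"execution_seconds": time.perf_counter() - start}
        try:
            store_in_cache(uri, data)
        except Exception:
            # The run succeeded but something outside the notebook failed
            summary["errored"].append(uri)
            continue
        tracebacks.pop(uri, None)  # clear any traceback from an earlier failure
        summary["succeeded"].append(uri)
    return summary
```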

```console
$ jcache stage list
ID URI Created Assets Cache ID
---- ------------------------------------- ---------------- -------- ----------
5 tests/notebooks/external_output.ipynb 2020-02-29 03:29 0 5
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 03:29 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 03:29 0 2
1 tests/notebooks/basic.ipynb 2020-02-29 03:29 0 6
5 tests/notebooks/external_output.ipynb 2020-02-29 17:48 0 5
3 tests/notebooks/basic_unrun.ipynb 2020-02-29 17:48 0 6
2 tests/notebooks/basic_failing.ipynb 2020-02-29 17:48 0
1 tests/notebooks/basic.ipynb 2020-02-29 17:48 0 6
```

Execution data (such as execution time) will be stored in the cache record:

```console
$ jcache cache show 6
ID: 6
URI: jupyter-cache/tests/notebooks/basic_unrun.ipynb
Created: 2020-02-29 03:41
Accessed: 2020-02-29 03:41
URI: ../tests/notebooks/basic_unrun.ipynb
Created: 2020-02-29 17:48
Accessed: 2020-02-29 17:48
Hashkey: 818f3412b998fcf4fe9ca3cca11a3fc3
Data:
execution_seconds: 1.2328746560000003
execution_seconds: 1.2727476909999993

```

Failed notebooks will not be cached, but the exception traceback will be added to the stage record:

```console
$ jcache stage show 2
ID: 2
URI: ../tests/notebooks/basic_failing.ipynb
Created: 2020-02-29 17:48
Failed Last Execution!
Traceback (most recent call last):
File "../jupyter_cache/executors/basic.py", line 152, in execute
executenb(nb_bundle.nb, cwd=tmpdirname)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 737, in executenb
return ep.preprocess(nb, resources, km=km)[0]
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 448, in preprocess_cell
raise CellExecutionError.from_cell_and_msg(cell, out)
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
raise Exception('oopsie!')
------------------

---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-1-714b2b556897> in <module>
----> 1 raise Exception('oopsie!')

Exception: oopsie!
Exception: oopsie!


```

Once executed you may leave staged notebooks, for later re-execution, or remove them:
@@ -305,28 +382,24 @@ Unstaging ID: 5
Success!
```

You can also stage notebooks with assets; external files that are required by the notebook during execution. As with artefacts,
these files must be in the same folder as the notebook, or a sub-folder.
You can also stage notebooks with assets;
external files that are required by the notebook during execution.
As with artefacts, these files must be in the same folder as the notebook,
or a sub-folder.

```console
$ jcache stage add-one -nb tests/notebooks/basic.ipynb tests/notebooks/artifact_folder/artifact.txt
Success!
```

```console
$ jcache stage list
ID URI Created Assets
---- --------------------------- ---------------- --------
1 tests/notebooks/basic.ipynb 2020-02-25 10:01 1
```

```console
$ jcache stage show 1
ID: 1
URI: jupyter-cache/tests/notebooks/basic.ipynb
Created: 2020-02-25 10:01
URI: ../tests/notebooks/basic.ipynb
Created: 2020-02-29 17:48
Cache ID: 6
Assets:
- jupyter-cache/tests/notebooks/artifact_folder/artifact.txt
- ../tests/notebooks/artifact_folder/artifact.txt
```
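Because staged notebooks are only pointers, the notebook and its assets need copying somewhere at execution time; the traceback above shows the basic executor passing `cwd=tmpdirname` to `executenb`, which suggests a temporary directory. A sketch of that step, using the hypothetical helpers `copy_for_execution` and `staged_workdir`:

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterable, Iterator


def copy_for_execution(notebook: str, assets: Iterable[str], dest: str) -> Path:
    """Copy a notebook and its assets into `dest`, keeping their relative layout."""
    nb_path = Path(notebook).resolve()
    root = nb_path.parent
    target_nb = Path(dest) / nb_path.name
    shutil.copy2(str(nb_path), str(target_nb))
    for asset in assets:
        asset_path = Path(asset).resolve()
        # relative_to raises ValueError for assets outside the notebook folder
        target = Path(dest) / asset_path.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(str(asset_path), str(target))
    return target_nb


@contextmanager
def staged_workdir(notebook: str, assets: Iterable[str]) -> Iterator[Path]:
    """Yield the copied notebook's path inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmpdirname:
        yield copy_for_execution(notebook, assets, tmpdirname)
```

Keeping asset paths relative to the notebook folder is what lets the copied notebook open them unchanged when executed with the temporary directory as its working directory; it is also why assets, like artefacts, must live in that folder or a sub-folder.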

## Contributing