Add python stages (#98)
* Add python stages

* Update CHANGELOG

* Fix dvc.yaml

* Update docs
grst authored Jan 30, 2025
1 parent add568c commit cd3edb9
Showing 28 changed files with 167 additions and 7 deletions.
Binary file added .coverage
Binary file not shown.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -10,6 +10,10 @@ and this project adheres to [Semantic Versioning][].

 ## [Unreleased]

+### New features
+
+- Add templates for Python stages (`quarto_py`, `quarto_ipynb`) ([#98](https://github.com/Boehringer-Ingelheim/dso/pull/98)).
+
 ### Documentation

 - Update documentation, finalizing the most important sections of the user guide.
@@ -27,7 +31,7 @@ and this project adheres to [Semantic Versioning][].
 - Do not change the configuration of the root logger, only the `dso` logger. Changing the root logger
 had side-effects on other libraries when importing `dso` in Python ([#80](https://github.com/Boehringer-Ingelheim/dso/pull/80)).

-### New Features
+### New features

 - Paths in `params.in.yaml` files declared with `!path` can now be compiled to absolute instead of relative paths ([#78](https://github.com/Boehringer-Ingelheim/dso/pull/78)).
 - Python API that mirrors `dso-r` functionality (e.g. to be used from Jupyter notebooks) ([#30](https://github.com/Boehringer-Ingelheim/dso/pull/30))
3 changes: 2 additions & 1 deletion docs/getting_started.md
@@ -79,7 +79,8 @@ For more details, please refer to [Configuration files](user_guide/params_files.

 ## Implementing a stage

-A stage is a single step in your analysis and usually generates some kind of output data from input data. The input data can also be supplied by previous stages. To create a stage, use the `dso create stage` command and select either the _bash_ or _quarto_ template as a starting-point.
+A stage is a single step in your analysis and usually generates some kind of output data from input data. The input data can also be supplied by previous stages. To create a stage, use the `dso create stage` command and select either the _bash_ or one of the _quarto_
+[templates](user_guide/templates.md) as a starting-point.

 The essential files of a stage are:

4 changes: 3 additions & 1 deletion docs/user_guide/templates.md
@@ -19,7 +19,9 @@ Folder templates:

 Stage templates:

-- [quarto](https://github.com/Boehringer-Ingelheim/dso/tree/main/src/dso/templates/stage/quarto) - Template for quarto notebook in R
+- [quarto_r](https://github.com/Boehringer-Ingelheim/dso/tree/main/src/dso/templates/stage/quarto) - Template for quarto notebook in R
+- [quarto_py](https://github.com/Boehringer-Ingelheim/dso/tree/main/src/dso/templates/stage/quarto) - Template for quarto notebook in Python (quarto markdown (`.qmd`) format)
+- [quarto_r](https://github.com/Boehringer-Ingelheim/dso/tree/main/src/dso/templates/stage/quarto) - Template for quarto notebook in Python (jupyter notebook (`.ipynb`) format)
 - [bash](https://github.com/Boehringer-Ingelheim/dso/tree/main/src/dso/templates/stage/bash) - Template for executing a bash snippet

 The source code of the templates can be [inspected on GitHub](https://github.com/Boehringer-Ingelheim/dso/tree/main/src/dso/templates).
1 change: 1 addition & 0 deletions pyproject.toml
@@ -38,6 +38,7 @@ dependencies = [
   "ruamel-yaml",
   "svgutils",
   "tomli; python_version<='3.10'",
+  "uv",
 ]

 optional-dependencies.dev = [ "hatch", "pre-commit" ]
2 changes: 1 addition & 1 deletion src/dso/_quarto.py
@@ -69,7 +69,7 @@ def render_quarto(quarto_dir: Path, report_dir: Path, before_script: str, cwd: P
 {before_script}
-quarto render "{quarto_dir}" --output-dir "{report_dir}" {quiet} {pandocfilter}
+quarto render "{quarto_dir}" --execute --output-dir "{report_dir}" {quiet} {pandocfilter}
 """
 )
 res = subprocess.run(script, shell=True, executable="/bin/bash", cwd=cwd)
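The `_quarto.py` change adds `--execute` to the `quarto render` call. Quarto's documentation states that, by default, it does not execute code cells in `.ipynb` files and renders their stored outputs instead (`.qmd` files are executed). That matters for the new `quarto_ipynb` template, whose committed notebook carries no outputs. The sketch below builds the same command as an argument list so it can be inspected; `build_render_command` and its `extra_args` parameter are hypothetical and not part of dso, and the original's `before_script` and `pandocfilter` parts are omitted.

```python
import shlex
from pathlib import Path
from typing import Sequence


def build_render_command(
    quarto_dir: Path, report_dir: Path, extra_args: Sequence[str] = ()
) -> list[str]:
    """Hypothetical helper mirroring the `quarto render` line in render_quarto.

    `before_script` and `pandocfilter` from the original are left out; any other
    flags can be passed through `extra_args`.
    """
    return [
        "quarto",
        "render",
        str(quarto_dir),
        "--execute",  # run all cells; Quarto does not execute .ipynb cells by default
        "--output-dir",
        str(report_dir),
        *extra_args,
    ]


cmd = build_render_command(Path("stage/src"), Path("stage/report"))
print(shlex.join(cmd))  # quarto render stage/src --execute --output-dir stage/report
# Actually rendering requires quarto on PATH, e.g. subprocess.run(cmd, check=True)
```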
4 changes: 3 additions & 1 deletion src/dso/cli/_create.py
@@ -14,7 +14,9 @@
 # list of stage template with description - can be later populated also from external directories
 STAGE_TEMPLATES = {
     "bash": "Execute a simple bash snippet or call an external script (e.g. nextflow)",
-    "quarto": "Generate a report using quarto",
+    "quarto_r": "Generate a quarto report using R (qmd file)",
+    "quarto_py": "Generate a quarto report using Python (qmd file)",
+    "quarto_ipynb": "Generate a quarto report using Python (ipynb file)",
 }
 # Create help text for CLI listing all templates
 STAGE_TEMPLATE_TEXT = "\n".join(f" * __{name}__: {description}" for name, description in STAGE_TEMPLATES.items())
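Because the CLI help text is generated from `STAGE_TEMPLATES`, adding the three dictionary entries is enough for the new templates to be listed. Running the two definitions from the diff on their own shows the generated text; the `print` call is the only addition.

```python
# Copied from the diff: template name -> one-line description
STAGE_TEMPLATES = {
    "bash": "Execute a simple bash snippet or call an external script (e.g. nextflow)",
    "quarto_r": "Generate a quarto report using R (qmd file)",
    "quarto_py": "Generate a quarto report using Python (qmd file)",
    "quarto_ipynb": "Generate a quarto report using Python (ipynb file)",
}
# One bullet per template, in insertion order (dicts keep insertion order in Python 3.7+)
STAGE_TEMPLATE_TEXT = "\n".join(f" * __{name}__: {description}" for name, description in STAGE_TEMPLATES.items())

print(STAGE_TEMPLATE_TEXT)
#  * __bash__: Execute a simple bash snippet or call an external script (e.g. nextflow)
#  * __quarto_r__: Generate a quarto report using R (qmd file)
#  * __quarto_py__: Generate a quarto report using Python (qmd file)
#  * __quarto_ipynb__: Generate a quarto report using Python (ipynb file)
```

The old `quarto` key no longer exists, which is why `tests/conftest.py` now passes `--template quarto_r`.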
7 changes: 7 additions & 0 deletions src/dso/templates/init/default/.pre-commit-config.yaml
@@ -24,6 +24,13 @@ repos:
   # args: [--fix, --exit-non-zero-on-fix]
   # - id: ruff-format
   # types_or: [python, pyi, jupyter]
+
+  # for ipynb files in `src` directories: we never want to commit any output as rendered output files
+  # are tracked by dvc
+  - repo: https://github.com/kynan/nbstripout
+    rev: "0.8.1"
+    hooks:
+      - id: nbstripout
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.6.0
     hooks:
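The `nbstripout` hook removes notebook outputs before a commit, because the rendered results are tracked by DVC instead. The stdlib sketch below shows only the core idea, clearing `outputs` and `execution_count` on code cells; it is not nbstripout's implementation, which also cleans metadata and supports many options, and the function name `strip_outputs` is hypothetical. The template notebook added in this commit is already in this stripped form (`"outputs": []`, `"execution_count": null`).

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook with all code-cell outputs removed."""
    stripped = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # serialized as null
    return stripped


nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hello\n"]}],
            "source": ["print('hello')"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}
clean = strip_outputs(nb)
print(json.dumps(clean["cells"][0]["outputs"]))  # []
```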
File renamed without changes.
File renamed without changes.
11 changes: 11 additions & 0 deletions src/dso/templates/stage/quarto_ipynb/dvc.yaml
@@ -0,0 +1,11 @@
stages:
  {{ stage_name }}:
    params:
      - dso.quarto
    deps:
      - src/{{ stage_name }}.ipynb
    outs:
      - output
      - report/{{ stage_name }}.html
    cmd:
      - uv run dso exec quarto .
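This diff does not show how dso fills in the `{{ stage_name }}` placeholders when `dso create stage` runs. The double-brace syntax resembles Jinja, but that is an assumption. The hypothetical `render_placeholders` helper below uses a plain regular expression to show what the `quarto_ipynb` `dvc.yaml` would look like for a stage named `my_stage`, with indentation in the usual `dvc.yaml` layout.

```python
import re

TEMPLATE = """\
stages:
  {{ stage_name }}:
    params:
      - dso.quarto
    deps:
      - src/{{ stage_name }}.ipynb
    outs:
      - output
      - report/{{ stage_name }}.html
    cmd:
      - uv run dso exec quarto .
"""


def render_placeholders(text: str, **values: str) -> str:
    """Hypothetical stand-in: replace every {{ key }} with values[key]."""

    def substitute(match: re.Match) -> str:
        return values[match.group(1)]  # a KeyError flags an unknown placeholder

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, text)


print(render_placeholders(TEMPLATE, stage_name="my_stage"))
```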
64 changes: 64 additions & 0 deletions src/dso/templates/stage/quarto_ipynb/src/{{ stage_name }}.ipynb
@@ -0,0 +1,64 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# | label: load_libraries\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "from dso import read_params, stage_here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load the stage-specific 'params.yaml' config using the `read_params(..)` function. This function specifically loads\n",
    "only the stage-dependent parameters that are defined in the 'params' section of the 'dvc.yaml' file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# | label: read_params\n",
    "\n",
    "params = read_params(\"{{ stage_path }}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To locate your files relative to the stage path use `stage_here(..)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# | label: obtain_files_relative_to_stage_dir\n",
    "\n",
    "# e.g.\n",
    "samplesheet = pd.read_csv(stage_here(params[\"samplesheet\"]))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
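According to the notebook's own explanation, `read_params(..)` returns only the parameters declared in the `params` section of the stage's `dvc.yaml` (`dso.quarto` in this template). The sketch below illustrates that selection on an in-memory dict, reading each declared entry as a dotted path; `select_stage_params` and the toy values are hypothetical, and dso's real function additionally locates and loads the stage's `params.yaml`.

```python
from typing import Any


def select_stage_params(all_params: dict[str, Any], declared: list[str]) -> dict[str, Any]:
    """Keep only the dotted keys declared in a stage's dvc.yaml `params` list."""
    selected: dict[str, Any] = {}
    for dotted in declared:
        keys = dotted.split(".")
        # Walk down the full parameter tree to the declared value.
        value: Any = all_params
        for key in keys:
            value = value[key]
        # Rebuild the same nesting in the result.
        target = selected
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return selected


params = {
    "samplesheet": "input/samples.csv",
    "dso": {"quarto": {"option": "value"}, "other_setting": True},
    "unrelated": 42,
}
print(select_stage_params(params, ["dso.quarto", "samplesheet"]))
# {'dso': {'quarto': {'option': 'value'}}, 'samplesheet': 'input/samples.csv'}
```

Under this reading, the template's `params["samplesheet"]` lookup only works once `samplesheet` is also added to the stage's `params` list in `dvc.yaml`.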
7 changes: 7 additions & 0 deletions src/dso/templates/stage/quarto_py/.gitignore
@@ -0,0 +1,7 @@
/log
/tmp
/temp
/input/*
!/input/*.dvc
/output/*
/report/*
3 changes: 3 additions & 0 deletions src/dso/templates/stage/quarto_py/README.md
@@ -0,0 +1,3 @@
# {{ stage_name }}

{{ stage_description }}
11 changes: 11 additions & 0 deletions src/dso/templates/stage/quarto_py/dvc.yaml
@@ -0,0 +1,11 @@
stages:
  {{ stage_name }}:
    params:
      - dso.quarto
    deps:
      - src/{{ stage_name }}.qmd
    outs:
      - output
      - report/{{ stage_name }}.html
    cmd:
      - uv run dso exec quarto .
6 changes: 6 additions & 0 deletions src/dso/templates/stage/quarto_py/src/.gitignore
@@ -0,0 +1,6 @@
/.quarto/
/*.html
# temporary quarto files
*.rmarkdown
/_quarto.yml
/*_files/
25 changes: 25 additions & 0 deletions src/dso/templates/stage/quarto_py/src/{{ stage_name }}.qmd
@@ -0,0 +1,25 @@
```{python}
# | label: load_libraries
from dso import read_params, stage_here
import pandas as pd
```

Load the stage-specific 'params.yaml' config using the `read_params(..)` function. This function specifically loads
only the stage-dependent parameters that are defined in the 'params' section of the 'dvc.yaml' file.

```{python}
#| label: read_params
params = read_params("{{ stage_path }}")
```


To locate your files relative to the stage path use `stage_here(..)`.

```{python}
# | label: obtain_files_relative_to_stage_dir
# e.g.
samplesheet = pd.read_csv(stage_here(params["samplesheet"]))
```
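`stage_here(..)` resolves a path relative to the stage directory rather than the current working directory, presumably so that the same code works no matter where the notebook is rendered from. A minimal `pathlib` sketch of that idea follows; `make_stage_here` and its explicit `stage_dir` argument are assumptions for illustration, since dso determines the stage directory itself.

```python
from pathlib import Path
from typing import Callable


def make_stage_here(stage_dir: Path) -> Callable[[str], Path]:
    """Hypothetical stand-in for dso's stage_here, bound to an explicit stage directory."""
    root = stage_dir.resolve()  # absolute path of the stage folder

    def stage_here(rel_path: str) -> Path:
        # Interpret rel_path relative to the stage folder, not the current working directory.
        return root / rel_path

    return stage_here


stage_here = make_stage_here(Path("/project/my_stage"))
params = {"samplesheet": "input/samplesheet.csv"}
print(stage_here(params["samplesheet"]))  # /project/my_stage/input/samplesheet.csv
```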
7 changes: 7 additions & 0 deletions src/dso/templates/stage/quarto_r/.gitignore
@@ -0,0 +1,7 @@
/log
/tmp
/temp
/input/*
!/input/*.dvc
/output/*
/report/*
3 changes: 3 additions & 0 deletions src/dso/templates/stage/quarto_r/README.md
@@ -0,0 +1,3 @@
# {{ stage_name }}

{{ stage_description }}
File renamed without changes.
Empty file.
6 changes: 6 additions & 0 deletions src/dso/templates/stage/quarto_r/src/.gitignore
@@ -0,0 +1,6 @@
/.quarto/
/*.html
# temporary quarto files
*.rmarkdown
/_quarto.yml
/*_files/
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -31,7 +31,7 @@ def quarto_stage(dso_project) -> Path:
     runner = CliRunner()
     stage_name = "quarto_stage"
     chdir(dso_project)
-    runner.invoke(dso_create, ["stage", stage_name, "--template", "quarto", "--description", "a quarto stage"])
+    runner.invoke(dso_create, ["stage", stage_name, "--template", "quarto_r", "--description", "a quarto stage"])
     with (Path(stage_name) / "src" / f"{stage_name}.qmd").open("w") as f:
         f.write(
             dedent(
2 changes: 1 addition & 1 deletion tests/test_create.py
@@ -7,7 +7,7 @@
 from dso.cli import dso_create


-@pytest.mark.parametrize("template", ["bash", "quarto"])
+@pytest.mark.parametrize("template", ["bash", "quarto_r", "quarto_py", "quarto_ipynb"])
 def test_create_stage(dso_project, template):
     runner = CliRunner()
     chdir(dso_project)