Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Update template documentation
* Add linting section
* Add UV documentation
* WIP params_files
* Update params file user guide
Showing 9 changed files with 236 additions and 50 deletions.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,19 @@ | ||
# Linting | ||
|
||
Linting is a static analysis of project files designed to detect common pitfalls in dso/dvc projects. The DSO linter | ||
is still under development. See [#5](https://github.com/Boehringer-Ingelheim/dso/issues/5) for progress. | ||
|
||
## Configuration | ||
|
||
TODO | ||
|
||
## Linting rules | ||
|
||
```{eval-rst} | ||
.. module:: dso._lint | ||
.. autosummary:: | ||
:toctree: ../generated | ||
DSO001 | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,169 @@ | ||
# Params files | ||
# Configuration files | ||
|
||
This section is the reference guide for configuration files. Explain inheritance, jinja 2 etc. in detail. | ||
YAML-based config files in a _project_, _folder_, or _stage_ serve as a single point of truth for all input files, output files or parameters. | ||
For this purpose, configurations can be defined at each level of your project in a `params.in.yaml` file. | ||
Using `dso compile-config` the `params.in.yaml` files are compiled into `params.yaml` with the following features: | ||
|
||
TODO | ||
- _inheritance_: All variables defined in `params.in.yaml` files in any parent directory will be included. | ||
- _templating_: Variables can be composed using [jinja2 syntax](https://jinja.palletsprojects.com/en/stable/templates/#variables), e.g. `foo: "{{ bar }}_version2"`. | ||
- _path resolving_: Paths will always be relative to each compiled `params.yaml` file, no matter where they were defined. | ||
|
||
## Compiling `params.yaml` files | ||
Therefore, you only need to [read in](#accessing-stage-config) a single `params.yaml` file in each stage. | ||
|
||
All `params.yaml` files are automatically generated using: | ||
## Compiling configuration files | ||
|
||
To generate a `params.yaml` file for each `params.in.yaml` file, use: | ||
|
||
```bash | ||
dso compile-config | ||
``` | ||
|
||
## Overwriting Parameters | ||
`params.yaml` files are not tracked by git. Never modify a `params.yaml` file by hand, because it will be overwritten the next time the configuration is compiled. | ||
In folders without a `params.in.yaml` file, no `params.yaml` file will be generated. | ||
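
The one-to-one relationship between `params.in.yaml` and `params.yaml` files can be sketched in Python. This is only an illustration under stated assumptions, not how `dso compile-config` discovers files: the helper names `expected_outputs` and `demo` and the folder names are hypothetical.

```python
from pathlib import Path
import tempfile


def expected_outputs(project_root: Path) -> list[Path]:
    """Return the params.yaml path expected next to every params.in.yaml."""
    return sorted(
        source.with_name("params.yaml")
        for source in project_root.rglob("params.in.yaml")
    )


def demo() -> list[str]:
    """Build a throwaway project tree and list the expected outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "stage").mkdir()
        (root / "docs").mkdir()  # no params.in.yaml here, so no output is expected
        (root / "params.in.yaml").write_text("a: 1\n")
        (root / "stage" / "params.in.yaml").write_text("b: 2\n")
        return [p.relative_to(root).as_posix() for p in expected_outputs(root)]


print(demo())
```

The `docs` folder does not appear in the result because it contains no `params.in.yaml` file.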
|
||
## Inheritance | ||
|
||
The following diagram displays the inheritance of configurations: | ||
|
||
```{eval-rst} | ||
.. image:: ../img/dso-yaml-inherit.png | ||
:width: 80% | ||
``` | ||
|
||
<p></p> | ||
|
||
DSO leverages [hiyapyco](https://github.com/zerwes/hiyapyco) with `method=METHOD_MERGE` and `none_behavior=NONE_BEHAVIOR_OVERRIDE` | ||
to implement inheritance. This means: | ||
|
||
- Values in a `params.in.yaml` file at a deeper level (e.g. stage) take precedence over values in a parent folder. | ||
- Values are appended to existing lists. | ||
- Dictionary entries are added to existing dictionaries. | ||
- To exclude an inherited parameter, set the variable to `null`. | ||
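
The merge rules listed above can be illustrated with a simplified pure-Python sketch. It is not hiyapyco's implementation and may differ from it in edge cases (for example in how duplicate list items are handled); the function name `merge` is hypothetical.

```python
from copy import deepcopy


def merge(parent, child):
    """Merge child into parent; the child wins on conflicts."""
    if child is None:
        return None  # an explicit null overrides the inherited value
    if isinstance(parent, dict) and isinstance(child, dict):
        merged = deepcopy(parent)
        for key, value in child.items():
            # Recurse for keys present on both levels, otherwise just add the entry.
            merged[key] = merge(parent[key], value) if key in parent else deepcopy(value)
        return merged
    if isinstance(parent, list) and isinstance(child, list):
        return deepcopy(parent) + deepcopy(child)  # child items are appended
    return deepcopy(child)  # scalars: the deeper level takes precedence


project = {
    "thresholds": {"fc": 2, "p_value": 0.05},
    "remove_outliers": True,
    "exclude_samples": ["sample_1", "sample_6"],
}
stage = {
    "thresholds": {"fc": 3, "p_adjusted": 0.1},
    "remove_outliers": None,
    "exclude_samples": ["sample_42"],
}

compiled = merge(project, stage)
print(compiled)
```

Here `fc` is overridden by the stage, `p_value` is inherited, `p_adjusted` is added, `remove_outliers` is reset to `None`, and `sample_42` is appended to the inherited list.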
|
||
## Templating | ||
|
||
Templating is again implemented in [hiyapyco](https://github.com/zerwes/hiyapyco) using the `interpolate=True` flag. | ||
This allows variables to be composed using [jinja2 syntax](https://jinja.palletsprojects.com/en/stable/templates/#variables), e.g. `foo: "{{ bar }}_version2"`. | ||
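
To show what the substitution amounts to, here is a minimal sketch that uses a regular expression as a stand-in for Jinja2. It only handles plain `{{ name }}` placeholders and is not how hiyapyco or Jinja2 work internally; the value `analysis` for `bar` and the helper name `render` are assumptions made for illustration.

```python
import re

# Matches simple placeholders such as "{{ bar }}" (no filters or expressions).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value: str, variables: dict) -> str:
    """Replace every {{ name }} placeholder with str(variables[name])."""
    return PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), value)


params = {"bar": "analysis", "foo": "{{ bar }}_version2"}
print(render(params["foo"], params))
```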
|
||
## Defining paths | ||
|
||
To ensure that, despite inheritance, paths are always relative to each compiled `params.yaml` file, relative paths need to be preceded with `!path`, e.g.: | ||
|
||
```yaml | ||
samplesheet: !path "01_preprocessing/input/samplesheet.txt" | ||
``` | ||
DSO supports compiling paths into absolute and relative paths. Relative paths are relative to the location of | ||
each compiled `params.yaml` file. By default, DSO uses relative paths. To enable absolute paths, see | ||
[configuration](../cli_configuration.md#project-specific-settings----pyprojecttoml). To learn | ||
how to work with relative paths in Python/R scripts see [python usage](../python_usage.md) and [R usage](https://boehringer-ingelheim.github.io/dso-r/). | ||
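
The effect of path resolving can be sketched with the standard library. This is a hedged illustration of the idea, not DSO's code: the function name `relative_to_compiled` and the `/project` directory layout are hypothetical.

```python
import posixpath


def relative_to_compiled(defined_in: str, path: str, compiled_in: str) -> str:
    """Re-express a path defined in one folder relative to another folder."""
    target = posixpath.join(defined_in, path)
    return posixpath.relpath(target, start=compiled_in)


# A path defined in the project root, seen from the root and from a stage folder.
print(relative_to_compiled("/project", "metadata/metadata.csv", "/project"))
print(relative_to_compiled("/project", "metadata/metadata.csv", "/project/stage"))
```

The same file is therefore written differently in each compiled `params.yaml`, depending on where that file lives.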
|
||
## Example | ||
|
||
Let's consider a project that has two `params.in.yaml` files, one at the project root | ||
and one in a stage subfolder. | ||
|
||
::::{grid} 1 1 2 2 | ||
|
||
:::{grid-item-card} `/params.in.yaml` | ||
|
||
```yaml | ||
thresholds: | ||
fc: 2 | ||
p_value: 0.05 | ||
metadata_file: !path "metadata/metadata.csv" | ||
dataset_name: typical_analysis | ||
file_with_abs_path: "/data/home/user/{{ dataset_name }}_data_set.csv" | ||
remove_outliers: true | ||
exclude_samples: | ||
- sample_1 | ||
- sample_6 | ||
``` | ||
|
||
::: | ||
|
||
:::{grid-item-card} `/stage/params.in.yaml` | ||
|
||
```yaml | ||
thresholds: | ||
fc: 3 | ||
p_adjusted: 0.1 | ||
samplesheet: !path "01_preprocessing/input/samplesheet.txt" | ||
remove_outliers: null | ||
exclude_samples: | ||
- sample_42 | ||
``` | ||
|
||
::: | ||
:::: | ||
|
||
This results in the following **compiled `params.yaml` files**: | ||
|
||
::::{grid} 1 1 2 2 | ||
|
||
:::{grid-item-card} `/params.yaml` | ||
|
||
```yaml | ||
thresholds: | ||
fc: 2 | ||
p_value: 0.05 | ||
metadata_file: metadata/metadata.csv | ||
dataset_name: typical_analysis | ||
file_with_abs_path: /data/home/user/typical_analysis_data_set.csv | ||
remove_outliers: true | ||
exclude_samples: | ||
- sample_1 | ||
- sample_6 | ||
``` | ||
|
||
::: | ||
:::{grid-item-card} `/stage/params.yaml` | ||
|
||
```yaml | ||
thresholds: | ||
fc: 3 | ||
p_value: 0.05 | ||
p_adjusted: 0.1 | ||
metadata_file: ../metadata/metadata.csv | ||
dataset_name: typical_analysis | ||
file_with_abs_path: /data/home/user/typical_analysis_data_set.csv | ||
remove_outliers: | ||
exclude_samples: | ||
- sample_1 | ||
- sample_6 | ||
- sample_42 | ||
samplesheet: 01_preprocessing/input/samplesheet.txt | ||
``` | ||
|
||
::: | ||
:::: | ||
|
||
## Accessing stage config | ||
|
||
To ensure that `dso` correctly reruns stages when dependencies have changed, it is really important | ||
to declare all input files/params in `dvc.yaml`. `dso compile-config` generates `params.yaml` files that, | ||
in principle, you can read in with a YAML parser in a programming language of your choice. | ||
However, we **recommend that you use one of the following interfaces to access the stage configuration**. | ||
These interfaces ensure that you will have access only to the parameters declared in the `dvc.yaml` file as | ||
either input, parameter, or output. This ensures that you cannot forget to declare a parameter that you actually | ||
use in your analysis. | ||
|
||
- `read_params` [in R](https://boehringer-ingelheim.github.io/dso-r/) | ||
- `read_params` [in Python](../python_usage.md) | ||
- `dso get-config` from the terminal. | ||
|
||
When multiple `params.in.yaml` files (such as those at the project, folder, or stage level) contain the same configuration, the value specified at the more specific level (e.g., stage) takes precedence over the value set at the broader level (e.g., project). This makes the analysis adaptable and enhances modifiability across the project. | ||
`dso get-config` prints the filtered params file for a given stage to STDOUT. This makes it really easy to | ||
call it from other languages as a system call. In fact, this is what `read_params` in R and Python are doing under the hood. |
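
The filtering idea described above can be sketched as follows. This is a conceptual illustration only, not the logic used by `dso get-config` or `read_params`: it assumes that declared parameters are given as dotted keys such as `thresholds.fc`, and the function name `filter_params` is hypothetical.

```python
def filter_params(params: dict, declared: list[str]) -> dict:
    """Keep only the (possibly nested) keys listed in `declared`."""
    filtered: dict = {}
    for dotted in declared:
        source, target = params, filtered
        *parents, leaf = dotted.split(".")
        for key in parents:
            source = source[key]  # raises KeyError if the key does not exist
            target = target.setdefault(key, {})
        target[leaf] = source[leaf]
    return filtered


compiled = {
    "thresholds": {"fc": 3, "p_value": 0.05},
    "dataset_name": "typical_analysis",
    "remove_outliers": None,
}
visible = filter_params(compiled, ["thresholds.fc", "dataset_name"])
print(visible)
```

A script that tries to read `thresholds.p_value` from `visible` fails immediately, which is the point: an undeclared parameter cannot be used by accident.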
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,40 @@ | ||
# `uv` integration | ||
|
||
TODO | ||
[`uv`](https://docs.astral.sh/uv/) is an ultrafast Python package and project manager. Every dso | ||
project is also a `uv` project, so you can use all the features described in the uv [working with projects](https://docs.astral.sh/uv/guides/projects/) | ||
documentation. | ||
|
||
Integration with `uv` serves two main purposes: | ||
|
||
- Freeze the version of `dso` per project to ensure reproducibility in the future, even if dso behavior changes. | ||
This feature is a work in progress, see also [installation](../cli_installation.md#freezing-the-dso-version-within-a-project). | ||
- Provide a Python virtual environment for all Python stages in the project. | ||
|
||
Using a separate virtual environment for each project is considered good practice to ensure reproducibility and | ||
to avoid dependency conflicts. `uv` makes this very easy. | ||
|
||
To add dependencies, edit the `dependencies` section in `pyproject.toml` or use | ||
|
||
```bash | ||
uv add <some_package> | ||
``` | ||
|
||
to add and install a package in one step. | ||
|
||
By using | ||
|
||
```bash | ||
uv sync | ||
``` | ||
|
||
all requested packages are installed into the local `.venv` directory. At the same time a `uv.lock` file | ||
is created that pins the exact versions of each package. This file is tracked by git, which means | ||
every collaborator will get exactly the same environment if they run `uv sync` on their machine. | ||
|
||
To run a script within the virtual environment, use | ||
|
||
```bash | ||
uv run ./some_script.py | ||
``` | ||
|
||
All DSO Python stages use the virtual environment by default. |
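
To check which interpreter a script actually runs under, a small script like the following can be started with `uv run`. It is a generic Python sketch that is not specific to DSO, and the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"interpreter: {sys.executable}")
print(f"virtual environment active: {in_virtual_environment()}")
```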