WIP: Profile data mirroring #6723

Draft
wants to merge 70 commits into base: main
Conversation

GeigerJ2
Contributor

No description provided.

GeigerJ2 and others added 30 commits June 24, 2024 12:36
Adds the option to dump workflows and structures, sorted by groups or
not. Also checks the repository size and asks for confirmation if it is
larger than 10 GB.
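The size guard described above could be sketched roughly as follows. This is a hypothetical helper, not the actual aiida-core implementation; the function name, the `limit_gb` parameter, and the prompt text are all assumptions for illustration:

```python
from pathlib import Path


def confirm_if_large(repo_path: str, limit_gb: int = 10) -> bool:
    """Hypothetical sketch: sum the size of all files below ``repo_path``
    and ask the user for confirmation if the total exceeds ``limit_gb``."""
    total = sum(
        p.stat().st_size for p in Path(repo_path).rglob('*') if p.is_file()
    )
    if total > limit_gb * 1024**3:
        answer = input(f'Repository is {total / 1024**3:.1f} GB. Continue? [y/N] ')
        return answer.lower() == 'y'
    # Below the threshold: proceed without prompting.
    return True
```

A real implementation would likely query the repository backend for sizes rather than walking the filesystem, but the confirmation flow is the same.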
…ects (aiidateam#6566)

Apparently `mypy` was updated and now complains about an `ignore[assignment]` comment that was previously required by `mypy` itself (in src/aiida/orm/utils/serialize.py::L51).
This commit removes the `ignore[assignment]` comment, and `ci-style / pre-commit (pull_request)` now passes.
This commit copies the behavior of `verdi group list`: by setting a filter, one can get rid of all matching groups at once.
As specified
[here](https://aiida.readthedocs.io/projects/aiida-core/en/v2.6.2/topics/data_types.html#exporting-data-nodes) in the
docs, the `Data` class implements an `export()` method. For actually exporting `Data` nodes, subclasses should implement
a `_prepare_XXX` method, where `XXX` is the data format. When running `verdi data <orm-data-type> export`, the available
data formats for exporting are then dynamically determined based on the implemented `_prepare_` methods. The `Code`
classes didn't follow this specification (likely because `Code` wasn't historically a subclass of `Data`, as mentioned by
@giovannipizzi); instead, a custom implementation was used for `verdi code export` in `cmd_code.py`.
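The discovery mechanism described above can be sketched in a few lines. The simplified `Data` stand-in below is illustrative only (it is not aiida-core's actual class); the `_prepare_yml` method name mirrors the convention from the docs:

```python
class Data:
    """Minimal stand-in for aiida's ``Data`` class, sketching the
    ``_prepare_XXX`` dispatch convention described above."""

    def export(self, fileformat: str) -> str:
        # Dispatch to the matching `_prepare_<fileformat>` method.
        try:
            prepare = getattr(self, f'_prepare_{fileformat}')
        except AttributeError:
            raise ValueError(f'format {fileformat!r} not supported')
        return prepare()

    @classmethod
    def get_export_formats(cls) -> list:
        # Available formats are discovered dynamically from the
        # implemented `_prepare_` methods.
        prefix = '_prepare_'
        return sorted(
            name[len(prefix):] for name in dir(cls) if name.startswith(prefix)
        )


class MyCode(Data):
    """Toy subclass implementing a single export format."""

    def _prepare_yml(self) -> str:
        return 'label: my-code\n'


print(MyCode.get_export_formats())  # -> ['yml']
```

Because the format list is derived from method names, adding a new `_prepare_json` method would automatically make `json` available to `verdi data <type> export`-style dispatch without any extra registration.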

I have now moved the code of this custom implementation to the new `_prepare_yml` method of the `AbstractCode` class (in
`AbstractCode` rather than `Code` as `Code` is in `legacy.py` and will eventually be removed, I guess). The `export`
function in `cmd_code.py` now calls the `data_export` function, as is done when exporting other classes derived from
`Data`.

The way of exporting a `Code` remains unchanged via `verdi code export`.
I _didn't_ add a

```toml
'core.code' = 'aiida.cmdline.commands.cmd_data.cdm_code:code'
```

entry point in the `pyproject.toml` under
`[project.entry-points.'aiida.cmdline.data']`, as I don't think we need
two ways of exporting `Code`s right now (even though exporting `Code`s via `verdi data
core.code export` would be more consistent with the other `Data`
classes).

Lastly, the original motivation for this refactor is that I am working on a more general dumping feature (PR coming soon)
that allows users to dump all data in their profile to disk. There, I was
using the general `data_export` function as the default exporter (which
can be overridden by the user, as can the output format). The fact that
the implementation of `Code` exporting differed from the other `Data`
nodes made the mapping a bit ugly.

PS: One more question: Why does `_exportcontent` always return a tuple
with an empty dictionary as the second element?
of `PortableCode`.

Some notes on the implementation:
- Removed the `put_object_from_tree` call from the `__init__`, as this is
  done in the setter, which is called anyway when the property is
  actually set in the `__init__`.
- Added a check for an absolute path (it would otherwise be
  caught in `Repository.put_object_from_tree`, but it is probably better to catch
  it here already).
- Not sure if the repository should be erased before putting
  another `object_from_tree`. The logic I followed: if one sets another
  `filepath_files`, the previous files should not be kept (which would
  pollute the repository); instead, only the new tree is stored. However, it
  should of course be avoided that things are removed from the
  repository unintentionally. And if one wants to add multiple directories, etc., that
  situation is described in the constructor docstring, and one can use the
  methods of the `NodeRepository`, so I guess this shouldn't be done via
  setting `filepath_files`.
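The setter behavior debated in the notes above could look roughly like this. This is an illustrative sketch (the class name, the dict standing in for the repository, and the method body are all assumptions, not aiida's actual `PortableCode`):

```python
from pathlib import Path


class PortableCodeSketch:
    """Illustrative sketch of the setter logic discussed above: validate
    that the path is absolute, and replace (rather than extend) any
    previously stored file tree."""

    def __init__(self, filepath_files):
        self._stored = {}  # stand-in for the node repository
        # The setter does the actual work, so __init__ just assigns.
        self.filepath_files = filepath_files

    @property
    def filepath_files(self):
        return self._filepath

    @filepath_files.setter
    def filepath_files(self, value):
        path = Path(value)
        # Check for an absolute path here, before touching the repository.
        if not path.is_absolute():
            raise ValueError(f'`{path}` is not an absolute path')
        # Erase the previous tree so the repository is not polluted by
        # files from an earlier `filepath_files` assignment.
        self._stored.clear()
        self._stored[path.name] = path  # stand-in for put_object_from_tree
        self._filepath = path
```

Because the validation happens before the repository is cleared, a failed assignment leaves the previously stored tree untouched.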
GeigerJ2 and others added 30 commits October 4, 2024 14:32
which were called by a workflow that is part of a group. Without
extending `nodes_in_groups` by the `called_descendants` of the
`WorkflowNode`s, a top-level `calculations` directory would be created
and the called `CalculationNode`s of a workflow assigned to a group
would be dumped there. This leads to duplication, as those calculations are
already dumped in the respective `workflows` directory within the
group directory.
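The de-duplication logic described above can be sketched as follows. The node classes, the `called_descendants` attribute, and the function name are simplified stand-ins for their aiida counterparts, chosen for illustration:

```python
class CalculationNode:
    """Toy stand-in for aiida's CalculationNode."""

    def __init__(self, pk):
        self.pk = pk


class WorkflowNode:
    """Toy stand-in for aiida's WorkflowNode, carrying the calculations
    it called as a plain list."""

    def __init__(self, pk, called_descendants=()):
        self.pk = pk
        self.called_descendants = list(called_descendants)


def nodes_to_dump_outside_groups(all_nodes, nodes_in_groups):
    """Return the nodes that still need a top-level dump directory."""
    in_groups = set(nodes_in_groups)
    # Extend the grouped set by the descendants called from grouped
    # workflows, so they are not dumped a second time at the top level.
    for node in list(in_groups):
        if isinstance(node, WorkflowNode):
            in_groups.update(node.called_descendants)
    return [node for node in all_nodes if node not in in_groups]
```

With this extension, a `CalculationNode` called by a grouped workflow is only dumped inside the group's `workflows` directory, never again under a top-level `calculations` directory.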