WIP: Profile data mirroring #6723

Draft
wants to merge 70 commits into base: main
Conversation

GeigerJ2
Contributor

No description provided.

GeigerJ2 and others added 30 commits June 24, 2024 12:36
Adds the option to dump workflows and structures, sorted by groups or
not. Also checks the repository size and asks for confirmation if it is
larger than 10 GB.
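The size guard described above could be sketched roughly as follows. This is a hypothetical helper, not the actual aiida-core implementation; the function name, the `limit_gb` parameter, and the prompt text are all assumptions for illustration:

```python
from pathlib import Path


def confirm_if_large(repo_path: str, limit_gb: int = 10) -> bool:
    """Hypothetical sketch: sum the size of all files below ``repo_path``
    and ask the user for confirmation if the total exceeds ``limit_gb``."""
    total = sum(
        p.stat().st_size for p in Path(repo_path).rglob('*') if p.is_file()
    )
    if total > limit_gb * 1024**3:
        answer = input(f'Repository is {total / 1024**3:.1f} GB. Continue? [y/N] ')
        return answer.lower() == 'y'
    # Below the threshold: proceed without prompting.
    return True
```

A real implementation would likely query the repository backend for sizes rather than walking the filesystem, but the confirmation flow is the same.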
…ects (aiidateam#6566)

Apparently `mypy` was updated and now complains about an `ignore[assignment]` comment that was previously required by `mypy` itself (in src/aiida/orm/utils/serialize.py::L51).
This commit removes the `ignore[assignment]` comment, and `ci-style / pre-commit (pull_request)` now passes.
This commit copies the behavior of `verdi group list`: by setting a filter, one can get rid of all matching groups at once.
As specified
[here](https://aiida.readthedocs.io/projects/aiida-core/en/v2.6.2/topics/data_types.html#exporting-data-nodes) in the
docs, the `Data` class implements an `export()` method. For actually exporting `Data` nodes, subclasses should implement
a `_prepare_XXX` method, where `XXX` is the data format. When running `verdi data <orm-data-type> export`, the available
data formats for exporting are then dynamically determined based on the implemented `_prepare_` methods. The `Code`
classes didn't follow this specification (likely because `Code` wasn't historically a subclass of `Data`, as mentioned by
@giovannipizzi); instead, a custom implementation was used for `verdi code export` in `cmd_code.py`.
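The discovery mechanism described above can be sketched in a few lines. The simplified `Data` stand-in below is illustrative only (it is not aiida-core's actual class); the `_prepare_yml` method name mirrors the convention from the docs:

```python
class Data:
    """Minimal stand-in for aiida's ``Data`` class, sketching the
    ``_prepare_XXX`` dispatch convention described above."""

    def export(self, fileformat: str) -> str:
        # Dispatch to the matching `_prepare_<fileformat>` method.
        try:
            prepare = getattr(self, f'_prepare_{fileformat}')
        except AttributeError:
            raise ValueError(f'format {fileformat!r} not supported')
        return prepare()

    @classmethod
    def get_export_formats(cls) -> list:
        # Available formats are discovered dynamically from the
        # implemented `_prepare_` methods.
        prefix = '_prepare_'
        return sorted(
            name[len(prefix):] for name in dir(cls) if name.startswith(prefix)
        )


class MyCode(Data):
    """Toy subclass implementing a single export format."""

    def _prepare_yml(self) -> str:
        return 'label: my-code\n'


print(MyCode.get_export_formats())  # -> ['yml']
```

Because the format list is derived from method names, adding a new `_prepare_json` method would automatically make `json` available to `verdi data <type> export`-style dispatch without any extra registration.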

I have now moved the code of this custom implementation to the new `_prepare_yml` method of the `AbstractCode` class (in
`AbstractCode` rather than `Code` as `Code` is in `legacy.py` and will eventually be removed, I guess). The `export`
function in `cmd_code.py` now calls the `data_export` function, as is done when exporting other classes derived from
`Data`.

The way of exporting a `Code` remains unchanged via `verdi code export`.
I _didn't_ add a

```toml
'core.code' = 'aiida.cmdline.commands.cmd_data.cdm_code:code'
```

entry point in the `pyproject.toml` under
`[project.entry-points.'aiida.cmdline.data']`, as I don't think we need
two ways of exporting `Code`s right now (even though exporting `Code`s via `verdi data
core.code export` would be more consistent with the other `Data`
classes).

Lastly, the original motivation for this refactor is that I am working on a more general dumping feature (PR coming soon)
that allows users to dump all data in their profile to disk. There, I was
using the general `data_export` function as the default exporter (which
can be overridden by the user, as can the output format). The fact that
the implementation of `Code` exporting differed from the other `Data`
nodes made the mapping a bit ugly.

PS: One more question: Why does `_exportcontent` always return a tuple
with an empty dictionary as the second element?
of `PortableCode`.

Some notes on the implementation:
- Removed the `put_object_from_tree` call from the `__init__`, as this is
  done in the setter, which is called anyway when the property is
  actually set in the `__init__`.
- Added a check for an absolute path (it would otherwise be
  caught in `Repository.put_object_from_tree`, but it is probably better to catch
  it here already).
- Not sure if the repository should be erased before putting
  another `object_from_tree`. The logic I followed: if one sets another
  `filepath_files`, the previous files should not be kept (which would
  pollute the repository); instead, only the new tree is stored. However, it
  should of course be avoided that things are removed from the
  repository unintentionally. And if one wants to add multiple directories, etc., that
  situation is described in the constructor docstring, and one can use the
  methods of the `NodeRepository`, so I guess this shouldn't be done via
  setting `filepath_files`.
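The setter behavior debated in the notes above could look roughly like this. This is an illustrative sketch (the class name, the dict standing in for the repository, and the method body are all assumptions, not aiida's actual `PortableCode`):

```python
from pathlib import Path


class PortableCodeSketch:
    """Illustrative sketch of the setter logic discussed above: validate
    that the path is absolute, and replace (rather than extend) any
    previously stored file tree."""

    def __init__(self, filepath_files):
        self._stored = {}  # stand-in for the node repository
        # The setter does the actual work, so __init__ just assigns.
        self.filepath_files = filepath_files

    @property
    def filepath_files(self):
        return self._filepath

    @filepath_files.setter
    def filepath_files(self, value):
        path = Path(value)
        # Check for an absolute path here, before touching the repository.
        if not path.is_absolute():
            raise ValueError(f'`{path}` is not an absolute path')
        # Erase the previous tree so the repository is not polluted by
        # files from an earlier `filepath_files` assignment.
        self._stored.clear()
        self._stored[path.name] = path  # stand-in for put_object_from_tree
        self._filepath = path
```

Because the validation happens before the repository is cleared, a failed assignment leaves the previously stored tree untouched.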
GeigerJ2 and others added 30 commits October 4, 2024 14:32
which were called by a workflow that is part of a group. Without
extending `nodes_in_groups` by the `called_descendants` of the
`WorkflowNode`s, a top-level `calculations` directory would be created
and the called `CalculationNode`s of a workflow assigned to a group
would be dumped there. This leads to duplication, as those calculations are
already dumped in the respective `workflows` directory within the
group directory.
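The de-duplication logic described above can be sketched as follows. The node classes, the `called_descendants` attribute, and the function name are simplified stand-ins for their aiida counterparts, chosen for illustration:

```python
class CalculationNode:
    """Toy stand-in for aiida's CalculationNode."""

    def __init__(self, pk):
        self.pk = pk


class WorkflowNode:
    """Toy stand-in for aiida's WorkflowNode, carrying the calculations
    it called as a plain list."""

    def __init__(self, pk, called_descendants=()):
        self.pk = pk
        self.called_descendants = list(called_descendants)


def nodes_to_dump_outside_groups(all_nodes, nodes_in_groups):
    """Return the nodes that still need a top-level dump directory."""
    in_groups = set(nodes_in_groups)
    # Extend the grouped set by the descendants called from grouped
    # workflows, so they are not dumped a second time at the top level.
    for node in list(in_groups):
        if isinstance(node, WorkflowNode):
            in_groups.update(node.called_descendants)
    return [node for node in all_nodes if node not in in_groups]
```

With this extension, a `CalculationNode` called by a grouped workflow is only dumped inside the group's `workflows` directory, never again under a top-level `calculations` directory.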