-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Engine: Allow CalcJob
monitors to return outputs
#6191
Engine: Allow CalcJob
monitors to return outputs
#6191
Conversation
4c283ba
to
d0c5c02
Compare
@edan-bainglass this might be a feature of interest |
Thanks @sphuber. I'll take a look at it as soon as I'm back from vacay 🏝️ |
d0c5c02
to
9211f43
Compare
@edan-bainglass did you want to have a look at this? Otherwise I think I will go ahead and merge this |
Yes. I'll give it a look by end of Friday. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good 👍
As mentioned in a comment, a use-case for persistence of partial data would be interesting. I wonder if this is a necessary step towards (or sheds light on) monitor-triggered calcjobs, for which we DO have a requirement!
.. versionadded:: 2.5.0 | ||
|
||
Monitors can also attach outputs to the calculation that it is monitoring. | ||
This can be useful to report outputs while the calculation is running that should still be stored permanently in the provenance graph. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in the use case of Aurora, there is no desire to store snapshots permanently. In the current implementation, I continuously overwrite extras.snapshot
with the latest. The field is discarded on job completion. But perhaps partial data can be important in some cases, say, in a structure optimization calculation, where intermediates are of value down the road (though many if not all codes take care of this internally).
retrieved = retrieve_calculation(self.node, transport, retrieved_temporary_folder.abspath) | ||
if retrieved is not None: | ||
self.out(self.node.link_label_retrieved, retrieved) | ||
self.update_outputs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end of those tasks, the process will change to a new state and the outputs will be "flushed" to the database by
update_outputs
.
Is _perform_import
where this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is the method that is called when a CalcJobNode
is run in "import" mode. That is when a remote_folder
is passed as an input, in which case the actual code is not run, but a CalcJobNode
is created taking the outputs from the already existing remote_folder
. The comment you quoted to the lifetime of an actual CalcJob
process that is being run, which will call update_outputs
on state changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is it then actually calling update_outputs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.update_outputs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 Hopefully one day I'll have time to find out how this line connects to the rest of AiiDA's mechanism
The handling of outputs of a process are handled as follows. A new output is registered by calling `Process.out`. This validates the output against the process specification, and if it passes, the output is added to the `Process._outputs` mapping. Whenever a new state is entered, `Process.on_entered` is triggered which calls `Process.update_outputs`. This goes through all the outputs in memory and adds a link to the process node if it hasn't already been done before. The `CalcJob` implementation wen around this mechanism though for the `remote_folder` and `retrieved` outputs. These outputs are created in the `upload_calculation` and `retrieve_calculation` methods of the `aiida.engine.daemon.execmanager`, respectively, and those same methods also create the link to the process node. To restore consistency, the `upload_calculation`/`retrieve_calculation` methods now return the node they created and it is the caller, the `task_upload_job` and `task_retrieve_job`, respectively, that now call `Process.out` for the returned node. At the end of those tasks, the process will change to a new state and the outputs will be "flushed" to the database by `update_outputs`. This change now also allows removing the "manual" handling of outputs in the `CalcJob.parse` call.
The `CalcJobMonitorResult` dataclass adds the attribute `outputs`. It takes a dictionary of nodes that the engine will attach to the calculation node to which the monitor is attached just as the outputs of a `Parser` would be.
9211f43
to
8c6e16a
Compare
The
CalcJobMonitorResult
dataclass adds the attributeoutputs
. It takes a dictionary of nodes that the engine will attach to the calculation node to which the monitor is attached just as the outputs of aParser
would be.