
Push after running neuron_parallel_compile #354

Merged
merged 16 commits from push_with_neuron_parallel_compile into main
Nov 24, 2023

Conversation

@michaelbenayoun (Member) commented on Nov 24, 2023

This PR adds support for pushing to the cache repository on the Hugging Face Hub after using neuron_parallel_compile.

Basically, when a training run is launched after neuron_parallel_compile has been run, all of the compilation files are pushed to the Hub before the actual training starts.

cc @5cp
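
For illustration, here is a minimal sketch of what the push step amounts to, using the huggingface_hub client. The cache directory and repository id below are placeholders (the real values are resolved by optimum-neuron's cache utilities), so treat this as a sketch of the idea rather than the merged implementation:

from huggingface_hub import HfApi

# Placeholder values: optimum-neuron resolves the actual cache location and repo id.
LOCAL_COMPILE_CACHE = "/var/tmp/neuron-compile-cache"
CACHE_REPO_ID = "my-org/neuron-cache"

api = HfApi()
# Upload every compilation artifact produced by neuron_parallel_compile
# to the cache repository on the Hugging Face Hub.
api.upload_folder(
    folder_path=LOCAL_COMPILE_CACHE,
    repo_id=CACHE_REPO_ID,
    repo_type="model",
    commit_message="Add Neuron compilation cache files",
)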

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

optimum/neuron/utils/cache_utils.py (outdated review thread)
from .require_utils import requires_safetensors


logger = logging.get_logger()


def should_log() -> bool:
Collaborator:
This is very specific to neuronx_distributed, and not really related to logging. Can you move it under the neuronx_distributed directory, and choose a name that clearly states that it verifies we are in the first worker's process (that's what I understood)?

@michaelbenayoun (Member, Author) replied on Nov 24, 2023:

No, it is linked to torch.distributed overall; it is used everywhere, for instance in the Trainer. I will rename it.

@dacorvo (Collaborator) left a comment:

Maybe change the name, otherwise LGTM ...

@@ -45,7 +45,7 @@
 logger = logging.get_logger()
 
 
-def should_log() -> bool:
+def should_current_worker_log() -> bool:
Collaborator:

It still has nothing to do with logging. I would call it is_main_worker, wdyt?

Member Author:

Alright, doing it.
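
For context, here is a minimal sketch of what the renamed helper could look like, assuming it relies only on torch.distributed as discussed above (an illustration, not necessarily the exact code merged in this PR):

import torch.distributed as dist


def is_main_worker() -> bool:
    # If torch.distributed is unavailable or uninitialized, there is a single
    # process, which is by definition the main worker.
    if not dist.is_available() or not dist.is_initialized():
        return True
    # Otherwise, only the process with global rank 0 counts as the main worker.
    return dist.get_rank() == 0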

@michaelbenayoun michaelbenayoun merged commit a8ae151 into main Nov 24, 2023
@michaelbenayoun michaelbenayoun deleted the push_with_neuron_parallel_compile branch November 24, 2023 17:23
3 participants