All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## 1.0.0 - 2024-10-14

- The `Dependency` object now takes an optional parameter `binary_dependencies` to specify binary packages to be installed in the computation container. (#249)
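  Example of usage (a minimal sketch; the package name is a placeholder, not from the changelog):

  ```python
  from substrafl.dependency import Dependency

  dependency = Dependency(
      pypi_dependencies=["numpy"],
      # Binary (system) packages to install in the computation container:
      binary_dependencies=["libgomp1"],  # placeholder package name
  )
  ```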
- CUDA base Docker image is now `nvidia/cuda:12.6.1-runtime-ubuntu24.04` (#248)
- Remove parasite versions of `setuptools` in Dockerfiles and install `setuptools>70.0.0` to tackle the last identified CVEs (#250)
- Bump NumPy and PyTorch versions in tests. (#252)
- Drop Python 3.9 support. (#247)
## 0.47.0 - 2024-09-12

- Python 3.12 support (#226)
- Add Docker GPU base image, activated through the `Dependency` object with the variable `use_gpu=True`. The Docker image used is `nvidia/cuda:11.8.0-runtime-ubuntu22.04`. (#227)
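  Example of usage (a sketch, assuming `use_gpu` is set alongside the other `Dependency` parameters):

  ```python
  from substrafl.dependency import Dependency

  dependency = Dependency(
      pypi_dependencies=["torch"],
      use_gpu=True,  # selects the nvidia/cuda runtime base image
  )
  ```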
- BREAKING: change `use_gpu` to `disable_gpu` in all `TorchAlgo`. The device is set to `cpu` if no GPU is available or if `disable_gpu` is set to `True`. You must invert the boolean in your code to keep the same behaviour (`disable_gpu == not use_gpu`). (#241)
- Remove packages named `build-essential` and `*-dev` after building dependencies to decrease CVEs (#242)
- Add a non-root user to the generated Dockerfile for the compute functions. Compute pods were already running as non-root (ensured by a security context in the backend); we are making it more explicit here. (#228)
- Added a `subprocess_only` tag to prevent simulation mode tests from running in remote mode. (#229)
- Bump PyTorch version to 2.2.1 in tests. (#230)
- Bump NumPy version to 1.26.4 in tests. (#231)
- Actually trigger the GPU Docker configuration with the `use_gpu` flag when running the Camelyon benchmark (#244)
- Use `Tensor.cpu()` to copy the tensor to host memory first in the Camelyon benchmark (#245)
## 0.46.0 - 2024-06-03

- Add `apt update` to Docker user images to limit vulnerabilities. (#213)
## 0.45.0 - 2024-03-27

- New CLI arguments to the Camelyon benchmark (`--torch-gpu` and `--cp-name`) (#201)
- Apply changes from breaking PR on Substra (Substra/substra#405) (#202)
- Deprecate `setup.py` in favour of `pyproject.toml` (#204)
## 0.44.0 - 2024-03-07

- Add documentation on how to change SubstraFL log level (#194)
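  For instance (a sketch; the `substrafl.set_logging_level` helper name is an assumption based on the logging feature mentioned later in this changelog):

  ```python
  import logging

  import substrafl

  # Raise SubstraFL's log verbosity to DEBUG:
  substrafl.set_logging_level(loglevel=logging.DEBUG)
  ```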
- Add the `simulate_experiment` function, which executes the `Compute Plan` in RAM only. It returns Python objects containing the computed `Performances` and the saved intermediate `States`. More information about this feature is available in the docstrings (#184).

  Example of usage:
  ```python
  from substrafl.experiment import simulate_experiment

  scores, intermediate_state_train, intermediate_state_agg = simulate_experiment(
      client=my_substra_client,
      strategy=my_strategy,
      train_data_nodes=train_data_nodes,
      evaluation_strategy=my_eval_strategy,
      aggregation_node=aggregation_node,
      clean_models=False,
      num_rounds=NUM_ROUNDS,
  )
  ```
- BREAKING: rename `datasamples` to `data_from_opener` (#193)
- Bump documentation dependencies to Sphinx 7.2.6 (#195)
- The predict task does not exist anymore. The evaluation of a model is done in a single task (#177)
- `Strategy` implements an `evaluate` method, with the `@remote_data` decorator, to compute the evaluation of the model. The `evaluate` method is the same for all strategies (#177)
- BREAKING: the `perform_predict` method of `Strategy` changed in favor of `perform_evaluation`, which calls the new `evaluate` method (#177)
- BREAKING: `metric_functions` are now passed to the `Strategy` instead of the `TestDataNode` (#177)
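  For example (a sketch; `FedAvg`, `MyAlgo` and `accuracy` stand in for your own strategy, algo and metric function):

  ```python
  from substrafl.strategies import FedAvg

  strategy = FedAvg(
      algo=MyAlgo(),
      # Metrics now live on the strategy rather than on each TestDataNode:
      metric_functions={"accuracy": accuracy},
  )
  ```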
- BREAKING: the `predict` method of `Algo` has no `@remote_data` decorator anymore. Its signature does not take `prediction_path` anymore, and the predictions are returned by the method (#177)
- Abstract base class `Node` is replaced by `Protocols`, defined in `substrafl.nodes.protocol.py` (#185)
- BREAKING: rename `test_data_sample_keys`, `test_tasks` and `register_test_operations` to `data_sample_keys`, `tasks` and `register_operations` in `TestDataNodes` (#185)
- BREAKING: `InputIdentifiers` and `OutputIdentifiers` move from `substrafl.nodes.node` to `substrafl.nodes.schemas` (#185)
- Switch to python-slim as base image, instead of substra-tools (#197)
- Dropped support for Python 3.8 (#200)
- Numerical stability of the `NewtonRaphson` strategy is improved by symmetrizing the Hessian (#196)
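A minimal sketch of what "symmetrizing the Hessian" means here (this helper is illustrative, not the strategy's actual code):

```python
import numpy as np

def symmetrize(hessian: np.ndarray) -> np.ndarray:
    # Replace H by its symmetric part (H + H^T) / 2 before inversion,
    # removing small asymmetries introduced by floating-point error.
    return (hessian + hessian.T) / 2.0
```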
## 0.43.0 - 2024-02-26

- Renamed `function` field of Substra `Function` pydantic model to `archive` (#181)
- Update schemas and tests to remove Pydantic v2 warnings (#183)
## 0.42.0 - 2023-10-18

- Support for Python 3.11 (#169)
- Remove substrafl wheel cache (#175)
- Camelyon benchmark: download files (#182)

## 0.41.1 - 2023-10-06

- Fix Newton-Raphson docstring (#170)

## 0.41.0 - 2023-09-08

- Update to pydantic 2.3.0 (#159)

## 0.40.0 - 2023-09-07
- Check the Python version used before generating the Dockerfile (#155).
- Python dependencies can be resolved using pip compile during function registration by setting `compile` to `True` in the `Dependency` object (#155).

  ```python
  Dependency(
      pypi_dependencies=["pytest", "numpy"],
      compile=True,
  )
  ```

- Dependency objects are now computed at initialization in a cache directory, accessible through the `cache_directory` attribute. The cache directory is deleted at the `Dependency` object deletion. (#155)
- Check created wheel names. (#160)
- BREAKING: Rename `generate_wheel.py` to `manage_dependencies.py` (#156)
- BREAKING: Move `manage_dependencies.py` from `remote.register` to `dependency` (#158)
- BREAKING: `local_dependencies` is renamed `local_installable_dependencies` (#158)
- BREAKING: `local_installable_dependencies` are now limited to local modules or Python wheels (no support for bdist, sdist...) (#155).
- Set, save & load `random.seed` and `np.random.seed` along with `torch.manual_seed` in `TorchAlgo` (#151); see the sketch below.
- Keep the last round task output by default (#162)
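A sketch of the seeding entry above (assuming `TorchAlgo` subclasses accept a `seed` keyword, as the entry suggests; the other arguments are the usual ones from your own code):

```python
class MyAlgo(TorchFedAvgAlgo):
    def __init__(self):
        super().__init__(
            model=model,
            criterion=criterion,
            optimizer=optimizer,
            index_generator=nig,
            dataset=MyDataset,
            seed=42,  # sets random.seed, np.random.seed and torch.manual_seed
        )
```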
## 0.39.0 - 2023-07-25

- BREAKING: Input and output of aggregate tasks are now `shared_state`. This provides more flexibility to link different types of tasks with each other. To use `download_aggregate_shared_state` on experiments launched before this commit, you can use the following code as a replacement for the function (#142).
  ```python
  import tempfile

  from substrafl.model_loading import _download_task_output_files
  from substrafl.model_loading import _load_from_files

  with tempfile.TemporaryDirectory() as temp_folder:
      _download_task_output_files(
          client=<client>,
          compute_plan_key=<compute_plan_key>,
          dest_folder=temp_folder,
          round_idx=<round_idx>,
          rank_idx=<rank_idx>,
          task_type="aggregate",
          identifier="model",
      )
      aggregated_state = _load_from_files(input_folder=temp_folder, remote=True)
  ```
- Remove function `wait` in `utils`. You can use `substra.Client.wait_task` & `substra.Client.wait_compute_plan` instead. (#147)
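  For example (a sketch; `client`, `task_key` and `compute_plan_key` come from your own experiment):

  ```python
  # Wait for a single task, then for the whole compute plan:
  client.wait_task(task_key)
  client.wait_compute_plan(compute_plan_key)
  ```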
- Compatibility with GPU devices when running torch-based experiments (#154)
- Pin `pydantic` to `>=1.9.0` & `<2.0.0`, as `pydantic` `v2.0.0` has been released with a lot of non-backward-compatible changes. (#148)
## 0.38.0 - 2023-06-27

- BREAKING: Rename `model_loading.download_shared_state` to `model_loading.download_train_shared_state` (#143)
- BREAKING: Rename `model_loading.download_aggregated_state` to `model_loading.download_aggregate_shared_state` (#143)
- NumPy < 1.24 in dependencies to keep pickle compatibility with the substra-tools NumPy version (#144)
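A sketch of the renamed `model_loading` helpers in use (assuming they keep the same round-based signature as the other `model_loading` helpers in this changelog):

```python
from substrafl.model_loading import download_aggregate_shared_state
from substrafl.model_loading import download_train_shared_state

train_state = download_train_shared_state(
    client=client,
    compute_plan_key=compute_plan.key,
    round_idx=round_idx,
)
aggregate_state = download_aggregate_shared_state(
    client=client,
    compute_plan_key=compute_plan.key,
    round_idx=round_idx,
)
```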
## 0.37.0 - 2023-06-12

- `ComputePlanBuilder` base class to define which methods are needed to implement a custom strategy in SubstraFL. These methods are `build_compute_plan`, `load_local_states` and `save_local_states`. (#120)
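  A minimal skeleton of such a custom strategy (the method names are the ones listed above; the import path and signatures are assumptions for illustration):

  ```python
  from substrafl.compute_plan_builder import ComputePlanBuilder

  class MyCustomStrategy(ComputePlanBuilder):
      def build_compute_plan(self, train_data_nodes, aggregation_node, evaluation_strategy, num_rounds, clean_models):
          # Register the tasks making up the compute plan.
          ...

      def save_local_states(self, path):
          # Persist this strategy's local state to disk.
          ...

      def load_local_states(self, path):
          # Restore this strategy's local state from disk.
          ...
  ```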
- Check and test on strings used as metric names in test data nodes (#122).
- Add default exclusion patterns when copying files to avoid creating large Docker images (#118)
- Add the possibility to force the `Dependency` `editable_mode` through the environment variable `SUBSTRA_FORCE_EDITABLE_MODE` (#131)
- BREAKING: deprecate the usage of the `model_loading.download_algo_files` and `model_loading.load_algo` functions. New utils functions are now available (#125):
  - `model_loading.download_algo_state` to download a SubstraFL algo of a given round or rank.
  - `model_loading.download_shared_state` to download a SubstraFL shared object of a given round or rank.
  - `model_loading.download_aggregated_state` to download a SubstraFL aggregated state of a given round or rank.

  The API change goes from:

  ```python
  algo_files_folder = str(pathlib.Path.cwd() / "tmp" / "algo_files")
  download_algo_files(
      client=client_to_download_from,
      compute_plan_key=compute_plan.key,
      round_idx=round_idx,
      dest_folder=algo_files_folder,
  )
  model = load_algo(input_folder=algo_files_folder).model
  ```

  to:

  ```python
  algo = download_algo_state(
      client=client_to_download_from,
      compute_plan_key=compute_plan.key,
      round_idx=round_idx,
  )
  model = algo.model
  ```
- BREAKING: rename `build_graph` to `build_compute_plan`. (#120)
- BREAKING: move `schemas.py` into the `strategy` module. (#120)

  ```python
  from substrafl.schemas import FedAvgSharedState
  # Becomes:
  from substrafl.strategies.schemas import FedAvgSharedState
  ```

- Remove the previous way of copying function files (#118)
- `download_train_task_models_by_rank` uses the new function `list_task_output_assets` instead of `value`, which has been removed (#129)
- New dependencies copy method in Docker mode. (#130)
## 0.36.0 - 2023-05-11

- Close issue #114. Large batch sizes are set to the number of samples in predict for NR and FedPCA. (#115)
- BREAKING: Metrics are now given as `metric_functions` and not as `metric_key`. The functions given as metric functions to test data nodes are automatically registered in a new Substra function by SubstraFL (#117). The new argument of the `TestDataNode` class, `metric_functions`, replaces the `metric_keys` one and accepts a dictionary (using the key as the identifier of the function given as value), a list of functions, or directly a function if there is only one metric to compute (`function.__name__` is then used as identifier). Installed dependencies are the `algo_dependencies` passed to `execute_experiment`, and permissions are the same as for the predict function.

  From a user point of view, the metric registration changes from:
  ```python
  def accuracy(datasamples, predictions_path):
      y_true = datasamples["labels"]
      y_pred = np.load(predictions_path)
      return accuracy_score(y_true, np.argmax(y_pred, axis=1))

  metric_deps = Dependency(pypi_dependencies=["numpy==1.23.1", "scikit-learn==1.1.1"])
  permissions_metric = Permissions(public=False, authorized_ids=DATA_PROVIDER_ORGS_ID)

  metric_key = add_metric(
      client=client,
      metric_function=accuracy,
      permissions=permissions_metric,
      dependencies=metric_deps,
  )

  test_data_nodes = [
      TestDataNode(
          organization_id=org_id,
          data_manager_key=dataset_keys[org_id],
          test_data_sample_keys=[test_datasample_keys[org_id]],
          metric_keys=[metric_key],
      )
      for org_id in DATA_PROVIDER_ORGS_ID
  ]
  ```
  to:

  ```python
  def accuracy(datasamples, predictions_path):
      y_true = datasamples["labels"]
      y_pred = np.load(predictions_path)
      return accuracy_score(y_true, np.argmax(y_pred, axis=1))

  test_data_nodes = [
      TestDataNode(
          organization_id=org_id,
          data_manager_key=dataset_keys[org_id],
          test_data_sample_keys=[test_datasample_keys[org_id]],
          metric_functions={"Accuracy": accuracy},
      )
      for org_id in DATA_PROVIDER_ORGS_ID
  ]
  ```
- Enforce kwargs for user-facing functions with more than 3 parameters (#109)
- Remove references to `composite`. Replace by `train_task`. (#108)
- Add the Federated Principal Component Analysis strategy (#97)
## 0.35.1 - 2023-04-11

- Change order of layers in the Dockerfile: files are copied as needed before the installation layers, and the final copy is made last. (#110)
## 0.35.0 - 2023-03-31

- Initialization task added to each strategy in SubstraFL. (#89)

  This allows loading the `Algo` and all its attributes to the platform before any training. Once on the platform, we can perform a testing task before any training. This init task consists in submitting an empty function, coded in the `BaseAlgo` class:

  ```python
  @remote
  def initialize(self, shared_states):
      return
  ```

  The init task returns a `local` output that will be passed as input to a test task, and to the first train task. The graph goes from:
  ```mermaid
  flowchart LR
      TrainTask1_round0--Local-->TestTask1_r0
      TrainTask1_round0--Shared-->TestTask1_r0
      TrainTask2_round0--Shared-->AggregateTask
      TrainTask2_round0--Local-->TestTask2_r0
      TrainTask2_round0--Shared-->TestTask2_r0
      AggregateTask--Shared-->TrainTask1_r1
      TrainTask1_round0--Local-->TrainTask1_r1
      AggregateTask--Shared-->TrainTask2_r1
      TrainTask2_round0--Local-->TrainTask2_r1
      TrainTask1_round0--Shared-->AggregateTask
      TrainTask1_r1--Local-->TestTask1_r1
      TrainTask1_r1--Shared-->TestTask1_r1
      TrainTask2_r1--Local-->TestTask2_r1
      TrainTask2_r1--Shared-->TestTask2_r1
  ```
  to:
  ```mermaid
  flowchart LR
      InitTask1_round0--Local-->TestTask1_r0
      InitTask2_round0--Local-->TestTask2_r0
      InitTask1_round0--Local-->TrainTask1_r1
      InitTask2_round0--Local-->TrainTask2_r1
      TrainTask2_r1--Shared-->AggregateTask
      TrainTask1_r1--Shared-->AggregateTask
      TrainTask1_r1--Local-->TestTask1_r1
      TrainTask2_r1--Local-->TestTask2_r1
      TrainTask1_r1--Local-->TrainTask1_r2
      TrainTask2_r1--Local-->TrainTask2_r2
      AggregateTask--Shared-->TrainTask1_r2
      AggregateTask--Shared-->TrainTask2_r2
      TrainTask1_r2--Local-->TestTask1_r2
      TrainTask2_r2--Local-->TestTask2_r2
  ```
- BREAKING: `algo` is now passed as a parameter to the `strategy` and not to `execute_experiment` anymore (#98)
- BREAKING: a `strategy` needs to implement a new method `build_graph` to build the graph of tasks to be executed in `execute_experiment` (#98)
- BREAKING: the `predict` method of `strategy` has been renamed to `perform_predict` (#98)
- Test tasks don't take a `shared` as input anymore (#89)
- BREAKING: change `eval_frequency` default value to None to avoid confusion with a hidden default value (#91)
- BREAKING: rename Algo to Function (#82)
- BREAKING: clarify `EvaluationStrategy` arguments: change `rounds` to `eval_frequency` and `eval_rounds` (#85)
- Replace `schemas.xxx` by `substra.schemas.xxx` (#105)
- BREAKING: given local code dependencies are now copied to the level of the running script systematically (#99)
- Docker images are pruned in the main check of the GitHub Action to free disk space while tests run (#102)
- Pass `aggregation_lr` to the parent class for Scaffold. Fix issue #103 (#104)
- Use `from substra import schemas` in `aggregation_node.py`, `test_data_node.py` and `train_data_node.py` (#105)
## 0.34.0 - 2023-02-20

- Possibility to test on an organization where no training has been performed (#74)
- Add contributing, contributors & code of conduct files (#68)
- Test-only field for datasamples (#67)
- Remove RemoteDataMethod and change the RemoteMethod class to be fully flexible regarding function name. The substra-tools method is now generic, and loads the inputs depending on the inputs dictionary content (#59)
- BREAKING: rename tuple to task (#79)
## 0.33.0 - 2022-12-19

- test: add GitHub Action to run subprocess tests on Windows after each merge (#60)
- test: pass the CI e2e tests on Python 3.10 (#56)
- fix: bug introduced with numpy 1.24 and cloudpickle: `TypeError: __generator_ctor()`. Remove version from requirements. (Issue open)
## 0.32.0 - 2022-11-22

- The metric registration is simplified. The user can now directly write a score function within their script, and directly register it by specifying the right dependencies and permissions. The score function must have `(datasamples, predictions_path)` as signature. (#47)

  Example of new metric registration:

  ```python
  metric_deps = Dependency(pypi_dependencies=["numpy==1.23.1"])
  permissions_metric = Permissions(public=True)

  def mse(datasamples, predictions_path):
      y_true = datasamples["target"]
      y_pred = np.load(predictions_path)
      return np.mean((y_true - y_pred)**2)

  metric_key = add_metric(
      client=substra_client,
      permissions=permissions_metric,
      dependencies=metric_deps,
      metric_function=mse,
  )
  ```
- Doc on the model loading page (#40)
- The round 0 is now exposed. Possibility to evaluate centralized strategies before any training (FedAvg, NR, Scaffold). The round 0 is skipped for single-org strategies and cannot be evaluated before training (#46)
- GitHub actions on Ubuntu 22.04 (#52)
- torch algo: test that `with_batch_norm_parameters` is only about the running mean and variance of the batch norm layers (#30)
- torch algo: `with_batch_norm_parameters` also takes into account the `torch.nn.LazyBatchNorm{x}d` layers (#30)
- chore: use the generic task (#31)
- Apply changes from algo to function in substratools (#34)
- add `tools_functions` method to `RemoteDataMethod` and `RemoteMethod` to return the function(s) to send to `tools.execute`.
- Register functions in substratools using the decorator `@tools.register` (#37)
- Update substratools Docker image (#49)
- Fix Python 3.10 compatibility by catching OSError for Notebooks (#51)
- Free disk space in main GitHub action to run the CI (#48)
- Local dependencies are installed in one `pip` command to optimize the installation and avoid incompatibility errors (#39)
- Fix error when installing current package as local dependency (#41)
- Fix flake8 repo for pre-commit (#50)
## 0.31.0 - 2022-10-03

- Remove the algo category from algo, as it is not required by substra anymore
- Documentation of the `predict` function of Algos was not up to date (#33)
## 0.30.0 - 2022-09-26

- Return statement of both `predict` and `_local_predict` methods from Torch Algorithms.
- Update the Client: it takes a backend type instead of `debug=True` + an environment variable to set the spawner (#210)
- Do not use `Model.category`, since this field is being removed from the SDK
- Update the tests and benchmark with the change on Metrics from substratools (#24)
- NOTABLE CHANGES due to breaking changes in substra-tools:
  - the opener only exposes `get_data` and `fake_data` methods
  - the results of the above methods are passed under the `datasamples` key within the `inputs` dict arg of all tools methods (train, predict, aggregate, score)
  - all methods (train, predict, aggregate, score) now take a `task_properties` argument (dict) in addition to `inputs` and `outputs`
  - the `rank` of a task, previously passed under the `rank` key within the inputs, is now given in the `task_properties` dict under the `rank` key

  This means that all `opener.py` files should be changed from:
  ```python
  import substratools as tools

  class TestOpener(tools.Opener):
      def get_X(self, folders):
          ...

      def get_y(self, folders):
          ...

      def fake_X(self, n_samples=None):
          ...

      def fake_y(self, n_samples=None):
          ...
  ```
  to:
  ```python
  import substratools as tools

  class TestOpener(tools.Opener):
      def get_data(self, folders):
          ...

      def fake_data(self, n_samples=None):
          ...
  ```
  This also implies that `metrics` now has access to the results of `get_data` and not only `get_y` as previously. The user should adapt all of their `metrics` files accordingly, e.g.:
  ```python
  class AUC(tools.Metrics):
      def score(self, inputs, outputs):
          """AUC"""
          y_true = inputs["y"]
          ...

      def get_predictions(self, path):
          return np.load(path)

  if __name__ == "__main__":
      tools.metrics.execute(AUC())
  ```
  could be replaced with:
  ```python
  class AUC(tools.Metrics):
      def score(self, inputs, outputs, task_properties):
          """AUC"""
          datasamples = inputs["datasamples"]
          y_true = ...  # getting target from the whole datasamples

      def get_predictions(self, path):
          return np.load(path)

  if __name__ == "__main__":
      tools.metrics.execute(AUC())
  ```
- BREAKING CHANGE: the `train` and `predict` methods of all substrafl algos now take `datasamples` as argument instead of `X` and `y`. This impacts the user code only if they overwrote those methods instead of using the `_local_train` and `_local_predict` methods.
- BREAKING CHANGE: the result of the `get_data` method from the opener is automatically provided to the given `dataset` as an `__init__` arg, instead of `x` and `y` within the `train` and `predict` methods of all `Torch*Algo` classes. The user `dataset` should be adapted accordingly, e.g.:
  ```python
  from torch.utils.data import Dataset

  class MyDataset(Dataset):
      def __init__(self, x, y, is_inference=False) -> None:
          ...

  class MyAlgo(TorchFedAvgAlgo):
      def __init__(
          self,
      ):
          torch.manual_seed(seed)
          super().__init__(
              model=my_model,
              criterion=criterion,
              optimizer=optimizer,
              index_generator=index_generator,
              dataset=MyDataset,
          )
  ```
  should be replaced with:
  ```python
  from torch.utils.data import Dataset

  class MyDataset(Dataset):
      def __init__(self, datasamples, is_inference=False) -> None:
          ...

  class MyAlgo(TorchFedAvgAlgo):
      def __init__(
          self,
      ):
          torch.manual_seed(seed)
          super().__init__(
              model=my_model,
              criterion=criterion,
              optimizer=optimizer,
              index_generator=index_generator,
              dataset=MyDataset,
          )
  ```
## 0.29.0 - 2022-09-19

- Use the new Substra SDK feature that enables setting the `transient` flag on tasks, instead of `clean_models` on compute plans, to remove intermediary models.
## 0.28.0 - 2022-09-12

- Throw an error if `pytorch 1.12.0` is used. There is a regression bug in `torch 1.12.0` that impacts optimizers that have been pickled and unpickled. This bug occurs for the Adam optimizer for example (but not for SGD). Here is a link to one issue covering it: pytorch/pytorch#80345
- Remove `classic-algos` from the benchmark dependencies
- NOTABLE CHANGES due to breaking changes in substra-tools: the user must now pass the method name to execute from the tools-defined class within the Dockerfile of both `algo` and `metric`, under the `--method-name` argument:

  ```dockerfile
  ENTRYPOINT ["python3", "metrics.py"]
  ```

  shall be replaced by:

  ```dockerfile
  ENTRYPOINT ["python3", "metrics.py", "--method-name", "score"]
  ```

- Use the new Substra SDK features that return the path of the downloaded file. Change the `model_loading.py` implementation and the tests.
- In the PyTorch algorithms, move the data to the device (GPU or CPU) in the training loop and predict function, so that the user does not need to do it.
- Rename connect-tools Docker images to substra-tools
- Benchmark:
  - use public data hosted on Zenodo for the benchmark
  - fix the GPU test to the last breaking changes, and unskip the `use_gpu=False` case
- Update the NpIndexGenerator docstrings to add information on how to use it as a full epoch index generator; see the sketch below.
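  A sketch of such a full-epoch setup (assuming one epoch corresponds to `ceil(n_samples / batch_size)` updates with `drop_last=False`; the sample counts are placeholders):

  ```python
  import math

  from substrafl.index_generator import NpIndexGenerator

  n_samples, batch_size = 1000, 32

  nig = NpIndexGenerator(
      batch_size=batch_size,
      # One full pass over the data per round:
      num_updates=math.ceil(n_samples / batch_size),
      drop_last=False,
  )
  ```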
- BREAKING CHANGES:
  - an extra argument `predictions_path` has been added to both the `predict` and `_local_predict` methods of all `*TorchAlgo` classes. The user now has to use the `_save_predictions` method to save their predictions in `_local_predict`. The user-defined metrics will load those saved predictions with `np.load(inputs['predictions'])`. The `_save_predictions` method can be overwritten.

  The default `_local_predict` method from substrafl algorithms went from:
  ```python
  def _local_predict(self, predict_dataset: torch.utils.data.Dataset):
      if self._index_generator is not None:
          predict_loader = torch.utils.data.DataLoader(predict_dataset, batch_size=self._index_generator.batch_size)
      else:
          raise BatchSizeNotFoundError(
              "No default batch size has been found to perform local prediction. "
              "Please overwrite the _local_predict function of your algorithm."
          )

      self._model.eval()

      predictions = torch.Tensor([])
      with torch.inference_mode():
          for x in predict_loader:
              predictions = torch.cat((predictions, self._model(x)), 0)

      return predictions
  ```
  to:
  ```python
  def _local_predict(self, predict_dataset: torch.utils.data.Dataset, predictions_path: Path):
      if self._index_generator is not None:
          predict_loader = torch.utils.data.DataLoader(predict_dataset, batch_size=self._index_generator.batch_size)
      else:
          raise BatchSizeNotFoundError(
              "No default batch size has been found to perform local prediction. "
              "Please overwrite the _local_predict function of your algorithm."
          )

      self._model.eval()

      predictions = torch.Tensor([])
      with torch.inference_mode():
          for x in predict_loader:
              predictions = torch.cat((predictions, self._model(x)), 0)

      self._save_predictions(predictions, predictions_path)

      return predictions
  ```
- NOTABLE CHANGES due to breaking changes in connect-tools:
  - both the `load_predictions` and `get_predictions` methods have been removed from the opener
  - the user-defined `metrics` now take `inputs` and `outputs` as arguments. `inputs` is a dict containing:
    - `rank`: int
    - `y`: the result of `get_y` applied to the task datasamples
    - `predictions`: a file path where the output predictions of the user-defined algo have been saved. As stated above, those predictions can be loaded thanks to `np.load` if the user didn't overwrite the `_save_predictions` method from the substrafl-defined `*Algo`.

    `outputs` is a dict containing:
    - `performance`: a file path where to save the result of the metrics. It must be done through the `tools.save_performance` function.

  Instead of:
  ```python
  import substratools as tools
  from sklearn.metrics import roc_auc_score

  class AUC(tools.MetricAlgo):
      def score(self, y_true, y_pred):
          """AUC"""
          metric = roc_auc_score(y_true, y_pred) if len(set(y_true)) > 1 else 0
          return float(metric)

  if __name__ == "__main__":
      tools.algo.execute(AUC())
  ```
  the metric files should look like:
  ```python
  import numpy as np
  import substratools as tools
  from sklearn.metrics import roc_auc_score

  class AUC(tools.MetricAlgo):
      def score(self, inputs, outputs):
          """AUC"""
          y_pred = np.load(inputs["predictions"])
          y_true = inputs["y"]

          metric = roc_auc_score(y_true, y_pred) if len(set(y_true)) > 1 else 0
          tools.save_performance(float(metric), outputs["performance"])

  if __name__ == "__main__":
      tools.algo.execute(AUC())
  ```
- Documentation for the `_skip` argument of the `_local_predict` and `_local_train` methods of `Torch*Algo`.
- Update the inputs/outputs to make them compatible with the task execution
- GPU execution: move the RNG state to CPU in case the checkpoint has been loaded on the GPU
- fix: rng state for torch algos. Add tests for both stability between organizations and rounds.
- feat: `_local_predict` has been re-added
- feat: add default batching to `predict`
- BREAKING CHANGE: drop Python 3.7 support
- BREAKING CHANGE: the library is now named "substrafl"
- feat: add compute task inputs
- fix: support several items in the `Dependency` - `local_dependencies` field
- feat: add compute task output
- BREAKING CHANGE: add the torch Dataset as an argument of TorchAlgo to preprocess the data. The `__init__` function of the dataset must contain `(self, x, y, is_inference)`. The `__getitem__` function is expected to return `x, y` if `is_inference` is `False`, else `x`. This behavior can be changed by re-writing the `_local_train` or `predict` methods. `_local_train` is no longer mandatory to overwrite; its signature passed from `(x, y)` to `(train_dataset)`. `_local_predict` has been deleted. `_get_len_from_x` has been deleted. See the sketch below.
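  A minimal sketch of a dataset matching the contract above (the tensor indexing is illustrative):

  ```python
  from torch.utils.data import Dataset

  class MyDataset(Dataset):
      def __init__(self, x, y, is_inference=False):
          self.x = x
          self.y = y
          self.is_inference = is_inference

      def __getitem__(self, index):
          # Return (x, y) pairs for training, x alone for inference:
          if self.is_inference:
              return self.x[index]
          return self.x[index], self.y[index]

      def __len__(self):
          return len(self.x)
  ```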
- feat: the compute plan tasks are uploaded to Connect using the auto-batching feature (it should solve gRPC message errors for large compute plans)
- BREAKING CHANGE: convert (test task) to (predict task + test task)
- Added functions to download the model of a strategy:
  - The function `substrafl.model_loading.download_algo_files` downloads the files needed to load the output model of a strategy according to the given round. These files are downloaded to the given folder.
  - The `substrafl.model_loading.load_algo` function loads the output model of a strategy from the files previously downloaded via the function `substrafl.model_loading.download_algo_files`.

  Those two functions work together:

  ```python
  download_algo_files(
      client=substra_client,
      compute_plan_key=key,
      round_idx=None,
      dest_folder=session_dir,
  )
  model = load_algo(input_folder=session_dir)
  ```
- Compatibility with substra 0.28.0
- feat: Newton-Raphson strategy
- Added packaging to the install requirements
- Stop using metrics APIs, use algo APIs instead
- BREAKING CHANGE: Strategy rounds start at `1` and the initialization round is now `0`. They used to start at `0` with an initialization round of `-1`. For each composite train tuple, aggregate tuple and test tuple, the metadata `round_idx` has changed accordingly to the rule stated above.
- BREAKING CHANGE: rename node to organization in Connect
- Rename the `OneNode` strategy to `SingleOrganization`
- When using the `TorchScaffoldAlgo`:
  - The number of times the `_scaffold_parameters_update` method must be called within the `_local_train` method is now checked
  - A warning is thrown if an optimizer other than `SGD` is used
  - If multiple learning rates are set for the optimizer, a warning is thrown and the smallest learning rate is used for the shared state aggregation operation. `0` is not considered as a learning rate for this choice, as it could be used to deactivate the learning process of certain layers of the model.
- BREAKING CHANGE: add an initialization round to centralized strategies:
  - Each centralized strategy starts with an initialization round composed of one composite train tuple on each train data node
  - One round of a centralized strategy is now: `Aggregation` -> `Training on composite`
  - Composite train tuples before test tuples have been removed
  - All torch algorithms now have a common `predict` method
  - The `algo` argument has been removed from the `predict` method of all strategies
  - The `fake_traintuple` attribute of the `RemoteStruct` class has been removed

  The full discussion regarding this feature can be found here.
- feat: meaningful name for algo. You can use the `_algo_name` parameter to set a custom algo name for the registration. By default, it is set to `method-name_class-name`.

  ```python
  algo.train(
      node.data_sample_keys,
      shared_state=self.avg_shared_state,
      _algo_name=f"Training with {algo.__class__.__name__}",
  )
  ```
- chore: add latest connect-tools Docker image selection
- Torch algorithms now support GPUs; there is a parameter `use_gpu` in the `__init__` of the Torch algo classes. If `use_gpu` is `True` and there is no GPU detected, the code runs on CPU.
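  For example (a sketch; the constructor arguments besides `use_gpu` are the usual ones from your own code):

  ```python
  class MyAlgo(TorchFedAvgAlgo):
      def __init__(self):
          super().__init__(
              model=model,
              criterion=criterion,
              optimizer=optimizer,
              index_generator=nig,
              use_gpu=True,  # falls back to CPU if no GPU is detected
          )
  ```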
- The wheels of the libraries installed with `editable=True` are now in `$HOME/.substrafl` instead of `$LIB_PATH/dist`
- benchmark:
  - `make benchmark` runs the default remote benchmark on the connect platform specified in the config file
  - `make benchmark-local` runs the default local benchmark in subprocess mode
- BREAKING CHANGE: replace the "tag" argument with "name" in `execute_experiment`
- `execute_experiment` checks that the algo and strategy are compatible. You can override the list of strategies the algo is compatible with using the `strategies` property:

  ```python
  from substrafl.algorithms.algo import Algo
  from substrafl import StrategyName

  class MyAlgo(Algo):
      @property
      def strategies(self):
          return [StrategyName.FEDERATED_AVERAGING, StrategyName.SCAFFOLD]

      # ...
  ```
- feat: the compute plan key of the experiment is saved in the experiment summary before submitting or executing it
- feat: add the possibility for the user to pass additional metadata to the compute plan metadata
- Force the reinstallation of connect-tools in the Docker image, necessary for the editable mode
- BREAKING CHANGE: the default value of `drop_last` in the `NpIndexGenerator` is now `False`
- BREAKING CHANGE: the index generator is now required when implementing a strategy

  ```python
  from substrafl.index_generator import NpIndexGenerator

  nig = NpIndexGenerator(
      batch_size=batch_size,
      num_updates=num_updates,
      drop_last=False,  # optional, defaults to False
      shuffle=True,  # optional, defaults to True
  )

  class MyAlgo(TorchFedAvgAlgo):
      def __init__(self):
          super().__init__(
              index_generator=nig,
              # other parameters
          )

      # ...
  ```
- The user can now initialize their `TorchAlgo` function with custom parameters (only primitive types are supported):

  ```python
  class MyAlgo(TorchFedAvgAlgo):
      def __init__(self, my_arg):
          super().__init__(
              model=model,
              criterion=criterion,
              optimizer=optimizer,
              index_generator=nig,
              my_arg=my_arg,  # This is necessary
          )

      # ...
  ```
- Fix the format of the asset ids: the right format is `str(uuid.uuid4())` and not `uuid.uuid4().hex`
- feat: rename "compute_plan_tag" to "tag" #131
- feat: Add the optional argument "compute_plan_tag" to give the user the possibility to choose their own tag (timestamp by default) #128
- feat: Scaffold strategy
- feat: add one node strategy
- The Connect tasks have a `round_idx` attribute in their metadata
- doc: add python api to documentation
- API documentation: fix the docstrings and the display of the documentation for some functions
- (BREAKING CHANGE) FedAvg strategy: the train function must return a `FedAvgSharedState`, the average function returns a `FedAvgAveragedState`. No need to change your code if you use `TorchFedAvgAlgo`
- benchmark:
  - Use the same batch sampler between the torch and Substrafl examples
  - Make it work with `num_workers` > 0
  - Explain the effect of the sub-sampling
  - Update the default benchmark parameters in `benchmarks.sh`
  - Add new curves to the plotting: when one parameter changes while the others stay the same
- Use connect-tools 0.10.0 as a base image for the Dockerfile
- fix: naming changed from FedAVG to FedAvg
- fix: log a warning if an existing wheel is used to build the docker image
- fix: `execute_experiment` has no side effects on its arguments
- fix: `Dependency.local_package` are installed in non-editable mode and additionally accept `pyproject.yaml` as configuration file
- fix: `execute_experiment` accepts `None` as `evaluation_strategy`
- fix: The `substrafl.algorithms.algo.Algo` `abstractmethod` decorator is now taken into account
- feat: `EvaluationStrategy` can now be reinitialized
- Refactoring of `substrafl.algorithms.pytorch.fed_avg.TorchFedAvgAlgo`:
  - replace the `_preprocess` and `_postprocess` functions by `_local_train` and `_local_predict`
  - the user can override the `_get_len_from_x` function to get the number of samples in the dataset from x
  - `batch_size` is now a required argument, and a warning is issued if it is None
- The `substrafl.index_generator.np_index_generator.NpIndexGenerator` class now works with `torch.utils.data.DataLoader`, with `num_workers` > 0
- The benchmark uses `substrafl.algorithms.pytorch.fed_avg.TorchFedAvgAlgo` instead of its own custom algorithm
- Add the `clean_models` option to the `execute_experiment` function
- feat: make a base class for the index generator and document it
- The `Algo` now exposes a `model` property to get the model after downloading it from Connect
- (BREAKING CHANGE) experiment summary is saved as a json in `experiment_folder`
- fix: notebook dependency failure. You can now run a substrafl experiment with local dependencies in a Jupyter notebook
- feat: models can now be tested every n rounds, on the same nodes they were trained on. This feature introduces a new parameter `evaluation_strategy` in `execute_experiment`, which takes an `EvaluationStrategy` instance from `substrafl.evaluation_strategy`. If this parameter is not given, performance will not be measured at all (previously, it was measured at the end of the experiment by default).
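  For example (a sketch; at the time this entry was written the frequency parameter was named `rounds`, later renamed to `eval_frequency`/`eval_rounds` in 0.35.0):

  ```python
  from substrafl.evaluation_strategy import EvaluationStrategy

  # Test the models every 3 rounds on the given test data nodes:
  evaluation_strategy = EvaluationStrategy(test_data_nodes=test_data_nodes, rounds=3)
  ```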
- feat: install substrafl from pypi
- fix: Update pydantic version to enable autocompletion
- feat: Add a FL algorithm wrapper in PyTorch for the federated averaging strategy
- test: connect-test integration
- feat: Add a possibility to test an algorithm on selected rounds or every n rounds
- fix: dependency management: the `local_code` dependencies are copied to the same folder structure relative to the algo
- fix: dependency management: it failed when resolving the `local_code` dependencies because the path to the algo was relative
- feat: batch indexer
- feat: more logs + a function to set the logging level
- Subprocess mode is now faster, as it fully reuses the user environment instead of rebuilding the connect-related parts (substra #119 and #63)
- fix: error message for local dependency
- feat: User custom dependencies
- feat: support substra subprocess mode
- first release