Commit
1.0 preparation of the uploading of the package
Showing 203 changed files with 114,669 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
graft imputegap | ||
include pyproject.toml | ||
include setup.py | ||
include README.md | ||
global-exclude *.py[cod] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,241 @@ | ||
Metadata-Version: 2.1 | ||
Name: imputegap | ||
Version: 0.1.0 | ||
Summary: A Library of Imputation Techniques for Time Series Data | ||
Home-page: https://github.com/eXascaleInfolab/ImputeGAP | ||
Author: Quentin Nater | ||
Author-email: [email protected] | ||
License: The Unlicense | ||
Project-URL: Documentation, https://github.com/eXascaleInfolab/ImputeGAP/tree/main/docs | ||
Project-URL: Source, https://github.com/eXascaleInfolab/ImputeGAP | ||
Classifier: Development Status :: 1 - Beta | ||
Classifier: Intended Audience :: Developers | ||
Classifier: Programming Language :: Python :: 3.8 | ||
Classifier: Topic :: Imputation | ||
Requires-Python: >= 3.12.0,<3.12.6 | ||
Description-Content-Type: text/markdown | ||
Requires-Dist: numpy==1.26.4 | ||
Requires-Dist: pandas==2.0.3 | ||
Requires-Dist: matplotlib==3.7.5 | ||
Requires-Dist: toml==0.10.2 | ||
Requires-Dist: scikit-learn==1.3.2 | ||
Requires-Dist: scipy==1.14.1 | ||
Requires-Dist: setuptools==75.1.0 | ||
Requires-Dist: tensorflow==2.17.0 | ||
Requires-Dist: shap==0.44.1 | ||
Requires-Dist: pycatch22==0.4.5 | ||
Requires-Dist: scikit-optimize==0.10.2 | ||
Requires-Dist: pyswarms==1.3.0 | ||
Requires-Dist: types-toml | ||
Requires-Dist: types-setuptools | ||
Requires-Dist: wheel | ||
|
||
![My Logo](assets/logo_imputegab.png) | ||
|
||
# Welcome to ImputeGAP | ||
ImputeGAP is a unified framework for imputation algorithms that provides a narrow-waist interface between algorithm evaluation and parameterization for datasets from domains such as neuroscience, medicine, climate, and energy. | ||
|
||
The interface provides advanced imputation algorithms, construction of various missing-value patterns, and different evaluation metrics. In addition, the framework offers support for AutoML parameterization techniques, feature extraction, and, potentially, analysis of feature impact using SHAP. The framework should allow straightforward integration of new algorithms, datasets, and metrics. | ||
|
||
|
||
|
||
<br /><hr /><br /> | ||
|
||
|
||
|
||
## Requirements | ||
To use **ImputeGAP**, you need: | ||
* Python **3.12.0** or higher (the package metadata restricts this to `>=3.12.0,<3.12.6`) | ||
* A **Unix-compatible environment** to run your implementation | ||
<br><br> | ||
|
||
To install these two prerequisites, please refer to the following documentation: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/docs/installation" >install requirements</a><br><br> | ||
|
||
|
||
|
||
|
||
<br /><hr /><br /> | ||
|
||
|
||
|
||
|
||
## Installation | ||
To install ImputeGAP locally, clone the repository from GitHub and move into its folder: | ||
|
||
```bash | ||
$ git clone https://github.com/eXascaleInfolab/ImputeGAP | ||
$ cd ./ImputeGAP | ||
``` | ||
|
||
Then, from inside the folder, run: | ||
|
||
```bash | ||
$ pip install -e . | ||
``` | ||
|
||
|
||
|
||
<br /><hr /><br /> | ||
|
||
## Datasets | ||
All datasets preconfigured in this library can be found here: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/naterq_skeleton_refac_3/imputegap/dataset" >link to datasets</a> | ||
|
||
|
||
|
||
<br /><hr /><br /> | ||
|
||
|
||
|
||
## Loading and pre-processing | ||
The data manager can load any time series dataset in text format that satisfies this condition:<br /><br /> | ||
<b>(Values,Series)</b> : *series are separated by spaces and values by a newline (\n).*<br><br> | ||
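
To make the expected layout concrete, here is a minimal sketch that parses a small text sample in the (Values,Series) shape with NumPy. The sample values are invented for illustration, and this is not how ImputeGAP itself reads files; it only shows that each row holds one value per series and each column is one series.

```python
import io

import numpy as np

# Hypothetical sample: 4 values (rows) for 3 series (columns).
# Series are separated by spaces, values by newlines.
raw_text = "0.1 1.5 -0.3\n0.2 1.4 -0.2\n0.4 1.2 0.0\n0.3 1.1 0.1\n"

# np.loadtxt splits on whitespace by default, which matches the layout above.
matrix = np.loadtxt(io.StringIO(raw_text))

n_values, n_series = matrix.shape
print(f"{n_values} values per series, {n_series} series")
```

A file on disk in the same layout can be read by passing its path to `np.loadtxt` instead of the in-memory buffer.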
|
||
### Example Loading | ||
```python | ||
from imputegap.recovery.manager import TimeSeries | ||
from imputegap.tools import utils | ||
|
||
# 1. initiate the TimeSeries() object that will stay with you throughout the analysis | ||
ts_1 = TimeSeries() | ||
|
||
# 2. load the timeseries from file or from the code | ||
ts_1.load_timeseries(utils.search_path("eeg")) | ||
ts_1.normalize(normalizer="z_score") | ||
|
||
# [OPTIONAL] you can plot your raw data / print the information | ||
ts_1.plot(raw_data=ts_1.data, title="raw data", max_series=10, max_values=100, save_path="./assets") | ||
ts_1.print(limit=10) | ||
``` | ||
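
The two normalizer names used in this document, `"z_score"` and `"min_max"`, correspond to standard rescaling formulas. The sketch below shows those formulas in plain NumPy. Applying them per series (per column) is an assumption made here for illustration; the library's own `normalize` may scale differently.

```python
import numpy as np


def z_score(matrix):
    """Rescale each series (column) to zero mean and unit standard deviation."""
    mean = np.nanmean(matrix, axis=0)
    std = np.nanstd(matrix, axis=0)
    return (matrix - mean) / std


def min_max(matrix):
    """Rescale each series (column) linearly into the range [0, 1]."""
    lowest = np.nanmin(matrix, axis=0)
    highest = np.nanmax(matrix, axis=0)
    return (matrix - lowest) / (highest - lowest)


# Hypothetical data: 3 values (rows) for 2 series (columns).
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
standardized = z_score(data)
rescaled = min_max(data)
```

Both helpers assume that no series is constant, since a constant series would lead to a division by zero.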
|
||
<br /><hr /><br /> | ||
|
||
|
||
|
||
## Contamination | ||
ImputeGAP lets you contaminate datasets with a specific missing-data scenario to reproduce a given situation. The scenarios currently available are: <b>MCAR, MISSING PERCENTAGE, ...</b><br /> | ||
The documentation is available on this page: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/contamination#readme" >missing data scenarios</a><br><br> | ||
|
||
|
||
### Example Contamination | ||
```python | ||
from imputegap.recovery.manager import TimeSeries | ||
from imputegap.tools import utils | ||
|
||
# 1. initiate the TimeSeries() object that will stay with you throughout the analysis | ||
ts_1 = TimeSeries() | ||
|
||
# 2. load the timeseries from file or from the code | ||
ts_1.load_timeseries(utils.search_path("eeg")) | ||
ts_1.normalize(normalizer="min_max") | ||
|
||
# 3. contamination of the data with MCAR scenario | ||
infected_data = ts_1.Contaminate.mcar(ts_1.data, series_impacted=0.4, missing_rate=0.2, use_seed=True) | ||
``` | ||
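
For intuition about the `series_impacted` and `missing_rate` parameters, here is a simplified sketch of a missing-completely-at-random contamination in plain NumPy. It is not the library's implementation: the function name `mcar_sketch`, the choice to drop isolated values rather than blocks, and the seed handling are all assumptions made for illustration.

```python
import numpy as np


def mcar_sketch(matrix, series_impacted=0.4, missing_rate=0.2, seed=42):
    """Return a copy of `matrix` with NaNs placed uniformly at random.

    `series_impacted` is the fraction of series (columns) to contaminate and
    `missing_rate` is the fraction of values removed in each of them.
    """
    rng = np.random.default_rng(seed)
    contaminated = matrix.astype(float).copy()
    n_values, n_series = contaminated.shape

    n_target_series = int(round(series_impacted * n_series))
    n_missing = int(round(missing_rate * n_values))

    # Pick which series to contaminate, then which positions to blank out.
    target_series = rng.choice(n_series, size=n_target_series, replace=False)
    for series in target_series:
        positions = rng.choice(n_values, size=n_missing, replace=False)
        contaminated[positions, series] = np.nan
    return contaminated


# Hypothetical data: 50 values (rows) for 10 series (columns).
clean = np.random.default_rng(0).normal(size=(50, 10))
infected = mcar_sketch(clean, series_impacted=0.4, missing_rate=0.2)
print(int(np.isnan(infected).sum()), "values removed")
```

A fixed seed makes the contamination reproducible, which is presumably the purpose of `use_seed=True` in the example above.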
|
||
<br /><hr /><br /> | ||
|
||
## Imputation | ||
ImputeGAP offers many imputation algorithms, grouped into families such as: <b>Matrix Decomposition, Machine Learning, Regression, Pattern Recognition, Statistical methods, ...</b><br /> | ||
|
||
It is also possible to add your own algorithm. To do so, follow the min-impute template and replace its logic with your own code.<br /><br /> | ||
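
As a rough idea of what a minimal imputation routine looks like, the sketch below replaces every missing value with the smallest observed value. The function name and signature are hypothetical; the actual min-impute template in the repository may be structured differently, so treat this only as an illustration of the logic you would swap out.

```python
import numpy as np


def min_impute_sketch(contaminated):
    """Fill every NaN with the minimum of the observed values."""
    imputed = contaminated.copy()
    imputed[np.isnan(imputed)] = np.nanmin(contaminated)
    return imputed


# Hypothetical contaminated matrix: 3 values (rows) for 2 series (columns).
contaminated = np.array([[0.5, np.nan], [np.nan, 0.2], [0.9, 0.7]])
imputed = min_impute_sketch(contaminated)
```

A custom algorithm would keep the same contract (a matrix with NaNs in, a complete matrix of the same shape out) and replace the fill rule.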
|
||
|
||
### Example Imputation | ||
```python | ||
from imputegap.recovery.imputation import Imputation | ||
from imputegap.recovery.manager import TimeSeries | ||
from imputegap.tools import utils | ||
|
||
# 1. initiate the TimeSeries() object that will stay with you throughout the analysis | ||
ts_1 = TimeSeries() | ||
|
||
# 2. load the timeseries from file or from the code | ||
ts_1.load_timeseries(utils.search_path("eeg")) | ||
ts_1.normalize(normalizer="min_max") | ||
|
||
# 3. contamination of the data | ||
infected_data = ts_1.Contaminate.mcar(ts_1.data) | ||
|
||
# 4. imputation of the contaminated data | ||
# choice of the algorithm and its parameters (default, automl, or defined by the user) | ||
cdrec = Imputation.MD.CDRec(infected_data) | ||
|
||
# imputation with default values | ||
cdrec.impute() | ||
# OR imputation with user defined values | ||
cdrec.impute(params={"rank": 5, "epsilon":0.01, "iterations": 100}) | ||
|
||
# [OPTIONAL] save your results in a new Time Series object | ||
ts_3 = TimeSeries().import_matrix(cdrec.imputed_matrix) | ||
|
||
# 5. score the imputation with the raw_data | ||
cdrec.score(ts_1.data, ts_3.data) | ||
``` | ||
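
Scoring compares the imputed values with the original ones at the positions that were removed. The sketch below computes two common error metrics, RMSE and MAE, on those positions only. The choice of metrics and the function name `score_sketch` are assumptions for illustration; the source does not list which metrics `score` reports.

```python
import numpy as np


def score_sketch(ground_truth, imputed, contaminated):
    """Compute RMSE and MAE on the positions that were missing."""
    mask = np.isnan(contaminated)
    errors = ground_truth[mask] - imputed[mask]
    return {
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        "MAE": float(np.mean(np.abs(errors))),
    }


# Hypothetical matrices: two values were removed and then imputed.
ground_truth = np.array([[1.0, 2.0], [3.0, 4.0]])
contaminated = np.array([[1.0, np.nan], [np.nan, 4.0]])
imputed = np.array([[1.0, 2.5], [2.0, 4.0]])

metrics = score_sketch(ground_truth, imputed, contaminated)
print(metrics)
```

Restricting the metrics to the missing positions avoids diluting the error with values that were never removed.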
|
||
|
||
<br /><hr /><br /> | ||
|
||
## Auto-ML | ||
ImputeGAP provides optimization techniques that automatically find the right hyperparameters for a specific algorithm in relation to a certain dataset. | ||
|
||
The optimizers available are : <b>Greedy Optimizer, Bayesian Optimizer, Particle Swarm Optimizer and Successive Halving</b>.<br /><br /> | ||
|
||
### Example Auto-ML | ||
```python | ||
from imputegap.recovery.imputation import Imputation | ||
from imputegap.recovery.manager import TimeSeries | ||
from imputegap.tools import utils | ||
|
||
# 1. initiate the TimeSeries() object that will stay with you throughout the analysis | ||
ts_1 = TimeSeries() | ||
|
||
# 2. load the timeseries from file or from the code | ||
ts_1.load_timeseries(utils.search_path("eeg")) | ||
ts_1.normalize(normalizer="min_max") | ||
|
||
# 3. contamination of the data | ||
infected_data = ts_1.Contaminate.mcar(ts_1.data) | ||
|
||
# 4. imputation of the contaminated data | ||
# choice of the algorithm and its parameters (default, automl, or defined by the user) | ||
cdrec = Imputation.MD.CDRec(infected_data) | ||
|
||
# imputation with AutoML which will discover the optimal hyperparameters for your dataset and your algorithm | ||
cdrec.impute(user_defined=False, params={"ground_truth": ts_1.data, "optimizer": "bayesian", "options": {"n_calls": 5}}) | ||
|
||
# 5. score the imputation with the raw_data | ||
cdrec.score(ts_1.data, cdrec.imputed_matrix) | ||
``` | ||
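
To illustrate what searching for hyperparameters against a ground truth means, here is a self-contained sketch that tries a few candidate ranks for a simple iterative truncated-SVD imputer and keeps the one with the lowest RMSE. This is a plain exhaustive search over a toy imputer, not CDRec and not one of the optimizers listed above; all function names are hypothetical.

```python
import numpy as np


def svd_impute(contaminated, rank, iterations=30):
    """Toy imputer: repeatedly refill NaNs from a rank-`rank` SVD approximation."""
    mask = np.isnan(contaminated)
    # Start by filling each missing value with the mean of its series.
    filled = np.where(mask, np.nanmean(contaminated, axis=0), contaminated)
    for _ in range(iterations):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled[mask] = low_rank[mask]
    return filled


def rmse_on_missing(ground_truth, imputed, mask):
    """RMSE restricted to the positions that were missing."""
    return float(np.sqrt(np.mean((ground_truth[mask] - imputed[mask]) ** 2)))


def search_rank(ground_truth, contaminated, candidate_ranks):
    """Score every candidate rank and return the best one with all scores."""
    mask = np.isnan(contaminated)
    scores = {
        rank: rmse_on_missing(ground_truth, svd_impute(contaminated, rank), mask)
        for rank in candidate_ranks
    }
    return min(scores, key=scores.get), scores


# Hypothetical low-rank data: 40 values (rows) for 6 series (columns).
rng = np.random.default_rng(0)
ground_truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 6))
contaminated = ground_truth.copy()
contaminated[rng.choice(40, size=8, replace=False), 0] = np.nan

best_rank, scores = search_rank(ground_truth, contaminated, [1, 2, 3])
print("best rank:", best_rank)
```

Optimizers such as Bayesian optimization or successive halving pursue the same goal but choose which configurations to evaluate more selectively than this exhaustive loop.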
|
||
|
||
<br /><hr /><br /> | ||
|
||
|
||
## Explainer | ||
ImputeGAP provides an algorithm based on the SHAP library that explains the results of your imputations using features specific to your dataset.<br /><br /> | ||
|
||
### Example Explainer | ||
```python | ||
from imputegap.explainer.explainer import Explainer | ||
from imputegap.recovery.manager import TimeSeries | ||
from imputegap.tools import utils | ||
|
||
# load your data from ImputeGAP TimeSeries() | ||
ts_1 = TimeSeries() | ||
ts_1.load_timeseries(utils.search_path("eeg")) | ||
|
||
# call the explanation of your dataset with a specific algorithm to gain insight on the Imputation results | ||
shap_values, shap_details = Explainer.shap_explainer(raw_data=ts_1.data, file_name="eeg", algorithm="cdrec") | ||
|
||
# [OPTIONAL] print the results with the impact of each feature. | ||
Explainer.print(shap_values, shap_details) | ||
``` | ||
|
||
|
||
<br /><hr /><br /> | ||
|
||
## Contributors | ||
Quentin Nater (<a href="mailto:[email protected]">[email protected]</a>) and Dr. Mourad Khayati (<a href="mailto:[email protected]">[email protected]</a>) | ||
|
||
<br /><br /> |