1.0 preparation of the uploading of the package
qnater committed Oct 11, 2024
1 parent d509077 commit 9c3dde1
Showing 203 changed files with 114,669 additions and 1 deletion.
Binary file added dist/.tmp-q2k_ybj1/imputegap-0.1.0.tar.gz
Binary file not shown.
5 changes: 5 additions & 0 deletions imputegap-0.1.0/MANIFEST.in
@@ -0,0 +1,5 @@
graft imputegap
include pyproject.toml
include setup.py
include README.md
global-exclude *.py[cod]
241 changes: 241 additions & 0 deletions imputegap-0.1.0/PKG-INFO
@@ -0,0 +1,241 @@
Metadata-Version: 2.1
Name: imputegap
Version: 0.1.0
Summary: A Library of Imputation Techniques for Time Series Data
Home-page: https://github.com/eXascaleInfolab/ImputeGAP
Author: Quentin Nater
Author-email: [email protected]
License: The Unlicense
Project-URL: Documentation, https://github.com/eXascaleInfolab/ImputeGAP/tree/main/docs
Project-URL: Source, https://github.com/eXascaleInfolab/ImputeGAP
Classifier: Development Status :: 1 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Imputation
Requires-Python: >= 3.12.0,<3.12.6
Description-Content-Type: text/markdown
Requires-Dist: numpy==1.26.4
Requires-Dist: pandas==2.0.3
Requires-Dist: matplotlib==3.7.5
Requires-Dist: toml==0.10.2
Requires-Dist: scikit-learn==1.3.2
Requires-Dist: scipy==1.14.1
Requires-Dist: setuptools==75.1.0
Requires-Dist: tensorflow==2.17.0
Requires-Dist: shap==0.44.1
Requires-Dist: pycatch22==0.4.5
Requires-Dist: scikit-optimize==0.10.2
Requires-Dist: pyswarms==1.3.0
Requires-Dist: types-toml
Requires-Dist: types-setuptools
Requires-Dist: wheel

![My Logo](assets/logo_imputegab.png)

# Welcome to ImputeGAP
ImputeGAP is a unified framework for imputation algorithms that provides a narrow-waist interface between algorithm evaluation and parameterization for datasets drawn from domains such as neuroscience, medicine, climate, and energy.

The interface provides advanced imputation algorithms, construction of various missing-value patterns, and different evaluation metrics. In addition, the framework offers support for AutoML parameterization techniques, feature extraction, and, potentially, analysis of feature impact using SHAP. The framework is designed to allow straightforward integration of new algorithms, datasets, and metrics.



<br /><hr /><br />



## Requirements
To use **ImputeGAP**, you need:
* Python **3.12.0** or higher (the package metadata requires a version below 3.12.6)
* A **Unix-compatible environment** in which to run your implementation
<br><br>

To install these two prerequisites, please refer to the following documentation: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/docs/installation" >install requirements</a><br><br>




<br /><hr /><br />




## Installation
To install ImputeGAP locally, clone the repository from GitHub and move into the folder:

```bash
$ git clone https://github.com/eXascaleInfolab/ImputeGAP
$ cd ./ImputeGAP
```

Then, once inside, run the command:

```bash
$ pip install -e .
```



<br /><hr /><br />

## Datasets
All datasets preconfigured in this library can be found here: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/naterq_skeleton_refac_3/imputegap/dataset" >link to datasets</a>



<br /><hr /><br />



## Loading and pre-processing
The manager module can load any time series dataset in text format that meets this condition:<br /><br />
<b>(Values,Series)</b> : *series are separated by spaces and values by a newline (\n).*<br><br>
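
As a minimal sketch of that layout, the snippet below parses a small in-memory text with plain NumPy. The sample values are made up for illustration, and `np.loadtxt` is used here only to show the file shape; it is not necessarily how `load_timeseries` reads files internally.

```python
import io
import numpy as np

# Hypothetical file content: 4 values (rows) for each of 3 series (columns).
text = "0.1 1.0 10.0\n0.2 1.1 11.0\n0.3 1.2 12.0\n0.4 1.3 13.0\n"

# np.loadtxt splits on whitespace by default, giving shape (values, series).
matrix = np.loadtxt(io.StringIO(text))
print(matrix.shape)

# Column j holds the j-th series.
first_series = matrix[:, 0]
print(first_series)
```

Each line of the file is one time step, so adding a series means adding a column, not a row.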

### Example Loading
```python
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="z_score")

# [OPTIONAL] you can plot your raw data / print the information
ts_1.plot(raw_data=ts_1.data, title="raw data", max_series=10, max_values=100, save_path="./assets")
ts_1.print(limit=10)
```
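
The two normalizer names used in this README, `z_score` and `min_max`, refer to standard rescaling schemes. The sketch below shows their usual definitions in plain NumPy, applied over the whole matrix; whether the library normalizes globally or per series is not stated here, so treat that choice as an assumption.

```python
import numpy as np

def z_score(x):
    """Shift to zero mean and scale to unit standard deviation."""
    return (x - x.mean()) / x.std()

def min_max(x):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    return (x - x.min()) / (x.max() - x.min())

# Illustrative data: 3 values for each of 2 series.
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
z = z_score(data)
m = min_max(data)
print(z.mean(), z.std())
print(m.min(), m.max())
```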

<br /><hr /><br />



## Contamination
ImputeGAP lets you contaminate datasets according to a specific missing-data scenario in order to reproduce a given situation. The scenarios available so far are: <b>MCAR, MISSING PERCENTAGE, ...</b><br />
The documentation is on this page: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/contamination#readme" >missing data scenarios</a><br><br>


### Example Contamination
```python
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data with MCAR scenario
infected_data = ts_1.Contaminate.mcar(ts_1.data, series_impacted=0.4, missing_rate=0.2, use_seed=True)
```
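
To make the idea of missing-completely-at-random contamination concrete, here is a stand-alone NumPy sketch. It is not the library's `mcar` implementation: the function name `mcar_sketch`, the interpretation of `series_impacted` as a fraction of columns and `missing_rate` as a fraction of values per selected column, and the (values, series) orientation are all assumptions made for illustration.

```python
import numpy as np

def mcar_sketch(data, series_impacted=0.4, missing_rate=0.2, seed=42):
    """Return a copy of data (values x series) with NaNs placed at random."""
    rng = np.random.default_rng(seed)
    out = data.astype(float).copy()
    n_values, n_series = out.shape

    # Pick which series (columns) receive missing values.
    n_hit = int(np.ceil(series_impacted * n_series))
    hit = rng.choice(n_series, size=n_hit, replace=False)

    # In each selected series, blank out a random subset of positions.
    n_missing = int(missing_rate * n_values)
    for j in hit:
        rows = rng.choice(n_values, size=n_missing, replace=False)
        out[rows, j] = np.nan
    return out

data = np.arange(50, dtype=float).reshape(10, 5)
contaminated = mcar_sketch(data)
print(np.isnan(contaminated).sum(axis=0))  # missing count per series
```

Fixing the seed plays the same role as a reproducibility switch: the same positions are removed on every run, so imputation scores stay comparable.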

<br /><hr /><br />

## Imputation
ImputeGAP offers many imputation algorithms grouped into families, such as: <b>Matrix Decomposition, Machine Learning, Regression, Pattern Recognition, Statistical methods, ...</b><br />

You can also add your own algorithm: follow the min-impute template and replace its logic with your code.<br /><br />


### Example Imputation
```python
from imputegap.recovery.imputation import Imputation
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data
infected_data = ts_1.Contaminate.mcar(ts_1.data)

# 4. imputation of the contaminated data
# choice of the algorithm, and their parameters (default, automl, or defined by the user)
cdrec = Imputation.MD.CDRec(infected_data)

# imputation with default values
cdrec.impute()
# OR imputation with user defined values
cdrec.impute(params={"rank": 5, "epsilon":0.01, "iterations": 100})

# [OPTIONAL] save your results in a new Time Series object
ts_3 = TimeSeries().import_matrix(cdrec.imputed_matrix)

# 5. score the imputation with the raw_data
cdrec.score(ts_1.data, ts_3.data)
```
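
The README does not list which metrics the scoring step reports. As a hedged illustration of how an imputation is typically scored, the sketch below computes RMSE and MAE over the positions that were missing in the contaminated matrix; the helper name `rmse_mae` and the choice of metrics are assumptions, not the library's API.

```python
import numpy as np

def rmse_mae(ground_truth, imputed, contaminated):
    """Compare imputed values to the ground truth on missing positions only."""
    mask = np.isnan(contaminated)
    err = imputed[mask] - ground_truth[mask]
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }

# Tiny illustrative matrices: two entries were removed and then imputed.
truth = np.array([[1.0, 2.0], [3.0, 4.0]])
contaminated = np.array([[1.0, np.nan], [np.nan, 4.0]])
imputed = np.array([[1.0, 2.5], [2.0, 4.0]])

scores = rmse_mae(truth, imputed, contaminated)
print(scores)
```

Restricting the error to the masked positions avoids rewarding an algorithm for entries it never had to reconstruct.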


<br /><hr /><br />

## Auto-ML
ImputeGAP provides optimization techniques that automatically find the right hyperparameters for a specific algorithm in relation to a certain dataset.

The available optimizers are: <b>Greedy Optimizer, Bayesian Optimizer, Particle Swarm Optimizer and Successive Halving</b>.<br /><br />
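
The underlying idea of hyperparameter search can be sketched without the library: try candidate settings, impute, and keep the setting with the lowest error against the ground truth. The example below is an exhaustive search over a small grid, which is simpler than any of the optimizers named above, and it uses a toy rank-limited SVD imputer as a stand-in for a real algorithm such as CDRec. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def svd_impute(contaminated, rank, iterations=50):
    """Toy imputer: fill NaNs by iterating a rank-limited SVD reconstruction."""
    mask = np.isnan(contaminated)
    filled = np.where(mask, np.nanmean(contaminated), contaminated)
    for _ in range(iterations):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled[mask] = low_rank[mask]  # observed entries stay untouched
    return filled

def grid_search(ground_truth, contaminated, ranks):
    """Try every candidate rank and keep the one with the lowest RMSE."""
    mask = np.isnan(contaminated)
    best_rank, best_rmse = None, np.inf
    for rank in ranks:
        imputed = svd_impute(contaminated, rank)
        rmse = np.sqrt(np.mean((imputed[mask] - ground_truth[mask]) ** 2))
        if rmse < best_rmse:
            best_rank, best_rmse = rank, rmse
    return best_rank, best_rmse

# Synthetic rank-2 ground truth with roughly 10% of entries removed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 8))
contaminated = truth.copy()
contaminated[rng.random(truth.shape) < 0.1] = np.nan

best_rank, best_rmse = grid_search(truth, contaminated, ranks=[1, 2, 3, 4])
print(best_rank, best_rmse)
```

Smarter optimizers differ mainly in how they choose the next candidate to evaluate; the evaluate-and-compare loop stays the same.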

### Example Auto-ML
```python
from imputegap.recovery.imputation import Imputation
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data
infected_data = ts_1.Contaminate.mcar(ts_1.data)

# 4. imputation of the contaminated data
# choice of the algorithm, and their parameters (default, automl, or defined by the user)
cdrec = Imputation.MD.CDRec(infected_data)

# imputation with AutoML, which searches for the optimal hyperparameters for your dataset and your algorithm
cdrec.impute(user_defined=False, params={"ground_truth": ts_1.data, "optimizer": "bayesian", "options": {"n_calls": 5}})

# 5. score the imputation with the raw_data
cdrec.score(ts_1.data, cdrec.imputed_matrix)
```


<br /><hr /><br />


## Explainer
ImputeGAP provides an algorithm based on the SHAP library that explains the results of your imputations using features specific to your dataset.<br /><br />

### Example Explainer
```python
from imputegap.explainer.explainer import Explainer
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# load your data from ImputeGAP TimeSeries()
ts_1 = TimeSeries()
ts_1.load_timeseries(utils.search_path("eeg"))

# call the explanation of your dataset with a specific algorithm to gain insight on the Imputation results
shap_values, shap_details = Explainer.shap_explainer(raw_data=ts_1.data, file_name="eeg", algorithm="cdrec")

# [OPTIONAL] print the results with the impact of each feature.
Explainer.print(shap_values, shap_details)
```


<br /><hr /><br />

## Contributors
Quentin Nater (<a href="mailto:[email protected]">[email protected]</a>) and Dr. Mourad Khayati (<a href="mailto:[email protected]">[email protected]</a>)

<br /><br />