Merge pull request #11 from eXascaleInfolab/naterq_skeleton_refac_3
naterq skeleton refac 3
Showing 70 changed files with 1,167 additions and 279 deletions.
@@ -5,30 +5,188 @@ ImputeGAP is a unified framework for imputation algorithms that provides a narro

The interface provides advanced imputation algorithms, construction of various missing-value patterns, and different evaluation metrics. In addition, the framework offers support for AutoML parameterization techniques, feature extraction, and, potentially, analysis of feature impact using SHAP. The framework should allow a straightforward integration of new algorithms, datasets, and metrics.

<br /><hr /><br />

## Requirements
In order to use **ImputeGAP**, you must:
* Have Python **3.12.0** or higher installed.
* Run your implementation in a **Unix-compatible environment**.
<br><br>

To install these two prerequisites, please refer to the following documentation: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/docs/installation#readme" >install requirements</a><br><br>
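
The two requirements above can be checked from Python before installing anything. The snippet below is a minimal sketch and not part of ImputeGAP: `check_environment` is a hypothetical helper name, and treating `os.name == "posix"` as "Unix-compatible" is an assumption (it holds on Linux, macOS and inside WSL).

```python
import os
import sys

MIN_PYTHON = (3, 12, 0)  # minimum version stated in the requirements


def check_environment(version=None, os_name=None):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper, not an ImputeGAP function. The arguments default to
    the running interpreter and can be overridden for testing.
    """
    version = tuple(sys.version_info[:3]) if version is None else tuple(version)
    os_name = os.name if os_name is None else os_name
    problems = []
    if version < MIN_PYTHON:
        found = ".".join(str(part) for part in version)
        problems.append(f"Python 3.12.0 or higher is required, found {found}")
    if os_name != "posix":
        # Assumption: "posix" is used as a proxy for a Unix-compatible environment.
        problems.append(f"a Unix-compatible environment is required, found os.name={os_name!r}")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

On a native Windows interpreter this should report the environment problem, which is the case where WSL is recommended.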

<br /><hr /><br />

## Execution
To run code that uses the ImputeGAP library, we strongly advise you to use a Unix environment. On <b>Windows</b>, please use <b>WSL</b> to run your project.

WSL can be selected in the interpreter settings of your IDE.

## Installation
To install ImputeGAP locally, download the package from GitHub and run the command:

```pip install -e .```

<br /><hr /><br />

## Loading and pre-processing
The manager module can load any time series dataset in text format that respects this condition:<br /><br />
<b>(Values,Series)</b> : *series are separated by spaces and values by a newline (\n).*<br><br>
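
To make the file layout concrete, here is a small sketch that writes and reads a toy dataset in that format using only the standard library. It assumes that "(Values,Series)" means one line per time step and one space-separated column per series; `write_series_file`, `read_series_file` and the file name are hypothetical and not part of ImputeGAP.

```python
import os
import tempfile


def write_series_file(path, rows):
    """Write one line per time step; the values of the series are separated by spaces."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(" ".join(str(value) for value in row) + "\n")


def read_series_file(path):
    """Read the file back into a list of rows (time steps) of floats."""
    with open(path, encoding="utf-8") as handle:
        return [[float(token) for token in line.split()] for line in handle if line.strip()]


# Toy dataset: 4 values (rows) for 3 series (columns).
rows = [
    [0.5, 1.0, 2.0],
    [0.6, 1.1, 2.1],
    [0.7, 1.2, 2.2],
    [0.8, 1.3, 2.3],
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy_dataset.txt")
    write_series_file(path, rows)
    loaded = read_series_file(path)

print(len(loaded), "values x", len(loaded[0]), "series")
```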

### Example Loading
```python
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the time series from a file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="z_score")

# [OPTIONAL] you can plot your raw data / print the information
ts_1.plot(raw_data=ts_1.data, title="raw data", max_series=10, max_values=100, save_path="./assets")
ts_1.print(limit=10)
```
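
The examples in this README use two normalizer names, `"z_score"` and `"min_max"`. The sketch below shows the textbook definitions of both on a single series. It is only an illustration: whether ImputeGAP normalizes per series or over the whole matrix, and which standard deviation it uses, is not stated here, so the population standard deviation and the per-series scope are assumptions.

```python
import statistics


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant series: nothing to scale
    return [(value - mean) / std for value in values]


def min_max(values):
    """Rescale linearly so that the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant series: nothing to scale
    return [(value - low) / (high - low) for value in values]


series = [2.0, 4.0, 6.0, 8.0]
print("z_score:", z_score(series))
print("min_max:", min_max(series))
```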

<br /><hr /><br />

## Contamination
ImputeGAP can contaminate datasets with a specific missing-data scenario in order to reproduce a given situation. The scenarios available so far are: <b>MCAR, MISSING PERCENTAGE, ...</b><br />
The documentation is available on this page: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/contamination#readme" >missing data scenarios</a><br><br>

### Example Contamination
```python
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the time series from a file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data with the MCAR scenario
infected_data = ts_1.Contaminate.mcar(ts_1.data, series_impacted=0.4, missing_rate=0.2, use_seed=True)
```
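
To give an intuition for what an MCAR (missing completely at random) contamination does, here is a simplified standalone sketch. It is not ImputeGAP's implementation: it assumes that `series_impacted` is the fraction of series to contaminate and `missing_rate` the fraction of values removed in each of them, removes isolated values rather than blocks, and uses a series-major list of lists for simplicity. The name `contaminate_mcar` and the `seed` argument are hypothetical.

```python
import math
import random


def contaminate_mcar(series_list, series_impacted=0.4, missing_rate=0.2, seed=42):
    """Return a copy of the data where random values are replaced by NaN.

    Simplified illustration only: the library's scenario may differ
    (for example by removing contiguous blocks of values).
    """
    rng = random.Random(seed)  # fixed seed makes the contamination reproducible
    contaminated = [list(series) for series in series_list]
    n_series = math.ceil(series_impacted * len(contaminated))
    for s in rng.sample(range(len(contaminated)), n_series):
        n_missing = int(missing_rate * len(contaminated[s]))
        for t in rng.sample(range(len(contaminated[s])), n_missing):
            contaminated[s][t] = math.nan
    return contaminated


# Toy data: 5 series of 10 values each.
data = [[float(10 * s + t) for t in range(10)] for s in range(5)]
infected = contaminate_mcar(data)
missing_per_series = [sum(math.isnan(v) for v in series) for series in infected]
print("missing values per series:", missing_per_series)
```

Because the positions are drawn independently of the values themselves, the missingness carries no information about the data, which is what "completely at random" refers to.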

<br /><hr /><br />

## Imputation
ImputeGAP offers many imputation algorithms, grouped into families such as: <b>Matrix Decomposition, Machine Learning, Regression, Pattern Recognition, Statistical Methods, ...</b><br />

It is also possible to add your own algorithm. To do so, follow the min-impute template and replace its logic with your own code.<br /><br />
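
As an idea of how small such an algorithm can be, the sketch below fills every missing value with the minimum observed value. This is only my reading of the name "min-impute": the actual template, its function signature and its data types are not shown in this README, so `min_impute` and the list-of-lists input are assumptions.

```python
import math


def min_impute(matrix):
    """Replace every NaN with the smallest observed value of the matrix.

    Hypothetical stand-in for a custom algorithm; swap the body for your own logic.
    """
    observed = [value for row in matrix for value in row if not math.isnan(value)]
    if not observed:
        raise ValueError("cannot impute a matrix that has no observed values")
    fill = min(observed)
    return [[fill if math.isnan(value) else value for value in row] for row in matrix]


contaminated = [[1.0, math.nan, 3.0], [math.nan, 0.5, 2.0]]
print(min_impute(contaminated))
```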

### Example Imputation
```python
from imputegap.recovery.imputation import Imputation
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the time series from a file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data
infected_data = ts_1.Contaminate.mcar(ts_1.data)

# 4. imputation of the contaminated data
# choice of the algorithm and its parameters (default, automl, or defined by the user)
cdrec = Imputation.MD.CDRec(infected_data)

# imputation with default values
cdrec.impute()
# OR imputation with user-defined values
cdrec.impute(params={"rank": 5, "epsilon": 0.01, "iterations": 100})

# [OPTIONAL] save your results in a new TimeSeries object
ts_3 = TimeSeries().import_matrix(cdrec.imputed_matrix)

# 5. score the imputation against the raw data
cdrec.score(ts_1.data, ts_3.data)
```
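
The scoring step compares the imputed values with the original data. The sketch below illustrates this with two common error metrics, RMSE and MAE, computed only on the positions that were missing. Which metrics `score()` actually reports is not listed in this README, so the metric choice, the function name `score_imputation` and its three-argument signature are assumptions made for illustration.

```python
import math


def score_imputation(ground_truth, contaminated, imputed):
    """Compute RMSE and MAE over the positions that are NaN in `contaminated`."""
    errors = [
        imputed[i][j] - ground_truth[i][j]
        for i, row in enumerate(contaminated)
        for j, value in enumerate(row)
        if math.isnan(value)
    ]
    if not errors:
        raise ValueError("there are no missing values to score")
    rmse = math.sqrt(sum(error * error for error in errors) / len(errors))
    mae = sum(abs(error) for error in errors) / len(errors)
    return {"RMSE": rmse, "MAE": mae}


truth = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
contaminated = [[1.0, math.nan, 3.0], [math.nan, 5.0, 6.0]]
imputed = [[1.0, 2.5, 3.0], [3.0, 5.0, 6.0]]
scores = score_imputation(truth, contaminated, imputed)
print(scores)
```

Restricting the error to the missing positions avoids rewarding an algorithm for values it never had to recover.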

<br /><hr /><br />

## Auto-ML
ImputeGAP provides optimization techniques that automatically search for suitable hyperparameters for a given algorithm on a given dataset.

The available optimizers are: <b>Greedy Optimizer, Bayesian Optimizer, Particle Swarm Optimizer and Successive Halving</b>.<br /><br />

### Example Auto-ML
```python
from imputegap.recovery.imputation import Imputation
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the time series from a file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data
infected_data = ts_1.Contaminate.mcar(ts_1.data)

# 4. imputation of the contaminated data
# choice of the algorithm and its parameters (default, automl, or defined by the user)
cdrec = Imputation.MD.CDRec(infected_data)

# imputation with AutoML, which searches for the optimal hyperparameters for your dataset and your algorithm
# (call impute() on the existing object instead of assigning to cdrec.impute, which would overwrite the method)
cdrec.impute(user_defined=False, params={"ground_truth": ts_1.data, "optimizer": "bayesian", "options": {"n_calls": 5}})

# 5. score the imputation against the raw data
cdrec.score(ts_1.data, cdrec.imputed_matrix)
```
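
The principle behind these optimizers can be sketched with a plain exhaustive search: try each candidate hyperparameter, impute, compare with the ground truth, and keep the best. The code below is a toy stand-in and not one of ImputeGAP's optimizers; the imputer `impute_window`, its `window` hyperparameter and the helper names are all hypothetical.

```python
import math


def impute_window(series, window):
    """Toy imputer: fill each gap with the mean of the previous `window` values."""
    filled = []
    for value in series:
        if math.isnan(value):
            recent = filled[-window:]  # observed or already imputed values
            filled.append(sum(recent) / len(recent) if recent else 0.0)
        else:
            filled.append(value)
    return filled


def rmse_on_gaps(truth, contaminated, imputed):
    """RMSE restricted to the positions that were missing."""
    errors = [i - t for t, c, i in zip(truth, contaminated, imputed) if math.isnan(c)]
    return math.sqrt(sum(error * error for error in errors) / len(errors))


def search_window(truth, contaminated, candidates):
    """Evaluate every candidate and return the one with the lowest RMSE."""
    scores = {
        window: rmse_on_gaps(truth, contaminated, impute_window(contaminated, window))
        for window in candidates
    }
    best = min(scores, key=scores.get)
    return best, scores


truth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
contaminated = [1.0, 2.0, 3.0, math.nan, 5.0, math.nan]
best_window, scores = search_window(truth, contaminated, candidates=[1, 2, 3])
print("best window:", best_window, "scores:", scores)
```

Bayesian optimization, particle swarm and successive halving follow the same evaluate-and-compare loop but choose which candidates to try more economically than an exhaustive scan.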

<br /><hr /><br />

## Explainer
ImputeGAP provides an explainer based on the SHAP library, which explains the results of your imputations using features specific to your dataset.<br /><br />

### Example Explainer
```python
from imputegap.explainer.explainer import Explainer
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# load your data from the ImputeGAP TimeSeries() object
ts_1 = TimeSeries()
ts_1.load_timeseries(utils.search_path("eeg"))

# run the explainer on your dataset with a specific algorithm to gain insight into the imputation results
shap_values, shap_details = Explainer.shap_explainer(raw_data=ts_1.data, file_name="eeg", algorithm="cdrec")

# [OPTIONAL] print the results with the impact of each feature
Explainer.print(shap_values, shap_details)
```

<br /><hr /><br />

## Contributors
Quentin Nater (<a href="mailto:[email protected]">[email protected]</a>) and Dr. Mourad Khayati (<a href="mailto:[email protected]">[email protected]</a>)

<br /><br />
@@ -0,0 +1,81 @@
![My Logo](../../assets/logo_imputegab.png)

# Installation for ImputeGAP

## Requirements
In order to use **ImputeGAP**, you must have Python **3.12.0** or higher and run your code in a **Unix-compatible environment**.
<br><br>

### Install WSL for Windows
To run your implementation in a Unix-compatible environment on Windows, we recommend that you install **WSL (Windows Subsystem for Linux)**.

1. Check whether **WSL** is already installed by typing `WSL` in the search menu.
2. If it is not installed, open **PowerShell** as Administrator (right-click the Start menu and select **Windows PowerShell (Admin)**).
3. Run the following command to install WSL:
   ```powershell
   wsl --install
   ```
4. This installs the latest version of WSL and a default Linux distribution (usually Ubuntu). After the installation, restart your computer.

<br><br>
*WSL can be selected in the interpreter settings of your IDE.*
<br><br>

### Install Python 3.12.0
To use **ImputeGAP** effectively, ensure that your environment has **Python** version **3.12.0** or higher installed. Follow these steps to install or update Python in your Unix-compatible environment:

##### Step 1: Check Existing Python Version
Open your terminal and check the currently installed version of Python by running:
```bash
python3 --version
```
<br>

##### Step 2: Install Python
Update your package list and install the necessary dependencies for building Python:
```bash
sudo apt update
sudo apt install -y build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev
```
<br>
Download the Python 3.12.0 source code from the official Python website and extract it:

```bash
cd /usr/src
sudo wget https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
sudo tar xzf Python-3.12.0.tgz
```
<br>
Compile and install Python 3.12:

```bash
cd Python-3.12.0
sudo ./configure --enable-optimizations
sudo make altinstall
```
<br>
Verify the installation:

```bash
python3.12 --version
```

<br /><hr /><br />

## Installation
To install ImputeGAP locally, download the package from GitHub and run the command:

```pip install -e .```

<br /><hr /><br />