Merge pull request #11 from eXascaleInfolab/naterq_skeleton_refac_3
naterq skeleton refac 3
qnater authored Oct 11, 2024
2 parents b9e4a7b + e08de03 commit efb9806
Showing 70 changed files with 1,167 additions and 279 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/pytest_optimization.yml
@@ -35,4 +35,6 @@ jobs:
python -m pytest ./tests/test_opti_bayesian_iim.py
python -m pytest ./tests/test_opti_bayesian_mrnn.py
python -m pytest ./tests/test_opti_bayesian_stmvl.py
python -m pytest ./tests/test_opti_greedy_cdrec.py
python -m pytest ./tests/test_opti_pso_cdrec.py
python -m pytest ./tests/test_opti_sh_cdrec.py
89 changes: 54 additions & 35 deletions .idea/workspace.xml

Large diffs are not rendered by default.

176 changes: 167 additions & 9 deletions README.md
@@ -5,30 +5,188 @@ ImputeGAP is a unified framework for imputation algorithms that provides a narro

The interface provides advanced imputation algorithms, construction of various missing values patterns, and different evaluation metrics. In addition, the framework offers support for AutoML parameterization techniques, feature extraction, and, potentially, analysis of feature impact using SHAP. The framework should allow a straightforward integration of new algorithms, datasets, and metrics.



<br /><hr /><br />

## Requirements
In order to use **ImputeGAP**, you must have:
* Python **3.12.0** or higher
* A **Unix-compatible environment** to run your code
<br><br>

To install these two prerequisites, please refer to the following documentation: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/docs/installation#readme" >install requirements</a><br><br>




<br /><hr /><br />

## Execution
To run code that uses ImputeGAP, we strongly recommend a Unix environment. On <b>Windows</b>, please use <b>WSL</b> to run your project.

WSL can be selected as the interpreter in your IDE's interpreter settings.


## Installation
To install ImputeGAP locally, download the package from GitHub and run the following command:

```pip install -e .```



<br /><hr /><br />

## Loading and Preprocessing
The manager can load any time series dataset in text format that follows this layout:<br /><br />
<b>(Values,Series)</b>: *series are separated by spaces and values by a newline (\n).*<br><br>
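As an illustration of this layout, the sketch below writes a small (Values,Series) matrix to a text file with NumPy and reads it back. It only demonstrates the file format: the file name `my_dataset.txt` is hypothetical, and how `TimeSeries.load_timeseries` orients the matrix internally is not shown here.

```python
import numpy as np

# Hypothetical dataset: 5 values (rows) for each of 3 series (columns)
n_values, n_series = 5, 3
data = np.arange(n_values * n_series, dtype=float).reshape(n_values, n_series)

# One line per time step; the values of the different series are separated by spaces
np.savetxt("my_dataset.txt", data, delimiter=" ", fmt="%.4f")

# Reading the file back gives a (values x series) matrix: column j is series j
loaded = np.loadtxt("my_dataset.txt")
print(loaded.shape)  # (5, 3)
```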

### Example Loading
```python
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initialize the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="z_score")

# [OPTIONAL] you can plot your raw data / print the information
ts_1.plot(raw_data=ts_1.data, title="raw data", max_series=10, max_values=100, save_path="./assets")
ts_1.print(limit=10)
```
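The example above uses `normalizer="z_score"`, while the following examples use `"min_max"`. The sketch below shows what these two normalizations conventionally compute. It assumes a per-series normalization over a matrix whose rows are series; ImputeGAP's `normalize` may use a different axis or handle constant series differently.

```python
import numpy as np

def z_score(data):
    """Hypothetical sketch: rescale each series (row) to mean 0 and standard deviation 1."""
    mean = data.mean(axis=1, keepdims=True)
    std = data.std(axis=1, keepdims=True)
    return (data - mean) / std  # a constant series (std == 0) would produce NaN

def min_max(data):
    """Hypothetical sketch: rescale each series (row) to the range [0, 1]."""
    lo = data.min(axis=1, keepdims=True)
    hi = data.max(axis=1, keepdims=True)
    return (data - lo) / (hi - lo)  # a constant series (hi == lo) would produce NaN

series = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 40.0]])
print(min_max(series))
print(z_score(series))
```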

<br /><hr /><br />



## Contamination
ImputeGAP can contaminate datasets according to a specific scenario, in order to reproduce a given missing-data situation. The scenarios currently available are: <b>MCAR, MISSING PERCENTAGE, ...</b><br />
The documentation is available on this page: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/contamination#readme" >missing data scenarios</a><br><br>


### Example Contamination
```python
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initialize the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data with MCAR scenario
infected_data = ts_1.Contaminate.mcar(ts_1.data, series_impacted=0.4, missing_rate=0.2, use_seed=True)
```
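To make the parameters of the call above more concrete, here is a minimal NumPy sketch of an MCAR (missing completely at random) contamination. It is not ImputeGAP's implementation: it assumes rows are series, reads `series_impacted` as the fraction of series that receive missing values and `missing_rate` as the fraction of values removed in each affected series, and uses a fixed seed in place of `use_seed=True`. The function name `mcar_sketch` is hypothetical.

```python
import numpy as np

def mcar_sketch(data, series_impacted=0.4, missing_rate=0.2, seed=42):
    """Hypothetical MCAR contamination; rows of `data` are assumed to be series."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducible contamination
    contaminated = data.astype(float).copy()
    n_series, n_values = contaminated.shape

    n_hit = max(1, int(round(n_series * series_impacted)))  # series that get NaNs
    n_missing = int(round(n_values * missing_rate))          # NaNs per affected series

    hit_series = rng.choice(n_series, size=n_hit, replace=False)
    for s in hit_series:
        # positions are drawn uniformly at random, independently of the values
        positions = rng.choice(n_values, size=n_missing, replace=False)
        contaminated[s, positions] = np.nan
    return contaminated

data = np.random.default_rng(0).normal(size=(10, 50))
infected = mcar_sketch(data)
print(np.isnan(infected).sum())  # 40 (4 series x 10 values)
```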

<br /><hr /><br />

## Imputation
ImputeGAP provides many imputation algorithms, grouped into families such as: <b>Matrix Decomposition, Machine Learning, Regression, Pattern Recognition, Statistical methods, ...</b><br />

You can also add your own algorithm: follow the min-impute template and replace its logic with your code.<br /><br />
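The min-impute template itself is not reproduced here. As a rough idea of the shape such a function can take, the sketch below mirrors the signature style of the `cdrec` and `iim` functions in this commit (a contaminated matrix in, an imputed matrix out, plus a `logs` flag). The name `my_min_impute` and its logic (replacing every NaN with the global minimum) are assumptions for illustration.

```python
import time
import numpy as np

def my_min_impute(contamination, logs=True):
    """Hypothetical custom algorithm: fill every missing value with the matrix minimum."""
    start_time = time.time()  # record start time, as cdrec/iim do

    imputed_matrix = contamination.copy()  # never modify the caller's matrix
    min_value = np.nanmin(imputed_matrix)  # smallest observed (non-NaN) value
    imputed_matrix[np.isnan(imputed_matrix)] = min_value

    end_time = time.time()
    if logs:
        print(f"\n\t\t> logs, imputation my_min_impute - Execution Time: {(end_time - start_time):.4f} seconds\n")

    return imputed_matrix

matrix = np.array([[1.0, np.nan, 3.0], [np.nan, 5.0, 0.5]])
print(my_min_impute(matrix, logs=False))
```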


### Example Imputation
```python
from imputegap.recovery.imputation import Imputation
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initialize the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data
infected_data = ts_1.Contaminate.mcar(ts_1.data)

# 4. imputation of the contaminated data
# choose the algorithm and its parameters (default, AutoML, or user-defined)
cdrec = Imputation.MD.CDRec(infected_data)

# imputation with default values
cdrec.impute()
# OR imputation with user defined values
cdrec.impute(params={"rank": 5, "epsilon":0.01, "iterations": 100})

# [OPTIONAL] save your results in a new Time Series object
ts_3 = TimeSeries().import_matrix(cdrec.imputed_matrix)

# 5. score the imputation with the raw_data
cdrec.score(ts_1.data, ts_3.data)
```
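The `score` call compares the imputation against the original data. The default settings in `env/default_values.toml` select `RMSE` as the metric, so as an illustration, here is a minimal sketch of an RMSE computed only on the entries that were contaminated. The helper `rmse_on_missing` is hypothetical; ImputeGAP's `score` may report additional metrics and may not restrict the computation this way.

```python
import numpy as np

def rmse_on_missing(ground_truth, contaminated, imputed):
    """Hypothetical helper: RMSE restricted to the entries that were missing."""
    mask = np.isnan(contaminated)            # True where a value was removed
    diff = ground_truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

truth = np.array([[1.0, 2.0], [3.0, 4.0]])
cont = np.array([[1.0, np.nan], [np.nan, 4.0]])
imp = np.array([[1.0, 2.5], [2.0, 4.0]])
print(rmse_on_missing(truth, cont, imp))
```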


<br /><hr /><br />

## Auto-ML
ImputeGAP provides optimization techniques that automatically search for the best hyperparameters of a given algorithm on a given dataset.

The available optimizers are: <b>Greedy Optimizer, Bayesian Optimizer, Particle Swarm Optimizer, and Successive Halving</b>.<br /><br />

### Example Auto-ML
```python
from imputegap.recovery.imputation import Imputation
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# 1. initialize the TimeSeries() object that will stay with you throughout the analysis
ts_1 = TimeSeries()

# 2. load the timeseries from file or from the code
ts_1.load_timeseries(utils.search_path("eeg"))
ts_1.normalize(normalizer="min_max")

# 3. contamination of the data
infected_data = ts_1.Contaminate.mcar(ts_1.data)

# 4. imputation of the contaminated data
# choose the algorithm and its parameters (default, AutoML, or user-defined)
cdrec = Imputation.MD.CDRec(infected_data)

# imputation with AutoML, which searches for the optimal hyperparameters of the algorithm on your dataset
cdrec.impute(user_defined=False, params={"ground_truth": ts_1.data, "optimizer": "bayesian", "options": {"n_calls": 5}})

# 5. score the imputation with the raw_data
cdrec.score(ts_1.data, cdrec.imputed_matrix)
```
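Successive Halving, one of the optimizers listed above, evaluates many configurations on a small budget and keeps only the best fraction for the next, larger-budget round. The sketch below is a generic version of that idea, not ImputeGAP's optimizer; its parameter names echo the `[sh]` section of `env/default_values.toml` (`num_iterations`, `reduction_factor`), while `evaluate`, `toy_error`, and the candidate grid are hypothetical.

```python
def successive_halving(candidates, evaluate, num_iterations=2, reduction_factor=10):
    """Generic Successive Halving sketch (hypothetical helper).

    candidates: list of hyperparameter dicts
    evaluate(config, budget): returns an error, lower is better
    """
    configs = list(candidates)
    budget = 1
    for _ in range(num_iterations):
        # score every surviving configuration with the current budget
        scores = [evaluate(config, budget) for config in configs]
        order = sorted(range(len(configs)), key=lambda i: scores[i])
        keep = max(1, len(configs) // reduction_factor)  # always keep at least one
        configs = [configs[i] for i in order[:keep]]
        budget *= reduction_factor  # survivors get a larger budget next round
    return configs[0]

# Toy candidates: CDRec-like ranks 1..10 with a made-up error that is lowest at rank 4
candidates = [{"rank": r} for r in range(1, 11)]

def toy_error(config, budget):
    # hypothetical objective; a real one would run the imputation with `budget` iterations
    return abs(config["rank"] - 4) + 1.0 / budget

best = successive_halving(candidates, toy_error)
print(best)  # {'rank': 4}
```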


<br /><hr /><br />


## Explainer
ImputeGAP provides an explainer based on the SHAP library, which explains the results of your imputations using features specific to your dataset.<br /><br />

### Example Explainer
```python
from imputegap.explainer.explainer import Explainer
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# load your data into an ImputeGAP TimeSeries() object
ts_1 = TimeSeries()
ts_1.load_timeseries(utils.search_path("eeg"))

# explain the imputation results of a specific algorithm on your dataset
shap_values, shap_details = Explainer.shap_explainer(raw_data=ts_1.data, file_name="eeg", algorithm="cdrec")

# [OPTIONAL] print the results with the impact of each feature.
Explainer.print(shap_values, shap_details)
```
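SHAP is based on Shapley values, which split a model's output among its input features. For a handful of features they can be computed exactly by enumerating feature subsets, as in the sketch below. This is a generic illustration, independent of ImputeGAP's `Explainer` and of the approximations the SHAP library uses; the toy model `linear`, the input `x`, and the baseline used for "absent" features are hypothetical.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 (feature i excluded)
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                z = baseline.astype(float).copy()
                idx = list(subset)
                z[idx] = x[idx]       # features in the coalition take their real values
                without_i = f(z)
                z[i] = x[i]           # add feature i to the coalition
                with_i = f(z)
                phi[i] += weight * (with_i - without_i)
    return phi

def linear(z):
    # toy model: for a linear model, phi_i = w_i * (x_i - baseline_i)
    return 2 * z[0] + 3 * z[1] - z[2]

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
print(exact_shapley(linear, x, baseline))
```

The values always sum to `f(x) - f(baseline)`, which is the property SHAP relies on to attribute an output to its features.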


<br /><hr /><br />

## Contributors
Quentin Nater (<a href="mailto:[email protected]">[email protected]</a>) and Dr. Mourad Khayati (<a href="mailto:[email protected]">[email protected]</a>)

<br /><br />
81 changes: 81 additions & 0 deletions docs/installation/README.md
@@ -0,0 +1,81 @@
![ImputeGAP logo](../../assets/logo_imputegab.png)

# Installation for ImputeGAP

## Requirements
In order to use **ImputeGAP**, you must have Python **3.12.0** or higher and run your code in a **Unix-compatible environment**.
<br><br>


### Install WSL for Windows
To run your implementation in a Unix-compatible environment on Windows, we recommend you install **WSL (Windows Subsystem for Linux)**.

0. Check whether **WSL** is already installed by typing `WSL` in the Windows search menu.
1. If it is not installed, open **PowerShell** as Administrator (right-click the Start menu and select **Windows PowerShell (Admin)**).
2. Run the following command to install WSL:
   ```powershell
   wsl --install
   ```
3. This will install the latest version of WSL and a default Linux distribution (usually Ubuntu). After the installation, you'll need to restart your computer.
<br><br>
*WSL can then be selected as the interpreter in your IDE's interpreter settings.*
<br><br>
### Install Python 3.12.0
To use **ImputeGAP** effectively, ensure that your environment has **Python** version **3.12.0** or higher installed. Follow these steps to install or update Python in your Unix-compatible environment:
##### Step 1: Check Existing Python Version
Open your terminal and check the currently installed version of Python by running:
```bash
python3 --version
```
<br>

##### Step 2: Install Python
Update your package list and install the necessary dependencies for building Python:
```bash
sudo apt update
sudo apt install -y build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev
```
<br>
Download the Python 3.12.0 source code from the official Python website and extract it:

```bash
cd /usr/src
sudo wget https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
sudo tar xzf Python-3.12.0.tgz
```
<br>
Compile and install Python 3.12:

```bash
cd Python-3.12.0
sudo ./configure --enable-optimizations
sudo make altinstall
```
<br>
Verify the installation:

```bash
python3.12 --version
```
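The same check can be done from Python itself, for example at the top of a script that uses ImputeGAP. This is a small sketch; `MIN_VERSION` and `meets_requirement` are illustrative names, and the threshold simply restates the 3.12.0 requirement above.

```python
import sys

# Minimum version stated in the requirements for ImputeGAP
MIN_VERSION = (3, 12, 0)

def meets_requirement(version=sys.version_info[:3]):
    """Return True if `version` (a (major, minor, micro) tuple) is at least MIN_VERSION."""
    return tuple(version) >= MIN_VERSION

if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if meets_requirement() else "too old for ImputeGAP"
    print(f"Python {found}: {status}")
```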




<br /><hr /><br />




## Installation
To install ImputeGAP locally, download the package from GitHub and run the following command:

```pip install -e .```



<br /><hr /><br />
26 changes: 26 additions & 0 deletions env/default_values.toml
@@ -19,6 +19,32 @@ learning_rate = 0.01
iterations = 50
sequence_length = 7

[greedy]
n_calls = 250
selected_metrics='RMSE'

[bayesian]
n_calls = 2
n_random_starts = 50
acq_func = 'gp_hedge'
selected_metrics='RMSE'

[pso]
n_particles = 50
c1 = 0.5
c2 = 0.3
w = 0.9
iterations=10
n_processes=1
selected_metrics='RMSE'

[sh]
num_configs = 10
num_iterations = 2
reduction_factor = 10
selected_metrics='RMSE'


[explainer]
splitter = 10
nbr_series = 15
Binary file modified imputegap/algorithms/__pycache__/cdrec.cpython-312.pyc
Binary file not shown.
Binary file modified imputegap/algorithms/__pycache__/iim.cpython-312.pyc
Binary file not shown.
Binary file modified imputegap/algorithms/__pycache__/mrnn.cpython-312.pyc
Binary file not shown.
Binary file modified imputegap/algorithms/__pycache__/stmvl.cpython-312.pyc
Binary file not shown.
11 changes: 10 additions & 1 deletion imputegap/algorithms/cdrec.py
@@ -1,6 +1,7 @@
import ctypes
import os
import platform
import time
import ctypes as __native_c_types_import;
import numpy as __numpy_import;

@@ -80,7 +81,7 @@ def native_cdrec(__py_matrix, __py_rank, __py_eps, __py_iters):
return __py_recovered;


def cdrec(contamination, truncation_rank, iterations, epsilon, logs=True):
    """
    CDREC algorithm for imputation of missing data
    @author : Quentin Nater
@@ -90,11 +91,19 @@ def cdrec(contamination, truncation_rank, iterations, epsilon):
    :param epsilon : learning rate
    :param iterations : number of iterations
    :param logs: print the execution time
    :return: imputed_matrix : all time series with imputation data
    """
    start_time = time.time()  # Record start time

    # Call the C++ function to perform recovery
    imputed_matrix = native_cdrec(contamination, truncation_rank, epsilon, iterations)

    end_time = time.time()

    if logs:
        print(f"\n\t\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n")

    return imputed_matrix
15 changes: 11 additions & 4 deletions imputegap/algorithms/iim.py
@@ -1,9 +1,8 @@
import numpy as np

import time
from imputegap.wrapper.AlgoPython.IIM.testerIIM import impute_with_algorithm


def iim(contamination, number_neighbor, algo_code, logs=True):
    """
    IIM algorithm for imputation of missing data
    @author : Quentin Nater
@@ -12,10 +11,18 @@ def iim(contamination, number_neighbor, algo_code):
    :param adaptive_flag: The algorithm will run the non-adaptive version of the algorithm, as described in the paper
    :param number_neighbor : The number of neighbors to use for the KNN classifier, by default 10.
    :param algo_code : Action of the IIM output
    :param logs: print the execution time
    :return: imputed_matrix : all time series with imputation data
    """
    #imputed_matrix = iim_recovery(matrix_nan=contamination, adaptive_flag=adaptive_flag, learning_neighbors=number_neighbor)
    start_time = time.time()  # Record start time

    imputed_matrix = impute_with_algorithm(algo_code, contamination.copy(), number_neighbor)

    end_time = time.time()
    if logs:
        print(f"\n\t\t> logs, imputation iim - Execution Time: {(end_time - start_time):.4f} seconds\n")

    return imputed_matrix