Merge pull request #11 from eXascaleInfolab/naterq_skeleton_refac_3

naterq skeleton refac 3
eXascaleInfolab · Oct 11, 2024 · efb9806
2 parents b9e4a7b + e08de03
commit efb9806
Showing 70 changed files with 1,167 additions and 279 deletions.
diff --git a/.github/workflows/pytest_optimization.yml b/.github/workflows/pytest_optimization.yml
@@ -35,4 +35,6 @@ jobs:
         python -m pytest ./tests/test_opti_bayesian_iim.py
         python -m pytest ./tests/test_opti_bayesian_mrnn.py
         python -m pytest ./tests/test_opti_bayesian_stmvl.py
-        python -m pytest ./tests/test_opti_greedy_cdrec.py
+        python -m pytest ./tests/test_opti_greedy_cdrec.py
+        python -m pytest ./tests/test_opti_pso_cdrec.py
+        python -m pytest ./tests/test_opti_sh_cdrec.py
diff --git a/.idea/workspace.xml b/.idea/workspace.xml
diff --git a/README.md b/README.md
@@ -5,30 +5,188 @@ ImputeGAP is a unified framework for imputation algorithms that provides a narro
 
 The interface provides advanced imputation algorithms, construction of various missing values patterns, and different evaluation metrics. In addition, the framework offers support for AutoML parameterization techniques, feature extraction, and, potentially, analysis of feature impact using SHAP. The framework should allow a straightforward integration of new algorithms, datasets, and metrics.
 
+
+
 <br /><hr /><br />
 
-## Installation
-To install in local ImputeGAP, download the package from GitHub and run the command : 
 
-```pip install -e .``` 
+
+## Requirements
+To use **ImputeGAP**, you need:
+* Python **3.12.0** or higher
+* A **Unix-compatible environment** to run your implementation
+<br><br>
+
+To install these two prerequisites, please refer to the following documentation: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/docs/installation#readme" >install requirements</a><br><br>
+
+
+
 
 <br /><hr /><br />
 
-## Execution
-To execute a code containing the library ImputeGAP, we strongly advise you to use a unix environment. For <b>Windows OS</b>, please use the <b>WSL</b> tool to compute your project.
 
-WSL can be choosen on IDE on the interpreter settings.
+
+
+## Installation
+To install ImputeGAP locally, download the package from GitHub and run the command : 
+
+```pip install -e .``` 
+
+
 
 <br /><hr /><br />
 
-## Loading - Manager
+
+
+## Loading and pre-processing
 The manager can load any time series dataset in text format that respects this condition:<br /><br />
-<b>(Values,Series)</b> : *series are seperated by space et values by a carriage return \n.*
+<b>(Values,Series)</b> : *series are separated by spaces and values by a line break (\n).*<br><br>
+
+### Example Loading
+```python
+from imputegap.recovery.manager import TimeSeries
+from imputegap.tools import utils
+
+# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
+ts_1 = TimeSeries()
+
+# 2. load the timeseries from file or from the code
+ts_1.load_timeseries(utils.search_path("eeg"))
+ts_1.normalize(normalizer="z_score")
+
+# [OPTIONAL] you can plot your raw data / print the information
+ts_1.plot(raw_data=ts_1.data, title="raw data", max_series=10, max_values=100, save_path="./assets")
+ts_1.print(limit=10)
+```
 
 <br /><hr /><br />
 
+
+
 ## Contamination
 ImputeGAP can contaminate datasets with a specific scenario to reproduce a given situation. The scenarios available so far are: <b>MCAR, MISSING PERCENTAGE, ...</b><br />
-Please find the documentation in this page : <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/contamination#readme" >missing data scenarios</a>
+Please find the documentation on this page: <a href="https://github.com/eXascaleInfolab/ImputeGAP/tree/main/imputegap/contamination#readme" >missing data scenarios</a><br><br>
+
+
+### Example Contamination
+```python
+from imputegap.recovery.manager import TimeSeries
+from imputegap.tools import utils
+
+# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
+ts_1 = TimeSeries()
+
+# 2. load the timeseries from file or from the code
+ts_1.load_timeseries(utils.search_path("eeg"))
+ts_1.normalize(normalizer="min_max")
+
+# 3. contamination of the data with MCAR scenario
+infected_data = ts_1.Contaminate.mcar(ts_1.data, series_impacted=0.4, missing_rate=0.2, use_seed=True)
+```
+
+<br /><hr /><br />
+
+## Imputation
+ImputeGAP offers many imputation algorithms, grouped into families such as: <b>Matrix Decomposition, Machine Learning, Regression, Pattern Recognition, Statistical methods, ...</b><br />
+
+It is also possible to add your own algorithm. To do so, follow the min-impute template and replace its logic with your own code.<br /><br />
+
+
+### Example Imputation
+```python
+from imputegap.recovery.imputation import Imputation
+from imputegap.recovery.manager import TimeSeries
+from imputegap.tools import utils
+
+# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
+ts_1 = TimeSeries()
+
+# 2. load the timeseries from file or from the code
+ts_1.load_timeseries(utils.search_path("eeg"))
+ts_1.normalize(normalizer="min_max")
+
+# 3. contamination of the data
+infected_data = ts_1.Contaminate.mcar(ts_1.data)
+
+# 4. imputation of the contaminated data
+# choice of the algorithm and its parameters (default, automl, or defined by the user)
+cdrec = Imputation.MD.CDRec(infected_data)
+
+# imputation with default values
+cdrec.impute()
+# OR imputation with user defined values
+cdrec.impute(params={"rank": 5, "epsilon":0.01, "iterations": 100})
+
+# [OPTIONAL] save your results in a new Time Series object
+ts_3 = TimeSeries().import_matrix(cdrec.imputed_matrix)
+
+# 5. score the imputation with the raw_data
+cdrec.score(ts_1.data, ts_3.data)
+```
+
 
 <br /><hr /><br />
+
+## Auto-ML
+ImputeGAP provides optimization techniques that automatically find the right hyperparameters for a specific algorithm in relation to a certain dataset.
+
+The optimizers available are : <b>Greedy Optimizer, Bayesian Optimizer, Particle Swarm Optimizer and Successive Halving</b>.<br /><br />
+
+### Example Auto-ML
+```python
+from imputegap.recovery.imputation import Imputation
+from imputegap.recovery.manager import TimeSeries
+from imputegap.tools import utils
+
+# 1. initiate the TimeSeries() object that will stay with you throughout the analysis
+ts_1 = TimeSeries()
+
+# 2. load the timeseries from file or from the code
+ts_1.load_timeseries(utils.search_path("eeg"))
+ts_1.normalize(normalizer="min_max")
+
+# 3. contamination of the data
+infected_data = ts_1.Contaminate.mcar(ts_1.data)
+
+# 4. imputation of the contaminated data
+# choice of the algorithm and its parameters (default, automl, or defined by the user)
+cdrec = Imputation.MD.CDRec(infected_data)
+
+# imputation with AutoML which will discover the optimal hyperparameters for your dataset and your algorithm
+cdrec.impute(user_defined=False, params={"ground_truth": ts_1.data, "optimizer": "bayesian", "options": {"n_calls": 5}})
+
+# 5. score the imputation with the raw_data
+cdrec.score(ts_1.data, cdrec.imputed_matrix)
+```
+
+
+<br /><hr /><br />
+
+
+## Explainer
+ImputeGAP provides an algorithm based on the SHAP library that explains the results of your imputations using features specific to your dataset.<br /><br />
+
+### Example Explainer
+```python
+from imputegap.explainer.explainer import Explainer
+from imputegap.recovery.manager import TimeSeries
+from imputegap.tools import utils
+
+# load your data from ImputeGAP TimeSeries()
+ts_1 = TimeSeries()
+ts_1.load_timeseries(utils.search_path("eeg"))
+
+# Call the explanation of your dataset with a specific algorithm to gain insight into the imputation results
+shap_values, shap_details = Explainer.shap_explainer(raw_data=ts_1.data, file_name="eeg", algorithm="cdrec")
+
+# [OPTIONAL] print the results with the impact of each feature.
+Explainer.print(shap_values, shap_details)
+```
+
+
+<br /><hr /><br />
+
+## Contributors
+Quentin Nater (<a href="mailto:[email protected]">[email protected]</a>) and Dr. Mourad Khayati (<a href="mailto:[email protected]">[email protected]</a>)
+
+<br /><br />
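
Note on the README hunk above: it describes the input format as series separated by spaces and values by line breaks, and the MCAR example takes `series_impacted` and `missing_rate`. The sketch below shows, with plain NumPy, what such a file and such a contamination could look like. It does not call ImputeGAP. The helper names `parse_series_text` and `mcar_sketch`, the transposition so that each row holds one series, and the way the two rates are interpreted are all assumptions made for illustration, not the library's behaviour.

```python
import io

import numpy as np


def parse_series_text(text):
    """Parse text in which each line is one time step and each column one series."""
    values_by_step = np.loadtxt(io.StringIO(text), ndmin=2)
    # Assumption for this sketch: transpose so that each row holds one series.
    return values_by_step.T


def mcar_sketch(data, series_impacted=0.4, missing_rate=0.2, seed=42):
    """Hypothetical MCAR-style contamination: blank random values in a random subset of series."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the pattern reproducible
    contaminated = data.astype(float).copy()
    n_series, n_values = contaminated.shape
    n_impacted = int(np.ceil(series_impacted * n_series))
    n_missing = int(np.ceil(missing_rate * n_values))
    for s in rng.choice(n_series, size=n_impacted, replace=False):
        positions = rng.choice(n_values, size=n_missing, replace=False)
        contaminated[s, positions] = np.nan
    return contaminated


raw_text = (
    "1.0 10.0 100.0\n"
    "2.0 20.0 200.0\n"
    "3.0 30.0 300.0\n"
    "4.0 40.0 400.0\n"
    "5.0 50.0 500.0\n"
)
data = parse_series_text(raw_text)  # 3 series of 5 values each
contaminated = mcar_sketch(data, series_impacted=0.4, missing_rate=0.2)
```

The original matrix is copied before contamination, so `data` can still serve as the ground truth when scoring an imputation.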
diff --git a/docs/installation/README.md b/docs/installation/README.md
@@ -0,0 +1,81 @@
+![My Logo](../../assets/logo_imputegab.png)
+
+# Installation for ImputeGAP
+
+## Requirements
+In order to use **ImputeGAP**, you must have Python **3.12.0** or higher and run your code in a **Unix-compatible environment**.
+<br><br>
+
+
+### Install WSL for Windows
+To run your implementation in a Unix-compatible environment on Windows, we recommend you install **WSL (Windows Subsystem for Linux)**.
+
+0. Check whether **WSL** is already installed by typing `WSL` in the search menu.
+1. If it is not installed, open **PowerShell** as Administrator (right-click the Start menu and select **Windows PowerShell (Admin)**).
+2. Run the following command to install WSL:
+   ```powershell
+   wsl --install
+   ```
+3. This will install the latest version of WSL and a default Linux distribution (usually Ubuntu). After the installation, you'll need to restart your computer.
+<br><br>
+*WSL can then be selected in the interpreter settings of your IDE.*
+<br><br>
+
+### Install Python 3.12.0
+
+To use **ImputeGAP** effectively, ensure that your environment has **Python** version **3.12.0** or higher installed. Follow these steps to install or update Python in your Unix-compatible environment:
+
+##### Step 1: Check Existing Python Version
+
+Open your terminal and check the currently installed version of Python by running:
+
+```bash
+python3 --version
+```
+<br>
+
+##### Step 2: Install Python
+Update your package list and install the necessary dependencies for building Python:
+```bash
+sudo apt update
+sudo apt install -y build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev
+```
+<br>
+Download Python 3.12.0 source code from the official Python website and extract it :
+
+```bash
+cd /usr/src
+sudo wget https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
+sudo tar xzf Python-3.12.0.tgz
+```
+<br>
+Compile and install Python 3.12:
+
+```bash
+cd Python-3.12.0
+sudo ./configure --enable-optimizations
+sudo make altinstall
+```
+<br>
+Verify the installation:
+
+```bash
+python3.12 --version
+```
+
+
+
+
+<br /><hr /><br />
+
+
+
+
+## Installation
+To install ImputeGAP locally, download the package from GitHub and run the command : 
+
+```pip install -e .``` 
+
+
+
+<br /><hr /><br />
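
Note on the installation guide above: it asks for Python 3.12.0 or higher. A minimal sketch of checking that requirement from inside the interpreter could look as follows. The helper `meets_minimum` is hypothetical and is not part of ImputeGAP.

```python
import sys

MINIMUM = (3, 12, 0)  # version stated in the requirements above


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True when the (major, minor, micro) triple is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if not meets_minimum(sys.version_info):
    found = sys.version.split()[0]
    print(f"Python {found} found; ImputeGAP asks for 3.12.0 or higher.")
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.12".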
diff --git a/env/default_values.toml b/env/default_values.toml
@@ -19,6 +19,32 @@ learning_rate = 0.01
 iterations = 50
 sequence_length = 7
 
+[greedy]
+n_calls = 250
+selected_metrics='RMSE'
+
+[bayesian]
+n_calls = 2
+n_random_starts = 50
+acq_func = 'gp_hedge'
+selected_metrics='RMSE'
+
+[pso]
+n_particles = 50
+c1 = 0.5
+c2 = 0.3
+w = 0.9
+iterations=10
+n_processes=1
+selected_metrics='RMSE'
+
+[sh]
+num_configs = 10
+num_iterations = 2
+reduction_factor = 10
+selected_metrics='RMSE'
+
+
 [explainer]
 splitter = 10
 nbr_series = 15

diff --git a/imputegap/algorithms/__pycache__/cdrec.cpython-312.pyc b/imputegap/algorithms/__pycache__/cdrec.cpython-312.pyc
diff --git a/imputegap/algorithms/__pycache__/iim.cpython-312.pyc b/imputegap/algorithms/__pycache__/iim.cpython-312.pyc
diff --git a/imputegap/algorithms/__pycache__/mrnn.cpython-312.pyc b/imputegap/algorithms/__pycache__/mrnn.cpython-312.pyc
diff --git a/imputegap/algorithms/__pycache__/stmvl.cpython-312.pyc b/imputegap/algorithms/__pycache__/stmvl.cpython-312.pyc
diff --git a/imputegap/algorithms/cdrec.py b/imputegap/algorithms/cdrec.py
@@ -1,6 +1,7 @@
 import ctypes
 import os
 import platform
+import time
 import ctypes as __native_c_types_import;
 import numpy as __numpy_import;
 
@@ -80,7 +81,7 @@ def native_cdrec(__py_matrix, __py_rank, __py_eps, __py_iters):
     return __py_recovered;
 
 
-def cdrec(contamination, truncation_rank, iterations, epsilon):
+def cdrec(contamination, truncation_rank, iterations, epsilon, logs=True):
     """
     CDREC algorithm for imputation of missing data
     @author : Quentin Nater
@@ -90,11 +91,19 @@ def cdrec(contamination, truncation_rank, iterations, epsilon):
     :param epsilon : learning rate
     :param iterations : number of iterations
 
+    :param logs: print logs of time execution
+
     :return: imputed_matrix, metrics : all time series with imputation data and their metrics
 
     """
+    start_time = time.time()  # Record start time
 
     # Call the C++ function to perform recovery
     imputed_matrix = native_cdrec(contamination, truncation_rank, epsilon, iterations)
 
+    end_time = time.time()
+
+    if logs:
+        print(f"\n\t\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n")
+
     return imputed_matrix
diff --git a/imputegap/algorithms/iim.py b/imputegap/algorithms/iim.py
@@ -1,9 +1,8 @@
-import numpy as np
-
+import time
 from imputegap.wrapper.AlgoPython.IIM.testerIIM import impute_with_algorithm
 
 
-def iim(contamination, number_neighbor, algo_code):
+def iim(contamination, number_neighbor, algo_code, logs=True):
     """
     Template zero impute for adding your own algorithms
     @author : Quentin Nater
@@ -12,10 +11,18 @@ def iim(contamination, number_neighbor, algo_code):
     :param adaptive_flag: The algorithm will run the non-adaptive version of the algorithm, as described in the paper
     :param number_neighbor : The number of neighbors to use for the KNN classifier, by default 10.
     :param algo_code : Action of the IIM output
+
+    :param logs: print logs of time execution
+
     :return: imputed_matrix, metrics : all time series with imputation data and their metrics
 
     """
-    #imputed_matrix = iim_recovery(matrix_nan=contamination, adaptive_flag=adaptive_flag, learning_neighbors=number_neighbor)
+    start_time = time.time()  # Record start time
+
     imputed_matrix = impute_with_algorithm(algo_code, contamination.copy(), number_neighbor)
 
+    end_time = time.time()
+    if logs:
+        print(f"\n\t\t> logs, imputation iim - Execution Time: {(end_time - start_time):.4f} seconds\n")
+
     return imputed_matrix
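
Note on the timing code added to `cdrec.py` and `iim.py` above: both functions repeat the same start/stop/print pattern. One possible way to factor it out is a decorator, sketched below. This is not part of the PR; `timed` and `zero_fill` are hypothetical names, and `time.perf_counter` is swapped in for `time.time` because it is a monotonic clock intended for measuring intervals.

```python
import functools
import time


def timed(label):
    """Decorator factory: report wall-clock time unless the caller passes logs=False."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, logs=True, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if logs:
                print(f"\n\t\t> logs, imputation {label} - Execution Time: {elapsed:.4f} seconds\n")
            return result
        return wrapper
    return decorator


@timed("zero_fill")
def zero_fill(matrix):
    """Stand-in for an imputation routine: replace None with 0.0."""
    return [[0.0 if value is None else value for value in row] for row in matrix]


imputed = zero_fill([[1.0, None], [None, 4.0]], logs=False)
```

The wrapper consumes the `logs` keyword itself, so the decorated function keeps its original signature and the log format stays in one place.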