SIVS is an acronym for Stable Iterative Variable Selection. As the name suggests, it is a feature selection method that is robust to the variation that cross-validation can introduce in methods with embedded feature selection. The method takes an iterative approach and internally uses various machine learning methods with embedded feature reduction to shrink the feature space into a small yet robust set.
For citation information, see the citation section of this document.
You can download and install the latest stable version from CRAN via:
install.packages("sivs", repos = "https://cran.rstudio.com")
Alternatively, you can install it directly from GitHub via either of the following:
### First Approach
if (!require("devtools")) install.packages("devtools")
devtools::install_github("mmahmoudian/sivs")
### Second Approach
if (!require("remotes")) install.packages("remotes")
remotes::install_github('mmahmoudian/sivs')
Additionally, this package is also available via various package managers in Linux:
There is already a very good vignette that explains how sivs should be used, and I strongly encourage you to read it. If you need a very short set of instructions to get started, here it is:
In this example, let's assume you have two classes (e.g. dead vs. alive) and you want to know which minimal set of features can be used to differentiate between them:
- Prepare your data
  - have features as columns and samples as rows
  - normalize, impute missing values, etc. as you see fit
  - if you have multiple samples from the same individual, do your best to keep the same number of samples per individual (e.g. by randomly choosing among them) so that no individual is given extra weight
- Run sivs::sivs():
require("sivs")
sivs_obj <- sivs::sivs(x = data, y = class)
- Get the variable (a.k.a. feature) importance. This only contains features whose importance is greater than 0:
sivs_obj$vimp
- [optional] Shrink the feature list even more. This returns only an ordered list of features, which is typically much smaller than vimp:
sivs_suggested_features <- sivs::suggest(sivs_obj)
- Now you can use only these features in your machine learning:
smaller_data <- data[, sivs_suggested_features]
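The data preparation step in the list above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's prescribed workflow: the toy data, the individual vector, and the choices of keeping one sample per individual, imputing with the median, and standardizing are all hypothetical placeholders that you should replace with whatever suits your data.

```r
set.seed(1)  # make the random choices below reproducible

# Hypothetical toy data: 30 individuals, each contributing 1 or 3 samples
ids        <- paste0("id", 1:30)
individual <- rep(ids, times = rep(c(1, 3), 15))           # 60 samples in total
class_of   <- setNames(rep(c("alive", "dead"), each = 15), ids)
class      <- factor(unname(class_of[individual]))

# 60 samples (rows) x 25 features (columns); the first two carry signal
data <- matrix(rnorm(60 * 25), nrow = 60,
               dimnames = list(NULL, paste0("feature", 1:25)))
data[class == "dead", 1:2] <- data[class == "dead", 1:2] + 2
data[sample(length(data), 10)] <- NA                       # a few missing values

# Keep one randomly chosen sample per individual so nobody gets extra weight
rows_by_individual <- split(seq_along(individual), individual)
keep <- sort(unname(vapply(rows_by_individual,
                           function(rows) rows[sample.int(length(rows), 1)],
                           integer(1))))
data  <- data[keep, , drop = FALSE]
class <- class[keep]

# Impute missing values with the per-feature median (one simple choice of many)
for (j in seq_len(ncol(data))) {
  is_missing <- is.na(data[, j])
  data[is_missing, j] <- median(data[, j], na.rm = TRUE)
}

# Standardize each feature and store the result as a data frame
data <- as.data.frame(scale(data))

# The prepared objects can now be passed on as in the steps above:
# sivs_obj <- sivs::sivs(x = data, y = class)
```

Subsampling by row index, rather than by row name, keeps data and class aligned, which matters because the two are passed separately as x and y.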
You can also build this package completely from source, and you should get files identical to those on CRAN. This can be useful for those who want to contribute to the package. I have made it easy and straightforward to build and test the package using GNU make. Follow these steps in order:
- First make sure you have make installed, along with the package-building dependencies:
make --version  # if make is installed, you will see the version
make help       # this will show you a general help of available commands
make deps       # this will check if you have the needed R packages and if not, it will install them for you
- Change the code and files as needed
- If you have changed the R code and want to test it, you can build the R code and skip building the manual and vignette:
make build-noman
If you have changed the manual:
make docs
make build
- Install the package and make sure everything is in order and working as you expect:
make install
- Once you have confirmed that everything is in order, repeat all the building steps with CRAN checking:
make docs build check-cran install
Alternatively, you can run the following, which is a short form of the command above:
make all-cran
This is Free/Libre and Open Source Software (FLOSS), and therefore any contribution is welcome as long as it does not violate the license. To contribute, follow the steps in Building From Source and then, before creating the pull request, make sure you have resolved all ERRORs, WARNINGs, and ideally all NOTEs produced by the following:
make all-cran
make check-cran
This method has been published in the journal Bioinformatics:
Mehrad Mahmoudian, Mikko S Venäläinen, Riku Klén, Laura L Elo, Stable Iterative Variable Selection, Bioinformatics, 2021, btab501, https://doi.org/10.1093/bioinformatics/btab501
BibTeX entry for LaTeX users:
@article{10.1093/bioinformatics/btab501,
author = {Mahmoudian, Mehrad and Venäläinen, Mikko S and Klén, Riku and Elo, Laura L},
title = "{Stable Iterative Variable Selection}",
journal = {Bioinformatics},
year = {2021},
month = {07},
abstract = "{The emergence of datasets with tens of thousands of features, such as high-throughput omics biomedical data, highlights the importance of reducing the feature space into a distilled subset that can truly capture the signal for research and industry by aiding in finding more effective biomarkers for the question in hand. A good feature set also facilitates building robust predictive models with improved interpretability and convergence of the applied method due to the smaller feature space. Here, we present a robust feature selection method named Stable Iterative Variable Selection (SIVS) and assess its performance over both omics and clinical data types. As a performance assessment metric, we compared the number and goodness of the selected feature using SIVS to those selected by LASSO regression. The results suggested that the feature space selected by SIVS was, on average, 41\% smaller, without having a negative effect on the model performance. A similar result was observed for comparison with Boruta and Caret RFE. The method is implemented as an R package under GNU General Public License v3.0 and is accessible via Comprehensive R Archive Network (CRAN) via https://cran.r-project.org/web/packages/sivs/index.html or through Github via https://github.com/mmahmoudian/sivs/. Supplementary data are available at Bioinformatics online.}",
issn = {1367-4803},
doi = {10.1093/bioinformatics/btab501},
url = {https://doi.org/10.1093/bioinformatics/btab501},
note = {btab501},
eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btab501/39070854/btab501.pdf},
}
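If the package is already installed, the citation can also be retrieved from within R using the standard utils::citation() and utils::toBibtex() functions. The sketch below is only an illustration: get_package_citation is a hypothetical helper, not part of sivs, and it is an assumption that the citation metadata shipped with the package matches the reference above.

```r
# Hypothetical helper: return the citation of an installed package,
# or NULL (with a message) when the package is not installed.
get_package_citation <- function(pkg = "sivs") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; no citation available.")
    return(NULL)
  }
  utils::citation(pkg)
}

sivs_citation <- get_package_citation("sivs")
if (!is.null(sivs_citation)) {
  print(sivs_citation)                   # human-readable reference
  print(utils::toBibtex(sivs_citation))  # BibTeX form for LaTeX users
}
```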