v4.3.0 (2024-06-08)
- model: Add check for fitted model in LGBMModel fingerprint. (f6a0933)
- tuning: Add optional enqueue_trials parameter to the OptunaTuner fingerprint. (80fa374)
- transformer: Update LabelEncoder to use the PyArrow implementation of unique to prevent a vaex bug from crashing the transformer. (85059d7)
v4.2.0 (2024-05-21)
- transformer: Update ExpressionTransformer to use TypedDict instead of tuples. (3950abd)
v4.1.0 (2024-05-18)
- tuning: Add support for enqueuing trials in OptunaTuner. (9e0b6b2)
- data splitting: Add support for stratification on multiple features in the RandomSplitter. (d745434)
- transformer: Add metadata option for the ExpressionTransformer that allows for creation of meta features not tracked in the DataSchema. (f16ea8b)
- transformer: Add ExpressionTransformer for creating features using the vaex expression system. (c0faf74)
v4.0.0 (2024-05-09)
- exporter: Add S3Exporter that implements cached S3 exporting of files from the local disk. (d17b2d2)
- exporter: Add BaseExporter and LocalExporter implementations that support exporting data to disk, along with corresponding Pipeline steps. (6ce13cf)
- exporter: Add LocalManifest support for LocalExporter, which simplifies caching logic and enables S3 manifest translations. (2199ff0)
- exporter: Add support for multiple data exports using LocalExporter. (ff988b6)
- data source: Add support for reading manifest files from S3 buckets in S3Ingester. (9c68a9b)
- pipeline: Add disable_cache parameter to Pipeline execution. (da1e31a)
- data cleaning: Fix newline characters breaking CSV reading using Arrow. (3a7e594)
- tuning: Remove logging of the storage URI to minimize the risk of accidentally logging credentials. (054692d)
- data source: Extract shared S3 logic to utils, which can then be used by S3Exporter. (97a7974)
v3.2.0 (2024-04-18)
- tuning: Add support for RDSStorage using the OptunaTuner. (cc06ddd)
- data source: Fix bug where a dataset_id consisting of path components would break local metadata file creation. (17c4866)
- model: Add verbosity parameter to BaseModel to set the log level in the base class. (0a3828f)
v3.1.0 (2024-04-12)
- model: Add optional memoization to datasets during model training. (#209) (2ca4465)
- model: Add optional memoization to datasets during model training. (6a955dc)
v3.0.0 (2024-04-05)
- model: Update LGBMModel to use dependency injection; it now expects a lightgbm.LGBMModel as an argument. (7250f34)
- Switch vaex file format to Arrow instead of HDF5 for better type support. (ac8e500)
- data cleaning: Fix bug where boolean columns are stored as numerical in the data schema due to int8 conversion. (da358d8)
v2.2.0 (2024-03-22)
- filter: Add ImblearnResamplingFilter, which is a wrapper for imblearn over- and under-samplers. (77a3d7d)
- filter: Add ExpressionFilter and base class for simple DataFrame filtering using vaex expressions. (dc679ff)
- cache: Add disable_cache argument to all cached functions to completely bypass all caching functionality. (fbdfc5d)
- Update CHANGELOG.md format to include missing categories. (d97b32c)
v2.1.0 (2024-02-24)
- Update Titanic dataset to mleko 2.0 API. (62bf991)
- tuning: Add optuna-dashboard support to OptunaTuner, including automatically generated experiment notes. (29d81c2)
- transformer: Improve flexibility of LabelEncoderTransformer by adding optional null encoding and manual dictionary mapping. (f7b30a9)
- Set cache_directory as an optional argument, with custom default locations. (08e8777)
- data cleaning: Fix meta_columns not being forcefully cast to the correct data type in CSVToVaexConverter. (b42b9ed)
- Update copyright year in README.md (#192) (eeb56e1)
- Fix test cases generating the cache directory outside the temporary directory. (ba57fbf)
v2.0.0 (2024-02-07)
- pipeline: Refactor PipelineStep to use TypedDict for both inputs and outputs. (2eb623c)
- model: Refactor the validation_dataframe parameter in BaseModel and LGBMModel to be optional. (d18ed29)
- cache: Add cache support for None returns on fields using cache handlers not equipped to process None. (a489996)
- model: Add support for a custom evaluation function in LGBMModel. (4e70a55)
- data cleaning: Rename empty column name to _empty to prevent vaex crashes. (da72b75)
- data cleaning: Cast boolean columns to int8 during cleaning to reduce label encoding needs. (d94f7c9)
- Add replacement of column names that are reserved keywords to prevent evaluation errors from vaex. (3969ffd)
- Improve error logging messages, and update codebase to the new black format. (a29ad45)
- cache: Break out cache handler retrieval method. (aba9e41)
- Refactor mleko package documentation to format bullet lists correctly. (76ee895)
- Remove TypeGuard and PyUpgrade from build and pre-commit. (d374406)
- Add custom template for release notes to follow the changelog structure. (30518c0)
v1.2.6 (2024-01-25)
- Bump patch release. (ff5f94e)
v1.2.5 (2024-01-25)
- Fix CHANGELOG.md template location. (141c9b7)
v1.2.4 (2024-01-25)
- Trigger patch release. (7269dca)
- semantic versioning: Update CHANGELOG.md template and semantic versioning logic. (1727e09)
v1.2.3 (2024-01-25)
- Remove coverage from workflow. (09eb09d)
v1.2.2 (2024-01-25)
- Switch to trusted publishing. (e84712d)
v1.2.1 (2024-01-25)
- Experiment with semantic versioning. (0942196)
v1.2.0 (2023-10-09)
- data source: ✨ Add support for pattern matching in *Ingester and add LocalManifest to index fetched files. (75974a4)
- logging: 🐛 Fix LGBM logging routing to the correct log level. (0e5fa77)
- Remove unnecessary blank lines. (a06edf2)
- ✏️ Improve logging of CSVToVaexConverter and fix typo in write_vaex_dataframe. (197e56a)
- 🔒️ Bump gitpython to resolve CVE-2023-41040 and CVE-2023-40590. (79627bd)
v1.1.0 (2023-09-27)
- tuning: ✨ Add hyperparameter tuning functionality, initially including OptunaTuner. (be38c07)
- tuning: 🧪 Add test cases for TuneStep. (d811c7d)
v1.0.0 (2023-09-20)
- 📝 Improve README.md with more up-to-date information. (b388b59)
- transformer: ✨ Add DataSchema API to the transformers' fit, transform and fit_transform methods. (e053c85)
- 📝 Add example notebook for the Titanic dataset. (e651af9)
v0.8.1 (2023-09-07)
- config: 🐛 Fix readthedocs build to only generate html. (13fc207)
v0.8.0 (2023-09-06)
- model: ✨ Add LGBMModel along with a base class that can be extended for all types of future models. (b47a241)
- ✨ Add DataSchema, which tracks dataset features throughout the pipeline and methods. (e03bd2c)
- feature selection: ✨ Update BaseFeatureSelector and its children to use the fit, transform and fit_transform pattern. (62e4dd1)
- transformer: ✨ Add fit, transform and fit_transform to all Transformers, along with API and caching simplifications. (5cc4ebc)
- cache: ✨ Add CacheHandler, which allows customization of read/write functions for each cached return value individually. (609e084)
- feature selection: 🐛 Add DataSchema as a partial return from all fit methods in feature selectors. (ebf2484)
- cache: 🚸 Replace disable_cache with a check for cache_size=0 in LRUCacheMixin. (cfd7592)
v0.7.0 (2023-07-11)
- ✨ Add fit transform support to all FeatureSelectors, along with refactoring the LRUCacheMixin. (3df0601)
- ✨ Add support for separate fitting and transforming inside the pipeline. (bb9b7a4)
- data cleaning: 🐛 Switched to HDF5 as the file format for faster I/O and better SageMaker support. (61f9e42)
v0.6.1 (2023-06-30)
- data cleaning: 🐛 Fix date32/64[day] not being converted to datetime. (98f4b26)
- data source: 🐛 Fix bug where S3 buckets with no manifest caused a crash. (9078845)
- config: 🔧 Switch mypy for pyright and update configuration. (5631aed)
v0.6.0 (2023-06-26)
- cache: ✨ Add cache_group that can segment an instance cache into different isolated parts. (#66) (5fa8c9c)
- cache: ✨ Add cache_group that can segment an instance cache into different isolated parts. (b5c3de5)
v0.5.0 (2023-06-17)
- transformer: ✨ Add MinMaxScalerTransformer for normalizing numerical features. (9b26c00)
- transformer: ✨ Add MaxAbsScalerTransformer that scales numerical features. (1fd2a93)
- transformer: ✨ Add CompositeTransformer for chaining together multiple transformers sequentially. (006d741)
- transformer: ✨ Add LabelEncoderTransformer for ordinal encoding. (41a4c45)
- transformer: ✨ Add FrequencyEncoderTransformer along with pipeline support. (465e6db)
- 💫 Switch to tqdm.auto to prevent breaking in Jupyter notebooks. (dc139cf)
- ✅ _get_local_filenames now returns a sorted list of filenames to ensure stability. (774e8eb)
v0.4.2 (2023-06-11)
- ⚡️ Optimize VarianceFeatureSelector when threshold is 0. (906dde3)
- ➖ Remove pandas dependency. (40e264c)
- semantic versioning: 👷 Add more sections to changelog based on conventional commit categories. (e5b1594)
v0.4.1 (2023-06-04)
- feature selection: 🐛 Fix FeatureSelector cache to use tuple in… (#60) (758cf5e)
- feature selection: 🐛 Fix FeatureSelector cache to use tuple instead of frozenset to have a stable fingerprint. (cd82417)
v0.4.0 (2023-06-03)
- feature selection: ✨ Add a feature selector that filters out invariant features. (798c261)
- feature selection: ✨ Add PearsonCorrelationFeatureSelector, which drops highly correlated features. (66e5cd2)
- feature selection: ✨ Add CompositeFeatureSelector for chaining multiple feature selection steps on the same DataFrame. (3d75079)
- feature selection: ✨ Add standard deviation feature selector. (c56177b)
- feature selection: ✨ Add missing rate feature selector. (d5ba8b5)
- 🐛 Fix typeguard breaking changes causing build to fail. (66c6a8e)
- 🔥 Unify dataset subpackage naming to verbs and modules to nouns. (3ffb909)
- 🔥 Rename subpackages in dataset to singular variant. (51a8297)
- 🔥 Refactor entire project to improve maintainability. (dd1d22c)
v0.3.1 (2023-05-21)
- 🐛 Added notes to pipeline step docstrings. (d94f899)
- data source: 🐛 Added note to the KaggleDataSource init docstring. (d5f12d3)
- 🚀 Removed semantic PR workflow and updated test workflow to not run on release commits. (8138745)
v0.3.0 (2023-05-21)
- New notes (#54) (21239f7)
- data splitting: 🐛 Added notes and examples to splitter docstrings. (d162c86)
- pipeline: 🐛 Updated some docstrings. (56b36fd)
- 🚀 Updated release to only trigger if the commit message does not contain chore(release). (c9f3f3f)
v0.2.0 (2023-05-21)
- Add data splitting step (#53) (a668b1a)
v0.1.3 (2023-05-13)
- cache: 🐛 Cache modules exposed in subpackage init. (fd65e9d)
v0.1.2 (2023-05-13)
- cache: 🐛 Fixed LRUCacheMixin eviction test case. (ce5bfc1)
- 🐛 Temporarily disabled failing tests for cache. (9c17960)
- 📝 Fixed sphinx-autoapi build warnings. (040963a)
v0.1.0 (2023-05-12)
- data source: ✨ Add KaggleDataSource to download the dataset from Kaggle by providing a destination directory, owner slug, dataset slug, and necessary API credentials. (3fa07b6)
- cache: 🐛 Fixed test by not testing it... (e3a0ce9)
- cache: 🐛 Try logging using assert to fix GH issue. (5e247ec)
- cache: 🐛 Attempting to fix test case failing in GH actions. (4892591)
- cache: 🐛 LRUCacheMixin now relies on file modification time instead of access time due to system limitations. (127d657)
- 🐛 Fixed docstrings for private methods in KaggleDataSource and removed xdoctest from build steps. (bb55cf5)