Releases: matthewwardrop/formulaic
v0.6.1
This is a minor release with one new feature.
New features and enhancements:
- Added support for treating individual categorical features as though they do not span the intercept (useful for intentionally generating over-specified model matrices in e.g. regularized models).
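
To make the distinction concrete, here is a minimal pure-Python sketch of the two encodings. It is not formulaic's implementation, and the `dummy_encode` helper and its `spans_intercept` parameter are hypothetical names chosen for illustration. A categorical feature that spans the intercept is usually encoded with one level dropped, so that its columns are not collinear with a column of ones; treating it as not spanning the intercept keeps one column per level.

```python
def dummy_encode(values, spans_intercept=True):
    """One-hot encode `values` (hypothetical helper, not part of formulaic).

    If the feature is treated as spanning the intercept, the first level is
    dropped as the reference level; otherwise every level keeps a column.
    """
    levels = sorted(set(values))
    kept = levels[1:] if spans_intercept else levels
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


data = ["a", "b", "c", "a"]

# Reduced-rank encoding: level "a" is absorbed by the intercept.
reduced_levels, reduced = dummy_encode(data)

# Over-specified encoding: every level has its own column.
full_levels, full = dummy_encode(data, spans_intercept=False)
```

In the over-specified case each row sums to one, so the columns are linearly dependent on an intercept column. That is usually acceptable only when something else, such as a regularization penalty, makes the fit identifiable.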
v0.6.0
This is a major release with some important consistency and completeness
improvements. It should be treated as almost being the first release candidate
of 1.0.0, which will land after some small amount of further feature extensions
and documentation improvements.
Breaking changes:
Although there are some internal changes to API, as documented below, there are
no breaking changes to user-facing APIs.
New features and enhancements:
- Formula terms are now consistently ordered regardless of provenance (formulae or
  manual term specification), and sorted according to R conventions by default
  rather than lexically. This can be changed using the `_ordering` keyword to
  the `Formula` constructor.
- Greater compatibility with R and patsy formulae:
  - for patsy: added `standardize`, `Q` and treatment contrasts shims.
  - for patsy: added `cluster_by='numerical_factors'` option to `ModelSpec` to enable
    patsy style clustering of output columns by involved numerical factors.
  - for R: added support for exponentiation with `^` and `%in%`.
- Diff and Helmert contrast codings gained support for additional variants.
- Greatly improved the performance of generating sparse dummy encodings when
  there are many categories. #110 #112 (thanks @dbalabka)
- Context scoping operators (like parentheses) are now tokenized as their own special
  type.
- Add support for merging `Structured` instances, and use this functionality during
  AST evaluation where relevant.
- `ModelSpec.term_indices` is now a list rather than a tuple, to allow direct use when
  indexing pandas and numpy model matrices.
- Add official support for Python 3.11.
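
As a rough illustration of the term ordering change in the first item above, the sketch below contrasts lexical sorting with ordering by interaction degree, which is how R arranges terms (main effects first, then two-way interactions, and so on, keeping the written order within each degree). The `order_terms` helper and its option names are hypothetical and do not reproduce formulaic's internals; terms are represented here as plain strings with `:` separating the factors of an interaction.

```python
def order_terms(terms, ordering="degree"):
    """Order term strings (hypothetical helper, for illustration only)."""
    if ordering == "degree":
        # sorted() is stable, so terms of equal degree keep their input order.
        return sorted(terms, key=lambda term: len(term.split(":")))
    if ordering == "lexical":
        return sorted(terms)
    # Any other value leaves the terms in the order they were given.
    return list(terms)


terms = ["a:b", "c", "a", "a:b:c", "b"]

by_degree = order_terms(terms)              # main effects come first
lexical = order_terms(terms, "lexical")     # plain string sort
```

With degree ordering, `c` stays ahead of `a` because both are main effects and the input order is preserved; a lexical sort would instead interleave interactions with main effects.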
Bugfixes and cleanups:
- Fix parsing formulae starting with a parenthesis.
- Fix iteration over root nodes of `Structured` instances for non-sequential
  iterable values.
- Bump testing versions and fix `poly` unit tests.
- Fix use of deprecated automatic casting of factors to numpy arrays during dense
  column evaluation in `PandasMaterializer`. #122 (thanks @effigies)
- `Factor.EvalMethod.UNKNOWN` was removed, defaulting instead to `LOOKUP`.
- Remove `sympy` version constraint now that a bug has been fixed upstream.
Documentation:
- Substantial updates to documentation, which is now mostly complete for end-user
use-cases. Developer and API docs are still pending.
v0.5.2
This is a minor patch release that fixes one bug.
Bugfixes and cleanups:
- Fixed alignment between the length of a `Structured` instance and iteration
  over this instance (including `Formula` instances). Formerly the length would
  only count the number of keys in its structure, rather than the number of
  objects that would be yielded during iteration.
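
The mismatch can be pictured with a toy container. This is only a sketch of the invariant being restored (`len(x) == len(list(x))`); the `ToyStructured` class and its methods are hypothetical and do not mirror the real `Structured` implementation.

```python
class ToyStructured:
    """Toy container mapping keys to lists of objects (illustration only)."""

    def __init__(self, **structure):
        self._structure = structure

    def __iter__(self):
        # Iteration yields the stored objects, not the keys.
        for items in self._structure.values():
            yield from items

    def key_count(self):
        # What a key-based length would report.
        return len(self._structure)

    def __len__(self):
        # A length aligned with iteration counts the yielded objects.
        return sum(len(items) for items in self._structure.values())


toy = ToyStructured(root=["a", "b", "a:b"])
```

Here the structure has a single key but iteration yields three objects, so a key-based length would report 1 while `len(toy)` and `len(list(toy))` both report 3.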
v0.5.1
This is a minor patch release that fixes two bugs.
Bugfixes and cleanups:
- Fixed generation of string representation of `Formula` objects.
- Fixed generation of `formulaic.__version__` during package build.
v0.5.0
This is a major new release with some minor API changes, some ergonomic
improvements, and a few bug fixes.
Breaking changes:
- Accessing named substructures of `Formula` objects (e.g. `formula.lhs`) no
  longer returns a list of terms, but rather a `Formula` object, so that the
  helper methods can remain accessible. You can access the raw terms by
  iterating over the formula (`list(formula)`) or looking up the root node
  (`formula.root`).
New features and improvements:
- The `ModelSpec` object is now the source of truth in all `ModelMatrix`
  generations, and can be constructed directly from any supported specification
  using `ModelSpec.from_spec(...)`. Supported specifications include formula
  strings, parsed formulae, model matrices and prior model specs.
- The `.get_model_matrix()` helper methods across `Formula`,
  `FormulaMaterializer`, `ModelSpec` and `model_matrix` objects/helper
  functions are now consistent, and all use `ModelSpec` directly under the hood.
- When accessing substructures of `Formula` objects (e.g. `formula.lhs`), the
  term lists will be wrapped as trivial `Formula` instances rather than returned
  as raw lists (so that the helper methods like `.get_model_matrix()` can still
  be used).
- `FormulaSpec` is now exported from the top-level module.
Bugfixes and cleanups:
- Fixed `ModelSpec` specifications being overridden by default arguments to
  `FormulaMaterializer.get_model_matrix`.
- `Structured._flatten()` now correctly flattens unnamed substructures.
v0.4.0
This is a major new release with some new features, greatly improved ergonomics
for structured formulae, matrices and specs, and a few small breaking changes
(most with backward compatibility shims). All users are encouraged to upgrade.
Breaking changes:
- `include_intercept` is no longer an argument to `FormulaParser.get_terms`,
  and is instead an argument of the `DefaultFormulaParser` constructor. If you
  want to modify the `include_intercept` behaviour, please use:
  `Formula("y ~ x", _parser=DefaultFormulaParser(include_intercept=False))`
- Accessing terms via `Formula.terms` is deprecated since `Formula` became a
  subclass of `Structured[List[Terms]]`. You can directly iterate over, and/or
  access nested structure on, the `Formula` instance itself. `Formula.terms`
  remains as a deprecated property that returns a reference to the instance
  itself in order to support legacy use-cases. This will be removed in 1.0.0.
- `ModelSpec.feature_names` and `ModelSpec.feature_columns` are deprecated in
  favour of `ModelSpec.column_names` and `ModelSpec.column_indices`. Deprecated
  properties remain in-place to support legacy use-cases. These will be removed
  in 1.0.0.
New features and enhancements:
- Structured formulae (and their derived matrices and specs) are now mutable.
  Internally `Formula` has been refactored as a subclass of
  `Structured[List[Terms]]`, and can be incrementally built and modified. The
  matrix and spec outputs now have explicit subclasses of `Structured`
  (`ModelMatrices` and `ModelSpecs` respectively) to expose convenience methods
  that allow these objects to be largely used interchangeably with their
  singular counterparts.
- `ModelMatrices` and `ModelSpecs` are now surfaced as top-level exports of the
  `formulaic` module.
- `Structured` (and its subclasses) gained improved integration of nested tuple
  structure, as well as support for flattened iteration, explicit mapping
  output types, and lots of cleanups.
- `ModelSpec` was made into a dataclass, and gained several new
  properties/methods to support better introspection and mutation of the model
  spec.
- `FormulaParser` was renamed `DefaultFormulaParser`, and made a subclass of the
  new formula parser interface `FormulaParser`. In this process
  `include_intercept` was removed from the API, and made an instance attribute
  of the default parser implementation.
Bugfixes and cleanups:
- Fixed AST evaluation for large formulae that caused the evaluation to hit the
  recursion limit.
- Fixed sparse categorical encoding when the dataframe index is not the standard
  range index.
- Fixed a bug in the linear constraints parser when more than two constraints
  were specified in a comma-separated string.
- Avoid implicit changing of the sparsity structure of CSC matrices.
- If manually constructed `ModelSpec`s are provided by the user during
  materialization, they are updated to reflect the output-type chosen by the
  user, as well as whether to ensure full rank/etc.
- Allowed use of older pandas versions. All versions >=1.0.0 are now supported.
- Various linting cleanups as `pylint` was added to the CI testing.
Documentation:
- Apart from the `.materializer` submodule, most code now has inline
  documentation and annotations.
v0.3.4
This is a backward compatible major release that adds several new features.
New features and enhancements:
- Added support for customizing the contrasts generated for categorical
  features, including treatment, sum, deviation, helmert and custom contrasts.
- Added support for the generation of linear constraints for `ModelMatrix`
  instances (see `ModelMatrix.model_spec.get_linear_constraints`).
- Added support for passing `ModelMatrix`, `ModelSpec` and other formula-like
  objects to the `model_matrix` sugar method so that pre-processed formulae can
  be used.
- Improved the way tokens are manipulated for the right-hand-side intercept and
  substitutions of `0` with `-1` to avoid substitutions in quoted contexts.
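
For readers unfamiliar with contrast codings, the following pure-Python sketch builds the conventional textbook treatment and sum coding matrices for a list of levels. It is illustrative only: the function names are hypothetical, and the choice of reference level (first for treatment, last for sum, as in R's `contr.treatment` and `contr.sum`) is an assumption that may differ from formulaic's defaults.

```python
def treatment_contrasts(levels):
    """Rows are levels; columns are indicators for each non-reference level.

    The first level is assumed to be the reference, so its row is all zeros.
    """
    k = len(levels)
    return [[1 if i == j else 0 for j in range(1, k)] for i in range(k)]


def sum_contrasts(levels):
    """Rows are levels; the last level is coded as -1 in every column.

    Each column sums to zero, so coefficients are deviations from the mean
    of the level effects rather than from a reference level.
    """
    k = len(levels)
    rows = [[1 if i == j else 0 for j in range(k - 1)] for i in range(k - 1)]
    rows.append([-1] * (k - 1))
    return rows


levels = ["low", "medium", "high"]
treatment = treatment_contrasts(levels)
summed = sum_contrasts(levels)
```

Both codings produce one fewer column than there are levels; they differ in what the resulting coefficients are measured against.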
Bugfixes and cleanups:
- Fixed variable sanitization during evaluation, allowing variables with
  special characters to be used in Python transforms; for example:
  ``bs(`my|feature%is^cool`)``.
- Fixed the parsing of dictionaries and sets within python expressions in the
  formula; for example: `C(x, {"a": [1,2,3]})`.
- Bumped requirement on `astor` to >=0.8 to fix issues with ast-generation in
  Python 3.8+ when numerical constants are present in the parsed python
  expression (e.g. "bs(x, df=10)").
v0.3.3
v0.3.2
v0.3.1
This is a minor patch release that fixes the preservation of output types, NA-handling, and assurance of full-rank for factors that evaluate to pre-encoded columns when constructing a model matrix from a pre-defined `ModelSpec`. The benchmarks were also updated.