Releases · matthewwardrop/formulaic

22 Jun 19:29

matthewwardrop

v0.6.1

a96ae8a

This is a minor release with one new feature.

New features and enhancements:

Added support for treating individual categorical features as though they do not span the intercept (useful for intentionally generating over-specified model matrices in e.g. regularized models).
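
As a rough illustration of what it means for a categorical feature to span the intercept, the following plain-Python sketch compares a reduced-rank indicator encoding with a full one. It does not use formulaic; the helper `dummy_encode` and its `drop_first` flag are hypothetical names introduced only for this example.

```python
def dummy_encode(values, drop_first):
    """Return (column_names, rows) for a simple indicator encoding.

    Hypothetical helper for illustration only; not part of formulaic.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    names = [f"x[{level}]" for level in kept]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return names, rows


data = ["a", "b", "c", "a"]

# Reduced-rank encoding: the reference level "a" is absorbed by the intercept.
reduced_names, reduced_rows = dummy_encode(data, drop_first=True)

# Full encoding: one column per level. Every row sums to 1, which duplicates
# an intercept column of ones, so "intercept + full encoding" is over-specified.
full_names, full_rows = dummy_encode(data, drop_first=False)

print(reduced_names)  # ['x[b]', 'x[c]']
print(full_names)     # ['x[a]', 'x[b]', 'x[c]']
print([sum(row) for row in full_rows])  # [1, 1, 1, 1]
```

The full encoding alongside an intercept is the kind of intentionally over-specified matrix the note above describes as useful for regularized models.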

27 Apr 03:55

matthewwardrop

v0.6.0

20a3760

This is a major release with some important consistency and completeness
improvements. It should be treated as almost being the first release candidate
of 1.0.0, which will land after some small amount of further feature extensions
and documentation improvements.

Breaking changes:

Although there are some internal changes to API, as documented below, there are
no breaking changes to user-facing APIs.

New features and enhancements:

- Formula terms are now consistently ordered regardless of provenance (formulae or
  manual term specification), and sorted according to R conventions by default
  rather than lexically. This can be changed using the _ordering keyword to
  the Formula constructor.
- Greater compatibility with R and patsy formulae:
  - for patsy: added standardize, Q and treatment contrasts shims.
  - for patsy: added cluster_by='numerical_factors' option to ModelSpec to enable
    patsy style clustering of output columns by involved numerical factors.
  - for R: added support for exponentiation with ^ and %in%.
- Diff and Helmert contrast codings gained support for additional variants.
- Greatly improved the performance of generating sparse dummy encodings when
  there are many categories. #110 #112 (thanks @dbalabka)
- Context scoping operators (like parentheses) are now tokenized as their own special
  type.
- Added support for merging Structured instances, and use this functionality during
  AST evaluation where relevant.
- ModelSpec.term_indices is now a list rather than a tuple, to allow direct use when
  indexing pandas and numpy model matrices.
- Added official support for Python 3.11.
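
Two of the items above, R-style exponentiation with ^ and ordering terms by R conventions rather than lexically, can be sketched in plain Python. This is a minimal illustration that assumes "R conventions" means ordering by interaction degree while keeping the written order within each degree; it is not formulaic's parser, and `expand_power` and `render` are hypothetical helpers.

```python
from itertools import combinations


def expand_power(factors, power):
    """Expand (f1 + f2 + ...)^power into main effects and interactions.

    Hypothetical helper: each term is a tuple of factor names, and terms
    involving more than `power` distinct factors are omitted.
    """
    terms = []
    for degree in range(1, power + 1):
        terms.extend(combinations(factors, degree))
    return terms


def render(term):
    """Render a term tuple in the usual a:b interaction notation."""
    return ":".join(term)


# (a + b + c)^2 -> main effects plus all two-way interactions.
terms = expand_power(["a", "b", "c"], 2)
print([render(t) for t in terms])
# ['a', 'b', 'c', 'a:b', 'a:c', 'b:c']

# Terms as a user might have typed them, in no particular order.
typed = [("b", "c"), ("c",), ("a", "b"), ("a",)]

# Lexical ordering compares the rendered strings.
lexical = sorted(typed, key=render)

# Degree ordering sorts by the number of factors; Python's sort is stable, so
# terms of equal degree keep the order in which they were written.
by_degree = sorted(typed, key=len)

print([render(t) for t in lexical])    # ['a', 'a:b', 'b:c', 'c']
print([render(t) for t in by_degree])  # ['c', 'a', 'b:c', 'a:b']
```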

Bugfixes and cleanups:

- Fix parsing formulae starting with a parenthesis.
- Fix iteration over root nodes of Structured instances for non-sequential iterable values.
- Bump testing versions and fix poly unit tests.
- Fix use of deprecated automatic casting of factors to numpy arrays during dense
  column evaluation in PandasMaterializer. #122 (thanks @effigies)
- Factor.EvalMethod.UNKNOWN was removed, defaulting instead to LOOKUP.
- Remove sympy version constraint now that a bug has been fixed upstream.

Documentation:

Substantial updates to documentation, which is now mostly complete for end-user
use-cases. Developer and API docs are still pending.

Contributors

effigies and dbalabka

18 Sep 03:23

matthewwardrop

v0.5.2

14487a1

This is a minor patch release that fixes one bug.

Bugfixes and cleanups:

Fixed alignment between the length of a Structured instance and iteration
over this instance (including Formula instances). Formerly the length would
only count the number of keys in its structure, rather than the number of
objects that would be yielded during iteration.
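
The mismatch described above can be reproduced with a small toy container. This is a simplified, hypothetical stand-in (`ToyStructured` is not formulaic's `Structured` class, and its flattening rule is an assumption made for the example); it only shows why counting keys and counting yielded objects can disagree.

```python
class ToyStructured:
    """Hypothetical keyed container whose values may hold several objects."""

    def __init__(self, **structure):
        self._structure = structure

    def __iter__(self):
        # Yield every object held under every key, in key order.
        for value in self._structure.values():
            if isinstance(value, (list, tuple)):
                yield from value
            else:
                yield value

    def key_count(self):
        # The behaviour described as "formerly": count keys only.
        return len(self._structure)

    def __len__(self):
        # The behaviour after the fix: agree with iteration.
        return sum(1 for _ in self)


s = ToyStructured(root=["1", "x", "z"])
print(s.key_count())  # 1
print(len(s))         # 3
print(len(list(s)))   # 3
```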

10 Sep 03:37

matthewwardrop

v0.5.1

eb5b6ce

This is a minor patch release that fixes two bugs.

Bugfixes and cleanups:

- Fixed generation of string representation of Formula objects.
- Fixed generation of formulaic.__version__ during package build.

29 Aug 05:26

matthewwardrop

v0.5.0

f1b671f

This is a major new release with some minor API changes, some ergonomic
improvements, and a few bug fixes.

Breaking changes:

Accessing named substructures of Formula objects (e.g. formula.lhs) no
longer returns a list of terms, but rather a Formula object, so that the
helper methods can remain accessible. You can access the raw terms by
iterating over the formula (list(formula)) or looking up the root node
(formula.root).

New features and improvements:

- The ModelSpec object is now the source of truth in all ModelMatrix
  generations, and can be constructed directly from any supported specification
  using ModelSpec.from_spec(...). Supported specifications include formula
  strings, parsed formulae, model matrices and prior model specs.
- The .get_model_matrix() helper methods across Formula,
  FormulaMaterializer, ModelSpec and model_matrix objects/helper
  functions are now consistent, and all use ModelSpec directly under the hood.
- When accessing substructures of Formula objects (e.g. formula.lhs), the
  term lists will be wrapped as trivial Formula instances rather than returned
  as raw lists (so that the helper methods like .get_model_matrix() can still
  be used).
- FormulaSpec is now exported from the top-level module.

Bugfixes and cleanups:

- Fixed ModelSpec specifications being overridden by default arguments to
  FormulaMaterializer.get_model_matrix.
- Structured._flatten() now correctly flattens unnamed substructures.

10 Aug 20:33

matthewwardrop

v0.4.0

705d186

This is a major new release with some new features, greatly improved ergonomics
for structured formulae, matrices and specs, and a few small breaking changes
(most with backward compatibility shims). All users are encouraged to upgrade.

Breaking changes:

- include_intercept is no longer an argument to FormulaParser.get_terms,
  and is instead an argument of the DefaultFormulaParser constructor. If you
  want to modify the include_intercept behaviour, please use:
```
Formula("y ~ x", _parser=DefaultFormulaParser(include_intercept=False))
```
- Accessing terms via Formula.terms is deprecated since Formula became a
  subclass of Structured[List[Terms]]. You can directly iterate over, and/or
  access nested structure on, the Formula instance itself. Formula.terms
  remains as a deprecated property that returns a reference to the formula
  itself in order to support legacy use-cases. This will be removed in 1.0.0.
- ModelSpec.feature_names and ModelSpec.feature_columns are deprecated in
  favour of ModelSpec.column_names and ModelSpec.column_indices. The deprecated
  properties remain in place to support legacy use-cases. These will be removed
  in 1.0.0.
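
The "deprecated property kept as an alias" pattern mentioned above can be sketched generically as follows. `ToySpec` is a hypothetical class, and emitting a `DeprecationWarning` is an assumption made for the example; the release notes do not say whether formulaic's own shims warn.

```python
import warnings


class ToySpec:
    """Hypothetical class; it only illustrates the alias pattern."""

    def __init__(self, column_names):
        self.column_names = list(column_names)

    @property
    def feature_names(self):
        # Legacy name kept for old callers; forwards to the new attribute.
        warnings.warn(
            "feature_names is deprecated; use column_names instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.column_names


spec = ToySpec(["Intercept", "x"])

# Record warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_names = spec.feature_names

print(legacy_names == spec.column_names)  # True
print(caught[0].category.__name__)        # DeprecationWarning
```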

New features and enhancements:

- Structured formulae (and their derived matrices and specs) are now mutable.
  Internally Formula has been refactored as a subclass of
  Structured[List[Terms]], and can be incrementally built and modified. The
  matrix and spec outputs now have explicit subclasses of Structured
  (ModelMatrices and ModelSpecs respectively) to expose convenience methods
  that allow these objects to be largely used interchangeably with their
  singular counterparts.
- ModelMatrices and ModelSpecs are now surfaced as top-level exports of the
  formulaic module.
- Structured (and its subclasses) gained improved integration of nested tuple
  structure, as well as support for flattened iteration, explicit mapping
  output types, and lots of cleanups.
- ModelSpec was made into a dataclass, and gained several new
  properties/methods to support better introspection and mutation of the model
  spec.
- FormulaParser was renamed DefaultFormulaParser, and made a subclass of the
  new formula parser interface FormulaParser. In this process
  include_intercept was removed from the API, and made an instance attribute
  of the default parser implementation.

Bugfixes and cleanups:

- Fixed AST evaluation for large formulae that caused the evaluation to hit the
  recursion limit.
- Fixed sparse categorical encoding when the dataframe index is not the standard
  range index.
- Fixed a bug in the linear constraints parser when more than two constraints
  were specified in a comma-separated string.
- Avoid implicit changing of the sparsity structure of CSC matrices.
- If manually constructed ModelSpecs are provided by the user during
  materialization, they are updated to reflect the output-type chosen by the
  user, as well as whether to ensure full rank/etc.
- Allowed use of older pandas versions. All versions >=1.0.0 are now supported.
- Various linting cleanups as pylint was added to the CI testing.
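
For context on the recursion-limit fix listed above, one common way to evaluate a very deep tree without recursive calls is an explicit stack. The sketch below shows that general technique under stated assumptions; `Node` and `evaluate_iteratively` are hypothetical, and this is not a description of how formulaic's AST evaluation is actually implemented.

```python
import sys


class Node:
    """Hypothetical AST node: a leaf holds a value, an operator node holds children."""

    def __init__(self, value=None, children=()):
        self.value = value
        self.children = tuple(children)


def evaluate_iteratively(root):
    """Sum all leaves using an explicit stack instead of recursive calls."""
    results = {}  # id(node) -> evaluated result
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if not node.children:
            results[id(node)] = node.value
        elif children_done:
            # All children were evaluated earlier; combine their results.
            results[id(node)] = sum(results.pop(id(c)) for c in node.children)
        else:
            # Revisit this node after its children have been processed.
            stack.append((node, True))
            stack.extend((child, False) for child in node.children)
    return results[id(root)]


# Build a left-nested chain far deeper than the interpreter's recursion limit.
depth = sys.getrecursionlimit() * 5
tree = Node(value=1)
for _ in range(depth):
    tree = Node(children=(tree, Node(value=1)))

total = evaluate_iteratively(tree)
print(total == depth + 1)  # True
```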

Documentation:

Apart from the .materializer submodule, most code now has inline
documentation and annotations.

01 May 04:10

matthewwardrop

v0.3.4

91a2d6a

This is a backward compatible major release that adds several new features.

New features and enhancements:

- Added support for customizing the contrasts generated for categorical
  features, including treatment, sum, deviation, helmert and custom contrasts.
- Added support for the generation of linear constraints for ModelMatrix
  instances (see ModelMatrix.model_spec.get_linear_constraints).
- Added support for passing ModelMatrix, ModelSpec and other formula-like
  objects to the model_matrix sugar method so that pre-processed formulae can
  be used.
- Improved the way tokens are manipulated for the right-hand-side intercept and
  substitutions of 0 with -1 to avoid substitutions in quoted contexts.
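
To make the contrast codings named above more concrete, here is a minimal plain-Python sketch of three textbook contrast matrices for a feature with k levels. The function names are hypothetical, the Helmert layout follows R's `contr.helmert` as an assumption, and formulaic's own defaults and scaling may differ.

```python
def treatment_contrasts(k):
    """k levels -> k x (k-1) matrix; the first level is the all-zero reference."""
    return [[1 if row == col + 1 else 0 for col in range(k - 1)] for row in range(k)]


def sum_contrasts(k):
    """k levels -> k x (k-1) matrix; the last level is coded -1 in every column."""
    return [
        [-1] * (k - 1) if row == k - 1 else [1 if row == col else 0 for col in range(k - 1)]
        for row in range(k)
    ]


def helmert_contrasts(k):
    """k levels -> k x (k-1) matrix laid out like R's contr.helmert."""
    return [
        [-1 if row <= col else (col + 1 if row == col + 1 else 0) for col in range(k - 1)]
        for row in range(k)
    ]


for name, build in [
    ("treatment", treatment_contrasts),
    ("sum", sum_contrasts),
    ("helmert", helmert_contrasts),
]:
    print(name, build(3))
# treatment [[0, 0], [1, 0], [0, 1]]
# sum [[1, 0], [0, 1], [-1, -1]]
# helmert [[-1, -1], [1, -1], [0, 2]]
```

Each row is the encoding of one level, so a categorical column with k levels contributes k-1 columns to the model matrix under any of these schemes.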

Bugfixes and cleanups:

- Fixed variable sanitization during evaluation, allowing variables with
  special characters to be used in Python transforms; for example:
  bs(`my|feature%is^cool`).
- Fixed the parsing of dictionaries and sets within python expressions in the
  formula; for example: C(x, {"a": [1,2,3]}).
- Bumped requirement on astor to >=0.8 to fix issues with ast-generation in
  Python 3.8+ when numerical constants are present in the parsed python
  expression (e.g. "bs(x, df=10)").

26 Apr 10:22

matthewwardrop

v0.3.3

2c64777

This is a minor patch release that migrates the package tooling to poetry, solving a version inconsistency when packaging for conda.

16 Mar 22:06

matthewwardrop

v0.3.2

ef38012

This is a minor patch release that fixes an attempt to import numpy.typing when numpy is not version 1.20 or later (thanks to @bashtage for noticing and fixing this).

Contributors

bashtage

15 Mar 09:28

matthewwardrop

v0.3.1

de0d442

This is a minor patch release that fixes the preservation of output types, NA handling, and full-rank assurance for factors that evaluate to pre-encoded columns when constructing a model matrix from a pre-defined ModelSpec. The benchmarks were also updated.
