Releases: matthewwardrop/formulaic

v0.6.1

22 Jun 19:29
a96ae8a

This is a minor release with one new feature.

New features and enhancements:

  • Added support for treating individual categorical features as though they do not span the intercept (useful for intentionally generating over-specified model matrices in e.g. regularized models).

v0.6.0

27 Apr 03:55

This is a major release with some important consistency and completeness
improvements. It should be treated as almost being the first release candidate
of 1.0.0, which will land after some small amount of further feature extensions
and documentation improvements.

Breaking changes:

Although there are some internal changes to the API, as documented below, there
are no breaking changes to user-facing APIs.

New features and enhancements:

  • Formula terms are now consistently ordered regardless of provenance (formulae or
    manual term specification), and sorted according to R conventions by default
    rather than lexically. This can be changed using the _ordering keyword to
    the Formula constructor.
  • Greater compatibility with R and patsy formulae:
    • for patsy: added standardize, Q and treatment contrasts shims.
    • for patsy: added the cluster_by='numerical_factors' option to ModelSpec to enable
      patsy-style clustering of output columns by involved numerical factors.
    • for R: added support for exponentiation with ^ and %in%.
  • Diff and Helmert contrast codings gained support for additional variants.
  • Greatly improved the performance of generating sparse dummy encodings when
    there are many categories. #110 #112 (thanks @dbalabka)
  • Context scoping operators (like parentheses) are now tokenized as their own special
    type.
  • Add support for merging Structured instances, and use this functionality during
    AST evaluation where relevant.
  • ModelSpec.term_indices is now a list rather than a tuple, to allow direct use when
    indexing pandas and numpy model matrices.
  • Add official support for Python 3.11.

Bugfixes and cleanups:

  • Fix parsing formulae starting with a parenthesis.
  • Fix iteration over root nodes of Structured instances for non-sequential iterable values.
  • Bump testing versions and fix poly unit tests.
  • Fix use of deprecated automatic casting of factors to numpy arrays during dense
    column evaluation in PandasMaterializer. #122 (thanks @effigies)
  • Factor.EvalMethod.UNKNOWN was removed, defaulting instead to LOOKUP.
  • Remove sympy version constraint now that a bug has been fixed upstream.

Documentation:

  • Substantial updates to documentation, which is now mostly complete for end-user
    use-cases. Developer and API docs are still pending.

v0.5.2

18 Sep 03:23
14487a1

This is a minor patch release that fixes one bug.

Bugfixes and cleanups:

  • Fixed alignment between the length of a Structured instance and iteration
    over this instance (including Formula instances). Formerly the length would
    only count the number of keys in its structure, rather than the number of
    objects that would be yielded during iteration.

v0.5.1

10 Sep 03:37
eb5b6ce

This is a minor patch release that fixes two bugs.

Bugfixes and cleanups:

  • Fixed generation of string representation of Formula objects.
  • Fixed generation of formulaic.__version__ during package build.

v0.5.0

29 Aug 05:26
f1b671f

This is a major new release with some minor API changes, some ergonomic
improvements, and a few bug fixes.

Breaking changes:

  • Accessing named substructures of Formula objects (e.g. formula.lhs) no
    longer returns a list of terms, but rather a Formula object, so that the
    helper methods remain accessible. You can access the raw terms by
    iterating over the formula (list(formula)) or looking up the root node
    (formula.root).

New features and improvements:

  • The ModelSpec object is now the source of truth in all ModelMatrix
    generations, and can be constructed directly from any supported specification
    using ModelSpec.from_spec(...). Supported specifications include formula
    strings, parsed formulae, model matrices and prior model specs.
  • The .get_model_matrix() helper methods across Formula,
    FormulaMaterializer, ModelSpec and model_matrix objects and helper
    functions are now consistent, and all use ModelSpec directly under the hood.
  • When accessing substructures of Formula objects (e.g. formula.lhs), the
    term lists will be wrapped as trivial Formula instances rather than returned
    as raw lists (so that the helper methods like .get_model_matrix() can still
    be used).
  • FormulaSpec is now exported from the top-level module.

Bugfixes and cleanups:

  • Fixed ModelSpec specifications being overridden by default arguments to
    FormulaMaterializer.get_model_matrix.
  • Structured._flatten() now correctly flattens unnamed substructures.

v0.4.0

10 Aug 20:33
705d186

This is a major new release with some new features, greatly improved ergonomics
for structured formulae, matrices and specs, and a few small breaking changes
(most with backward compatibility shims). All users are encouraged to upgrade.

Breaking changes:

  • include_intercept is no longer an argument to FormulaParser.get_terms,
    and is instead an argument of the DefaultFormulaParser constructor. If you
    want to modify the include_intercept behaviour, please use:
    Formula("y ~ x", _parser=DefaultFormulaParser(include_intercept=False))
  • Accessing terms via Formula.terms is deprecated since Formula became a
    subclass of Structured[List[Terms]]. You can directly iterate over, and/or
    access nested structure on the Formula instance itself. Formula.terms
    remains as a deprecated property that returns a reference to the Formula
    instance itself in order to support legacy use-cases. This will be removed in 1.0.0.
  • ModelSpec.feature_names and ModelSpec.feature_columns are deprecated in
    favour of ModelSpec.column_names and ModelSpec.column_indices. Deprecated
    properties remain in-place to support legacy use-cases. These will be removed
    in 1.0.0.

New features and enhancements:

  • Structured formulae (and their derived matrices and specs) are now mutable.
    Internally Formula has been refactored as a subclass of
    Structured[List[Terms]], and can be incrementally built and modified. The
    matrix and spec outputs now have explicit subclasses of Structured
    (ModelMatrices and ModelSpecs respectively) to expose convenience methods
    that allow these objects to be largely used interchangeably with their
    singular counterparts.
  • ModelMatrices and ModelSpecs are now surfaced as top-level exports of the
    formulaic module.
  • Structured (and its subclasses) gained improved integration of nested tuple
    structure, as well as support for flattened iteration, explicit mapping
    output types, and lots of cleanups.
  • ModelSpec was made into a dataclass, and gained several new
    properties/methods to support better introspection and mutation of the model
    spec.
  • FormulaParser was renamed DefaultFormulaParser, and made a subclass of the
    new formula parser interface FormulaParser. In this process
    include_intercept was removed from the API, and made an instance attribute
    of the default parser implementation.

Bugfixes and cleanups:

  • Fixed AST evaluation for large formulae that caused the evaluation to hit the
    recursion limit.
  • Fixed sparse categorical encoding when the dataframe index is not the standard
    range index.
  • Fixed a bug in the linear constraints parser when more than two constraints
    were specified in a comma-separated string.
  • Avoid implicit changing of the sparsity structure of CSC matrices.
  • If manually constructed ModelSpecs are provided by the user during
    materialization, they are updated to reflect the output-type chosen by the
    user, as well as whether to ensure full rank/etc.
  • Allowed use of older pandas versions. All versions >=1.0.0 are now supported.
  • Various linting cleanups as pylint was added to the CI testing.

Documentation:

  • Apart from the .materializer submodule, most code now has inline
    documentation and annotations.

v0.3.4

01 May 04:10
91a2d6a

This is a backward-compatible major release that adds several new features.

New features and enhancements:

  • Added support for customizing the contrasts generated for categorical
    features, including treatment, sum, deviation, helmert and custom contrasts.
  • Added support for the generation of linear constraints for ModelMatrix
    instances (see ModelMatrix.model_spec.get_linear_constraints).
  • Added support for passing ModelMatrix, ModelSpec and other formula-like
    objects to the model_matrix sugar method so that pre-processed formulae can
    be used.
  • Improved the way tokens are manipulated for the right-hand-side intercept and
    substitutions of 0 with -1 to avoid substitutions in quoted contexts.

Bugfixes and cleanups:

  • Fixed variable sanitization during evaluation, allowing variables with
    special characters to be used in Python transforms; for example:
    bs(`my|feature%is^cool`).
  • Fixed the parsing of dictionaries and sets within python expressions in the
    formula; for example: C(x, {"a": [1,2,3]}).
  • Bumped requirement on astor to >=0.8 to fix issues with ast-generation in
    Python 3.8+ when numerical constants are present in the parsed python
    expression (e.g. "bs(x, df=10)").

v0.3.3

26 Apr 10:22
2c64777

This is a minor patch release that migrates the package tooling to poetry, which resolves a version inconsistency when packaging for conda.

v0.3.2

16 Mar 22:06

This is a minor patch release that fixes an attempt to import numpy.typing when numpy is not version 1.20 or later (thanks for noticing this and fixing it, @bashtage).

v0.3.1

15 Mar 09:28

This is a minor patch release that fixes the preservation of output types, NA handling, and full-rank guarantees for factors that evaluate to pre-encoded columns when constructing a model matrix from a pre-defined ModelSpec. The benchmarks were also updated.