Release v1.1.0 · matthewwardrop/formulaic

This is a major feature release that was motivated in many respects by the migration of statsmodels from patsy to formulaic. Many thanks to @bashtage for driving those invasive changes forward. There are some semantic breaking changes, but unless you are deep in the internals of formulaic (which I do not believe to be the case for any external library) these are not expected to break common usage.

Breaking changes:

Formula is no longer always "structured" with special cases to handle the
case where it has no structure. Legacy shims have been added to support old
patterns, with DeprecationWarnings raised when they are used. It is not
expected to break anyone not explicitly checking whether the Formula.root is
a list instance (which formerly should have been simply assumed) [it is now a
SimpleFormula instance that acts like an ordered sequence of Term
instances].
The column names associated with categorical factors have changed. Previously,
a prefix was unconditionally added to the level in the column name like
feature[T.A], whether or not the encoding would result in that term acting
as a contrast. Now, in keeping with patsy, we only add the prefix if the
categorical factor is encoded with reduced rank. Otherwise, feature[A] will
be used instead.
formulaic.parsers.types.structured has been promoted to
formulaic.utils.structured.
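
The categorical column naming rule described above can be summarised with a small pure-Python sketch. It does not call formulaic: `categorical_column_names` is a hypothetical helper written only to illustrate the convention, and it assumes treatment coding in which the first level is dropped as the reference whenever the encoding is reduced rank.

```python
def categorical_column_names(factor, levels, reduced_rank):
    """Illustrative only: mimic the v1.1.0 naming convention for categorical columns.

    Assumption: under reduced rank the first level is the dropped reference
    level (treatment coding). The real encoder supports other contrasts.
    """
    if reduced_rank:
        # Remaining columns act as contrasts against the reference level,
        # so they keep the "T." prefix.
        return [f"{factor}[T.{level}]" for level in levels[1:]]
    # Full-rank encoding: columns are plain indicators, so no prefix is added.
    return [f"{factor}[{level}]" for level in levels]


reduced = categorical_column_names("feature", ["A", "B", "C"], reduced_rank=True)
full = categorical_column_names("feature", ["A", "B", "C"], reduced_rank=False)
print(reduced)
print(full)
```

Code that matches on column names such as feature[T.A] for full-rank encodings is the most likely place for this change to surface.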

New features and enhancements:

Formula now instantiates to SimpleFormula or StructuredFormula, the
latter being a tree-structure of SimpleFormula instances (as compared to
List[Term] previously). This simplifies various internal logic and makes the
propagation of formula metadata more explicit. (#222)
Added support for restricting the set of features used by the default formula
parser so that libraries can more easily restrict the structure of output
formulae. (#207)
dict and recarray types are now associated with the pandas materializer
by default (rather than raising), simplifying some user workflows. (#225)
Added support for the . operator (which is replaced with all variables not
used on the left-hand-side of formulae). (#216)
Added experimental support for nested formulae of form [ ... ~ ... ].
This is useful for (e.g.) generating formulae for IV 2SLS. (#108)
Added support for subsetting ModelSpec[s] based on an arbitrary
strictly reduced FormulaSpec. (#208)
Added Formula.required_variables to more easily surface the expected data
requirements of the formula. (#205)
Added support for extracting rows dropped during materialization. (#197)
Added support for cyclic (cc) and natural (cr) cubic splines. See
formulaic.materializers.transforms.cubic_spline.cubic_spline for
more details.
Added a lag() transform.
Constructing LinearConstraints can now be done from a list of strings (for
increased parity with patsy). (#201)
Categorical factors are now only prefixed with (e.g.) T. when they actually
describe contrasts (i.e. when they are encoded with reduced rank). (#220)
Contrasts metadata is now added to the encoder state via encode_categorical,
which is surfaced via ModelSpec.factor_contrasts. (#204)
Operator instances now receive context which is optionally specified by
the user during formula parsing, and updated by the parser. This is what makes
the . implementation possible. (#216)
Given the generic usefulness of Structured, it has been promoted to
formulaic.utils. (#223)
Added explicit support and testing for Python 3.13. (#202)
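
As a rough illustration of the . operator described above, the following pure-Python sketch expands a right-hand-side term list against the columns of a dataset. It is not formulaic's implementation: `expand_dot` and its arguments are hypothetical names, and the choice to drop duplicates while preserving order is an assumption made for the example.

```python
def expand_dot(rhs_terms, lhs_variables, data_columns):
    """Replace each "." in rhs_terms with the data columns not used on the left-hand side."""
    used_on_lhs = set(lhs_variables)
    # Everything in the data that the left-hand side does not already consume.
    expansion = [col for col in data_columns if col not in used_on_lhs]

    expanded = []
    for term in rhs_terms:
        candidates = expansion if term == "." else [term]
        for candidate in candidates:
            # Assumption: keep first occurrence only, so "a + ." does not repeat "a".
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


columns = ["y", "a", "b"]
only_dot = expand_dot(["."], ["y"], columns)
mixed = expand_dot(["a", "."], ["y"], columns)
print(only_dot)
print(mixed)
```

The expansion needs to know which columns are available, which is the kind of information the parser context mentioned above makes it possible to pass to operators.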

Bugfixes and cleanups:

Fixed nested ordering of Formula instances. (#200)
Allow Python tokens to contain multiple chained parentheses and brackets without using
quotes as long as the parentheses are balanced. (#214, #218)
Reduced the number of redundant initialisation operations in Structured
instances. (#200)
Fixed pickling ModelMatrix and FactorValues instances (whenever wrapped
objects are picklable). (#209; thanks @bashtage)
basis_spline: Fixed evaluation involving datasets with null values, and
disallowed out-of-bounds knots. (#217; thanks @bashtage)
Improved robustness of data contexts involving PyArrow datasets.
We now use the same sentinels throughout the code-base, rather than having
module-specific sentinels in some places.
Migrated to ruff for linting, and updated mypy and pre-commit tooling.
Fixes from ruff are automatically applied when using
hatch run lint:format.
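
The "balanced" condition in the parentheses and brackets fix above can be pictured with a minimal stack-based check. This is only a sketch of the general idea, not the tokenizer's actual logic: `brackets_balanced` is a hypothetical helper, and it deliberately ignores brackets that appear inside string literals.

```python
def brackets_balanced(token):
    """Return True if every (, [ and { in token is closed in the right order."""
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []
    for char in token:
        if char in "([{":
            stack.append(char)
        elif char in closing_to_opening:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closing_to_opening[char]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


print(brackets_balanced("f(x)[0](y)"))
print(brackets_balanced("f(x[0)"))
```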

Documentation:

Fixed and updated docsite build, as well as other minor tweaks.
