v1.1.0
This is a major feature release that was motivated in many aspects by the migration of `statsmodels` from `patsy` to `formulaic`. Many thanks to @bashtage for driving those invasive changes forward. There are some semantic breaking changes, but unless you are deep in the internals of `formulaic` (which I do not believe to be the case for any external library) these are not expected to break common usage.
Breaking changes:
- `Formula` is no longer always "structured" with special cases to handle the case where it has no structure. Legacy shims have been added to support old patterns, with `DeprecationWarning`s raised when they are used. It is not expected to break anyone not explicitly checking whether the `Formula.root` is a list instance (which formerly should have been simply assumed) [it is now a `SimpleFormula` instance that acts like an ordered sequence of `Term` instances].
- The column names associated with categorical factors have changed. Previously, a prefix was unconditionally added to the level in the column name like `feature[T.A]`, whether or not the encoding would result in that term acting as a contrast. Now, in keeping with `patsy`, we only add the prefix if the categorical factor is encoded with reduced rank. Otherwise, `feature[A]` will be used instead.
- `formulaic.parsers.types.structured` has been promoted to `formulaic.utils.structured`.
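
The new column naming rule for categorical factors can be summarised with a small sketch. The helper `categorical_column_name` below is hypothetical and exists only to illustrate the rule described above; it is not part of the `formulaic` API and does not reflect how the library implements the encoding.

```python
def categorical_column_name(feature: str, level: str, reduced_rank: bool) -> str:
    """Illustrative only: build a column name following the v1.1.0 rule.

    The `T.` prefix is added only when the factor is encoded with reduced
    rank (i.e. when the column acts as a contrast against a reference level).
    """
    prefix = "T." if reduced_rank else ""
    return f"{feature}[{prefix}{level}]"


# Reduced-rank encoding: the column describes a contrast, so it is prefixed.
reduced = categorical_column_name("feature", "A", reduced_rank=True)

# Full-rank encoding: no prefix is added any more.
full = categorical_column_name("feature", "A", reduced_rank=False)
```

Code that looks up columns by name (for example `feature[T.A]`) may need updating wherever the factor ends up encoded with full rank.
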
New features and enhancements:
- `Formula` now instantiates to `SimpleFormula` or `StructuredFormula`, the latter being a tree-structure of `SimpleFormula` instances (as compared to `List[Term]` previously). This simplifies various internal logic and makes the propagation of formula metadata more explicit. (#222)
- Added support for restricting the set of features used by the default formula parser so that libraries can more easily restrict the structure of output formulae. (#207)
- `dict` and `recarray` types are now associated with the `pandas` materializer by default (rather than raising), simplifying some user workflows. (#225)
- Added support for the `.` operator (which is replaced with all variables not used on the left-hand-side of formulae). (#216)
- Added experimental support for nested formulae of form `[ ... ~ ... ]`. This is useful for (e.g.) generating formulae for IV 2SLS. (#108)
- Added support for subsetting `ModelSpec[s]` based on an arbitrary strictly reduced `FormulaSpec`. (#208)
- Added `Formula.required_variables` to more easily surface the expected data requirements of the formula. (#205)
- Added support for extracting rows dropped during materialization. (#197)
- Added cubic spline support for cyclic (`cc`) and natural (`cr`) cubic splines. See `formulaic.materializers.transforms.cubic_spline.cubic_spline` for more details.
- Added a `lag()` transform.
- Constructing `LinearConstraints` can now be done from a list of strings (for increased parity with `patsy`). (#201)
- Categorical factors are now preceded with (e.g.) `T.` when they actually describe contrasts (i.e. when they are encoded with reduced rank). (#220)
- Contrasts metadata is now added to the encoder state via `encode_categorical`, which is surfaced via `ModelSpec.factor_contrasts`. (#204)
- `Operator` instances now receive `context`, which is optionally specified by the user during formula parsing, and updated by the parser. This is what makes the `.` implementation possible. (#216)
- Given the generic usefulness of `Structured`, it has been promoted to `formulaic.utils`. (#223)
- Added explicit support and testing for Python 3.13. (#202)
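
To make the semantics of the `.` operator concrete, here is a minimal sketch of the substitution it performs: `.` stands for every available variable that is not used on the left-hand side. The helper `expand_dot` is hypothetical and works on plain strings; the actual implementation operates on parsed terms using the parser context, and its details are not shown here.

```python
from typing import Iterable, List


def expand_dot(lhs_variables: Iterable[str], available_variables: Iterable[str]) -> List[str]:
    """Illustrative only: list the variables that `.` would be replaced with.

    Keeps the order of `available_variables` and drops anything already
    used on the left-hand side of the formula.
    """
    used = set(lhs_variables)
    return [name for name in available_variables if name not in used]


# Assumed example data: the columns available in some dataset.
columns = ["y", "a", "b", "c"]

# "y ~ ." is treated as if the right-hand side listed the remaining columns.
rhs = " + ".join(expand_dot(["y"], columns))
expanded_formula = f"y ~ {rhs}"  # "y ~ a + b + c"
```
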
Bugfixes and cleanups:
- Fixed nested ordering of `Formula` instances. (#200)
- Allow Python tokens to contain multiple chained parentheses and brackets without using quotes as long as the parentheses are balanced. (#214, #218)
- Reduced the number of redundant initialisation operations in `Structured` instances. (#200)
- Fixed pickling `ModelMatrix` and `FactorValues` instances (whenever wrapped objects are picklable). (#209; thanks @bashtage)
- `basis_spline`: Fixed evaluation involving datasets with null values, and disallow out-of-bounds knots. (#217; thanks @bashtage)
- Improved robustness of data contexts involving PyArrow datasets.
- We now use the same sentinels throughout the code-base, rather than having module specific sentinels in some places.
- Migrated to `ruff` for linting, and updated `mypy` and `pre-commit` tooling.
- Automatic fixes from `ruff` are automatically applied when using `hatch run lint:format`.
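
The "balanced" condition mentioned for chained parentheses and brackets in Python tokens can be illustrated with a simple stack-based check. This is a sketch of the general idea only; `is_balanced` is a hypothetical helper, it ignores quoted strings, and it is not the tokenizer logic used by `formulaic`.

```python
# Map each closing delimiter to the opening delimiter it must match.
CLOSERS = {")": "(", "]": "["}
OPENERS = set(CLOSERS.values())


def is_balanced(token: str) -> bool:
    """Illustrative only: check that parentheses and brackets pair up correctly."""
    stack = []
    for char in token:
        if char in OPENERS:
            stack.append(char)
        elif char in CLOSERS:
            # A closer must match the most recently opened delimiter.
            if not stack or stack.pop() != CLOSERS[char]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


chained_ok = is_balanced("a(b)[c](d)")  # chained calls and indexing, balanced
mismatched = is_balanced("a(b]")        # closer does not match opener
unclosed = is_balanced("a((b)")         # one parenthesis never closed
```
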
Documentation:
- Fixed and updated docsite build, as well as other minor tweaks.