Releases: pathwaycom/pathway
v0.14.3
Fixed
- pw.io.deltalake.read and pw.io.deltalake.write now work correctly with lakes hosted in S3 over Min.io, Wasabi and Digital Ocean.
Added
- The Pathway CLI command spawn can now execute code directly from a specified GitHub repository.
- A new CLI command, spawn-from-env, has been added. This command runs the Pathway CLI spawn command using arguments provided in the PATHWAY_SPAWN_ARGS environment variable.
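The idea behind spawn-from-env can be sketched in a few lines of Python. This is only an illustration, not Pathway's implementation: the helper name spawn_args_from_env is hypothetical, and it assumes the variable holds a shell-style argument string.

```python
import os
import shlex


def spawn_args_from_env(env=None, var="PATHWAY_SPAWN_ARGS"):
    """Build a `pathway spawn` command line from an environment variable.

    Assumption: the variable contains arguments written the way they
    would be typed in a shell (hypothetical helper, for illustration).
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    # shlex.split honours quoting, so a quoted argument stays one token.
    return ["pathway", "spawn", *shlex.split(raw)]


if __name__ == "__main__":
    example_env = {"PATHWAY_SPAWN_ARGS": "--threads 2 python main.py"}
    print(spawn_args_from_env(example_env))
```

Keeping the arguments in one variable is convenient in containerized deployments, where changing an environment variable is easier than changing the entrypoint command.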
v0.14.2
Fixed
- Switched pw.xpacks.llm.embedders.GeminiEmbedder to be sync to resolve compatibility issues with Google Colab runs.
- Pinned the surya-ocr module version for stability.
v0.14.1
Added
- pw.xpacks.llm.embedders.GeminiEmbedder, which is a wrapper for Google Gemini Embedding services.
v0.14.0
Fixed
- pw.debug.table_to_pandas now exports int | None columns correctly.
Changed
- pw.io.airbyte.read can now be used with Airbyte connectors implemented in Python without requiring Docker.
- BREAKING: UDFs now verify the type of returned values at runtime. If a returned value can be cast to the proper type, the value is cast. If the value does not match the expected type and can't be cast, an error is raised.
- BREAKING: pw.reducers.ndarray reducer requires the input column to have type float, int or Array.
- pw.xpacks.llm.parsers.OpenParse can now extract and parse images & diagrams from PDFs. This can be enabled by setting parse_images. processing_pipeline can also be set to customize the post-processing of doc elements.
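The cast-or-raise rule for UDF return values can be illustrated with a small pure-Python sketch. It is not Pathway's implementation: check_return is a hypothetical helper, and treating int to float as the only permitted cast is an assumption made for the example, since the notes do not list which casts are allowed.

```python
def check_return(value, expected):
    """Return `value` as `expected`, casting when safe, else raise.

    Assumption: only the widening cast int -> float is allowed here;
    the real set of casts is not specified in the release notes.
    """
    # Exact type match: nothing to do (bool is not accepted as int).
    if type(value) is expected:
        return value
    # Widening cast that loses no information.
    if expected is float and type(value) is int:
        return float(value)
    raise TypeError(
        f"expected {expected.__name__}, got {type(value).__name__}"
    )


if __name__ == "__main__":
    print(check_return(3, float))  # value is cast to float
    try:
        check_return("3", int)  # no safe cast, so an error is raised
    except TypeError as exc:
        print(exc)
```

The practical consequence of the breaking change is that a UDF whose annotation disagrees with what it actually returns now fails at runtime instead of silently passing a mistyped value downstream.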
v0.13.2
Added
- pw.io.deltalake.read now supports S3 data sources.
- pw.xpacks.llm.parsers.ImageParser which allows parsing images with the vision LMs.
- pw.xpacks.llm.parsers.SlideParser that enables parsing PDF and PPTX slides with the vision LMs.
- pw.xpacks.llm.parsers.question_answering.RAGClient, Python client for Pathway hosted RAG apps.
- pw.xpacks.llm.parsers.question_answering.DeckRetriever, a RAG app that enables searching through slide decks with visual-heavy elements.
Fixed
- pw.xpacks.llm.vector_store.VectorStoreServer now uses new indexes.
Changed
- pw.xpacks.llm.parsers.OpenParse now supports any vision language model, including local and proprietary models, via LiteLLM.
v0.13.1
Added
- pw.io.kafka.read now accepts an autogenerate_key flag. This flag determines the primary key generation policy to apply when reading raw data from the source. You can either use the key from the Kafka message or have Pathway autogenerate one.
- pw.io.deltalake.read input connector that fetches changes from DeltaLake into a Pathway table.
- pw.xpacks.llm.parsers.OpenParse which allows parsing tables and images in PDFs.
Fixed
- All S3 input connectors (including S3, Min.io, Digital Ocean, and Wasabi) now automatically retry network operations if a failure occurs.
- The issue where the connection to the S3 source fails after partially ingesting an object has been resolved by downloading the object in full first.
v0.13.0
Added
- pw.io.deltalake.write now supports S3 destinations.
Changed
- pw.debug.compute_and_print now allows passing more than one table.
- BREAKING: path parameter in pw.io.deltalake.write renamed to uri.
Fixed
- A bug in pw.Table.deduplicate. If persistent_id is not set, it is no longer generated in pw.PersistenceMode.SELECTIVE_PERSISTING mode.
v0.12.0
Added
- pw.PyObjectWrapper that enables passing python objects of any type to the engine.
- cache_strategy option added for pw.io.http.rest_connector. It enables cache configuration, which is useful for duplicated requests.
- allow_misses argument to Table.ix and Table.ix_ref methods which allows for filling rows with missing keys with None values.
- pw.io.deltalake.write output connector that streams the changes of a given table into a DeltaLake storage.
- pw.io.airbyte.read now supports data extraction with Google Cloud Runs.
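The effect of allow_misses can be pictured with a plain dictionary standing in for a keyed table. This is a rough analogy under stated assumptions, not the Pathway API: the ix function below is a hypothetical stand-in, and raising KeyError when allow_misses is off is a choice made for the sketch rather than documented Pathway behavior.

```python
def ix(table, keys, allow_misses=False):
    """Look up `keys` in `table`, a dict standing in for a keyed table.

    Hypothetical stand-in for illustration only.
    """
    if allow_misses:
        # Missing keys produce None instead of failing.
        return [table.get(key) for key in keys]
    # Assumption of this sketch: a missing key is an error.
    return [table[key] for key in keys]


if __name__ == "__main__":
    prices = {"apple": 3, "pear": 5}
    print(ix(prices, ["apple", "plum"], allow_misses=True))
```

This is the same trade-off as a left join versus an inner lookup: rows with no match are kept and filled with None rather than rejected.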
Removed
- BREAKING: Removed Table.having method.
- BREAKING: Removed pw.DATE_TIME_UTC, pw.DATE_TIME_NAIVE and pw.DURATION as dtype markers. Instead, pw.DateTimeUtc, pw.DateTimeNaive and pw.Duration should be used, which are wrappers for corresponding pandas types.
- BREAKING: Removed class transformers from public API: pw.ClassArg, pw.attribute, pw.input_attribute, pw.input_method, pw.method, pw.output_attribute and pw.transformer.
- BREAKING: Removed several methods from pw.indexing module: binsearch_oracle, filter_cmp_helper, filter_smallest_k and prefix_sum_oracle.
v0.11.2
Added
- pathway.assert_table_has_schema and pathway.table_transformer now accept an allow_subtype argument, which, if True, allows column types in the Table to be subtypes of types in the Schema.
- next method to pw.io.python.ConnectorSubject (python connector) that enables passing values of any type to the engine, not only values that are json-serializable. The next method should be the preferred way of passing values from the python connector.
Changed
- The format argument of pw.io.python.read is deprecated. A data format is inferred from the method used (next_json, next_str, next_bytes) and the provided schema.
Removed
- Removed pw.numba_apply and numba dependency.
Fixed
- Fixed pw.this desugaring bug, where __getitem__ in .ix context was not working properly.
- pw.io.sqlite.read now checks if the data matches the passed schema.
v0.11.1
Added
- query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now accept a column of type str | None in the metadata_column parameter.
- pathway.xpacks.connectors.sharepoint module under Pathway for Business License.