Releases: pathwaycom/pathway

v0.14.3

22 Aug 07:57

Fixed

  • pw.io.deltalake.read and pw.io.deltalake.write now correctly work with lakes hosted in S3 over min.io, Wasabi and Digital Ocean.

Added

  • The Pathway CLI command spawn can now execute code directly from a specified GitHub repository.
  • A new CLI command, spawn-from-env, has been added. This command runs the Pathway CLI spawn command using arguments provided in the PATHWAY_SPAWN_ARGS environment variable.
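
As a rough illustration of the spawn-from-env behavior described above, the Python sketch below builds the command line that such a wrapper would conceptually run. It is not Pathway's implementation: the helper name build_spawn_command is hypothetical, splitting the variable with shell-like quoting rules is an assumption, and the example arguments are placeholders.

```python
import os
import shlex


def build_spawn_command(environ=None):
    """Return the argv list equivalent to `pathway spawn <PATHWAY_SPAWN_ARGS>`.

    Hypothetical helper for illustration only; not part of the Pathway CLI.
    """
    environ = os.environ if environ is None else environ
    raw_args = environ.get("PATHWAY_SPAWN_ARGS", "")
    # Assumption: the variable is split using shell-like quoting rules.
    return ["pathway", "spawn", *shlex.split(raw_args)]


# Example environment with placeholder arguments.
example_env = {"PATHWAY_SPAWN_ARGS": "--threads 2 python main.py"}
command = build_spawn_command(example_env)
print(command)
```

Keeping the arguments in an environment variable can be convenient in containerized deployments, where changing the variable is easier than changing the image entrypoint.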

v0.14.2

06 Aug 17:05

Fixed

  • Switched pw.xpacks.llm.embedders.GeminiEmbedder to be synchronous to resolve compatibility issues with Google Colab runs.
  • Pinned surya-ocr module version for stability.

v0.14.1

05 Aug 10:32

Added

  • pw.xpacks.llm.embedders.GeminiEmbedder which is a wrapper for Google Gemini Embedding services.

v0.14.0

25 Jul 20:50

Fixed

  • pw.debug.table_to_pandas now exports int | None columns correctly.

Changed

  • pw.io.airbyte.read can now be used with Airbyte connectors implemented in Python without requiring Docker.
  • BREAKING: UDFs now verify the type of returned values at runtime. If a returned value can be cast to the expected type, the value is cast. If the value does not match the expected type and can't be cast, an error is raised.
  • BREAKING: pw.reducers.ndarray reducer now requires the input column to have type float, int, or Array.
  • pw.xpacks.llm.parsers.OpenParse can now extract and parse images and diagrams from PDFs. This can be enabled by setting the parse_images parameter. The processing_pipeline parameter can also be set to customize the post-processing of document elements.
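
The runtime check on UDF return values described above follows a "cast if possible, otherwise raise" rule. The plain-Python sketch below illustrates that rule only. It is not Pathway's code: the function name coerce_return_value is hypothetical, and treating int to float as the only permitted cast is an assumption, since the release notes do not list which casts are allowed.

```python
def coerce_return_value(value, expected_type):
    """Return `value` as `expected_type`, or raise TypeError.

    Hypothetical illustration of the "cast if possible, else error" rule.
    """
    if type(value) is expected_type:
        # The value already has the declared type: pass it through.
        return value
    if expected_type is float and type(value) is int:
        # Assumption: int -> float is treated as a safe cast.
        return float(value)
    raise TypeError(
        f"expected {expected_type.__name__}, "
        f"got {type(value).__name__}: {value!r}"
    )


print(coerce_return_value(3, float))   # cast from int to float
print(coerce_return_value("a", str))   # already the declared type

try:
    coerce_return_value("3", int)      # no cast is attempted for str -> int
except TypeError as error:
    print(f"error: {error}")
```

In practice, this change means a UDF whose declared return type disagrees with what it actually returns may start failing after the upgrade, so return annotations are worth reviewing.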

v0.13.2

08 Jul 20:53

Added

  • pw.io.deltalake.read now supports S3 data sources.
  • pw.xpacks.llm.parsers.ImageParser which allows parsing images with the vision LMs.
  • pw.xpacks.llm.parsers.SlideParser that enables parsing PDF and PPTX slides with the vision LMs.
  • pw.xpacks.llm.parsers.question_answering.RAGClient, Python client for Pathway hosted RAG apps.
  • pw.xpacks.llm.parsers.question_answering.DeckRetriever, a RAG app that enables searching through slide decks with visual-heavy elements.

Fixed

  • pw.xpacks.llm.vector_store.VectorStoreServer now uses new indexes.

Changed

  • pw.xpacks.llm.parsers.OpenParse now supports any vision language model, including local and proprietary models, via LiteLLM.

v0.13.1

27 Jun 10:31

Added

  • pw.io.kafka.read now accepts an autogenerate_key flag. This flag determines the primary key generation policy to apply when reading raw data from the source. You can either use the key from the Kafka message or have Pathway autogenerate one.
  • pw.io.deltalake.read input connector that fetches changes from DeltaLake into a Pathway table.
  • pw.xpacks.llm.parsers.OpenParse which allows parsing tables and images in PDFs.

Fixed

  • All S3 input connectors (including S3, Min.io, Digital Ocean, and Wasabi) now automatically retry network operations if a failure occurs.
  • The issue where the connection to the S3 source fails after partially ingesting an object has been resolved by downloading the object in full first.

v0.13.0

13 Jun 12:12

Added

  • pw.io.deltalake.write now supports S3 destinations.

Changed

  • pw.debug.compute_and_print now allows passing more than one table.
  • BREAKING: path parameter in pw.io.deltalake.write renamed to uri.

Fixed

  • A bug in pw.Table.deduplicate. If persistent_id is not set, it is no longer generated in pw.PersistenceMode.SELECTIVE_PERSISTING mode.

v0.12.0

10 Jun 06:06

Added

  • pw.PyObjectWrapper that enables passing Python objects of any type to the engine.
  • cache_strategy option added for pw.io.http.rest_connector. It enables cache configuration, which is useful for duplicated requests.
  • allow_misses argument to Table.ix and Table.ix_ref methods which allows for filling rows with missing keys with None values.
  • pw.io.deltalake.write output connector that streams the changes of a given table into a DeltaLake storage.
  • pw.io.airbyte.read now supports data extraction with Google Cloud Runs.
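
To make the allow_misses semantics listed above concrete, the plain-Python sketch below mimics a keyed lookup where missing keys either raise an error or produce a None-filled row. It does not use Pathway and is not how Table.ix is implemented: the function lookup_rows, the dictionary-based table, and the choice of KeyError for a miss are all assumptions made for illustration.

```python
def lookup_rows(rows_by_key, keys, columns, allow_misses=False):
    """Fetch one row per key from a dict-based stand-in for a table.

    Hypothetical helper: with allow_misses=True, a missing key yields a
    row whose columns are all None instead of raising an error.
    """
    result = []
    for key in keys:
        if key in rows_by_key:
            result.append(rows_by_key[key])
        elif allow_misses:
            # Fill every column of the missing row with None.
            result.append({column: None for column in columns})
        else:
            raise KeyError(key)
    return result


owners = {
    "alice": {"pet": "cat", "age": 3},
    "bob": {"pet": "dog", "age": 7},
}

rows = lookup_rows(owners, ["alice", "carol"], ["pet", "age"], allow_misses=True)
print(rows)
```

Without allow_misses, the same call would stop at the first missing key, which is the stricter behavior that the new argument relaxes.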

Removed

  • BREAKING: Removed Table.having method.
  • BREAKING: Removed pw.DATE_TIME_UTC, pw.DATE_TIME_NAIVE and pw.DURATION as dtype markers. Instead, pw.DateTimeUtc, pw.DateTimeNaive and pw.Duration should be used, which are wrappers for corresponding pandas types.
  • BREAKING: Removed class transformers from public API: pw.ClassArg, pw.attribute, pw.input_attribute, pw.input_method, pw.method, pw.output_attribute and pw.transformer.
  • BREAKING: Removed several methods from pw.indexing module: binsearch_oracle, filter_cmp_helper, filter_smallest_k and prefix_sum_oracle.

v0.11.2

27 May 08:33

Added

  • pathway.assert_table_has_schema and pathway.table_transformer now accept an allow_subtype argument, which, if True, allows column types in the Table to be subtypes of the types in the Schema.
  • next method to pw.io.python.ConnectorSubject (Python connector) that enables passing values of any type to the engine, not only values that are JSON-serializable. The next method should be the preferred way of passing values from the Python connector.

Changed

  • The format argument of pw.io.python.read is deprecated. A data format is inferred from the method used (next_json, next_str, next_bytes) and the provided schema.

Removed

  • Removed pw.numba_apply and numba dependency.

Fixed

  • Fixed pw.this desugaring bug, where __getitem__ in .ix context was not working properly.
  • pw.io.sqlite.read now checks if the data matches the passed schema.

v0.11.1

16 May 19:30

Added

  • query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now accept a column of type str | None in the metadata_column parameter.
  • pathway.xpacks.connectors.sharepoint module under Pathway for Business License.