- Added a data sink connector, supporting standard 'write' operations ('writeStream' interface is also supported)
- Added a basic data source connector, supporting standard 'read' operations. Note: in this version, a single partition is forced
- Fixed DataTypes support, including DateTime and decimal
- Fixed the streaming sink when working with multiple batches. Empty batches are now handled
- Added 'KUSTO_WRITE_RESULT_LIMIT' option. When writing to Kusto, it limits the number of rows read back as a BaseRelation
- Adjusted to Spark 2.4. This is optimized for the Azure Databricks default. To use the connector with Spark 2.3, the pom.xml file must be adjusted: set spark.version to 2.3.x and json4s-jackson_2.11 to 3.2.11
- KustoOptions.KUSTO_TABLE is no longer used when reading with the Kusto source
- Added 'scale' reading mode to allow reading large data sets. This is the default mode for reading from Kusto, and it requires the user to provide transient blob storage.
  NOTE: this is an interface-breaking change. Reading small data sets directly (as in the previous version) is also supported, but it must be explicitly specified by setting the 'KUSTO_READ_MODE' option to 'lean'
- Added column pruning and filter push-down support when reading from Kusto.
  For details, refer to the KustoSource.md document
- Support writing large data sets. Partitions that exceed Kusto ingest policy guidelines are split into several smaller ingestion operations.
  For details, refer to the KustoSink.md document
- Support Key-vault based authentication, where authentication parameters are stored in KeyVault.
  For details, refer to the Authentication.md document
- Added Python sample code for reference: pyKusto.py
- Updated existing references, in particular the reference based on a Databricks notebook: KustoConnectorDemo
- Organized samples and connector as separate modules to reduce dependencies
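The splitting of oversized partitions mentioned in the notes above can be pictured with a minimal sketch. This is not the connector's implementation: `split_by_size` and `max_bytes` are hypothetical names chosen for illustration, and the actual size threshold comes from the Kusto ingestion guidelines described in KustoSink.md.

```python
from typing import Iterable, Iterator, List


def split_by_size(rows: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Group serialized rows into chunks whose total size stays within max_bytes.

    Hypothetical helper: it only illustrates the idea of turning one large
    partition into several smaller ingestion operations.
    """
    chunk: List[bytes] = []
    size = 0
    for row in rows:
        # Close the current chunk when adding this row would exceed the limit.
        # A single row larger than max_bytes still gets a chunk of its own.
        if chunk and size + len(row) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(row)
        size += len(row)
    if chunk:
        yield chunk


# Three 40-byte rows with a 100-byte limit: the third row starts a new chunk.
chunks = list(split_by_size([b"a" * 40, b"b" * 40, b"c" * 40], max_bytes=100))
```

Each resulting chunk would then be submitted as a separate ingestion operation.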
When running with Spark 'wholestage codegen' enabled, a mismatch between the schema 'nullable' definition and actual data containing null values can lead to a NullPointerException being thrown by org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter. If you encounter this error and cannot identify and fix the mismatch, consider disabling 'wholestage codegen' through the Spark session configuration (on Databricks).
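A minimal sketch of disabling whole-stage code generation, assuming the standard Spark SQL configuration key `spark.sql.codegen.wholeStage` is the setting intended here. In a Databricks notebook a `SparkSession` named `spark` already exists, so only the last line is needed there.

```python
from pyspark.sql import SparkSession

# Reuse the active session (Databricks provides one as `spark`) or create one.
spark = SparkSession.builder.getOrCreate()

# Assumption: the standard Spark SQL key controls whole-stage codegen.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
```

The setting applies to the current session only, so it can be re-enabled once the schema mismatch has been fixed.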
When writing to Kusto, entity naming rules must be followed to avoid collisions with Kusto reserved keywords. For details, refer to these guidelines.