- Added a data sink connector, supporting standard 'write' operations (the 'writeStream' interface is also supported)
- Added a basic data source connector, supporting standard 'read' operations. Note: in this version, a single partition is forced
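
  A minimal sketch of the batch write path, assuming a Scala application with the connector on the classpath. Only KustoOptions.KUSTO_TABLE appears in this change list; the import path, format string and the remaining option keys are placeholders, see the KustoSink.md document for the exact names (authentication options are omitted):

  ```scala
  // Sketch only: the import path, format string and all option keys except
  // KustoOptions.KUSTO_TABLE are assumptions; see KustoSink.md for the exact names.
  import org.apache.spark.sql.SparkSession
  import com.microsoft.kusto.spark.datasource.KustoOptions // assumed import path

  val spark = SparkSession.builder().appName("KustoConnectorSketch").getOrCreate()
  val df = spark.range(10).toDF("value")

  // Batch write; the 'writeStream' interface follows the same pattern
  df.write
    .format("com.microsoft.kusto.spark.datasink.KustoSinkProvider") // assumed format string
    .option(KustoOptions.KUSTO_CLUSTER, "<cluster-name>")           // assumed option key
    .option(KustoOptions.KUSTO_DATABASE, "<database-name>")         // assumed option key
    .option(KustoOptions.KUSTO_TABLE, "MyTable")
    .save()
  ```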
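
  A read sketch, continuing the write sketch above (the format string and option keys not mentioned in this change list are placeholders; see the KustoSource.md document for the exact names):

  ```scala
  // Sketch: read the results of a Kusto query; in this version the result
  // comes back as a single partition. Authentication options are omitted.
  val kustoDf = spark.read
    .format("com.microsoft.kusto.spark.datasource")         // assumed format string
    .option(KustoOptions.KUSTO_CLUSTER, "<cluster-name>")   // assumed option key
    .option(KustoOptions.KUSTO_DATABASE, "<database-name>") // assumed option key
    .option(KustoOptions.KUSTO_QUERY, "MyTable | take 100") // assumed option key
    .load()
  ```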
- Fixed DataTypes support, including DateTime and decimal
- Fixed the streaming sink when working with multiple batches; empty batches are now handled
- Added the 'KUSTO_WRITE_RESULT_LIMIT' option. When writing to Kusto, it limits the number of rows read back as a BaseRelation
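
  For example, continuing the write sketch above (whether the option is exposed as a KustoOptions constant is an assumption):

  ```scala
  // Sketch: read back at most 1000 rows of the written data as the returned BaseRelation.
  // Cluster/database/table and authentication options are omitted for brevity.
  df.write
    .format("com.microsoft.kusto.spark.datasink.KustoSinkProvider") // assumed format string
    .option(KustoOptions.KUSTO_WRITE_RESULT_LIMIT, "1000")          // constant name assumed
    .save()
  ```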
- Adjusted to Spark 2.4, which is the Azure Databricks default. To use with Spark 2.3, the pom.xml file must be adjusted: spark.version (to 2.3.x) and json4s-jackson_2.11 (to 3.2.11)
- KustoOptions.KUSTO_TABLE is no longer used when reading with the Kusto source
- Added a 'scale' reading mode to allow reading large data sets. This is the default mode for reading from Kusto, and it requires the user to provide transient blob storage.
  NOTE: this is an interface-breaking change. Reading small data sets directly (as in the previous version) is still supported, but it must be explicitly requested by setting the 'KUSTO_READ_MODE' option to 'lean'
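
  A sketch of explicitly requesting the previous behavior, continuing the read sketch above (whether KUSTO_READ_MODE is exposed as a KustoOptions constant is an assumption; the transient blob storage options required by 'scale' mode are described in the KustoSource.md document):

  ```scala
  // Sketch: force direct ('lean') reading of a small result set, as in the previous version.
  // The default 'scale' mode additionally requires transient blob storage options.
  val leanDf = spark.read
    .format("com.microsoft.kusto.spark.datasource")          // assumed format string
    .option(KustoOptions.KUSTO_READ_MODE, "lean")            // constant name assumed
    .option(KustoOptions.KUSTO_QUERY, "MyTable | take 100")  // assumed option key
    .load()
  ```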
- Added column pruning and filter push-down support when reading from Kusto.
  For details, refer to the KustoSource.md document
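
  For example, on a DataFrame obtained from the Kusto source (kustoDf from the read sketch above), the selected columns and, where supported, the filter are translated into the Kusto query instead of being applied on the Spark side (the column names are illustrative):

  ```scala
  import org.apache.spark.sql.functions.col

  // Pruning/push-down sketch: only the 'Timestamp' and 'Level' columns are requested
  // from Kusto, and the equality filter can be pushed down into the Kusto query.
  val errors = kustoDf
    .select("Timestamp", "Level")
    .where(col("Level") === "Error")
  ```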
- Support writing large data sets: partitions that exceed the Kusto ingestion policy guidelines are split into several smaller ingestion operations.
  For details, refer to the KustoSink.md document
- Support Key Vault based authentication, where authentication parameters are stored in Azure Key Vault.
  For details, refer to the Authentication.md document
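
  A sketch, assuming the Key Vault coordinates are passed as source options; all Key Vault option keys below are placeholders, and the actual keys and the secret names the connector looks up are listed in the Authentication.md document:

  ```scala
  // Placeholder Key Vault option keys; see Authentication.md for the real names and
  // for which authentication secrets the connector reads from the vault.
  val vaultDf = spark.read
    .format("com.microsoft.kusto.spark.datasource")                  // assumed format string
    .option("keyVaultUri", "https://<vault-name>.vault.azure.net")   // placeholder key
    .option("keyVaultAppId", "<aad-application-id>")                 // placeholder key
    .option("keyVaultAppKey", "<aad-application-key>")               // placeholder key
    .option(KustoOptions.KUSTO_QUERY, "MyTable | take 100")          // assumed option key
    .load()
  ```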
- Added Python sample code for reference: pyKusto.py
- Updated existing references, in particular the one based on a Databricks notebook: KustoConnectorDemo
- Organized samples and connector as separate modules to reduce dependencies
- When running with Spark 'wholestage codegen' enabled, a mismatch between the schema's 'nullable' definition and actual data containing null values can lead to a NullPointerException being thrown by org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter. If you encounter this error and cannot identify and fix the mismatch, consider disabling 'wholestage codegen' by setting (on Databricks):
  spark.conf.set("spark.sql.codegen.wholeStage", "false")
- When writing to Kusto, entity naming rules must be followed to avoid collisions with Kusto reserved keywords. For details, refer to these guidelines