-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-8783] Add tests for decimal data type #12519
[HUDI-8783] Add tests for decimal data type #12519
Conversation
val opts = sparkOpts ++ fgReaderOpts | ||
|
||
@Test | ||
def testDecimalInsertUpdateDeleteRead(): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add various precisions so we can test the different possible representations in parquet?
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, let's test at least 3 cases:
- precision <= 10
- precision > 10 && precision < 18
- precision > 18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParameterizedTest @CsvSource(value = Array("10,2", "20,10", "38,18", "5,0"))
So, let me add one more (15, 10)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now:
@ParameterizedTest @CsvSource(value = Array("10,2", "15,5", "20,10", "38,18", "5,0"))
a4d87e6
to
aad4dd7
Compare
aad4dd7
to
bba29bd
Compare
case v: Integer => newType match { | ||
case Type.INT => v | ||
case Type.LONG => v.longValue() | ||
case Type.FLOAT => v.floatValue() | ||
case Type.DOUBLE => v.doubleValue() | ||
case Type.STRING => UTF8String.fromString(v.toString) | ||
case Type.FIXED => BigDecimal(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a special handling for Spark decimal data type for precision < 18 in the UnsafeRow
decimal getter:
public Decimal getDecimal(int ordinal, int precision, int scale) {
if (isNullAt(ordinal)) {
return null;
}
if (precision <= Decimal.MAX_LONG_DIGITS()) {
return Decimal.createUnsafe(getLong(ordinal), precision, scale);
} else {
byte[] bytes = getBinary(ordinal);
BigInteger bigInteger = new BigInteger(bytes);
BigDecimal javaDecimal = new BigDecimal(bigInteger, scale);
return Decimal.apply(javaDecimal, precision, scale);
}
}
Should we do it here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. let me add it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danny0405 , I tried to cast it, but I think we so far we have lost the precision and scale information from FIXED type here. So we can not do the special treatment.
bba29bd
to
77e9370
Compare
Change Logs
Add a test to insert/update/delete/read data for a MOR table
Impact
Verify that decimal data type workflow works.
Risk level (write none, low medium or high below)
Low.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist