HIVE-28622: Duplicate Entries in TXN_WRITE_NOTIFICATION_LOG Due to Oracle's Handling of Empty Strings #5617
+1
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In Oracle, empty strings ('') are treated as NULL values for VARCHAR2 and CHAR data types. This behavior is unique to Oracle and can be confusing, as an empty string is typically considered distinct from NULL in other databases.
As a result, the TXN_WRITE_NOTIFICATION_LOG table receives duplicate entries for a single Hive ACID transaction involving MERGE statements.
This discrepancy leads to issues: the _files and _dumpmetadata files in a Hive ACID incremental dump will not align if the dump scope includes one or more MERGE statements. Consequently, the Hive ACID incremental LOAD fails at the target (DR), blocking subsequent replication executions.
Solution
Testing:
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
Is the change a dependency upgrade?
How was this patch tested?