Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28622: Duplicate Entries in TXN_WRITE_NOTIFICATION_LOG Due to Oracle's Handling of Empty Strings #5617

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

harshal-16
Copy link
Contributor

In Oracle, empty strings ('') are treated as NULL values for VARCHAR2 and CHAR data types. This behavior is unique to Oracle and can be confusing, as an empty string is typically considered distinct from NULL in other databases.

As a result, the TXN_WRITE_NOTIFICATION_LOG table receives duplicate entries for a single Hive ACID transaction involving MERGE statements.

This discrepancy leads to issues: the _files and _dumpmetadata files in a Hive ACID incremental dump will not align if the dump scope includes one or more MERGE statements. Consequently, the Hive ACID incremental LOAD fails at the target (DR), blocking subsequent replication executions.

Solution

  • Add additional check for partition being null

Testing:

  • Tested on cluster with oracle and mysql as backend database

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

Is the change a dependency upgrade?

How was this patch tested?

…acle's Handling of Empty Strings

In Oracle, empty strings ('') are treated as NULL values for VARCHAR2 and CHAR data types. This behavior is unique to Oracle and can be confusing, as an empty string is typically considered distinct from NULL in other databases.

As a result, the TXN_WRITE_NOTIFICATION_LOG table receives duplicate entries for a single Hive ACID transaction involving MERGE statements.

This discrepancy leads to issues: the _files and _dumpmetadata files in a Hive ACID incremental dump will not align if the dump scope includes one or more MERGE statements. Consequently, the Hive ACID incremental LOAD fails at the target (DR), blocking subsequent replication executions.

Solution
* Add additional check for partition being null

Testing:
* Tested on cluster with oracle and mysql as backend database
@pudidic
Copy link
Contributor

pudidic commented Jan 22, 2025

+1. Looks good to me.

@pudidic pudidic self-requested a review January 22, 2025 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants