SNOW-1830524 Add decoder logic for Dataframe.join #2802

sfc-gh-vbudati
Contributor

@sfc-gh-vbudati sfc-gh-vbudati commented Dec 20, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1830524

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.
    Added decoder logic for Dataframe.join. All join tests should work now.

@sfc-gh-vbudati sfc-gh-vbudati requested a review from a team as a code owner December 20, 2024 19:54
@@ -6,9 +6,10 @@ df2 = session.create_dataframe([[1, 2, 3, 4, 5]], schema=['\"A\"','\"B\"','\"C\"

df3 = df1.filter(col("\"A\"") == 1).join(df2.select((col("\"A\"") + 1).as_("\"A\""), col("\"B\""), col("\"C\""), col("\"l_0001_C\""), col("\"l_0003_B\"")))

df4 = df3.sort(df3.columns)
# Commented out since df3.columns produces different results in the first encoding and in the encode-decode-encode result.
Collaborator

@sfc-gh-evandenberg sfc-gh-evandenberg Dec 20, 2024

Can you find out why this is happening? We shouldn't be removing valid test cases here. I don't see a reason why encode-decode-encode needs to be value-equivalent, and this is a good example: as long as the results are semantically equivalent (the generated unique column names correspond correctly), that is all that is required.
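
For illustration, the kind of semantic-equivalence check described above could be sketched as follows. This is a minimal sketch, not the project's test harness: the helper `semantically_equal` is hypothetical, and the pattern for generated names (an `l_` or `r_` prefix followed by a four-character tag, as in `l_0001_C`) is an assumption inferred from the column names in this diff.

```python
import re

# Assumed shape of a generated join column name, e.g. "l_0001_C":
# side prefix, four-character tag, then the original column name.
GENERATED = re.compile(
    r'^"?(?P<side>[lr])_(?P<tag>[0-9A-Za-z]{4})_(?P<base>.+?)"?$'
)


def semantically_equal(cols_a, cols_b):
    """Return True if two column lists match up to a consistent renaming of tags.

    Hypothetical helper: ordinary names must be identical, while generated
    names only need the same side and base name, with tags mapped one-to-one.
    """
    if len(cols_a) != len(cols_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(cols_a, cols_b):
        ma, mb = GENERATED.match(a), GENERATED.match(b)
        if ma is None or mb is None:
            # At least one name is not generated, so require an exact match.
            if a != b:
                return False
            continue
        if (ma["side"], ma["base"]) != (mb["side"], mb["base"]):
            return False
        ta, tb = ma["tag"], mb["tag"]
        # The tag mapping must be consistent in both directions (a bijection).
        if a_to_b.setdefault(ta, tb) != tb or b_to_a.setdefault(tb, ta) != ta:
            return False
    return True
```

Under these assumptions, `['"A"', '"l_0001_C"']` and `['"A"', '"l_ab12_C"']` compare as equivalent even though the generated values differ, while reusing one tag on one side and two different tags on the other does not.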

Contributor Author

The unique columns correspond to each other in both cases but the values are different. I'm not sure what the best way around this is since hardcoding the decoder seems like a bad idea. I can add the test back in.

I'm not familiar with how the column names are generated but can try to figure that out.

Contributor Author

fixed!

@@ -291,6 +291,7 @@ def test_ast(session, tables, test_case):
decoder = Decoder(session)
session._ast_batch.reset_id_gen() # Reset the entity ID generator.
session._ast_batch.flush() # Clear the AST.
global_counter.reset()
Contributor

Why is this needed?

Contributor Author

When two tables are joined and a new column has to be created with a randomly generated name, the generation is seeded for testing purposes, and the seed depends on the global counter. For the random names to come out the same each time, the global counter needs to be at the same "count", which is why it is reset for every test run.
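
The effect described above can be sketched in isolation. This is a minimal sketch under stated assumptions, not Snowpark's implementation: `GlobalCounter`, `generate_column_tag`, and `run_once` are hypothetical stand-ins, and seeding directly from the counter value is an assumption made only for illustration.

```python
import random
import string
import threading


class GlobalCounter:
    """Hypothetical stand-in for a shared, thread-safe counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def next(self):
        with self._lock:
            self._value += 1
            return self._value

    def reset(self):
        with self._lock:
            self._value = 0


global_counter = GlobalCounter()


def generate_column_tag(length=4):
    # Assumption: the seed is derived from the counter, so the "random" tag
    # depends only on how many tags were generated since the last reset.
    rng = random.Random(global_counter.next())
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def run_once():
    # Stands in for one pass that creates three generated column names.
    return [generate_column_tag() for _ in range(3)]


global_counter.reset()
first = run_once()

# Without a reset, the counter keeps advancing, so the seeds used here
# (4, 5, 6) are not the ones used in the first pass (1, 2, 3).
second_without_reset = run_once()

global_counter.reset()
third_after_reset = run_once()  # Same seeds as the first pass.
```

In this sketch `first` and `third_after_reset` are identical because both passes start from a counter value of zero, whereas `second_without_reset` is drawn from different seeds and will generally differ.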

@sfc-gh-vbudati sfc-gh-vbudati merged commit a7f5670 into vbudati/SNOW-1794510-merge-decoder Jan 14, 2025
18 of 35 checks passed
@sfc-gh-vbudati sfc-gh-vbudati deleted the vbudati/SNOW-1830524-decoder-df-join branch January 14, 2025 00:25
@github-actions github-actions bot locked and limited conversation to collaborators Jan 14, 2025