Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1856596: df.write.save_as_table fails when column_order="name" and a column has a SQL name clash #2763

Open
tvdboom opened this issue Dec 13, 2024 · 6 comments
Assignees
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team

Comments

@tvdboom
Copy link

tvdboom commented Dec 13, 2024

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

    Windows-10-10.0.22631-SP0

  3. What are the component versions in the environment (pip freeze)?

    pandas==2.2.2
    snowflake-snowpark-python==1.26.0

  4. What did you do?

I created a dataframe with a column named ROW and tried to save it as a table using column_order="name".

import pandas as pd
from snowflake.snowpark import Session

session = Session.builder.configs(XXX).create()
test_data = session.create_dataframe(pd.DataFrame({"A": [0, 1, 2], "ROW": ["a", "b", "c"]}))
test_data.write.save_as_table("DATABASE.SCHEMA.TESTTABLE1", mode="append", column_order="name")
  
  1. What did you expect to see?

    No error, but got

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2023.3.5\plugins\python-ce\helpers\pydev\pydevconsole.py", line 364, in runcode
    coro = func()
           ^^^^^^
  File "<input>", line 2, in <module>
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\telemetry.py", line 230, in wrap
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\utils.py", line 960, in func_call_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\dataframe_writer.py", line 446, in save_as_table
    result = session._conn.execute(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 609, in execute
    result_set, result_meta = self.get_result_set(
                              ^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\analyzer\snowflake_plan.py", line 205, in wrap
    raise ne.with_traceback(tb) from None
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\analyzer\snowflake_plan.py", line 136, in wrap
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 728, in get_result_set
    result = self.run_query(
             ^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 130, in wrap
    raise ex
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 124, in wrap
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 516, in run_query
    raise ex
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 501, in run_query
    results_cursor = self.execute_and_notify_query_listener(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 443, in execute_and_notify_query_listener
    raise ex
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\snowpark\_internal\server_connection.py", line 434, in execute_and_notify_query_listener
    results_cursor = self._cursor.execute(query, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\connector\cursor.py", line 1097, in execute
    Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\connector\errors.py", line 284, in errorhandler_wrapper
    handed_over = Error.hand_to_other_handler(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\connector\errors.py", line 339, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "C:\repos\hippolib\venv311\Lib\site-packages\snowflake\connector\errors.py", line 215, in default_errorhandler
    raise error_class(
snowflake.snowpark.exceptions.SnowparkSQLException: (1304): 01b9012e-0305-26b4-0001-0aba15beddd2: 001003 (42000): SQL compilation error:
syntax error line 1 at position 49 unexpected 'ROW'.

I inspected the query created by snowpark which failed, which was
INSERT INTO DATABASE.SCHEMA.TESTTABLE1(A, ROW) SELECT * FROM "DATABASE"."SCHEMA"."SNOWPARK_TEMP_TABLE_8A1BTHDOUP"

as you can see, the column names in (A, ROW) are not enclosed in double quotes, therefore the query is invalid since it thinks ROW is a sql command.

@tvdboom tvdboom added bug Something isn't working needs triage Initial RCA is required labels Dec 13, 2024
@github-actions github-actions bot changed the title df.write.save_as_table fails when column_order="name" and a column has a SQL name clash SNOW-1856596: df.write.save_as_table fails when column_order="name" and a column has a SQL name clash Dec 13, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Dec 16, 2024
@sfc-gh-sghosh sfc-gh-sghosh added status-triage Issue is under initial triage and removed needs triage Initial RCA is required labels Dec 16, 2024
@sfc-gh-sghosh
Copy link

Hello @tvdboom ,

Thanks for raising the issue.
ROW is a reserve keyword for Snowflake SQL, and that's why you can't use it.
https://docs.snowflake.com/en/sql-reference/reserved-keywords

As a workaround,

  1. Either you can change the name of the column.
    or
  2. Please use small letter/case sensitive for the column name

test_data = session.create_dataframe(
pd.DataFrame({"a": [0, 1, 2], "row": ["a", "b", "c"]})
)

test_data.write.save_as_table(
"SAMPLEDATABASE.TEST.TESTTABLE3",
mode="append",
column_order="name"
)

I hope its clear

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed bug Something isn't working status-triage Issue is under initial triage labels Dec 16, 2024
@tvdboom
Copy link
Author

tvdboom commented Dec 16, 2024

I understand, but I still think this is a bug in Snowpark. Snowflake does allow to have a column named ROW as can be seen here
image

Also, the same code without column_order="name" does work as expected, so the issue is not with the name of the column, but with the implementation of the column_order parameter. As I mentioned in the description, the bug is that the column names aren't surrounded by double quotes, as the rest of the column names in Snowpark queries have.

@tvdboom
Copy link
Author

tvdboom commented Dec 16, 2024

@sfc-gh-sghosh Changing the function

def insert_into_statement(
table_name: str,
child: str,
column_names: Optional[Iterable[str]] = None,
overwrite: bool = False,
) -> str:
table_columns = f"({COMMA.join(column_names)})" if column_names else EMPTY_STRING
if is_sql_select_statement(child):
return f"{INSERT}{OVERWRITE if overwrite else EMPTY_STRING}{INTO}{table_name}{table_columns}{SPACE}{child}"
return f"{INSERT}{INTO}{table_name}{table_columns}{project_statement([], child)}"

into this solves the issue:

def insert_into_statement(
    table_name: str,
    child: str,
    column_names: Optional[Iterable[str]] = None,
    overwrite: bool = False,
) -> str:
    if column_names:
        table_columns = "(" + COMMA.join(f'"{c}"' for c in column_names) + ")"
    else:
        table_columns = EMPTY_STRING
    if is_sql_select_statement(child):
        return f"{INSERT}{OVERWRITE if overwrite else EMPTY_STRING}{INTO}{table_name}{table_columns}{SPACE}{child}"
    return f"{INSERT}{INTO}{table_name}{table_columns}{project_statement([], child)}"

let me know if you agree so I can open a PR

@sfc-gh-sghosh
Copy link

Thank you @tvdboom ,

We are internally checking with team, will update.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added the enhancement The issue is a request for improvement or a new feature label Dec 18, 2024
@sfc-gh-mrojas
Copy link
Collaborator

I think we should not just quote all identifiers. There is a list of reserved words: https://docs.snowflake.com/en/sql-reference/reserved-keywords and we should check whether the column name is a reserved word and then quoted it.

@sfc-gh-yixie
Copy link
Collaborator

sfc-gh-yixie commented Jan 6, 2025

Does this work as a workaround?

import pandas as pd
from snowflake.snowpark import Session

session = Session.builder.configs(XXX).create()
test_data = session.create_dataframe(pd.DataFrame({"A": [0, 1, 2], '"ROW"': ["a", "b", "c"]}))
test_data.write.save_as_table("DATABASE.SCHEMA.TESTTABLE1", mode="append", column_order="name")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

4 participants