Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1859974: Issue with SnowflakeCursor.executemany() ? #2123

Open
fcremer-nl opened this issue Dec 17, 2024 · 7 comments
Open

SNOW-1859974: Issue with SnowflakeCursor.executemany() ? #2123

fcremer-nl opened this issue Dec 17, 2024 · 7 comments
Assignees
Labels
status-triage_done Initial triage done, will be further handled by the driver team

Comments

@fcremer-nl
Copy link

Python version

Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]

Operating system and processor architecture

Windows-10-10.0.19045-SP0

Installed packages

snowflake-connector-python==3.11.0
snowflake-snowpark-python==1.18.0

What did you do?

df = self.session.create_dataframe(rows, self.load_schema)
df.write.mode('append').save_as_table(self.mgt)

What did you expect to see?

I expected the data to be written to the table, but got the error "not all arguments converted during string formatting" at

File "C:\Users\504019\AppData\Roaming\Python\Python310\site-packages\snowflake\connector\cursor.py", line 1388, in executemany
    fmt % self._connection._process_params_pyformat(param, self)

Where the format (fmt) is a string, something like '(?,?,?,?,?)' and the right side the tuple with the data for the row.

If I change the line of code above to:
fmt.replace('?','%s') % self._connection._process_params_pyformat(param, self)

Then it works correctly. Similarly if I change the code original code to write row by row:

for row in rows:
    df = self.session.create_dataframe([row], self.load_schema)
    df.write.mode('append').save_as_table(self.mgt)

It also works!

Is this an issue with Snowpark or am I doing something stupid?

Can you set logging to DEBUG and collect the logs?

No
@github-actions github-actions bot changed the title Issue with SnowflakeCursor.executemany() ? SNOW-1859974: Issue with SnowflakeCursor.executemany() ? Dec 17, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Dec 18, 2024
@sfc-gh-sghosh
Copy link

sfc-gh-sghosh commented Dec 20, 2024

Hello @fcremer-nl ,

Thanks for raising the issue.

I tried to reproduce it in Windows 11 using Python 3.10 and Snowpark 1.26.0, and I am not able to reproduce it. Could you please share the full code snippet?

`load_schema = StructType([
StructField("ID", IntegerType()),
StructField("NAME", StringType())
])

rows = [
(1, "Alice"),
(2, "Bob"),
(3, "Charlie")
]

df = session.create_dataframe(rows, load_schema)

target_table = "example_table"

df.write.mode('append').save_as_table(target_table)

result_df = session.table(target_table)
result_df.show()

Output:
|"ID" |"NAME" |
|1 |Alice |
|2 |Bob |
|3 |Charlie |`

@sfc-gh-sghosh sfc-gh-sghosh added status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter and removed needs triage labels Dec 20, 2024
@fcremer-nl
Copy link
Author

fcremer-nl commented Dec 20, 2024 via email

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed status-information_needed Additional information is required from the reporter bug status-triage Issue is under initial triage labels Dec 23, 2024
@sfc-gh-sghosh
Copy link

Hello @fcremer-nl ,

Snowpark internally decides whether to batch writes or use executemany() based on the size of the data or other internal heuristics.

In terms of performance and feature, there is no difference whether you use qmark or pyformat or format or numeric.
Its mentioned in the documentation as well
https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-api

You can not use pyformat with positional arguments ( ?,?) etc because
The format string and parameter substitution are mismatched.
The pyformat style explicitly requires named placeholders and dictionary-based parameters.
For pyformat, the parameters must be dictionaries.

If you would like to use using positional placeholders (?,?,?,?) with tuples, please use "qmark" style.

Regards,
Sujan

@fcremer-nl
Copy link
Author

fcremer-nl commented Jan 2, 2025 via email

@sfc-gh-sghosh
Copy link

Hello @fcremer-nl ,

Thanks for the update.

  1. qmark and numeric bindings are server side binding which helps to increase performance for batch inserts via streaming and thats the reason why its kept as default when creating a session in snowpark.
  2. Snowpark only supports qmark binding and its default for snowpark. its mentioned in the documentation
    https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.Session.sql

At present, there is no plan to change the bindings for Snowpark.

For python connector, you can change the bindings by specifying at connection level.

from snowflake import connector
Example:

Snowflake connection configurations

snowflake_config = {
"account": "", # Add your Snowflake account
"user": "", # Add your Snowflake user
"password": "", # Add your Snowflake password (consider using a secure method)
"role": "", # Add your Snowflake role
"warehouse": "", # Add your Snowflake warehouse
"database": "", # Add your Snowflake database
"schema": "", # Add your Snowflake schema
"paramstyle": "pyformat"

}

conn = snowflake.connector.connect(**snowflake_config)
cs = conn.cursor()

print(f"Paramstyle: {conn._paramstyle}")

I hope all clear now.

Regards,
Sujan

@fcremer-nl
Copy link
Author

fcremer-nl commented Jan 3, 2025 via email

@sfc-gh-sghosh
Copy link

Hello @fcremer-nl ,

Thanks for the update.

  1. Please note, snowpark and snowflake connector are different.
  2. With Snowpark session, the default binding paramstyle style is always qmark.
  3. With Snowflake python connector, the default binding parameter style is pyformat.
  4. So, when you create a session2 as a snowpark session from an existing Python connector connection, it will not use the qmark; it will still use the default binding pyformat, thats why its throwing error.

You already have the workaround, If you have a requirement to change the binding after connection, then yes use the respective code in snowpark or in python connector
a. Snowpark: session.connection._paramstyle = "qmark"
b. Python connector: conn._paramstyle= "qmark"

Regards,
Sujan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

2 participants