Large queries with connection to Azure Blob Storage fail behind firewall due to proxy not being passed in the connector #2125

Closed
alex-atkins opened this issue Dec 20, 2024 · 4 comments
Assignees
Labels
question status-triage_done Initial triage done, will be further handled by the driver team

Comments

@alex-atkins
alex-atkins commented Dec 20, 2024

Python version

3.12.3

Operating system and processor architecture

Ubuntu 24.04

Installed packages

aiohappyeyeballs==2.4.4
aiohttp==3.10.10
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.6.2.post1
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asgiref==3.8.1
asn1crypto==1.5.1
asttokens==2.4.1
async-lru==2.0.4
attrs==24.2.0
babel==2.16.0
beautifulsoup4==4.12.3
bleach==6.2.0
certifi==2024.8.30
cffi==1.17.1
channels==4.1.0
channels-redis==4.2.0
chardet==5.2.0
charset-normalizer==3.4.0
click==8.1.7
cmdstanpy==1.2.4
coloredlogs==15.0.1
comm==0.2.2
contourpy==1.3.1
cryptography==43.0.3
cycler==0.12.1
dataclasses-json==0.6.7
debugpy==1.8.11
decorator==5.1.1
defusedxml==0.7.1
distro==1.9.0
Django==5.1.4
django-auth-adfs==1.14.0
django-cors-headers==4.4.0
django-debug-toolbar==4.4.6
django-filter==21.1
django_csp==3.8
djangorestframework==3.15.2
elastic-transport==8.15.1
elasticsearch==8.15.1
elasticsearch-dsl==8.15.4
et_xmlfile==2.0.0
eval_type_backport==0.2.0
executing==2.1.0
fastjsonschema==2.20.0
filelock==3.16.1
FlashRank==0.2.9
flatbuffers==24.3.25
fonttools==4.54.1
fqdn==1.5.1
frozenlist==1.5.0
fsspec==2024.12.0
fuzzywuzzy==0.18.0
greenlet==3.1.1
h11==0.14.0
holidays==0.60
httpcore==1.0.7
httptools==0.6.4
httpx==0.27.2
httpx-sse==0.4.0
huggingface-hub==0.26.2
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
importlib_resources==6.4.5
ipykernel==6.29.5
ipython==8.31.0
ipywidgets==8.1.5
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.4
jiter==0.8.2
joblib==1.4.2
json5==0.10.0
jsonpatch==1.33
jsonpath-python==1.0.6
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter==1.1.1
jupyter-console==6.6.3
jupyter-events==0.11.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.15.0
jupyter_server_terminals==0.5.3
jupyterlab==4.2.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.13
kiwisolver==1.4.7
langchain==0.3.13
langchain-community==0.3.13
langchain-core==0.3.28
langchain-experimental==0.3.4
langchain-openai==0.2.6
langchain-text-splitters==0.3.4
langchainhub==0.1.21
langsmith==0.1.142
Levenshtein==0.26.1
MarkupSafe==3.0.2
marshmallow==3.23.2
matplotlib==3.9.2
matplotlib-inline==0.1.7
mistune==3.0.2
mpmath==1.3.0
msgpack==1.1.0
multidict==6.1.0
multipledispatch==1.0.0
mypy-extensions==1.0.0
natsort==8.4.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
nested_dict==1.61
nltk==3.9.1
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
onnxruntime==1.20.0
openai==1.54.3
openpyxl==3.0.10
orjson==3.10.12
overrides==7.7.0
packaging==24.2
pandas==2.1.4
pandas-flavor==0.6.0
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pgvector==0.3.6
pillow==11.0.0
platformdirs==4.3.6
prometheus_client==0.21.1
prompt_toolkit==3.0.48
propcache==0.2.0
prophet==1.1.6
protobuf==5.28.3
psutil==6.1.1
psycopg==3.2.3
psycopg-binary==3.2.3
psycopg-pool==3.2.4
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==18.0.0
pycparser==2.22
pycryptodome==3.21.0
pydantic==2.9.2
pydantic-settings==2.7.0
pydantic_core==2.23.4
Pygments==2.18.0
pyjanitor==0.23.1
PyJWT==2.9.0
pyodbc==5.2.0
pyOpenSSL==24.3.0
pyparsing==3.2.0
pypdf==4.3.1
pyspnego==0.11.2
python-dateutil==2.8.2
python-dotenv==1.0.1
python-json-logger==3.2.1
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
RapidFuzz==3.10.1
redis==5.2.1
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
requests-ntlm==1.2.0
requests-toolbelt==1.0.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.21.0
scipy==1.14.1
Send2Trash==1.8.3
setuptools==75.3.0
six==1.16.0
sniffio==1.3.1
snowflake-connector-python==3.12.4
snowflake-sqlalchemy==1.6.1
sortedcontainers==2.4.0
soupsieve==2.6
SQLAlchemy==1.4.54
sqlparse==0.5.1
stack-data==0.6.3
stanio==0.5.1
sympy==1.13.3
tabulate==0.9.0
tenacity==9.0.0
teradatasql==20.0.0.20
teradatasqlalchemy==20.0.0.2
terminado==0.18.1
tiktoken==0.8.0
tinycss2==1.4.0
tokenizers==0.20.3
tomlkit==0.13.2
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
types-python-dateutil==2.9.0.20241206
types-requests==2.32.0.20241016
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2023.4
unstructured-client==0.26.2
uri-template==1.3.0
urllib3==2.2.3
uvicorn==0.31.1
uvloop==0.21.0
watchfiles==1.0.3
wcwidth==0.2.13
webcolors==24.8.0
webencodings==0.5.1
websocket-client==1.8.0
websockets==13.1
widgetsnbextension==4.0.13
xarray==2024.10.0
yarl==1.17.1
zipp==3.21.0

What did you do?

import os

# Attempt to stop connection retries (set before importing the connector)
os.environ['MAX_CON_RETRY_ATTEMPTS'] = '0'

import snowflake.connector

# Establish a connection to Snowflake
ctx = snowflake.connector.connect(
    account=SNOWFLAKE_ACCOUNT,
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASS,
    warehouse='WH',
    proxy_host=PROXY_HOST,  # proxy set
    proxy_port=PROXY_PORT,  # proxy set
    database='DB',
    schema='SCHEMA',
    role='ROLE',
    protocol='https',
    login_timeout=10
)

# Note: the result size needs to be large enough to trigger a connection
# to Azure Blob Storage (XXXXXXX.blob.core.windows.net)
query = """select * from table"""
cur = ctx.cursor()
cur.execute(query)
results = cur.fetchall()

What did you expect to see?

The result should have been returned. This works fine for small queries, but large queries trigger a connection to Azure Blob Storage, and that connection fails. The queries are executed behind a corporate firewall. It appears that the proxy settings are not passed to urllib3 when the Azure Blob connection is made.

In addition, the connection keeps retrying, which causes the whole process to hang. I have tried extensively to at least raise an exception and halt the retries, but without success.

I expect to be able to execute large Snowflake queries behind a firewall (without setting the proxy in the OS environment) when the proxy variables are passed to the Snowflake connector. I also expect some way to halt or limit retries when the proxy is not set or passed.

Error:
packages/snowflake/connector/vendored/urllib3/connection.py", line 179, in _new_conn
    raise ConnectTimeoutError(
snowflake.connector.vendored.urllib3.exceptions.ConnectTimeoutError: (<snowflake.connector.vendored.urllib3.connection.HTTPSConnection object at>, 'Connection to XXXXXXX.blob.core.windows.net timed out. (connect timeout=7)')
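A connect timeout like the one above usually means the host cannot be reached directly from inside the firewall. As a rough sketch of how that could be confirmed quickly, outside the connector and without waiting for its retry loop, the snippet below tries a plain TCP connection. `can_connect_directly` is a hypothetical helper written for this illustration, not part of snowflake-connector-python, and the 3 second timeout is an arbitrary choice.

```python
import socket


def can_connect_directly(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout.

    Hypothetical diagnostic helper; it does not go through any proxy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False
```

Calling `can_connect_directly("XXXXXXX.blob.core.windows.net")` with the real blob host from the logs and getting `False` would suggest that traffic to that host has to go through the proxy.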

Can you set logging to DEBUG and collect the logs?

connector/vendored/urllib3/connectionpool.py:1019 Starting new HTTPS connection (10): XXXXXXX.blob.core.windows.net:443
DEBUG [2024-12-20 10:05:47,652] retry snowflake/connector/vendored/urllib3/util/retry.py:594 Incremented Retry for (url='/results/XXXXXXX=gzip'): Retry(total=0, connect=None, read=None, redirect=None, status=None)

WARNING [2024-12-20 10:05:47,652] connectionpool snowflake/connector/vendored/urllib3/connectionpool.py:824 Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<snowflake.connector.vendored.urllib3.connection.HTTPSConnection object at XXXXXXX>, 'Connection to XXXXXXX.blob.core.windows.net timed out. (connect timeout=7)')': /results/XXXXXXX=gzip

DEBUG [2024-12-20 10:05:47,652] connectionpool snowflake/connector/vendored/urllib3/connectionpool.py:1019 Starting new HTTPS connection (10): XXXXXXX.blob.core.windows.net:443

DEBUG [2024-12-20 10:05:58,837] network snowflake/connector/network.py:1226 Session status for SessionPool 'None', SessionPool 3/4 active sessions

ERROR [2024-12-20 10:05:58,837] result_batch snowflake/connector/result_batch.py:362 Failed to fetch the large result set batch XXXXXXX for the 5 th time, backing off for 14s for the reason: 'HTTPSConnectionPool(host='XXXXXXX.blob.core.windows.net', port=443): Max retries exceeded with url: /results/XXXXXXX =gzip (Caused by ConnectTimeoutError(<snowflake.connector.vendored.urllib3.connection.HTTPSConnection object at XXXXXXX >, 'Connection to XXXXXXX.blob.core.windows.net timed out. (connect timeout=7)'))'
@alex-atkins
Author

alex-atkins commented Dec 20, 2024

It looks like these proxy parameters (proxy_host, proxy_port) are no longer supported, which explains why they don't work. Was there a reason they were removed? I'm not seeing any possible workarounds for my use-case with the current version of this package.

@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Dec 21, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added question status-triage_done and removed bug needs triage labels Dec 21, 2024
@sfc-gh-dszmolka
Contributor

Hi, the documented method of connecting through a proxy is to use the HTTP_PROXY / HTTPS_PROXY (and, if needed, NO_PROXY) environment variables.

export HTTP_PROXY='http://my.pro.xy:8080'
export HTTPS_PROXY='http://my.pro.xy:8080'

Can you please try that and see how it works for you ?
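One way to check whether the exported variables are actually visible to the Python process is the standard library's `urllib.request.getproxies()`. This is only a sketch: it shows what the standard library resolves, and it is an assumption that the connector's vendored HTTP stack picks up the same variables. `visible_proxies` is a hypothetical wrapper name and the proxy URL is the placeholder used above.

```python
import os
import urllib.request


def visible_proxies() -> dict:
    """Return the proxy settings the standard library resolves for this process."""
    return urllib.request.getproxies()


if __name__ == "__main__":
    # Placeholder proxy URL; normally this is already exported in the shell.
    os.environ["HTTPS_PROXY"] = "http://my.pro.xy:8080"
    print(visible_proxies().get("https"))
```

Note that a lowercase `https_proxy` variable, if present, takes precedence over the uppercase one in this lookup.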

You can also set those environment variables for the Python process only, if you don't want to export them globally:
HTTPS_PROXY=http://my.pro.xy:8080 python myscript.py
Setting them inside the program itself should also work:

import os

# ... other setup, but before creating the Snowflake connection ...
os.environ["HTTPS_PROXY"] = "http://my.pro.xy:8080"
# ... then proceed with the Snowflake connection and queries ...
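For code that previously passed a proxy host and port to the connector, the in-program approach can be wrapped in a small helper. The following is a minimal sketch, assuming the connector reads `HTTP_PROXY` / `HTTPS_PROXY` from the process environment as described above. `proxy_env` is a hypothetical helper, not part of snowflake-connector-python, and the host and port are placeholders.

```python
import os
from contextlib import contextmanager


@contextmanager
def proxy_env(host: str, port: int, scheme: str = "http"):
    """Temporarily set HTTP_PROXY / HTTPS_PROXY for this process.

    Hypothetical helper for illustration only.
    """
    proxy_url = f"{scheme}://{host}:{port}"
    names = ("HTTP_PROXY", "HTTPS_PROXY")
    previous = {name: os.environ.get(name) for name in names}
    for name in names:
        os.environ[name] = proxy_url
    try:
        yield proxy_url
    finally:
        # Restore the earlier values, including the "not set" case.
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


if __name__ == "__main__":
    with proxy_env("my.pro.xy", 8080):
        print("HTTPS_PROXY =", os.environ["HTTPS_PROXY"])
        # Create the Snowflake connection and fetch results inside this block.
```

Because `os.environ` is shared by the whole process, the setting affects every thread while the block is active, and the result fetch should stay inside the block so that any later downloads still see the proxy.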

@alex-atkins
Author

Yes, both of these methods work. How come the ability to set the proxy at the package level was removed?

@sfc-gh-dszmolka
Contributor

Thank you for confirming that the driver works according to its publicly available documentation. I am closing the issue, as this is the expected behaviour. If you still think this is an error and the driver behaves differently from what its documentation describes, please let us know and I'll reopen it.

@sfc-gh-dszmolka sfc-gh-dszmolka closed this as not planned Jan 2, 2025