fix: transpiling sql to specific dialect #1596

scaliseraoul · 2025-02-07T11:20:26Z

Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.

Important

Add SQL dialect transpilation to data loaders using SQLParser.transpile_sql_dialect and update tests accordingly.

Behavior:
- SQLParser.transpile_sql_dialect added to convert SQL queries to specific dialects.
- DuckDBConnectionManager.sql now transpiles queries from MySQL to DuckDB.
- SQLDatasetLoader.execute_query and ViewDatasetLoader.execute_query transpile queries to the source type.
Tests:
- Updated test_sql_loader.py to check for transpiled query format.
- Added test_mysql_transpilation in test_sql_parser.py to verify SQL dialect conversion.

^{This description was created by}^{for b98fb72. It will automatically update as commits are pushed.}

ellipsis-dev

❌ Changes requested. Reviewed everything up to b98fb72 in 1 minute and 53 seconds

More details

Looked at 119 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 8 drafted comments based on config settings.

1. pandasai/data_loader/sql_loader.py:44

Draft comment:
Transpiling query using 'to_dialect=source_type' without specifying 'from_dialect' may lead to inconsistent behavior compared to DuckDB loader.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment raises a valid technical concern about SQL dialect conversion. However, without seeing the implementation of SQLParser.transpile_sql_dialect() or understanding the default behavior when from_dialect is omitted, we can't be certain this is actually an issue. The mention of DuckDB seems to require knowledge of other parts of the codebase. The comment is speculative ("may lead to") rather than definitive.
I might be too harsh - this could be a real compatibility issue that an experienced developer spotted. The DuckDB reference might indicate a known issue in the system.
While it could be a real issue, we need stronger evidence within this file's context to keep the comment. The speculative nature and reference to external components make it less actionable.
Delete the comment because it's speculative ("may lead to") and requires knowledge of external components (DuckDB loader) to understand or verify the issue.

2. pandasai/data_loader/view_loader.py:93

Draft comment:
Ensure that transpiling with 'to_dialect=source_type' correctly handles the expected dialect conversion for view queries.
Reason this comment was not posted:
Marked as duplicate.

3. pandasai/data_loader/duck_db_connection_manager.py:37

Draft comment:
Hardcoding the input dialect as 'mysql' may be too restrictive. Consider exposing or detecting the source dialect.
Reason this comment was not posted:
Marked as duplicate.

4. pandasai/data_loader/sql_loader.py:44

Draft comment:
Missing explicit 'from_dialect' parameter may lead to inconsistent transpilation compared to other parts. Confirm if input queries are in the expected dialect.
Reason this comment was not posted:
Marked as duplicate.

5. pandasai/data_loader/view_loader.py:93

Draft comment:
As above, consider whether a 'from_dialect' argument is needed for consistency in transpiling queries.
Reason this comment was not posted:
Marked as duplicate.

6. pandasai/query_builders/sql_parser.py:55

Draft comment:
Consider adding error handling (try/except) around parse_one to provide clearer feedback on SQL parsing failures.
Reason this comment was not posted:
Marked as duplicate.

7. tests/unit_tests/data_loader/test_sql_loader.py:156

Draft comment:
Test assertion relies on specific SQL formatting which may be brittle if sqlglot's pretty-printing changes.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment raises a valid concern - the test is now asserting exact string matching including whitespace and newlines, which could break if the SQL formatter's output changes. However, this seems to be an intentional change to match the actual behavior of the system. The test should reflect the real behavior.
The comment may be too speculative - we don't know if sqlglot's formatting is actually likely to change. The test might intentionally verify the exact formatting.
While the formatting could change, tests should verify actual behavior. If the formatting changes, the test should be updated to match the new behavior.
The comment should be deleted. While it raises a valid point about test brittleness, exact string matching is sometimes necessary in tests and changes to the formatting would require intentional test updates.

8. tests/unit_tests/query_builders/test_sql_parser.py:62

Draft comment:
Basing tests on exact output formatting can lead to brittle tests if the underlying formatter changes. Consider using a more flexible comparison.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_A6ReyntpAVBI0Alg

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/data_loader/duck_db_connection_manager.py

pandasai/query_builders/sql_parser.py

fix: transpiling sql to specific dialect

b98fb72

ellipsis-dev bot reviewed Feb 7, 2025

View reviewed changes

pandasai/data_loader/duck_db_connection_manager.py Outdated Show resolved Hide resolved

pandasai/query_builders/sql_parser.py Show resolved Hide resolved

fix: remove default mysql dialect

182ce68

gventuri merged commit 5e025ed into sinaptik-ai:main Feb 7, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: transpiling sql to specific dialect #1596

fix: transpiling sql to specific dialect #1596

scaliseraoul commented Feb 7, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

fix: transpiling sql to specific dialect #1596

fix: transpiling sql to specific dialect #1596

Conversation

scaliseraoul commented Feb 7, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

scaliseraoul commented Feb 7, 2025 •

edited by ellipsis-dev bot

Loading