Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: transpiling sql to specific dialect #1596

Merged
merged 2 commits into from
Feb 7, 2025

Conversation

scaliseraoul
Copy link
Contributor

@scaliseraoul scaliseraoul commented Feb 7, 2025


Important

Add SQL dialect transpilation to data loaders using SQLParser.transpile_sql_dialect and update tests accordingly.

  • Behavior:
    • SQLParser.transpile_sql_dialect added to convert SQL queries to specific dialects.
    • DuckDBConnectionManager.sql now transpiles queries from MySQL to DuckDB.
    • SQLDatasetLoader.execute_query and ViewDatasetLoader.execute_query transpile queries to the source type.
  • Tests:
    • Updated test_sql_loader.py to check for transpiled query format.
    • Added test_mysql_transpilation in test_sql_parser.py to verify SQL dialect conversion.

This description was created by Ellipsis for b98fb72. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to b98fb72 in 1 minute and 53 seconds

More details
  • Looked at 119 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. pandasai/data_loader/sql_loader.py:44
  • Draft comment:
    Transpiling query using 'to_dialect=source_type' without specifying 'from_dialect' may lead to inconsistent behavior compared to DuckDB loader.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment raises a valid technical concern about SQL dialect conversion. However, without seeing the implementation of SQLParser.transpile_sql_dialect() or understanding the default behavior when from_dialect is omitted, we can't be certain this is actually an issue. The mention of DuckDB seems to require knowledge of other parts of the codebase. The comment is speculative ("may lead to") rather than definitive.
    I might be too harsh - this could be a real compatibility issue that an experienced developer spotted. The DuckDB reference might indicate a known issue in the system.
    While it could be a real issue, we need stronger evidence within this file's context to keep the comment. The speculative nature and reference to external components make it less actionable.
    Delete the comment because it's speculative ("may lead to") and requires knowledge of external components (DuckDB loader) to understand or verify the issue.
2. pandasai/data_loader/view_loader.py:93
  • Draft comment:
    Ensure that transpiling with 'to_dialect=source_type' correctly handles the expected dialect conversion for view queries.
  • Reason this comment was not posted:
    Marked as duplicate.
3. pandasai/data_loader/duck_db_connection_manager.py:37
  • Draft comment:
    Hardcoding the input dialect as 'mysql' may be too restrictive. Consider exposing or detecting the source dialect.
  • Reason this comment was not posted:
    Marked as duplicate.
4. pandasai/data_loader/sql_loader.py:44
  • Draft comment:
    Missing explicit 'from_dialect' parameter may lead to inconsistent transpilation compared to other parts. Confirm if input queries are in the expected dialect.
  • Reason this comment was not posted:
    Marked as duplicate.
5. pandasai/data_loader/view_loader.py:93
  • Draft comment:
    As above, consider whether a 'from_dialect' argument is needed for consistency in transpiling queries.
  • Reason this comment was not posted:
    Marked as duplicate.
6. pandasai/query_builders/sql_parser.py:55
  • Draft comment:
    Consider adding error handling (try/except) around parse_one to provide clearer feedback on SQL parsing failures.
  • Reason this comment was not posted:
    Marked as duplicate.
7. tests/unit_tests/data_loader/test_sql_loader.py:156
  • Draft comment:
    Test assertion relies on specific SQL formatting which may be brittle if sqlglot's pretty-printing changes.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment raises a valid concern - the test is now asserting exact string matching including whitespace and newlines, which could break if the SQL formatter's output changes. However, this seems to be an intentional change to match the actual behavior of the system. The test should reflect the real behavior.
    The comment may be too speculative - we don't know if sqlglot's formatting is actually likely to change. The test might intentionally verify the exact formatting.
    While the formatting could change, tests should verify actual behavior. If the formatting changes, the test should be updated to match the new behavior.
    The comment should be deleted. While it raises a valid point about test brittleness, exact string matching is sometimes necessary in tests and changes to the formatting would require intentional test updates.
8. tests/unit_tests/query_builders/test_sql_parser.py:62
  • Draft comment:
    Basing tests on exact output formatting can lead to brittle tests if the underlying formatter changes. Consider using a more flexible comparison.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_A6ReyntpAVBI0Alg


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/data_loader/duck_db_connection_manager.py Outdated Show resolved Hide resolved
pandasai/query_builders/sql_parser.py Show resolved Hide resolved
@gventuri gventuri merged commit 5e025ed into sinaptik-ai:main Feb 7, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants