-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature(Views): adding view for local sources #1586
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to d4b9671 in 2 minutes and 56 seconds
More details
- Looked at
953
lines of code in27
files - Skipped
0
files when reviewing. - Skipped posting
11
drafted comments based on config settings.
1. tests/unit_tests/data_loader/test_sql_loader.py:124
- Draft comment:
Ensure that expected transformations (like email anonymization and timestamp timezone conversion) are stable. Consider comparing key parts of the output rather than the full DataFrame to avoid brittle tests. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. tests/unit_tests/prompts/test_sql_prompt.py:63
- Draft comment:
The expected prompt string is very long and hardcoded. Consider using a dedicated fixture or helper to generate the expected output for better maintainability and to reduce brittleness if formatting changes. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. tests/unit_tests/query_builders/test_view_query_builder.py:24
- Draft comment:
The expected multi-line query string in 'test_build_query' is sensitive to whitespace/line breaks. Consider normalizing whitespace in both the generated and expected queries before comparing. - Reason this comment was not posted:
Comment was on unchanged code.
4. tests/unit_tests/data_loader/test_loader.py:62
- Draft comment:
Using 'builtins.open' patch for schema file loading is good, but make sure file encoding and newline consistency are handled in both production and test code. - Reason this comment was not posted:
Confidence changes required:50%
None
5. tests/unit_tests/data_loader/test_duckdbmanager.py:11
- Draft comment:
Typo in test method name: 'doesnt trow' should be 'doesn't throw'. - Reason this comment was not posted:
Marked as duplicate.
6. tests/unit_tests/data_loader/test_loader.py:58
- Draft comment:
Ensure schema names are sanitized properly. The test 'test_load_schema_mysql_sanitized_name' expects 'non_sanitized-name' to become 'non_sanitized_name'. Verify that the sanitizer consistently replaces invalid characters with underscores. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. tests/unit_tests/query_builders/test_view_query_builder.py:24
- Draft comment:
The expected multiline query string in 'test_build_query' is tightly coupled with whitespace and newlines. Consider using textwrap.dedent to standardize formatting and reduce brittleness in tests. - Reason this comment was not posted:
Marked as duplicate.
8. tests/unit_tests/prompts/test_sql_prompt.py:65
- Draft comment:
The long expected prompt string literal could be simplified using a dedented multiline string (using textwrap.dedent), to improve readability and reduce potential whitespace issues. - Reason this comment was not posted:
Comment was on unchanged code.
9. tests/unit_tests/helpers/test_sql_sanitizer.py:63
- Draft comment:
The test 'test_safe_select_with_comment' correctly blocks queries with comments. Consider adding a note if safe inline comments should ever be permitted or if this strict rule is intentional for security. - Reason this comment was not posted:
Confidence changes required:50%
None
10. pandasai/query_builders/sql_query_builder.py:5
- Draft comment:
Ensure that using lower() on the table name in _get_from_statement is consistent with desired behavior; if source table names may have uppercase letters, this conversion might cause issues in some SQL engines. - Reason this comment was not posted:
Comment did not seem useful.
11. pandasai/query_builders/view_query_builder.py:60
- Draft comment:
The alias for the view schema uses a hyphen (e.g. 'parent-children'). SQL aliases with hyphens may require quoting. Consider using underscores or quoting aliases to avoid SQL syntax issues. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_GrbqqsySHoc9gFIK
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good overall, only left a couple of comments!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks very good. Found two issues apart for sql injection.
- In pull method it uses
dataset_loader.load(self.path)
where load method doesn't take any argument. - In case of sql connector it is failing in
code_cleaning._clean_sql_query
where allowed table is passed as schema.name which doesn't exists in the database.
Hi @ArslanSaleem , good catch for n1, could you iterate on n2, cause schema.name it's the dataset name, and it's always present |
* commit '36b26ed4566f3a21e70ea00b5c39543d05dfbec9': chore: remove numpy dependency
…ed one, using, refactor query builders to use sqlglot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvements, only a couple of leftovers
Important
Introduces local data source handling with
DuckDBConnectionManager
, new query builders, and updates toDatasetLoader
andAgent
classes, along with comprehensive tests.DuckDBConnectionManager
induck_db_connection_manager.py
for managing DuckDB connections.LocalQueryBuilder
,SqlQueryBuilder
, andViewQueryBuilder
inquery_builders
for handling different query types.DatasetLoader
inloader.py
to support local, SQL, and view-based datasets.Agent
class inbase.py
to useDuckDBConnectionManager
for executing local SQL queries.SemanticLayerSchema
insemantic_layer_schema.py
to handle local sources without requiring a table.sanitize_relation_name
insql_sanitizer.py
for sanitizing relation names.DuckDBConnectionManager
,DatasetLoader
, and query builders intest_duckdbmanager.py
,test_loader.py
, andtest_query_builder.py
.test_sql_sanitizer.py
.query_builder.py
tobase_query_builder.py
and moved toquery_builders
directory.This description was created by
for d4b9671. It will automatically update as commits are pushed.