-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature(Views): views for a single dataframe #1594
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 7480ad9 in 1 minute and 52 seconds
More details
- Looked at
316
lines of code in7
files - Skipped
0
files when reviewing. - Skipped posting
14
drafted comments based on config settings.
1. pandasai/__init__.py:117
- Draft comment:
Good use of parsed_columns. Ensure that overriding df-inferred columns with provided ones is intentional. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
2. pandasai/data_loader/semantic_layer_schema.py:313
- Draft comment:
Validation update: requiring at least one relation or column for views is clearer. Confirm that this new rule meets all use cases. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
3. pandasai/data_loader/view_loader.py:47
- Draft comment:
Fallback in _get_dependencies_datasets uses schema.columns[0]. Ensure schema.columns is non-empty when relations are absent. - Reason this comment was not posted:
Marked as duplicate.
4. pandasai/helpers/sql_sanitizer.py:7
- Draft comment:
Renaming to sanitize_view_column_name clarifies its purpose. No issues observed. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
5. pandasai/query_builders/view_query_builder.py:22
- Draft comment:
Custom normalization methods for view column names and aliases help mitigate injection risks. Review injection tests to confirm consistency. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
6. pandasai/__init__.py:117
- Draft comment:
Parsed-columns initialization is clear. Consider verifying if an empty list should be handled explicitly. - Reason this comment was not posted:
Marked as duplicate.
7. pandasai/__init__.py:130
- Draft comment:
Passing 'parsed_columns' to the SemanticLayerSchema for view/source is a good update; ensure this behavior is documented for users. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
The comment is purely informative and suggests ensuring documentation, which violates the rule against making purely informative comments and asking for documentation updates.
8. pandasai/data_loader/semantic_layer_schema.py:313
- Draft comment:
Updated validation now requires at least one relation or column for a view. Confirm that allowing a view with only columns (no relations) fits your design intent. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =50%
<= threshold50%
This comment is asking the PR author to confirm their design intent, which violates the rule against asking for confirmation of intention. However, it also points out a specific change in validation logic that could have significant implications, which is useful feedback. The comment could be rephrased to focus on the implications of the change rather than asking for confirmation.
9. pandasai/data_loader/semantic_layer_schema.py:332
- Draft comment:
The uncovered_tables check now triggers only for multi-table scenarios. Verify that this logic correctly permits single-table views without raising errors. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
This comment is asking the PR author to verify the logic, which is against the rules. It doesn't provide a specific suggestion or point out a clear issue. It should be removed.
10. pandasai/data_loader/view_loader.py:47
- Draft comment:
Accessing self.schema.columns[0] assumes columns are present when relations are absent. Consider adding a safeguard or clear documentation of preconditions. - Reason this comment was not posted:
Marked as duplicate.
11. pandasai/helpers/sql_sanitizer.py:7
- Draft comment:
Renaming to 'sanitize_view_column_name' improves clarity for view contexts; ensure all call sites are updated accordingly. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment is about checking call sites after a rename, which would be automatically caught by the build/compiler. Any IDE or build system would flag unresolved references. This falls under the rule "Do NOT comment on anything that would be obviously caught by the build, such as variable renames."
The function rename might have semantic implications that aren't caught by the compiler, like ensuring the new name accurately reflects its usage.
While the semantic change is relevant, asking to "ensure all call sites are updated" is specifically about the mechanical rename process, which is handled by tooling.
The comment should be deleted as it primarily concerns a mechanical rename that would be caught by the build system.
12. pandasai/query_builders/view_query_builder.py:22
- Draft comment:
Consider adding error-handling in 'normalize_view_column_name' to catch parsing issues if the sanitized string isn't a valid SQL expression. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment points out a legitimate concern - parse_one() could throw exceptions if the input isn't valid SQL. However, sanitize_view_column_name() should already ensure the input is safe. Looking at the usage, this is only called on column names from the schema, which should be validated elsewhere. Adding error handling here would be defensive programming but may not be necessary.
I may be assuming too much about the validation happening elsewhere. The sanitize function may not guarantee valid SQL syntax.
While input validation could fail, error handling should be handled at the schema validation level, not in this utility function. This function assumes valid inputs.
The comment should be deleted as it suggests defensive programming that would be better handled elsewhere in the validation chain.
13. pandasai/query_builders/view_query_builder.py:27
- Draft comment:
The alias normalization (replacing dots with underscores) is effective; verify that the resulting aliases remain consistent across different SQL engines. - Reason this comment was not posted:
Marked as duplicate.
14. tests/unit_tests/query_builders/test_view_query_builder.py:77
- Draft comment:
Injection tests are comprehensive. Ensure that the expected outputs (including alias quoting) remain stable if underlying sanitization logic evolves. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_WA7nfrDzslIdmc4w
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks strong to me, only left a minor comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation fails for mysql database it gives this error:
pandas.errors.DatabaseError: Execution failed on sql 'SELECT COUNT(*) AS "total_rows" FROM "hearts" AS hearts': (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'"hearts" AS hearts\' at line 3')
Important
Adds support for views with columns in dataframes, updating schema, query building, and SQL sanitization logic, with corresponding tests.
create()
in__init__.py
now supports views with columns, updating schema creation logic.check_columns_relations()
insemantic_layer_schema.py
now allows views with either relations or columns.ViewDatasetLoader
inview_loader.py
updated to handle dependencies when columns are present.sanitize_relation_name()
tosanitize_view_column_name()
insql_sanitizer.py
.sanitize_view_column_name()
inview_query_builder.py
and tests.ViewQueryBuilder
inview_query_builder.py
now normalizes view column names and aliases._get_table_expression()
to handle views with columns.test_sql_sanitizer.py
to testsanitize_view_column_name()
.test_view_query_builder.py
to test new view handling logic.This description was created by
for 7480ad9. It will automatically update as commits are pushed.