feature(Views): views for a single dataframe #1594

scaliseraoul · 2025-02-06T14:04:02Z

Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.

Important

Adds support for views with columns in dataframes, updating schema, query building, and SQL sanitization logic, with corresponding tests.

Behavior:
- create() in __init__.py now supports views with columns, updating schema creation logic.
- check_columns_relations() in semantic_layer_schema.py now allows views with either relations or columns.
- ViewDatasetLoader in view_loader.py updated to handle dependencies when columns are present.
Sanitization:
- Renames sanitize_relation_name() to sanitize_view_column_name() in sql_sanitizer.py.
- Updates all references to sanitize_view_column_name() in view_query_builder.py and tests.
Query Building:
- ViewQueryBuilder in view_query_builder.py now normalizes view column names and aliases.
- Updates _get_table_expression() to handle views with columns.
Tests:
- Updates test_sql_sanitizer.py to test sanitize_view_column_name().
- Updates test_view_query_builder.py to test new view handling logic.

^{This description was created by}^{for 7480ad9. It will automatically update as commits are pushed.}

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 7480ad9 in 1 minute and 52 seconds

More details

Looked at 316 lines of code in 7 files
Skipped 0 files when reviewing.
Skipped posting 14 drafted comments based on config settings.

1. pandasai/__init__.py:117

Draft comment:
Good use of parsed_columns. Ensure that overriding df-inferred columns with provided ones is intentional.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

2. pandasai/data_loader/semantic_layer_schema.py:313

Draft comment:
Validation update: requiring at least one relation or column for views is clearer. Confirm that this new rule meets all use cases.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

3. pandasai/data_loader/view_loader.py:47

Draft comment:
Fallback in _get_dependencies_datasets uses schema.columns[0]. Ensure schema.columns is non-empty when relations are absent.
Reason this comment was not posted:
Marked as duplicate.

4. pandasai/helpers/sql_sanitizer.py:7

Draft comment:
Renaming to sanitize_view_column_name clarifies its purpose. No issues observed.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

5. pandasai/query_builders/view_query_builder.py:22

Draft comment:
Custom normalization methods for view column names and aliases help mitigate injection risks. Review injection tests to confirm consistency.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

6. pandasai/__init__.py:117

Draft comment:
Parsed-columns initialization is clear. Consider verifying if an empty list should be handled explicitly.
Reason this comment was not posted:
Marked as duplicate.

7. pandasai/__init__.py:130

Draft comment:
Passing 'parsed_columns' to the SemanticLayerSchema for view/source is a good update; ensure this behavior is documented for users.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
The comment is purely informative and suggests ensuring documentation, which violates the rule against making purely informative comments and asking for documentation updates.

8. pandasai/data_loader/semantic_layer_schema.py:313

Draft comment:
Updated validation now requires at least one relation or column for a view. Confirm that allowing a view with only columns (no relations) fits your design intent.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 50% <= threshold 50%
This comment is asking the PR author to confirm their design intent, which violates the rule against asking for confirmation of intention. However, it also points out a specific change in validation logic that could have significant implications, which is useful feedback. The comment could be rephrased to focus on the implications of the change rather than asking for confirmation.

9. pandasai/data_loader/semantic_layer_schema.py:332

Draft comment:
The uncovered_tables check now triggers only for multi-table scenarios. Verify that this logic correctly permits single-table views without raising errors.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
This comment is asking the PR author to verify the logic, which is against the rules. It doesn't provide a specific suggestion or point out a clear issue. It should be removed.

10. pandasai/data_loader/view_loader.py:47

Draft comment:
Accessing self.schema.columns[0] assumes columns are present when relations are absent. Consider adding a safeguard or clear documentation of preconditions.
Reason this comment was not posted:
Marked as duplicate.

11. pandasai/helpers/sql_sanitizer.py:7

Draft comment:
Renaming to 'sanitize_view_column_name' improves clarity for view contexts; ensure all call sites are updated accordingly.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment is about checking call sites after a rename, which would be automatically caught by the build/compiler. Any IDE or build system would flag unresolved references. This falls under the rule "Do NOT comment on anything that would be obviously caught by the build, such as variable renames."
The function rename might have semantic implications that aren't caught by the compiler, like ensuring the new name accurately reflects its usage.
While the semantic change is relevant, asking to "ensure all call sites are updated" is specifically about the mechanical rename process, which is handled by tooling.
The comment should be deleted as it primarily concerns a mechanical rename that would be caught by the build system.

12. pandasai/query_builders/view_query_builder.py:22

Draft comment:
Consider adding error-handling in 'normalize_view_column_name' to catch parsing issues if the sanitized string isn't a valid SQL expression.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment points out a legitimate concern - parse_one() could throw exceptions if the input isn't valid SQL. However, sanitize_view_column_name() should already ensure the input is safe. Looking at the usage, this is only called on column names from the schema, which should be validated elsewhere. Adding error handling here would be defensive programming but may not be necessary.
I may be assuming too much about the validation happening elsewhere. The sanitize function may not guarantee valid SQL syntax.
While input validation could fail, error handling should be handled at the schema validation level, not in this utility function. This function assumes valid inputs.
The comment should be deleted as it suggests defensive programming that would be better handled elsewhere in the validation chain.

13. pandasai/query_builders/view_query_builder.py:27

Draft comment:
The alias normalization (replacing dots with underscores) is effective; verify that the resulting aliases remain consistent across different SQL engines.
Reason this comment was not posted:
Marked as duplicate.

14. tests/unit_tests/query_builders/test_view_query_builder.py:77

Draft comment:
Injection tests are comprehensive. Ensure that the expected outputs (including alias quoting) remain stable if underlying sanitization logic evolves.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_WA7nfrDzslIdmc4w

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

gventuri

Looks strong to me, only left a minor comment

tests/unit_tests/query_builders/test_view_query_builder.py

pandasai/dataframe/virtual_dataframe.py

ArslanSaleem

Current implementation fails for mysql database it gives this error:
pandas.errors.DatabaseError: Execution failed on sql 'SELECT COUNT(*) AS "total_rows" FROM "hearts" AS hearts': (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'"hearts" AS hearts\' at line 3')

feature(Views): views for a single dataframe

7480ad9

ellipsis-dev bot reviewed Feb 6, 2025

View reviewed changes

gventuri approved these changes Feb 6, 2025

View reviewed changes

tests/unit_tests/query_builders/test_view_query_builder.py Show resolved Hide resolved

gventuri requested a review from ArslanSaleem February 6, 2025 14:46

scaliseraoul added 2 commits February 6, 2025 17:13

fix: regression on sql dataframe, added tests

522980f

fix(VirtualDataframe): postpone head loading for dependency needs

e768118

scaliseraoul commented Feb 6, 2025

View reviewed changes

pandasai/dataframe/virtual_dataframe.py Show resolved Hide resolved

ArslanSaleem requested changes Feb 6, 2025

View reviewed changes

gventuri merged commit 2455592 into sinaptik-ai:main Feb 7, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature(Views): views for a single dataframe #1594

feature(Views): views for a single dataframe #1594

scaliseraoul commented Feb 6, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

gventuri left a comment

ArslanSaleem left a comment

feature(Views): views for a single dataframe #1594

feature(Views): views for a single dataframe #1594

Conversation

scaliseraoul commented Feb 6, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

gventuri left a comment

Choose a reason for hiding this comment

ArslanSaleem left a comment

Choose a reason for hiding this comment

scaliseraoul commented Feb 6, 2025 •

edited by ellipsis-dev bot

Loading