Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sandbox): adding docker sandbox #1517

Merged
merged 19 commits into from
Jan 28, 2025
Merged

feat(sandbox): adding docker sandbox #1517

merged 19 commits into from
Jan 28, 2025

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Jan 14, 2025

Important

Adds a Docker-based sandbox environment for secure code execution in PandaAI, with comprehensive documentation and testing.

  • Sandbox Environment:
    • Introduces DockerSandbox class in pandasai_docker/docker_sandbox.py for isolated code execution using Docker.
    • Adds Sandbox base class in sandbox.py for sandbox interface.
    • Updates Agent class in base.py to support sandbox execution.
  • Documentation:
    • Updates agent.mdx to include sandbox usage instructions.
    • Adds README.md for Docker sandbox extension.
  • Testing:
    • Adds unit tests for DockerSandbox in test_sandbox.py.
    • Adds tests for ResponseSerializer in test_serializer.py.
    • Adds tests for Sandbox base class in test_sandbox.py.
  • Miscellaneous:
    • Updates mint.json to change logo href.
    • Adds Dockerfile for sandbox environment setup.
    • Adds pyproject.toml for managing dependencies with Poetry.

This description was created by Ellipsis for 7c18a8c. It will automatically update as commits are pushed.

@ArslanSaleem ArslanSaleem requested a review from gventuri January 14, 2025 14:27
@ArslanSaleem ArslanSaleem marked this pull request as draft January 14, 2025 14:27
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 14, 2025
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to fdcfb87 in 1 minute and 57 seconds

More details
  • Looked at 338 lines of code in 6 files
  • Skipped 1 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. pandasai/sandbox/Dockerfile:7
  • Draft comment:
    It's recommended to specify versions for the Python packages installed to ensure reproducibility. For example:
RUN pip install pandas==1.3.3 numpy==1.21.2 matplotlib==3.4.3
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The Dockerfile installs specific Python packages but does not specify versions. This can lead to inconsistencies if the packages are updated. It's a best practice to specify package versions to ensure reproducibility.
2. pandasai/sandbox/docker_sandbox.py:99
  • Draft comment:
    Ensure that dfs is not empty before accessing dfs[0] to avoid potential IndexError.
  • Reason this comment was not posted:
    Comment did not seem useful.
3. pandasai/sandbox/docker_sandbox.py:71
  • Draft comment:
    Consider expanding the SQL query detection to include other SQL operations like "INSERT", "UPDATE", or "DELETE" for a more comprehensive extraction.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The extract_sql_queries_from_code method only looks for SQL queries containing "SELECT". This might miss other types of SQL queries like "INSERT", "UPDATE", or "DELETE". Consider expanding the logic to capture a broader range of SQL queries.
4. pandasai/sandbox/docker_sandbox.py:147
  • Draft comment:
    Consider making the path "/tmp" configurable or document the reason for choosing this path for transferring the tar archive to the container.
  • Reason this comment was not posted:
    Confidence changes required: 30%
    The pass_csv method uses a hardcoded path "/tmp" for transferring the tar archive to the container. This might not be suitable for all use cases. Consider making the path configurable or document the choice of path.
5. pandasai/sandbox/sandbox.py:8
  • Draft comment:
    raise NotImplementedError("Subclasses must implement the stop method.")
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
  1. The current error message follows a consistent pattern used throughout the file. 2. The suggested change would break this consistency. 3. Both messages convey the same meaning, but the existing one matches the style used in start(), _exec_code(), and pass_csv() methods. 4. Consistency in error messages is generally more important than minor wording preferences.
    The suggested message might be marginally more concise. The consistency argument might not be as important if the other messages should also be changed.
    While the suggested message is slightly shorter, maintaining consistency across the codebase is more valuable than a minor rewording that doesn't significantly improve clarity.
    The comment should be deleted as it suggests breaking the established error message pattern without providing meaningful improvement.
6. pandasai/sandbox/sandbox.py:6
  • Draft comment:
        raise NotImplementedError("Subclasses must implement the start method.")
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The suggested change is extremely minor - just rewording the same message in a slightly different way. Both versions are grammatically correct and clear. The original message follows the same pattern as other error messages in the file. Making this change would actually make this error message inconsistent with the others.
    The suggested wording could be considered more active voice and direct. Error messages should be clear and consistent.
    While active voice is good, consistency across error messages in the same file is more important. The original message is already clear and matches the style of other messages.
    Delete this comment as it suggests an unnecessary rewording that would make the error messages less consistent within the file.
7. pandasai/sandbox/sandbox.py:28
  • Draft comment:
            "Subclasses must implement the pass_csv method."
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The current error message follows the same pattern as other error messages in the file ("The X method must be implemented by subclasses"). The suggested change would break this consistency. While both messages convey the same meaning, consistency in error messages is valuable for readability and maintenance.
    The suggested message might be considered more concise. Error messages should be clear and to the point.
    While conciseness is good, consistency across related error messages is more important for maintainability and readability. The current message already follows a clear, established pattern.
    The comment should be deleted as it suggests breaking the consistent error message pattern established in the file without providing significant benefit.

Workflow ID: wflow_LuwXf4cZub1aDXIj


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/sandbox/sandbox.py Outdated Show resolved Hide resolved
@ArslanSaleem ArslanSaleem marked this pull request as ready for review January 21, 2025 11:33
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 21, 2025
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to b630742 in 2 minutes and 14 seconds

More details
  • Looked at 996 lines of code in 14 files
  • Skipped 2 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. examples/docker_sandbox.py:1
  • Draft comment:
    Typo in comment. Change Intall to Install.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The comment on line 1 of examples/docker_sandbox.py contains a typo in the word 'Install'.
2. extensions/sandbox/docker/pandasai_docker/docker_sandbox.py:163
  • Draft comment:
    Consider adding type annotations for the csv_data parameter in pass_csv method for better code clarity.
  • Reason this comment was not posted:
    Confidence changes required: 30%
    The DockerSandbox class in docker_sandbox.py has a method pass_csv that takes a csv_data parameter, but it is not type-annotated. Adding type annotations improves code readability and helps with static analysis.
3. extensions/sandbox/docker/pandasai_docker/docker_sandbox.py:186
  • Draft comment:
    Ensure the container is running before attempting to stop and remove it in the __del__ method to avoid potential errors.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In docker_sandbox.py, the __del__ method should check if the container is running before attempting to stop and remove it to avoid potential errors.
4. extensions/sandbox/docker/pandasai_docker/serializer.py:20
  • Draft comment:
    Ensure result contains a type key before accessing it to avoid potential KeyError.
  • Reason this comment was not posted:
    Comment did not seem useful.
5. examples/docker_sandbox.py:1
  • Draft comment:
    Typo in comment. Change Intall to Install.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The comment on line 1 of examples/docker_sandbox.py contains a typo in the word 'Install'.
6. examples/docker_sandbox.py:21
  • Draft comment:
    Typo in method call. Change sanbox.stop() to sandbox.stop().
  • Reason this comment was not posted:
    Marked as duplicate.
7. extensions/sandbox/docker/pandasai_docker/docker_sandbox.py:88
  • Draft comment:
    Typo in parameter name. Change enviroment to environment. This typo appears in multiple places in this file.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_fREAPEFqLWwZDa2Y


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

examples/docker_sandbox.py Outdated Show resolved Hide resolved
examples/docker_sandbox.py Outdated Show resolved Hide resolved
extensions/sandbox/docker/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on f880b01 in 40 seconds

More details
  • Looked at 61 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pandasai/sandbox/sandbox.py:23
  • Draft comment:
            "The _exec_code method must be implemented by subclasses."
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. extensions/sandbox/docker/pandasai_docker/docker_sandbox.py:118
  • Draft comment:
                "execute_sql_query function is not defined in the environment."
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_jU5dIBuygeamV8WF


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 221c1dc in 13 seconds

More details
  • Looked at 11 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. extensions/sandbox/docker/README.md:8
  • Draft comment:
    The change from poetry install to poetry add is correct for adding a package dependency.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The change from 'poetry install' to 'poetry add' is correct for adding a package dependency. However, the PR description is empty, which is not ideal for understanding the context of the change.
2. extensions/sandbox/docker/README.md:8
  • Draft comment:
    The change from poetry install to poetry add is correct and improves the installation instructions. No issues found here.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The change in the README.md file is correct and improves the installation instructions. No issues found.

Workflow ID: wflow_a3ZTqJFTY63AjpZz


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped PR review on a341e07 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.

Copy link
Collaborator

@gventuri gventuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, just a few leftovers!

examples/docker_sandbox.py Outdated Show resolved Hide resolved
extensions/sandbox/docker/pyproject.toml Outdated Show resolved Hide resolved
@gventuri
Copy link
Collaborator

@ArslanSaleem one of the most interesting features of the docker is the fact of running the code without internet connection.

Docker allows it with --network=no-internet.

Could we enforce it so that it can run offline?

Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 95.83333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 78.80%. Comparing base (63e8c26) to head (79f71f6).
Report is 152 commits behind head on release/v3.

Files with missing lines Patch % Lines
pandasai/agent/base.py 75.00% 1 Missing ⚠️
pandasai/core/code_execution/code_executor.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##           release/v3    #1517      +/-   ##
==============================================
- Coverage       79.09%   78.80%   -0.29%     
==============================================
  Files             149       61      -88     
  Lines            6013     2015    -3998     
==============================================
- Hits             4756     1588    -3168     
+ Misses           1257      427     -830     
Flag Coverage Δ
unittests 78.80% <95.83%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ArslanSaleem
Copy link
Collaborator Author

@ArslanSaleem one of the most interesting features of the docker is the fact of running the code without internet connection.

Docker allows it with --network=no-internet.

Could we enforce it so that it can run offline?

Done

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on a78039d in 39 seconds

More details
  • Looked at 15 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_XvwQSg5a7DWDlIzr


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/sandbox/sandbox.py Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 5d99577 in 14 seconds

More details
  • Looked at 35 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_62CoZ3EI3jQYCCc7


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jan 22, 2025
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 1806976 in 32 seconds

More details
  • Looked at 90 lines of code in 2 files
  • Skipped 1 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/v3/agent.mdx:137
  • Draft comment:
    The addition of the sandbox environment is well-documented. Ensure users are aware of the security benefits and usage instructions.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    The addition of the sandbox environment is a significant change, and the documentation should ensure that users are aware of the security benefits and usage instructions. The documentation seems clear and provides necessary details.
2. docs/v3/agent.mdx:137
  • Draft comment:
    Ensure that there are corresponding unit tests for the new sandbox feature to verify its functionality.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_OS4qeKIVzVU72BOR


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -134,6 +134,56 @@ print(response)
# The model will use the information provided in the training to generate a response
```

## Using the Sandbox Environment

To enhance security and protect against malicious code through prompt injection, PandaAI provides a sandbox environment for code execution. The sandbox runs your code in an isolated Docker container, ensuring that potentially harmful operations are contained.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it would be useful to specify that the docker daemon must be running in the machine before using this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scaliseraoul Yes definitely we should mention

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArslanSaleem great, can we add it accordingly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@scaliseraoul scaliseraoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code is solid and well tested.

There are few things that could be discussed.

  1. The first time user run
    sandbox = DockerSandbox() 2 container with random names get created in Docker. And not deleted.

  2. It would be more solid to have the serializer code as part of the pandasai-docker image instead of adding the code in the code_executor.

  3. The SQL function is still executed outside of the docker so we should think how to avoid any DML statements (INSERT, UPDATE, DELETE, DROP etc.)

  4. Should df.chat and pai.chat have the possibility to use the sandbox as well?

Copy link
Contributor

@scaliseraoul-sinaptik scaliseraoul-sinaptik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on previous feedback.

@gventuri
Copy link
Collaborator

  1. Seems to be a bug to me
  2. Not much opinionated on this. Both have pros and cons
  3. Aren’t we already checking for that?
  4. Yess, let’s also make it available to df.chat and pai.chat

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 878f325 in 50 seconds

More details
  • Looked at 85 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. extensions/sandbox/docker/pandasai_docker/Dockerfile:3
  • Draft comment:
LABEL image_name="pandasai-sandbox"
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. extensions/sandbox/docker/pandasai_docker/docker_sandbox.py:20
  • Draft comment:
        self._image_name: str = image_name
  • Reason this comment was not posted:
    Comment was not on a valid diff hunk.
3. extensions/sandbox/docker/pandasai_docker/Dockerfile:3
  • Draft comment:
    The image name in the LABEL should be consistent with the rest of the code.
LABEL image_name="pandasai-sandbox"
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_m3miv8VMEUEd1pd6


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 7c18a8c in 2 minutes and 4 seconds

More details
  • Looked at 66 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. extensions/sandbox/docker/tests/test_sandbox.py:59
  • Draft comment:
    Ensure mock_subprocess.run is called with the correct image name and arguments.
        mock_subprocess.run.assert_called_once_with([
            "docker",
            "build",
            "-f",
            dockerfile_path,
            "-t",
            image_name,
            ".",
        ], check=True, capture_output=True, text=True)
  • Reason this comment was not posted:
    Comment did not seem useful.
2. extensions/sandbox/docker/tests/test_sandbox.py:38
  • Draft comment:
    The test test_build_image has been correctly updated to mock subprocess.run instead of self._client.images.build. Ensure that the test covers all necessary scenarios for the new implementation.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The _build_image method in docker_sandbox.py has been modified to use subprocess.run instead of self._client.images.build. The corresponding test test_build_image has been updated to mock subprocess.run. This is a good practice to ensure the new implementation is tested.

Workflow ID: wflow_xeBc5cpcz4LXXbH5


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ArslanSaleem and others added 2 commits January 24, 2025 17:31
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@gventuri gventuri merged commit d4a1d0e into release/v3 Jan 28, 2025
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants