docs: update v3 documentation, readme and examples #1526

Merged 6 commits on Jan 17, 2025
8 changes: 6 additions & 2 deletions README.md
@@ -25,12 +25,16 @@ Load your data, save them as a dataframe, and push them to the platform
```python
import pandasai as pai

pai.api_key.set("your-pai-api-key")

df = pai.read_csv("./filepath.csv")
df.push()

df.save(path="your-organization/dataset-name",
df = pai.create(path="your-organization/dataset-name",
    df=df,
    name="dataset-name",
    description="dataset-description")

df.push()
```
Your team can now access and query this data using natural language through the platform.
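As an illustrative sketch of what that looks like (reusing the `pai.load()` and `chat()` calls shown elsewhere in these docs; the path and question are placeholders), a teammate could load the shared dataset and query it:

```python
import pandasai as pai

pai.api_key.set("your-pai-api-key")

# Load the dataset that was pushed above (placeholder path)
shared_df = pai.load("your-organization/dataset-name")

# Ask a question in natural language
response = shared_df.chat("Which records have the highest values?")
print(response)
```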

4 changes: 3 additions & 1 deletion docs/v3/data-ingestion.mdx
@@ -24,8 +24,10 @@ import pandasai as pai
df = pai.read_csv("data.csv")

# Use the semantic layer on CSV
df.save(
df = pai.create(
path="company/sales-data",
name="sales_data",
df = df,
description="Sales data from our retail stores",
columns={
"transaction_id": {"type": "string", "description": "Unique identifier for each sale"},
14 changes: 6 additions & 8 deletions docs/v3/getting-started.mdx
@@ -19,7 +19,7 @@ pip install pandasai
## Chat with your data

In order to use PandaAI, you need a large language model (LLM). While you can use any LLM, for the purpose of this guide, we are using BambooLLM.
You can get your free API key signing up at [pandabi.ai](https://app.pandabi.ai), which allows you to both use the data platform and get BambooLLM credits.
You can get your free API key by signing up at [app.pandabi.ai](https://app.pandabi.ai), which allows you to both use the data platform and get BambooLLM credits.

```python
import pandasai as pai
@@ -49,7 +49,7 @@ Depending on your question, it can return different objects:

Find out more about output data formats [here](/v3/output-formats)

### Save and load dataframes
### Create and load dataframes

To work faster with your data, you can save dataframes.
This allows you to avoid reading the data every time.
@@ -60,20 +60,18 @@ import pandasai as pai
# read csv - replace "filepath" with your file path
df = pai.read_csv("filepath")

df.save(path="organization/dataset-name",
df = pai.create(path="organization/dataset-name",
name="dataset-name",
df = df,
description="describe your dataset")
```

The saved dataframe can then be loaded using the `load` method.
The created dataframe can then be loaded using the `load` method.

```python
import pandasai as pai

# read csv - replace "filepath" with your file path
df = pai.read_csv("filepath")

df.load("organization/dataset-name")
df = pai.load("organization/dataset-name")
```

## Share with your team
66 changes: 58 additions & 8 deletions docs/v3/semantic-layer.mdx
@@ -14,18 +14,19 @@ The semantic layer allows you to turn raw data into [dataframes](/v3/dataframes)

There are two ways to use the semantic layer:

### For CSV files: using the save method
### For CSV files: using the create method

The simplest way to create a semantic layer for CSV files is using the `save` method:
The simplest way to create a semantic layer for CSV files is using the `create` method:

```python
import pandasai as pai

df = pai.read_csv("data.csv")

df.save(
df = pai.create(
path="company/sales-data", # Format: "organization/dataset"
name="sales-data", # Human-readable name
df = df, # Input Dataframe
description="Sales data from our retail stores", # Optional description
columns=[
{
@@ -44,10 +45,12 @@

#### name

The name field identifies your dataset in the save method.
The name field identifies your dataset in the create method.

```python
df.save(
df = pai.read_csv("data.csv")

pai.create(
path="company/sales-data",
name="sales-data", # Unique, descriptive name
...
@@ -59,13 +62,55 @@ df.save(
- Unique within your project
- Examples: "sales-data", "customer-profiles"

#### path

The path uniquely identifies your dataset in the PandaAI ecosystem using the format "organization/dataset".

```python
df = pai.read_csv("data.csv")

pai.create(
path="acme-corp/sales-data", # Format: "organization/dataset"
...
)
```

**Type**: `str`
- Must follow the format: "organization-identifier/dataset-identifier"
- Organization identifier should be unique to your organization
- Dataset identifier should be unique within your organization
- Can be used both locally and with the PandaAI Data Platform
- Examples: "acme-corp/sales-data", "my-org/customer-profiles"

#### df

The input dataframe that contains your data, typically created using `pai.read_csv()`.

```python
df = pai.read_csv("data.csv") # Create the input dataframe

pai.create(
path="acme-corp/sales-data",
df=df, # Pass your dataframe here
...
)
```

**Type**: `DataFrame`
- Must be a DataFrame created with `pai.read_csv()`
- Contains the raw data you want to enhance with semantic information
- Required parameter for creating a semantic layer

#### description
A clear text description that helps others understand the dataset's contents and purpose.

```python
df.save(
df = pai.read_csv("data.csv")

pai.create(
path="company/sales-data",
name="sales-data",
df = df,
description="Daily sales transactions from all retail stores, including transaction IDs, dates, and amounts",
...
)
@@ -77,14 +122,19 @@ df.save(
- Any relevant context about data collection or usage
- Optional but recommended for better data understanding


#### columns
Define the structure and metadata of your dataset's columns to help PandaAI understand your data better.

**Note**: If the `columns` parameter is not provided, all columns from the input dataframe will be included in the semantic layer.
When specified, only the declared columns will be included, allowing you to select specific columns for your semantic layer.

```python
df.save(
df = pai.read_csv("data.csv")

pai.create(
path="company/sales-data",
name="sales-data",
df = df,
description="Daily sales transactions from all retail stores",
columns=[
{
18 changes: 10 additions & 8 deletions examples/data_platform_guide.ipynb
@@ -60,9 +60,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Save Dataframes\n",
"## 1. Create Dataframes\n",
"\n",
"The `save()` method allows you to save your dataframes with metadata and column descriptions. This enriches your data with semantic meaning."
"The `create()` method allows you to save your dataframes with metadata and column descriptions. This enriches your data with semantic meaning."
]
},
{
@@ -71,10 +71,11 @@
"metadata": {},
"outputs": [],
"source": [
"# Save heart disease dataset with semantic information\n",
"heart_df.save(\n",
"# Create heart disease dataset with semantic information\n",
"heart = pai.create(\n",
" path=\"my-team/heart\",\n",
" name=\"Heart Disease Data\",\n",
" df = heart_df,\n",
" description=\"Dataset containing heart disease patient information\",\n",
" columns=[\n",
" {\"name\": \"Age\", \"type\": \"integer\", \"description\": \"Age of the patient in years\"},\n",
@@ -90,9 +91,10 @@
")\n",
"\n",
"# Save loans dataset\n",
"loans_df.save(\n",
"loans = pai.create(\n",
" path=\"my-team/loans\",\n",
" name=\"Loan Payments Data\",\n",
" df = loans_df,\n",
" description=\"Dataset containing loan payment information\",\n",
" columns=[\n",
" {\"name\": \"loan_id\", \"type\": \"integer\", \"description\": \"Unique identifier for each loan\"},\n",
@@ -120,8 +122,8 @@
"outputs": [],
"source": [
"# Push datasets to platform\n",
"heart_df.push('my-team-slug/heart')\n",
"loans_df.push('my-team-slug/loans')"
"heart.push('my-team-slug/heart')\n",
"loans.push('my-team-slug/loans')"
]
},
{
@@ -160,7 +162,7 @@
"outputs": [],
"source": [
"# Pull latest versions\n",
"latest_heart = pai.pull('my-team--slug/heart')\n",
"latest_heart = pai.pull('my-team-slug/heart')\n",
"latest_loans = pai.pull('my-team-slug/loans')"
]
},
35 changes: 21 additions & 14 deletions examples/quickstart.ipynb
@@ -9,6 +9,11 @@
"This notebook demonstrates how to get started with PandaAI and how to use it to analyze data through natural language."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
@@ -52,11 +57,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save DataFrame\n",
"## Chat with Your Data\n",
"\n",
"To use PandaAI in combination with our Data Platform, you need to save your dataframes. \n",
"The path must be in format 'organization/dataset'. \n",
"You can create organizations directly within the data platform ([pandabi.ai](https://pandabi.ai))"
"You can ask questions about your data using natural language"
]
},
{
@@ -65,18 +68,19 @@
"metadata": {},
"outputs": [],
"source": [
"df.save(path=\"your-organization/heart\",\n",
" name=\"Heart\",\n",
" description=\"Heart Disease Dataset\")"
"response = df.chat(\"What is the correlation between age and cholesterol?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Chat with Your Data\n",
"## Create Dataset\n",
"\n",
"Now you can ask questions about your data using natural language"
"To use PandaAI in combination with our Data Platform, you need to save your dataframes. \n",
"The path must be in format 'organization/dataset'. \n",
"You can create organizations directly within the data platform ([pandabi.ai](https://pandabi.ai))"
]
},
{
@@ -85,15 +89,17 @@
"metadata": {},
"outputs": [],
"source": [
"response = df.chat(\"What is the correlation between age and cholesterol?\")\n",
"print(response)"
"dataset = pai.create(path=\"your-organization/heart\",\n",
" name=\"Heart\",\n",
" df = df,\n",
" description=\"Heart Disease Dataset\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Turn the dataframe into a chatbot\n",
"## Turn the dataframe into a shareble link with a built-in chatbot\n",
"\n",
"Push the dataframe to the data platform for collaboration.\n",
"This will turn the dataframe into a chatbot and allow non technical users to interact with your data using natural language."
@@ -105,8 +111,9 @@
"metadata": {},
"outputs": [],
"source": [
"df = pai.load(\"your-organization/heart\")\n",
"df.push()"
"dataset = pai.load(\"your-organization/heart\")\n",
"\n",
"dataset.push()"
]
}
],
2 changes: 2 additions & 0 deletions examples/semantic_layer_csv.ipynb
@@ -62,6 +62,7 @@
"Requirements for the semantic layer:\n",
"- `path`: Must be in format 'organization/dataset'\n",
"- `name`: A descriptive name for the dataset\n",
"- `df`: A dataframe\n",
"- `description`: Brief overview of the dataset\n",
"- `columns`: List of dictionaries with format:\n",
" ```python\n",
@@ -82,6 +83,7 @@
"df.save(path=\"organization/heart\",\n",
" name=\"Heart\",\n",
" description=\"Heart Disease Dataset\",\n",
" df = df,\n",
" columns=[\n",
" {\n",
" \"name\": \"Age\",\n",