fix[SIN-292]: remove virtualized param from data loader
gventuri committed Jan 16, 2025
1 parent 9ea50c7 commit f9644cf
Showing 2 changed files with 17 additions and 15 deletions.
30 changes: 16 additions & 14 deletions docs/v3/dataframes.mdx
@@ -3,33 +3,35 @@ title: 'Semantic Dataframes'
description: 'Working with semantic dataframes in PandaAI'
---

Once you have turned raw data into semantic enhanced dataframes with the [semantic layer](/v3/semantic-layer), you can load them locally as either materialized or virtualized dataframes.
Once you have turned raw data into semantic enhanced dataframes with the [semantic layer](/v3/semantic-layer), you can load them as either materialized or virtualized dataframes, depending on the data source.
Using the `.chat` method, you can ask questions and get responses and charts.
Both materialized and virtualized dataframes can be [shared with your team](/v3/share-dataframes) by pushing them to our [data platform](/v3/ai-dashboards).
These dataframes can be [shared with your team](/v3/share-dataframes) by pushing them to our [data platform](/v3/ai-dashboards).

## Materialized Dataframes

Materialized dataframes load the entire dataset into memory, providing:
When working with local files (CSV, Parquet) or datasets based on such files, the dataframes are materialized, meaning:
- Data is loaded entirely into memory
- Fast access to all data
- Full in-memory operations
- Ideal for small to medium datasets
- Ideal for local file processing or cross-source analysis

```python
from pandasai import load
import pandas as pd
from pandasai import SmartDataframe

# Load as materialized dataframe (default)
df = load("organization/dataset-name")
# Load local files as materialized dataframes
df = pd.read_csv("local_file.csv")
smart_df = SmartDataframe(df)
```

## Virtualized Dataframes

Virtualized dataframes are ideal for large datasets as they:
- Minimize memory usage
- Load data on-demand rather than all at once
- Support the same operations as materialized dataframes
When loading remote datasets, dataframes are virtualized by default, providing:
- Minimal memory usage through on-demand data loading
- Efficient handling of large datasets
- Optimal for remote data sources

```python
from pandasai import load

# Load as virtualized dataframe
df = load("organization/dataset-name", virtualized=True)
# Load remote datasets (virtualized by default)
df = load("organization/dataset-name")
2 changes: 1 addition & 1 deletion pandasai/dataframe/base.py
@@ -251,7 +251,7 @@ def pull(self):
from pandasai import DatasetLoader

dataset_loader = DatasetLoader()
df = dataset_loader.load(self.path, virtualized=not isinstance(self, DataFrame))
df = dataset_loader.load(self.path)
self.__init__(
df, schema=df.schema, name=df.name, description=df.description, path=df.path
)