Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Expr.to_dicts() #10697

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Jan 21, 2025

fixes #9185

I hope that the name to_dicts won't lead people to think this returns a dict[column_name, list[values]. I don't think it will. This is why I included the Column.to_list() method in the "See Also" section. I can be more explicit there if you think this is good. I considered the name to_records, but I like to_dicts more. Open ot other ideas.

I'm not sure if we want to sneak this in for 10.0, the implementation doesn't look that hard to review, but the increased API is really what to watch out for here I think.

@github-actions github-actions bot added the tests Issues or PRs related to tests label Jan 21, 2025
@NickCrews NickCrews added the docs-preview Add this label to trigger a docs preview label Jan 21, 2025
@ibis-docs-bot ibis-docs-bot bot removed the docs-preview Add this label to trigger a docs preview label Jan 21, 2025
@NickCrews NickCrews added feature Features or general enhancements format Issues related to output formats labels Jan 21, 2025
@ibis-docs-bot
Copy link

ibis-docs-bot bot commented Jan 21, 2025

@@ -586,6 +586,44 @@ def to_delta(
with expr.to_pyarrow_batches(params=params) as batch_reader:
write_deltalake(path, batch_reader, **kwargs)

@util.experimental
def to_dicts(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference, but maybe call it to_records()?

@cpcloud
Copy link
Member

cpcloud commented Jan 22, 2025

One of the issues with methods like these (to_list, to_dicts, to_records) is that their API tends to add support for different shapes by stuffing in a bunch of non-mutually exclusive arguments.

Can we a pick a name for this that is both concise and mostly-unambiguously indicates the shape of the dictionary/list-of-dictionaries?

@cpcloud
Copy link
Member

cpcloud commented Jan 22, 2025

Here are my current thoughts:

  • to_dicts: implies a sequence because of the pluralization, pigeonholes the output type in perpetuity to literal python dictionaries
  • to_records: implies a mapping or something like named tuple (a thing with fields), leaves the individual record type open.

@NickCrews
Copy link
Contributor Author

I went through the same train of thought as Phillip and that is why I chose to_dicts after considering to_records. Phillip are you saying you are happy with to_dicts or do you see a problem with it?

The two basic shapes I see are Iterable[mapping] (this PR) and Iterable[Iterable] such as an Iterable of tuples or named tuples. Is there another basic shape we might want to support?

I DEFINITELY want different methods for the different shapes. Eg avoid pandas chaos where their to_records() gives you totally different things depending on what you pass to orient. But I'm not sure if I think we need a single method for every concrete type. Eg do we want to_tuples() and to_namedtuples(), or to_rows(record_type=tuple | namedtuple)?
I think I am more of a fan of splitting them into separate methods. One reason is that namedtuples blue the line between being an Iterable and being a mapping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements format Issues related to output formats tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Table.to_dicts()
3 participants