-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Expr.to_dicts() #10697
base: main
Are you sure you want to change the base?
feat: add Expr.to_dicts() #10697
Conversation
@@ -586,6 +586,44 @@ def to_delta( | |||
with expr.to_pyarrow_batches(params=params) as batch_reader: | |||
write_deltalake(path, batch_reader, **kwargs) | |||
|
|||
@util.experimental | |||
def to_dicts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference, but maybe call it to_records()
?
One of the issues with methods like these (to_list, to_dicts, to_records) is that their API tends to add support for different shapes by stuffing in a bunch of non-mutually exclusive arguments. Can we a pick a name for this that is both concise and mostly-unambiguously indicates the shape of the dictionary/list-of-dictionaries? |
Here are my current thoughts:
|
I went through the same train of thought as Phillip and that is why I chose to_dicts after considering to_records. Phillip are you saying you are happy with to_dicts or do you see a problem with it? The two basic shapes I see are Iterable[mapping] (this PR) and Iterable[Iterable] such as an Iterable of tuples or named tuples. Is there another basic shape we might want to support? I DEFINITELY want different methods for the different shapes. Eg avoid pandas chaos where their to_records() gives you totally different things depending on what you pass to orient. But I'm not sure if I think we need a single method for every concrete type. Eg do we want to_tuples() and to_namedtuples(), or to_rows(record_type=tuple | namedtuple)? |
fixes #9185
I hope that the name
to_dicts
won't lead people to think this returns adict[column_name, list[values]
. I don't think it will. This is why I included theColumn.to_list()
method in the "See Also" section. I can be more explicit there if you think this is good. I considered the nameto_records
, but I liketo_dicts
more. Open ot other ideas.I'm not sure if we want to sneak this in for 10.0, the implementation doesn't look that hard to review, but the increased API is really what to watch out for here I think.