Support for striding / rolling windows #16

Open
mjwillson opened this issue May 28, 2021 · 1 comment

Comments

@mjwillson

Nice library, thanks! Will it work to pass e.g. dataset.rolling(...) or dataset.rolling(...).construct(...) to xarray_beam.DatasetToChunks? (I'm assuming not, or that it would result in something hugely inefficient?) If not, would it be possible to support striding / sliding windows in DatasetToChunks?

@shoyer
Member

shoyer commented May 28, 2021

DatasetToChunks is a rather sharp tool at the moment: you can pass in Dask-backed xarray.Dataset objects and it often works, but you have to be careful to ensure (1) that the dask graph is not too large to serialize and send to each worker, and (2) that it is safe to compute individual chunks from the dataset on separate workers (without running out of memory or doing too much duplicate work).

So yes, you probably could get away with passing the result of dataset.rolling(...).construct(...) into DatasetToChunks, but you would need to take some care.
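For illustration, here is a minimal sketch of what dataset.rolling(...).construct(...) produces on a small in-memory dataset. The variable name `temperature`, the window length of 3, and the stride of 2 are made up for the example, and nothing here touches xarray_beam itself. The point is that the constructed dataset logically holds `window` times as many elements as the input, which is one reason care is needed before handing it to separate workers.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset: one variable along a single "time" dimension.
ds = xr.Dataset(
    {"temperature": ("time", np.arange(8.0))},
    coords={"time": np.arange(8)},
)

# Sliding windows of length 3: adds a new trailing "window" dimension.
windows = ds.rolling(time=3).construct("window")

# Same windows, but keeping only every second window position (striding).
strided = ds.rolling(time=3).construct("window", stride=2)

# Each input element now appears in up to 3 windows, so the logical size
# grows by the window length.
n_input = ds["temperature"].size
n_windowed = windows["temperature"].size
```

The first windows are padded with NaN because they extend before the start of the data, so any downstream reduction has to decide how to treat those incomplete windows.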

Ultimately I think it would be nice to have Beam PTransforms for rolling window computation, similar to dask.array.map_overlap:
https://docs.dask.org/en/latest/array-overlap.html

I don't have any immediate plans to work on this but it would definitely be within scope for the project.
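To make the map_overlap idea concrete, here is a rough NumPy-only sketch of the pattern: split the data into chunks that each carry `window - 1` extra samples of overlap, compute the rolling statistic on each chunk independently, and concatenate. This is not an xarray_beam API; the function names (`rolling_mean`, `overlapping_chunks`, `chunked_rolling_mean`) are hypothetical, and in a Beam pipeline each `(start, chunk)` pair would presumably become one element processed by a separate worker.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_mean(values, window):
    """Mean over every full window; returns len(values) - window + 1 items."""
    return sliding_window_view(values, window).mean(axis=-1)


def overlapping_chunks(values, chunk_size, window):
    """Yield (start, chunk) pairs, each producing up to chunk_size outputs.

    Every chunk includes window - 1 extra trailing samples so that its
    windows can be computed without looking at any other chunk.
    """
    n_out = len(values) - window + 1
    for start in range(0, n_out, chunk_size):
        stop = min(start + chunk_size, n_out)
        yield start, values[start : stop + window - 1]


def chunked_rolling_mean(values, chunk_size, window):
    """Compute the rolling mean chunk by chunk and stitch the results."""
    pieces = [
        rolling_mean(chunk, window)
        for _, chunk in overlapping_chunks(values, chunk_size, window)
    ]
    return np.concatenate(pieces)


values = np.arange(10.0)
whole = rolling_mean(values, window=3)
stitched = chunked_rolling_mean(values, chunk_size=4, window=3)
```

Only the overlap region is duplicated across chunks, so the extra work scales with the window length rather than with the full dataset size.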
