Changes for 3.0 #48
Conversation
```julia
julia> using ChunkSplitters

julia> function f(x)
           s = zero(eltype(x))
           for xc in chunk(x; n=4)
               s += sum(xc)
           end
           return s
       end
f (generic function with 1 method)

julia> x = rand(10^3);

julia> f(x) ≈ sum(x)
true

julia> function f_indices(x)
           s = zero(eltype(x))
           for idcs in chunk_indices(x; n=4)
               s += sum(@view(x[idcs]))
           end
           return s
       end
f_indices (generic function with 1 method)

julia> f_indices(x) ≈ sum(x)
true

julia> @benchmark f($x)
BenchmarkTools.Trial: 10000 samples with 788 evaluations.
 Range (min … max):  160.057 ns … 195.801 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     160.426 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   160.774 ns ±   1.690 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄██▄                                        ▂
  ████▇▆▅▄▅▅▅▆▇██▆▅▅▄▄▅▅▄▃▅▅▆▄▄▅▄▅▄▄▅▆▆▇▇▇███▇▇▅▆▅▅▅▅▃▅▁▄▄▄▄▁▄▆ █
  160 ns        Histogram: log(frequency) by time         170 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f_indices($x)
BenchmarkTools.Trial: 10000 samples with 788 evaluations.
 Range (min … max):  159.688 ns … 181.313 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     159.898 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   160.350 ns ±   1.865 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▄▂  ▁                                     ▁
  █████▆▆▆▆▇█▇▇▅▆▅▄▄▄▁▁▆▅▄▅▅▄▃▃▅▅▇▆▆▇██▇▆▅▆▆▅▄▃▅▄▄▅▅▅▄▅▆▆▃▄▆▃▄▇ █
  160 ns        Histogram: log(frequency) by time         171 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
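For reference, the `size` keyword (which fixes the chunk length instead of the number of chunks `n`) works analogously. A minimal sketch, assuming the `chunk`/`chunk_indices` names proposed in this PR:

```julia
using ChunkSplitters

x = collect(1:10)

# n=4: split into four chunks of (roughly) equal length
for idcs in chunk_indices(x; n=4)
    println(idcs)
end

# size=3: chunks of length 3 (the last chunk may be shorter)
for xc in chunk(x; size=3)
    println(sum(xc))
end
```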
Preview of the revamped documentation: https://juliafolds2.github.io/ChunkSplitters.jl/previews/PR48/
On 1.10, the `size` variant of the following test allocates:

```julia
function f_chunk(x; n=nothing, size=nothing)
    s = zero(eltype(x))
    for xdata in chunk(x; n=n, size=size)
        s += sum(xdata)
    end
    return s
end

x = rand(10^3)
b = @benchmark $f_chunk($x; n=4) samples = 1 evals = 1
@test b.allocs == 0
b = @benchmark $f_chunk($x; size=10) samples = 1 evals = 1
if VERSION > v"1.10"
    @test b.allocs == 0
else
    @test_broken b.allocs == 0 # TODO: why does this allocate on 1.10?
end
```

```julia
julia> b = @benchmark $f_chunk($x; size=10) samples = 1 evals = 1
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 1.542 μs (0.00% GC) to evaluate,
 with a memory estimate of 6.19 KiB, over 99 allocations.
```
Alright guys, as it turned out, I had the motivation to implement this right away. Please check out the new (preview) docs and run some tests to see if things are stable. Beyond the points mentioned in the OP, I also dropped 1.6 support (1.10 is the new lower bound). If you have an idea about #48 (comment), that would be great as well.
|
I'll try to look at this tonight. I suspect there's an inference problem here on 1.10, but I'll dig in and see if we can figure out what went wrong.
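For anyone following along, a generic way to start digging into such suspected inference problems (a sketch, not specific to this PR's internals) is to inspect the inferred types of the failing call:

```julia
using InteractiveUtils: @code_warntype
using ChunkSplitters

# the f_chunk function from the snippet above
function f_chunk(x; n=nothing, size=nothing)
    s = zero(eltype(x))
    for xdata in chunk(x; n=n, size=size)
        s += sum(xdata)
    end
    return s
end

x = rand(10^3)
# red/`Any` entries in the output point at type instabilities
@code_warntype f_chunk(x; size=10)
```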
I still think that it might make sense to consider renaming […]. As for renaming […]
Codecov Report

Attention: Patch coverage is …

```
@@            Coverage Diff            @@
##             main     #48      +/-   ##
=========================================
- Coverage   96.33%  92.03%   -4.30%
=========================================
  Files           2       2
  Lines         300     113     -187
=========================================
- Hits          289     104     -185
+ Misses         11       9       -2
```
I've implemented the suggestions by @fredrikekre (thanks for them!). One remaining minor question is whether we should export the subtypes of `Split`. If we don't export them, one has to write

```julia
chunks(x; n=10, split=ChunkSplitters.RoundRobin())
# or, alternatively
using ChunkSplitters: RoundRobin
chunks(x; n=10, split=RoundRobin())
```

Again, I don't have a strong opinion but tend to slightly prefer not exporting them. We could also name them […] Opinions?
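For concreteness, the two split strategies behave roughly as follows (a sketch using the names written in the comment above; `Consecutive` is assumed to be the default, while `RoundRobin` distributes indices cyclically across chunks):

```julia
using ChunkSplitters
using ChunkSplitters: RoundRobin  # explicit `using`, as discussed

x = collect(1:8)

# default (consecutive) split: contiguous index ranges
collect(chunks(x; n=2))

# round-robin split: every n-th index lands in the same chunk
collect(chunks(x; n=2, split=RoundRobin()))
```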
> One remaining minor question is whether we should export the subtypes `<:Split`

I prefer not exporting, in particular since the practice of always explicitly `using` symbols anyway is getting more common (https://github.com/ericphanson/ExplicitImports.jl). On the one hand, it is nice that one can just do `using ChunkSplitters` and write `chunks(x; n=10, split=RoundRobin())`. This export is actually slightly annoying, though, because the perfect variable name for the return value is `chunks`, and obviously `chunks = chunks(...)` doesn't work. With the current export situation I would simply use `using ChunkSplitters: ChunkSplitters` and always use the `ChunkSplitters.` prefix for clarity.
src/internals.jl (Outdated)

```julia
    size::Int
end

struct IndexChunksIterator{T,C<:Constraint,S<:Split} <: AbstractChunksIterator{T,C,S}
```
More bikeshedding, but since you can random-access these, is the `Iterator` suffix misleading? Perhaps it is just me, but from the name I would have guessed this could only be iterated.
Bikeshedding indeed 😉 I'm fine with dropping the `Iterator` suffix. Let me just note that

- the primary use case is to iterate over these, and
- this isn't public API anyway (so we can rename these types whenever we want).
I prefer exporting the symbols; not bringing them into the namespace is always an option for the user.
@lmiq, wrong thread? I assume by "the symbols" you mean `Consecutive` and `RoundRobin`, and you aren't suggesting to make the "iterator" types public API?
FWIW, I've dropped the "Iterator" suffixes.
I like the trend of explicitly `using` symbols. Once again, not exporting would be my (slight) preference.
Yeah, I just meant that if anything […]
Just to say that I think this PR is fine, but we need to find a solution to #48 (comment), even if it means reporting an inference bug against 1.10 and seeing whether the fix can be back-ported from 1.11.
#50 fixes #48 (comment) |
Simplify iterate method to remove allocation on 1.10.
I think this is good to go (unless we want to make other breaking changes).

Thanks a lot @carstenbauer, and others of course. Great work!
Released 3.0 now. |
Closes #47

Copied from there:

A summary of the new API / changes that need to be made (likely incomplete, and maybe some points are still up for debate):

- Introduce `chunk(x; ...)`, which returns an iterator that delivers the chunks, i.e. views into `x`. (We choose `chunk`, singular, here to indicate the verb form "to chunk". This is in line with `enumerate`, `zip`, and co.)
- Rename `chunks(x; ...)` to `chunk_indices(x; ...)` and make the latter return an iterator that delivers the indices (of `x`) for each chunk.
- Iterator names: I implemented `Chunk` (returned by `chunk`) and `ChunkIndices` (returned by `chunk_indices`). This is in line with `Base.Iterators.Enumerate` (returned by `enumerate`), `Base.Pairs` (returned by `pairs`), and `Base.Iterators.Zip` (returned by `zip`). An alternative would be a single `ChunksIterator` with a type parameter `ReturnIndices` or `ReturnViews` to indicate what will eventually be returned.
- Drop the `Symbol` option for `split`. While it is slightly more convenient, it can be less efficient, and it doesn't make much sense to have two ways to do the same thing.
- Make `getchunk` internal (not public API anymore) and rename it to `getchunkindices`.

What I think might make sense but I'm unsure about / not sure it's worth it:

- Rename `BatchSplit` to `ContiguousSplit`? The latter seems to communicate the critical point more clearly: the resulting chunks hold contiguous data.
- Maybe `chunk(x; n=4, strategy=Contiguous())` or something less verbose makes more sense, as it gets rid of the "split" terminology entirely?
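Putting the summary above together, typical usage under the proposed 3.0 API would look roughly like this (a sketch based on the names listed above, not a definitive implementation):

```julia
using ChunkSplitters

x = rand(100)

# chunk: iterate over views into x
partial_sums = [sum(xc) for xc in chunk(x; n=4)]

# chunk_indices: iterate over the index ranges of each chunk
total = 0.0
for idcs in chunk_indices(x; size=25)
    total += @views sum(x[idcs])
end
```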