Changes for 3.0 #48

carstenbauer · 2024-09-18T21:47:16Z

Closes #47

Copied from there:
A summary of the new API / changes that need to be made (likely incomplete and maybe some points are still up for debate):

new function chunk(x; ...) that returns an iterator that delivers the chunks, i.e. views into x. (We choose chunk, singular, here to indicate the verb form to chunk. This is to be in line with enumerate, zip, and co.)
rename chunks(x; ...) to chunk_indices(x; ...) and make the latter return an iterator that delivers the indices (of x) for each chunk.
iterator names: Chunk (returned by chunk) and ChunkIndices (returned by chunk_indices). This is in line with Base.Iterators.Enumerate (returned by enumerate), Base.Pairs (returned by pairs), and Base.Iterators.Zip (returned by zip) I implemented a single iterator ChunksIterator that has a type parameter ReturnIndices or ReturnViews to indicate what will be eventually returned.
drop the legacy API that relies on positional arguments
drop the Symbol option for split. While it is slightly more convenient, it can be less efficient and it doesn't make much sense to have two ways to do the same thing.
make getchunk internal (not public API anymore) + rename it to getchunkindices

What I think might make sense but I'm unsure about / not sure it's worth it:

Should we rename BatchSplit to ContiguousSplit? The latter seems to communicate the critical point more clearly: the resulting chunks hold contiguous data.
I don't like that we use "chunk" on the one hand and "split" on the other hand. Why two terms for the same thing? Perhaps something like chunk(x; n=4, strategy=Contiguous()) or something less verbose makes more sense as it gets rid of "split" terminology entirely?

carstenbauer · 2024-09-18T21:54:49Z

julia> using ChunkSplitters

julia> function f(x)
           s = zero(eltype(x))
           for xc in chunk(x; n=4)
               s += sum(xc)
           end
           return s
       end
f (generic function with 1 method)

julia> x=rand(10^3);

julia> f(x) ≈ sum(x)
true

julia> function f_indices(x)
           s = zero(eltype(x))
           for idcs in chunk_indices(x; n=4)
               s += sum(@view(x[idcs]))
           end
           return s
       end
f_indices (generic function with 1 method)

julia> f_indices(x) ≈ sum(x)
true

julia> @benchmark f($x)
BenchmarkTools.Trial: 10000 samples with 788 evaluations.
 Range (min … max):  160.057 ns … 195.801 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     160.426 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   160.774 ns ±   1.690 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄██▄                                                          ▂
  ████▇▆▅▄▅▅▅▆▇██▆▅▅▄▄▅▅▄▃▅▅▆▄▄▅▄▅▄▄▅▆▆▇▇▇███▇▇▅▆▅▅▅▅▃▅▁▄▄▄▄▁▄▆ █
  160 ns        Histogram: log(frequency) by time        170 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f_indices($x)
BenchmarkTools.Trial: 10000 samples with 788 evaluations.
 Range (min … max):  159.688 ns … 181.313 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     159.898 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   160.350 ns ±   1.865 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▄▂      ▁                                                   ▁
  █████▆▆▆▆▇█▇▇▅▆▅▄▄▄▁▁▆▅▄▅▅▄▃▃▅▅▇▆▆▇██▇▆▅▆▆▅▄▃▅▄▄▅▅▅▄▅▆▆▃▄▆▃▄▇ █
  160 ns        Histogram: log(frequency) by time        171 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

carstenbauer · 2024-09-19T23:32:14Z

Preview of the revamped documentation: https://juliafolds2.github.io/ChunkSplitters.jl/previews/PR48/

carstenbauer · 2024-09-20T09:33:27Z

On 1.10, chunk(x; size=10) seems to allocate whereas chunk_indices(x; size=10) does not. Why?

function f_chunk(x; n=nothing, size=nothing)
    s = zero(eltype(x))
    for xdata in chunk(x; n=n, size=size)
        s += sum(xdata)
    end
    return s
end
x = rand(10^3)
b = @benchmark $f_chunk($x; n=4) samples = 1 evals = 1
@test b.allocs == 0
b = @benchmark $f_chunk($x; size=10) samples = 1 evals = 1
if VERSION > v"1.10"
    @test b.allocs == 0
else
    @test_broken b.allocs == 0 # TODO: why does this allocate on 1.10?
end

julia> b = @benchmark $f_chunk($x; size=10) samples = 1 evals = 1
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 1.542 μs (0.00% GC) to evaluate,
 with a memory estimate of 6.19 KiB, over 99 allocations.

carstenbauer · 2024-09-20T10:10:16Z

I noticed that the svg in the old/current docs is wrong.

Indicating the scattering by coloring chunks:

Indicating the scattering by arrows:

I prefer the latter (and will use it).

carstenbauer · 2024-09-20T10:30:28Z

Alright guys, as it turned out, I had the motivation to implement this right away. Please checkout the new (preview) docs and run some tests to see if things are stable.

What I did beyond the points mentioned in the OP is drop 1.6 support (1.10 is the new lower bound).

Also, if you have an idea about #48 (comment) that would be great as well.

carstenbauer · 2024-09-20T10:33:57Z

add a transition guide, that highlights the most important breaking changes (→ I've added a CHANGELOG.md)
test OhMyThreads.jl with this (→ Prepare for ChunkSplitters 3.0 OhMyThreads.jl#119)
update or drop examples/ folder

MasonProtter · 2024-09-20T10:37:59Z

I'll try to look at this tonight. I suspect there's an inference problem here on 1.10, but I'll dig in and see if we can figure out what went wrong.

carstenbauer · 2024-09-21T12:09:30Z

I also like these changes, in any case.

I still think that it might make sense to consider renaming Batch → Consecutive and Scatter → RoundRobin. (I've just done this in a new commit).

As for renaming split to strategy (or something similar), there are two reasons I didn't do it right away (but I'm still open to be convinced):

strategy feels long compared to split
The package name ChunkSplitters.jl already establishes "chunk" and "split" as terminology.

codecov-commenter · 2024-09-21T12:19:51Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 92.03540% with 9 lines in your changes missing coverage. Please review.

Project coverage is 92.03%. Comparing base (ceadbb6) to head (1d4cdf4).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/internals.jl	91.89%	9 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
- Coverage   96.33%   92.03%   -4.30%     
==========================================
  Files           2        2              
  Lines         300      113     -187     
==========================================
- Hits          289      104     -185     
+ Misses         11        9       -2

Flag	Coverage Δ
	`92.03% <92.03%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/internals.jl

…erator

carstenbauer · 2024-09-24T06:43:18Z

I've implemented the suggestions by @fredrikekre (thanks for them!).

One remaining minor question is whether we should export the subtypes <:Split, i.e. Consecutive and RoundRobin (in any case, they are public API). On the one hand, it is nice that one can just do using ChunkSplitters and write chunks(x; n=10, split=RoundRobin()). OTOH, the terms are quite generic and might cause name clashes (e.g. https://juliahub.com/ui/Search?type=symbols&q=Consecutive&u=define).

If we don't export them one has to write

chunks(x; n=10, split=ChunkSplitters.RoundRobin())

# or, alternatively

using ChunkSplitters: RoundRobin
chunks(x; n=10, split=RoundRobin())

Again, I don't have a strong opinion but tend to slightly prefer not exporting them.

We could also name them RoundRobinSplit and ConsecutiveSplit, of course, at the cost of redundancy (split=RoundRobinSplit()).

Opinions?

fredrikekre

One remaining minor question is whether we should export the subtypes <:Split

I prefer not exporting, in particular since the practice of always explicitly using symbols anyway is getting more common (https://github.com/ericphanson/ExplicitImports.jl).

On the one hand, it is nice that one can just do using ChunkSplitters and write chunks(x; n=10, split=RoundRobin()).

This export is actually slightly annoying because the perfect variable name for the return value is chunks, and obviously chunks = chunks(...) doesn't work.

With the current export situation I would simply use

using ChunkSplitters: ChunkSplitters

and always use the ChunkSplitters. prefix for clarity.

src/internals.jl

fredrikekre · 2024-09-24T07:59:17Z

src/internals.jl

+    size::Int
+end
+
+struct IndexChunksIterator{T,C<:Constraint,S<:Split} <: AbstractChunksIterator{T,C,S}


More bikeshedding, but since you can random access these, is the Iterator suffix misleading? Perhaps it is just me, but I would have guessed this could only be iterated from the name.

Bikeshedding indeed 😉 I'm fine dropping the Iterator suffix. Let me just note that

the primary use case is to iterate over these, and

this isn't public API anyways (so we can rename these types whenever we want)

I prefer exporting the symbols, not bringing them to the namespace is always an option for the user.

@lmiq, wrong thread? I assume by "the symbols" you mean Consecutive and RoundRobin and aren't suggesting to make the "iterator" types public API?

FWIW, I've dropped the "Iterator" suffixes.

carstenbauer · 2024-09-24T09:26:35Z

I prefer not exporting, in particular since the practice of always explicitly using symbols anyway is getting more common (ericphanson/ExplicitImports.jl).

On the one hand, it is nice that one can just do using ChunkSplitters and write chunks(x; n=10, split=RoundRobin()).

I like the trend of explicitly using the bits that one actually needs. However, for this case, it doesn't matter whether we export the <:Split types or not. The question is about what should be "imported" if one simply does using ChunkSplitters. Not exporting anything (are you suggesting that?) seems radical to me. The naive using ChunkSplitters is useful in interactive workflows where I don't want to explicitly list everything I need.

This export is actually slightly annoying because the perfect variable name for the return value is chunks, and obviously chunks = chunks(...) doesn't work.

With the current export situation I would simply use
using ChunkSplitters: ChunkSplitters
and always use the ChunkSplitters. prefix for clarity.

Once again, not exporting chunks, the main API of this package, seems bad/too radical to me. And while I see your point about the chunks name clash, the same could be said about size = size(...), length = length(...), and probably many more functions. Moreover, the typical use case of chunks(...) is in combination with for or map in which case you don't store it in a variable anyways. As you pointed out, you can always opt-out of the "import everything".

fredrikekre · 2024-09-24T10:26:58Z

Yea I just meant that if anything chunks would be the "problematic" export, not the split types.

lmiq · 2024-09-24T12:58:11Z

Just to say that I think this PR is fine, but we need to find a solution to #48 (comment) .

(even if it means reporting an inference bug to 1.10.5 and see if the solution can be back-ported from 1.11)

fredrikekre · 2024-09-24T13:58:13Z

#50 fixes #48 (comment)

Simplify iterate method to remove allocation on 1.10.

carstenbauer · 2024-09-24T17:08:32Z

I think this is good to go (unless we want to make other breaking changes).

lmiq · 2024-09-25T00:12:36Z

Thanks a lot @carstenbauer, and others of course, great work!

lmiq · 2024-09-25T11:32:11Z

Released 3.0 now.

carstenbauer added 2 commits September 18, 2024 23:45

a start...

f22d31b

continue

e9eba84

carstenbauer added 9 commits September 19, 2024 11:54

tests, drop symbol interface etc

89915d5

more

0d20833

docstrings

709d0d8

more

d127c7f

among other things, rename: ConsecutiveSplit() and RoundRobinSplit()gs

575bd6b

more

38fd7ba

README

f442705

docs

e37cea4

push_preview=true

51056ad

carstenbauer added 4 commits September 20, 2024 01:34

test_broken -> test

74d5ed3

drop 1.6; make julia 1.10 new lower bound

1a1aee7

mark chunk + size alloc test as broken on 1.10

8b763fc

make downgrade ci use 1.10

8a6bba2

carstenbauer added 3 commits September 20, 2024 11:36

ci fixes

253ece3

ci fix

19b6cf8

readme

cf0270c

carstenbauer added 2 commits September 20, 2024 12:14

svg update

e7f495b

doctests

3b09fd0

carstenbauer marked this pull request as ready for review September 20, 2024 10:30

carstenbauer requested review from lmiq and MasonProtter September 20, 2024 10:31

carstenbauer added 3 commits September 21, 2024 13:48

Update CHANGELOG.md

1870380

Update CHANGELOG.md

86cf5e2

Update CHANGELOG.md

e6b28e9

rename BatchSplit -> Consecutive and ScatterSplit -> RoundRobin

99e1aed

carstenbauer added 2 commits September 21, 2024 14:25

update svg

0501c0a

splitting strategies example in docs

b867c6d

fredrikekre reviewed Sep 22, 2024

View reviewed changes

src/internals.jl Outdated Show resolved Hide resolved

src/internals.jl Outdated Show resolved Hide resolved

carstenbauer added 3 commits September 24, 2024 08:12

rename chunk_indices -> index_chunks and chunk -> chunks

17a385a

separate (non-public) iterators: IndexChunksIterator and ViewChunksIt…

f555204

…erator

fix docs

f218eaf

fredrikekre reviewed Sep 24, 2024

View reviewed changes

fredrikekre mentioned this pull request Sep 24, 2024

Simplify iterate method to remove allocation on 1.10. #50

Merged

fredrikekre and others added 5 commits September 24, 2024 16:03

Simplify iterate method to remove allocation on 1.10.

af3e0ef

Merge pull request #50 from fredrikekre/fe/allocs

6220502

Simplify iterate method to remove allocation on 1.10.

quality Base functions method overload

1d4cdf4

drop iterator suffix

7c5b8a4

rename minchunksize -> minsize

0bad0d8

carstenbauer mentioned this pull request Sep 24, 2024

3.0 #47

Closed

6 tasks

update CHANGELOG

6ae670e

lmiq merged commit 9d18e48 into main Sep 25, 2024
24 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes for 3.0 #48

Changes for 3.0 #48

carstenbauer commented Sep 18, 2024 •

edited

Loading

carstenbauer commented Sep 18, 2024

carstenbauer commented Sep 19, 2024 •

edited

Loading

carstenbauer commented Sep 20, 2024 •

edited

Loading

carstenbauer commented Sep 20, 2024 •

edited

Loading

carstenbauer commented Sep 20, 2024

carstenbauer commented Sep 20, 2024 •

edited

Loading

MasonProtter commented Sep 20, 2024

carstenbauer commented Sep 21, 2024 •

edited

Loading

codecov-commenter commented Sep 21, 2024 •

edited

Loading

carstenbauer commented Sep 24, 2024 •

edited

Loading

fredrikekre left a comment

fredrikekre Sep 24, 2024

carstenbauer Sep 24, 2024

lmiq Sep 24, 2024

carstenbauer Sep 24, 2024

carstenbauer Sep 24, 2024

carstenbauer commented Sep 24, 2024 •

edited

Loading

fredrikekre commented Sep 24, 2024

lmiq commented Sep 24, 2024 •

edited

Loading

fredrikekre commented Sep 24, 2024

carstenbauer commented Sep 24, 2024

lmiq commented Sep 25, 2024

lmiq commented Sep 25, 2024

Changes for 3.0 #48

Changes for 3.0 #48

Conversation

carstenbauer commented Sep 18, 2024 • edited Loading

carstenbauer commented Sep 18, 2024

carstenbauer commented Sep 19, 2024 • edited Loading

carstenbauer commented Sep 20, 2024 • edited Loading

carstenbauer commented Sep 20, 2024 • edited Loading

carstenbauer commented Sep 20, 2024

carstenbauer commented Sep 20, 2024 • edited Loading

MasonProtter commented Sep 20, 2024

carstenbauer commented Sep 21, 2024 • edited Loading

codecov-commenter commented Sep 21, 2024 • edited Loading

Codecov Report

carstenbauer commented Sep 24, 2024 • edited Loading

fredrikekre left a comment

Choose a reason for hiding this comment

fredrikekre Sep 24, 2024

Choose a reason for hiding this comment

carstenbauer Sep 24, 2024

Choose a reason for hiding this comment

lmiq Sep 24, 2024

Choose a reason for hiding this comment

carstenbauer Sep 24, 2024

Choose a reason for hiding this comment

carstenbauer Sep 24, 2024

Choose a reason for hiding this comment

carstenbauer commented Sep 24, 2024 • edited Loading

fredrikekre commented Sep 24, 2024

lmiq commented Sep 24, 2024 • edited Loading

fredrikekre commented Sep 24, 2024

carstenbauer commented Sep 24, 2024

lmiq commented Sep 25, 2024

lmiq commented Sep 25, 2024

carstenbauer commented Sep 18, 2024 •

edited

Loading

carstenbauer commented Sep 19, 2024 •

edited

Loading

carstenbauer commented Sep 20, 2024 •

edited

Loading

carstenbauer commented Sep 20, 2024 •

edited

Loading

carstenbauer commented Sep 20, 2024 •

edited

Loading

carstenbauer commented Sep 21, 2024 •

edited

Loading

codecov-commenter commented Sep 21, 2024 •

edited

Loading

carstenbauer commented Sep 24, 2024 •

edited

Loading

carstenbauer commented Sep 24, 2024 •

edited

Loading

lmiq commented Sep 24, 2024 •

edited

Loading