-
On it. Lemme take a look this weekend. Hope it helps in the benchmarks. Also loved the article (especially the anti-patterns, I fell prey to some). Will update this very soon.
-
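A little bit complicated, needs some comments around it:

    use std::cell::Cell;

    #[inline(always)]
    pub fn copy_rep_matches_new(dest: &mut [u8], offset: usize, dest_offset: usize, length: usize)
    {
        // Each window is `diff` cells wide, so its first cell is the source byte
        // and its last cell is the destination, `dest_offset - offset` bytes ahead.
        let diff = dest_offset - offset + 1;
        // A slice of cells can be written through shared references, which lets
        // overlapping windows update the buffer without manual indexing.
        for window in Cell::from_mut(&mut dest[offset..offset + length + 1])
            .as_slice_of_cells()
            .windows(diff)
        {
            window.last().unwrap().set(window[0].get());
        }
    }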
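To make the behaviour of the function above concrete, here is a small trace on a toy buffer. The function body is copied from the post; the buffer contents and argument values are made up for illustration. One thing the trace suggests, and this is my reading of the code rather than something the post states, is that `offset + length` ends up being the index of the last byte written.

```rust
use std::cell::Cell;

pub fn copy_rep_matches_new(dest: &mut [u8], offset: usize, dest_offset: usize, length: usize) {
    let diff = dest_offset - offset + 1;
    for window in Cell::from_mut(&mut dest[offset..offset + length + 1])
        .as_slice_of_cells()
        .windows(diff)
    {
        // first cell of the window is the source, last cell is the destination
        window.last().unwrap().set(window[0].get());
    }
}

fn main() {
    // Hypothetical input: repeat the 3-byte pattern [1, 2, 3] into the zeroed tail.
    let mut buf = [1u8, 2, 3, 0, 0, 0, 0];
    copy_rep_matches_new(&mut buf, 0, 3, 6);
    assert_eq!(buf, [1, 2, 3, 1, 2, 3, 1]);

    // Distance of 1 is the classic RLE case: one byte smeared forward.
    let mut run = [7u8, 0, 0, 0];
    copy_rep_matches_new(&mut run, 0, 1, 3);
    assert_eq!(run, [7, 7, 7, 7]);
}
```

Because later windows read cells that earlier windows already wrote, the overlapping case (destination inside the region still being copied) falls out naturally, which is what an LZ77-style match copy needs.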
-
Btw, this `Cell::as_slice_of_cells` trick is good. It kills bounds checks in PNG filtering, i.e. this:

    pub fn handle_avg(
        prev_row: &[u8], raw: &[u8], current: &mut [u8], components: usize, _use_sse4: bool
    )
    {
        if raw.len() < components || current.len() < components
        {
            return;
        }
        // no simd, so just do it the old fashioned way
        // handle leftmost byte explicitly
        for i in 0..components
        {
            current[i] = raw[i].wrapping_add(prev_row[i] >> 1);
        }
        // raw length is one row, so always keep it in check
        let end = current.len().min(raw.len()).min(prev_row.len());
        if components > 8
        {
            // optimizer hint to tell the compiler that we don't see this ever happening
            return;
        }
        for i in components..end
        {
            let a = u16::from(current[i - components]);
            let b = u16::from(prev_row[i]);
            let c = (((a + b) >> 1) & 0xFF) as u8;
            current[i] = raw[i].wrapping_add(c);
        }
    }

And this:

    use std::cell::Cell;

    pub fn handle_avg(
        prev_row: &[u8], raw: &[u8], current: &mut [u8], components: usize, _use_sse4: bool
    )
    {
        if raw.len() < components || current.len() < components
        {
            return;
        }
        // handle leftmost byte explicitly
        for i in 0..components
        {
            current[i] = raw[i].wrapping_add(prev_row[i] >> 1);
        }
        // raw length is one row, so always keep it in check
        let end = current.len().min(raw.len()).min(prev_row.len());
        if components > 8
        {
            // optimizer hint to tell the compiler that we don't see this ever happening
            return;
        }
        let current_as_cells = Cell::from_mut(&mut current[..end])
            .as_slice_of_cells()
            .windows(components + 1);
        for ((current_window, byte), r) in current_as_cells
            .zip(&prev_row[components..])
            .zip(&raw[components..])
        {
            let a = u16::from(current_window.first().unwrap().get());
            let b = u16::from(*byte);
            let c = (((a + b) >> 1) & 0xFF) as u8;
            current_window.last().unwrap().set(r.wrapping_add(c));
        }
    }

These are semantically equivalent, but the latter doesn't have bounds checks in the main loop. My only problem is that the latter is code bloat, since it generates SSE code for paths greater than 8. The only one I can think of is 16-bit RGBA images (4 (color components) * 2 (bytes per component)). I should probably add a benchmark for that.
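The equivalence claim can be spot-checked with a small harness. The sketch below is not from the thread: it keeps only the main loop of each variant, under the hypothetical names `avg_indexed` and `avg_cells`, and feeds both the same made-up pseudo-random rows for every component count from 1 to 8. Agreement on these inputs is evidence, not a proof.

```rust
use std::cell::Cell;

/// Main loop of the indexed variant (hypothetical name, trimmed for the check).
fn avg_indexed(prev_row: &[u8], raw: &[u8], current: &mut [u8], components: usize) {
    let end = current.len().min(raw.len()).min(prev_row.len());
    for i in components..end {
        let a = u16::from(current[i - components]);
        let b = u16::from(prev_row[i]);
        let c = (((a + b) >> 1) & 0xFF) as u8;
        current[i] = raw[i].wrapping_add(c);
    }
}

/// Main loop of the cell-window variant (hypothetical name, trimmed for the check).
fn avg_cells(prev_row: &[u8], raw: &[u8], current: &mut [u8], components: usize) {
    let end = current.len().min(raw.len()).min(prev_row.len());
    if end <= components {
        return; // nothing past the leftmost pixel
    }
    let windows = Cell::from_mut(&mut current[..end])
        .as_slice_of_cells()
        .windows(components + 1);
    for ((w, p), r) in windows.zip(&prev_row[components..]).zip(&raw[components..]) {
        let a = u16::from(w.first().unwrap().get());
        let b = u16::from(*p);
        let c = (((a + b) >> 1) & 0xFF) as u8;
        w.last().unwrap().set(r.wrapping_add(c));
    }
}

/// Runs both variants on identical made-up data and reports whether they agree.
fn check_equivalent() -> bool {
    // Simple LCG so the test data is deterministic; the constants are arbitrary.
    let mut state: u32 = 12345;
    let mut next = || {
        state = state.wrapping_mul(1664525).wrapping_add(1013904223);
        (state >> 24) as u8
    };
    let prev_row: Vec<u8> = (0..64).map(|_| next()).collect();
    let raw: Vec<u8> = (0..64).map(|_| next()).collect();
    for components in 1..=8 {
        let mut a = vec![0u8; 64];
        // seed the leftmost pixel the same way the posted functions do
        for i in 0..components {
            a[i] = raw[i].wrapping_add(prev_row[i] >> 1);
        }
        let mut b = a.clone();
        avg_indexed(&prev_row, &raw, &mut a, components);
        avg_cells(&prev_row, &raw, &mut b, components);
        if a != b {
            return false;
        }
    }
    true
}

fn main() {
    assert!(check_equivalent());
}
```

The two loops line up because window `k` covers `current[k..=k + components]`, so its first cell is `current[i - components]` and its last cell is `current[i]` for `i = k + components`.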
-
Sadly, it doesn't help, e.g. benchmark numbers for 16-bit PNG unfiltering:
If I add an SSE path for 16-bit unfiltering, we have:
-
We've been having trouble with bounds checks in the RLE hot loop in zune-inflate, and couldn't eliminate them because the slices we wanted to operate on were overlapping. Since there's no windows_mut in Rust, we could not use iterators to get rid of bounds checks.

Following the release of my article, a PR has been merged into the standard library explaining how to work around this.
You can see it in action on the samples from the article here.
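
For readers who have not seen the workaround, here is a minimal sketch on a toy problem, an in-place running sum, which is not taken from the thread or the article. The function name `running_sum` is made up. The idea is that `Cell::from_mut` plus `as_slice_of_cells` turns a `&mut [T]` into a `&[Cell<T>]`, and overlapping `windows` over shared cells are allowed where overlapping `&mut` windows would not be.

```rust
use std::cell::Cell;

/// In-place running sum: each element becomes the sum of itself and
/// everything before it. Hypothetical helper for illustration only.
fn running_sum(data: &mut [u32]) {
    // View the exclusive borrow as a shared slice of cells.
    let cells: &[Cell<u32>] = Cell::from_mut(data).as_slice_of_cells();
    // Adjacent windows overlap by one element, which is fine for `&[Cell<_>]`.
    for pair in cells.windows(2) {
        if let [prev, cur] = pair {
            cur.set(prev.get().wrapping_add(cur.get()));
        }
    }
}

fn main() {
    let mut values = [1u32, 2, 3, 4];
    running_sum(&mut values);
    assert_eq!(values, [1, 3, 6, 10]);
}
```

Each window reads the value the previous window just wrote, the same read-after-write dependency as in the RLE copy, and the loop body contains no manual indexing.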