arm64: Implement 8bpc cdef_dist_kernel #3292

Merged: 3 commits, Nov 28, 2023

barrbrain
Collaborator

No description provided.

@@ -0,0 +1,164 @@
// Copyright (c) 2022, The rav1e contributors. All rights reserved
Collaborator

I'd say 2023; maybe we could go over all the headers at once later.

Collaborator

lu-zero left a comment

Looks good :)

barrbrain merged commit 92b34fc into xiph:master on Nov 28, 2023
@barrbrain
Collaborator Author

See #3280 (comment) for a perf trace from before this PR.
Here is the same workload after this PR:

# Samples: 175K of event 'cycles'
# Event count (approx.): 97114371472
#
#       Overhead  Command / Shared Object / Symbol
# ..............  ...............................................................................................................................................................................................................
#
   100.00%        rav1e  
       92.25%        rav1e                
           6.62%        [.] put_8tap_neon
            |          
            |--5.54%--rav1e::me::estimate_motion
            |          rav1e::me::sub_pixel_me (inlined)
            |          rav1e::me::subpel_diamond_search (inlined)
            |          rav1e::me::get_subpel_mv_rd (inlined)
            |          put_8tap_neon
            |          
             --0.55%--put_8tap_neon
                       put_8tap_neon

           5.86%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |          
             --5.77%--rav1e::rdo::rdo_mode_decision
                       |          
                        --5.57%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  rav1e::rdo::sse_wxh (inlined)
                                  rav1e::dist::rust::get_weighted_sse
                                  core::iter::traits::iterator::Iterator::sum (inlined)
                                  <u64 as core::iter::traits::accum::Sum>::sum (inlined)
                                  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                  core::iter::traits::iterator::Iterator::fold (inlined)
                                  core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                  rav1e::dist::rust::get_weighted_sse::{{closure}} (inlined)
                                  core::iter::traits::iterator::Iterator::sum (inlined)
                                  <u64 as core::iter::traits::accum::Sum>::sum (inlined)
                                  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                  |          
                                   --5.21%--core::iter::traits::iterator::Iterator::fold (inlined)
                                             |          
                                             |--4.65%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                             |          |          
                                             |           --4.65%--rav1e::dist::rust::get_weighted_sse::{{closure}}::{{closure}} (inlined)
                                             |                     |          
                                             |                      --4.43%--core::iter::traits::iterator::Iterator::sum (inlined)
                                             |                                <u32 as core::iter::traits::accum::Sum>::sum (inlined)
                                             |                                <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                             |                                core::iter::traits::iterator::Iterator::fold (inlined)
                                             |                                |          
                                             |                                |--3.67%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                             |                                |          |          
                                             |                                |           --3.67%--rav1e::dist::rust::get_weighted_sse::{{closure}}::{{closure}}::{{closure}} (inlined)
                                             |                                |                     core::iter::traits::iterator::Iterator::sum (inlined)
                                             |                                |                     <u32 as core::iter::traits::accum::Sum>::sum (inlined)
                                             |                                |                     <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                             |                                |                     core::iter::traits::iterator::Iterator::fold (inlined)
                                             |                                |                     |          
                                             |                                |                      --2.19%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                                             |                                |                                <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)
                                             |                                |          
                                             |                                 --0.72%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                                             |                                           <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)
                                             |          
                                              --0.55%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                                                        <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)

           4.61%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               |          
                --4.60%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                          core::iter::traits::iterator::Iterator::fold (inlined)
                          |          
                           --4.53%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
                                     |          
                                      --4.53%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                                |          
                                                 --4.48%--rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
                                                           |          
                                                            --1.18%--<v_frame::plane::Plane<T> as rav1e::frame::plane::AsRegion<T>>::region (inlined)
                                                                      |          
                                                                       --0.91%--rav1e::tiling::plane_region::PlaneRegion<T>::new (inlined)
                                                                                 rav1e::tiling::plane_region::PlaneRegion<T>::from_slice (inlined)

           4.39%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct32
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct32
               |          
                --3.60%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          |          
                           --2.39%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16_asym (inlined)
                                     |          
                                      --0.60%--rav1e::asm::aarch64::transform::forward::RotateKernel::half_kernel (inlined)

           4.10%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct64
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct64
               |          
               |--2.19%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_32_asym (inlined)
               |          |          
               |           --0.52%--rav1e::asm::aarch64::transform::forward::RotateKernel::half_kernel (inlined)
               |          
               |--0.71%--rav1e::asm::aarch64::transform::forward::daala_fdct64::butterfly_pair (inlined)
               |          
                --0.57%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)

           4.06%        [.] rav1e::rdo::compute_distortion
            |          
             --4.02%--rav1e::rdo::rdo_mode_decision
                       |          
                        --3.91%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  |          
                                  |--2.34%--rav1e::rdo::cdef_dist_wxh (inlined)
                                  |          |          
                                  |           --1.36%--rav1e::asm::aarch64::dist::cdef_dist::cdef_dist_kernel (inlined)
                                  |                     |          
                                  |                      --1.03%--rav1e::activity::apply_ssim_boost (inlined)
                                  |                                |          
                                  |                                 --0.93%--rav1e::activity::ssim_boost_rsqrt (inlined)
                                  |          
                                   --1.13%--rav1e::rdo::sse_wxh (inlined)
                                             |          
                                              --0.90%--rav1e::rdo::compute_distortion::{{closure}} (inlined)
                                                        |          
                                                         --0.64%--rav1e::rdo::distortion_scale (inlined)

           4.00%        [.] rav1e_satd8x8_neon
            |          
             --3.77%--rav1e::api::internal::ContextInner<T>::receive_packet
                       rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
                       <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                       rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
                       rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
                       core::iter::traits::iterator::Iterator::for_each (inlined)
                       <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
                       <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       core::iter::traits::iterator::Iterator::fold (inlined)
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
                       core::iter::adapters::map::map_fold::{{closure}} (inlined)
                       core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
                       <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
                       <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       core::iter::traits::iterator::Iterator::fold (inlined)
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
                       core::iter::adapters::map::map_fold::{{closure}} (inlined)
                       rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
                       rav1e::asm::aarch64::dist::get_satd (inlined)
                       satd8x8_neon (inlined)

           3.85%        [.] rav1e::asm::aarch64::transform::forward::forward_transform_neon
            |          
             --3.80%--rav1e::asm::aarch64::transform::forward::forward_transform_neon
                       |          
                       |--0.75%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                       |          <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)
                       |          
                       |--0.68%--rav1e::asm::aarch64::transform::forward::round_shift_array_neon (inlined)
                       |          
                        --0.58%--rav1e::asm::aarch64::transform::forward::transpose_8x8_neon (inlined)

           3.42%        [.] rav1e::encoder::encode_block_post_cdef
            |          
             --2.80%--rav1e::encoder::encode_partition_topdown
                       rav1e::rdo::rdo_partition_decision
                       |          
                       |--1.52%--rav1e::rdo::rdo_partition_simple (inlined)
                       |          rav1e::rdo::rdo_mode_decision
                       |          |          
                       |           --1.48%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                       |                     <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                       |                     core::iter::traits::iterator::Iterator::try_fold (inlined)
                       |                     <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                       |                     rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                       |                     rav1e::rdo::luma_chroma_mode_rdo
                       |                     rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                       |                     rav1e::encoder::encode_block_post_cdef
                       |          
                        --1.27%--rav1e::rdo::rdo_partition_none (inlined)
                                  rav1e::rdo::rdo_mode_decision
                                  |          
                                   --1.23%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                             <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                             core::iter::traits::iterator::Iterator::try_fold (inlined)
                                             <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                             rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                             rav1e::rdo::luma_chroma_mode_rdo
                                             rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                             rav1e::encoder::encode_block_post_cdef

           3.41%        [.] rav1e_satd16x8_neon
            |          
            |--2.83%--rav1e::me::estimate_motion
            |          |          
            |           --2.51%--rav1e::me::sub_pixel_me (inlined)
            |                     rav1e::me::subpel_diamond_search (inlined)
            |                     rav1e::me::get_subpel_mv_rd (inlined)
            |                     rav1e::me::compute_mv_rd (inlined)
            |                     satd16x8_neon (inlined)
            |          
             --0.53%--core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
                       core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut (inlined)
                       rav1e::encoder::encode_tile_group::{{closure}} (inlined)
                       rav1e::encoder::encode_tile (inlined)

           2.93%        [.] rav1e_cdef_dist_kernel_8x8_neon
            |          
             --2.89%--rav1e::rdo::rdo_mode_decision
                       |          
                        --2.82%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  rav1e::rdo::cdef_dist_wxh (inlined)
                                  rav1e::asm::aarch64::dist::cdef_dist::cdef_dist_kernel (inlined)
                                  cdef_dist_kernel_8x8_neon (inlined)

           2.21%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
               |--1.49%--rav1e::asm::aarch64::transform::forward::daala_fdct16
               |          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16
               |          |          
               |          |--0.61%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8_asym (inlined)
               |          |          
               |           --0.56%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8_asym (inlined)
               |          
                --0.72%--rav1e::asm::aarch64::transform::forward::daala_fdct64
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16

           2.19%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
            |          
             --2.15%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
                       |          
                       |--0.80%--<rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol (inlined)
                       |          |          
                       |           --0.58%--<rav1e::ec::WriterBase<rav1e::ec::WriterCounter> as rav1e::ec::StorageBackend>::store (inlined)
                       |          
                        --0.75%--rav1e::context::cdf_context::CDFContextLog::push (inlined)
                                  rav1e::context::cdf_context::CDFContextLogPartition<_>::push (inlined)

           2.13%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
            |
            ---rav1e::encoder::encode_tx_block
               rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
               |          
               |--0.89%--rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
               |          |          
               |           --0.52%--<core::iter::adapters::rev::Rev<I> as core::iter::traits::iterator::Iterator>::next (inlined)
               |                     <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::double_ended::DoubleEndedIterator>::next_back (inlined)
               |                     <core::iter::adapters::zip::Zip<A,B> as core::iter::traits::double_ended::DoubleEndedIterator>::next_back (inlined)
               |                     <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next_back (inlined)
               |          
                --0.62%--rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeff_signs (inlined)

           2.11%        [.] core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
               |          
                --2.04%--core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (inlined)
                          rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}} (inlined)
                          |          
                           --0.68%--rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)

           1.78%        [.] rav1e::quantize::QuantizationContext::quantize
            |          
             --1.76%--rav1e::encoder::encode_tx_block
                       rav1e::quantize::QuantizationContext::quantize
                       |          
                        --0.51%--core::iter::traits::iterator::Iterator::max (inlined)
                                  core::iter::traits::iterator::Iterator::max_by (inlined)
                                  core::iter::traits::iterator::Iterator::reduce (inlined)
                                  |          
                                   --0.50%--<core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
                                             core::iter::traits::iterator::Iterator::fold (inlined)

           1.64%        [.] rav1e::quantize::rust::dequantize
            |          
             --1.64%--rav1e::encoder::encode_tx_block
                       rav1e::quantize::rust::dequantize
                       |          
                        --1.18%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next (inlined)
                                  <core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                                  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)

           1.38%        [.] rav1e::encoder::encode_tx_block
            |          
             --1.35%--rav1e::encoder::encode_tx_block
                       |          
                        --0.90%--rav1e::encoder::diff (inlined)

           1.24%        [.] rav1e_sad32x32_neon
            |          
             --0.75%--rav1e::api::internal::ContextInner<T>::send_frame
                       rav1e::api::internal::ContextInner<T>::compute_frame_invariants (inlined)
                       rav1e::api::internal::ContextInner<T>::compute_lookahead_motion_vectors (inlined)
                       rav1e::api::lookahead::compute_motion_vectors
                       rayon::iter::ParallelIterator::for_each (inlined)
                       rayon::iter::for_each::for_each (inlined)
                       <rayon::vec::IntoIter<T> as rayon::iter::ParallelIterator>::drive_unindexed (inlined)
                       rayon::iter::plumbing::bridge (inlined)
                       <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer
                       <rayon::vec::Drain<T> as rayon::iter::IndexedParallelIterator>::with_producer (inlined)
                       <rayon::iter::plumbing::bridge::Callback<C> as rayon::iter::plumbing::ProducerCallback<I>>::callback (inlined)
                       rayon::iter::plumbing::bridge_producer_consumer (inlined)
                       rayon::iter::plumbing::bridge_producer_consumer::helper
                       rayon::iter::plumbing::Producer::fold_with (inlined)
                       <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter (inlined)
                       core::iter::traits::iterator::Iterator::for_each (inlined)
                       core::iter::traits::iterator::Iterator::fold (inlined)
                       core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (inlined)
                       core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
                       rav1e::api::lookahead::compute_motion_vectors::{{closure}} (inlined)
                       rav1e::me::estimate_tile_motion
                       rav1e::me::refine_subsampled_sb_motion (inlined)
                       rav1e::me::refine_subsampled_motion_estimate (inlined)
                       rav1e::me::full_search
                       rav1e::me::compute_mv_rd (inlined)
                       rav1e::asm::aarch64::dist::get_sad (inlined)
                       sad32x32_neon (inlined)

           1.05%        [.] rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
            |          
             --0.81%--rav1e::encoder::encode_partition_topdown
                       |          
                        --0.81%--rav1e::rdo::rdo_partition_decision
                                  |          
                                   --0.57%--rav1e::rdo::rdo_partition_simple (inlined)
                                             rav1e::rdo::rdo_mode_decision
                                             |          
                                              --0.51%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                                        <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                                        core::iter::traits::iterator::Iterator::try_fold (inlined)
                                                        <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                                        rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                                        rav1e::rdo::luma_chroma_mode_rdo
                                                        rav1e::rdo::luma_chroma_mode_rdo::{{closure}}

           1.04%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.86%--rav1e::asm::aarch64::transform::forward::daala_fdct32
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8

           1.01%        [.] rav1e::cdef::cdef_filter_superblock
            |          
             --0.93%--rav1e::encoder::encode_frame
                       rav1e::encoder::encode_tile_group (inlined)
                       rav1e::encoder::FrameState<T>::apply_tile_state_mut (inlined)
                       rav1e::encoder::encode_tile_group::{{closure}} (inlined)
                       rav1e::cdef::cdef_filter_tile
                       rav1e::cdef::cdef_filter_superblock

           0.94%        [.] rav1e::me::get_fullpel_mv_rd
            |
            ---rav1e::me::estimate_motion
               |          
                --0.92%--rav1e::me::full_pixel_me (inlined)
                          |          
                           --0.89%--rav1e::me::full_pixel_me::{{closure}}
                                     |          
                                      --0.50%--rav1e::me::get_best_predictor (inlined)
                                                rav1e::me::get_fullpel_mv_rd

           0.90%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.88%--rav1e::asm::aarch64::transform::forward::daala_fdct64
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16

           0.85%        [.] rav1e::lrf::rust::sgrproj_box_ab_r1
           0.83%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
            |          
             --0.81%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
                       |          
                        --0.78%--rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx (inlined)
                                  |          
                                   --0.63%--rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx_from_stats (inlined)

           0.81%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.81%--rav1e::asm::aarch64::transform::forward::daala_fdct32
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8

           0.78%        [.] rav1e::deblock::sse_size14
           0.74%        [.] prep_neon
            |          
             --0.54%--rav1e::rdo::rdo_partition_decision

           0.74%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_mag
            |          
             --0.72%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_mag

           0.72%        [.] rav1e_sad16x16_neon
            |
            ---rav1e::me::estimate_motion
               rav1e::me::full_pixel_me (inlined)
               |          
                --0.67%--rav1e::me::full_pixel_me::{{closure}}

           0.69%        [.] rav1e::me::full_pixel_me::{{closure}}
            |
            ---rav1e::me::estimate_motion
               rav1e::me::full_pixel_me (inlined)
               rav1e::me::full_pixel_me::{{closure}}

           0.62%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |          
             --0.56%--rav1e::rdo::rdo_mode_decision
                       |          
                        --0.52%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  rav1e::rdo::sse_wxh (inlined)
                                  rav1e::dist::rust::get_weighted_sse
                                  core::iter::traits::iterator::Iterator::sum (inlined)
                                  <u64 as core::iter::traits::accum::Sum>::sum (inlined)
                                  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold

           0.56%        [.] rav1e::lrf::rust::sgrproj_box_f_r1
           0.54%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct16

           0.54%        [.] rav1e::predict::rust::pred_directional
           0.52%        [.] rav1e::predict::PredictionMode::predict_inter_single
           0.51%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.51%        [.] rav1e::partition::BlockSize::from_width_and_height_opt
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
               rav1e::asm::aarch64::dist::get_satd (inlined)
               rav1e::partition::BlockSize::from_width_and_height_opt
