docs: a few usage examples for div #377

Closed
wants to merge 157 commits into from

Dustin-Ray
Contributor

@Dustin-Ray Dustin-Ray commented Dec 3, 2023

Adds a few usage examples for div operations. ref issue #283

src/uint/div.rs Outdated
Co-authored-by: Tony Arcieri <[email protected]>
src/uint/div.rs Outdated
///
/// // Verify the result
/// assert_eq!(remainder, U448::from(1_u64));
/// assert!(<CtChoice as Into<bool>>::into(is_some));
Member

There are a few more occurrences similar to this which can be bool::from instead:

Suggested change
- /// assert!(<CtChoice as Into<bool>>::into(is_some));
+ /// assert!(bool::from(is_some));
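
For context on why the shorter form is equivalent: any type with a `From<T> for bool` impl gets `Into<bool>` through the standard library's blanket impl, so both spellings run the same conversion. A minimal std-only sketch follows. `Choice` here is a hypothetical stand-in written for illustration, not the crate's `CtChoice`, and its helper names are assumptions.

```rust
/// Hypothetical stand-in for a constant-time choice type.
/// This is NOT the crate's `CtChoice`; it only illustrates the conversion.
#[derive(Clone, Copy)]
pub struct Choice(u8);

impl Choice {
    /// Builds a choice from the lowest bit of `bit`.
    pub const fn from_bit(bit: u8) -> Self {
        Choice(bit & 1)
    }
}

impl From<Choice> for bool {
    fn from(c: Choice) -> bool {
        c.0 == 1
    }
}

/// The fully qualified spelling used in the original doc example.
pub fn verbose(c: Choice) -> bool {
    <Choice as Into<bool>>::into(c)
}

/// The shorter spelling from the review suggestion.
pub fn concise(c: Choice) -> bool {
    bool::from(c)
}
```

Both functions return the same value for every input, so the change is purely about readability.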

@tarcieri
Member

@drcapybara this also needs a rebase

tarcieri and others added 21 commits January 2, 2024 10:45
Based on PR #277.

The constant-time square root algorithm is described here:

https://github.com/RustCrypto/crypto-bigint/files/12600669/ct_sqrt.pdf
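
To illustrate what a fixed-iteration square root looks like, here is a small std-only sketch on `u64`. It is not the algorithm from the linked PDF: it uses the textbook digit-by-digit method with a loop count that does not depend on the input and a mask instead of a branch. The function name is an assumption, and the `>=` comparison is not guaranteed to compile to constant-time code, so treat this as a shape sketch only.

```rust
/// Floor of the square root of `n`, computed in exactly 32 iterations.
/// Textbook digit-by-digit method; hypothetical helper, not the crate's `sqrt`.
pub fn isqrt_fixed(n: u64) -> u64 {
    let mut rem = n;
    let mut res = 0u64;
    // Highest power of four that fits in a u64.
    let mut bit = 1u64 << 62;
    for _ in 0..32 {
        let t = res + bit;
        // All-ones mask when `rem >= t`, zero otherwise.
        let mask = ((rem >= t) as u64).wrapping_neg();
        rem -= t & mask;
        res = (res >> 1) + (bit & mask);
        bit >>= 2;
    }
    res
}
```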

Co-authored-by: Daniel Hast <[email protected]>
Adapted from: privacy-scaling-explorations/halo2curves#83

Original code is Apache 2.0+MIT. Attribution has been added to the top
of the module.
* Disambiguate Uint::LOG2_BITS
* Simplify sqrt() loops and add some comments
* Add edge case tests for vartime sqrt(), and comments to the tests
* Add a TODO for `Uint::sqrt()`
Uses a more efficient squaring algorithm with fewer iterations, with an
implementation shared with `Uint`.

Proptested against `num-bigint`.
Uses the faster `BoxedUint::square` algorithm
- Places `shl`/`shr` ahead of `shl_vartime`/`shr_vartime`
- Removes variable-time comments about trait impls that call the
  constant-time bit shift functions
- Renames internal `sh*_1` functions to `sh*1`.
Splits apart the `Uint` and `DynResidue` benchmarks, in preparation for
adding benchmarks for `BoxedUint` and `BoxedResidue`.
This reverts commit cd9cdce.

Benchmarks reveal this caused a performance regression to modpow.
Reverting it yields the following improvement:

Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [76.362 µs 76.381 µs 76.403 µs]
                        change: [-72.846% -72.795% -72.747%] (p = 0.00 < 0.05)
                        Performance has improved.
- Add `shl1` and `shr1` methods
- Change function for returning overflow bit to `shr1_with_overflow`
- Move `HI_BIT` to an inherent constant of `Limb`
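
A rough sketch of what single-bit shifts over little-endian limbs involve, assuming 64-bit limbs stored in a plain slice. The function names and the local `HI_BIT` constant are illustrative assumptions, not the crate's actual signatures.

```rust
/// Index of the highest bit in a 64-bit limb (local stand-in for `Limb::HI_BIT`).
const HI_BIT: u32 = u64::BITS - 1;

/// Shifts a little-endian limb slice right by one bit, in place.
/// Returns the bit shifted out of the lowest limb (0 or 1).
pub fn shr1_with_carry(limbs: &mut [u64]) -> u64 {
    let mut carry = 0u64;
    // Walk from the most significant limb down so each carry moves to the limb below.
    for limb in limbs.iter_mut().rev() {
        let out = *limb & 1;
        *limb = (*limb >> 1) | (carry << HI_BIT);
        carry = out;
    }
    carry
}

/// Shifts a little-endian limb slice left by one bit, in place.
/// Returns the bit shifted out of the highest limb (0 or 1).
pub fn shl1_with_carry(limbs: &mut [u64]) -> u64 {
    let mut carry = 0u64;
    // Walk from the least significant limb up so each carry moves to the limb above.
    for limb in limbs.iter_mut() {
        let out = *limb >> HI_BIT;
        *limb = (*limb << 1) | carry;
        carry = out;
    }
    carry
}
```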
Analogous changes to #388 for `BoxedUint`, which eliminates some
allocations by using in-place operations instead.

Montgomery arithmetic/BoxedResidueParams creation
                        time:   [17.577 µs 17.599 µs 17.625 µs]
                        change: [-25.727% -25.518% -25.315%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/BoxedResidue creation
                        time:   [212.68 ns 213.38 ns 214.06 ns]
                        change: [-6.0085% -5.4842% -4.9935%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/BoxedResidue retrieve
                        time:   [250.04 ns 250.99 ns 251.81 ns]
                        change: [+0.9240% +1.6723% +2.4260%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Montgomery arithmetic/multiplication, BoxedUint*BoxedUint
                        time:   [267.70 ns 268.75 ns 269.82 ns]
                        change: [-4.5751% -4.0798% -3.5329%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [73.615 µs 73.707 µs 73.812 µs]
                        change: [-4.4060% -4.1952% -3.9593%] (p = 0.00 < 0.05)
                        Performance has improved.
Eliminates several allocations by performing most operations of a
Montgomery reduction in-place.

Only conditionally adding the modulus presently incurs an allocation.

Montgomery arithmetic/BoxedResidue creation
                        time:   [135.30 ns 135.50 ns 135.76 ns]
                        change: [-36.403% -36.104% -35.794%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/BoxedResidue retrieve
                        time:   [203.59 ns 204.01 ns 204.45 ns]
                        change: [-17.533% -16.947% -16.403%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/multiplication, BoxedUint*BoxedUint
                        time:   [223.19 ns 223.76 ns 224.40 ns]
                        change: [-19.576% -19.107% -18.645%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [58.927 µs 59.009 µs 59.103 µs]
                        change: [-18.486% -18.227% -17.952%] (p = 0.00 < 0.05)
                        Performance has improved.
Adds internal `conditional_adc_assign` and `conditional_sbb_assign`
methods on `BoxedUint` which conditionally perform in-place
addition/subtraction with carry.

This eliminates the last allocations in Montgomery reduction as well as
a number of them in modular inversions.
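
The idea behind a conditional in-place add can be sketched as follows, assuming little-endian `u64` limbs in slices of equal length. The function name and signature are hypothetical; the real `conditional_adc_assign` is internal to the crate and may differ.

```rust
/// Adds `rhs` into `lhs` in place when `choice` is 1 and leaves `lhs`
/// unchanged when `choice` is 0, without branching on `choice`.
/// Returns the carry out of the top limb.
pub fn conditional_add_assign(lhs: &mut [u64], rhs: &[u64], choice: u8) -> u64 {
    assert_eq!(lhs.len(), rhs.len());
    // All-ones when choice == 1, all-zeros when choice == 0.
    let mask = ((choice & 1) as u64).wrapping_neg();
    let mut carry = 0u64;
    for (l, r) in lhs.iter_mut().zip(rhs) {
        // Masking `rhs` turns the addition into `+ 0` when the choice is unset.
        let sum = (*l as u128) + ((*r & mask) as u128) + (carry as u128);
        *l = sum as u64;
        carry = (sum >> 64) as u64;
    }
    carry
}
```

Because the result is written straight into `lhs`, no temporary buffer is needed for the sum, which is the kind of allocation the commit removes.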

Montgomery arithmetic/BoxedResidueParams creation
                        time:   [17.432 µs 17.453 µs 17.475 µs]
                        change: [-5.8279% -5.5913% -5.3509%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/BoxedResidue creation
                        time:   [113.13 ns 113.75 ns 114.46 ns]
                        change: [-16.909% -16.489% -16.042%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/BoxedResidue retrieve
                        time:   [167.52 ns 168.03 ns 168.56 ns]
                        change: [-18.571% -17.870% -17.225%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/multiplication, BoxedUint*BoxedUint
                        time:   [185.41 ns 186.34 ns 187.23 ns]
                        change: [-18.862% -18.195% -17.606%] (p = 0.00 < 0.05)
                        Performance has improved.

Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [48.787 µs 48.858 µs 48.937 µs]
                        change: [-17.467% -17.232% -16.971%] (p = 0.00 < 0.05)
                        Performance has improved.
Avoids allocating the immediate intermediate Montgomery form value used
for modpow, instead performing multiply and squarings (partially)
in-place.

Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [44.597 µs 44.653 µs 44.706 µs]
                        change: [-13.255% -13.003% -12.759%] (p = 0.00 < 0.05)
                        Performance has improved.
Changes `BoxedUint::conditional_assign` to actually be an in-place
operation rather than allocating, and uses it to impl
`BoxedResidue::pow`.

This leads to a fairly significant performance increase.
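
A minimal sketch of an in-place conditional assignment, assuming little-endian `u64` limbs in equal-length slices. The free function below is an illustration only, not `BoxedUint::conditional_assign` itself.

```rust
/// Overwrites `dst` with `src` when `choice` is 1 and leaves `dst`
/// unchanged when `choice` is 0, touching every limb either way.
pub fn conditional_assign(dst: &mut [u64], src: &[u64], choice: u8) {
    assert_eq!(dst.len(), src.len());
    // All-ones when choice == 1, all-zeros when choice == 0.
    let mask = ((choice & 1) as u64).wrapping_neg();
    for (d, s) in dst.iter_mut().zip(src) {
        // XOR trick: d ^ (d ^ s) == s when the mask is set, d otherwise.
        *d ^= mask & (*d ^ *s);
    }
}
```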

Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [27.769 µs 27.798 µs 27.830 µs]
                        change: [-39.063% -38.898% -38.724%] (p = 0.00 < 0.05)
                        Performance has improved.
Performs Montgomery multiplications and squarings in-place, avoiding
allocations, by constructing a reusable `MontgomeryMultiplier`.

Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [24.265 µs 24.274 µs 24.288 µs]
                        change: [-24.321% -24.194% -24.081%] (p = 0.00 < 0.05)
                        Performance has improved.
Groups all the `subtle`-based constant time code together under
`uint::boxed::ct`.

This is mostly the `ConditionallySelectable`-alike methods, but we can't
actually impl that trait due to its `Copy` bound.
Some were previously named `cond_*` instead.
Adapts the implementation originally from #277 to `BoxedUint`, adding
the following methods:

- `BoxedUint::div_rem`
- `BoxedUint::rem`

Additionally, `wrapping_div` and `checked_div` have been changed to use
the constant-time versions, rather than `*_vartime`.
This avoids the compiler potentially optimizing away part or all of an
operation inside of a benchmark.
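
One common way to get this effect is `std::hint::black_box`, which asks the compiler to treat a value as opaque. Whether the commit uses exactly this call is an assumption; the sketch below only shows the general pattern, with a hypothetical `modpow_u64` helper standing in for the benchmarked operation.

```rust
use std::hint::black_box;

/// Square-and-multiply modular exponentiation on `u64` (illustrative helper).
/// Assumes `modulus` is nonzero.
pub fn modpow_u64(base: u64, mut exp: u64, modulus: u64) -> u64 {
    let m = modulus as u128;
    let mut result = 1u128 % m;
    let mut b = (base as u128) % m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * b % m;
        }
        b = b * b % m;
        exp >>= 1;
    }
    result as u64
}

/// Body of one benchmark iteration: inputs and output pass through
/// `black_box` so the call is less likely to be folded away or hoisted.
pub fn bench_iteration() -> u64 {
    let base = black_box(3u64);
    let exp = black_box(65_537u64);
    let modulus = black_box(1_000_000_007u64);
    black_box(modpow_u64(base, exp, modulus))
}
```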

Using a larger `BoxedResidue` is both more indicative of real-world
usages as well as reduces noise in the benchmark.
tarcieri and others added 28 commits January 2, 2024 10:54
For consistency with `BoxedMontyParams::new_vartime` and our general
labeling strategy.
Notably `Odd` permits a simple reference conversion to `NonZero` which
makes it possible to clean up some tests.
Adds a constant-time equivalent to `MontyParams::new_vartime`
The vartime constructor is 2.75X faster:

Dynamic Montgomery arithmetic/MontyParams::new
                        time:   [5.8611 µs 5.8708 µs 5.8812 µs]

Dynamic Montgomery arithmetic/MontyParams::new_vartime
                        time:   [2.1284 µs 2.1405 µs 2.1567 µs]
Adds a trait for generating a random number of a given bit size.
Provide a more comprehensive overview of the main types in the crate
This is an oversight from #501 which switched to using Bernstein-Yang
inversions for `Uint::inv_odd_mod`.

The new implementation assumes `s` is always `Odd` but doesn't actually
check for that and set the `ConstCtOption` to be "none" accordingly.
Changes to a "little endian" `lo, hi` convention for the ordering of
arguments to concatenation methods and the ordering of the returned
2-tuple from split methods.

This is more consistent with the rest of the crate, and the
`Uint { limbs }` array which uses a little endian ordering.
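
The convention can be shown on primitive integers. This is a sketch using `u64` and `u128` with hypothetical free functions, not the crate's `Concat`/`Split` traits.

```rust
/// Joins two halves, low half first, into one wide value.
pub fn concat(lo: u64, hi: u64) -> u128 {
    ((hi as u128) << 64) | (lo as u128)
}

/// Splits a wide value into `(lo, hi)`, low half first.
pub fn split(wide: u128) -> (u64, u64) {
    (wide as u64, (wide >> 64) as u64)
}
```

With both functions using the same `lo, hi` order, `split(concat(lo, hi))` returns its arguments unchanged.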

Closes #519
- Expand documentation on `Limb` representations
- Bernstein-Yang doc cleanups
It has been removed in recent Rust releases, and replaced with a printed
warning:

rust-lang/rust#103877
It wasn't updated to reflect the low/high ordering changes from #526
It's really an implementation detail of the Bernstein-Yang inverter, and
not actually customizable by the caller, so get it out of the way.
This parameter is inferred via `Concat<Output = Uint<WIDE_LIMBS>>`.
* Add `Integer::from_limb_like()`, `one_like()`, `zero_like()`.
* Add `Monty::params()` and `as_montgomery()`
Moves the `const WIDE_LIMBS` generic parameter from `MontyParams::new`
to the outer scope, since it's intended to be inferred via the `Concat`
trait rather than explicitly specified.
It's safer to use explicit types than to let the compiler infer them in
these situations
@Dustin-Ray Dustin-Ray closed this by deleting the head repository Jan 2, 2024
@Dustin-Ray
Contributor Author

> @drcapybara this also needs a rebase

I'm going to close this and open a new PR to simplify that process.

7 participants