chacha20: 64-bit counter support #334

Open
tarcieri opened this issue Oct 25, 2023 · 21 comments

@tarcieri
Member

The ChaCha20Legacy construction, i.e. the djb variant, is supposed to use a 64-bit counter, but it currently uses a 32-bit counter because it shares its core implementation with the IETF construction, which uses a 32-bit counter.

This results in a counter overflow after generating 256 GiB of keystream (2^32 blocks × 64 bytes per block = 2^38 bytes). Compatible implementations are able to generate larger keystreams.

I'm not sure how much of a practical concern this actually is, but it did come up in discussions here: rust-random/rand#934 (comment)

We can probably make the counter type generic between u32/u64 in the core implementation if need be.
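
For illustration, a rough sketch of what a counter generic over u32/u64 might look like; the Counter trait and method names here are hypothetical, not the crate's actual API:

pub trait Counter: Copy {
    /// The two 32-bit state words occupied by the counter; the high word
    /// is zero (and belongs to the nonce) for the 32-bit IETF counter.
    fn to_state_words(self) -> [u32; 2];
    /// Checked addition, so the wrapper can detect keystream exhaustion.
    fn checked_add(self, rhs: u32) -> Option<Self>;
}

impl Counter for u32 {
    fn to_state_words(self) -> [u32; 2] {
        [self, 0]
    }
    fn checked_add(self, rhs: u32) -> Option<Self> {
        u32::checked_add(self, rhs)
    }
}

impl Counter for u64 {
    fn to_state_words(self) -> [u32; 2] {
        // little-endian word order, matching ChaCha state layout
        [self as u32, (self >> 32) as u32]
    }
    fn checked_add(self, rhs: u32) -> Option<Self> {
        u64::checked_add(self, rhs as u64)
    }
}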

@nstilt1
Contributor

nstilt1 commented Dec 17, 2023

I have some seemingly working code for this, but it builds on the Rng PR with variants.rs. Would you like me to try to adjust it for the current state of chacha20 or wait for the potential merge before opening a PR for it?

@tarcieri
Member Author

Let's try to land #333 first

@nstilt1
Contributor

nstilt1 commented May 30, 2024

I took another look at my 64-bit counter code, and it isn't working correctly on the NEON backend atm.

It uses a const bool in variants.rs to determine how the counter should be incremented in the backend, but if there's a better way to do this that doesn't involve modifying the backend code, that would be preferable. It is currently failing a test shortly after reaching the u32::MAX block pos.

The largest use case for a 64-bit counter (that I can think of) is file system encryption for mobile devices. I've run some benchmarks, and chacha20 and even chacha20poly1305 are faster than AES on aarch64; I benched them in the cloud on a compute-optimized EC2 instance, and I would imagine they're also faster on mobile devices. So if the 64-bit counter only works for x86 and the software implementation, there wouldn't really be a point, since the main use case is on aarch64.

I will keep working on it a little, but I'd prefer a solution that doesn't need to modify the backends, if that's possible. The way it is now, every new backend would also need specific code for updating the counter.
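
For context, a minimal sketch of the const-bool approach described above; the Variant trait and the increment_counter function here are hypothetical and may not match the PR's actual variants.rs:

pub trait Variant {
    /// Whether the counter occupies two state words (djb) or one (IETF).
    const USE_64BIT_COUNTER: bool;
}

pub enum Ietf {}
impl Variant for Ietf {
    const USE_64BIT_COUNTER: bool = false;
}

pub enum Legacy {}
impl Variant for Legacy {
    const USE_64BIT_COUNTER: bool = true;
}

/// A backend can then branch on the flag at compile time.
fn increment_counter<V: Variant>(state: &mut [u32; 16]) {
    if V::USE_64BIT_COUNTER {
        // 64-bit counter spans state[12] (low word) and state[13] (high word)
        let ctr = ((state[13] as u64) << 32 | state[12] as u64).wrapping_add(1);
        state[12] = ctr as u32;
        state[13] = (ctr >> 32) as u32;
    } else {
        // IETF: 32-bit counter in state[12] only
        state[12] = state[12].wrapping_add(1);
    }
}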

@nstilt1
Contributor

nstilt1 commented Jan 1, 2025

On second thought, I feel like more control over a larger nonce would be preferable to a larger counter.

@tarcieri
Member Author

tarcieri commented Jan 3, 2025

@nstilt1 not sure what you mean by that? Are you suggesting making the construction generic over the nonce size rather than a larger counter?

@nstilt1
Contributor

nstilt1 commented Jan 3, 2025

@tarcieri I'm suggesting that 64-bit counters are unnecessary, contrary to what I've said previously. I think having 96 bits for a nonce is more practical because it is easier to prevent nonce reuse with finer control over the value. What do you mean about making the construction generic over the nonce size?

@tarcieri
Member Author

tarcieri commented Jan 5, 2025

I wasn't sure what you meant by "more control over a larger nonce".

I agree that larger counter support probably doesn't matter in practice, which is why the current implementation avoided dealing with it. I'm not sure the added complexity is worth it, though it could be considered a compatibility bug with the legacy implementation, which is somewhat surprisingly pervasive.

@ChocolateLoverRaj

Here is my use case, which currently panics but (I think) would not panic with a 64-bit counter:

Encrypting a stream of files to back up to an S3 object. It is reasonable that I might have a stream that is >256 GiB long.

I have not written the backup program yet; it is in progress. Here is something I tried, imagining resuming an in-progress upload at 500 GiB:

use chacha20::ChaCha20;
use chacha20::cipher::{KeyIvInit, StreamCipherSeek};

let key = [0x42u8; 32];
let nonce = [0x24u8; 12];
let mut cipher = ChaCha20::new(&key.into(), &nonce.into());
// 500 GiB, past the 256 GiB reachable with the 32-bit counter
const SEEK_AMOUNT: usize = 536_870_912_000;
cipher.seek(SEEK_AMOUNT);

But currently the seek panics.

I am new to file encryption, so let me know if my use case is invalid and there is something else I should be using other than ChaCha20 (I don't need protection against the data being modified; I just want the files to not be readable without the key).

@nstilt1
Contributor

nstilt1 commented Jan 10, 2025

You could change the nonce every so often, such as incrementing a portion of the nonce once every 256 GiB of encrypted data (see the sketch at the end of this comment). But I could also revisit #359. I'll probably use a different PR, since that one is kinda far behind.

The only problem is that the code would be slightly different if #380 were to be merged. I could put the counter support in #380, or #380 could just be closed and I could put the counter support in a new PR. Also, regarding this:

I'm not sure the added complexity is worth it

The changes made in #359 worked. If I'm going to revisit this issue, the backends will look almost exactly like they do in that PR. If desired, I could make some adjustments, such as making an add_counter! macro for each SIMD backend. That way, the counter logic is written once per backend, reducing the chances of a typo while (supposedly) maintaining the compile-time evaluation of the logic. The macro would look similar to add_counter! in neon.rs, and I personally think the code reads a little better with the macro than without it.
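
For illustration, a minimal sketch of the nonce-rotation idea mentioned above; the segment_nonce helper is made up, not part of the crate:

/// Hypothetical helper: derive a fresh 12-byte IETF nonce for each 256 GiB
/// segment by encoding a segment index into the last 4 bytes.
fn segment_nonce(base: [u8; 8], segment: u32) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..8].copy_from_slice(&base);
    nonce[8..].copy_from_slice(&segment.to_le_bytes());
    nonce
}

The caller would still need to track how much keystream has been produced under the current nonce and re-initialize the cipher before the 32-bit counter overflows.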

@tarcieri
Member Author

tarcieri commented Jan 10, 2025

@ChocolateLoverRaj are you worried about encrypting a single file that is 256 GiB, or the sum total of multiple files being larger?

If it's the latter, you should split up the encryption so each file is encrypted under a different key/nonce.

@ChocolateLoverRaj

@ChocolateLoverRaj are you worried about encrypting a single file that is 256 GiB, or the sum total of multiple files being larger?

My plan is to create a stream of all the file changes (including the contents of all new files). Every file will be <256 GiB, but all of the files together might result in a backup >256 GiB. I will be splitting the stream into 5 GB chunks (because of the AWS part-size limit), but I would prefer to have a single cipher for the entire big stream and split up the encrypted stream, rather than create multiple ciphers for different chunks of the stream.

If I do split up the stream and use multiple ciphers, can I use the same key and increment the nonce (like using a nonce of 0 for part 0 and 1 for part 1, etc.), as long as I don't use the same nonce twice?

@tarcieri
Member Author

If I do split up the stream and use multiple ciphers, can I use the same key and increment the nonce (like using a nonce of 0 for part 0 and 1 for part 1, etc.), as long as I don't use the same nonce twice?

Yes, though for large files it would be better to use a unique key per file.
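
For example, a sketch of per-file key derivation under that advice, assuming the hkdf and sha2 crates (the file_key helper is hypothetical):

use hkdf::Hkdf;
use sha2::Sha256;

/// Derive a distinct key per file from one master key, so the 256 GiB
/// counter limit and the nonce-reuse concern apply per file rather than
/// to the whole backup stream.
fn file_key(master_key: &[u8; 32], file_index: u64) -> [u8; 32] {
    let hk = Hkdf::<Sha256>::new(None, master_key);
    let mut okm = [0u8; 32];
    hk.expand(&file_index.to_le_bytes(), &mut okm)
        .expect("32 bytes is a valid output length for HKDF-SHA256");
    okm
}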

@ChocolateLoverRaj

Yes, though for large files it would be better to use a unique key per file.

Why?

@tarcieri
Member Author

256 GB is the data volume limit of e.g. Poly1305 as an authenticator, which you should be using in tandem with ChaCha20 in the combined ChaCha20Poly1305 AEAD cipher to prevent chosen-ciphertext attacks.

@tarcieri
Member Author

256 GB is also generally quite a bit beyond what most computers can hold in RAM, and really you should be working with AEAD messages you can hold in RAM.

@tarcieri
Member Author

I wonder if it would be possible to make a newtype wrapper which implements the djb variant in terms of the IETF variant, by encoding a portion of the counter in the nonce and incrementing it where necessary.

@nstilt1
Contributor

nstilt1 commented Feb 1, 2025

I don't think I know how to do that. If we aren't going to be revising the backends to achieve this, then I suppose I ought to go ahead and fix #381.

I was thinking about changing the cipher backends to only use a 64-bit counter, but the rounds functions handle counters, and the rng backends use them too; for the rng to internally use a 32-bit counter while the cipher internally uses a 64-bit counter, rounds would need to be duplicated. A 64-bit internal counter could be acceptable for the cipher, since the wrappers panic at the end of the 32-bit counter keystream. It might not be so bad for the RNG to internally use a 64-bit counter and have a 96-bit nonce, with a little bit of overlap. The problem with that is explaining how it works in the docs, and ensuring that setting the counter with a u32 does not overwrite the upper 32 bits of the nonce.

Btw, I have a big school project due at the end of the month, so I may be a bit less active

@nstilt1
Contributor

nstilt1 commented Feb 2, 2025

If everything were to use a 64-bit counter under the hood, we might also want to adjust some things in rng.rs so that:

  • setting the block pos/word pos using 32 bits does not overwrite the upper 32 bits of the stream id/nonce
  • setting the nonce using 64 bits does not overwrite the upper 32 bits of the block pos

Here is a high-level overview of StreamId and fn set_stream(), relying on core::mem::size_of::<S>() reporting the size of the concrete type substituted for the generic parameter:

/// A wrapper for the `stream_id`.
///
/// The following types will overwrite the upper 32 bits of the 64-bit counter:
/// * `[u32; 3]`
/// * `[u8; 12]`
/// * `u128`
///
/// The following types will preserve the upper 32 bits of the 64-bit counter:
/// * `[u32; 2]`
/// * `[u8; 8]`
/// * `u64`
pub struct StreamId([u32; 3]);

/// A wrapper for the `block_pos`.
///
/// Using a 32-bit value will preserve the upper 32 bits of the stream ID.
///
/// Using a 64-bit value will overwrite the upper 32 bits of the stream ID.
///
/// Block pos accepts:
/// * `u32`
/// * `u64`
/// * `[u32; 2]`
pub struct BlockPos([u32; 2]);

/// Sets the stream ID. Providing a 64-bit value will preserve the upper 32
/// bits of the stream ID (and the upper 32 bits of the counter).
///
/// Providing a 96-bit value will overwrite the upper 32 bits of the counter.
pub fn set_stream<S: Into<StreamId>>(&mut self, stream: S) {
    // size_of reports the size of the concrete type substituted for `S`
    let bytes = core::mem::size_of::<S>();
    let stream: StreamId = stream.into();
    if bytes >= 12 {
        // write 96 bits to the nonce
    } else {
        // write 64 bits to the nonce
    }
    // continue
}

// do something similar with pub fn set_block_pos()

I know it adds a little more boilerplate, and it could certainly introduce a bug if a user only ever used 32-bit values for setting the counter, exhausted the 32-bit counter stream, and then tried (and failed) to reset the counter using a u32... but realistically, the only way a user will exhaust the stream (surpass block pos = u32::MAX) is by setting the block pos/word pos to an extremely high value. It's definitely possible for someone to misuse the RNG in that manner, setting the block/word pos to arbitrary values and ending up with a broken system.

@tarcieri
Member Author

tarcieri commented Feb 2, 2025

I don't think I know how to do that. If we aren't going to be revising the backends to achieve this

The idea is all backends could remain 32-bit.

The IETF variant repurposes 32 bits of the counter space to extend the nonce to 96 bits.

So when the 32-bit IETF ChaCha20 counter is exhausted, another way to implement a 64-bit counter would be to increment a second counter stored in the lower 32-bit portion of that 96-bit nonce; the newtype can then re-initialize the inner cipher with the new nonce containing the incremented counter, and keep doing this in 256 GB chunks of keystream.
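
A rough sketch of what such a newtype might look like (hypothetical; not the crate's actual API):

use chacha20::ChaCha20;
use chacha20::cipher::KeyIvInit;

/// Emulates the djb variant's 64-bit counter on top of the 32-bit IETF
/// variant by keeping the upper counter half in the first 4 bytes of the
/// 96-bit nonce and re-initializing the inner cipher on overflow.
struct ChaCha20Legacy64 {
    key: [u8; 32],
    nonce64: [u8; 8], // the djb variant's 64-bit nonce
    hi: u32,          // upper 32 bits of the emulated 64-bit counter
    inner: ChaCha20,
}

impl ChaCha20Legacy64 {
    /// Called when the inner 32-bit counter is exhausted: bump the upper
    /// half and rebuild the inner cipher, continuing the keystream in
    /// fresh 256 GiB chunks.
    fn next_chunk(&mut self) {
        self.hi = self.hi.checked_add(1).expect("64-bit counter exhausted");
        let mut nonce = [0u8; 12];
        nonce[..4].copy_from_slice(&self.hi.to_le_bytes());
        nonce[4..].copy_from_slice(&self.nonce64);
        self.inner = ChaCha20::new(&self.key.into(), &nonce.into());
    }
}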

I was thinking about changing the cipher backends to only use a 64-bit counter

The main problem with this is that it opens up potential bugs where the 32-bit counter of the IETF variant isn't properly respected, which could potentially lead to nonce reuse if the counter ever overflows 32 bits.

@nstilt1
Contributor

nstilt1 commented Feb 2, 2025

I was thinking about changing the cipher backends to only use a 64-bit counter

The main problem with this is that it opens up potential bugs where the 32-bit counter of the IETF variant isn't properly respected, which could potentially lead to nonce reuse if the counter ever overflows 32 bits.

The cipher panics when it overflows, though... I know it's kind of silly for the internal counter to be 64-bit while restricting the end user to a 32-bit counter, but there wouldn't be any more if statements or macros in the backends than there are right now (besides the extra boilerplate in rng.rs). I'll admit there could still be bugs when the rng.rs counter overflows 32 bits, but if the documentation is clear enough, that could reduce issues happening in the wild.

To me, it just seems a little tricky to regulate the counter from outside the backends, aside from panicking the way it does now. It would need to handle the edge case where block_pos = u32::MAX, possibly by restricting the internal block pos to a multiple of 4 so that it always overflows evenly at state[12] = 0 when processing 4 blocks at a time... but that exact approach probably wouldn't work with the soft.rs backend, and the actual cipher wrapper itself only has a 1-block buffer, if I recall correctly... so this specific approach probably wouldn't even work (the idea might work, just not in the way I described it). Like I said, it just seems a little tricky.

@nstilt1
Contributor

nstilt1 commented Feb 3, 2025

I suppose that approach might work if there were a new wrapper that used ParBlocks as the buffer, and then implemented ParBlocksSizeUser, or something called MaxParBlocksSizeUser, for the core.
