Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider providing Rewind-style traits #330

Open
seanmonstar opened this issue Nov 27, 2019 · 8 comments
Open

Consider providing Rewind-style traits #330

seanmonstar opened this issue Nov 27, 2019 · 8 comments

Comments

@seanmonstar
Copy link
Member

Sometimes when encoding or decoding, it can be desirable to "rewind" a read or write operation. Some examples:

  • When decoding, after reading a u32 that describes the "length" of the rest of message, and then seeing that remaining() does not have "length" bytes, a user may wish to "unread" the "length" and wait for more data.
  • When encoding, it might not be known how long a value will compress to. After compressing part of the message and realizing there isn't room for the rest, a user could want to "unwrite" the in-complete part.

See also #299

cc @sfackler

@seanmonstar
Copy link
Member Author

An idea I've thought about for a couple hours is to have a separate trait, if there are buffer types that couldn't support this (do they exist?).

trait Rewind {
    fn position(&self) -> usize;
    fn rewind(&mut self, pos: usize);
}

@Ralith
Copy link

Ralith commented Nov 27, 2019

A convenient way to implement this would be a transaction-style adapter that stores the position at which it was constructed and can be either aborted or committed.

@carllerche
Copy link
Member

As an added note: peeking util fns would be useful to add here.

@carllerche
Copy link
Member

Also, instead of "rewinding" it probably should be generalized as scannable or indexable? I imagine buffers backed by linked lists would not implement this trait, but a rope could (despite O(log(n)) complexity).

That said, a linked list backed buffer could implement a strategy where the current position is saved by cloning the "iterator".

@carllerche
Copy link
Member

Now I wonder if the trait should be specifically for the case of a buf backed by a single slice. Then there could be helpers like “get_line”

@carllerche
Copy link
Member

@seanmonstar points out that a Buf backed by a single slice is "just" a Cursor<&[u8]>. However, we cannot add util fns to this type directly.

An alternate strategy would be for bytes to define its own Cursor type backed by explicitly by a &[u8]:

struct SliceBuf<'a> {
    bytes: &'a [u8],
    pos: usize,
}

This type could implement misc helpers such as "get_line", "peek", ... misc other types could have:

fn as_slice_buf(&self) -> SliceBuf<'_> {
    // ...
}

thoughts? @sfackler @seanmonstar

@vincentdephily
Copy link

I've been struggling with this while trying to convert an MQTT decoder to take a dyn Buf instead of a Bytes. MQTT packet is 1 "header" byte then 1-4 "length" bytes then length "payload" bytes (if length > 0). The codec API requires that I consume the bytes for exactly one such packet.

I could use buf.bytes() to peek at the first 1-5 bytes without advancing, but this only works if the buf uses contiguous memory. I could use buf.bytes_vectored() instead, but it is depressingly complicated for the task at hand, and requires std.

I'm guessing that peek_*() functions can be implemented by all backends (?), which seems preferable to adding a RewindableBuf trait (although the later seem to cover more usecases).

The transaction is tempting too (it'd be great if slice() was a Buf method, it'd also neatly solve my "don't read past the first packet" issue), but preventatively opening a transaction to cover a rare worst-case scenario seems a bit wasteful.

vincentdephily added a commit to vincentdephily/mqttrs that referenced this issue Jan 10, 2020
The initial idea was to have `decode()` take a `Buf` and `encode()` take a `ButMut`, as they
semantically should. It would have enabled our users to use other buffer backends.

Switching `encode()` to take a BufMut was straightforward, that's done.

Concerning `decode()`, `Buf` currently lacks facilities to rewind the read position
(tokio-rs/bytes#330) and to read a specified number of bytes
(tokio-rs/bytes#129), making it impossible to implement the desired API
unless we assume that our user is using a continuous-memory backend like BytesMut (and even then,
the implementation isn't really pretty).

It is possible to have `decode()` take a `Bytes` (as it semantically should) instead of a
`BytesMut`, but because users typically want to pass a `BytesMut` to `TcpStrem.read()` it would
require a `.freeze()` before passing to `decode()`, which is cumbersome and prevents buffer reuse.

With all that said, this commit does move things in the right direction, and if upstream fixes the
mentioned bugs it'll be that much easyer to switch.
vincentdephily added a commit to vincentdephily/mqttrs that referenced this issue Jan 10, 2020
The initial idea was to have `decode()` take a `Buf` and `encode()` take a `ButMut`, as they
semantically should. It would have enabled our users to use other buffer backends.

Switching `encode()` to take a BufMut was straightforward, that's done.

Concerning `decode()`, `Buf` currently lacks facilities to rewind the read position
(tokio-rs/bytes#330) and to read a specified number of bytes
(tokio-rs/bytes#129), making it impossible to implement the desired API
unless we assume that our user is using a continuous-memory backend like BytesMut (and even then,
the implementation isn't really pretty).

It is possible to have `decode()` take a `Bytes` (as it semantically should) instead of a
`BytesMut`, but because users typically want to pass a `BytesMut` to `TcpStrem.read()` it would
require a `.freeze()` before passing to `decode()`, which is cumbersome and prevents buffer reuse.

With all that said, this commit does move things in the right direction, and if upstream fixes the
mentioned bugs it'll be that much easyer to switch.
vincentdephily added a commit to vincentdephily/mqttrs that referenced this issue Mar 3, 2020
The initial idea was to have `decode()` take a `Buf` and `encode()` take a `ButMut`, as they
semantically should. It would have enabled our users to use other buffer backends.

Switching `encode()` to take a BufMut was straightforward, that's done. This solves a gotcha about
using `bytes::BytesMut` vs `bytes::bytes_mut::BytesMut`, and enables the use of other buffer types,
like `Vec`.

Concerning `decode()`, `Buf` currently lacks facilities to rewind the read position
(tokio-rs/bytes#330) and to read a specified number of bytes
(tokio-rs/bytes#129), making it impossible to implement the desired API
unless we assume that our user is using a continuous-memory backend like BytesMut (and even then,
the implementation isn't really pretty).

It is possible to have `decode()` take a `Bytes` (as it semantically should) instead of a
`BytesMut`, but because users typically want to pass a `BytesMut` to `TcpStrem.read()` it would
require a `.freeze()` before passing to `decode()`, which is cumbersome and prevents buffer reuse.

With all that said, this commit does move things in the right direction, and if upstream fixes the
mentioned bugs it'll be that much easyer to switch.
@little-dude
Copy link

little-dude commented Nov 19, 2024

The use case for me would be to rewind to populate a checksum field.

+---+----------+---+---+---+------------+
| A | checksum | B | C | D | payload ...
+---+----------+---+---+---+------------+
^                          ^

The data header contains a bunch of fields. The checksum is computed over the payload and part of the header, so it's the last field I'll set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants