Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More information from DecodeError #158

Open
vorner opened this issue Mar 2, 2019 · 2 comments
Open

More information from DecodeError #158

vorner opened this issue Mar 2, 2019 · 2 comments

Comments

@vorner
Copy link
Contributor

vorner commented Mar 2, 2019

Hello

I know you mentioned in #106 that you don't want to change the DecodeError. However, being able to get some information from the error might be useful to handle it differently ‒ yes, one of the possibilities is buffer underrun.

So I would be interested to know what your rationale for not wanting the change the error type is. I guess it's one of these:

  1. You want to prevent people from doing the „wrong thing“ of trying to decode something and refill the buffer only if it was too short, so you don't expose the information on purpose. Is it because some (all) the decoding routines consume the bytes even if they fail to parse? Does decode_length_delimiter (maybe this should be mentioned in its docs)?
  2. You don't want to expose too much internal details about the error type, because that would be an API level promise and could limit further future development. So, turning the error into an public enum would essentially limit the variants from being added in the future.
  3. It's too much work to do properly and ensure the details are reliable.

If it's just the second, would it be OK to still hide the details, but add some methods to it, like is_buffer_underrun? This seems to be the case other libraries take when they want to provide more info.

@danburkert
Copy link
Collaborator

Off the bat: I don't have a philosophical objection to exposing more error details if they are reliably available, but I believe this particular case isn't cut and dry.

It's not possible, in general, to distinguish between a malformed input and buffer underrun (just so we're on the same page, I'm taking that to mean the caller failed to pass in the full message worth of bytes). For example, when decoding a string field in a message. The length tag says the string field should be 100 bytes, but there is fewer than that remaining. Is that a buffer under-run, or a malformed length tag?

The situation is perhaps a bit more approachable if you limit yourself to only trying to distinguish between malformed input and buffer under-run in the case of decoding a top-level length-delimited message. In principal prost could return a special error variant calling out that case. My response, however, is that in an async context it's not an exceptional circumstance to have an underflow situation; it's an expected part of the control flow. So in those cases I think it's cleaner to have a standalone prost::decode_length_delimiter function which async callers can use to do the necessary control flow (either continue with decoding the message, or yield waiting for more input).

@vorner
Copy link
Contributor Author

vorner commented Mar 2, 2019

For example, when decoding a string field in a message. The length tag says the string field should be 100 bytes, but there is fewer than that remaining. Is that a buffer under-run, or a malformed length tag?

I probably don't have enough internal knowledge about how exactly protobufs work. I'd probably still think that case is a buffer under-run, which is a special case of malformed message. But for example a wrong type of a field (eg trying to put an integer into a string) would be another case of malformed message. For the underrun case, adding more bytes could but wouldn't have to „solve“ the error, while for the latter one, it surely wouldn't.

Anyway, I was actually reading the documentation of the prost::decode_length_delimiter function and its explanation of errors and 10 bytes. That one feels like very complex and indirect way to check if I should or should not wait for more data and that's the place where I would prefer to have the .is_buffer_underrun() instead of buffer.len() < 10 ‒ because, after all, having less than 10 bytes available is completely possible as the length of the last message, so I can't just wait to get full 10 bytes before calling this function. Then, after I know the length, waiting for exactly that much without calling into any decoding is fine (and probably best for performance anyway).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants