Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON compliant to_chars? #65

Closed
stephenberry opened this issue Jul 20, 2024 · 8 comments
Closed

JSON compliant to_chars? #65

stephenberry opened this issue Jul 20, 2024 · 8 comments
Labels
duplicate This issue or pull request already exists

Comments

@stephenberry
Copy link

I'd like to use dragonbox for serializing floats in Glaze. What I'm particularly looking for is a JSON compliant to_chars implementation that has the shortest string result. It seems like dragonbox's to_chars will print out ...E0 when it is not necessary.

The other requirement for JSON compliance is that infinity and nan should be printed as "null".

Could options to support these features be added to dragonbox or should I fork the code and edit it myself?

I'd also like the to_chars to be header only, but I don't mind doing this conversion myself and keeping it up to date with dragonbox.

Thanks for all the hard work you've put into this great library!

@stephenberry
Copy link
Author

Also, a number like 10.1 right now prints as 1.01E1, which adds 2 additional characters to the output, and it is harder for users to read.

I'm curious if you know what the performance impact of supporting this cleaner output would be?

@jk-jeon jk-jeon added the duplicate This issue or pull request already exists label Jul 20, 2024
@jk-jeon
Copy link
Owner

jk-jeon commented Jul 20, 2024

Hi, thanks a lot for your interest!

Since this is a duplicate of #35 I'm closing this, but please feel free to discuss further in this thread (or in #35) if you want.

To answer your specific questions:

It seems like dragonbox's to_chars will print out ...E0 when it is not necessary.
Also, a number like 10.1 right now prints as 1.01E1

You're right, please read my answer in #35.

Could options to support these features be added to dragonbox or should I fork the code and edit it myself?

Maybe in the future, but not planned at this moment. PR is welcome though!

Ideally, a hypothetical to_chars should do what std::to_chars do (except for the bound-checking API which I dislike) , i.e., to print a string with the least number of characters. But there is a nontrivial issue here: there are cases when fixed-point notation yields smaller number of characters, but the number of digits that will appear in the integer part of the fixed-point output exceeds the number of digits in the decimal significand that Dragonbox finds out. The simplest solution is to fill out missing digits with 0, which retains the roundtrip gurantee and the shortness guarantee, but it breaks the correct rounding guarantee. So for that case we must do something else, and I did not yet really thought carefully about what could be a possible solution. Please see fmtlib/fmt#3649 for more info.

I'm curious if you know what the performance impact of supporting this cleaner output would be?

I'm not sure. I will not be surprised either way. I think I saw supporting evidences in both ways.

I'd also like the to_chars to be header only,

To be absolutely honest, I really want to move away from header-only library 😃 constexpr is the only reason why I keep to_decimal header-only.

@jk-jeon jk-jeon closed this as completed Jul 20, 2024
@stephenberry
Copy link
Author

Thanks for your helpful reply. I will probably develop a to_chars implementation in Glaze using dragonbox::to_decimal under the hood. This can be an incubator for a potential pull request in the future. I just wanted to make sure I wasn't duplicating efforts.

I appreciate the prompt reply!

@jk-jeon
Copy link
Owner

jk-jeon commented Jul 20, 2024

Thank you so much!

@stephenberry
Copy link
Author

Using modified code from yyjson and Glaze I was able to incorporate dragonbox and serialize in the desired JSON format.
Here's the current implementation: https://github.com/stephenberry/glaze/blob/main/include/glaze/util/dtoa.hpp

I'm sure the performance can be improved, but it is quite efficient in its current form. Thanks for your library, it made this very straightforward!

@jk-jeon
Copy link
Owner

jk-jeon commented Jul 21, 2024

Here's the current implementation:

Didn't check all the details but looks pretty good to me. Just have a couple of comments:

https://github.com/stephenberry/glaze/blob/659bd8d2c5880d758162329c7d9e540f75f26c2c/include/glaze/util/dtoa.hpp#L125

As far as I can see this function does not remove trailing zeros, but is named as _trim. I initially supposed it's named as such because it removes them. I guess maybe it's better to call it _no_trim to contrast it to the one above it, unless there are other reasons to call it as _trim?

https://github.com/stephenberry/glaze/blob/659bd8d2c5880d758162329c7d9e540f75f26c2c/include/glaze/util/dtoa.hpp#L247

For IEEE-754 binary32, the exponent is at most of two decimal digits so you don't need this branching.

Your implementation does have the correct rounding issue I mentioned, but of course it is not necessarily a real issue depending on your goal. IIRC the JSON spec doesn't really mandate correct rounding, just roundtrip guarantee, but not sure if I'm remembering correctly. In any case I think it's good to be aware of this.

Thanks for your library, it made this very straightforward!

Really glad my work is being integrated into such a great library!

Also I love those small branchless tricks. Reminds me that I'm not really good at micro-optimizations lol

@stephenberry
Copy link
Author

As far as I can see this function does not remove trailing zeros, but is named as _trim.

Yeah, I had at first intended to trim and then realized it was simpler to use dragonbox's trimming. I updated the name.

For IEEE-754 binary32, the exponent is at most of two decimal digits...

Thanks for this observation.

IIRC the JSON spec doesn't really mandate correct rounding...

Round-tripping and shortness are the primary goals. So, I also think this is okay.

Also, I had to make a fix for denormalized float writing as JSON requires a 0 before the exponent. E.G. -1.E-47 must be written as "-1.0E-47"

@jk-jeon
Copy link
Owner

jk-jeon commented Jul 21, 2024

IIRC the JSON spec doesn't really mandate correct rounding, just roundtrip guarantee, but not sure if I'm remembering correctly.

So I looked it up, and it seems the spec is very blurry in this respect, and basically allows the implementations to do whatever they want. So.... whatever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

2 participants