Differentiate from zero-sized fragment and no fragment in url #779

Open
lu-zero opened this issue Jul 13, 2023 · 7 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: api

Comments

@lu-zero

lu-zero commented Jul 13, 2023

scheme://host:port/#

and

scheme://host:port/

If both are fed to the URL constructor, the result does not distinguish between the two: URL.hash returns '' in each case.

To make it even stranger, setting .hash = '#' produces scheme://host:port/#, but reading .hash still returns ''.

It would be nicer if .hash returned undefined/null when it is unset, or "#" when the trailing hash is present.
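A minimal sketch of the behaviour described above, assuming a runtime with the WHATWG URL class (a browser or Node.js). The port 8080 is a placeholder chosen for illustration, since the literal string "port" would not parse.

```javascript
// Two URLs that differ only by a trailing "#".
const withEmptyFragment = new URL("scheme://host:8080/#");
const withoutFragment = new URL("scheme://host:8080/");

// The hash getter reports the empty string for both.
console.log(withEmptyFragment.hash === withoutFragment.hash); // true

// The serialization still differs, so the distinction is kept internally.
console.log(withEmptyFragment.href); // scheme://host:8080/#
console.log(withoutFragment.href);   // scheme://host:8080/
```

The only way to tell the two apart today is to look at href (or toString()) rather than at hash.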

@annevk annevk added topic: api needs implementer interest Moving the issue forward requires implementers to express interest addition/proposal New features or enhancements labels Jul 13, 2023
@annevk
Member

annevk commented Jul 13, 2023

We cannot change the existing API, but I'm somewhat supportive of adding API surface for this as it is indeed hidden information. For search too. (hasSearch & hasHash seem more palatable.)
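For illustration, a user-land approximation of what hasSearch and hasHash might report. These helpers are hypothetical and not part of the URL API. They assume that in a serialized URL the first "#" always starts the fragment and that a "?" before it always starts the query, which appears to follow from the percent-encode sets in the URL Standard.

```javascript
// Hypothetical helpers: hasHash and hasSearch are NOT real URL members.
// Both work on the serialized href rather than on the hash/search getters.
function hasHash(url) {
  return url.href.includes("#");
}

function hasSearch(url) {
  // Drop the fragment first so a "?" inside it is not mistaken for a query.
  const beforeFragment = url.href.split("#", 1)[0];
  return beforeFragment.includes("?");
}

console.log(hasHash(new URL("https://example.com/#")));    // true
console.log(hasHash(new URL("https://example.com/")));     // false
console.log(hasSearch(new URL("https://example.com/?")));  // true
console.log(hasSearch(new URL("https://example.com/#?"))); // false
```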

Having said that, is there evidence on Stack Overflow or in popular JS libraries that this is a shortcoming people have to work around?

@lu-zero
Author

lu-zero commented Jul 14, 2023

I found the problem while looking at how the URL fragment is supported across languages while working on another standard, so I cannot tell you how widespread this need is within JS. I guess we'll have to make a note and flag the pitfall.

What surprises me even more is that you do not get back what you set.

let url = new URL("scheme://host/path/");
console.log(url.hash); // -> ''
url.hash = "#";
console.log(url.toString()); // -> scheme://host/path/#
console.log(url.hash); // -> ''
url.hash = "#a";
console.log(url.toString()); // -> scheme://host/path/#a
console.log(url.hash); // -> '#a'

@karwa
Contributor

karwa commented Jul 14, 2023

I agree that this part of the JS URL API is awkward. To give another data point: in my library WebURL, which implements the WHATWG standard in Swift, I made this change ("not present" is communicated as nil, not as an empty string) and some other tweaks.

WebURL uses nil to signal that a value is not present, rather than an empty string.
This is a more accurate description of components which keep their delimiter even when empty. For example, consider the following URLs:

http://example.com/
http://example.com/?

According to the URL Standard, these URLs are different; however, JavaScript’s search property returns an empty string for both. In fact, these URLs return identical values for every component in JS, and yet still the overall URLs compare as not equal to each other. This has some subtle secondary effects, such as url.search = url.search potentially changing the URL.

WebURL avoids this by saying that the first URL has a nil query (to mean “not present”), and the latter has an empty query. This has the nice property that every unique URL has a unique combination of URL components.
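The url.search = url.search effect mentioned in the quoted passage can be reproduced with a short sketch, assuming a standards-conforming URL implementation:

```javascript
const url = new URL("http://example.com/?");
const before = url.href;

// Reading search yields "", and assigning "" removes the query entirely,
// so this apparent no-op drops the trailing "?".
url.search = url.search;
const after = url.href;

console.log(before); // http://example.com/?
console.log(after);  // http://example.com/
```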

I appreciate that the JS API cannot be changed at this point, though.

@alwinb
Contributor

alwinb commented Jan 7, 2024

Host has this problem too.

  • You cannot distinguish sc:///foo from sc:/foo, nor can you distinguish sc: from sc:// by inspecting the properties of their corresponding URL objects (other than the href itself).
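A small sketch of the host case, assuming an implementation that follows the URL Standard for non-special schemes (recent Node.js does, as far as I know; older browser engines may differ):

```javascript
const emptyHost = new URL("sc:///foo"); // host present but empty
const noHost = new URL("sc:/foo");      // no host at all

// The component getters agree with each other...
console.log(emptyHost.hostname === noHost.hostname); // true
console.log(emptyHost.pathname === noHost.pathname); // true

// ...but the serializations do not.
console.log(emptyHost.href); // sc:///foo
console.log(noHost.href);    // sc:/foo
```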

There is this classic post according to which query and fragment have been in use fairly consistently to refer to the search without the ? sigil and the hash without the # sigil.

So one option is to fix search and hash and make them available as query and fragment instead. The search and hash getters / setters can then be marked as legacy or deprecated (but not removed).

@pimterry

> Having said that, is there evidence on Stack Overflow or in popular JS libraries that this is a shortcoming people have to work around?

I've run into this problem myself, in multiple projects and libraries, in both Node & browsers.

Right now I'm building developer tools, where URLs are taken as string input, parsed, and manipulated by component, and preserving the raw formatting where possible is useful. Not being able to differentiate between /? and / at the end of a URL is quite inconvenient! I'm still using Node's url.parse in some places, in part because it does not have this behaviour and that's important.

Of course this state does exist within the URL parser (the URL's internal query and fragment states in the spec do store empty & null differently) but it's just not currently exposed the same way in search & hash (in both cases, both null and empty are exposed as '').

Totally understand that changing the existing API is impractical. Either of the options proposed here so far would work well in scenarios like mine:

  • hasSearch and hasHash booleans to distinguish no-delimiter vs delimiter-but-empty-value (or has{Search,Hash}Delimiter, if we want to be even more explicit)
  • query & fragment fields that do always include the delimiter as it was originally parsed, so they're set even if the value itself is empty

The latter is definitely more convenient as a user (fullPath = url.pathname + url.query + url.fragment would effectively reproduce the original relative url components - which it does not do today!) but both are workable, and the confusion of two very similar fields with almost always identical values might not be worthwhile.
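To make the second option concrete, here is a rough user-land sketch. The helper names rawQuery and rawFragment are hypothetical stand-ins for the proposed query and fragment fields, and they recover the delimiters from href on the assumption that the first "#" starts the fragment and a "?" before it starts the query.

```javascript
// Hypothetical helpers; nothing here is part of the URL API.
function rawFragment(url) {
  const index = url.href.indexOf("#");
  return index === -1 ? "" : url.href.slice(index);
}

function rawQuery(url) {
  const beforeFragment = url.href.split("#", 1)[0];
  const index = beforeFragment.indexOf("?");
  return index === -1 ? "" : beforeFragment.slice(index);
}

const url = new URL("https://example.com/path?#");

// Today's getters lose both empty delimiters.
console.log(url.pathname + url.search + url.hash);            // /path
// Delimiter-preserving values reproduce the original components.
console.log(url.pathname + rawQuery(url) + rawFragment(url)); // /path?#
```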

@annevk
Member

annevk commented Dec 2, 2024

I would still like to solve this. There is a constraint around naming that makes this trickier. Namely, I don't think we want to introduce a new term for "search" or "hash". That approach would not work for "hostname", which, as @alwinb pointed out, has the same issue. We also have APIs building upon those names, e.g., URLSearchParams and No-Vary-Search, and I would strongly prefer we stay with a single term within the web-exposed universe for each of these concepts.

I see these options for API shape:

  1. hasX -> boolean.
  2. x -> string. Empty string to denote null in the model. ? or # to denote the empty string in the model.
  3. x -> null or string. Null to denote null in the model. A string to denote the string in the model (i.e., without a leading ? or #).

2 is argued for above, but I personally think that is an awkward API to use as you have to use string manipulation to get to the actual value. It also does not work for "hostname". 1 or 3 seem reasonable to me, but I don't really know what name would be okay for 3.

Some options:

a. hashData
b. hashString (though this would be a better fit for 2)
c. hashValue

I personally prefer API shape 1, followed by 3 using the c naming scheme.
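As a rough illustration of shape 3 with the c naming scheme, the sketch below approximates it in user land. hashValue and searchValue are hypothetical functions, not existing URL members, and they read the serialized href on the assumption that the first "#" starts the fragment and a "?" before it starts the query.

```javascript
// Hypothetical approximation of shape 3: null when the component is absent,
// otherwise the value without its leading "#" or "?".
function hashValue(url) {
  const index = url.href.indexOf("#");
  return index === -1 ? null : url.href.slice(index + 1);
}

function searchValue(url) {
  const beforeFragment = url.href.split("#", 1)[0];
  const index = beforeFragment.indexOf("?");
  return index === -1 ? null : beforeFragment.slice(index + 1);
}

console.log(hashValue(new URL("https://example.com/")));    // -> null
console.log(hashValue(new URL("https://example.com/#")));   // -> ""
console.log(hashValue(new URL("https://example.com/#a")));  // -> "a"
console.log(searchValue(new URL("https://example.com/?"))); // -> ""
```

Shape 1 would correspond to checking whether these return null.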

Thoughts? cc @hayatoito @valenting @domenic @rmisev

@valenting
Collaborator

I feel like option 3 is the most JavaScript-y API, but I'm definitely no expert.
Option 1 is similar to what we use internally in Firefox, and I suspect it would cause the least confusion with the existing getters.
Option 2 works best to cover the gaps in this testcase.
@annevk I think any of the proposed options would work equally well. We just need to reach an agreement as to which one is the best fit.

Development

No branches or pull requests

6 participants