Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): allow querying for user profile by ULID #3141

Merged

Conversation

wescopeland
Copy link
Member

@wescopeland wescopeland commented Jan 26, 2025

https://discord.com/channels/310192285306454017/1017568867574362283/1333195457597411460

API consumers need a stable value to query by, as a user's name is no longer stable and we probably don't want them querying by user ID. Therefore, we add a new column, ulid, to the user accounts table. API_GetUserProfile has been updated to return a ULID field as well as be queryable by ?i={ulid}.

Documentation for this API change will come after we've agreed this is the right direction & before merging.

This PR includes a migration and a mandatory sync command:

sail artisan ra:sync:user-ulids

If we're happy with this direction, I'll open an api-docs PR, as well as update the other endpoints.

Example

GET /API/API_GetUserProfile.php?y=redacted&i=00003EMFWR7XB8SDPEHB3K56ZQ
{
  "User": "wesweswes",
  "ULID": "00003EMFWR7XB8SDPEHB3K56ZQ",
  "UserPic": "\/UserPic\/WCopeland.png",
  "MemberSince": "2020-02-02 20:10:35",
  "RichPresenceMsg": "Playing Sonic the Hedgehog [Subset - Perfect Bonus]",
  "LastGameID": 29895,
  "ContribCount": 31962,
  "ContribYield": 255792,
  "TotalPoints": 27503,
  "TotalSoftcorePoints": 0,
  "TotalTruePoints": 164850,
  "Permissions": 4,
  "Untracked": 0,
  "ID": 117089,
  "UserWallActive": true,
  "Motto": "https:\/\/i.imgur.com\/ov30jeD.jpg"
}

@wescopeland wescopeland requested a review from a team January 26, 2025 22:55
@luchaos
Copy link
Member

luchaos commented Jan 28, 2025

Heads up: the user model maintains a Hash ID attribute for permalinks that probably serves the same purpose but are not exposed in the API yet. They were meant to replace references for user IDs and usernames in rich/shortcode text without exposing IDs (which would allow for enumeration) nor relying on "static" usernames.

You can find them on Filament user details pages; being genereated on the fly, are based on the actual user ID, calculated using Knuth's integer hash via the "Optimus" (Prime) package: https://github.com/jenssegers/optimus

As an example, this is my user's permalink which redirects to the canonical URL: https://retroachievements.org/u/1354722784957298787

Hash IDs were meant to replace pretty much all of the auto-increment IDs but might not be applicable nor needed anymore.

The reason I bring this up here is that if we choose to go with stored ULIDs there is no need for Hash IDs and vice versa. I don't hold a strong opinion on which one to move forward with but would argue that we probably should only maintain one of them.

@wescopeland
Copy link
Member Author

wescopeland commented Jan 28, 2025

Hey @luchaos! 👋

This does indeed add some nuance to the discussion.

I agree that both approaches are sound. It isn't a strong opinion, but I lean towards ULIDs only because Knuth prime hashes are reversible. Right now, it appears HasHashId is hashing against the primary key, which ultimately means someone clever could take the hash and calculate what your real user ID is.

We could (maybe?) change the hashing field to be User, but this could have the same problem if it's undesirable to not leak an original username.

@wescopeland
Copy link
Member Author

Actually, it looks like Optimus includes a random integer in the calculation. Given that, maybe it's better to not use ULIDs and use those hash IDs instead.

@wescopeland wescopeland marked this pull request as draft January 28, 2025 18:37
@luchaos
Copy link
Member

luchaos commented Jan 28, 2025

I'm very fine with going with ULIDs and support that decision.

Another downside to hash IDs is that if the configured prime numbers are lost for whatever reason all of the permalinks would change. Which is certainly not what a permalink should ever do 😅 Plus the resulting integers may vary in length - that's more of a cosmetic concern; still.

In this case, I think removing the Optimus package and transitioning permalinks to utilize the new ULIDs would be appropriate.

Thanks for considering my input and maintaining focus on the optimal solution, as usual 😊

@wescopeland wescopeland marked this pull request as ready for review January 28, 2025 18:52
@wescopeland
Copy link
Member Author

Another downside to hash IDs is that if the configured prime numbers are lost for whatever reason all of the permalinks would change. Which is certainly not what a permalink should ever do 😅

I hadn't considered that either. ULIDs it is, then!

@Jamiras
Copy link
Member

Jamiras commented Jan 28, 2025

Another downside to hash IDs is that if the configured prime numbers are lost for whatever reason all of the permalinks would change. Which is certainly not what a permalink should ever do 😅

This is only true if the hashes aren't stored in the DB (which they currently aren't).

We'd have the same problem with ULIDs. If we generate a ULID from the Created timestamp, there's still a possibility of multiple users joining at the same second. ULIDs are unique to the millisecond, and claim to handle clash resolution (though I'm not certain that's true for arbitrarily requesting a ULID for some timestamp in the past).

Even if we faked the millisecond value, there's no way to derive a millisecond from the user ID to ensure 100% uniqueness.

@luchaos
Copy link
Member

luchaos commented Jan 28, 2025

Actually, it looks like Optimus includes a random integer in the calculation. Given that, maybe it's better to not use ULIDs and use those hash IDs instead.

Whichever you prefer. I thought about storing hash IDs to never run into the loss problem.

The hashed IDs should be resilient. As you said, someone with enough compute power could theoretically figure it out. It would be security by obscurity against enumeration attacks - which are in turn unlikely to gain much insight.

@luchaos
Copy link
Member

luchaos commented Jan 28, 2025

Another downside to hash IDs is that if the configured prime numbers are lost for whatever reason all of the permalinks would change. Which is certainly not what a permalink should ever do 😅

I hadn't considered that either. ULIDs it is, then!

Yeah, the more I think about it the more I lean towards ULIDs myself. Much easier to reason about and well supported in Laravel. Less esoteric, too. I just liked the idea of hashed IDs and played around with it to be honest - as a shortcode replacement and permalinks introduction to have something conceptually fulfilling that purpose. ULIDs might as well do that.

@luchaos
Copy link
Member

luchaos commented Jan 28, 2025

Another downside to hash IDs is that if the configured prime numbers are lost for whatever reason all of the permalinks would change. Which is certainly not what a permalink should ever do 😅

This is only true if the hashes aren't stored in the DB (which they currently aren't).

We'd have the same problem with ULIDs. If we generate a ULID from the Created timestamp, there's still a possibility of multiple users joining at the same second. ULIDs are unique to the millisecond, and claim to handle clash resolution (though I'm not certain that's true for arbitrarily requesting a ULID for some timestamp in the past).

Even if we faked the millisecond value, there's no way to derive a millisecond from the user ID to ensure 100% uniqueness.

Very true. I guess the next step could be snowflakes - I had an implementation for that, too 😅 Doesn't make much sense without sharding though. And we'd be back to "esoteric" approaches that are not as native to Laravel as ULIDs or UUIDs.

Interesting problem - we do have quite a few users sharing the same creation date and time iirc. For future signups and retroactive generation some uniqueness check during creation with a simple retry loop might be enough.

I did a similar thing for UUIDs in the past - just because the chance for collision is non-zero.

@wescopeland
Copy link
Member Author

Great discussion!

The path of least resistance right now feels like ULIDs.

We need to handle two collision scenarios:

  • Historical data (the sync command)
  • New user signups / registration flow

For historical data, we can add randomized milliseconds during the sync. For new user signups, if there's a collision, we can clone the current timestamp, increment milliseconds and retry. If a certain # of collisions continues to occur on a registration (seems very unlikely), we can throw.

I believe this should gracefully resolve any potential for collisions.

@Jamiras
Copy link
Member

Jamiras commented Jan 28, 2025

We need to handle two collision scenarios:

  • Historical data (the sync command)
  • New user signups / registration flow

I don't think you have to worry about either. https://github.com/ulid/spec

Monotonic sort order (correctly detects and handles the same millisecond)

In testing the command, I naively just used Created, and you could see the dozen or so Scott alts created on 11/21/12 had ULIDs that only differed by one character. Adding the random milliseconds made the "random" part of the ULID truly unique between records.

I'm assuming the ULID implementation just has an internal counter that gets incremented for each call, so it will handle collisions when passed the same timestamp. And I believe that would prevent us from being able to consistently generate the same ULID from a given timestamp.

I think storing the values in the database solves this problem. And if two users do sign up at the exact same millisecond, they may get neighboring ULIDs, but not conflicting ones.

@luchaos
Copy link
Member

luchaos commented Jan 28, 2025

I don't think you have to worry about either. https://github.com/ulid/spec

Monotonic sort order (correctly detects and handles the same millisecond)

nice! learned something today, thank you!

app/Community/Commands/SyncUserUlids.php Outdated Show resolved Hide resolved
app/Community/Commands/SyncUserUlids.php Outdated Show resolved Hide resolved
@wescopeland wescopeland added the deployment/command Includes an Artisan command that needs to be run before/during deployment label Feb 1, 2025
@wescopeland wescopeland changed the title feat(api): allow querying for user profile by ULID (proof of concept) feat(api): allow querying for user profile by ULID Feb 1, 2025
@wescopeland wescopeland merged commit 62672aa into RetroAchievements:master Feb 1, 2025
9 checks passed
@wescopeland wescopeland deleted the get-user-profile-ulid branch February 1, 2025 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deployment/command Includes an Artisan command that needs to be run before/during deployment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants