
Implementation Plan: Proxying frontend API requests through the Nuxt server #5265

Open

wants to merge 2 commits into base: main
Conversation

obulat
Contributor

@obulat obulat commented Dec 11, 2024

Due date:

2025-01-11

Assigned reviewers

Description

Related to #3473

Current round

This discussion is following the Openverse decision-making process. Information about this process can be found
on the Openverse documentation site.
Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with the process as a whole.

This discussion is currently in the Decision round. The deadline for review of this round is 2025-01-11.

Revision details

[2025-01-08] I updated the plan based on the comments. The main changes are:

  1. Proxy All Frontend Traffic: Updated the plan to proxy all frontend routes, not just the /api routes. This will make the rate-limiting setup easier and eliminate the need to integrate modules like Nuxt Turnstile. Handling Cloudflare challenges directly in the Nuxt server avoids the complexity of requiring the Pinia media store to process challenge responses instead of result responses.
  2. Use h3 session for identifying users: The plan now uses h3 sessions to create a verifiable session header/cookie. Cloudflare will validate that the header/cookie exists, allowing requests to pass even if the cookie is invalid (e.g., generated by automated requests). The Nuxt server will then decode the cookie using the secret and verify its validity. Invalid cookies will trigger a 401 Unauthorized response from the Nuxt server, preventing these requests from reaching the Django API. This ensures robust protection against automated abuse.
  3. Shared IP scenario testing: Detailed a method to test shared IP scenarios using two different browsers or an incognito window to simulate multiple users. This approach demonstrates how authenticated users bypass stricter limits while anonymous users face challenges if limits are exceeded.
  4. Excluding static asset routes: Explicitly stated that static asset requests (e.g., /_nuxt/, .css, .jpg) are excluded from rate limiting to streamline traffic and reduce unnecessary challenges.
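The verify-or-reject logic in point 2 can be sketched as follows. This is a minimal, dependency-free illustration using Node's crypto, not the actual h3 `useSession` implementation (h3 seals the cookie for you); the secret and function names are hypothetical:

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical server-side secret; in the plan this would be the h3 session secret.
const SECRET = "server-side-secret";

// Issue a cookie value of the form "<sessionId>.<signature>".
export function issueSessionCookie(sessionId: string = randomUUID()): string {
  const signature = createHmac("sha256", SECRET).update(sessionId).digest("hex");
  return `${sessionId}.${signature}`;
}

// Returns 200 for a valid cookie, 401 otherwise (missing, malformed, or forged).
// Cloudflare only checks that the cookie exists; this server-side check is what
// stops automated requests carrying a fabricated cookie from reaching the API.
export function checkSessionCookie(cookie: string | undefined): 200 | 401 {
  if (!cookie) return 401;
  const dot = cookie.lastIndexOf(".");
  if (dot === -1) return 401;
  const sessionId = cookie.slice(0, dot);
  const signature = cookie.slice(dot + 1);
  const expected = createHmac("sha256", SECRET).update(sessionId).digest("hex");
  if (signature.length !== expected.length) return 401;
  return timingSafeEqual(Buffer.from(signature), Buffer.from(expected)) ? 200 : 401;
}
```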

Here's a branch with a draft of changes implementing this project (single commit for all changes): https://github.com/WordPress/openverse/tree/add/api-proxying

@obulat obulat requested a review from a team as a code owner December 11, 2024 09:09
@obulat obulat requested review from krysal and dhruvkb and removed request for a team December 11, 2024 09:09
@obulat obulat self-assigned this Dec 11, 2024

Full-stack documentation: https://docs.openverse.org/_preview/5265

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

New files ➕:

Collaborator

@sarayourfriend sarayourfriend left a comment


I've left some comments suggesting clarifications or alternatives. Happy to chat more about this. Excited to see this project come to light, I think it will massively improve the user experience for a major category of Openverse users (in education) as well as make the project's ingress more sustainable and manageable for the maintainers. Kudos on the IP!

Comment on lines 179 to 181
- From the
[k6 implementation](https://github.com/WordPress/openverse/pull/4908), copy
the helper function that signs the request with HMAC.
Collaborator

Why is HMAC signing of the requests required? What is the goal with it?

For context on my question, the HMAC approach was designed to bypass Cloudflare's rate limiting for the frontend, without the Nuxt server needing to know anything about the authorization/limits. It isn't necessary to use that method for the API because Cloudflare should be configured to bypass rate limiting for any API requests that contain an Authorization header (as the API itself will handle those rate limits).

In other words, I would think the request flow could go like this, and not require any HMAC at all for regular browser users of the Openverse website:

  • User requests an Openverse page and receives a response with a unique session ID in set-cookie
  • User executes a client-side search -> Cloudflare checks the rate limit for the IP + session ID: challenge if needed; otherwise ...
  • Request is sent to the new Nuxt /api/... proxy routes
  • The request is forwarded by Nuxt to the API with the API authentication method the Nuxt server already uses
  • Cloudflare does not rate limit because an Authorization header is present on the request
  • The API does not rate limit because the Authorization header has a rate-limiting-exempt token
  • Response is sent back to Nuxt
  • Nuxt removes any response information that could expose Authorization information
  • Nuxt sends the response body back to the client
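The scrubbing step near the end of this flow ("Nuxt removes any response information that could expose Authorization information") could look something like this minimal sketch; the deny-list of header names is illustrative, not a confirmed list from the plan:

```typescript
// Headers that could leak the server's API credentials or session state to the
// client. The exact set is an assumption for illustration.
const SENSITIVE_HEADERS = new Set(["authorization", "set-cookie", "www-authenticate"]);

// Copy response headers, dropping anything that could expose the server-side
// Authorization information, before the body is returned to the browser.
export function scrubResponseHeaders(
  headers: Record<string, string>
): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!SENSITIVE_HEADERS.has(name.toLowerCase())) {
      safe[name] = value;
    }
  }
  return safe;
}
```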

Contributor Author

I wrote a long reply first, and then realized that my thoughts were valid only if just the /api/ routes are behind the rate limiting. If all frontend routes are behind the rate-limit rules, we don't need to do anything special to protect the /api/ routes (which can send unlimited requests to the API) from automated abuse, as they would be covered by the frontend rate-limiting rules.

Member

@sarayourfriend, your bullet points helped me picture this and made me realize that the cookie is probably an important part I didn't get. Thank you for summarizing it.

authentication, HMAC)
- send the request using `ofetch` (or `ofetch.raw` to extract the response
headers for `SEARCH_RESPONSE_TIME` Plausible event)
- handle errors and return the appropriate response to the client
Collaborator

Should the route also reject requests as Unauthorized (401) if they are missing the session ID cookie? The API proxy routes should only be used for client-side requests of search.

Contributor Author

Good idea. I had to write the draft of the implementation to see how it would work.

  • At the beginning of the server setup (e.g., in the init-stores.server plugin, if it's renamed to init.server), set the sessionId cookie that is httpOnly (if it does not exist).

  • Send the request from the store to the /api route.

    • On the client, the httpOnly cookie will automatically be sent.
    • On the server, the request is sent between the different parts of the Nuxt app without the browser passing the cookies along. We need to explicitly pass the cookies, using getProxyRequestHeaders from h3 to extract them and pass them on with the $fetch request. Since we are on the server, the httpOnly cookies are also available.
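A dependency-free sketch of the forwarding step in the last bullet — h3's getProxyRequestHeaders does roughly this, copying the incoming request headers (including cookie) while excluding ones that shouldn't be replayed; the exact exclusion list below is an assumption for illustration:

```typescript
// Headers that should not be copied onto the internal $fetch request; the real
// h3 helper maintains its own list, this one is illustrative.
const EXCLUDED = new Set(["host", "content-length", "connection"]);

// Copy headers from the incoming server-side request so that cookies (including
// httpOnly ones, which are visible on the server) reach the /api proxy route
// even though no browser is involved in the internal request.
export function pickForwardHeaders(
  incoming: Record<string, string>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    if (!EXCLUDED.has(name.toLowerCase())) out[name] = value;
  }
  return out;
}

// Hypothetical usage inside a server route:
// await $fetch("/api/search", { headers: pickForwardHeaders(incomingHeaders) });
```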

envisage that we will need to adjust the limits many times because testing the
Cloudflare rate limiting locally is not possible.

### Use server routes in the frontend when the feature flag is on
Collaborator

I think that leveraging the @openverse/api-client package would allow some simplification here. If the Nuxt API proxy routes are designed to match the same path-specification as the API itself, then the same API client can be used by swapping out the base URL based on the feature flag and whether the app is running server side or client side. To the rest of the application code, there would be no difference in how the requests are made.

It's probably more work up-front, because existing code would need to be changed to use the @openverse/api-client, so worth considering whether it's something to plan for in the future. But even without using the @openverse/api-client package, the same strategy of swapping the base URL could be used with the api-service and avoid needing multiple different functions.
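The base-URL swap described above could be as small as a single function that the rest of the application calls without caring where requests actually go. The flag shape and URL values below are hypothetical, assuming the server keeps talking to the API directly while proxying applies to client-side requests:

```typescript
interface ApiBaseOptions {
  proxyEnabled: boolean; // the feature flag from the plan
  isServer: boolean;     // running in the Nuxt server vs the browser
}

// Resolve the base URL for API requests. To application code there is no
// difference in how requests are made; only the target changes.
export function resolveApiBase({ proxyEnabled, isServer }: ApiBaseOptions): string {
  if (!proxyEnabled || isServer) {
    // Server-side code (and the pre-flag rollout) talks to the API directly.
    return "https://api.openverse.org/v1/";
  }
  // Client-side requests go to the Nuxt proxy routes instead.
  return "/api/";
}
```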

Contributor Author

I think using the Nuxt-native methods would be better for logging and error handling of the requests on the server.
During the transition period, there would need to be separate branches for requesting the proxied/non-proxied urls. However, it would be more convenient to handle errors within the server route, where we can log all requests and errors on the server.

We could use the same naming for the server routes as the API has (so, instead of /api/search/image/?q=cat, have /v1/images/?q=cat), which would make the conversion of the URLs simpler.

Collaborator

Fair enough! For what it's worth, it's possible to swap the underlying fetch implementation to pass options through to Nuxt's version, if it has better logging and other options. Check out the fetch option to createClient in the docs.

The main benefit of using @openverse/api-client is unifying the API and types, and it provides a convenient way to swap out the target URL as needed. It would be possible to re-write the paths dynamically in the function passed to fetch, so that the Nuxt routes don't need to reproduce exactly the same paths as the upstream API.

Anyway, just an alternative idea in case it is useful. The plan you've got doesn't look bad to me at all 🙂

Comment on lines +339 to +394
- **Always Issuing a Challenge Without Thresholds**: Issuing a managed challenge
to every new user, regardless of load or conditions, would inconvenience users
behind NATs and degrade the initial user experience. While simpler to
implement (we would not need to calculate the exact rate limit), it fails our
design goal of minimizing friction for normal users.
Collaborator

Issuing a managed challenge to every new user, regardless of load or conditions

Can you explain further why every new user would be issued a challenge? For what it's worth, the managed challenge "should" most of the time be a JavaScript challenge, which requires no user interaction.

To reduce friction without needing to introduce the turnstile module and involving the application code, you could have a separate (or no) rate limit for the home page, relying on that being the most common entry point for users where they would receive the original cookie to carry for the rest of their session.

To clarify the situation behind a NAT, it might be overly cautious to think that the rate limit will be tripped so quickly. Imagine a large school with maybe 3 computer labs having a simultaneous class, 50 students each. That is 150 fully unauthenticated requests from a single IP that need to happen for everyone to get a cookie. If the rate limit of a cookie-less request is set to 150 requests / 30 minute period, and it just takes a managed challenge to clear it for the IP, I don't think this would introduce very much friction at all.

For users on low-spec Android phones, they would probably avoid ever seeing the managed challenge, get the cookie, and then just be subject to the regular rate limit.

of going directly to the API, requests will go to the Nuxt server routes
(`/api/**`). The main difference is that, when rate-limited, the client will
receive a Cloudflare challenge response that will be handled by
[Nuxt Turnstile Module](https://github.com/nuxt-modules/turnstile).
Collaborator

It seems like turnstile is mostly geared towards websites that want to use Cloudflare challenges without proxying requests through Cloudflare itself. At least, that's what the Turnstile marketing page indicates as the primary target use case. Openverse proxies requests through Cloudflare already. Can you clarify the goal behind using Turnstile rather than regular Cloudflare managed challenges, exactly?

Comment on lines +107 to +134
- **Multi-language Support**: Challenges support multiple languages [^2],
although the number of supported languages is lower than the number of
languages supported by Openverse.
Collaborator

FWIW, it's possible to implement custom versions of Cloudflare's error and challenge pages: https://developers.cloudflare.com/support/more-dashboard-apps/cloudflare-custom-pages/configuring-custom-pages-error-and-challenge/

I'd recommend doing this for the rate-limit response, for example, so that users get an Openverse branded page explaining that they are making requests too quickly, and advising them how long they need to wait until they can make another request, or how to get in contact with the maintainers if they believe there is an error.

This should eliminate language-support issues for the error responses, ensure that the presentation language is consistent across the session (e.g., using Openverse's own methods for identifying which language to present the page in, which may differ from Cloudflare's methods), and ensure a cohesive presentation overall. I don't know if there is a way to configure the language of the challenge itself, but maybe Cloudflare will make a best-guess based on the lang-attribute on the custom page? In any case, it would be a better experience than the entire page being inconsistent with the rest of the website.

Comment on lines +87 to +106
Originally, I thought of using Cloudflare managed challenges to protect the Nuxt
server routes from abuse. However, this would have required issuing a challenge
Collaborator

A clarification here: "to protect the Nuxt server routes from abuse".

For the solution to work, it's necessary for all routes to be behind the same Cloudflare rate limit rules, not just the server routes. That way, if someone gets rate limited on the server routes, the frontend can reload the page and the user will actually get the managed challenge page for clearing the rate limiting (or the explanation page if they are going too fast and need to actually be limited).

If only the server routes are limited by the new configuration, then there's no way to present the user with a challenge that will clear that rate limit, and the user would be stuck.

I mention in another comment that I don't think this would require issuing a challenge to every new user and would not meaningfully increase friction in the vast majority of even extreme cases. Regardless, it is necessary for all routes to have the same rate limit rules for this Cloudflare-rate-limit strategy to work with the safe guard of users being able to clear the rate limit as needed via the challenge.

Contributor Author

After reading through your comments and thinking more about this, I agree that we should have all routes behind the same Cloudflare rate limit rules.

That way, if someone gets rate limited on the server routes, the frontend can reload the page and the user will actually get the managed challenge page for clearing the rate limiting (or the explanation page if they are going too fast and need to actually be limited).

In the original implementation, the turnstile module was supposed to handle the cases where users get the managed challenge instead of the results from the server routes. The module would temporarily show the "turnstile" page (this is not the Cloudflare Turnstile product, but the page that handles the HTML response with the challenge from the proxy and then saves the cookie to clear the challenge), and then show the target page.
Here are the docs on Cloudflare integration of Turnstile with WAF rules: https://blog.cloudflare.com/integrating-turnstile-with-the-cloudflare-waf-to-challenge-fetch-requests/ It gives the JS snippet to handle the challenge response from the API.

However, it would be easier to simply use the rate limiting for all frontend paths.

Collaborator

Gotcha! I was not connecting the dots on the turnstile handling the flow of presenting the challenge for the subset of rate limited routes.

Member

@krysal krysal left a comment

Thank you for writing this detailed plan, @obulat. It's a nice explanation of the steps to incorporate Cloudflare-provided rate-limiting for the frontend and better control the use of Openverse. It sounded quite challenging at first, but I now have a clearer idea of what it entails after reading it. I only have a couple of loose points, namely:

  • The cookie that will help identify each user
  • How local usage of Openverse will work after these changes are applied

server routes from abuse. However, this would have required issuing a challenge
to every new user, which would have been a poor user experience. Instead, we
will use Cloudflare rate limiting to protect the Nuxt server routes from abuse.
The rate limiting will be based on the IP and cookie combination, and will
Member

Could you detail how the (session?) cookie will be generated and how it will help to identify individual requests and differentiate them from all those using the same IP?

Comment on lines 299 to 304
This is the part of the plan I'm most unsure about. I couldn't understand the
details of how this module works, so we would probably need to do a lot of
testing to make sure it works as expected. Unfortunately, we cannot test
Cloudflare challenge responses locally because the proxy is only set up in
staging/production. We will have to use staging for this, and to make the
testing easier, we can set the staging rate limits to be very low.
Member

Understandably, this module needs further research. Also, there should preferably be a way to keep the frontend working locally, so maybe turning this module off or bypassing it in local environments?

Comment on lines +27 to +29
This implementation plan outlines how we will route all frontend traffic to the
API through Nuxt server routes — effectively “authenticating” these requests at
the API level — and configure Cloudflare rate-limiting to protect these routes
Member

Would requests from the same user also be identified at the API level? I understand that everything will be seen as "coming from the frontend." For example, is the cookie value passed to the API, too?


@openverse-bot

Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR:

@dhruvkb
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend days [1], this PR was ready for review 6 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s) [2].

@obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

Member

@dhruvkb dhruvkb left a comment

I don't have much to add beyond what @sarayourfriend and @krysal have already covered. So 👍 from me to move into the next round of incorporating their feedback.

The turnstile module was a confusing aspect of the plan for me, but it seems from existing feedback that a second version of the IP will not need it. (I will still check out the module, in case it ends up being used for the plan.)

Comment on lines +345 to +399
- **Proxying thumbnail requests** The thumbnail requests from the frontend will
remain anonymous on the API level. Since controlling search and related
requests naturally throttles excessive thumbnail retrieval, we consider it
unnecessary to further load the Nuxt server.
Member

Is this also similarly applicable for audio waveforms?

Contributor Author

Yes, I forgot about the waveforms 🤦

Labels
📄 aspect: text Concerns the textual material in the repository 🌟 goal: addition Addition of new feature 🟧 priority: high Stalls work on the project or its dependents 🧭 project: implementation plan An implementation plan for a project 🧱 stack: documentation Related to Sphinx documentation
Projects
Status: Pending proposal
Status: 👀 Needs Review