Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a cron job that fetches all Webdx feature IDs #4719

Merged
merged 5 commits into from
Jan 23, 2025
Merged

Conversation

KyleJu
Copy link
Collaborator

@KyleJu KyleJu commented Jan 21, 2025

A part of #4666. Create a cron job that fetches all Webdx feature IDs from https://api.webstatus.dev/v1/features API. The list of changes are:

  • cron.yaml scheduled the job once a day at 9, a slight higher frequency than the web-feature repo release schedule
  • gen/ is generated by openapi-webstatus in package.json file. It vendors the webstatus.dev's OpenAPI APIs and generates an API client for us to use.
  • internals/maintenance_scripts.py and internals/maintenance_scripts_test.py are the actual job and its tests
  • main.py adds the routing for the job
  • requirements.txt - openapi generator creates a pip package under gen/py/webstatus_openapi and this package needs to be added to for requirements.txt the list of packages to install

how to manage gen/py/webstatus_openapi going forward

Currently the vendored OpenAPI files from webstatus.dev is pinned to this commit. Per James' comments, npm run openapi-webstatus only needs to be run when:

  1. When there's a new version of the API that we want.
  2. When we update the generator version.

Assume that webstatus.dev APIs will have backward compatibility, only option 2 should be addressed.

what happens if OpenAPI versions between Chromestatus and webstatus.dev are out of sync

Per discussions below, ChromeStatus should pin to the commit where webstatus.dev still uses OpenAPI 3.0, and update the commit sha for openapi-webstatus in package.json

Follow-up

TODOs will be addressed in the follow-up PRs

@jcscottiii
Copy link
Collaborator

jcscottiii commented Jan 21, 2025

I see you're using the webstatus API (like we discussed)! I have a few suggestions:

Generating a Client from the OpenAPI File

Instead of writing the code to make the request yourself, you can generate a client using the webstatus's OpenAPI file.

Here's how to do it:

  1. Obtain the OpenAPI file: You can either vendor the OpenAPI file from the webstatus repository or use a GitHub URL that points to a commit.
  2. Run the openapi-generator tool: Modify package.json like so (in this case, we use the GitHub URL):
"openapi": "npm run openapi-backend && npm run openapi-frontend",
+"openapi-webstatus": ". cs-env/bin/activate; rm -rf gen/py/webstatus_openapi && openapi-generator-cli generate  --reserved-words-mappings field=field -i https://raw.githubusercontent.com/GoogleChrome/webstatus.dev/31c79b507f4de108d08ce60c63f9f3771d459732/openapi/backend/openapi.yaml -g python  -o gen/py/webstatus_openapi --additional-properties=packageName=webstatus_openapi && pip install -r requirements.txt",
  1. Import and use the client: Once you've generated the client, you can import it and use it to make requests to the webstatus API.

Here's an example of how your code could change:

from webstatus_openapi import ApiClient, DefaultApi, Configuration, ApiException, Feature

# ...

# You can get the typing for Feature
all_data_list: list[Feature] = []
client = DefaultApi(ApiClient(Configuration(host=WEBSTATUS_BASE_URL)))
page_token: str | None = None
while is_first or page_token:
    try:
        resp = client.v1_features_get(page_size=100, page_token=page_token)
        all_data_list.extend(resp.data)
        page_token = resp.metadata.next_page_token
        is_first = False
    except ApiException as e:
        logging.error('Could not fetch features %s', e) # __str__ for ApiException already provides a custom message
        return None

return [feature_data.feature_id for feature_data in all_data_list]

Pros

  • Type safety: The generated client provides type safety, which can help prevent errors and improve code readability.
  • Code documentation: The generated client includes code documentation on how to use the Client, which can be helpful for understanding how to use the webstatus API. Sample from the openapi generator repo.
  • Automatic request handling: The generated client handles all the details of making HTTP requests, such as setting headers and parsing responses.

Cons

  • File size: The generated client can be large, especially since webstatus has many endpoints. But we will have generate code anyhow when we add this new API to chromestatus's own OpenAPI file.
    • Ways to mitigate in the future: we don't actually commit the files from the gen directory. But then everyone will need to also regenerate the code. And in that case, I would vendor webstatus.dev's OpenAPI file.
  • Initial setup time: It takes some time to set up the OpenAPI file and generate the client.

@KyleJu
Copy link
Collaborator Author

KyleJu commented Jan 21, 2025

I see you're using the webstatus API (like we discussed)! I have a few suggestions:

Generating a Client from the OpenAPI File

Instead of writing the code to make the request yourself, you can generate a client using the webstatus's OpenAPI file.

Here's how to do it:

  1. Obtain the OpenAPI file: You can either vendor the OpenAPI file from the webstatus repository or use a GitHub URL that points to a commit.
  2. Run the openapi-generator tool: Modify package.json like so (in this case, we use the GitHub URL):
"openapi": "npm run openapi-backend && npm run openapi-frontend",
+"openapi-webstatus": ". cs-env/bin/activate; rm -rf gen/py/webstatus_openapi && openapi-generator-cli generate  --reserved-words-mappings field=field -i https://raw.githubusercontent.com/GoogleChrome/webstatus.dev/31c79b507f4de108d08ce60c63f9f3771d459732/openapi/backend/openapi.yaml -g python  -o gen/py/webstatus_openapi --additional-properties=packageName=webstatus_openapi && pip install -r requirements.txt",
  1. Import and use the client: Once you've generated the client, you can import it and use it to make requests to the webstatus API.

Here's an example of how your code could change:

from webstatus_openapi import ApiClient, DefaultApi, Configuration, ApiException, Feature

# ...

# You can get the typing for Feature
all_data_list: list[Feature] = []
client = DefaultApi(ApiClient(Configuration(host=WEBSTATUS_BASE_URL)))
page_token: str | None = None
while is_first or page_token:
    try:
        resp = client.v1_features_get(page_size=100, page_token=page_token)
        all_data_list.extend(resp.data)
        page_token = resp.metadata.next_page_token
        is_first = False
    except ApiException as e:
        logging.error('Could not fetch features %s', e) # __str__ for ApiException already provides a custom message
        return None

return [feature_data.feature_id for feature_data in all_data_list]

Pros

  • Type safety: The generated client provides type safety, which can help prevent errors and improve code readability.
  • Code documentation: The generated client includes code documentation on how to use the Client, which can be helpful for understanding how to use the webstatus API. Sample from the openapi generator repo.
  • Automatic request handling: The generated client handles all the details of making HTTP requests, such as setting headers and parsing responses.

Cons

  • File size: The generated client can be large, especially since webstatus has many endpoints. But we will have generate code anyhow when we add this new API to chromestatus's own OpenAPI file.

    • Ways to mitigate in the future: we don't actually commit the files from the gen directory. But then everyone will need to also regenerate the code. And in that case, I would vendor webstatus.dev's OpenAPI file.
  • Initial setup time: It takes some time to set up the OpenAPI file and generate the client.

Thanks for the comment. I gave it a try locally and pretty cool! After I chatted with Jason, I do have questions and a few thoughts about the pros and cons:

Questions

  • We have to add npm run openapi-webstatus to the build commands right?
  • Could we only generate clients for v1_features_get only instead of all APIs

Thoughts
Another con with this approach is that it adds complexities to managing generated files from OpenAPI. Since it is vendored from a different repository, it means that these files need to be updated whenever there are changes to the openapi.yaml in webstatus.dev. Most of the time, I suspect the changes will be unrelated to the v1_features_get. Also would the discrepancy of OpenAPI versions between Chromestatus and webstatus affect these file generations?

This approach will be more beneficial if two apps are more tightly integrated. The only field we need so far is the feature_id field, and the actual fetch will live in the cron job (I need to make changes in this PR). I am inclined to add a TODO for now and explore this approach later again.

My experience with OpenAPI is limited so let me know what you think!

@KyleJu KyleJu requested review from Varryme and removed request for Varryme January 21, 2025 23:44
@jcscottiii
Copy link
Collaborator

jcscottiii commented Jan 22, 2025

We have to add npm run openapi-webstatus to the build commands right?

Since we commit the files into the gen folder, we do not have to add it to the build commands. The only time we would need to run npm run openapi-webstatus is:

  1. When there's a new version of the API that we want.
  2. When we update the generator version.

Could we only generate clients for v1_features_get only instead of all APIs

We can do this as well as minimize the number of extra files generated too. Here are a few steps that need to happen:

Step 1: Create a file at gen/py/webstatus_openapi/.openapi-generator-ignore

The contents for this will be used to filter out extra supportingFiles like .travis.yml that are generated:

.gitignore
.travis.yml
.gitlab-ci.yml
git_push.sh
test/
setup.cfg
test-requirements.txt
tox.ini
.github/

Step 2: Modify the openapi-webstatus npm command

Long term, you can use this:

    "openapi-webstatus": ". cs-env/bin/activate; find gen/py/webstatus_openapi -not -path \"gen/py/webstatus_openapi/.openapi-generator-ignore\" -not -path \"gen/py/webstatus_openapi\" -delete && openapi-generator-cli generate  --reserved-words-mappings field=field -i https://raw.githubusercontent.com/GoogleChrome/webstatus.dev/refs/heads/main/openapi/backend/openapi.yaml -g python  -o gen/py/webstatus_openapi --additional-properties=packageName=webstatus_openapi --global-property supportingFiles,apis,models,apiDocs=false,modelTests=false,modelDocs=false,models=\"FeaturePage:PageMetadataWithTotal:BasicErrorModel:Feature:FeatureSpecInfo:BrowserImplementation:BaselineInfo:BrowserUsage:FeatureWPTSnapshots:ChromiumUsageInfo:SpecLink:WPTFeatureData\" --openapi-normalizer \"FILTER=operationId:listFeatures\"  && pip install -r requirements.txt",

This will:

  • Remove all the files except the ignore file inside the gen/py/webstatus_openapi folder.
  • Use the generator's global properties to not output as many files and only output the relevant models needed for the /v1/features API.
  • Use openapi-normalizer to filter only by the API we want.

HOWEVER, the API filtering only works by operationId. Currently, the webstatus.dev OpenAPI document does not have an operationId for /v1/features. Until this PR is merged, you will need to pull from the PR's branch. This can be done by changing the branch in the GitHub URL:

-    "openapi-webstatus": ". cs-env/bin/activate; find gen/py/webstatus_openapi -not -path \"gen/py/webstatus_openapi/.openapi-generator-ignore\" -not -path \"gen/py/webstatus_openapi\" -delete && openapi-generator-cli generate  --reserved-words-mappings field=field -i https://raw.githubusercontent.com/GoogleChrome/webstatus.dev/refs/heads/main/openapi/backend/openapi.yaml -g python  -o gen/py/webstatus_openapi --additional-properties=packageName=webstatus_openapi --global-property supportingFiles,apis,models,apiDocs=false,modelTests=false,modelDocs=false,models=\"FeaturePage:PageMetadataWithTotal:BasicErrorModel:Feature:FeatureSpecInfo:BrowserImplementation:BaselineInfo:BrowserUsage:FeatureWPTSnapshots:ChromiumUsageInfo:SpecLink:WPTFeatureData\" --openapi-normalizer \"FILTER=operationId:listFeatures\"  && pip install -r requirements.txt",
+    "openapi-webstatus": ". cs-env/bin/activate; find gen/py/webstatus_openapi -not -path \"gen/py/webstatus_openapi/.openapi-generator-ignore\" -not -path \"gen/py/webstatus_openapi\" -delete && openapi-generator-cli generate  --reserved-words-mappings field=field -i https://raw.githubusercontent.com/GoogleChrome/webstatus.dev/refs/heads/add-missing-operation-ids/openapi/backend/openapi.yaml -g python  -o gen/py/webstatus_openapi --additional-properties=packageName=webstatus_openapi --global-property supportingFiles,apis,models,apiDocs=false,modelTests=false,modelDocs=false,models=\"FeaturePage:PageMetadataWithTotal:BasicErrorModel:Feature:FeatureSpecInfo:BrowserImplementation:BaselineInfo:BrowserUsage:FeatureWPTSnapshots:ChromiumUsageInfo:SpecLink:WPTFeatureData\" --openapi-normalizer \"FILTER=operationId:listFeatures\"  && pip install -r requirements.txt",

This should help mitigate this factor: "Most of the time, I suspect the changes will be unrelated to the v1_features_get"


Also would the discrepancy of OpenAPI versions between Chromestatus and webstatus affect these file generations?

This is the factor that could cause problems in the future. Right now, it does not matter. Both projects are on the same minor version of OpenAPI, and the generation works. But if webstatus.dev moves to 3.1, this could be a problem. However, because 1) most of the changes to the OpenAPI document won't impact /v1/features and 2) our API should be backwards compatible, ChromeStatus could also pin to the commit where webstatus.dev still uses OpenAPI 3.0.x


I am inclined to add a TODO for now and explore this approach later again.

If you decide not to generate the client and continue using the requests library, it's crucial to include request and response examples in your unit tests that assert the behavior defined in the OpenAPI document. In particular, you should test for the following scenarios:

  1. When there's a next_page_token available.
  2. When there is no next_page_token available (i.e., it's the last page).

This will help ensure that your code correctly handles various scenarios and edge cases. For example, the next_page_token is optional. So you would need to make this type of change:

-      page_token = data['metadata']['next_page_token']
+     page_token = data['metadata'].get('next_page_token', None)

(whereas the generated client will handle that as next_page_token's type is Optional[str].)

@KyleJu KyleJu force-pushed the pipeline-webdx branch 2 times, most recently from dc3e7e2 to 5e23492 Compare January 23, 2025 05:28
@KyleJu KyleJu marked this pull request as ready for review January 23, 2025 05:28
@KyleJu KyleJu changed the title [Draft] WebDX feature IDs API Create a cron job that fetches all Webdx feature IDs Jan 23, 2025
@KyleJu KyleJu requested review from jcscottiii and jrobbins January 23, 2025 05:29
@KyleJu
Copy link
Collaborator Author

KyleJu commented Jan 23, 2025

Failing mypy:

Run npm run mypy

> mypy
> . cs-env/bin/activate; mypy --ignore-missing-imports --exclude cs-env/ --exclude appengine_config.py --exclude gen/py/chromestatus_openapi/build/ --exclude gen/py/chromestatus_openapi/chromestatus_openapi/test --exclude appengine_config.py --no-namespace-packages --disable-error-code "annotation-unchecked" .

gen/py/webstatus_openapi/webstatus_openapi/__init__.py: error: Duplicate module named "webstatus_openapi" (also at "./gen/py/webstatus_openapi/build/lib/webstatus_openapi/__init__.py")
gen/py/webstatus_openapi/webstatus_openapi/__init__.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
gen/py/webstatus_openapi/webstatus_openapi/__init__.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)

@jcscottiii @jrobbins Feel free to go ahead with the review as both tests and build work locally. I will look into this meanwhile

@jcscottiii
Copy link
Collaborator

Failing mypy:

Run npm run mypy

> mypy
> . cs-env/bin/activate; mypy --ignore-missing-imports --exclude cs-env/ --exclude appengine_config.py --exclude gen/py/chromestatus_openapi/build/ --exclude gen/py/chromestatus_openapi/chromestatus_openapi/test --exclude appengine_config.py --no-namespace-packages --disable-error-code "annotation-unchecked" .

gen/py/webstatus_openapi/webstatus_openapi/__init__.py: error: Duplicate module named "webstatus_openapi" (also at "./gen/py/webstatus_openapi/build/lib/webstatus_openapi/__init__.py")
gen/py/webstatus_openapi/webstatus_openapi/__init__.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
gen/py/webstatus_openapi/webstatus_openapi/__init__.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)

@jcscottiii @jrobbins Feel free to go ahead with the review as both tests and build work locally. I will look into this meanwhile

For the mypy situation, you will need to add some exclusions to the mypy command (like we do for the chromestatus_openapi):

+    "mypy": ". cs-env/bin/activate; mypy --ignore-missing-imports --exclude cs-env/ --exclude appengine_config.py --exclude gen/py/webstatus_openapi/build/ --exclude gen/py/webstatus_openapi/setup.py --exclude gen/py/webstatus_openapi/test/  --exclude gen/py/chromestatus_openapi/build/ --exclude gen/py/chromestatus_openapi/chromestatus_openapi/test --exclude appengine_config.py --no-namespace-packages --disable-error-code \"annotation-unchecked\" .",
-    "mypy": ". cs-env/bin/activate; mypy --ignore-missing-imports --exclude cs-env/ --exclude appengine_config.py --exclude gen/py/chromestatus_openapi/build/ --exclude gen/py/chromestatus_openapi/chromestatus_openapi/test --exclude appengine_config.py --no-namespace-packages --disable-error-code \"annotation-unchecked\" .",

Note: Currently, you're generating code for all of the webstatus APIs. If you want, you can follow the steps in this comment to limit the code and
supporting files generated to what is absolutely necessary (and remove the generated documentation and test files).

  • We could use that same tactic for the main chromestatus_openapi package too if we want (if a future PR)

return 'Running FetchWebdxFeatureId() job failed.'

# TODO(kyleju): save it to datastore.
feature_ids_list = [feature_data.feature_id for feature_data in all_data_list]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: While webstatus.dev supports sorting by name, it does not support sorting by id. Filed GoogleChrome/webstatus.dev#1087

Copy link
Collaborator

@jcscottiii jcscottiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one suggestion about moving the API URL to the settings.py file.

Also, feel free to take a look at this comment to 1) reduce the number of files generated and 2) fix the mypy issue.

@KyleJu
Copy link
Collaborator Author

KyleJu commented Jan 23, 2025

Failing mypy:

Run npm run mypy

> mypy
> . cs-env/bin/activate; mypy --ignore-missing-imports --exclude cs-env/ --exclude appengine_config.py --exclude gen/py/chromestatus_openapi/build/ --exclude gen/py/chromestatus_openapi/chromestatus_openapi/test --exclude appengine_config.py --no-namespace-packages --disable-error-code "annotation-unchecked" .

gen/py/webstatus_openapi/webstatus_openapi/__init__.py: error: Duplicate module named "webstatus_openapi" (also at "./gen/py/webstatus_openapi/build/lib/webstatus_openapi/__init__.py")
gen/py/webstatus_openapi/webstatus_openapi/__init__.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
gen/py/webstatus_openapi/webstatus_openapi/__init__.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)

@jcscottiii @jrobbins Feel free to go ahead with the review as both tests and build work locally. I will look into this meanwhile

For the mypy situation, you will need to add some exclusions to the mypy command (like we do for the chromestatus_openapi):

+    "mypy": ". cs-env/bin/activate; mypy --ignore-missing-imports --exclude cs-env/ --exclude appengine_config.py --exclude gen/py/webstatus_openapi/build/ --exclude gen/py/webstatus_openapi/setup.py --exclude gen/py/webstatus_openapi/test/  --exclude gen/py/chromestatus_openapi/build/ --exclude gen/py/chromestatus_openapi/chromestatus_openapi/test --exclude appengine_config.py --no-namespace-packages --disable-error-code \"annotation-unchecked\" .",
-    "mypy": ". cs-env/bin/activate; mypy --ignore-missing-imports --exclude cs-env/ --exclude appengine_config.py --exclude gen/py/chromestatus_openapi/build/ --exclude gen/py/chromestatus_openapi/chromestatus_openapi/test --exclude appengine_config.py --no-namespace-packages --disable-error-code \"annotation-unchecked\" .",

Note: Currently, you're generating code for all of the webstatus APIs. If you want, you can follow the steps in this comment to limit the code and supporting files generated to what is absolutely necessary (and remove the generated documentation and test files).

  • We could use that same tactic for the main chromestatus_openapi package too if we want (if a future PR)

Let me experiment with smaller file generation in a follow-up PR!

@KyleJu KyleJu merged commit 8197ba9 into main Jan 23, 2025
7 checks passed
@KyleJu KyleJu deleted the pipeline-webdx branch January 23, 2025 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants