Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of missing vocab_file attribute in HFTokenizerConverter #677

Merged
merged 2 commits into from
Mar 26, 2024

Conversation

kazssym
Copy link
Contributor

@kazssym kazssym commented Mar 24, 2024

This commit updates HFTokenizerConverter to handle cases where the hf_tokenizer object might not have a vocab_file attribute.

Changes:

  • Uses getattr to retrieve the vocab_file attribute for flexibility
  • Stores the retrieved value in a separate variable vocab_file for clarity
  • Checks if vocab_file is None before checking its existence

This ensures the converter works correctly even with tokenizers that don't define a vocab_file attribute.

…erter

This commit updates `HFTokenizerConverter` to handle cases where the `hf_tokenizer` object might not have a `vocab_file` attribute.

Changes:

* Uses `getattr` to retrieve the `vocab_file` attribute for flexibility
* Stores the retrieved value in a separate variable `vocab_file` for clarity
* Checks if `vocab_file` is `None` before checking its existence

This ensures the converter works correctly even with tokenizers that don't define a `vocab_file` attribute.
@kazssym kazssym requested a review from a team as a code owner March 24, 2024 08:43
@kazssym
Copy link
Contributor Author

kazssym commented Mar 24, 2024

I'm not sure but it looks like GPT2Tokenizer/-Fast lacks the vocab_file attribute.

Copy link
Member

@wenbingl wenbingl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixing!

@sayanshaw24
Copy link
Contributor

/azp run onnxruntime-extensions.CI

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wenbingl wenbingl merged commit 31f129c into microsoft:main Mar 26, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants