This repo contains a Jupyter notebook to calculate the number of tokens in text, files, and folders using tokenizers from Hugging Face and OpenAI.
Install the dependencies:

```shell
uv sync
```
Select the model to use for tokenization in the Jupyter notebook. You can choose either a model from the Hugging Face model hub or an OpenAI model. Set the model's name in the `model_name` variable.

- For Hugging Face models, use the `user/model` name from the Hugging Face model hub, e.g. `mixedbread-ai/mxbai-embed-large-v1`.
- For OpenAI models, use the model name from the OpenAI API, e.g. `gpt-4o` (see the list of available models).
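The two sources can be told apart from the name format alone, since Hugging Face hub names contain a `/` while OpenAI model names do not. A minimal sketch of such a dispatch (the `tokenizer_kind` helper is hypothetical, not part of the notebook):

```python
def tokenizer_kind(model_name: str) -> str:
    """Guess the tokenizer source from the model name.

    Hugging Face hub names have a `user/model` form, e.g.
    "mixedbread-ai/mxbai-embed-large-v1"; OpenAI names like "gpt-4o" do not.
    """
    return "huggingface" if "/" in model_name else "openai"

print(tokenizer_kind("mixedbread-ai/mxbai-embed-large-v1"))  # → huggingface
print(tokenizer_kind("gpt-4o"))  # → openai
```

The Hugging Face branch would then load the tokenizer from the hub, while the OpenAI branch would use the matching `tiktoken` encoding.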
To count tokens in a text:

- Set the `text` variable to your text.
- Run all cells.
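Counting comes down to encoding the text and taking the length of the result. A minimal sketch, assuming the tokenizer exposes an `encode` method that returns a list of token ids (true of both Hugging Face tokenizers and `tiktoken` encodings); the `count_tokens` helper is illustrative, not from the notebook:

```python
def count_tokens(text, encode):
    """Return the number of tokens produced by the given encode callable.

    `encode` can be e.g. a Hugging Face tokenizer's `.encode` method or a
    tiktoken encoding's `.encode` method; both return a list of token ids.
    """
    return len(encode(text))

# Demo with whitespace splitting as a stand-in encoder; a real run would
# pass a model tokenizer's encode method instead.
print(count_tokens("Counting tokens is easy.", str.split))  # → 4
```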
To count tokens in a file:

- Set the `file_path` variable to the path of your file.
- Run all cells.
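For a file, the notebook's approach amounts to reading the file's text and counting its tokens. A minimal sketch with a hypothetical `count_file_tokens` helper, again using whitespace splitting as a stand-in encoder:

```python
from pathlib import Path
import tempfile

def count_file_tokens(file_path, encode):
    """Read a text file and count its tokens with the given encode callable."""
    text = Path(file_path).read_text(encoding="utf-8")
    return len(encode(text))

# Quick check with a temporary file and a whitespace stand-in encoder.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("one two three")
print(count_file_tokens(f.name, str.split))  # → 3
```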
To count tokens in all files in a folder:

- Set the `folder_path` variable to the path of your folder.
- Optionally, specify a filter for which files to include.
- Run all cells.
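The folder case combines the steps above: collect the matching files, count each one's tokens, and sum. A minimal sketch (the `count_folder_tokens` helper and its glob-style `pattern` filter are assumptions, not necessarily how the notebook's filter works):

```python
from pathlib import Path
import tempfile

def count_folder_tokens(folder_path, encode, pattern="*"):
    """Sum token counts over all files in a folder matching a glob pattern."""
    total = 0
    for path in sorted(Path(folder_path).rglob(pattern)):
        if path.is_file():
            total += len(encode(path.read_text(encoding="utf-8")))
    return total

# Quick check: two small files, counted with a whitespace stand-in encoder.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a.md").write_text("alpha beta")
    (Path(d) / "b.md").write_text("gamma")
    print(count_folder_tokens(d, str.split, pattern="*.md"))  # → 3
```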