Quantization? #146

Open
ndrean opened this issue Sep 6, 2024 · 4 comments

ndrean (Collaborator) commented Sep 6, 2024

Does anyone know whether this can be used for our models? It seems the coefficients can be turned into integers. Can we quantize once, save the result, and use the new model in the codebase, potentially lowering the size and memory impact significantly, and thus speeding up loading?

[screenshot attached]

Extract from the blog:

```elixir
get_quantized_phi = fn ->
  # Load the pretrained model and its parameters from Hugging Face
  {:ok, %{params: model_state, model: model} = model_info} =
    Bumblebee.load_model({:hf, "microsoft/Phi-3-mini-4k-instruct"})

  IO.inspect(model_state, label: "Unquantized")

  # Quantize the Axon model together with its parameters
  {quantized_model, quantized_model_state} = Axon.Quantization.quantize(model, model_state)
  IO.inspect(quantized_model_state, label: "Quantized")

  # Return the model info with the quantized model and parameters swapped in
  %{model_info | model: quantized_model, params: quantized_model_state}
end

quantized_model = get_quantized_phi.()

:ok
```
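If the quantized model behaves like the original, it should plug into a regular Bumblebee serving. A rough sketch of what that could look like (untested; the tokenizer/generation-config loads and the serving options are my assumptions, not from the blog):

```elixir
# Hypothetical usage of the quantized model info returned by the snippet above
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "microsoft/Phi-3-mini-4k-instruct"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "microsoft/Phi-3-mini-4k-instruct"})

serving =
  Bumblebee.Text.generation(quantized_model, tokenizer, generation_config,
    compile: [batch_size: 1, sequence_length: 512],
    defn_options: [compiler: EXLA]
  )

Nx.Serving.run(serving, "Explain quantization in one sentence.")
```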
LuchoTurtle (Member) commented:

This seems like a really great and useful feature, especially when deploying to fly.io 👀

ndrean (Collaborator, Author) commented Sep 6, 2024

https://elixirforum.com/t/how-to-save-quantitized-model/65910

ndrean (Collaborator, Author) commented Sep 6, 2024

So Sean Moriarity responded, in essence, "be patient":

> What I am saying is you cannot currently save the quantized model and then reload it. The current way of quantizing a model in Axon uses a custom Axon struct that would take some extra work to serialize to safetensors and then some custom code to deserialize from a safetensors file. This will be better when we have full quantization support in Nx, because then we can support loading of generic quantized types from safetensors and other pretrained HF models.
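For context (my own sketch, not from Sean's reply): the plain, unquantized parameters can already be round-tripped with `Nx.serialize/2`, assuming the model state behaves as a regular Nx container of tensors; it is the quantized state, with its custom struct, that has no save/load path yet:

```elixir
# Assuming model_state is a regular Nx container of tensors (unquantized case):
File.write!("phi3_params.nx", Nx.serialize(model_state))

# ...and later:
restored_state = Nx.deserialize(File.read!("phi3_params.nx"))

# The quantized_model_state wraps its parameters in a custom Axon struct,
# which is what currently blocks dumping it to / loading it from safetensors.
```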

Note that the "right" channel for these questions is the EEF (Erlang Ecosystem Foundation) Slack, not the Elixir forum or the Elixir Slack.

ndrean (Collaborator, Author) commented Sep 7, 2024

Safetensors? 🤔 Hugging Face is my friend:

[screenshots: Hugging Face documentation on safetensors]
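On the Elixir side there is also the `safetensors` Hex package from the Nx team. A minimal sketch of dumping and loading plain tensors with it (I'm assuming the `Safetensors.dump/1` / `Safetensors.load!/1` API from its README; it says nothing about quantized types):

```elixir
# Mix.install([{:safetensors, "~> 0.1"}, {:nx, "~> 0.7"}])

tensors = %{"weight" => Nx.tensor([[1.0, 2.0], [3.0, 4.0]])}

# Dump to the safetensors binary format and write it to disk
File.write!("weights.safetensors", Safetensors.dump(tensors))

# Read it back into a map of Nx tensors
loaded = Safetensors.load!(File.read!("weights.safetensors"))
```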
