Quantization? #146

Open
ndrean opened this issue Sep 6, 2024 · 4 comments

ndrean (Collaborator) commented Sep 6, 2024

Does anyone know whether this can be used for our models? It seems the coefficients can be turned into integers. Can we quantize once, save the result, and use the new model in the codebase, potentially lowering the size and memory impact significantly, and thus speeding up loading?

[screenshot attached]

Extract from the blog:

```elixir
get_quantized_phi = fn ->
  # Load the pretrained model and its parameters from Hugging Face
  {:ok, %{params: model_state, model: model} = model_info} =
    Bumblebee.load_model({:hf, "microsoft/Phi-3-mini-4k-instruct"})

  IO.inspect(model_state, label: "Unquantized")

  # Quantize the Axon model together with its parameters
  {quantized_model, quantized_model_state} = Axon.Quantization.quantize(model, model_state)
  IO.inspect(quantized_model_state, label: "Quantized")

  # Return the model info with the quantized model and parameters swapped in
  %{model_info | model: quantized_model, params: quantized_model_state}
end

quantized_model = get_quantized_phi.()

:ok
```
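If the quantized model behaves like the original, it should plug into a regular Bumblebee serving. A rough sketch of what that could look like (untested; the tokenizer/generation-config loads and the serving options are my assumptions, not from the blog):

```elixir
# Hypothetical usage of the quantized model info returned by the snippet above
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "microsoft/Phi-3-mini-4k-instruct"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "microsoft/Phi-3-mini-4k-instruct"})

serving =
  Bumblebee.Text.generation(quantized_model, tokenizer, generation_config,
    compile: [batch_size: 1, sequence_length: 512],
    defn_options: [compiler: EXLA]
  )

Nx.Serving.run(serving, "Explain quantization in one sentence.")
```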
LuchoTurtle (Member) commented:

This seems like a really great and useful feature, especially when deploying to fly.io 👀

ndrean (Collaborator, Author) commented Sep 6, 2024

https://elixirforum.com/t/how-to-save-quantitized-model/65910

ndrean (Collaborator, Author) commented Sep 6, 2024

So Sean Moriarity responded, in essence, "be patient":

> What I am saying is you cannot currently save the quantized model and then reload it. The current way of quantizing a model in Axon uses a custom Axon struct that would take some extra work to serialize to safetensors and then some custom code to deserialize from a safetensors file. This will be better when we have full quantization support in Nx, because then we can support loading of generic quantized types from safetensors and other pretrained HF models.
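For context (my own sketch, not from Sean's reply): the plain, unquantized parameters can already be round-tripped with `Nx.serialize/2`, assuming the model state behaves as a regular Nx container of tensors; it is the quantized state, with its custom struct, that has no save/load path yet:

```elixir
# Assuming model_state is a regular Nx container of tensors (unquantized case):
File.write!("phi3_params.nx", Nx.serialize(model_state))

# ...and later:
restored_state = Nx.deserialize(File.read!("phi3_params.nx"))

# The quantized_model_state wraps its parameters in a custom Axon struct,
# which is what currently blocks dumping it to / loading it from safetensors.
```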

Note that the "right" channel for these questions is the EEF (Erlang Ecosystem Foundation) Slack, not the Elixir forum or the Elixir Slack.

ndrean (Collaborator, Author) commented Sep 7, 2024

Safetensors? 🤔 Hugging Face is my friend:

[screenshots: Hugging Face documentation on safetensors]
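On the Elixir side there is also the `safetensors` Hex package from the Nx team. A minimal sketch of dumping and loading plain tensors with it (I'm assuming the `Safetensors.dump/1` / `Safetensors.load!/1` API from its README; it says nothing about quantized types):

```elixir
# Mix.install([{:safetensors, "~> 0.1"}, {:nx, "~> 0.7"}])

tensors = %{"weight" => Nx.tensor([[1.0, 2.0], [3.0, 4.0]])}

# Dump to the safetensors binary format and write it to disk
File.write!("weights.safetensors", Safetensors.dump(tensors))

# Read it back into a map of Nx tensors
loaded = Safetensors.load!(File.read!("weights.safetensors"))
```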
