Replies: 18 comments 43 replies
-
Thanks, it's interesting that you posted this at this time. I've been thinking of moving away from my current implementation for function calling and making an API-breaking change. First of all, I wrote it when it was still called "function calling", but now everyone calls it "tool use", so my naming is confusing. I also think the way I did it requires too much struct-building.

I've recently added JSON mode, which also uses JSON schema, but with some differences in how it's used compared to tool calling, and it ends up looking a bit more like what you have proposed. An example, taken from my integration tests, is:

```elisp
(llm-chat
 provider
 (llm-make-chat-prompt
  "List the 3 largest cities in France in order of population, giving the results in JSON."
  :response-format
  '(:type object
    :properties
    (:cities (:type array :items (:type string)))
    :required (cities))))
```

This isn't released yet, but I think it's a lighter-weight and easier-to-read way to specify a schema. I could change it to some other format before release, so now is a good time to change it to something that would work well both for JSON mode and for tool use. One question is how much you want to support JSON schema: for example, can arguments be more than just strings, integers, etc., and instead be objects that have their own structure?
-
Are you planning to remove the tool-use feature entirely and implement it via
I think the top level struct for a tool/function-call spec makes sense, that API is stable. It was only the component struct (like args) that I thought might be too constraining.
Not quite, because the examples in the OpenAI/Anthropic API are using both:

```elisp
(:name "unit"
 :type "string"
 :enum ["celsius" "fahrenheit"]
 :description "The unit of temperature, either 'celsius' or 'fahrenheit'"
 :required nil)
```
```elisp
(llm-chat
 provider
 (llm-make-chat-prompt
  "List the 3 largest cities in France in order of population, giving the results in JSON."
  :response-format
  '(:type object
    :properties
    (:cities (:type array :items (:type string)))
    :required (cities))))
```
This method looks good for arbitrary JSON. Tool-use requires a more constrained schema, so a struct actually makes sense there.
How would you communicate composite types to the API?
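For concreteness, the kind of composite type in question might be written like this in the plist style used in this thread (a sketch; the argument name and keys are illustrative, not either library's final format):

```elisp
;; A hypothetical tool argument whose value is an object with its own
;; structure, i.e. a nested JSON schema rather than a flat string/integer.
(:name "location"
 :description "Where to check the weather."
 :type "object"
 :properties (:city (:type "string")
              :country (:type "string")))
```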
-
Could you let me know when you decide on a schema for specifying tools? I can try to stay close to it so we can share tools between LLM clients in the future.
I agree.
-
@ahyatt, how do you handle async tool-use? I've settled on a rather clumsy API and was wondering if you have a better solution. A synchronous tool is defined as:

```elisp
(make-tool
 :function #'foo
 :description "Return ..."
 :args '((:name "arg1"
          :description "..."
          :type "string")))
```

and an async one as:

```elisp
(make-tool
 :function #'bar
 :description "Return ..."
 :args '((:name "arg1"
          :description "..."
          :type "string"))
 :async t)
```

The synchronous tool is called as

On a related note, how do you handle the difference between tools whose return value should be fed back to the LLM, and tools run for side-effects or not run at all because the LLM's tool call JSON is all that was needed?
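A sketch of how a caller might dispatch on `:async` under this scheme (the function name and the plist representation are illustrative, not gptel's actual internals):

```elisp
;; Hypothetical dispatch, representing the tool as a plist for brevity.
;; A sync tool returns its result directly; an async tool receives the
;; continuation as its first argument and calls it when done.
(defun call-tool (tool args continue)
  "Run TOOL with ARGS, passing the result to CONTINUE.
TOOL is a plist with :function and :async keys."
  (if (plist-get tool :async)
      ;; Async tool: its function is responsible for invoking CONTINUE
      ;; whenever the result becomes available.
      (apply (plist-get tool :function) continue args)
    ;; Sync tool: call it and hand the value to CONTINUE ourselves.
    (funcall continue (apply (plist-get tool :function) args))))
```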
-
Thanks for the explanation; yes, your method is actually better for the cases in which the function that is getting called needs to itself be async. `llm` doesn't have a good way to handle that yet. We just wait until the function is finished, then update the prompt so that the next time the user calls (they have to re-use the same prompt struct) it has the right information.
I decided at the start that gptel will never block Emacs, and have since paid dearly with my time to uphold that principle!
As to how we handle both text & functions for Claude, we append the text to the prompt, after the function call results, so that the information is there for when the client calls Claude again.
Cool. In gptel I'm calling the callback twice, once with the text and again with the tool result.
But I may be forgetting some detail here; the whole dance you have to do with tool use in conversations is complicated, under-documented, and has several non-standard variations across providers.
I finished implementing tool use for all the major APIs, with and without streaming responses, and it was a big ol' mess. A lot of the demo code online, and even the official API documentation in the case of Gemini, is just flat out wrong. All the idiosyncrasies are fresh in my mind at the moment, so let me know if you need help with the details.
I think automatically feeding it back like you will be doing would be reasonable, but I think most of the time it isn't needed. If I rethink the tool use interface this weekend, I'll consider adopting your way.
All right.
-
@ahyatt I saw that json output has been added to llm. Did you settle on a schema for declaring tools in the process?
(Tool use is now working in gptel, but I'm some distance from merging it into master as it needs a lot of testing.)
-
@karthink one question I'm curious about - how are you dealing with multiple ("parallel" as some call it) tool use calls? Callbacks seem like a good solution, but it gets pretty messy when you have more than one callback that must return before adding the info to the prompt and sending it back to the LLM. FWIW, I think not dealing with it would be a reasonable choice.
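One possible shape for this (a sketch, not llm's or gptel's actual implementation) is a countdown join: fire all the tool calls, and only resume the LLM round-trip when the last callback lands:

```elisp
(require 'seq)

;; Hypothetical join for parallel tool calls: RESUME is invoked exactly
;; once, after every tool has reported its result via its continuation.
(defun run-tools-then-resume (tool-calls resume)
  "Run each (FN . ARGS) in TOOL-CALLS asynchronously, then call RESUME.
Each FN takes a continuation as its first argument.  RESUME receives
the list of results in the original call order."
  (let* ((n (length tool-calls))
         (results (make-vector n nil))
         (pending n))
    (seq-do-indexed
     (lambda (call i)
       ;; Each tool gets a continuation that records its result slot and
       ;; decrements the pending count; the last one to finish resumes.
       (apply (car call)
              (lambda (result)
                (aset results i result)
                (when (zerop (setq pending (1- pending)))
                  (funcall resume (append results nil))))
              (cdr call)))
     tool-calls)))
```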
-
I'm not sure how I'm going to roll this out. This will be an API-breaking change as-is, and I need to figure out if that's OK or whether I should make it backward compatible. I'm not aware of anyone actually using the function calling yet, though, and I've long included a note in the README that this is liable to change.
@ahyatt I'm getting close to merging the tool-use branch of gptel into master, so I wanted to check in -- have you made more progress on the tool specification format? As mentioned at the start, I'm hoping we can stay close to a universal tool spec format for Emacs LLM clients and share a single repository of tools.
The schema currently used by gptel is the same as in my examples above:
- A tool type is a cl-defstruct with the slots :name, :function, :description and :args, along with some optional keys that I need to integrate tools into gptel's UI.
- Each argument in :args is a plist with the (expected) keys :name, :description, :type and optionally :enum.
(This is not final.)
-
I've started a pull request in #133, and I'll announce this non-backwards-compatible change to give any clients the chance to read and respond. @karthink, I see you've put your version out there, which is great. Let's make sure we're compatible on the details. Two things to watch out for:
-
This is not the case with

```elisp
(json-serialize
 '(:name "clozes" :description "An array of clozes" :type "array" :items (:type "string"))
 :null-object :null :false-object :json-false)
;; => "{\"name\":\"clozes\",\"description\":\"An array of clozes\",\"type\":\"array\",\"items\":{\"type\":\"string\"}}"

(json-serialize
 '(:name "clozes" :description "An array of clozes" :type array :items (:type string))
 :null-object :null :false-object :json-false)
;; => Debugger entered--Lisp error: (wrong-type-argument json-value-p array)
```

Are you implementing this behavior on top, converting symbols to their symbol-names first? I would prefer to be able to just use the built-in

I have questions about this example too, but I'll wait to clear up the symbol/string issue first since I can't json-serialize your example yet.
-
@ahyatt here is the current state of the tool definition process in gptel, using an example:

```elisp
(gptel-make-tool
 :name "some_name"
 :function #'some-func
 :description "Some description"
 :args '((:name "arg1"
          :description "arg1 description"
          :type "array"
          :items (:type "object"
                  :properties (:key1 (:type "string"
                                      :description "...")
                               :key2 (:type "string"
                                      :enum ["option1" "option2"]
                                      :description "..."))))
         (:name "arg2"
          :description "arg2 description"
          :type "string"))
 :async t
 ;; The following keys are optional
 :category "web"
 :confirm t
 :include nil)
```

Many notes!

Differences in specification

```elisp
:type "string"
:enum ["option1" "option2"]
```

instead of

```elisp
:type "string"
:enum ("string" "option1" "option2")
```

(I've converted all symbols to strings here.) This is again because

```elisp
(json-serialize '(:type "string"
                  :enum ["option1" "option2"]))
;; => "{\"type\":\"string\",\"enum\":[\"option1\",\"option2\"]}"
```

I cannot get

Extra keys

There are four extra keys:

The next three keys are only useful if you're creating a UI, as gptel does, so they don't have to be part of the shared tool repository. I'll describe them here in case you think they might be useful.

I demonstrate the effect of this in the attached video (gptel-tool-use-filesystem-confirm-demo.mp4). Let me know what you think.
-
@ahyatt, another issue to consider before you merge your PR: take a look at the gymnastics required to translate a tool provided by MCP to the format used by llm/gptel. I don't understand the MCP tool specification yet, but I think this might be something to consider.
-
Got it. On that note, did you switch from specifying
So we're on the same page, here is an `:args` example:

```elisp
:args
(list '(:name "query"
        :description "A space separated string of query terms. For example, \"programming elixir beam\""
        :type "string")
      '(:name "is_video"
        :description "A boolean indicating if the query is for videos."
        :type "boolean")
      '(:name "daterange"
        :description "A range of publication dates to search between, in the format YYYY-MM-DD--YYYY-MM-DD.
For example, \"2023-12-01--2024-01-10\"."
        :type "string"
        :optional t)
      '(:name "tag"
        :description "tag group of feed entries"
        :type "string"
        :enum ("arxiv" "cs" "dyn" "prog")
        :optional t))
```

Note above
EDIT: I'm having trouble using

```elisp
:args '((:name "arg1"
         :description "arg1 description"
         :type "array"
         :items (:type "object"
                 :properties (:key1 (:type "string"
                                     :description "...")
                              :key2 (:type "string"
                                     :enum ("option1" "option2")
                                     :description "..."))))
        (:name "arg2"
         :description "arg2 description"
         :type "string"))
```
-
@ahyatt: @ultronozm provided a function (`convert-tool-types`) that makes the following work:

```elisp
(thread-last
 tool-spec
 (convert-tool-types)
 (apply #'gptel-make-tool)
 (json-serialize))
```

after specifying the `:type` values as symbols. Here is `convert-tool-types`:

```elisp
(defun convert-tool-types (spec)
  "Convert symbol :type values in tool SPEC to strings destructively."
  (cond
   ((not (listp spec)) spec)
   ((keywordp (car spec))
    (let ((tail spec))
      (while tail
        (when (and (eq (car tail) :type)
                   (symbolp (cadr tail)))
          (setcar (cdr tail) (symbol-name (cadr tail))))
        (when (listp (cadr tail))
          (convert-tool-types (cadr tail)))
        (setq tail (cddr tail)))
      spec))
   (t (dolist (element spec)
        (when (listp element)
          (convert-tool-types element)))
      spec)))
```
Here is their argument:
What do you think? I have a weak preference for strings, mostly to keep the implementation simpler (for us) and the description uniform (for the tool spec author). The tool spec author does not have to remember that

However, with

CC @lizqwerscott: Please feel free to pipe in, as you're writing the adapter from MCP to the elisp tool spec format.
-
I can deal with strings or symbols without the preprocessing step, and it wouldn't be any significant complication. So to me, it's just about how the API is. As an API, as I think most people would use it, symbols are better for the reasons I've mentioned before, and it seems like @ultronozm is agreeing with those reasons. The only awkward thing is that parts of our API would then be mostly but not completely compatible with JSON serialization of JSON schema, which is a bit weird. Still, symbols will feel most natural to an elisp programmer, so I think I'd prefer that if you are also good with it. One thing that might be reasonable is to accept both strings and symbols. That's easily done, and we can have symbols be the norm in the documentation, but accept strings as well.
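Accepting both could be as simple as normalizing at the boundary before serialization; a sketch (the function name is hypothetical, not part of either library):

```elisp
;; Hypothetical normalizer: accept a :type value given either as a
;; symbol (string, array, ...) or a string ("string", "array", ...)
;; and return the string form that json-serialize can emit.
(defun tool--type-string (type)
  "Return TYPE as a string, whether it was given as a symbol or a string."
  (cond ((stringp type) type)
        ((and (symbolp type) type) (symbol-name type))
        (t (error "Invalid :type value: %S" type))))
```

With this, both `(tool--type-string 'array)` and `(tool--type-string "array")` yield `"array"`, so the documentation can standardize on symbols while quietly tolerating strings.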
-
Explanation of the final tool spec format

Each tool spec is a plist with the keys

The corresponding values:

The following plist keys are conditional/optional:

Example with
-
@ahyatt Have you given thought to handling structured tool results? I know that by default
-
The tool-use branch of gptel is now merged into master. There are still some issues remaining to be addressed (like a convention for result serialization), but the existing functionality should not be affected. If any minor tweaks are needed to the tool spec format, I think it should be fine, as there are no consumers yet on the gptel side. Thanks for the discussion and help, @ahyatt.
-
Hi @ahyatt,
I'm adding tool-use to gptel and wanted to coordinate with you on the tool definition format. I think it would be good to have a community-maintained bank of commonly useful tool calls that can plug in easily into all Emacs LLM clients. gptel uses a different internal data structure to manage tools from llm, so what do you think of defining tools as loosely-structured plists that we can both use?
I can explain why. Here's an example tool definition that can be read by both llm and gptel:
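A tool definition in this style might look like the following (a sketch only; the key names follow the examples elsewhere in this thread, and the tool itself is illustrative):

```elisp
;; Hypothetical shared tool definition as a loosely-structured plist.
;; :function names an implementation the repo would also provide.
(:name "get_weather"
 :function #'get-weather
 :description "Get the current weather for a location."
 :args ((:name "location"
         :description "A city name, e.g. \"Paris\"."
         :type "string")))
```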
The repo would contain this piece of data along with an implementation of `get-weather`. This example is useless, but you can imagine commonly useful tools, like ones that fetch web video or google scholar results, or results from info manuals.

Here's how `llm` could import this:

gptel can do something similar to convert the data into its internal tool structure.
If you are interested in this idea, we can decide on a plist format. I have two points of feedback on the current implementation of tool definitions in `llm`, one minor and one major:

- The `:required` key can be inverted to `:optional`, with a default value of `nil`. This way defining an argument works like in emacs-lisp, and `:required` does not need to be specified, since the shorter declaration will imply that it's a required argument, while `:optional t` explicitly specifies that it's optional, like `&optional` in an elisp function. I would expect optional arguments to be rarer across tool definitions than required ones.
- The `:enum` field: it is currently not allowed by `make-llm-function-arg`. I don't know what fields the full JSON schema allows here, but I'm guessing restricting them in `make-llm-function-arg` might cause issues. In gptel I'm currently just using a plist for the function arg spec.