Replies: 18 comments 43 replies
-
Thanks, it's interesting that you posted this at this time. I've been thinking of moving away from my current implementation for function calling and making an API-breaking change. First of all, I wrote it when it was still called "function calling", but now everyone calls it "tool use", so my naming is confusing. I also think the way I did it requires too much struct-building.

I've recently added JSON mode, which also uses JSON schema, but with some differences in how it's used compared to tool calling, and it ends up looking a bit more like what you have proposed. An example, taken from my integration tests, is:

```elisp
(llm-chat
 provider
 (llm-make-chat-prompt
  "List the 3 largest cities in France in order of population, giving the results in JSON."
  :response-format
  '(:type object
    :properties
    (:cities (:type array :items (:type string)))
    :required (cities))))
```

This isn't released yet, but I think it's a lighter-weight and easier-to-read way to specify a schema. I could change it to some other format before release, so now is a good time to change it to something that would work well both for JSON mode and for tool use. One question is how much you want to support JSON schema: for example, can arguments be more than just strings, integers, etc., and instead be objects that have their own structure?
-
Are you planning to remove the tool-use feature entirely and implement it via
I think the top level struct for a tool/function-call spec makes sense, that API is stable. It was only the component struct (like args) that I thought might be too constraining.
Not quite, because the examples in the OpenAI/Anthropic API are using both:

```elisp
(:name "unit"
 :type "string"
 :enum ["celsius" "fahrenheit"]
 :description "The unit of temperature, either 'celsius' or 'fahrenheit'"
 :required nil)
```
```elisp
(llm-chat
 provider
 (llm-make-chat-prompt
  "List the 3 largest cities in France in order of population, giving the results in JSON."
  :response-format
  '(:type object
    :properties
    (:cities (:type array :items (:type string)))
    :required (cities))))
```
This method looks good for arbitrary JSON. Tool-use requires a more constrained schema, so a struct actually makes sense there.
How would you communicate composite types to the API?
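For concreteness, the kind of composite type in question might be written like this in the plist style used in this thread (a sketch; the argument name and keys are illustrative, not either library's final format):

```elisp
;; A hypothetical tool argument whose value is an object with its own
;; structure, i.e. a nested JSON schema rather than a flat string/integer.
(:name "location"
 :description "Where to check the weather."
 :type "object"
 :properties (:city (:type "string")
              :country (:type "string")))
```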
-
Could you let me know when you decide on a schema for specifying tools? I can try to stay close to it so we can share tools between LLM clients in the future.
I agree.
-
@ahyatt, how do you handle async tool-use? I've settled on a rather clumsy API and was wondering if you have a better solution. A synchronous tool is defined as:

```elisp
(make-tool
 :function #'foo
 :description "Return ..."
 :args '((:name "arg1"
          :description "..."
          :type "string")))
```

and an async one as:

```elisp
(make-tool
 :function #'bar
 :description "Return ..."
 :args '((:name "arg1"
          :description "..."
          :type "string"))
 :async t)
```

The synchronous tool is called as

On a related note, how do you handle the difference between tools whose return value should be fed back to the LLM, and tools run for side-effects or not run at all because the LLM's tool call JSON is all that was needed?
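A sketch of how a caller might dispatch on `:async` under this scheme (the function name and the plist representation are illustrative, not gptel's actual internals):

```elisp
;; Hypothetical dispatch, representing the tool as a plist for brevity.
;; A sync tool returns its result directly; an async tool receives the
;; continuation as its first argument and calls it when done.
(defun call-tool (tool args continue)
  "Run TOOL with ARGS, passing the result to CONTINUE.
TOOL is a plist with :function and :async keys."
  (if (plist-get tool :async)
      ;; Async tool: its function is responsible for invoking CONTINUE
      ;; whenever the result becomes available.
      (apply (plist-get tool :function) continue args)
    ;; Sync tool: call it and hand the value to CONTINUE ourselves.
    (funcall continue (apply (plist-get tool :function) args))))
```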
-
Thanks for the explanation; yes, your method is actually better for the cases in which the function that is getting called needs to itself be async. `llm` doesn't have a good way to handle that yet. We just wait until the function is finished, then update the prompt so that the next time the user calls (they have to re-use the same prompt struct) it has the right information.
I decided at the start that gptel will never block Emacs, and have since paid dearly with my time to uphold that principle!
As to how we handle both text & functions for Claude, we append the text to the prompt, after the function call results, so that the information is there for when the client calls Claude again.
Cool. In gptel I'm calling the callback twice, once with the text and again with the tool result.
But I may be forgetting some detail here; the whole dance you have to do with tool use in conversations is complicated, under-documented, and has several non-standard variations across providers.
I finished implementing tool use for all the major APIs, with and without streaming responses, and it was a big ol' mess. A lot of the demo code online, and even the official API documentation in the case of Gemini, is just flat out wrong. All the idiosyncrasies are fresh in my mind at the moment, so let me know if you need help with the details.
I think automatically feeding it back like you will be doing would be reasonable, but I think most of the time it isn't needed. If I rethink the tool use interface this weekend, I'll consider adopting your way.
All right.
-
@ahyatt I saw that json output has been added to llm. Did you settle on a schema for declaring tools in the process?
(Tool use is now working in gptel, but I'm some distance from merging it into master as it needs a lot of testing.)
-
@karthink one question I'm curious about - how are you dealing with multiple ("parallel" as some call it) tool use calls? Callbacks seem like a good solution, but it gets pretty messy when you have more than one callback that must return before adding the info to the prompt and sending it back to the LLM. FWIW, I think not dealing with it would be a reasonable choice.
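One possible shape for this (a sketch, not llm's or gptel's actual implementation) is a countdown join: fire all the tool calls, and only resume the LLM round-trip when the last callback lands:

```elisp
(require 'seq)

;; Hypothetical join for parallel tool calls: RESUME is invoked exactly
;; once, after every tool has reported its result via its continuation.
(defun run-tools-then-resume (tool-calls resume)
  "Run each (FN . ARGS) in TOOL-CALLS asynchronously, then call RESUME.
Each FN takes a continuation as its first argument.  RESUME receives
the list of results in the original call order."
  (let* ((n (length tool-calls))
         (results (make-vector n nil))
         (pending n))
    (seq-do-indexed
     (lambda (call i)
       ;; Each tool gets a continuation that records its result slot and
       ;; decrements the pending count; the last one to finish resumes.
       (apply (car call)
              (lambda (result)
                (aset results i result)
                (when (zerop (setq pending (1- pending)))
                  (funcall resume (append results nil))))
              (cdr call)))
     tool-calls)))
```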
-
I'm not sure how I'm going to roll this out. This will be an API-breaking change as-is, and I need to figure out if that's OK or whether I should make it backward compatible. I'm not aware of anyone actually using the function calling yet, though, and I've long included a note in the README that this is liable to change.
@ahyatt I'm getting close to merging the tool-use branch of gptel into master, so I wanted to check in -- have you made more progress on the tool specification format? As mentioned at the start, I'm hoping we can stay close to a universal tool spec format for Emacs LLM clients and share a single repository of tools.
The schema currently used by gptel is the same as in my examples above:
- A tool type is a cl-defstruct with the slots :name, :function, :description and :args, along with some optional keys that I need to integrate tools into gptel's UI.
- Each argument in :args is a plist with the (expected) keys :name, :description, :type and optionally :enum.
(This is not final.)
-
I've started a pull request in #133, and I'll announce this non-backwards-compatible change to give any clients the chance to read and respond. @karthink, I see you've put your version out there, which is great. Let's make sure we're compatible on the details. Two things to watch out for:
-
This is not the case with

```elisp
(json-serialize
 '(:name "clozes" :description "An array of clozes" :type "array" :items (:type "string"))
 :null-object :null :false-object :json-false)
;; => "{\"name\":\"clozes\",\"description\":\"An array of clozes\",\"type\":\"array\",\"items\":{\"type\":\"string\"}}"

(json-serialize
 '(:name "clozes" :description "An array of clozes" :type array :items (:type string))
 :null-object :null :false-object :json-false)
;; => Debugger entered--Lisp error: (wrong-type-argument json-value-p array)
```

Are you implementing this behavior on top, converting symbols to their symbol-names first? I would prefer to be able to just use the built-in

I have questions about this example too, but I'll wait to clear up the symbol/string issue first since I can't json-serialize your example yet.
-
@ahyatt here is the current state of the tool definition process in gptel, using an example:

```elisp
(gptel-make-tool
 :name "some_name"
 :function #'some-func
 :description "Some description"
 :args '((:name "arg1"
          :description "arg1 description"
          :type "array"
          :items (:type "object"
                  :properties (:key1 (:type "string"
                                      :description "...")
                               :key2 (:type "string"
                                      :enum ["option1" "option2"]
                                      :description "..."))))
         (:name "arg2"
          :description "arg2 description"
          :type "string"))
 :async t
 ;; The following keys are optional
 :category "web"
 :confirm t
 :include nil)
```

Many notes!

Differences in specification

```elisp
:type "string"
:enum ["option1" "option2"]
```

instead of

```elisp
:type "string"
:enum ("string" "option1" "option2")
```

(I've converted all symbols to strings here.) This is again because

```elisp
(json-serialize '(:type "string"
                  :enum ["option1" "option2"]))
;; => "{\"type\":\"string\",\"enum\":[\"option1\",\"option2\"]}"
```

I cannot get

Extra keys

There are four extra keys:

The next three keys are only useful if you're creating a UI, as gptel does, so they don't have to be part of the shared tool repository. I'll describe them here in case you think they might be useful.

I demonstrate the effect of this in the attached video (gptel-tool-use-filesystem-confirm-demo.mp4). Let me know what you think.
-
@ahyatt, another issue to consider before you merge your PR: take a look at the gymnastics required to translate a tool provided by MCP to the format used by llm/gptel. I don't understand the MCP tool specification yet, but I think this might be something to consider.
-
Got it. On that note, did you switch from specifying
So we're on the same page, here is an `:args` example:

```elisp
:args
(list '(:name "query"
        :description "A space separated string of query terms. For example, \"programming elixir beam\""
        :type "string")
      '(:name "is_video"
        :description "A boolean indicating if the query is for videos."
        :type "boolean")
      '(:name "daterange"
        :description "A range of publication dates to search between, in the format YYYY-MM-DD--YYYY-MM-DD.
For example, \"2023-12-01--2024-01-10\"."
        :type "string"
        :optional t)
      '(:name "tag"
        :description "tag group of feed entries"
        :type "string"
        :enum ("arxiv" "cs" "dyn" "prog")
        :optional t))
```

Note above
EDIT: I'm having trouble using

```elisp
:args '((:name "arg1"
         :description "arg1 description"
         :type "array"
         :items (:type "object"
                 :properties (:key1 (:type "string"
                                     :description "...")
                              :key2 (:type "string"
                                     :enum ("option1" "option2")
                                     :description "..."))))
        (:name "arg2"
         :description "arg2 description"
         :type "string"))
```
-
@ahyatt: @ultronozm provided a function (`convert-tool-types`) that makes the following work:

```elisp
(thread-last
 tool-spec
 (convert-tool-types)
 (apply #'gptel-make-tool)
 (json-serialize))
```

after specifying the `:type` values as symbols. Here is `convert-tool-types`:

```elisp
(defun convert-tool-types (spec)
  "Convert symbol :type values in tool SPEC to strings destructively."
  (cond
   ((not (listp spec)) spec)
   ((keywordp (car spec))
    (let ((tail spec))
      (while tail
        (when (and (eq (car tail) :type)
                   (symbolp (cadr tail)))
          (setcar (cdr tail) (symbol-name (cadr tail))))
        (when (listp (cadr tail))
          (convert-tool-types (cadr tail)))
        (setq tail (cddr tail)))
      spec))
   (t (dolist (element spec)
        (when (listp element)
          (convert-tool-types element)))
      spec)))
```
Here is their argument:
What do you think? I have a weak preference for strings, mostly to keep the implementation simpler (for us) and the description uniform (for the tool spec author). The tool spec author does not have to remember that

However, with

CC @lizqwerscott: Please feel free to pipe in, as you're writing the adapter from MCP to the elisp tool spec format.
-
I can deal with strings or symbols without the preprocessing step, and it wouldn't be any significant complication. So to me, it's just about how the API is. As an API, as I think most people would use it, symbols are better for the reasons I've mentioned before, and it seems like @ultronozm is agreeing with those reasons. The only awkward thing is that parts of our API would then be mostly but not completely compatible with JSON serialization of JSON schema, which is a bit weird. Still, symbols will feel most natural to an elisp programmer, so I think I'd prefer that if you are also good with it. One thing that might be reasonable is to accept both strings and symbols. That's easily done, and we can have symbols be the norm in the documentation, but accept strings as well.
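Accepting both could be as simple as normalizing at the boundary before serialization; a sketch (the function name is hypothetical, not part of either library):

```elisp
;; Hypothetical normalizer: accept a :type value given either as a
;; symbol (string, array, ...) or a string ("string", "array", ...)
;; and return the string form that json-serialize can emit.
(defun tool--type-string (type)
  "Return TYPE as a string, whether it was given as a symbol or a string."
  (cond ((stringp type) type)
        ((and (symbolp type) type) (symbol-name type))
        (t (error "Invalid :type value: %S" type))))
```

With this, both `(tool--type-string 'array)` and `(tool--type-string "array")` yield `"array"`, so the documentation can standardize on symbols while quietly tolerating strings.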
-
Explanation of the final tool spec format

Each tool spec is a plist with the keys

The corresponding values:

The following plist keys are conditional/optional:

Example with
-
@ahyatt Have you given thought to handling structured tool results? I know that by default
-
The tool-use branch of gptel is now merged into master. There are still some issues remaining to be addressed (like a convention for result serialization), but the existing functionality should not be affected. If any minor tweaks are needed to the tool spec format, I think it should be fine, as there are no consumers yet on the gptel side. Thanks for the discussion and help, @ahyatt.
-
Hi @ahyatt,
I'm adding tool-use to gptel and wanted to coordinate with you on the tool definition format. I think it would be good to have a community-maintained bank of commonly useful tool calls that can plug in easily into all Emacs LLM clients. gptel uses a different internal data structure to manage tools from llm, so what do you think of defining tools as loosely-structured plists that we can both use?
I can explain why. Here's an example tool definition that can be read by both llm and gptel:
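A tool definition in this style might look like the following (a sketch only; the key names follow the examples elsewhere in this thread, and the tool itself is illustrative):

```elisp
;; Hypothetical shared tool definition as a loosely-structured plist.
;; :function names an implementation the repo would also provide.
(:name "get_weather"
 :function #'get-weather
 :description "Get the current weather for a location."
 :args ((:name "location"
         :description "A city name, e.g. \"Paris\"."
         :type "string")))
```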
The repo would contain this piece of data along with an implementation of `get-weather`. This example is useless, but you can imagine commonly useful tools, like ones that fetch web video or google scholar results, or results from info manuals.

Here's how `llm` could import this:

gptel can do something similar to convert the data into its internal tool structure.
If you are interested in this idea, we can decide on a plist format. I have two points of feedback on the current implementation of tool definitions in `llm`, one minor and one major:

- The `:required` key can be inverted to `:optional`, with a default value of `nil`. This way defining an argument works like in emacs-lisp, and `:required` does not need to be specified, since the shorter declaration will imply that it's a required argument, while `:optional t` explicitly specifies that it's optional, like `&optional` in an elisp function. I would expect optional arguments to be rarer across tool definitions than required ones.
- The `:enum` field: it is currently not allowed by `make-llm-function-arg`. I don't know what fields the full JSON schema allows here, but I'm guessing restricting them in `make-llm-function-arg` might cause issues. In gptel I'm currently just using a plist for the function arg spec.