Guides

You can find guides for commonly used features from our API on this page.


Function calling enables language models to use external tools, connecting models to both the digital and the physical world.

This is a powerful capability that can be used to enable a wide range of use cases.

  • Calling public APIs for actions ranging from looking up football game results to getting real-time satellite positioning data
  • Analyzing internal databases
  • Browsing web pages
  • Executing code
  • Interacting with the physical world (e.g. booking a flight ticket, opening your Tesla's car door, controlling robot arms)

You can start by designing a function interface for the language model. Some questions to think about:

  • What do you want the assistant to be able to do?
  • What functions should it be able to use?

This is very similar to designing helper functions when you write code.

Let's walk through a simple example together. Assume you want an assistant for navigating a website, and you want it to be able to do the following:

  1. Navigate to a specific page
  2. Click on a button
  3. Fill in a form
  4. Submit a form

Your straw-man implementation might look like this:

python

def open_website(url):
    pass

def click(html, button):
    pass

def assistant():
    input = Input("Hello! I am here to help you navigate a website! What do you want to do?")
    url = ... # parse a URL out of input magically
    html = open_website(url)
    button1 = ... # parse out a button from input magically
    html = click(html, button1)
    button2 = ... # parse out another button from input magically
    html = click(html, button2)
    input2 = Input("Voilà, we are done! Do you want to join xAI? :)")
    button3 = ... # parse out a button from input2 magically
    html = click(html, button3)
    ...
    input3 = Input("We will get back to you soon!")

Instead of hard-coding these Python methods yourself, you can prompt one of xAI's language models to carry out the task for you.

Some tips to keep in mind when designing functions:

  1. Factor out anything that is not text understanding into functions
  2. Functions need to be implementable and executable (they can even call another assistant!)
  3. Select a proper level of abstraction: not so high-level that the assistant's job becomes trivial compared to implementing the function (e.g. make_me_money), and not so low-level that it takes the model a really long time to do anything (e.g. move_cursor_left_by_one_pixel)
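As a quick illustration of the third tip, here are three hypothetical tool functions at different abstraction levels (all names are made up for this example):

```python
def make_me_money():
    # Too high-level: the function itself would have to do all the hard work,
    # leaving nothing meaningful for the assistant to decide.
    pass

def move_cursor_left_by_one_pixel():
    # Too low-level: the model would need thousands of calls to get anything done.
    pass

def click(html, button):
    # A reasonable middle ground: one meaningful, self-contained action per call.
    pass
```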

To make this design concrete, you need to write down abstract function definitions similar to defining abstract helper functions via signatures:

python

functions = [
    {
        "name": "open_website",
        "description": "Open a website and return the HTML as a string",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "A URL",
                    "example_value": "https://x.ai/",
                },
            },
            "required": ["url"],
            "optional": [],
        },
    },
    {
        "name": "click",
        "description": "Click any button on a website and return the new HTML",
        "parameters": {
            "type": "object",
            "properties": {
                "html": {
                    "type": "string",
                    "description": "An HTML string",
                },
                "button": {
                    "type": "string",
                    "description": "A text description of a button on the html page",
                },
            },
            "required": ["html", "button"],
            "optional": [],
        },
    },
]
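Before executing a call, you may also want to sanity-check the arguments the model produced against your own schema. A minimal sketch (the `validate_args` helper is our own illustrative name, using the `required` field from the definitions above):

```python
import json

def validate_args(schema, raw_arguments):
    """Check that every required parameter is present in the JSON-encoded
    arguments the model produced; return the parsed dict or raise."""
    args = json.loads(raw_arguments)
    missing = [k for k in schema["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args

# Trimmed-down schema for open_website, matching the definition above.
open_website_schema = {"parameters": {"required": ["url"]}}
print(validate_args(open_website_schema, '{"url": "https://x.ai/"}'))  # {'url': 'https://x.ai/'}
```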

Descriptions should be detailed enough that the LLM can understand what each function does and how to call it just by reading the definition.

python

import requests

def open_website(url):
    return requests.get(url).text

def click(html, button):
    ...
    return html

Write a system prompt describing what your assistant should do. You can also skip the system prompt entirely and just leave it to Grok!

Let's start by getting a tool execution request from the assistant. To do this, we pass the function definitions, the system prompt, and any prior dialogue to the assistant.

python

import os
import json

from openai import OpenAI

MODEL_NAME = "grok-preview"
XAI_API_KEY = os.getenv("XAI_API_KEY")

client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://api.x.ai/v1",
)

tools = [{"type": "function", "function": f} for f in functions]

messages = [
    {"role": "system", "content": "You are a helpful webpage navigation assistant. Use the supplied tools to assist the user."},
    {"role": "user", "content": "Hi, can you go to the career page of xAI website?"}
]

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=tools,
)

The assistant may decide that no tool call is necessary. In that case, the response contains a direct reply to the user, just as a normal Chat Completions response does. This typically happens when the assistant wants clarification from the user, or when the user asks a question that does not need a function call.

Alternatively, if the assistant decides a tool call is necessary, the response will contain a function call request. You can see an example below:

python

Choice(
    finish_reason='tool_calls', 
    index=0, 
    logprobs=None, 
    message=ChatCompletionMessage(
        content="I am opening the xAI website to navigate to the career page.", 
        role='assistant', 
        function_call=None, 
        tool_calls=[
            ChatCompletionMessageToolCall(
                id='call_1234', 
                function=Function(
                    arguments='{"url":"https://x.ai/"}', 
                    name='open_website'), 
                type='function')
        ])
)
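A minimal way to branch on these two cases is to inspect `finish_reason` and `tool_calls`. The helper and mock object below are illustrative only; the mock merely mimics the shape of the SDK response:

```python
from types import SimpleNamespace

def handle_response(choice):
    """Branch on the assistant's decision: a tool-call request vs. a direct reply."""
    if choice.finish_reason == "tool_calls":
        # The model wants one or more functions executed locally.
        return [(tc.function.name, tc.function.arguments) for tc in choice.message.tool_calls]
    # Otherwise the message content is a normal user-facing answer.
    return choice.message.content

# A mock choice mimicking the SDK response shape, for illustration only.
mock = SimpleNamespace(
    finish_reason="tool_calls",
    message=SimpleNamespace(tool_calls=[SimpleNamespace(
        function=SimpleNamespace(name="open_website", arguments='{"url": "https://x.ai/"}'))]),
)
print(handle_response(mock))  # [('open_website', '{"url": "https://x.ai/"}')]
```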

Next, we need to handle the tool execution request from the model. Let's implement the function that will be called by the model.

python

tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)

url = arguments.get('url')

# Call the open_website function with the extracted url

html = open_website(url)

Lastly, we need to provide the tool execution result to the model. Keep in mind that the model may decide more function calls are needed; in that case it will generate another function call request, otherwise it will generate a response to the user based on the function call results.

python

# Create a message containing the result of the function call

function_call_result_message = {
    "role": "tool",
    "content": "<html>...</html>",
    "tool_call_id": response.choices[0].message.tool_calls[0].id,
}

# Prepare the chat completion call payload

messages = [
    {"role": "system", "content": "You are a helpful webpage navigation assistant. Use the supplied tools to assist the user."},
    {"role": "user", "content": "Hi, can you go to the career page of xAI website?"},
    response.choices[0].message,
    function_call_result_message,
]

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=tools,
)
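Because the model may chain several tool calls before answering, it is convenient to wrap this exchange in a loop. A minimal sketch, under our own assumptions: `run_tool_loop` and the `available` dispatch table are illustrative names, not part of the SDK.

```python
import json

def run_tool_loop(client, model, messages, tools, available, max_turns=5):
    """Repeatedly call the model, executing any requested tools, until it
    returns a plain reply or the turn budget runs out. `available` maps
    tool names to local Python callables."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # a direct, user-facing reply
        messages.append(message)  # keep the assistant's tool request in the history
        for tc in message.tool_calls:
            # Execute the requested function and feed the result back.
            result = available[tc.function.name](**json.loads(tc.function.arguments))
            messages.append({
                "role": "tool",
                "content": str(result),
                "tool_call_id": tc.id,
            })
    return None  # gave up after max_turns
```

A real dispatch table for this guide would be `available = {"open_website": open_website, "click": click}`.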

By default, the model will automatically decide whether a function call is necessary and select which functions to call, as determined by the tool_choice: "auto" setting.

We offer three ways to customize the default behavior:

  1. To force the model to always call one or more functions, you can set tool_choice: "required". The model will then always call at least one function. Note this could force the model to hallucinate parameters.
  2. To force the model to call a specific function, you can set tool_choice: {"type": "function", "function": {"name": "my_function"}}.
  3. To disable function calling and force the model to only generate a user-facing message, you can either provide no tools, or set tool_choice: "none".
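These overrides are passed as the `tool_choice` keyword argument to `client.chat.completions.create`. The dictionaries below are just illustrative payload fragments:

```python
# The three ways to override the default tool_choice="auto".
force_any = {"tool_choice": "required"}            # always call at least one function
force_one = {"tool_choice": {"type": "function",
                             "function": {"name": "open_website"}}}
no_tools  = {"tool_choice": "none"}                # never call a function

# Merge one of these into the request, e.g.:
# client.chat.completions.create(model=MODEL_NAME, messages=messages,
#                                tools=tools, **force_one)
print(force_one["tool_choice"]["function"]["name"])  # open_website
```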

The vision model can receive both text and image inputs. For example, you can load a local image, encode it in Base64 and use it as part of your conversation:

python

import os
import base64

from openai import OpenAI

MODEL_NAME = "grok-2v-mini"
XAI_API_KEY = os.getenv("XAI_API_KEY")
image_path = "..."

client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://api.x.ai/v1",
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
    return encoded_string

# Getting the base64 string

base64_image = encode_image(image_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "high",
                },
            },
            {
                "type": "text",
                "text": "What is on this image?",
            },
        ],
    },
]

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    stream=True,
    temperature=0.01,
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end="", flush=True)

print()

The model also supports web URLs as image inputs. It fetches the image from the public URL and handles it as part of the chat, which makes the request as simple as:

python

import os
from openai import OpenAI

MODEL_NAME = "grok-2v-mini"
XAI_API_KEY = os.getenv("XAI_API_KEY")
image_url = "..."

client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://api.x.ai/v1",
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                    "detail": "high",
                },
            },
            {
                "type": "text",
                "text": "What's funny about this image?",
            },
        ],
    },
]

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    stream=True,
    temperature=0.01,
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end="", flush=True)

print()