Access xAI’s frontier reasoning and agentic models through Azure AI Foundry with enterprise-grade security, governance, and unified billing.

This guide walks through setting up and using Grok models on Microsoft Foundry. Grok models on Foundry give you strong reasoning, native tool use, enterprise authentication through Microsoft Entra ID, Azure-native monitoring, and an OpenAI-compatible API.

Usage is billed through the Azure Marketplace / your Azure subscription. Grok models are delivered through the xAI–Microsoft partnership with Azure-managed endpoints and optional Azure AI Content Safety layers. Review the specific model card in the Foundry catalog for the latest details on data processing, retention, and terms.

Grok on Foundry works with the official OpenAI Python/TypeScript SDKs, azure-ai-projects, LangChain, Semantic Kernel, LlamaIndex, and most OpenAI-compatible frameworks. Streaming, tool calling, and structured outputs are supported.

Prerequisites

Before you begin, ensure you have:

An active Azure subscription.
Access to Azure AI Foundry.
Sufficient permissions to create or manage Foundry resources/projects and deploy models, typically Contributor or custom roles with model deployment rights.
Optional but recommended: the Azure CLI installed for resource management and authentication testing.
Python 3.10+ for the examples in this guide.

Install required packages

Bash

pip install -U openai azure-identity

Optional, for higher-level project client patterns:

Bash

pip install azure-ai-projects

Provisioning

Foundry organizes work into resources for security, billing, and networking, and projects for deployments and collaboration. Create a resource/project first, then deploy one or more Grok model instances inside it.

The deployment name you choose becomes the value passed in the model parameter of your API requests.

Create or select a Foundry resource and project

Navigate to the Foundry portal.
Create a new Foundry resource, or select an existing one.
Within the resource, create a new project if your workflow uses projects.
Configure access management:
- Use Microsoft Entra ID with role-based access control (RBAC).
- Assign the Cognitive Services OpenAI User role, or equivalent, to identities that will call models.
- Optionally configure private networking through Azure Virtual Network.
Note your resource name and project name for later.

The resulting endpoint base will be:

Text

https://{resource-name}.services.ai.azure.com/api/projects/{project-name}/openai/v1
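
The endpoint pattern above can be assembled in code so that the resource and project names live in one place. The following is a minimal sketch, not an official helper: the function name `foundry_base_url` and the example names `contoso-ai` and `support-bot` are hypothetical, and the validation is only a basic guard against malformed URLs, not a statement of Azure naming rules.

```python
def foundry_base_url(resource_name: str, project_name: str) -> str:
    """Return the OpenAI-compatible base URL for a Foundry project.

    Follows the pattern shown in this guide:
    https://{resource-name}.services.ai.azure.com/api/projects/{project-name}/openai/v1
    """
    for label, value in (("resource_name", resource_name),
                         ("project_name", project_name)):
        # Reject values that would silently produce a malformed URL.
        if not value or any(ch in value for ch in "/ "):
            raise ValueError(f"{label} must be non-empty with no slashes or spaces")
    return (
        f"https://{resource_name}.services.ai.azure.com"
        f"/api/projects/{project_name}/openai/v1"
    )


# Hypothetical names, for illustration only.
print(foundry_base_url("contoso-ai", "support-bot"))
```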

Deploy a Grok model

In the Foundry portal, go to your resource or project, then Models + endpoints.
Click + Deploy model → Deploy base model, or open the Model catalog directly.
Search the catalog for the Grok model you want, for example, grok-4.3.
Review the model card for capabilities, context window, tool calling support, safety evaluations, pricing, and deployment options.
Click Deploy.
Configure deployment settings:
- Deployment name: Choose a clear, stable name, such as grok-4.3. This name cannot be changed after creation and is the value you use in the model parameter.
- Deployment type / SKU: Select Serverless for pay-as-you-go workloads, or Provisioned Throughput Units (PTU) for predictable high-volume performance.
Review and select Deploy. Wait for the deployment to reach Ready / Running status.

Once deployed, you can test in the built-in Playground, view generated code snippets, manage keys/endpoints if API key auth is enabled, and monitor usage and metrics.

Authentication

Grok on Foundry uses Azure-native authentication. The recommended approach is Microsoft Entra ID (keyless) with DefaultAzureCredential. API keys from the portal may also be supported depending on your resource configuration.

All requests go to your Foundry project’s OpenAI-compatible endpoint:

Text

https://{resource-name}.services.ai.azure.com/api/projects/{project-name}/openai/v1

Recommended: Entra ID authentication

Use azure.identity and get_bearer_token_provider. This approach works with RBAC and managed identities, and it avoids managing secrets.

Python (OpenAI)

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import OpenAI

project_endpoint = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/api/projects/YOUR-PROJECT-NAME"
base_url = project_endpoint.rstrip("/") + "/openai/v1"

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(
    credential,
    "https://ai.azure.com/.default",
)

client = OpenAI(
    base_url=base_url,
    api_key=token_provider,
)

response = client.responses.create(
    model="grok-4.3",  # your deployment name
    input="Explain the significance of Grok's tool-calling capabilities for building reliable agents. Be concise but insightful.",
    max_output_tokens=800,
)

print(response.output_text)

Important:

Assign the Cognitive Services OpenAI User role, or an equivalent role, to the identity running this code.
DefaultAzureCredential handles local development through Azure CLI / VS Code, managed identities, service principals, and other supported flows.
No api-version query parameter is needed; the /openai/v1 path handles compatibility.

Alternative: API key authentication

If your Foundry resource exposes keys under Keys and Endpoint, copy a primary or secondary key and use it directly as the api_key:

Python (OpenAI)

client = OpenAI(
    base_url=base_url,
    api_key="your-foundry-api-key-here",
)

Prefer Entra ID + RBAC in production. Never commit keys to source control, and rotate keys regularly.
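
One way to keep keys out of source control is to read them from the environment at startup and fall back to Entra ID when no key is set. The sketch below assumes a hypothetical environment variable name, `FOUNDRY_API_KEY`; it is not a name defined by Foundry or the OpenAI SDK, and the helper names are illustrative.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name chosen for this example.
API_KEY_ENV = "FOUNDRY_API_KEY"


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key from the environment, or None if it is unset or blank."""
    key = env.get(API_KEY_ENV, "").strip()
    return key or None


def auth_mode(env: Mapping[str, str] = os.environ) -> str:
    """Report which authentication path the application would take."""
    # Prefer keyless Entra ID unless a key was explicitly provided.
    return "api-key" if resolve_api_key(env) else "entra-id"


print(auth_mode({}))  # no key configured, so the keyless path is chosen
```

When `auth_mode` returns `"api-key"`, the resolved key would be passed as `api_key` to the client; otherwise the token provider from the Entra ID example would be used.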

Make your first API call

Simple reasoning call

Python (OpenAI)

response = client.responses.create(
    model="grok-4.3",
    input="Walk through the first-principles reasoning to determine why reusable rockets dramatically reduce the cost of space access.",
    max_output_tokens=1500,
)
print(response.output_text)

Tool calling example

Grok excels at tool use. Here is a pattern for parallel tool calling:

Python (OpenAI)

# The Responses API expects function tools in a flat shape: name, description,
# and parameters sit next to "type" rather than under a nested "function" key.
tools = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state or country, e.g., San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
    {
        "type": "function",
        "name": "search_web",
        "description": "Search the web for recent information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
            },
            "required": ["query"],
        },
    },
]

response = client.responses.create(
    model="grok-4.3",
    input="What's the weather like in Palo Alto right now and any major tech news from today?",
    tools=tools,
    # parallel_tool_calls=True,  # enable if supported in your deployment
    max_output_tokens=1000,
)

# Each requested tool call is returned as a "function_call" item in response.output.
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments, item.call_id)

In a real agent loop, execute the tool calls and continue the conversation with the tool results.
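
The "execute the tool calls" step can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions: the two handlers are stubs that return placeholder data rather than calling real services, and the function-call items are plain dictionaries with `type`, `name`, `arguments`, and `call_id` keys (with the OpenAI SDK you would typically get that shape from `item.model_dump()`). The `function_call_output` item type is the Responses API convention for returning tool results.

```python
import json


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    # Stub handler: a real implementation would call a weather service.
    return {"location": location, "unit": unit, "note": "stub result"}


def search_web(query: str) -> dict:
    # Stub handler: a real implementation would call a search service.
    return {"query": query, "results": []}


HANDLERS = {"get_current_weather": get_current_weather, "search_web": search_web}


def run_tool_calls(output_items: list[dict]) -> list[dict]:
    """Execute each function_call item and build the matching result items."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip messages, reasoning items, and so on
        handler = HANDLERS.get(item["name"])
        if handler is None:
            payload = {"error": f"unknown tool: {item['name']}"}
        else:
            try:
                payload = handler(**json.loads(item["arguments"] or "{}"))
            except (TypeError, ValueError) as exc:
                # Bad JSON or unexpected arguments: report it back to the model.
                payload = {"error": str(exc)}
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],  # ties the result to the original call
            "output": json.dumps(payload),
        })
    return results
```

To continue the conversation, the returned list would be sent as `input` in a follow-up `client.responses.create(...)` call, together with the same `tools` and either `previous_response_id=response.id` or the full prior context, depending on what your deployment supports.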

Streaming response

Python (OpenAI)

stream = client.responses.create(
    model="grok-4.3",
    input="Write a short, helpful onboarding guide for a new engineer joining xAI.",
    max_output_tokens=600,
    stream=True,
)

for event in stream:
    # The Responses API streams typed events rather than chat-completion chunks;
    # incremental text arrives in "response.output_text.delta" events.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()

Start with the Playground in the Foundry portal for rapid prompt iteration, then move to code.

Correlation IDs and debugging

Foundry includes standard Azure request identifiers in response headers, such as request-id, apim-request-id, and x-ms-request-id. When contacting Microsoft or xAI support, include these IDs with your deployment name and approximate timestamp.
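
Collecting these identifiers can be automated so that they land in your logs alongside the deployment name. The helper below is a sketch: it takes any header mapping and returns only the three header names listed above. The function name is hypothetical, and which of these headers are actually present will depend on your deployment.

```python
from typing import Mapping

# Header names taken from the list above.
REQUEST_ID_HEADERS = ("request-id", "apim-request-id", "x-ms-request-id")


def extract_request_ids(headers: Mapping[str, str]) -> dict[str, str]:
    """Return the correlation IDs found in a response's headers."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in REQUEST_ID_HEADERS if name in lowered}


# Illustrative header values, not real IDs.
print(extract_request_ids({"Apim-Request-Id": "example-id", "Content-Type": "application/json"}))
```

With the OpenAI Python SDK, response headers are available through the raw-response wrapper, for example `raw = client.responses.with_raw_response.create(...)`, then `raw.headers` for the headers and `raw.parse()` for the usual response object.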

Feature support and capabilities

Capability	Notes
Reasoning	Strong first-principles reasoning. “Think mode” style prompting works well.
Tool / function calling	Native support for reliable agentic workflows.
Structured outputs / JSON mode	Supported. Pass a JSON schema through `text.format` in the Responses API (`response_format` in Chat Completions), or instruct the model to return JSON explicitly.
Streaming	Supported for low-latency user experiences.
Long context	Check the specific model card for current context windows.
Code generation	Strong performance for code generation and editing tasks.
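
As an illustration of the structured outputs row, the sketch below builds the keyword arguments for a schema-constrained Responses API request and validates the returned JSON. It assumes your deployment accepts the `text.format` field with a `json_schema` type; the schema, the helper names, and the `ticket` label are hypothetical.

```python
import json


def structured_output_kwargs(name: str, schema: dict) -> dict:
    """Build the extra keyword arguments for a schema-constrained request."""
    return {
        "text": {
            "format": {
                "type": "json_schema",
                "name": name,
                "schema": schema,
                "strict": True,
            }
        }
    }


def parse_structured_output(output_text: str, required: list[str]) -> dict:
    """Parse the model's JSON text and confirm the expected keys are present."""
    data = json.loads(output_text)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Hypothetical schema for a support-ticket summary.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["summary", "severity"],
    "additionalProperties": False,
}

kwargs = structured_output_kwargs("ticket", TICKET_SCHEMA)
```

The result would be passed as `client.responses.create(model=..., input=..., **kwargs)`, and `response.output_text` would then go through `parse_structured_output`. Validating on the client side is still worthwhile in case a deployment ignores or only partially enforces the schema.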

Safety and responsible AI

Grok models include xAI’s safety training and alignment. On Foundry, Azure AI Content Safety is also available; depending on the deployment, it may be enabled by default or can be added as an extra filtering layer.

Before production deployment:

Review the full model card and safety benchmark tab in the Foundry catalog.
Use clear system prompts that define safety boundaries and desired behavior.
Implement Azure Content Safety filters for input/output where appropriate.
Conduct your own red-teaming and evaluations.
Monitor usage and apply any required mitigations.

Limitations

Features may differ slightly from the direct xAI API at api.x.ai, especially the latest experimental features.
Validate vision/multimodal support and exact parameter availability for your chosen model/deployment.
Rate limits and quotas are managed at the Azure resource level.
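
Because rate limits and quotas apply at the resource level, callers usually need to tolerate throttled requests (commonly surfaced as HTTP 429). The following is a generic sketch of retrying with exponential backoff and jitter; it is not Foundry-specific, and the function name, defaults, and the `is_retryable` predicate are assumptions for illustration.

```python
import random
import time


def call_with_backoff(fn, is_retryable, max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying retryable failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or when the error is not retryable.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from concurrent callers.
            sleep(delay * random.uniform(0.5, 1.0))
```

With the OpenAI Python SDK, `is_retryable` could be something like `lambda exc: isinstance(exc, openai.RateLimitError)`. The SDK client also has its own `max_retries` option, so a wrapper like this is mainly useful when you want custom retry behavior.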

For the authoritative list of supported parameters and behaviors, consult the model card inside Azure AI Foundry and the xAI Grok documentation linked from the catalog.

Best practices for production

Model selection

Use full Grok reasoning models for maximum reasoning depth and capability.
Use balanced reasoning settings for simpler tasks.
Choose the deployment settings that match your latency, throughput, and cost requirements.

Prompting Grok effectively

Encourage step-by-step reasoning when needed.
Specify the desired output format.
Use clear tool schemas.

Cost management

Monitor spend in Azure Cost Management + Billing.
Use serverless for spiky or experimental workloads; use PTU for steady high throughput.
Right-size the model and deployment type for your expected traffic pattern.

Security and compliance

Prefer Entra ID + RBAC over long-lived keys.
Use private endpoints / VNet injection where required.
Log requests with correlation IDs for auditability.

Observability

Integrate Azure Monitor, Application Insights, or Log Analytics.
Track token usage, latency, and error rates per deployment.
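
Per-deployment tracking can start as a small in-process aggregator before it is wired into Azure Monitor or Application Insights. The sketch below is illustrative only: the class and field names are hypothetical, and it keeps counters in memory rather than exporting them anywhere.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DeploymentStats:
    requests: int = 0
    errors: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_latency_s: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    @property
    def avg_latency_s(self) -> float:
        return self.total_latency_s / self.requests if self.requests else 0.0


class UsageTracker:
    """Accumulate token usage, latency, and errors keyed by deployment name."""

    def __init__(self) -> None:
        self._stats = defaultdict(DeploymentStats)

    def record(self, deployment: str, latency_s: float,
               input_tokens: int = 0, output_tokens: int = 0,
               error: bool = False) -> None:
        stats = self._stats[deployment]
        stats.requests += 1
        stats.errors += int(error)
        stats.input_tokens += input_tokens
        stats.output_tokens += output_tokens
        stats.total_latency_s += latency_s

    def stats(self, deployment: str) -> DeploymentStats:
        return self._stats[deployment]


tracker = UsageTracker()
tracker.record("grok-4.3", latency_s=0.5, input_tokens=100, output_tokens=50)
tracker.record("grok-4.3", latency_s=1.5, error=True)
```

With the Responses API, the token counts would typically come from `response.usage.input_tokens` and `response.usage.output_tokens`, and latency from a timer around the call.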

Troubleshooting

Issue	What to check
401 Unauthorized	Missing or incorrect Entra role; wrong token scope; check the `DefaultAzureCredential` chain.
404 Not Found / model not found	Wrong deployment name; it must match exactly what you created in the portal.
Deployment never reaches Ready / Running	Check region quotas, resource health, portal notifications, or try redeploying.
Slow responses or high latency	Consider Provisioned Throughput. Check the network path to the Azure region.
Tool calls not executing as expected	Verify the tool schema and whether parallel tool calling is enabled/supported for the deployment.
Content filtered / blocked	Review Azure Content Safety configuration and your system prompt. Adjust safety thresholds if needed.

Next steps

Use the Playground inside your Foundry project.
Combine Grok with Azure AI Agent Service or popular frameworks such as LangChain, Semantic Kernel, and CrewAI.
Add retrieval, memory, and orchestration layers for production systems.
Use Foundry tracing and your internal eval harness to evaluate and improve behavior.
When migrating from the direct xAI API, update authentication and endpoint configuration. Most prompts and tool schemas transfer with minimal changes.

Last updated: June 26, 2026