
Overview

OpenAIResponsesLLMService provides chat completion capabilities using OpenAI’s Responses API, supporting streaming text responses, function calling, usage metrics, and out-of-band inference. This service works with the universal LLMContext and LLMContextAggregatorPair.
The Responses API is a newer OpenAI API designed for conversational AI applications. It differs from the Chat Completions API in its request/response structure and streaming format. See OpenAI Responses API documentation for more details.

  • OpenAI Responses API Reference: Pipecat's API methods for OpenAI Responses integration
  • Example Implementation: interruptible conversation example
  • OpenAI Documentation: official OpenAI Responses API documentation
  • OpenAI Platform: access models and manage API keys

Installation

To use OpenAI services, install the required dependencies:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI Responses LLM services, you need:
  1. OpenAI Account: Sign up at OpenAI Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Selection: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.)
  4. Usage Limits: Set up billing and usage limits as needed

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication
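For local development, the key can be exported in the shell before starting your bot (the key value below is a placeholder):

```shell
# Replace the placeholder with your actual key from the OpenAI dashboard.
export OPENAI_API_KEY="sk-proj-your-key-here"
```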

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | OpenAI API key. If None, uses the OPENAI_API_KEY environment variable. |
| base_url | str | None | Custom base URL for the OpenAI API. Override for proxied or self-hosted deployments. |
| organization | str | None | OpenAI organization ID. |
| project | str | None | OpenAI project ID. |
| default_headers | Mapping[str, str] | None | Additional HTTP headers to include in every request. |
| service_tier | str | None | Service tier to use (e.g., "auto", "flex", "priority"). |
| settings | OpenAIResponsesLLMService.Settings | None | Runtime-configurable model settings. See Settings below. |

Settings

Runtime-configurable settings passed via the settings constructor argument using OpenAIResponsesLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-4.1" | OpenAI model identifier. (Inherited from base settings.) |
| system_instruction | str | None | System instruction/prompt for the model. (Inherited from base settings.) |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| frequency_penalty | float | None | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| presence_penalty | float | None | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
| seed | int | None | Random seed for deterministic outputs. |
| max_completion_tokens | int | NOT_GIVEN | Maximum completion tokens to generate. |
NOT_GIVEN values are omitted from the API request entirely, letting the OpenAI API use its own defaults. This is different from None, which would be sent explicitly.
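The NOT_GIVEN behavior can be illustrated with a small stand-alone sketch. The sentinel class and helper function below are illustrative only, not Pipecat's actual internals:

```python
# Illustrative sentinel marking "omit this field from the request entirely".
class NotGiven:
    def __repr__(self):
        return "NOT_GIVEN"

NOT_GIVEN = NotGiven()

def build_request_params(**params):
    """Drop NOT_GIVEN values so the API applies its own defaults;
    None values are kept and sent explicitly as null."""
    return {k: v for k, v in params.items() if not isinstance(v, NotGiven)}

params = build_request_params(
    model="gpt-4.1",
    temperature=NOT_GIVEN,  # omitted from the request
    seed=None,              # sent explicitly
)
print(params)  # {'model': 'gpt-4.1', 'seed': None}
```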

Usage

Basic Setup

import os

from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService

llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesLLMService.Settings(
        model="gpt-4.1",
        system_instruction="You are a helpful assistant.",
    ),
)

With Custom Settings

import os

from pipecat.services.openai.responses.llm import (
    OpenAIResponsesLLMService,
    OpenAIResponsesLLMSettings,
)

llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesLLMSettings(
        model="gpt-4.1",
        temperature=0.7,
        max_completion_tokens=1000,
        frequency_penalty=0.5,
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMSettings

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=OpenAIResponsesLLMSettings(
            temperature=0.3,
            max_completion_tokens=500,
        )
    )
)

Out-of-Band Inference

Run a one-shot inference without pushing frames through the pipeline:
from pipecat.processors.aggregators.llm_context import LLMContext

context = LLMContext()
context.add_user_message("What is the capital of France?")

response = await llm.run_inference(
    context=context,
    max_tokens=100,
    system_instruction="You are a helpful geography assistant.",
)
print(response)  # "The capital of France is Paris."

Notes

  • Responses API vs Chat Completions API: The Responses API has a different request/response structure compared to the Chat Completions API. Use OpenAILLMService for the Chat Completions API and OpenAIResponsesLLMService for the Responses API.
  • Universal LLM Context: This service works with the universal LLMContext and LLMContextAggregatorPair, making it easy to switch between different LLM providers.
  • Function calling: Supports OpenAI’s tool/function calling format. Register function handlers on the pipeline task to handle tool calls automatically.
  • Usage metrics: Automatically tracks token usage, including cached tokens and reasoning tokens.
  • Service tiers: Supports OpenAI’s service tier system for prioritizing requests.
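The function-calling pattern above can be sketched as a stand-alone handler. In Pipecat the service invokes your registered handler with the tool-call arguments and a result callback that feeds the result back into the conversation; the names below (get_weather, result_callback, the argument shape) are illustrative, so check Pipecat's function-calling guide for the exact handler signature in your installed version:

```python
import asyncio

results = []

async def result_callback(result):
    # In Pipecat, the service provides this callback and routes the
    # result back into the LLM context; here we just collect it.
    results.append(result)

async def get_weather(arguments, result_callback):
    city = arguments.get("city", "unknown")
    # A real handler would call a weather API; static data for the sketch.
    await result_callback({"city": city, "conditions": "sunny", "temp_c": 22})

asyncio.run(get_weather({"city": "Paris"}, result_callback))
print(results[0]["city"])  # Paris
```

With a real service instance, the handler would be attached with something like `llm.register_function("get_weather", get_weather)`.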

Event Handlers

OpenAIResponsesLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")