Overview
OpenAIResponsesLLMService provides chat completion capabilities using OpenAI’s Responses API, supporting streaming text responses, function calling, usage metrics, and out-of-band inference. This service works with the universal LLMContext and LLMContextAggregatorPair.
The Responses API is a newer OpenAI API designed for conversational AI applications. It differs from the Chat Completions API in its request/response structure and streaming format. See OpenAI Responses API documentation for more details.
OpenAI Responses API Reference
Pipecat’s API methods for OpenAI Responses integration
Example Implementation
Interruptible conversation example
OpenAI Documentation
Official OpenAI Responses API documentation
OpenAI Platform
Access models and manage API keys
Installation
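The dependencies can be installed with pip; the `openai` extra name below is assumed from Pipecat's standard packaging:

```shell
pip install "pipecat-ai[openai]"
```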
To use OpenAI services, install the required dependencies.

Prerequisites
OpenAI Account Setup
Before using OpenAI Responses LLM services, you need:
- OpenAI Account: Sign up at OpenAI Platform
- API Key: Generate an API key from your account dashboard
- Model Selection: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.)
- Usage Limits: Set up billing and usage limits as needed
Required Environment Variables
OPENAI_API_KEY: Your OpenAI API key for authentication
Configuration
- OpenAI API key. If None, uses the OPENAI_API_KEY environment variable.
- Custom base URL for the OpenAI API. Override for proxied or self-hosted deployments.
- OpenAI organization ID.
- OpenAI project ID.
- Additional HTTP headers to include in every request.
- Service tier to use (e.g., “auto”, “flex”, “priority”).
- Runtime-configurable model settings. See Settings below.
Settings
Runtime-configurable settings passed via the settings constructor argument using OpenAIResponsesLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-4.1" | OpenAI model identifier. (Inherited from base settings.) |
| system_instruction | str | None | System instruction/prompt for the model. (Inherited from base settings.) |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| frequency_penalty | float | None | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| presence_penalty | float | None | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
| seed | int | None | Random seed for deterministic outputs. |
| max_completion_tokens | int | NOT_GIVEN | Maximum completion tokens to generate. |
NOT_GIVEN values are omitted from the API request entirely, letting the OpenAI API use its own defaults. This is different from None, which would be sent explicitly.

Usage
Basic Setup
With Custom Settings
Updating Settings at Runtime
Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
Out-of-Band Inference
Run a one-shot inference without pushing frames through the pipeline.

Notes
- Responses API vs Chat Completions API: The Responses API has a different request/response structure compared to the Chat Completions API. Use OpenAILLMService for the Chat Completions API and OpenAIResponsesLLMService for the Responses API.
- Universal LLM Context: This service works with the universal LLMContext and LLMContextAggregatorPair, making it easy to switch between different LLM providers.
- Function calling: Supports OpenAI’s tool/function calling format. Register function handlers on the LLM service to handle tool calls automatically.
- Usage metrics: Automatically tracks token usage, including cached tokens and reasoning tokens.
- Service tiers: Supports OpenAI’s service tier system for prioritizing requests.
Event Handlers
OpenAIResponsesLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |