
Overview

OpenAIResponsesLLMService provides chat completion capabilities using OpenAI’s Responses API, supporting streaming text responses, function calling, usage metrics, and out-of-band inference. This service works with the universal LLMContext and LLMContextAggregatorPair.
The Responses API is a newer OpenAI API designed for conversational AI applications. It differs from the Chat Completions API in its request/response structure and streaming format. See OpenAI Responses API documentation for more details.

  • OpenAI Responses API Reference: Pipecat's API methods for OpenAI Responses integration
  • Example Implementation: interruptible conversation example
  • OpenAI Documentation: official OpenAI Responses API documentation
  • OpenAI Platform: access models and manage API keys

Installation

To use OpenAI services, install the required dependencies:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI Responses LLM services, you need:
  1. OpenAI Account: Sign up at OpenAI Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Selection: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.)
  4. Usage Limits: Set up billing and usage limits as needed

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication
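For local development, the key can be exported in the shell before starting your bot (the key value below is a placeholder):

```shell
# Replace the placeholder with your actual key from the OpenAI dashboard.
export OPENAI_API_KEY="sk-proj-your-key-here"
```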

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | OpenAI API key. If None, uses the OPENAI_API_KEY environment variable. |
| base_url | str | None | Custom base URL for the OpenAI API. Override for proxied or self-hosted deployments. |
| organization | str | None | OpenAI organization ID. |
| project | str | None | OpenAI project ID. |
| default_headers | Mapping[str, str] | None | Additional HTTP headers to include in every request. |
| service_tier | str | None | Service tier to use (e.g., "auto", "flex", "priority"). |
| settings | OpenAIResponsesLLMService.Settings | None | Runtime-configurable model settings. See Settings below. |

Settings

Runtime-configurable settings passed via the settings constructor argument using OpenAIResponsesLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-4.1" | OpenAI model identifier. (Inherited from base settings.) |
| system_instruction | str | None | System instruction/prompt for the model. (Inherited from base settings.) |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| frequency_penalty | float | None | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| presence_penalty | float | None | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
| seed | int | None | Random seed for deterministic outputs. |
| max_completion_tokens | int | NOT_GIVEN | Maximum completion tokens to generate. |
NOT_GIVEN values are omitted from the API request entirely, letting the OpenAI API use its own defaults. This is different from None, which would be sent explicitly.
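The NOT_GIVEN behavior can be illustrated with a small stand-alone sketch. The sentinel class and helper function below are illustrative only, not Pipecat's actual internals:

```python
# Illustrative sentinel marking "omit this field from the request entirely".
class NotGiven:
    def __repr__(self):
        return "NOT_GIVEN"

NOT_GIVEN = NotGiven()

def build_request_params(**params):
    """Drop NOT_GIVEN values so the API applies its own defaults;
    None values are kept and sent explicitly as null."""
    return {k: v for k, v in params.items() if not isinstance(v, NotGiven)}

params = build_request_params(
    model="gpt-4.1",
    temperature=NOT_GIVEN,  # omitted from the request
    seed=None,              # sent explicitly
)
print(params)  # {'model': 'gpt-4.1', 'seed': None}
```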

Usage

Basic Setup

import os

from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService

llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesLLMService.Settings(
        model="gpt-4.1",
        system_instruction="You are a helpful assistant.",
    ),
)

With Custom Settings

import os

from pipecat.services.openai.responses.llm import (
    OpenAIResponsesLLMService,
    OpenAIResponsesLLMSettings,
)

llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesLLMSettings(
        model="gpt-4.1",
        temperature=0.7,
        max_completion_tokens=1000,
        frequency_penalty=0.5,
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMSettings

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=OpenAIResponsesLLMSettings(
            temperature=0.3,
            max_completion_tokens=500,
        )
    )
)

Out-of-Band Inference

Run a one-shot inference without pushing frames through the pipeline:
from pipecat.processors.aggregators.llm_context import LLMContext

context = LLMContext()
context.add_user_message("What is the capital of France?")

response = await llm.run_inference(
    context=context,
    max_tokens=100,
    system_instruction="You are a helpful geography assistant.",
)
print(response)  # "The capital of France is Paris."

Notes

  • Responses API vs Chat Completions API: The Responses API has a different request/response structure compared to the Chat Completions API. Use OpenAILLMService for the Chat Completions API and OpenAIResponsesLLMService for the Responses API.
  • Universal LLM Context: This service works with the universal LLMContext and LLMContextAggregatorPair, making it easy to switch between different LLM providers.
  • Function calling: Supports OpenAI’s tool/function calling format. Register function handlers on the pipeline task to handle tool calls automatically.
  • Usage metrics: Automatically tracks token usage, including cached tokens and reasoning tokens.
  • Service tiers: Supports OpenAI’s service tier system for prioritizing requests.
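The function-calling pattern above can be sketched as a stand-alone handler. In Pipecat the service invokes your registered handler with the tool-call arguments and a result callback that feeds the result back into the conversation; the names below (get_weather, result_callback, the argument shape) are illustrative, so check Pipecat's function-calling guide for the exact handler signature in your installed version:

```python
import asyncio

results = []

async def result_callback(result):
    # In Pipecat, the service provides this callback and routes the
    # result back into the LLM context; here we just collect it.
    results.append(result)

async def get_weather(arguments, result_callback):
    city = arguments.get("city", "unknown")
    # A real handler would call a weather API; static data for the sketch.
    await result_callback({"city": city, "conditions": "sunny", "temp_c": 22})

asyncio.run(get_weather({"city": "Paris"}, result_callback))
print(results[0]["city"])  # Paris
```

With a real service instance, the handler would be attached with something like `llm.register_function("get_weather", get_weather)`.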

Event Handlers

OpenAIResponsesLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")