Overview

User turn strategies provide fine-grained control over how user speaking turns are detected in conversations. They determine when a user’s turn starts (user begins speaking) and when it stops (user finishes speaking and expects a response). By default, Pipecat uses a combination of VAD (Voice Activity Detection) and AI-powered turn detection:
  • Start: VAD detection or transcription received
  • Stop: AI-powered turn detection using LocalSmartTurnAnalyzerV3
You can customize this behavior by providing your own strategies, for example to require a minimum number of words before triggering a turn, to gate interaction behind a wake phrase, or to use a different turn analyzer.

How It Works

  1. Turn Start Detection: When any start strategy triggers, the user aggregator:
    • Marks the start of a user turn
    • Optionally emits UserStartedSpeakingFrame
    • Optionally emits an interruption frame (if the bot is speaking)
  2. During User Turn: The aggregator collects transcriptions and audio frames.
  3. Turn Stop Detection: When a stop strategy triggers, the user aggregator:
    • Marks the end of the user turn
    • Emits UserStoppedSpeakingFrame
    • Pushes the aggregated user message to the LLM context
  4. Timeout Handling: If no stop strategy triggers within user_turn_stop_timeout seconds (default: 5.0), the turn is automatically ended.
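The four steps above can be sketched as a small state machine. This is a toy model for illustration only, not Pipecat's aggregator: the class name ToyUserTurn, its method names, and the choice to measure the timeout from the last activity are all assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyUserTurn:
    """Toy model of the four steps above; not Pipecat's aggregator."""

    stop_timeout: float = 5.0  # plays the role of user_turn_stop_timeout
    active: bool = False
    last_activity: float = 0.0
    parts: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def on_start_triggered(self, now: float, bot_speaking: bool) -> None:
        # Step 1: mark the turn start and record what would be emitted.
        if self.active:
            return
        self.active = True
        self.last_activity = now
        self.events.append("UserStartedSpeakingFrame")
        if bot_speaking:
            self.events.append("interruption")

    def on_transcription(self, now: float, text: str) -> None:
        # Step 2: collect transcriptions while the turn is open.
        if self.active:
            self.parts.append(text)
            self.last_activity = now

    def on_stop_triggered(self) -> Optional[str]:
        # Step 3: close the turn and return the aggregated user message.
        if not self.active:
            return None
        self.active = False
        self.events.append("UserStoppedSpeakingFrame")
        message = " ".join(self.parts)
        self.parts = []
        return message

    def on_tick(self, now: float) -> Optional[str]:
        # Step 4: force the turn to end if no stop strategy fired in time.
        # Assumption: the timeout is measured from the last activity.
        if self.active and now - self.last_activity >= self.stop_timeout:
            return self.on_stop_triggered()
        return None


turn = ToyUserTurn()
turn.on_start_triggered(now=0.0, bot_speaking=True)
turn.on_transcription(now=0.5, text="hello")
turn.on_transcription(now=1.0, text="there")
message = turn.on_stop_triggered()
```

Here message holds the aggregated text that would be pushed to the LLM context, and turn.events records the order in which the frames would be emitted.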

Configuration

User turn strategies are configured via LLMUserAggregatorParams when creating an LLMContextAggregatorPair:
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.turns.user_turn_strategies import UserTurnStrategies

context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[...],  # List of start strategies
            stop=[...],   # List of stop strategies
        ),
    ),
)

Start Strategies

Start strategies determine when a user’s turn begins. Multiple strategies can be provided, and the first one to trigger will signal the start of a user turn.
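The "first one to trigger" rule can be pictured as an ordered scan over the strategy list. The sketch below is a simplified stand-in that uses plain functions and a made-up event dictionary; the names vad_check, transcription_check, and first_triggered are hypothetical and are not part of Pipecat.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]
StartCheck = Callable[[Event], bool]


def vad_check(event: Event) -> bool:
    # Hypothetical event shape: {"type": "vad_speech_started"}
    return event.get("type") == "vad_speech_started"


def transcription_check(event: Event) -> bool:
    # Fires on final or interim transcriptions (hypothetical event types).
    return event.get("type") in ("transcription", "interim_transcription")


def first_triggered(checks: List[StartCheck], event: Event) -> Optional[str]:
    """Return the name of the first check that fires, or None."""
    for check in checks:
        if check(event):
            return check.__name__
    return None


checks = [vad_check, transcription_check]
started_by = first_triggered(checks, {"type": "interim_transcription"})
```

With this ordering, a soft-spoken user who never trips the VAD check would still start a turn through the transcription check.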

Base Parameters

All start strategies inherit these parameters:
  • enable_interruptions (bool, default: True): If True, the user aggregator will emit an interruption frame when the user turn starts, allowing the user to interrupt the bot.
  • enable_user_speaking_frames (bool, default: True): If True, the user aggregator will emit frames indicating when the user starts speaking. Disable this if another component (e.g., an STT service) already generates these frames.

VADUserTurnStartStrategy

Triggers a user turn start based on Voice Activity Detection. This is the most responsive strategy, detecting speech as soon as the VAD indicates the user has started speaking.
from pipecat.turns.user_start import VADUserTurnStartStrategy

strategy = VADUserTurnStartStrategy()

TranscriptionUserTurnStartStrategy

Triggers a user turn start when a transcription is received. This serves as a fallback for scenarios where VAD-based detection fails (e.g., when the user speaks very softly) but the STT service still produces transcriptions.
  • use_interim (bool, default: True): Whether to trigger on interim (partial) transcription frames for earlier detection.
from pipecat.turns.user_start import TranscriptionUserTurnStartStrategy

strategy = TranscriptionUserTurnStartStrategy(use_interim=True)

MinWordsUserTurnStartStrategy

Requires the user to speak a minimum number of words before triggering a turn start. This is useful for preventing brief utterances like “okay” or “yeah” from triggering responses.
  • min_words (int, required): Minimum number of spoken words required to trigger the start of a user turn.
  • use_interim (bool, default: True): Whether to consider interim transcription frames for earlier detection.
from pipecat.turns.user_start import MinWordsUserTurnStartStrategy

# Require at least 3 words to start a turn
strategy = MinWordsUserTurnStartStrategy(min_words=3)
When the bot is not speaking, this strategy will trigger after just 1 word. The min_words threshold only applies when the bot is actively speaking, preventing short affirmations from interrupting the bot.
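The rule described above (full threshold while the bot speaks, a single word otherwise) can be expressed in a few lines. This is a minimal sketch of the decision only; the function name should_start_turn and the whitespace-based word count are assumptions, not the library's implementation.

```python
def should_start_turn(transcript: str, min_words: int, bot_speaking: bool) -> bool:
    """Decide whether a transcript is long enough to start a user turn."""
    # The min_words threshold only applies while the bot is speaking;
    # otherwise a single word is enough.
    required = min_words if bot_speaking else 1
    # Assumption: words are counted by splitting on whitespace.
    return len(transcript.split()) >= required


interrupts_bot = should_start_turn("okay", min_words=3, bot_speaking=True)
starts_when_idle = should_start_turn("okay", min_words=3, bot_speaking=False)
```

In this sketch a lone "okay" does not interrupt the bot, but the same word starts a turn when the bot is silent.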

WakePhraseUserTurnStartStrategy

Requires a wake phrase to be detected before allowing interaction. This strategy blocks subsequent strategies until a wake phrase is detected in a transcription, then allows interaction for a configurable timeout period.
  • phrases (List[str], required): List of wake phrases to detect (e.g., ["hey pipecat", "ok pipecat"]).
  • timeout (float, default: 10.0): Inactivity timeout in seconds before returning to IDLE state. In timeout mode, the timer resets on activity. In single activation mode, acts as a keepalive window after wake phrase detection.
  • single_activation (bool, default: False): If True, the wake phrase is required before every turn. The strategy returns to IDLE after each turn completes.
from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy

# Timeout mode: wake phrase unlocks interaction for 10 seconds
strategy = WakePhraseUserTurnStartStrategy(
    phrases=["hey pipecat", "ok pipecat"],
    timeout=10.0,
)

# Single activation: wake phrase required before every turn
strategy = WakePhraseUserTurnStartStrategy(
    phrases=["hey pipecat"],
    single_activation=True,
)

Event Handlers

The strategy provides event handlers for wake phrase detection:
  • on_wake_phrase_detected: async def handler(strategy, phrase: str). Called when a wake phrase is matched.
  • on_wake_phrase_timeout: async def handler(strategy). Called when the inactivity timeout expires (timeout mode only).
@strategy.event_handler("on_wake_phrase_detected")
async def on_wake_phrase_detected(strategy, phrase):
    print(f"Wake phrase detected: {phrase}")

@strategy.event_handler("on_wake_phrase_timeout")
async def on_wake_phrase_timeout(strategy):
    print("Wake phrase timeout, returning to IDLE")
This strategy should be placed first in the start strategies list to properly gate all subsequent strategies. Use default_user_turn_start_strategies() to extend the defaults with wake phrase detection.
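The two modes (timeout and single activation) can be illustrated with a small gate that moves between an idle and an awake state. This is a hedged sketch, not the strategy's actual code: the class ToyWakePhraseGate, its methods, the substring matching, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, List, Optional


class ToyWakePhraseGate:
    """Toy IDLE/awake gate; illustrative only, not Pipecat's implementation."""

    def __init__(
        self,
        phrases: List[str],
        timeout: float = 10.0,
        single_activation: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.phrases = [p.lower() for p in phrases]
        self.timeout = timeout
        self.single_activation = single_activation
        self._clock = clock
        self._awake_until: Optional[float] = None  # None means IDLE

    def is_awake(self) -> bool:
        if self._awake_until is None:
            return False
        if self._clock() >= self._awake_until:
            self._awake_until = None  # window expired, back to IDLE
            return False
        return True

    def on_transcription(self, text: str) -> bool:
        """Return True if interaction is allowed after this transcription."""
        lowered = text.lower()
        # Assumption: a wake phrase matches as a case-insensitive substring.
        if any(phrase in lowered for phrase in self.phrases):
            self._awake_until = self._clock() + self.timeout
            return True
        if self.is_awake():
            if not self.single_activation:
                # Timeout mode: any activity resets the inactivity timer.
                self._awake_until = self._clock() + self.timeout
            return True
        return False

    def on_turn_completed(self) -> None:
        # Single activation: require the wake phrase again for the next turn.
        if self.single_activation:
            self._awake_until = None


gate = ToyWakePhraseGate(["hey pipecat"], timeout=10.0)
blocked = gate.on_transcription("what time is it")
allowed = gate.on_transcription("Hey Pipecat, what time is it")
```

Passing a fake clock (any zero-argument callable returning seconds) makes the timeout behavior easy to exercise without waiting in real time.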

ExternalUserTurnStartStrategy

Delegates turn start detection to an external processor. This strategy listens for UserStartedSpeakingFrame frames emitted by other components in the pipeline (such as speech-to-speech services).
from pipecat.turns.user_start import ExternalUserTurnStartStrategy

strategy = ExternalUserTurnStartStrategy()
This strategy automatically sets enable_interruptions=False and enable_user_speaking_frames=False since these are expected to be handled by the external processor.

Stop Strategies

Stop strategies determine when a user’s turn ends and the bot should respond.

Base Parameters

All stop strategies inherit these parameters:
  • enable_user_speaking_frames (bool, default: True): If True, the aggregator will emit frames indicating when the user stops speaking. Disable this if another component already generates these frames.

SpeechTimeoutUserTurnStopStrategy

Signals the end of a user turn when transcription is received and VAD indicates silence. Waits for a configurable timeout after VAD detects silence before finalizing the turn, and supports finalized transcripts for earlier triggering.
  • user_speech_timeout (float, default: 0.6): How long to wait (in seconds) after VAD detects silence before finalizing the user turn.
from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy

strategy = SpeechTimeoutUserTurnStopStrategy(user_speech_timeout=0.6)
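The core condition (a transcription has arrived and silence has lasted at least user_speech_timeout) can be sketched as a pure function over timestamps. The function name turn_should_stop and its arguments are hypothetical, and the sketch leaves out the early trigger on finalized transcripts mentioned above.

```python
from typing import Optional


def turn_should_stop(
    now: float,
    silence_started_at: Optional[float],
    has_transcription: bool,
    user_speech_timeout: float = 0.6,
) -> bool:
    """Return True once silence has lasted long enough to end the turn."""
    # No silence detected yet, or nothing transcribed: keep the turn open.
    if silence_started_at is None or not has_transcription:
        return False
    return now - silence_started_at >= user_speech_timeout


still_open = turn_should_stop(now=1.2, silence_started_at=1.0, has_transcription=True)
finished = turn_should_stop(now=2.0, silence_started_at=1.0, has_transcription=True)
```

A larger user_speech_timeout tolerates longer pauses at the cost of a slower response; a smaller one responds faster but risks cutting the user off.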

TurnAnalyzerUserTurnStopStrategy

Uses an AI-powered turn detection model to determine when the user has finished speaking. This provides more intelligent end-of-turn detection that can understand conversational context.
  • turn_analyzer (BaseTurnAnalyzer, required): The turn detection analyzer instance to use for end-of-turn detection.
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy

strategy = TurnAnalyzerUserTurnStopStrategy(
    turn_analyzer=LocalSmartTurnAnalyzerV3()
)
See the Smart Turn Detection documentation for more information on available turn analyzers.

ExternalUserTurnStopStrategy

Delegates turn stop detection to an external processor. This strategy listens for UserStoppedSpeakingFrame frames emitted by other components in the pipeline.
  • timeout (float, default: 0.5): A short delay in seconds used to handle consecutive or slightly delayed transcriptions.
from pipecat.turns.user_stop import ExternalUserTurnStopStrategy

strategy = ExternalUserTurnStopStrategy()

Helper Functions

Pipecat provides helper functions to compose custom strategy lists that extend the defaults.

default_user_turn_start_strategies()

Returns the default user turn start strategies: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]. Useful when building a custom strategy list that extends the defaults, such as adding wake phrase detection before the standard strategies.
from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy
from pipecat.turns.user_turn_strategies import default_user_turn_start_strategies

# Add wake phrase detection before the defaults
start_strategies = [
    WakePhraseUserTurnStartStrategy(phrases=["hey pipecat"]),
    *default_user_turn_start_strategies(),
]

default_user_turn_stop_strategies()

Returns the default user turn stop strategies: [TurnAnalyzerUserTurnStopStrategy(LocalSmartTurnAnalyzerV3)]. Useful when building a custom strategy list that extends or replaces the defaults.
from pipecat.turns.user_turn_strategies import default_user_turn_stop_strategies

# Use the defaults
stop_strategies = default_user_turn_stop_strategies()

UserTurnStrategies

Container for configuring user turn start and stop strategies.
  • start (List[BaseUserTurnStartStrategy], default: [VADUser...(), TranscriptionUser...()]): List of strategies used to detect when the user starts speaking. The first strategy to trigger will signal the start of the user’s turn.
  • stop (List[BaseUserTurnStopStrategy]): List of strategies used to detect when the user stops speaking and expects a response. Defaults to AI-powered turn detection using LocalSmartTurnAnalyzerV3.

ExternalUserTurnStrategies

A convenience class that preconfigures UserTurnStrategies with external strategies for both start and stop detection. Use this when an external processor (such as a speech-to-speech service) controls turn management.
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=ExternalUserTurnStrategies(),
    ),
)

Usage Examples

Default Behavior

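The default configuration uses VAD and transcription for turn start detection and AI-powered Smart Turn for turn end detection:
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.turns.user_start import (
    TranscriptionUserTurnStartStrategy,
    VADUserTurnStartStrategy,
)
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies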

# This is equivalent to the default behavior
strategies = UserTurnStrategies(
    start=[VADUserTurnStartStrategy(), TranscriptionUserTurnStartStrategy()],
    stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())],
)

Minimum Words for Interruption

Require users to speak at least 3 words before they can interrupt the bot:
from pipecat.turns.user_start import MinWordsUserTurnStartStrategy
from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[MinWordsUserTurnStartStrategy(min_words=3)],
            stop=[SpeechTimeoutUserTurnStopStrategy()],
        ),
    ),
)

Wake Phrase Detection

Require a wake phrase before allowing interaction, then use the default turn strategies:
from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy
from pipecat.turns.user_turn_strategies import (
    UserTurnStrategies,
    default_user_turn_start_strategies,
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[
                WakePhraseUserTurnStartStrategy(phrases=["hey pipecat"]),
                *default_user_turn_start_strategies(),
            ],
        ),
    ),
)

Local Smart Turn Detection

Use a local turn detection model instead of a cloud service:
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[
                TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3()
                )
            ]
        ),
    ),
)