Agent.default.tts_node hangs indefinitely when LLM generates only tool calls (empty text stream) #4480

@Bloooskiii

Bug Description

When an LLM generates only a tool call with no text output, which usually happens during agent handoffs, the default tts_node implementation hangs indefinitely waiting for TTS provider events that never arrive. This freezes the entire voice pipeline for the duration of the call: the user keeps speaking, but nothing is transcribed, and the user can't interrupt out of that state. We've seen hangs of more than a minute in production calls. This has been really hard to reproduce. I can send the session logs privately.

I was digging around and the issue might be in Agent.default.tts_node (agent.py):

async def tts_node(...) -> AsyncGenerator[rtc.AudioFrame, None]:
    async with wrapped_tts.stream(conn_options=conn_options) as stream:
        async def _forward_input() -> None:
            async for chunk in text:  # Empty stream = completes immediately
                stream.push_text(chunk)
            stream.end_input()

        forward_task = asyncio.create_task(_forward_input())
        try:
            async for ev in stream:  # ← HANGS HERE
                yield ev.frame
        finally:
            await utils.aio.cancel_and_wait(forward_task)

When text is empty:

  1. _forward_input() completes immediately (no chunks to iterate)
  2. stream.end_input() is called with no text having been pushed
  3. async for ev in stream: waits for TTS provider to emit events
  4. TTS provider receives empty input + end signal but doesn't emit any events or close the stream
  5. Hang - the async generator never yields and never completes
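
The sequence above can be reproduced in isolation with a small asyncio simulation. This is only a sketch: FakeTTSStream is a hypothetical stand-in, not the livekit-agents or ElevenLabs API, and it hard-codes the suspected provider behavior (no events and no close when no text was pushed). The collect helper and its timeout are also assumptions added for the demo.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class FakeTTSStream:
    """Hypothetical TTS stream: stays open forever if no text was ever pushed."""

    def __init__(self) -> None:
        self._audio = asyncio.Queue()
        self._pushed_any = False

    def push_text(self, chunk: str) -> None:
        self._pushed_any = True
        self._audio.put_nowait(chunk.encode())  # pretend the bytes are audio

    def end_input(self) -> None:
        # Assumed provider behavior: end-of-stream is only signalled
        # when at least one text chunk was received.
        if self._pushed_any:
            self._audio.put_nowait(None)

    def __aiter__(self) -> "FakeTTSStream":
        return self

    async def __anext__(self) -> bytes:
        frame = await self._audio.get()
        if frame is None:
            raise StopAsyncIteration
        return frame


async def tts_node(text: AsyncIterator[str], stream: FakeTTSStream) -> AsyncIterator[bytes]:
    """Same shape as the excerpt above: forward text in a task, yield frames."""

    async def _forward_input() -> None:
        async for chunk in text:
            stream.push_text(chunk)
        stream.end_input()

    forward_task = asyncio.create_task(_forward_input())
    try:
        async for frame in stream:  # blocks forever on an empty text stream
            yield frame
    finally:
        forward_task.cancel()


async def text_stream(chunks: List[str]) -> AsyncIterator[str]:
    for chunk in chunks:
        yield chunk


async def collect(chunks: List[str], timeout: float = 0.2) -> Optional[List[bytes]]:
    """Return the frames, or None if the node did not finish within `timeout`."""

    async def _drain() -> List[bytes]:
        return [f async for f in tts_node(text_stream(chunks), FakeTTSStream())]

    try:
        return await asyncio.wait_for(_drain(), timeout)
    except asyncio.TimeoutError:
        return None


if __name__ == "__main__":
    print(asyncio.run(collect(["hi"])))  # completes normally
    print(asyncio.run(collect([])))      # times out: the node never finishes
```

With a non-empty text stream the node finishes; with an empty one it only ends because the demo wraps it in asyncio.wait_for, which the real pipeline does not do.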

Why User Interruption Didn't Work

The hang becomes unrecoverable during agent handoffs due to the scheduling pause mechanism:

  1. Tool executes and returns a new agent (triggering handoff)
  2. update_agent() → drain() is called on the OLD activity
  3. drain() sets _scheduling_paused = True (agent_activity.py:630)
  4. drain() then waits for the scheduling task to complete
  5. Scheduling task waits for all _speech_tasks to finish (agent_activity.py:1073-1076)
  6. Speech task is blocked on TTS (which is hung waiting for empty stream)

Meanwhile, user speech is discarded:

When _scheduling_paused = True, user input events are skipped before they can trigger an interrupt (agent_activity.py:1365-1370):

if self._scheduling_paused:
    logger.warning("skipping user input, speech scheduling is paused", ...)
    return  # ← User speech event discarded, _interrupt_fut never set

Since the interrupt signal is never triggered, wait_if_not_interrupted([*tasks]) has nothing to race against - it just waits for tasks that will never complete.
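
To illustrate why the pause flag makes the hang unrecoverable, here is a simplified model of the race. It is not the library's code: asyncio.wait stands in for wait_if_not_interrupted, the hung speech task is an Event that is never set, and the simulate function, its paused parameter, and the timeout are all hypothetical names added for the demo.

```python
import asyncio


async def simulate(paused: bool, timeout: float = 0.2) -> str:
    """Model a hung speech task racing against a user-triggered interrupt."""
    loop = asyncio.get_running_loop()
    interrupt_fut = loop.create_future()

    # Stand-in for the speech task blocked on TTS: it never completes.
    hung_speech = asyncio.create_task(asyncio.Event().wait())

    def on_user_input() -> None:
        if paused:
            # Mirrors the early return quoted above: the input is skipped
            # and the interrupt future is never resolved.
            return
        if not interrupt_fut.done():
            interrupt_fut.set_result(None)

    # The user speaks shortly after the speech task has started.
    loop.call_later(0.01, on_user_input)

    done, _ = await asyncio.wait(
        {hung_speech, interrupt_fut},
        timeout=timeout,  # demo-only safety net so the example terminates
        return_when=asyncio.FIRST_COMPLETED,
    )
    hung_speech.cancel()
    return "interrupted" if interrupt_fut in done else "stuck"


if __name__ == "__main__":
    print(asyncio.run(simulate(paused=False)))
    print(asyncio.run(simulate(paused=True)))
```

When scheduling is not paused, the user input resolves the interrupt future and the wait returns early. When it is paused, neither side of the race can ever complete, which matches the behavior we saw in production.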

Expected Behavior

The agent shouldn't hang in such a scenario. When the text stream is empty, tts_node should complete without waiting on the TTS provider.
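
One possible way to get that behavior is to peek at the text stream and skip TTS entirely when it yields nothing. The sketch below is untested against livekit-agents: guarded_tts_node, replay_if_non_empty, and fake_tts_node are hypothetical names, and the inner callable stands in for whatever the real node delegates to (for example Agent.default.tts_node from an overridden tts_node, whose exact signature is not assumed here).

```python
import asyncio
from typing import AsyncIterator, Callable, List, Optional

# Hypothetical shape of the wrapped node: text chunks in, audio frames out.
TTSNode = Callable[[AsyncIterator[str]], AsyncIterator[bytes]]


async def replay_if_non_empty(text: AsyncIterator[str]) -> Optional[AsyncIterator[str]]:
    """Return None for an empty stream, else an iterator that replays every chunk."""
    iterator = text.__aiter__()
    try:
        first = await iterator.__anext__()
    except StopAsyncIteration:
        return None

    async def _replay() -> AsyncIterator[str]:
        yield first  # hand back the chunk consumed while peeking
        async for chunk in iterator:
            yield chunk

    return _replay()


async def guarded_tts_node(text: AsyncIterator[str], inner: TTSNode) -> AsyncIterator[bytes]:
    """Finish immediately when there is no text, otherwise delegate to `inner`."""
    replay = await replay_if_non_empty(text)
    if replay is None:
        return  # tool-call-only turn: never open a TTS stream
    async for frame in inner(replay):
        yield frame


async def fake_tts_node(text: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in for the real node; encodes text so the demo has output."""
    async for chunk in text:
        yield chunk.encode()


async def chunks_of(items: List[str]) -> AsyncIterator[str]:
    for item in items:
        yield item


async def run(items: List[str]) -> List[bytes]:
    return [f async for f in guarded_tts_node(chunks_of(items), fake_tts_node)]


if __name__ == "__main__":
    print(asyncio.run(run([])))
    print(asyncio.run(run(["a", "b"])))
```

This only covers a completely empty stream. Whitespace-only chunks would still reach the provider, and a timeout around the provider stream might still be needed as a second line of defense.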

Reproduction Steps

This was really hard to reproduce. It happens in a very small number of production calls.

Operating System

Ubuntu

Models Used

ElevenLabs Flash V2.5, but it might also happen with other providers

Package Versions

livekit==1.0.23
livekit-agents==1.3.10

Session/Room/Call IDs

sessionID: RM_sUFUPn4LRHeX
sessionID: RM_VHuGVDcqvtrJ

Labels

bug