Daily Note: Autonomous Turn-Taking Behavior for Robotic Dialogue Systems

These notes are a summary of concepts presented in “Improving a Robot’s Turn-Taking Behavior in Dynamic Multiparty Interactions.”

Maike Paetzel-Prüsmann and James Kennedy. 2023. Improving a Robot’s Turn-Taking Behavior in Dynamic Multiparty Interactions. In Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’23 Companion), March 13–16, 2023, Stockholm, Sweden. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3568294.3580117

  1. Challenges in Current Robotic Dialogue Systems
    • Missing fundamental conversational behaviors, such as turn-taking, in both one-on-one and multiparty settings
    • Reliance on rule-based or human-controlled systems, which limits natural interaction
  2. Design Goals for Robust Autonomous Turn-Taking
    • Engage dynamically changing groups with adaptive conversational strategies
    • Take into account:
      • Robot’s conversational goals and urgency
      • Norms of spoken conversation
      • Other participants’ goals and priorities
  3. Key Behavioral Decisions
    • Turn-taking dynamics
      • Wait for a participant to start or continue speaking
      • Use silence to take the floor
      • Interrupt when the robot’s urgency is high
      • Abandon its own speech if human input takes precedence
    • Meta-conversations
      • Handle distinct interaction modes in which participants talk about the robot’s behavior itself
  4. Core Capabilities of the Dialogue System
    • Turn management
      • Identify turn-holding and turn-yielding cues
      • Recognize transitions between human participants
    • Addressing interruptions
      • Detect whether human comments are directed at the robot
      • Prioritize content based on conversational goals
      • When the addressee is uncertain, either ask for clarification or continue speaking
  5. Scenarios Requiring Special Consideration
    • Non-actionable inputs
      • Self-echo (the robot’s own speech picked up by its microphones; a technical artifact)
      • Background noise (initially ignored, commented on if persistent)
    • Backchannels and minor noises
      • Examples: “Oh,” coughing, throat clearing
      • No significant behavioral changes required
  6. System Capabilities for Effective Interaction
    • Predict whether the current speaker will continue speaking
    • Detect and interpret turn-yielding cues
    • Identify conversational urgency and importance
    • Filter irrelevant inputs such as background noise and speech-recognition hallucinations
    • Use lexical, prosodic, and acoustic features (e.g., volume, pitch, speaking rate)
    • Integrate classifiers for turn-holding and context-specific behaviors
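
The behavioral decisions in items 3 and 4 can be sketched as one decision function. This is a minimal sketch under stated assumptions, not the authors' system. The `TurnState` fields, the `Action` names, and every threshold are hypothetical. The probability inputs (`p_hold`, `p_addressed`) are assumed to come from upstream classifiers like those listed in item 6.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    WAIT = auto()        # let a human start or continue speaking
    TAKE_FLOOR = auto()  # start speaking into a silence
    INTERRUPT = auto()   # talk over a human because robot urgency is high
    CONTINUE = auto()    # keep the robot's current utterance going
    ABANDON = auto()     # stop the robot's utterance; human input takes precedence
    CLARIFY = auto()     # ask whether an overlapping comment was meant for the robot


@dataclass
class TurnState:
    robot_speaking: bool
    human_speaking: bool
    silence_s: float       # seconds since the last speech ended
    p_hold: float          # estimated chance the last human speaker continues (0..1)
    robot_urgency: float   # urgency of the robot's pending content (0..1)
    human_priority: float  # estimated importance of the overlapping human input (0..1)
    p_addressed: float     # estimated chance the human input is directed at the robot (0..1)


# Hypothetical thresholds chosen for illustration only.
MIN_SILENCE_S = 0.7
HIGH_URGENCY = 0.8
ADDRESSED = 0.7
NOT_ADDRESSED = 0.3


def decide(s: TurnState) -> Action:
    """Map the current conversational state to one turn-taking action."""
    if s.robot_speaking and s.human_speaking:
        # A human spoke while the robot was talking.
        if s.p_addressed < NOT_ADDRESSED:
            return Action.CONTINUE  # likely side talk between humans
        robot_wins = s.robot_urgency >= s.human_priority
        if s.p_addressed < ADDRESSED:
            # Uncertain addressee: continue or ask, depending on priorities.
            return Action.CONTINUE if robot_wins else Action.CLARIFY
        return Action.CONTINUE if robot_wins else Action.ABANDON
    if s.robot_speaking:
        return Action.CONTINUE
    if s.human_speaking:
        return Action.INTERRUPT if s.robot_urgency >= HIGH_URGENCY else Action.WAIT
    # Nobody is speaking: take the floor only if the pause looks like a yield.
    if s.silence_s >= MIN_SILENCE_S and s.p_hold < 0.5:
        return Action.TAKE_FLOOR
    return Action.WAIT


if __name__ == "__main__":
    busy = TurnState(robot_speaking=False, human_speaking=True, silence_s=0.0,
                     p_hold=0.9, robot_urgency=0.9, human_priority=0.2, p_addressed=0.0)
    print(decide(busy))  # Action.INTERRUPT
```

Ties between robot urgency and human priority go to the robot here. That is an arbitrary choice, and a real system would tune it.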
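
The handling of non-actionable inputs and backchannels in item 5 could look roughly like the filter below. It assumes a hypothetical upstream front end that already flags self-echo (`echo_of_robot`) and voice activity (`is_speech`). The backchannel word list, the 5-second persistence window, and the 1-second episode gap are illustrative values, not values from the paper.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


@dataclass
class AudioEvent:
    t: float             # timestamp in seconds
    transcript: str      # ASR output; "" when no words were recognized
    is_speech: bool      # voice-activity decision for this chunk
    echo_of_robot: bool  # True if the audio matches the robot's own TTS output


class Response(Enum):
    IGNORE = auto()            # drop the input silently
    COMMENT_ON_NOISE = auto()  # say something about persistent noise
    NO_CHANGE = auto()         # backchannel or minor noise: keep current behavior
    HANDLE = auto()            # pass on to turn management


# Hypothetical backchannel vocabulary for illustration.
BACKCHANNELS = {"oh", "mhm", "uh-huh", "hmm", "yeah", "okay"}


class InputFilter:
    def __init__(self, persist_s: float = 5.0, max_gap_s: float = 1.0) -> None:
        self.persist_s = persist_s  # noise lasting this long counts as persistent
        self.max_gap_s = max_gap_s  # a longer gap between noise chunks starts a new episode
        self._noise_start: Optional[float] = None
        self._last_noise: Optional[float] = None
        self._commented = False

    def handle(self, ev: AudioEvent) -> Response:
        if ev.echo_of_robot:
            return Response.IGNORE  # self-echo: the robot hearing its own voice
        if not ev.is_speech:
            return self._background_noise(ev.t)
        words = ev.transcript.strip().lower().strip(".,!?")
        if not words or words in BACKCHANNELS:
            # Voiced but wordless (coughing, throat clearing) or a backchannel.
            return Response.NO_CHANGE
        return Response.HANDLE

    def _background_noise(self, t: float) -> Response:
        if self._last_noise is None or t - self._last_noise > self.max_gap_s:
            self._noise_start, self._commented = t, False  # new noise episode
        self._last_noise = t
        if not self._commented and t - self._noise_start >= self.persist_s:
            self._commented = True  # comment only once per episode
            return Response.COMMENT_ON_NOISE
        return Response.IGNORE  # ignore noise at first
```

Commenting only once per noise episode keeps the robot from remarking on the same disturbance over and over. A gap longer than `max_gap_s` resets the episode.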
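
Item 6 combines lexical, prosodic, and acoustic cues to judge whether a speaker is yielding the turn. The sketch below swaps in a hand-weighted logistic score as a stand-in for a trained classifier. The feature set, the hold-word list, and all weights are hypothetical. They were chosen only so that cues often associated with turn ends in the turn-taking literature push the score up: falling pitch, trailing volume, final slowing, a completed phrase, and a longer pause.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnEndFeatures:
    pitch_slope_st: float     # pitch change over the final stretch, in semitones (negative = falling)
    volume_drop_db: float     # loudness drop at the end of the utterance, in dB
    rate_ratio: float         # final speaking rate / speaker's average rate (below 1 = slowing)
    lexically_complete: bool  # transcript does not end on a word that usually signals more to come
    pause_s: float            # silence since the last word, in seconds


# Hypothetical words that tend to signal the speaker is holding the turn.
HOLD_WORDS = {"and", "but", "so", "because", "um", "uh"}

# Hypothetical hand-set weights standing in for a trained classifier.
BIAS = -1.5
W_PITCH, W_VOLUME, W_SLOWDOWN, W_LEXICAL, W_PAUSE = -0.4, 0.15, 1.0, 1.5, 3.0


def is_lexically_complete(transcript: str) -> bool:
    """Lexical cue: False if the utterance is empty or ends on a hold word."""
    words = transcript.lower().rstrip(".,!?").split()
    return bool(words) and words[-1] not in HOLD_WORDS


def p_yield(f: TurnEndFeatures) -> float:
    """Logistic score: estimated probability the current speaker is yielding the turn."""
    z = (BIAS
         + W_PITCH * f.pitch_slope_st              # falling pitch raises the score
         + W_VOLUME * f.volume_drop_db             # trailing off in volume raises it
         + W_SLOWDOWN * (1.0 - f.rate_ratio)       # final slowing raises it
         + W_LEXICAL * float(f.lexically_complete) # a completed phrase raises it
         + W_PAUSE * f.pause_s)                    # longer pauses raise it
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    done = TurnEndFeatures(-4.0, 6.0, 0.8, is_lexically_complete("That is all."), 0.5)
    print(round(p_yield(done), 2))
```

In practice the weights would be learned from annotated conversations. The complement of this score could serve as the "same speaker continues" prediction that item 6 also lists.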