These notes are a summary of concepts presented in “Improving a Robot’s Turn-Taking Behavior in Dynamic Multiparty Interactions.”
Maike Paetzel-Prüsmann and James Kennedy. 2023. Improving a Robot’s Turn-Taking Behavior in Dynamic Multiparty Interactions. In Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’23 Companion), March 13–16, 2023, Stockholm, Sweden. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3568294.3580117
- Challenges in Current Robotic Dialogue Systems
- Lack of fundamental conversational behaviors in one-on-one and multiparty settings
- Reliance on rule-based solutions or human-controlled systems limiting natural interaction
- Design Goals for Robust Autonomous Turn-Taking
- Engage dynamically changing groups with adaptive conversational strategies
- Factors to consider
- Robot’s conversational goals and urgency
- Norms of spoken conversation
- Other participants’ goals and priorities
- Key Behavioral Decisions
- Turn-taking dynamics
- Wait for a participant to start or continue speaking
- Use silence to take the floor
- Interrupt when robot’s urgency is high
- Abandon speech if human input takes precedence
- Meta-conversations
- Handle exchanges about the robot’s own behavior as a distinct mode of interaction
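The four turn-taking decisions listed above (wait, use silence, interrupt, abandon) can be sketched as a small rule-based policy. This is only an illustration of how the decisions relate to each other, not the method from the paper: the names `Action`, `FloorState`, and `decide`, the numeric thresholds, and the extra `HOLD` action (the robot keeps speaking) are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"            # let a participant start or continue speaking
    TAKE_FLOOR = "take"      # use a silence to start speaking
    INTERRUPT = "interrupt"  # speak over a human because urgency is high
    ABANDON = "abandon"      # stop the robot's own speech
    HOLD = "hold"            # keep speaking (added for this sketch, not in the notes)


@dataclass
class FloorState:
    human_speaking: bool
    robot_speaking: bool
    silence_s: float       # seconds since anyone last spoke
    robot_urgency: float   # 0.0 (can wait) to 1.0 (must speak now)
    human_priority: float  # 0.0 (ignorable) to 1.0 (must be heard)


def decide(state, silence_threshold_s=0.7, interrupt_threshold=0.8):
    """Pick one action. Both thresholds are illustrative values."""
    if state.robot_speaking:
        # Overlap: give up the floor when the human input takes precedence.
        if state.human_speaking and state.human_priority >= state.robot_urgency:
            return Action.ABANDON
        return Action.HOLD
    if state.human_speaking:
        # Only cut in when the robot's own goal is urgent enough.
        if state.robot_urgency >= interrupt_threshold:
            return Action.INTERRUPT
        return Action.WAIT
    # Nobody is speaking: a long enough silence is a chance to take the floor.
    if state.robot_urgency > 0 and state.silence_s >= silence_threshold_s:
        return Action.TAKE_FLOOR
    return Action.WAIT
```

For example, `decide(FloorState(True, False, 0.0, 0.9, 0.5))` selects `Action.INTERRUPT`, while the same call with a robot urgency of `0.3` selects `Action.WAIT`.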
- Core Capabilities of the Dialogue System
- Turn management
- Identify turn-holding and turn-yielding cues
- Recognize transitions between human participants
- Addressing interruptions
- Detect if human comments are directed to the robot
- Prioritize content based on conversational goals
- Resolve uncertain addressee situations with clarification or continuation
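The interruption-handling steps above (check the addressee, compare priorities, clarify or continue when unsure) might be combined roughly as follows. This is a minimal sketch under stated assumptions: `handle_interruption`, its parameters, the probability bands `low` and `high`, and the three return labels are hypothetical and do not come from the paper.

```python
def handle_interruption(p_addressed_to_robot, input_priority, robot_priority,
                        low=0.3, high=0.7):
    """Return "respond", "clarify", or "continue".

    p_addressed_to_robot: estimated probability that the human comment
        was directed at the robot (assumed to come from some classifier).
    input_priority / robot_priority: importance of the human input versus
        the robot's current conversational goal, on the same scale.
    low / high: illustrative bounds of the "uncertain addressee" band.
    """
    if p_addressed_to_robot < low:
        # Most likely talk between humans: keep going.
        return "continue"
    if input_priority < robot_priority:
        # The robot's current content matters more than the interruption.
        return "continue"
    if p_addressed_to_robot >= high:
        # Clearly addressed to the robot and important enough to answer.
        return "respond"
    # Uncertain addressee, but the input looks important: ask.
    return "clarify"
```

With these illustrative thresholds, `handle_interruption(0.5, 0.8, 0.5)` returns `"clarify"`, and lowering the input priority to `0.2` returns `"continue"`.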
- Scenarios Requiring Special Consideration
- Non-actionable inputs
- Self-echo (technical artifact)
- Background noise (initially ignored, commented on if persistent)
- Backchannels and minor noises
- Examples: “Oh,” coughing, throat clearing
- No significant behavioral changes required
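One way to picture the special cases above is a filter that runs before any dialogue logic. The sketch below is an assumption-laden illustration, not the paper's implementation: `InputFilter`, the `kind` labels, the `BACKCHANNELS` word list, the three-word minimum for echo matching, and the `noise_limit` counter are all invented for this example.

```python
import re

# Illustrative backchannel tokens; a real system would need a richer list.
BACKCHANNELS = {"oh", "mm", "mhm", "uh-huh", "yeah", "okay"}


def _words(text):
    """Lowercase the text and drop punctuation, keeping word tokens."""
    return re.sub(r"[^\w\s'-]", "", text.lower()).split()


class InputFilter:
    def __init__(self, noise_limit=3):
        self.noise_limit = noise_limit  # consecutive noise events before commenting
        self.noise_streak = 0

    def classify(self, text="", kind="speech", robot_last_utterance=""):
        """Return "ignore", "comment_on_noise", or "actionable"."""
        if kind == "background_noise":
            # Ignored at first, commented on once it persists.
            self.noise_streak += 1
            if self.noise_streak >= self.noise_limit:
                self.noise_streak = 0
                return "comment_on_noise"
            return "ignore"
        self.noise_streak = 0
        if kind == "minor_noise":
            return "ignore"  # coughing, throat clearing
        words = _words(text)
        if not words or " ".join(words) in BACKCHANNELS:
            return "ignore"  # empty input or a backchannel such as "Oh"
        # Crude self-echo check: the input repeats at least three
        # consecutive words of what the robot itself just said.
        heard = f" {' '.join(words)} "
        spoken = f" {' '.join(_words(robot_last_utterance))} "
        if len(words) >= 3 and heard in spoken:
            return "ignore"
        return "actionable"
```

The filter keeps a running count so that background noise can be ignored at first and only commented on after several consecutive occurrences.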
- System Capabilities for Effective Interaction
- Predict continuity of the same speaker
- Detect and interpret turn-yielding cues
- Identify conversational urgency and importance
- Filter irrelevant inputs such as noise and hallucinations
- Use lexical, prosodic, and acoustic features (e.g., volume, pitch, speed)
- Integrate classifiers for turn-holding and context-specific behaviors
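To make the idea of combining lexical, prosodic, and acoustic features concrete, here is a toy turn-yielding score built as a hand-weighted logistic function. Everything in it is an assumption for illustration: the function name `turn_yield_probability`, the `HOLD_WORDS` list, the chosen features, and the weights are not taken from the paper, and a real system would learn such parameters from data.

```python
import math

# Words that often signal the speaker intends to continue (illustrative list).
HOLD_WORDS = {"um", "uh", "and", "but", "so", "because"}


def turn_yield_probability(last_word, final_pitch_slope, final_volume_ratio, pause_s):
    """Estimate how likely the current speaker is to be giving up the turn.

    last_word: final word of the utterance so far (lexical feature).
    final_pitch_slope: pitch trend at the end; negative means falling (prosodic).
    final_volume_ratio: end volume divided by average volume (acoustic).
    pause_s: length of the silence after the last word, in seconds.
    All weights below are hand-picked for this sketch.
    """
    z = -1.0                                          # bias toward "still holding"
    z += -2.0 if last_word.lower() in HOLD_WORDS else 0.5
    z += -1.5 * final_pitch_slope                     # falling pitch raises the score
    z += -1.0 * (final_volume_ratio - 1.0)            # trailing-off volume raises it
    z += 2.0 * pause_s                                # longer pause raises it
    return 1.0 / (1.0 + math.exp(-z))
```

An utterance ending in "done" with falling pitch, lower volume, and a 0.8 s pause scores above 0.5 under these weights, whereas one ending in "and" with rising pitch and a 0.1 s pause scores well below it.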