GazePointAR
Voice assistants (VAs) like Siri and Alexa have transformed how humans interact with technology; however, their inability to consider a user’s spatiotemporal context, such as surrounding objects, dramatically limits natural dialogue. We introduce GazePointAR, a wearable augmented reality (AR) system that supports context-aware speech queries using eye gaze, pointing gestures, and conversation history. With GazePointAR, a user can ask “what’s over there?” or “how do I solve this math problem?” simply by looking and/or pointing. GazePointAR disambiguates pronouns in these queries using the user’s inputs, real-time computer vision, and a large language model (LLM).
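To illustrate the core idea in the abstract, here is a minimal, hypothetical Python sketch of pronoun substitution: the gazed-at object (from a CV detector) replaces a pronoun like “this” before the query is sent to an LLM. The `DetectedObject` type, pronoun list, and substitution logic are illustrative assumptions, not GazePointAR’s actual implementation (the full system also uses pointing gestures and conversation history).

```python
# Hypothetical sketch of pronoun disambiguation via gaze; names and logic are
# illustrative assumptions, not the GazePointAR codebase.
import re
from dataclasses import dataclass

PRONOUNS = {"this", "that", "it", "these", "those", "there"}

@dataclass
class DetectedObject:
    label: str    # e.g. "water bottle", from a real-time object detector
    box: tuple    # (x_min, y_min, x_max, y_max) in screen pixels

def object_under_gaze(gaze_xy, objects):
    """Return the object whose bounding box contains the gaze point,
    falling back to the object whose box center is closest to the gaze."""
    x, y = gaze_xy
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        if x0 <= x <= x1 and y0 <= y <= y1:
            return obj
    def center_dist(obj):
        x0, y0, x1, y1 = obj.box
        return ((x - (x0 + x1) / 2) ** 2 + (y - (y0 + y1) / 2) ** 2) ** 0.5
    return min(objects, key=center_dist) if objects else None

def disambiguate(query, gaze_xy, objects):
    """Replace pronouns in the spoken query with the gazed-at object's label,
    producing a self-contained query that an LLM can answer without visual input."""
    target = object_under_gaze(gaze_xy, objects)
    if target is None:
        return query
    pattern = r"\b(" + "|".join(PRONOUNS) + r")\b"
    return re.sub(pattern, f"the {target.label}", query, flags=re.IGNORECASE)

# Example: asking "What is this?" while gazing at a detected water bottle
objects = [DetectedObject("water bottle", (400, 300, 520, 600))]
print(disambiguate("What is this?", gaze_xy=(450, 420), objects=objects))
# -> "What is the water bottle?"  (the rewritten query would then go to the LLM)
```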
GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality
Jaewook Lee,
Jun Wang,
Elizabeth Brown,
Liam Gene Ping Chu,
Sebastian S. Rodriguez,
Jon E. Froehlich
Proceedings of CHI 2024 | Acceptance Rate: 26.3% (1060 / 4028)
Keywords: augmented reality, LLMs, ChatGPT, large language model, pronoun disambiguation, context-aware AR, natural human agents