When robots operate in social settings with humans, they must possess several capabilities: decision-making about the current state, selection of preferable actions, and engagement. Good speech recognition and face detection, linked with turn-taking management and decision-making, are also major considerations in this project. The primary aim of this project is therefore to couple all of these factors in one model and to develop a system that engages multiple parties in conversation.
To implement this system I used the NAO robot (Torso) as the platform. The robot has 14 degrees of freedom and a fully programmable architecture. The NAO platform is an emerging subject of active research and development in artificial intelligence, cognitive science, and robotics.
The major objectives of this project are:
To develop a system that couples speech and vision in a single multi-modal unit, and to attain better user ratings for that system.
To engage people effectively in conversation and to handle multiple parties in that conversation.
I developed two systems: the first processes speech only, and the second combines speech and vision. I evaluated both systems by testing them with multiple users and completed an initial survey. To handle multiple parties in conversation effectively, I developed a Conversation Manager that governs the robot's turn-taking and decision-making.
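The turn-taking side of such a Conversation Manager can be sketched as a round-robin floor-holder: participants join when detected (e.g. by face detection), the floor passes from one to the next, and only the active participant is addressed. This is a minimal illustrative sketch in plain Python; the class and method names are my own assumptions, not the project's actual code or the NAOqi API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConversationManager:
    """Sketch of a multi-party turn-taking manager (hypothetical design)."""
    users: List[str] = field(default_factory=list)  # known participants
    active: Optional[str] = None                    # who holds the floor

    def join(self, user: str) -> None:
        # Register a newly detected participant; first joiner gets the floor.
        if user not in self.users:
            self.users.append(user)
        if self.active is None:
            self.active = user

    def leave(self, user: str) -> None:
        # Drop a participant who left the robot's field of view.
        if user in self.users:
            if self.active == user:
                self.pass_turn()
            self.users.remove(user)
            if not self.users:
                self.active = None

    def pass_turn(self) -> None:
        # Hand the floor to the next participant in round-robin order.
        if self.active in self.users and len(self.users) > 1:
            i = self.users.index(self.active)
            self.active = self.users[(i + 1) % len(self.users)]

    def may_speak(self, user: str) -> bool:
        # Decision rule: only the active participant holds the floor.
        return user == self.active


cm = ConversationManager()
cm.join("user_a")
cm.join("user_b")
cm.pass_turn()          # floor moves from user_a to user_b
print(cm.active)        # → user_b
```

In the full system, `join`/`leave` would be driven by the vision module and `pass_turn` by silence or addressee cues from speech recognition; this sketch isolates only the bookkeeping.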
Figure: NAO torso robot used in this project.
Figure: Snapshot from a test session; the robot engages in conversation with two users.