IROS Workshop on Machine Learning Methods for High-Level Cognitive Capabilities in Robotics 2016
Multiple Categorization by iCub: Learning Relationships between Multiple Modalities and Words
Akira Taniguchi, Tadahiro Taniguchi (Ritsumeikan University, Japan), Angelo Cangelosi (Plymouth University, UK)
Infants can acquire word meanings by estimating the relationships between words and multiple situations. For example, if an infant grasps a red ball, the parent may describe the infant's action and the object with a sentence such as “grasp front red ball”. At first, the infant does not know which word refers to the action, the position, the color, or the object.
[Figure: cross-situational learning. In different situations, an infant hears sentences such as “look at red apple”, “right red car”, and “grasp front red ball” and has to work out which word refers to which feature.]

Getting visual information
• Area detection of objects (background subtraction)
• Object feature (SIFT), processed by k-means & normalization
• Color (RGB histogram)
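The color and object features above can be illustrated with two small helpers: a normalized RGB histogram over the pixels of a detected object region, and a bag-of-features histogram that assigns each local descriptor (such as a SIFT descriptor) to its nearest k-means cluster center. This is a sketch under stated assumptions: the bin count, the pixel format, the normalization to a sum of 1, and the function names are hypothetical, and the codebook is assumed to have been learned beforehand by k-means.

```python
def rgb_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of an object region (hypothetical helper).

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    Returns a list of length bins_per_channel ** 3 that sums to 1
    (or all zeros if there are no pixels).
    """
    n_bins = bins_per_channel ** 3
    hist = [0.0] * n_bins
    width = 256 / bins_per_channel  # range of channel values per bin
    count = 0
    for r, g, b in pixels:
        # Quantize each channel, then combine the three indices into one bin.
        ri = min(int(r // width), bins_per_channel - 1)
        gi = min(int(g // width), bins_per_channel - 1)
        bi = min(int(b // width), bins_per_channel - 1)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def bag_of_features(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments (hypothetical helper).

    descriptors: list of local feature vectors (e.g. SIFT descriptors).
    codebook: list of cluster centers, assumed to come from k-means.
    """
    hist = [0.0] * len(codebook)
    for d in descriptors:
        # Index of the closest cluster center by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((x - c) ** 2 for x, c in zip(d, codebook[k])),
        )
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Normalizing each histogram to sum to 1 makes objects that cover different numbers of pixels or yield different numbers of descriptors comparable.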
The procedure for getting and processing data
• Looking at an object of attention
• Reaching for an object
• Grasping with random degree

Position of objects (homography)
[Figure: example object positions on the table, ID: (x, y) = 1: (-0.351, -0.175), 2: (-0.348, 0.184), 3: (-0.291, 0.007). The object of attention is object 2.]
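Object positions on the table are obtained with a homography, that is, a 3 × 3 projective mapping from image coordinates to coordinates on the table plane. The sketch below only applies a given matrix H to an image point; estimating H (for example from four or more image points with known table coordinates) is not shown, and the example matrix is hypothetical.

```python
def apply_homography(H, u, v):
    """Map image point (u, v) to table-plane coordinates (x, y) with a 3x3 H."""
    # Multiply H by the homogeneous point (u, v, 1).
    xh = H[0][0] * u + H[0][1] * v + H[0][2]
    yh = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("the point is mapped to infinity")
    # Divide by the third coordinate to return to 2-D coordinates.
    return xh / w, yh / w


# Hypothetical calibration: 1 pixel = 1 mm, image center (320, 240) -> table origin.
H_example = [[0.001, 0.0, -0.32],
             [0.0, 0.001, -0.24],
             [0.0, 0.0, 1.0]]
x, y = apply_homography(H_example, 320, 240)  # approximately (0.0, 0.0)
```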
We consider that an infant can learn that the word “red” represents the color red by observing the co-occurrence of the word “red” with red objects in multiple situations. The same holds for the other words. This is called cross-situational learning [Smith et al. 2011], [Fontanari et al. 2009].
[Figure: the humanoid iCub robot, with the position and color of the objects on the table.]
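The intuition behind cross-situational learning can be shown with plain co-occurrence counting: across situations, “red” keeps appearing together with the red color while the other features change. This is a simplified illustration, not the proposed Bayesian model, and the situation format and feature labels are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence(situations):
    """Count how often each word appears together with each observed feature.

    situations: list of (words, features) pairs; both are sets of strings
    (hypothetical format, e.g. "color=red").
    """
    counts = defaultdict(Counter)
    for words, features in situations:
        for word in words:
            counts[word].update(features)
    return counts


def best_feature(counts, word):
    """Feature that co-occurred most often with `word` (ties broken arbitrarily)."""
    return counts[word].most_common(1)[0][0]


situations = [
    ({"look at", "red", "apple"}, {"action=look", "color=red", "object=apple"}),
    ({"right", "red", "car"}, {"position=right", "color=red", "object=car"}),
    ({"grasp", "front", "red", "ball"},
     {"action=grasp", "position=front", "color=red", "object=ball"}),
]
counts = cooccurrence(situations)
```

With only one situation, “car” is equally associated with position=right, color=red, and object=car; more situations are needed to resolve it, whereas “red” is already singled out by its repeated co-occurrence with color=red.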
Getting action information
• Posture
• Tactile information
• Coordinates of the object relative to the hand

Word information (example sentence): “grasp right green ball”
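One way to turn the three sources of action information into a single feature vector is to concatenate them, computing the relative coordinates as the object position minus the hand position. The vector layout and the function name are assumptions for illustration; the poster lists only the three sources.

```python
def action_feature(posture, tactile, hand_pos, object_pos):
    """Concatenate joint posture, tactile readings, and the hand-to-object offset.

    All arguments are sequences of numbers; hand_pos and object_pos must be
    expressed in the same coordinate frame (assumed).
    """
    if len(hand_pos) != len(object_pos):
        raise ValueError("hand and object positions must have the same length")
    # Coordinates of the object relative to the hand.
    relative = [o - h for o, h in zip(object_pos, hand_pos)]
    return list(posture) + list(tactile) + relative
```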
Conditions
• The number of action trials: 20
• The number of objects on the table: 1–3
• The number of words in each sentence: 4
• The number of distinct words: 14
• The word order of the categories was F_d = (a, p, c, o) in all of the sentences.

Example of teaching sentences (action, position, color, object)
• reach front green box
• touch right green cup
• look at right blue box
• reach front blue ball
• grasp far red box
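To make the notation concrete, the word order F_d = (a, p, c, o) pairs the n-th word of a teaching sentence with the n-th modality. The sketch below writes out this pairing for one of the example sentences; the function name is hypothetical, and treating “look at” as a single token is an assumption about how the sentences are segmented.

```python
# Word order used in all teaching sentences: action, position, color, object.
WORD_ORDER = ("a", "p", "c", "o")


def pair_words_with_modalities(words, order=WORD_ORDER):
    """Pair the n-th word with the n-th modality of the order F_d."""
    if len(words) != len(order):
        raise ValueError("sentence length does not match the word order")
    return list(zip(order, words))


pairs = pair_words_with_modalities(["look at", "right", "blue", "box"])
```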
[Figure: Multiple categorization (action, object, color, position) and learning relationships between multiple modalities and words. Over 20 trials, the robot obtains visual features of objects and its own action information, and a human tutor speaks a sentence about them.]
1. The robot is in front of a table with objects on it.
2. The robot selects an object, directs its visual attention to it, and performs an action on it.
3. The tutor speaks a sentence about the object and the robot's action.
4. The robot processes the sentence to discover the meanings of the words.
This process (steps 1–4) is carried out many times in different situations. The robot learns word meanings and multiple categories by using visual, tactile, and proprioceptive information, as well as words.
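A categorization for each modality is represented by a Gaussian mixture model (GMM). Each word distribution θ_l (l = 1, …, L) is associated with a category z of one of the four modalities. We assume that a word related to each modality is spoken only once in each sentence; the selection of the modality for the words of sentence d is denoted by F_d.
[Figure: graphical model of the proposed method. For each trial d, it connects the action data a_d, the position, object-feature, and color data p_dm, o_dm, c_dm of each object m, the category z of each modality, the selection of the modality F_d, the words, and the word distributions θ_l.]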
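Because a word related to each modality is assumed to be spoken only once per sentence, the modality selection F_d for a four-word sentence is one of the 4! = 24 orderings of (a, p, c, o). The sketch below scores every ordering by the log-probability of the words and keeps the best one. It is a simplification under stated assumptions: word_prob[m] stands in for the word distribution θ_l of the category currently assigned to modality m, its values are hypothetical, and the actual model estimates F_d together with the categories rather than from a fixed table.

```python
import math
from itertools import permutations

MODALITIES = ("a", "p", "c", "o")


def best_word_order(words, word_prob, floor=1e-6):
    """Return the ordering of modalities (candidate F_d) that best explains the words.

    word_prob[m][w]: probability of word w under modality m (hypothetical table);
    words missing from the table get the small probability `floor`.
    """
    best, best_score = None, -math.inf
    for order in permutations(MODALITIES, len(words)):
        # Log-likelihood of the sentence if word n relates to modality order[n].
        score = sum(math.log(word_prob[m].get(w, floor)) for m, w in zip(order, words))
        if score > best_score:
            best, best_score = order, score
    return best


# Hypothetical word probabilities for the current category of each modality.
word_prob = {
    "a": {"grasp": 0.5, "reach": 0.5},
    "p": {"right": 0.5, "front": 0.5},
    "c": {"green": 0.5, "red": 0.5},
    "o": {"ball": 0.5, "box": 0.5},
}
```

Enumerating all 24 orderings is cheap for four words; with longer sentences an exhaustive search grows factorially, which is one reason a sampling-based estimate may be preferable.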
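The results show that the proposed method can associate each word with its corresponding category. In the figure of the word distributions, higher probability values are represented by darker shades.
[Figure: word distributions over the categories a0–a4, p0–p4, o0–o4, and c0–c4 for the words touch, grasp, look at, reach (action); far, left, front, right (position); box, ball, cup (object); and green, red, blue (color), shown together with examples of the object categories.]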
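The word distributions in the figure can be read in two ways: the category that gives a word its highest probability, or, assuming a uniform prior over categories, a posterior P(category | word) obtained by normalizing these probabilities across categories. The sketch below uses the category labels of the figure (such as c0) with hypothetical probability values; the uniform prior is an assumption.

```python
def category_for_word(theta, word):
    """Category whose word distribution gives `word` the highest probability."""
    return max(theta, key=lambda label: theta[label].get(word, 0.0))


def category_posterior(theta, word):
    """P(category | word), assuming a uniform prior over categories."""
    likelihood = {label: dist.get(word, 0.0) for label, dist in theta.items()}
    total = sum(likelihood.values())
    if total == 0:
        raise ValueError(f"{word!r} has zero probability in every category")
    return {label: p / total for label, p in likelihood.items()}


# Hypothetical word distributions for three categories.
theta = {
    "c0": {"red": 0.8, "blue": 0.1, "green": 0.1},
    "c1": {"blue": 0.9, "red": 0.05, "green": 0.05},
    "a0": {"grasp": 0.7, "reach": 0.3},
}
```

Under a uniform prior both readings pick the same category; the posterior additionally shows how confidently a word is tied to it.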
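Each modality is categorized by its own GMM: action (K_a categories), position (K_p), object (K_o), and color (K_c). w_dn denotes the n-th of the N words in sentence d, M is the number of objects, and D is the number of data. In the experiment, 5 categories are set for each modality, and each dimension of data is normalized to [0, 1].
[Figure: examples of the color categories.]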
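Each dimension of data is normalized to [0, 1]. Per-dimension min-max scaling, sketched below, is one straightforward way to do this; the poster does not state the exact scheme, so the choice of min-max scaling, the function name, and the handling of constant dimensions are assumptions.

```python
def normalize_columns(data):
    """Scale each dimension (column) of a data set to [0, 1] by min-max scaling.

    data: list of equal-length rows. Dimensions that never vary are mapped
    to 0.0 instead of dividing by zero.
    """
    if not data:
        return []
    dims = len(data[0])
    lows = [min(row[j] for row in data) for j in range(dims)]
    highs = [max(row[j] for row in data) for j in range(dims)]
    result = []
    for row in data:
        scaled = []
        for j, x in enumerate(row):
            span = highs[j] - lows[j]
            scaled.append((x - lows[j]) / span if span else 0.0)
        result.append(scaled)
    return result
```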
• We have proposed a Bayesian probabilistic model that can learn multiple categories and the relationships between words and multiple modalities.
• The experimental results showed that the robot can categorize each modality and estimate which modality a word relates to in complex situations.
Future directions
• Experiments using a real iCub
• Learning from uncertain spoken sentences (changing the number of words and the word order)
• Action generation task and description task
a: action, p: position, c: color, o: object feature
The humanoid iCub robot: http://icub.org/