Multiple Categorization by iCub: Learning ...

Viewer
Transcript

IROS Workshop on Machine Learning Methods for High-Level Cognitive Capabilities in Robotics 2016

Multiple Categorization by iCub: Learning Relationships between Multiple Modalities and Words ○Akira Taniguchi, Tadahiro Taniguchi (Ritsumeikan University, Japan), Angelo Cangelosi (Plymouth University, UK)

Infants can acquire word meanings by estimating the relationships between multiple situations and words. For example, if infant grasps a red ball at hand, the parent may describe an action of infant and an object using a sentence. grasp ? front ? red ? ball ?

Getting visual information

“look at red apple”

apple ?

Object feature (SIFT)

Color (RBG histogram) Area detection of objects (Background subtraction)

Action “red”！ “car”！ “car”！

car ? red ?

k-means & normalization

11 11 21 31 41 51 61 71 81 91 1 11 11 21 21 31 31 41 41 51 51 61 61 71 71 81 81 91 91

red ? look at?

“grasp front red ball”

The procedure for getting and processing data

“right red car”

• looking at an object of attention • Reaching for an object • Grasping with random degree

1

2

1

Position (Homography)

3

1 1

2

2 2

3

3 3

4

4 4

5

5 5

6

6 6

7

7 7

8

8 8

9

10 9 10 9 10

ID : 𝑥 , 𝑦 1: -0.351, -0.175 2: -0.348, 0.184 3: -0.291, 0.007

right ? An object of attention：2

We consider that infant can learn that the word “red” represents the red color by observing the co-occurrence of the word “red” with objects of red color in multiple situations. It is same thing for the other words. This is called cross-situational learning [Smith et al. 2011], [Fontanari et al. 2009]. Position of objects The humanoid iCub robot

Color of objects

• Posture • Tactile information • Relative coordinates to the object from the hand

Word information grasp right green ball

Getting action information

Example of teaching sentences

Conditions

action position color object

The number of action trials: 20 trials The number of objects on the table: 1 – 3 objects The number of words for each trial: 4 words The number of kind of words: 14 words The word order for each category was 𝐹𝑑 = (a,p,c,o) in all of the sentences.

reach touch look at reach grasp

front right right front far

green green blue blue red

box cup box ball box

20 trials Visual feature of objects

？

Action information of the robot

grasp green front cup

Human tutor

Multiple categorization (action, object, color, position) and Learning Relationships between Multiple Modalities and Words

1.The robot is in front of the table with objects on it. 2.The robot selects an object. The robot performs visual attention and an action on an object. 3.The tutor speaks a sentence about the object and the action of the robot. 4.The robot processes the sentence to discover the meanings of the words. This process (steps 1-4) is carried out many times in different situations. The robot learns word meanings and multiple categories by using visual, tactile, and proprioceptive information, as well as words.

Ad

A categorization for each modality is represented by Gaussian mixture model (GMM). The word distribution 𝜃𝑙 is associated with a  category 𝑧 on four modalities. l 𝐿 We assume that a word related to each Selection of modality is spoken only once in each sentence.

ad



p dm

pdm



o z dm

odm cdm

Selection of the modality



Fd

 

p

 

o

c





a

p

o 

c

z z

z

a d

c dm

Object category

The results show that the proposed method can associate each word with each category. Higher probability values are represented by darker shades. touch grasp look at reach

far

left

front right

box

ball

cup green

red

blue

a0 a1 a2 a3 a4 p0 p1 p2 p3 p4 o0 o1 o2 o3 o4 c0 c1 c2 c3 c4

Word distribution

wdn 𝑁

a

5 categories for each modality Each dimension of data is normalized to [0,1]

an object

a

GMM (action)

p k



GMM (position)

𝐾𝑎

𝑀

The number of objects M

a k

p

𝐾𝑝

𝐷



o k

o

GMM (object)



c k



GMM (color)

𝐾𝑜 c

𝐾𝑐

The number of data D

Color category

• We have proposed a Bayesian probabilistic model that can learn multiple categories and the relationships between words and multiple modalities. • The experimental results showed that the robot can perform the categorization for each modality and the estimation of a modality related to a word in complex situations.

Future directions • Experiments using a real iCub • Learning by uncertain spoken sentences – Changing the number of words and order

• Action generation task, description task

a: action, p: position, c: color, o: object feature The humanoid iCub robot: http://icub.org/

E-mail：[email protected]

Automatic term categorization by extracting ... - Semantic Scholar