Vector Symbolic Architectures Answer Jackendoff's Challenges for Cognitive Neuroscience
Ross Gayler
[email protected]
RNI seminar, Menlo Park
15/10/2004
Overview
• Jackendoff's Challenges
• Vector Symbolic Architectures
• Challenges & Responses
Origin of Jackendoff's Challenges
• Language is a mental phenomenon and neurally implemented
• Cognitive neuroscience view of linguistic phenomena seems naive
• Identifies core linguistic functionality:
  – Not provided by or obviously inherent in current connectionist models (Marcus, 2001)
  – Arguably core cognitive functionality
Vector Symbolic Architectures
• A family of connectionist approaches
  – From Smolensky's tensor product binding networks
  – Little-known and little-used
    • Gayler, Kanerva, Plate, Rachkovskij
• Provide 'symbolic'-like functionality
  – Represent & manipulate recursive structures
  – Without training (because of architecture)
  – Practical basis for implementation
VSA Architectural Commitments
• Network of connectionist units
  – Single scalar output, multiple scalar inputs, fixed local numerical calculation
• High-dimensional vectors of unit outputs
  – 1,000s to 10,000s of units
• Distributed representations on vectors
  – Representations appear random
• Algebra-like operations on vectors
  – Statistical rather than exact
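A minimal NumPy sketch of these commitments, assuming MAP-style random bipolar (+1/-1) vectors; the dimension, seed, and names are illustrative assumptions rather than details from the talk:

    import numpy as np

    D = 10_000                       # 1,000s to 10,000s of units
    rng = np.random.default_rng(0)

    def random_vector():
        # A distributed representation: random +1/-1 unit outputs.
        return rng.choice([-1, 1], size=D)

    def similarity(x, y):
        # Cosine similarity: comparisons are statistical rather than exact.
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    star, little = random_vector(), random_vector()
    print(similarity(star, star))    # 1.0: identical representations
    print(similarity(star, little))  # ~0 (within ~1/sqrt(D)): unrelated vectors look random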
Vector Operators
• Vector operators vary by VSA
• MAP Coding (Gayler, 1998)
• Multiply (elementwise multiplication)
  – Binding (structural composition)
• Add (elementwise addition)
  – Superposition (set composition)
• Permute
  – Quotation (protection)
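A minimal sketch of the three MAP operators on random bipolar vectors, under the same illustrative assumptions (dimension, seed, and example names are mine, not from the talk):

    import numpy as np

    D = 10_000
    rng = np.random.default_rng(1)
    perm = rng.permutation(D)            # one fixed permutation reused for quotation

    def bind(x, y):                      # Multiply: structural composition
        return x * y

    def superpose(x, y):                 # Add: set composition
        return x + y

    def protect(x):                      # Permute: quotation / protection
        return x[perm]

    red, square, blue, circle = rng.choice([-1, 1], size=(4, D))
    scene = superpose(bind(red, square), bind(blue, circle))   # "red square" + "blue circle"
    quoted = protect(scene)              # the permuted copy looks unrelated to the original bindings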
Levels of Description

Level                         | VSA                                                        | Computer
Vector Representation         | High-dimensional vectors & vector operators: *, +, P( )   | Binary words & logical operators: AND, XOR, …
Representational Architecture | Fixed network of operators                                 | CPU
Cognitive Architecture        | Mapping of task onto lower levels                          | Program
Approach Taken
• Accept the challenges at face value
• Skate over VSA details
• Don't compare VSA with other connectionist approaches
C1: Massiveness of Binding
• Representations must be composed
• A commonly used example of binding:
  – "red square" & "blue circle"
• Jackendoff's example sentence:
  – "The little star's beside a big star"
  – Encoded with approximately 130 tokens and 160 relations between tokens
• Must be rapid and cope with novelty
[Figure: Jackendoff's structural encoding of "The little star's beside a big star"]
© Ray Jackendoff 2002; reproduced by permission of Oxford University Press
R1: Massiveness of Binding (1)
• VSAs do binding
  – as an architectural primitive
  – as an untrained operation
  – blind to the identity of the vectors
• Fast
• Oblivious to novelty
• Vectors being bound may be composites
R1: Massiveness of Binding (2)
• bind(a,b) = a*b
• unbind(cue,trace) = bind(inverse(cue),trace)
• Able to deal with noisy cues and traces
• In MAP and Kanerva's Spatter Coding
  – each vector is its own binding inverse
  – bind(a,a) = a*a = 1
  – unbind(cue,trace) = bind(cue,trace)
  – removes the need for an inverse operator
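A sketch of self-inverse binding under the same assumptions; with bipolar vectors both equalities hold exactly:

    import numpy as np

    D = 10_000
    rng = np.random.default_rng(2)
    a, b = rng.choice([-1, 1], size=(2, D))

    def bind(x, y):
        return x * y

    unbind = bind                    # each bipolar vector is its own binding inverse

    trace = bind(a, b)
    print(np.array_equal(bind(a, a), np.ones(D)))   # True: a*a = 1
    print(np.array_equal(unbind(b, trace), a))      # True: b*(a*b) = a exactly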
R1: Massiveness of Binding (3)
[Diagram: bind and unbind circuits over vectors a and b]
• Unbinding an exact trace recovers the bound item:
  a*b*b = a*(b*b) = a*1 = a
• Unbinding from a superposed trace recovers the item plus noise:
  b*(a*b + p*q + r*s + …) = a*b*b + b*(p*q + …) = a + (b*p*q + …) = a + noise
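The same algebra applied to a superposed trace, as a sketch; the similarity values in the comments are what this illustrative setup produces, not quoted results:

    import numpy as np

    D = 10_000
    rng = np.random.default_rng(3)
    a, b, p, q, r, s = rng.choice([-1, 1], size=(6, D))

    def similarity(x, y):
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    trace = a*b + p*q + r*s          # several bound pairs superposed in one vector
    noisy_a = b * trace              # = a*b*b + b*(p*q + r*s) = a + noise

    print(similarity(noisy_a, a))    # roughly 1/sqrt(3), i.e. ~0.58: a is clearly present
    print(similarity(noisy_a, q))    # ~0: other fillers remain buried in the noise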
C2: The Problem of 2
• How are multiple instances represented?
  – "little star" & "big star"
• This is a problem for 'localist' connectionist representations
R2: The Problem of 2 (1)
• VSAs do superposition of entities
• Only values exist in a given vector space
  star + star = 2 star
• Distinct entities must have distinct values
• Need a representation such that any difference creates unrelated values
  frame1 = {is-a:star size:little}
  frame2 = {is-a:star size:big}
  similarity(frame1,frame2) = 0
R2: The Problem of 2 (2)
• Naive frame encodings don't work
  (little*star) + (big*star) = (little + big)*star
  (is-a*star + size*little + colour*red) + (is-a*star + size*big + colour*blue)
    = (is-a*star + size*little + colour*blue) + (is-a*star + size*big + colour*red)
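A sketch of the failure under the same MAP assumptions: binding distributes over addition and addition is commutative, so the two scenes come out as exactly the same vector:

    import numpy as np

    D = 10_000
    rng = np.random.default_rng(4)
    is_a, size, colour, star, little, big, red, blue = rng.choice([-1, 1], size=(8, D))

    scene1 = (is_a*star + size*little + colour*red) + (is_a*star + size*big + colour*blue)
    scene2 = (is_a*star + size*little + colour*blue) + (is_a*star + size*big + colour*red)

    print(np.array_equal(scene1, scene2))   # True: the naive encoding cannot tell them apart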
R2: The Problem of 2 (3)
• Frames represented as OAV triples
• Frame ID is permuted frame contents
  P(is-a*star + size*little)*(is-a*star + size*little)
    = P(is-a*star + size*little)*is-a*star + P(is-a*star + size*little)*size*little
• Cross-binds all role:filler pairs
• Easy to implement
  [Diagram: a and P(a) multiplied to produce P(a)*a]
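A sketch of the fix under the same assumptions: each frame binds its contents with a permuted copy of themselves before superposition, so the two instances are no longer confusable (the similarity figure in the comment is illustrative):

    import numpy as np

    D = 10_000
    rng = np.random.default_rng(5)
    perm = rng.permutation(D)
    is_a, size, star, little, big = rng.choice([-1, 1], size=(5, D))

    def similarity(x, y):
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    def frame(contents):
        # P(contents)*contents cross-binds all role:filler pairs.
        return contents[perm] * contents

    frame1 = frame(is_a*star + size*little)   # {is-a: star, size: little}
    frame2 = frame(is_a*star + size*big)      # {is-a: star, size: big}

    print(np.array_equal(frame1, frame2))   # False: the two instances are now distinct
    print(similarity(frame1, frame2))       # well below 1 (~0.25 here), despite the shared is-a*star term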
C3: The Problem of Variables
• Construe grammar 'Rules' as templates with variables
• Productivity arising from variables
• Typed variables take constrained values
• How are typed variables implemented?
R3: The Problem of Variables (1)
• Variables as placeholders (Smolensky)
• Bind a vector representing the identity of the variable with a value vector
  – variable_id*value
• Value can be retrieved from the binding
• Variable_id and Value are just vectors
  – Could be atomic or composite
  – Variable and Value are interchangeable
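A sketch of a variable as a placeholder, under the same assumptions (the identifiers are illustrative):

    import numpy as np

    D = 10_000
    rng = np.random.default_rng(6)
    variable_id, value = rng.choice([-1, 1], size=(2, D))

    binding = variable_id * value
    print(np.array_equal(variable_id * binding, value))   # True: retrieve the value
    print(np.array_equal(value * binding, variable_id))   # True: variable and value are interchangeable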
R3: The Problem of Variables (2)
• Variables as targets for substitution
• Bind a vector representing a substitution with a vector representing a structure
  – Substitute x for a: (x*a)
  – Apply the substitution to (a*b):
    (x*a)*(a*b) = x*a*a*b = x*1*b = (x*b)
• Under substitution every component of a structure acts as a variable
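A sketch of substitution as binding; with self-inverse bipolar vectors the identity (x*a)*(a*b) = x*b holds exactly:

    import numpy as np

    D = 10_000
    rng = np.random.default_rng(7)
    x, a, b = rng.choice([-1, 1], size=(3, D))

    substitution = x * a                    # substitute x for a
    structure = a * b                       # a structure containing a
    result = substitution * structure       # = x*a*a*b = x*1*b

    print(np.array_equal(result, x * b))    # True: a has been replaced by x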
R3: The Problem of Variables (3)
• Types are constraints on variables
• Lexicalised grammars shift complexity from the rules to the atomic structures
• Types and other constraints can be realised as additional attributes on VSA structures
• Substitutions apply systematically across all the superposed structures in a space
• Suggests constraint-based formalisms/DOP
C4: Working Memory & LTM
• Linguistic tasks require functional equivalence of Working Memory & LTM
  – Different physical implementations in typical connectionist models argue against functional equivalence
• Speed of learning
  – Iterative learning (e.g. backpropagation) is implausibly slow
R4: Working Memory & LTM (1)
• Problem cast as physical implementation (activations vs synaptic weights)
• Recast in terms of logical representations
• MLP: WM vectors vs LTM matrices
  – WM and LTM items exist in different, incommensurable spaces
• VSA: WM & LTM items in same space
  [Diagram: items to store are superposed into a trace; a cue applied to the trace returns the retrieved items]
R4: Working Memory & LTM (2)
• VSA WM/LTM distinction comes from persistence, not representational space
• WM/LTM distinction reflected in patterns of interaction
  – WM interacts directly with WM
  – WM interacts directly with LTM
  – LTM interacts with LTM only via items in WM
R4: Working Memory & LTM (3)
• Speed of learning in VSAs
  – Binding in WM occurs in one pass
  – Adding an item to a superposition in WM takes one pass
  – Adding an item to a superposition in LTM takes one pass (logically)
• May take longer (physically) depending on method for implementing persistence
Conclusion
• VSAs meet Jackendoff's challenges
• Functionality comes from the algebraic properties of the representations
  – No training required
• Simple connectionist implementation
• Harder to work with because of dependence on design instead of training