On the Expressive Efficiency of Overlapping Architectures of Deep Learning

Or Sharir

Amnon Shashua

The Hebrew University of Jerusalem

June 30, 2017 Deep Learning Summer School

Sharir & Shashua (HUJI)

Expressiveness of Overlapping Architectures

30/06/17

1 / 21

Overlapping vs Non-Overlapping Architectures

Overlapping: Receptive Field > Stride

Non-Overlapping: Receptive Field = Stride

The Merits of Non-overlapping Architectures

Non-overlapping arch's have theoretical merit:
  Universality: can approximate any function given sufficient resources
  Optimization: better convergence guarantees than overlapping arch's [1]

In practice:
  Non-overlapping arch's are used in some applications, but only a few!
  Modern arch's use ever smaller receptive fields, including many non-overlapping layers, but never in all layers!

Questions:
  1) Why are non-overlapping arch's so uncommon?
  2) Why is having just a bit of overlapping sufficient for most tasks?

[1] Alon Brutzkus & Amir Globerson. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs. ICML 2017.

Outline

1. Expressive Efficiency
2. Convolutional Arithmetic Circuits
3. Theoretical Analysis of ConvACs with Overlaps
4. Experiments on Standard ConvNets

Expressive Efficiency

Efficiency

Expressive efficiency compares network arch's in terms of their ability to compactly represent functions.

Let:
  $H_A$: space of functions compactly representable by network arch A
  $H_B$: space of functions compactly representable by network arch B

A is efficient w.r.t. B if $H_A$ is a strict superset of $H_B$.

[Diagram: $H_A$ and $H_B$]

A is completely efficient w.r.t. B if $H_B$ has zero "volume" inside $H_A$.

[Diagram: $H_A$ and $H_B$]

Efficiency: Formal Definition

Network arch A is exponentially efficient w.r.t. network arch B if:
  (1) Every function realized by B with size [1] $r_B$ can be realized by A with size $r_A \in O(g(r_B))$, where g is polynomial.
  (2) There exists a function realized by A with size $r_A$ that requires B to have size $r_B \in \Omega(f(r_A))$, where f is super-polynomial.

A is completely efficient w.r.t. B if (2) holds for all of its functions but a set of Lebesgue measure zero (in weight space).

[1] Size depends on the measure of interest, e.g. # of neurons or # of parameters.

Example: Efficiency of Depth

Empirical results: deep networks have an advantage.

Theory: deep nets are exponentially efficient w.r.t. shallow ones.

Other Works of Our Group

Depth Efficiency:
  On the Expressive Power of Deep Learning: A Tensor Analysis. N. Cohen, O. Sharir, and A. Shashua. Conference on Learning Theory (COLT) 2016.
  Convolutional Rectifier Networks as Generalized Tensor Decompositions. N. Cohen and A. Shashua. International Conference on Machine Learning (ICML) 2016.

Inductive Bias of Connectivity Patterns:
  Inductive Bias of Deep Convolutional Networks through Pooling Geometry. N. Cohen and A. Shashua. International Conference on Learning Representations (ICLR) 2017.
  Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions. N. Cohen, R. Tamari, and A. Shashua. arXiv preprint 2017.

Inductive Bias of the Widths of Layers:
  Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design. Y. Levine, D. Yakira, N. Cohen, and A. Shashua. arXiv preprint 2017.

Convolutional Arithmetic Circuits

To address the questions raised, we consider a special case of ConvNets: Convolutional Arithmetic Circuits (ConvACs).

ConvACs are equivalent to hierarchical tensor decompositions:
  May be analyzed with various mathematical tools
  Tools may be extended to additional types of ConvNets (e.g. ReLU) [1]

Besides theoretical merits, ConvACs deliver promising results in practice:
  Excel in computationally constrained settings [2]
  Classify optimally under missing data [3]

[1] Convolutional Rectifier Networks as Generalized Tensor Decompositions, ICML'16
[2] Deep SimNets, CVPR'16
[3] Tensorial Mixture Models, arXiv'17

Baseline Architecture

[Diagram: input X → representation (M channels) → hidden layer 0 (1x1 conv, pooling; $r_0$ channels) → ... → hidden layer L-1 (1x1 conv, pooling; $r_{L-1}$ channels) → dense (output; Y channels)]

$\mathrm{rep}(i, d) = f_{\theta_d}(x_i)$
$\mathrm{conv}_0(j, \gamma) = \langle a^{0,\gamma}, \mathrm{rep}(j, :) \rangle$
$\mathrm{pool}_0(j, \gamma) = \prod_{j' \in \mathrm{window}\ j} \mathrm{conv}_0(j', \gamma)$
$\mathrm{pool}_{L-1}(\gamma) = \prod_{j'\ \mathrm{covers\ space}} \mathrm{conv}_{L-1}(j', \gamma)$
$\mathrm{out}(y) = \langle a^{L,y}, \mathrm{pool}_{L-1}(:) \rangle$

Baseline ConvAC architecture:
  2D ConvNet: conv → L × (conv → pool) → dense
  1×1 convolutions, followed by linear activations (σ(z) = z)
  Product pooling: $P\{c_j\} = \prod_j c_j$ (non-overlapping windows)

Limitation: supports only non-overlapping architectures!
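The baseline forward pass can be sketched in a few lines. This is only an illustration under stated assumptions: it works on a 1D sequence of positions rather than a 2D image, uses pooling windows of size 2 in every hidden layer, and the names `conv1x1`, `product_pool`, `convac_forward` and the toy weights are hypothetical, not taken from the talk.

```python
from math import prod

def conv1x1(feature_map, weights):
    """1x1 conv with linear activation: out[j][g] = <weights[g], feature_map[j]>."""
    return [[sum(w * v for w, v in zip(row, vec)) for row in weights]
            for vec in feature_map]

def product_pool(feature_map, window):
    """Product pooling over non-overlapping windows of `window` positions."""
    channels = len(feature_map[0])
    return [[prod(feature_map[j][g] for j in range(start, start + window))
             for g in range(channels)]
            for start in range(0, len(feature_map), window)]

def convac_forward(rep, hidden_weights, dense_weights, window=2):
    """rep[i][d] = f_d(x_i); each hidden layer is a 1x1 conv followed by product pooling."""
    fmap = rep
    for weights in hidden_weights:
        fmap = product_pool(conv1x1(fmap, weights), window)
    assert len(fmap) == 1, "pooling must reduce the map to a single position"
    return conv1x1(fmap, dense_weights)[0]

# Toy example (assumed sizes): N = 4 positions, M = 2, r_0 = r_1 = 2, Y = 1.
rep = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = convac_forward(rep, [identity, identity], [[1.0, 1.0]])
print(out)  # [1.0]
```

Because receptive field and stride of the pooling coincide, every input position contributes to exactly one window per layer, which is the non-overlapping limitation noted above.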

Generalized Convolutional Arithmetic Circuits

Generalizing ConvACs to overlapping arch's:

[Diagram: input X → representation (M channels) → sequence of GC layers with output channels $D^{(1)}, \ldots, D^{(L-1)}$ → GC output layer with $D^{(L)} \equiv Y$ channels; the first GC layer is annotated with receptive field $R^{(1)} \times R^{(1)}$ and stride $S^{(1)}$]

$\mathrm{rep}(i, d) = f_{\theta_d}(x_i)$
$\mathrm{GC}(x^{(j)}, w^{(c)}, b^{(c)}) = \prod_{k=1}^{(R^{(1)})^2} \left( b_k^{(c)} + \sum_{m=1}^{M} w_{mk}^{(c)} x_{mk}^{(j)} \right)$

Generalized Convolution:
  Generalizes 1×1-conv and pooling
  Inspired by All Convolutional Net (pooling via stride > 1)
  Non-overlapping case is equivalent to standard ConvACs
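A minimal sketch of the GC operator for a single output channel, following the product-of-affine-maps formula on this slide. The function name `generalized_conv` and the list-of-lists layout (channel index m first, window position k second) are assumptions made for illustration.

```python
from math import prod

def generalized_conv(patch, weights, biases):
    """GC for one output channel.

    patch[m][k]   : input channel m at position k of the receptive field
    weights[m][k] : weight w_mk for this output channel
    biases[k]     : bias b_k for this output channel
    Returns prod_k (b_k + sum_m w_mk * x_mk).
    """
    num_positions = len(biases)
    return prod(
        biases[k] + sum(w_row[k] * x_row[k] for w_row, x_row in zip(weights, patch))
        for k in range(num_positions)
    )

# 1x1 receptive field: the product has one factor, so GC is an affine 1x1 conv.
print(generalized_conv([[2.0], [3.0]], [[1.0], [-1.0]], [0.5]))   # -0.5

# Two positions, one channel, unit weights, zero bias: plain product pooling.
print(generalized_conv([[2.0, 4.0]], [[1.0, 1.0]], [0.0, 0.0]))   # 8.0
```

The two calls show the two special cases named on the slide: a single-position window recovers a 1×1 conv, and unit weights with zero bias recover product pooling.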

Theoretical Analysis of ConvACs with Overlaps

Overlapping Architectures Are Just As Expressive

Claim: An overlapping arch can replicate any function realizable by a non-overlapping arch of similar size and the same sequence of strides.

Conclusion: Overlapping arch's are just as expressive as non-overlapping arch's!

Question: Could it be that overlapping arch's are in fact more expressive?

Degree of Overlapping

[Figure: Total Stride and Total Receptive Field of Layer L, shown across the Input Layer, Layer L-1 and Layer L]
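The two quantities named in the figure can be computed with standard receptive-field arithmetic. The sketch below is a 1D illustration under assumptions: the helper name `total_stride_and_receptive_field` is hypothetical, and the slide itself does not spell out a formula for the overlapping degree, so only total stride and total receptive field are computed.

```python
def total_stride_and_receptive_field(layers):
    """layers: list of (receptive_field, stride) per layer, ordered from input to output (1D)."""
    total_stride, total_rf = 1, 1
    for receptive_field, stride in layers:
        # Each extra position in this layer's window spans `total_stride` input positions.
        total_rf += (receptive_field - 1) * total_stride
        total_stride *= stride
    return total_stride, total_rf

# Non-overlapping: receptive field = stride in every layer.
print(total_stride_and_receptive_field([(2, 2), (2, 2), (2, 2)]))          # (8, 8)

# Overlapping: 3-wide convs with stride 1 alternating with 2-wide, stride-2 pooling.
print(total_stride_and_receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # (4, 10)
```

In the non-overlapping stack the total receptive field equals the total stride, while in the overlapping stack it exceeds it.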

Overlapping Efficiency

Theorem: Almost all functions realizable by an overlapping arch cannot be replicated by a non-overlapping arch unless its size is exponential in the overlapping degree.

Common case: alternating B×B-conv and 2×2-pooling

[Diagram: input X → representation (M channels) → Block 0, ..., Block L-1 → output; each block is a B×B GC (R: B×B, S: 1×1) followed by a 2×2 GC (R: 2×2, S: 2×2), with channel counts $D^{(1)}, \ldots, D^{(L-1)}$]

Claim: Almost all functions realizable by the above arch cannot be replicated by a non-overlapping arch unless its size is at least $M^{(2B-1)^2/4}$.
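To get a feel for how fast the bound in the claim grows with B, it can be evaluated directly. The function name `non_overlapping_size_lower_bound` and the value M = 64 are assumptions chosen for illustration only.

```python
def non_overlapping_size_lower_bound(num_rep_channels, conv_size):
    """Evaluate M ** ((2B - 1)**2 / 4) from the claim, as a float."""
    exponent = (2 * conv_size - 1) ** 2 / 4
    return num_rep_channels ** exponent

# Hypothetical M = 64 representation channels, growing conv size B.
for B in (1, 2, 3):
    exponent = (2 * B - 1) ** 2 / 4
    print(B, exponent, non_overlapping_size_lower_bound(64, B))
```

The exponent grows quadratically in B (0.25, 2.25, 6.25 for B = 1, 2, 3), so even small receptive fields give a large gap.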

Experiments on Standard ConvNets

ConvNets following the arch of the last claim were trained on CIFAR10 while varying the number of channels and the size of the receptive field, denoted by B.

[Figure, two panels: Train Accuracy (%) vs. Number of Channels (16 to 2048) and Train Accuracy (%) vs. Number of Parameters (1.0e+03 to 1.7e+07), one curve per B = 1, 2, 3, 4, 5]

Conjecture: Increasing the overlapping degree beyond a certain point brings little to no gains in expressive efficiency!

Summary

Comparing different arch's through expressive efficiency.

Overlapping arch's are efficient w.r.t. non-overlapping ones:
  Proven in the case of ConvACs
  Holds even for arch's of small overlapping degree
  Experiments suggest the analysis holds for standard ConvNets as well

Conclusions:
  Non-overlapping arch's are uncommon out of lack of efficiency
  Conjecture: Small overlapping degree might be all we need

Thank You

Backup Slides

Measure Efficiency via Grid Tensors

Comparing functions directly can be ill-defined. Instead, compare functions via the grid tensors they induce:
  Denote by $f(x_1, \ldots, x_N)$ the function realized by the network.
  $f(\cdot)$ may be studied by discretizing each $x_i$ into one of $\{v^{(1)}, \ldots, v^{(M)}\}$:

  $A(f)_{d_1 \ldots d_N} = f(v^{(d_1)}, \ldots, v^{(d_N)}), \quad d_1, \ldots, d_N \in \{1, \ldots, M\}$

Efficiency: the minimal size required to induce a given grid tensor.

Universality of ConvACs: any arch can induce any grid tensor, given a sufficient number of channels.

⇒ Efficiency via grid tensors is well-defined!
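A small sketch of how a grid tensor is induced by a function. The helper `grid_tensor`, the toy function `f`, and the scalar templates are all hypothetical; indices are 0-based here, whereas the slide uses 1..M.

```python
from itertools import product

def grid_tensor(f, templates, num_inputs):
    """Map each index tuple (d_1, ..., d_N) to f(v^(d_1), ..., v^(d_N))."""
    M = len(templates)
    return {idx: f(*(templates[d] for d in idx))
            for idx in product(range(M), repeat=num_inputs)}

# Hypothetical network function over N = 2 scalar "patches", with M = 3 templates.
f = lambda x1, x2: x1 * x2 + x1
A = grid_tensor(f, templates=[0.0, 1.0, 2.0], num_inputs=2)
print(len(A))     # 9 entries, i.e. M ** N
print(A[(2, 1)])  # f(2.0, 1.0) = 4.0
```

The table has M^N entries, which is why a compact (decomposed) representation of it is the quantity of interest.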

Tensorial Function Spaces

We represent instances (images) as N-tuples of vectors (patches):

  $X = (x_1, \ldots, x_N) \in (\mathbb{R}^s)^N$

Example: 32x32 RGB image represented via 5x5 patches around all pixels:
  # of patches: $N = 32 \cdot 32 = 1024$
  patch dimension: $s = 5 \cdot 5 \cdot 3 = 75$

[Figure: 32×32×3 image X with a 5×5×3 patch $x_i$ highlighted]

Let $f_{\theta_1}, \ldots, f_{\theta_M} : \mathbb{R}^s \to \mathbb{R}$ be a basis of functions over patches, e.g. neurons:

  $f_{\theta_d = (w_d, b_d)}(x) = \sigma(w_d^\top x + b_d)$

Denote $F = \mathrm{span}\{f_{\theta_1}, \ldots, f_{\theta_M}\}$

Tensorial Function Spaces (cont'd)

$F^{\otimes N}$: extension of F from patches to images, i.e. the space of functions over images spanned by:

  $(x_1, \ldots, x_N) \mapsto \prod_{i=1}^{N} f_{\theta_{d_i}}(x_i), \quad d_1, \ldots, d_N \in [M]$

(formally known as the tensor product of F with itself N times)

A general function $h \in F^{\otimes N}$ can be written as:

  $h(x_1, \ldots, x_N) = \sum_{d_1, \ldots, d_N = 1}^{M} A_{d_1, \ldots, d_N} \prod_{i=1}^{N} f_{\theta_{d_i}}(x_i)$

where $A \in \mathbb{R}^{M \times \cdots \times M}$ is the coefficient tensor of h

Tensor Decompositions

Tensor: multi-dimensional array:

  $A_{d_1 \ldots d_N} \in \mathbb{R}, \quad d_1, \ldots, d_N \in [M]$

Suppose we would like to draw an entry from tensor A:

  approach                computation complexity   storage complexity
  naïve (lookup table)    constant                 exponential (in N)
  tensor decomposition    polynomial               polynomial

Special case N = 2: low-rank matrix decomposition:

                  computation   storage
  lookup table    O(1)          O(M^2)
  decomposition   O(k)          O(Mk)

  $A = U V^\top$ with $U, V \in \mathbb{R}^{M \times k}$ and $k \ll M$, i.e. $A_{ij} = \sum_{r=1}^{k} U_{ir} V_{jr}$
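The N = 2 trade-off can be made concrete with a short sketch. The factor matrices below are arbitrary toy values and the name `low_rank_entry` is hypothetical.

```python
def low_rank_entry(U, V, i, j):
    """A_ij = sum_r U[i][r] * V[j][r], computed in O(k) without materializing A."""
    return sum(u * v for u, v in zip(U[i], V[j]))

M, k = 6, 2
U = [[float(i + r) for r in range(k)] for i in range(M)]        # M x k
V = [[float(j * (r + 1)) for r in range(k)] for j in range(M)]  # M x k

# Lookup table: all M * M entries stored explicitly, O(1) per lookup.
table = [[low_rank_entry(U, V, i, j) for j in range(M)] for i in range(M)]

print(table[3][2], low_rank_entry(U, V, 3, 2))  # 22.0 22.0
print("lookup storage:", M * M, "decomposition storage:", 2 * M * k)
```

Both routes return the same entry; the decomposition stores 2Mk numbers instead of M^2 at the cost of k multiplications per lookup.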

Tensor Decompositions (cont'd)

For general order N, tensor decomposition is realized by a convolutional arithmetic circuit over coordinate $(d_1 \ldots d_N)$ indicators:

[Diagram: coordinate indicators (M channels) → hidden layer 0 (1x1 conv, pooling; $r_0$ channels) → ... → hidden layer L-1 (1x1 conv, pooling; $r_{L-1}$ channels) → linear → $A_{d_1 \ldots d_N}$]

$\mathrm{ind}(i, d) = \begin{cases} 1, & d = d_i \\ 0, & \text{otherwise} \end{cases}$

1-1 correspondence between type of tensor decomposition and structure of network (# of layers, pooling schemes, layer widths, etc.)

Computing Functions by Decomposing Coefficient Tensors

$h_1, \ldots, h_Y$: functions over images:

  $h_y(x_1, \ldots, x_N) = \sum_{d_1, \ldots, d_N = 1}^{M} A^y_{d_1, \ldots, d_N} \prod_{i=1}^{N} f_{\theta_{d_i}}(x_i)$

With tensor decomposition applied to $A^y$, functions $h_y$ are computed by a convolutional arithmetic circuit over $\{f_{\theta_d}(x_i)\}_{d \in [M], i \in [N]}$ (representation):

[Diagram: input X → representation, $\mathrm{rep}(i, d) = f_{\theta_d}(x_i)$ (M channels) → hidden layer 0 (1x1 conv, pooling; $r_0$ channels) → ... → hidden layer L-1 (1x1 conv, pooling; $r_{L-1}$ channels) → dense (output; Y channels), computing $h_y(X)$ with the decomposed coefficient tensor $A^y$]

Again: 1-1 correspondence between decomposition type and network structure

CP (CANDECOMP/PARAFAC) Decomposition ←→ Shallow Convolutional Arithmetic Circuit

Classic CP decomposition of coefficient tensors $A^y$:

  $A^y = \sum_{\gamma=1}^{r_0} a^{1,1,y}_{\gamma} \cdot \underbrace{a^{0,1,\gamma} \otimes a^{0,2,\gamma} \otimes \cdots \otimes a^{0,N,\gamma}}_{\text{rank-1 tensor}} \qquad (\mathrm{rank}(A^y) \le r_0)$

corresponds to shallow network (single hidden layer, global pooling):

[Diagram: input X → representation (M channels) → hidden layer (1x1 conv, global pooling; $r_0$ channels) → dense (output; Y channels)]

$\mathrm{rep}(i, d) = f_{\theta_d}(x_i)$
$\mathrm{conv}(j, \gamma) = \langle a^{0,j,\gamma}, \mathrm{rep}(j, :) \rangle$
$\mathrm{pool}(\gamma) = \prod_{j\ \mathrm{covers\ space}} \mathrm{conv}(j, \gamma)$
$\mathrm{out}(y) = \langle a^{1,1,y}, \mathrm{pool}(:) \rangle$
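The correspondence can be checked numerically on a toy case: evaluating one entry of a CP-decomposed tensor gives the same number as running the shallow circuit on one-hot coordinate indicators. The names `cp_entry` and `shallow_convac`, the sizes, and the weights are assumptions for illustration; indices are 0-based.

```python
from math import prod

def cp_entry(top_weights, factors, index):
    """A_{d_1..d_N} = sum_g top_weights[g] * prod_i factors[i][g][d_i]."""
    return sum(
        top_weights[g] * prod(factors[i][g][d] for i, d in enumerate(index))
        for g in range(len(top_weights))
    )

def shallow_convac(top_weights, factors, rep):
    """conv(j, g) = <factors[j][g], rep[j]>; pool(g) = prod_j conv(j, g); out = <top, pool>."""
    pooled = [
        prod(sum(a * r for a, r in zip(factors[j][g], rep[j])) for j in range(len(rep)))
        for g in range(len(top_weights))
    ]
    return sum(w * p for w, p in zip(top_weights, pooled))

# Toy sizes (assumed): N = 3 positions, M = 2, r_0 = 2; factors[j][g] plays a^{0,j,g}.
factors = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[1.0, 1.0], [2.0, 0.0]],
    [[3.0, 1.0], [1.0, 1.0]],
]
top_weights = [1.0, -1.0]

index = (1, 0, 0)
one_hot = [[1.0 if d == d_i else 0.0 for d in range(2)] for d_i in index]
print(cp_entry(top_weights, factors, index))          # 4.0
print(shallow_convac(top_weights, factors, one_hot))  # 4.0
```

Feeding real representation values instead of one-hot indicators into `shallow_convac` evaluates the function whose coefficient tensor has this CP form.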

Hierarchical Tucker Decomposition ←→ Deep Convolutional Arithmetic Circuit

Hierarchical Tucker decomposition of coefficient tensors $A^y$:

  $\phi^{1,j,\gamma} = \sum_{\alpha=1}^{r_0} a^{1,j,\gamma}_{\alpha} \cdot a^{0,2j-1,\alpha} \otimes a^{0,2j,\alpha}$
  $\cdots$
  $\phi^{l,j,\gamma} = \sum_{\alpha=1}^{r_{l-1}} a^{l,j,\gamma}_{\alpha} \cdot \phi^{l-1,2j-1,\alpha} \otimes \phi^{l-1,2j,\alpha}$
  $\cdots$
  $A^y = \sum_{\alpha=1}^{r_{L-1}} a^{L,1,y}_{\alpha} \cdot \phi^{L-1,1,\alpha} \otimes \phi^{L-1,2,\alpha}$

corresponds to deep network ($L = \log_2 N$ hidden layers, size-2 pooling):

[Diagram: input X → representation (M channels) → hidden layer 0 (1x1 conv, pooling; $r_0$ channels) → ... → hidden layer L-1 (1x1 conv, pooling; $r_{L-1}$ channels) → dense (output; Y channels)]

$\mathrm{rep}(i, d) = f_{\theta_d}(x_i)$
$\mathrm{conv}_0(j, \gamma) = \langle a^{0,j,\gamma}, \mathrm{rep}(j, :) \rangle$
$\mathrm{pool}_0(j, \gamma) = \prod_{j' \in \{2j-1, 2j\}} \mathrm{conv}_0(j', \gamma)$
$\mathrm{pool}_{L-1}(\gamma) = \prod_{j' \in \{1, 2\}} \mathrm{conv}_{L-1}(j', \gamma)$
$\mathrm{out}(y) = \langle a^{L,1,y}, \mathrm{pool}_{L-1}(:) \rangle$
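The recursion above can be followed entry by entry. The sketch below evaluates a single entry of a tensor given in this hierarchical form for N = 4 (so L = 2); the function name `ht_entry`, the nested-list layout and the toy weights are assumptions for illustration, and indices are 0-based.

```python
def ht_entry(leaf, inner, top, index):
    """Entry of a tensor given in hierarchical (binary-tree) form.

    leaf[j][alpha][d]         : vectors a^{0,j,alpha}
    inner[l][j][gamma][alpha] : weights a^{l+1,j,gamma}_alpha for levels 1..L-1
    top[alpha]                : weights a^{L,1,y}_alpha for one output y
    """
    # Level 0: for each position j, the values a^{0,j,alpha}_{d_j} over alpha.
    level = [[vec[d] for vec in leaf[j]] for j, d in enumerate(index)]
    for weights in inner:
        # Size-2 pooling: multiply neighbouring positions channel-wise.
        pooled = [[a * b for a, b in zip(level[2 * j], level[2 * j + 1])]
                  for j in range(len(level) // 2)]
        # 1x1 conv: mix channels with the weights of node j at this level.
        level = [[sum(w * p for w, p in zip(row, pooled[j])) for row in weights[j]]
                 for j in range(len(pooled))]
    final = [a * b for a, b in zip(level[0], level[1])]
    return sum(w * p for w, p in zip(top, final))

# Toy sizes (assumed): N = 4, M = 2, r_0 = r_1 = 2.
leaf = [[[1.0, 2.0], [3.0, 1.0]] for _ in range(4)]
inner = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]]]
top = [1.0, 2.0]
print(ht_entry(leaf, inner, top, (0, 1, 1, 0)))  # 4.0
```

Each loop iteration mirrors one hidden layer of the deep circuit: a size-2 product pooling followed by a channel-mixing 1x1 conv.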
