On the Expressive Efficiency of Overlapping Architectures of Deep Learning

Or Sharir and Amnon Shashua
The Hebrew University of Jerusalem

Deep Learning Summer School, June 30, 2017
Overlapping vs Non-Overlapping Architectures
Overlapping: receptive field > stride
Non-overlapping: receptive field = stride
The Merits of Non-overlapping Architectures

Non-overlapping architectures have theoretical merit:
- Universality: can approximate any function given sufficient resources
- Optimization: better convergence guarantees than overlapping architectures¹

In practice:
- Non-overlapping architectures are used in some applications, but only in a few!
- Modern architectures use ever smaller receptive fields, including many non-overlapping layers, but never all layers!

Questions:
1. Why are non-overlapping architectures so uncommon?
2. Why is having just a bit of overlapping sufficient for most tasks?

¹ Alon Brutzkus & Amir Globerson. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs. ICML 2017.
Outline

1. Expressive Efficiency
2. Convolutional Arithmetic Circuits
3. Theoretical Analysis of ConvACs with Overlaps
4. Experiments on Standard ConvNets
Expressive Efficiency
Efficiency

Expressive efficiency compares network architectures in terms of their ability to compactly represent functions.

Let:
- H_A – the space of functions compactly representable by network architecture A
- H_B – the space of functions compactly representable by network architecture B

A is efficient w.r.t. B if H_A is a strict superset of H_B.

A is completely efficient w.r.t. B if H_B has zero "volume" inside H_A.

[Venn diagrams illustrating H_B ⊂ H_A for both notions.]
Efficiency – Formal Definition

Network architecture A is exponentially efficient w.r.t. network architecture B if:
(1) Any function realized by B with size¹ r_B can be realized by A with size r_A ∈ O(g(r_B)), where g is polynomial.
(2) There exists a function realized by A with size r_A that requires B to have size r_B ∈ Ω(f(r_A)), where f is super-polynomial.

A is completely efficient w.r.t. B if (2) holds for all of its functions but a set of Lebesgue measure zero (in weight space).

¹ Size depends on the measure of interest, e.g. the number of neurons or the number of parameters.
Example: Efficiency of Depth

Empirical results: deep networks have an advantage.

Theory: deep networks are exponentially efficient w.r.t. shallow ones.
Other Works of Our Group

Depth Efficiency:
- On the Expressive Power of Deep Learning: A Tensor Analysis. N. Cohen, O. Sharir, and A. Shashua. Conference on Learning Theory (COLT) 2016.
- Convolutional Rectifier Networks as Generalized Tensor Decompositions. N. Cohen and A. Shashua. International Conference on Machine Learning (ICML) 2016.

Inductive Bias of Connectivity Patterns:
- Inductive Bias of Deep Convolutional Networks through Pooling Geometry. N. Cohen and A. Shashua. International Conference on Learning Representations (ICLR) 2017.
- Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions. N. Cohen, R. Tamari, and A. Shashua. arXiv preprint 2017.

Inductive Bias of the Widths of Layers:
- Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design. Y. Levine, D. Yakira, N. Cohen, and A. Shashua. arXiv preprint 2017.
Convolutional Arithmetic Circuits
To address the raised questions, we consider a special case of ConvNets: Convolutional Arithmetic Circuits (ConvACs).

ConvACs are equivalent to hierarchical tensor decompositions:
- May be analyzed with various mathematical tools
- The tools may be extended to additional types of ConvNets (e.g. ReLU)¹

Besides their theoretical merits, ConvACs deliver promising results in practice:
- Excel in computationally constrained settings²
- Classify optimally under missing data³

¹ Convolutional Rectifier Networks as Generalized Tensor Decompositions, ICML'16
² Deep SimNets, CVPR'16
³ Tensorial Mixture Models, arXiv'17
Baseline Architecture

[Architecture diagram: input X → representation → hidden layer 0 (1×1 conv → pooling) → … → hidden layer L−1 (1×1 conv → pooling) → dense (output) Y]

- representation: rep(i, d) = f_{θ_d}(x_i), d ∈ [M]
- 1×1 conv: conv_0(j, γ) = ⟨a^{0,γ}, rep(j, :)⟩
- pooling: pool_0(j, γ) = ∏_{j′ ∈ window(j)} conv_0(j′, γ)
- final pooling: pool_{L−1}(γ) = ∏_{j′ covers space} conv_{L−1}(j′, γ)
- output: out(y) = ⟨a^{L,y}, pool_{L−1}(:)⟩

Baseline ConvAC architecture:
- 2D ConvNet: conv → L × (conv → pool) → dense
- 1×1 convolutions, followed by linear activations (σ(z) = z)
- product pooling: P({c_j}) = ∏_j c_j (non-overlapping windows)

Limitation: supports only non-overlapping architectures!
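To make the computation concrete, here is a minimal NumPy sketch of the baseline ConvAC forward pass, simplified to a 1D arrangement of patches with size-2 product-pooling windows and weights shared across locations; all names and sizes are illustrative choices, not the authors' code.

```python
import numpy as np

def convac_forward(rep, conv_weights, out_weights):
    # rep: (N, M) basis values f_{theta_d}(x_i); N must be a power of 2.
    # conv_weights: list of L matrices a^l of shape (r_in, r_out).
    x = rep
    for a in conv_weights:
        x = x @ a                 # 1x1 conv: the same linear map at every location
        x = x[0::2] * x[1::2]     # product pooling over disjoint size-2 windows
    return x[0] @ out_weights     # dense output layer

# Usage: N = 8 patches, M = 4 basis functions, width 16, 10 classes
rng = np.random.default_rng(0)
ws = [rng.normal(size=(4, 16))] + [rng.normal(size=(16, 16)) for _ in range(2)]
scores = convac_forward(rng.normal(size=(8, 4)), ws, rng.normal(size=(16, 10)))
```

Each pooling level halves the number of locations, so L = log₂ N levels reduce the N patches to a single vector, as in the diagram above.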
Generalized Convolutional Arithmetic Circuits Generalizing ConvACs to overlapping arch’s: input X
representa(on
hidden layer L-2
hidden layer 1
GC
GC
R(1)×R(1)
xi
hidden layer L-2
GC
GC (output)
S(1)
rep(i, d) = f✓d (xi )
D(1)
M GC(x(j) , w(c) , b(c) ) =
(1) (RY )
k=1
D(L-2)
2
(c)
bk +
M X
m=1
(c)
(j)
wmk xmk
D(L-1)
D(L)≡Y
!
Generalized Convolution: generalizes 1×1-conv and pooling Inspired by All Convolutional Net (pooling via stride > 1) Non-overlapping case is equivalent to standard ConvACs
Sharir & Shashua (HUJI)
Expressiveness of Overlapping Architectures
30/06/17
12 / 21
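A sketch of a single GC layer in NumPy may help: for every output location and channel, it takes a product, over the R×R window, of affine functions of the input channels. The shapes and loop structure below are my own illustrative choices.

```python
import numpy as np

def gc_layer(x, w, b, stride):
    # x: (H, W, M) input feature map
    # w: (R, R, M, C) weights and b: (R, R, C) biases for C output channels
    H, W, M = x.shape
    R, _, _, C = w.shape
    Ho, Wo = (H - R) // stride + 1, (W - R) // stride + 1
    out = np.empty((Ho, Wo, C))
    for i in range(Ho):
        for j in range(Wo):
            win = x[i * stride:i * stride + R, j * stride:j * stride + R]
            affine = np.einsum('rsm,rsmc->rsc', win, w) + b   # (R, R, C)
            out[i, j] = affine.reshape(-1, C).prod(axis=0)    # product over window
    return out
```

Setting R = 1 with b = 0 recovers a 1×1 linear conv (a single affine factor per channel), while choosing w to pick out one input channel per window position with b = 0 recovers product pooling, matching the claim that GC generalizes both.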
Theoretical Analysis of ConvACs with Overlaps
Overlapping Architectures Are Just As Expressive

Claim: An overlapping architecture can replicate any function realizable by a non-overlapping architecture of similar size and the same sequence of strides.

Conclusion: Overlapping architectures are at least as expressive as non-overlapping ones!

Question: Could it be that overlapping architectures are in fact more expressive?
Degree of Overlapping

[Diagram: the total receptive field and total stride of layers L−1 and L, projected back onto the input layer; the amount by which the total receptive field exceeds the total stride quantifies the degree of overlapping.]
Overlapping Efficiency

Theorem: Almost all functions realizable by an overlapping architecture cannot be replicated by a non-overlapping architecture unless its size is exponential in the overlapping degree.

Common case: alternating B×B conv and 2×2 pooling:

[Architecture diagram: input X → representation (M channels) → Block 0: B×B GC (R: B×B, S: 1×1, D^{(1)} channels) → 2×2 GC (R: 2×2, S: 2×2, D^{(1)} channels) → … → Block L−1 (D^{(L−1)} channels) → output]

Claim: Almost all functions realizable by the above architecture cannot be replicated by a non-overlapping architecture unless its size is at least M^{(2B−1)²/4}.
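To get a feel for how quickly this lower bound grows, the following snippet evaluates M^{(2B−1)²/4} for a modest, arbitrarily chosen M; the numbers are purely illustrative.

```python
# Lower bound on the size a non-overlapping network needs in order to
# replicate almost all functions of the overlapping architecture above.
M = 64  # number of representation channels (illustrative choice)
for B in range(1, 6):
    exponent = (2 * B - 1) ** 2 / 4
    print(f"B={B}: size >= M^{exponent:g} = {M ** exponent:.3g}")
```

Already at B = 2 the bound is M^{2.25} ≈ 1.2·10⁴, and at B = 3 it is M^{6.25} ≈ 1.9·10¹¹, so even a slight overlap forces an enormous non-overlapping network.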
Experiments on Standard ConvNets
ConvNets following the architecture of the last claim were trained on CIFAR-10, varying the number of channels and the size of the receptive field, denoted by B.

[Two plots of train accuracy (%), ranging 40–100%, for B = 1, …, 5: one against the number of channels (16 to 2048), one against the number of parameters (10³ to 1.7·10⁷).]

Conjecture: Increasing the overlapping degree beyond a certain point brings little to no gains in expressive efficiency!
Summary

- Comparing different architectures through expressive efficiency.
- Overlapping architectures are efficient w.r.t. non-overlapping ones:
  - Proven in the case of ConvACs
  - Holds even for architectures of small overlapping degree
- Experiments suggest the analysis holds for standard ConvNets as well.

Conclusions:
- Non-overlapping architectures are uncommon because they lack expressive efficiency
- Conjecture: a small overlapping degree might be all we need
Thank You
Backup Slides
Measure Efficiency via Grid Tensors

Comparing functions directly can be ill-defined. Instead, compare functions via the grid tensors they induce:
- Denote by f(x_1, …, x_N) the function realized by the network.
- f(·) may be studied by discretizing each x_i into one of {v^{(1)}, …, v^{(M)}}:

A(f)_{d_1…d_N} = f(v^{(d_1)}, …, v^{(d_N)}), d_1…d_N ∈ {1, …, M}

- Efficiency: the minimal size required to induce a given grid tensor.
- Universality of ConvACs: any architecture can induce any grid tensor, given a sufficient number of channels. ⇒ Efficiency via grid tensors is well-defined!
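As a concrete (if brute-force) illustration, the grid tensor of a function can be tabulated directly; this is exponential in N, so it is only feasible for tiny N and M, and the templates and function below are made up for the example.

```python
import numpy as np
from itertools import product

def grid_tensor(f, templates, N):
    # A(f)_{d1...dN} = f(v^(d1), ..., v^(dN)), tabulated over all M^N tuples
    M = len(templates)
    A = np.empty((M,) * N)
    for idx in product(range(M), repeat=N):
        A[idx] = f(*(templates[d] for d in idx))
    return A

# Example: N = 2 patches, M = 3 templates, f = inner product of the patches
templates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
A = grid_tensor(lambda x1, x2: float(x1 @ x2), templates, N=2)
print(A)  # a 3x3 grid tensor
```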
Tensorial Function Spaces

We represent instances (images) as N-tuples of vectors (patches):

X = (x_1, …, x_N) ∈ (ℝ^s)^N

Example: a 32×32 RGB image represented via 5×5 patches around all pixels has N = 32 · 32 = 1024 patches of dimension s = 5 · 5 · 3 = 75.

Let f_{θ_1}, …, f_{θ_M} : ℝ^s → ℝ be a basis of functions over patches, e.g. neurons:

f_{θ_d=(w_d, b_d)}(x) = σ(w_d^⊤ x + b_d)

Denote F = span{f_{θ_1}, …, f_{θ_M}}.
Tensorial Function Spaces (cont')

F^{⊗N} – the extension of F from patches to images, i.e. the space of functions over images spanned by:

(x_1, …, x_N) ↦ ∏_{i=1}^{N} f_{θ_{d_i}}(x_i), d_1…d_N ∈ [M]

(formally known as the tensor product of F with itself N times)

A general function h ∈ F^{⊗N} can be written as:

h(x_1, …, x_N) = Σ_{d_1,…,d_N=1}^{M} A_{d_1,…,d_N} ∏_{i=1}^{N} f_{θ_{d_i}}(x_i)

where A ∈ ℝ^{M×⋯×M} is the coefficient tensor of h.
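A direct evaluation of such an h makes the definition concrete, though it visits all M^N coefficients and is only meant as a sanity check; the function name and shapes are illustrative assumptions.

```python
import numpy as np
from itertools import product

def eval_h(xs, A, fs):
    # xs: N patch vectors; A: coefficient tensor, shape (M,)*N; fs: M basis functions
    vals = np.array([[f(x) for f in fs] for x in xs])   # (N, M) basis evaluations
    N, M = vals.shape
    return sum(A[idx] * np.prod([vals[i, d] for i, d in enumerate(idx)])
               for idx in product(range(M), repeat=N))
```

The networks on the following slides compute exactly such functions, but sidestep the exponential sum by decomposing A.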
Tensor Decompositions

Tensor – a multi-dimensional array: A_{d_1…d_N} ∈ ℝ, d_1…d_N ∈ [M]

Suppose we would like to draw an entry from tensor A:

approach              | computation complexity | storage complexity
naïve (lookup table)  | constant               | exponential (in N)
tensor decomposition  | polynomial             | polynomial

Special case N = 2 – low-rank matrix decomposition, A = U V^⊤ with U, V ∈ ℝ^{M×k} and k ≪ M, i.e. A_{ij} = Σ_{r=1}^{k} U_{ir} V_{jr}:

approach      | computation | storage
lookup table  | 1           | M²
decomposition | k           | Mk
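The N = 2 case is easy to verify numerically; the sizes below are arbitrary.

```python
import numpy as np

M, k = 64, 4
rng = np.random.default_rng(0)
U, V = rng.normal(size=(M, k)), rng.normal(size=(M, k))
A = U @ V.T                 # rank-k matrix: M^2 entries if stored explicitly

i, j = 10, 20
entry = U[i] @ V[j]         # k multiply-adds, using only 2*M*k stored numbers
assert np.isclose(entry, A[i, j])
```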
Tensor Decompositions (cont')

For general order N, a tensor decomposition is realized by a convolutional arithmetic circuit over coordinate (d_1…d_N) indicators:

[Network diagram: coordinate indicators → hidden layer 0 (1×1 conv → pooling) → … → hidden layer L−1 (1×1 conv → pooling) → linear output A_{d_1…d_N}]

ind(i, d) = 1 if d = d_i, and 0 otherwise

1-1 correspondence between the type of tensor decomposition and the structure of the network (# of layers, pooling schemes, layer widths, etc.)
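The next sketch illustrates this correspondence on a small circuit (weights shared across locations for brevity; all sizes are illustrative): feeding one-hot indicators reads out an entry of the tensor the circuit represents, and on general inputs the circuit equals the tensorial function with that coefficient tensor, because the circuit is multilinear in its input rows.

```python
import numpy as np
from itertools import product

def convac(rep, conv_weights, out_weights):
    x = rep
    for a in conv_weights:
        x = x @ a                 # 1x1 conv
        x = x[0::2] * x[1::2]     # size-2 product pooling
    return x[0] @ out_weights

N, M, r = 4, 3, 5
rng = np.random.default_rng(3)
ws = [rng.normal(size=(M, r)), rng.normal(size=(r, r))]
wo = rng.normal(size=(r, 1))

# Tabulate the represented tensor by feeding one-hot coordinate indicators
A = np.empty((M,) * N)
for idx in product(range(M), repeat=N):
    A[idx] = convac(np.eye(M)[list(idx)], ws, wo)[0]

# On general inputs, the circuit computes the tensorial function with tensor A
rep = rng.normal(size=(N, M))
direct = sum(A[idx] * np.prod([rep[i, d] for i, d in enumerate(idx)])
             for idx in product(range(M), repeat=N))
assert np.isclose(convac(rep, ws, wo)[0], direct)
```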
Computing Functions by Decomposing Coefficient Tensors

h_1, …, h_Y – functions over images:

h_y(x_1, …, x_N) = Σ_{d_1,…,d_N=1}^{M} A^y_{d_1,…,d_N} ∏_{i=1}^{N} f_{θ_{d_i}}(x_i)

With a tensor decomposition applied to A^y, the functions h_y are computed by a convolutional arithmetic circuit over {f_{θ_d}(x_i)}_{d∈[M], i∈[N]} (the representation):

[Network diagram: input X → representation, rep(i, d) = f_{θ_d}(x_i) → hidden layer 0 (1×1 conv → pooling) → … → hidden layer L−1 (1×1 conv → pooling) → dense (output), computing h_y from the decomposed coefficient tensor]

Again: 1-1 correspondence between decomposition type and network structure.
CP (CANDECOMP/PARAFAC) Decomposition ←→ Shallow Convolutional Arithmetic Circuit

Classic CP decomposition of the coefficient tensors A^y:

A^y = Σ_{γ=1}^{r_0} a^{1,1,y}_γ · a^{0,1,γ} ⊗ a^{0,2,γ} ⊗ ⋯ ⊗ a^{0,N,γ}

(each summand is a rank-1 tensor, so rank(A^y) ≤ r_0)

It corresponds to a shallow network (single hidden layer, global pooling):

[Network diagram: input X → representation, rep(i, d) = f_{θ_d}(x_i) → 1×1 conv, conv(j, γ) = ⟨a^{0,j,γ}, rep(j, :)⟩ → global pooling, pool(γ) = ∏_{j covers space} conv(j, γ) → dense (output), out(y) = ⟨a^{1,1,y}, pool(:)⟩]
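A numerical check of this equivalence for N = 3 (tiny, illustrative sizes; here the conv weights a^{0,j,γ} are location-dependent, as on the slide):

```python
import numpy as np

N, M, r0, Y = 3, 4, 5, 2
rng = np.random.default_rng(1)
a0 = rng.normal(size=(N, r0, M))    # a^{0,j,gamma}
a1 = rng.normal(size=(Y, r0))       # a^{1,1,y}
rep = rng.normal(size=(N, M))       # basis values f_{theta_d}(x_i)

# Shallow ConvAC: 1x1 conv -> global product pooling -> dense output
conv = np.einsum('jgm,jm->jg', a0, rep)    # conv(j, gamma)
pool = conv.prod(axis=0)                   # global product pooling
out = a1 @ pool                            # out(y)

# Direct computation through the CP-decomposed coefficient tensor
A = np.einsum('yg,ga,gb,gc->yabc', a1, a0[0], a0[1], a0[2])
hy = np.einsum('yabc,a,b,c->y', A, rep[0], rep[1], rep[2])
assert np.allclose(out, hy)
```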
Hierarchical Tucker Decomposition ←→ Deep Convolutional Arithmetic Circuit

Hierarchical Tucker decomposition of the coefficient tensors A^y:

φ^{1,j,γ} = Σ_{α=1}^{r_0} a^{1,j,γ}_α · a^{0,2j−1,α} ⊗ a^{0,2j,α}
⋯
φ^{l,j,γ} = Σ_{α=1}^{r_{l−1}} a^{l,j,γ}_α · φ^{l−1,2j−1,α} ⊗ φ^{l−1,2j,α}
⋯
A^y = Σ_{α=1}^{r_{L−1}} a^{L,1,y}_α · φ^{L−1,1,α} ⊗ φ^{L−1,2,α}

It corresponds to a deep network (L = log₂ N hidden layers, size-2 pooling):

[Network diagram: input X → representation, rep(i, d) = f_{θ_d}(x_i) → hidden layer 0: 1×1 conv, conv_0(j, γ) = ⟨a^{0,j,γ}, rep(j, :)⟩, then pooling, pool_0(j, γ) = ∏_{j′ ∈ {2j−1, 2j}} conv_0(j′, γ) → … → hidden layer L−1 (1×1 conv → pooling over j′ ∈ {1, 2}) → dense (output), out(y) = ⟨a^{L,1,y}, pool_{L−1}(:)⟩]
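And the matching check for the deep case with N = 4, i.e. L = 2 (again with tiny, illustrative sizes and location-dependent weights):

```python
import numpy as np

N, M, r0, r1, Y = 4, 3, 4, 5, 2
rng = np.random.default_rng(2)
a0 = rng.normal(size=(N, r0, M))    # a^{0,j,alpha}
a1 = rng.normal(size=(2, r1, r0))   # a^{1,j,gamma}_alpha
aL = rng.normal(size=(Y, r1))       # a^{2,1,y}_alpha
rep = rng.normal(size=(N, M))       # basis values f_{theta_d}(x_i)

# Deep ConvAC: (1x1 conv -> size-2 product pooling) twice, then dense output
conv0 = np.einsum('jam,jm->ja', a0, rep)    # conv_0(j, alpha)
pool0 = conv0[0::2] * conv0[1::2]           # pool_0: shape (2, r0)
conv1 = np.einsum('jga,ja->jg', a1, pool0)  # conv_1(j, gamma)
pool1 = conv1[0] * conv1[1]                 # pool_1: shape (r1,)
out = aL @ pool1                            # out(y)

# Hierarchical Tucker decomposition of the coefficient tensor
phi1 = np.einsum('jga,jax,jay->jgxy', a1, a0[0::2], a0[1::2])  # phi^{1,j,gamma}
A = np.einsum('yg,gab,gcd->yabcd', aL, phi1[0], phi1[1])       # A^y
hy = np.einsum('yabcd,a,b,c,d->y', A, rep[0], rep[1], rep[2], rep[3])
assert np.allclose(out, hy)
```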