Tera-scale deep learning
Quoc V. Le
Stanford University and Google

Joint work with:
Kai Chen, Greg Corrado, Rajat Monga, Andrew Ng

Additional thanks:
Jeff Dean, Matthieu Devin, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang

Samy Bengio, Zhenghao Chen, Tom Dean, Pangwei Koh, Mark Mao, Jiquan Ngiam, Patrick Nguyen, Andrew Saxe, Mark Segal, Jon Shlens, Vincent Vanhoucke, Xiaoyun Wu, Peng Xe, Serena Yeung, Will Zou
Machine Learning successes
- Face recognition
- OCR
- Recommendation systems
- Autonomous cars
- Email classification
- Web page ranking
Feature Extraction → Classifier

Feature extraction (mostly hand-crafted features)

Hand-Crafted Features
Computer vision: SIFT/HOG, SURF, …
Speech recognition: MFCC, Spectrogram, ZCR, …
New feature-designing paradigm:
Unsupervised Feature Learning / Deep Learning, e.g., Reconstruction ICA
Expensive and typically applied to small problems
The Trend of Big Data
Outline
- Reconstruction ICA
- Applications to videos, cancer images
- Ideas for scaling up
- Scaling up: results
Topographic Independent Component Analysis (TICA)

1. Feature computation (two layers):
   p_i(x) = sqrt( (W_1^T x)^2 + … + (W_9^T x)^2 )
   Each pooled unit takes the square root of the summed squares of the filter responses in its pool.

2. Learning:
   minimize_W  sum over examples x and pools i of p_i(x),  subject to W W^T = I
   where x is the input data and W = [W_1; W_2; …; W_10000].
Invariance explained

Example: the same edge appears at two locations.
- Image 1 (edge at Loc1): F1 = 1, F2 = 0 → pooled feature sqrt(1^2 + 0^2) = 1
- Image 2 (edge at Loc2): F1 = 0, F2 = 1 → pooled feature sqrt(0^2 + 1^2) = 1

Same value regardless of the location of the edge.
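A minimal NumPy sketch of this pooled computation; the two response vectors and the sqrt-of-sum-of-squares pooling are exactly the example above.

```python
import numpy as np

# First-layer responses of pooled filters F1, F2 to the same edge
# appearing at two locations, as in the example above.
f_image1 = np.array([1.0, 0.0])   # edge at Loc1: F1 fires
f_image2 = np.array([0.0, 1.0])   # edge at Loc2: F2 fires

def pooled(f):
    """Second layer: square root of the sum of squared responses."""
    return np.sqrt(np.sum(f ** 2))

print(pooled(f_image1), pooled(f_image2))   # 1.0 1.0 -- identical either way
```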
TICA:
   minimize_W  sum_i sum_k sqrt( H_k (W x^(i))^2 + eps ),  subject to W W^T = I

Reconstruction ICA: replace the hard orthonormality constraint with a soft reconstruction cost:
   minimize_W  (lambda/m) sum_i || W^T W x^(i) - x^(i) ||^2  +  sum_i sum_k sqrt( H_k (W x^(i))^2 + eps )

With data whitening, the reconstruction cost plays the role of the orthonormality constraint, so the two problems match.

Equivalence between Sparse Coding, Autoencoders, RBMs and ICA. Build deep architectures by treating the output of one layer as input to another layer.

Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
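To make the objective concrete, here is a minimal NumPy sketch of the RICA cost in its non-topographic form; the weighting lambda, the smoothing eps, and the assumption of whitened input are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def rica_cost(W, X, lam=0.1, eps=1e-6):
    """RICA objective: reconstruction cost + smooth L1 sparsity.

    W: (k, n) filter matrix; X: (n, m) whitened data, one example per column.
    The soft reconstruction term replaces TICA's hard constraint W W^T = I.
    """
    m = X.shape[1]
    Z = W @ X                                      # first-layer filter responses
    recon = W.T @ Z - X                            # W^T W x - x for every example
    reconstruction = lam * np.sum(recon ** 2) / m
    sparsity = np.sum(np.sqrt(Z ** 2 + eps)) / m   # smooth L1 penalty
    return reconstruction + sparsity
```

Because the constraint is now a differentiable penalty, W can be overcomplete and trained with off-the-shelf gradient methods, and layers can be stacked by feeding one layer's features to the next.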
Why RICA?

[Table: algorithms compared on speed, ease of training, and invariant features. Sparse Coding, RBMs/Autoencoders, and TICA each fall short on at least one criterion; Reconstruction ICA scores well on all three.]

Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
Summary of RICA
- Two-layered network
- Reconstruction cost instead of orthogonality constraints
- Learns invariant features
Applications of RICA

Action recognition (example classes): Sit up, Eat, Run, Drive car, Answer phone, Stand up, Get out of car, Kiss, Shake hands

Le, et al., Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR 2011
[Bar charts: classification accuracy on four action-recognition benchmarks. Learned features outperform hand-engineered features (Hessian/SURF, pLSA, HOF, HOG, HOG/HOF, HOG3D, GRBMs, 3DCNN, HMAX, and combined engineered features) on KTH (~94%), Hollywood2 (~53%), UCF (~86%), and YouTube (~76%).]

Le, et al., Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR 2011
Cancer classification

[Bar chart: classification accuracy (axis 84%-92%) on tumor signatures (apoptotic, viable tumor region, necrosis). RICA features outperform hand-engineered features.]

Le, et al., Learning Invariant Features of Tumor Signatures. ISBI 2012
Scaling up deep RICA networks
Scaling up Deep Learning
[Plot: deep learning performance on real data vs. number of learned features.]

It's better to have more features! No matter the algorithm, more features are always more successful.
Coates, et al., An Analysis of Single-Layer Networks in Unsupervised Feature Learning. AISTATS 2011

Most of the learned features are local.
Local receptive field networks

[Diagram: RICA features computed over local patches of the image, with the filters partitioned across Machines #1-#4.]

Le, et al., Tiled Convolutional Neural Networks. NIPS 2010
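A hedged sketch of the partitioning idea, using NumPy and illustrative sizes (18x18 receptive fields, four machines, 32 filters each): each machine applies its own filters only to patches from its quadrant, so no machine needs the full parameter set.

```python
import numpy as np

IMAGE = np.random.rand(200, 200)    # stand-in input image
PATCH = 18                          # receptive-field size (illustrative)

def machine_block(image, row_slice, col_slice, filters):
    """One machine: extract its local patches, apply its own filter bank."""
    block = image[row_slice, col_slice]
    patches = [block[i:i+PATCH, j:j+PATCH].ravel()
               for i in range(0, block.shape[0] - PATCH + 1, PATCH)
               for j in range(0, block.shape[1] - PATCH + 1, PATCH)]
    return np.asarray(patches) @ filters.T      # local feature responses

# Four machines, each owning one quadrant of the image and its own filters.
quadrants = [(slice(0, 100), slice(0, 100)),  (slice(0, 100), slice(100, 200)),
             (slice(100, 200), slice(0, 100)), (slice(100, 200), slice(100, 200))]
outputs = [machine_block(IMAGE, r, c, np.random.rand(32, PATCH * PATCH))
           for r, c in quadrants]
```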
Challenges with 1000s of machines

Asynchronous Parallel SGDs with a parameter server

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
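A toy sketch of the scheme, with threads standing in for machines and a linear model standing in for the network; the point is the protocol (pull parameters, compute a gradient, push an update with no barrier), not the model.

```python
import threading
import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()            # workers read possibly stale weights

    def push(self, grad, lr=0.01):
        with self.lock:
            self.w -= lr * grad             # apply each update as it arrives

def worker(server, data_shard):
    for x, y in data_shard:
        w = server.pull()                   # fetch current parameters
        grad = 2 * (w @ x - y) * x          # squared-error gradient (toy model)
        server.push(grad)                   # send update; no synchronization barrier

rng = np.random.default_rng(0)
server = ParameterServer(dim=10)
shards = [[(rng.standard_normal(10), 1.0) for _ in range(100)] for _ in range(4)]
threads = [threading.Thread(target=worker, args=(server, s)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
```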
Summary of scaling up
- Local connectivity
- Asynchronous SGDs

… And more
- RPC vs. MapReduce
- Prefetching
- Single vs. double precision
- Removing slow machines
- Optimized softmax
- …
Training RICA at scale

Dataset: 10 million 200x200 unlabeled images from YouTube/the web
Trained on 2,000 machines (16,000 cores) for 1 week
1.15 billion parameters
- 100x larger than previously reported
- Small compared to the visual cortex

[Diagram: deep RICA network stacked on the raw image input.]

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
The face neuron

Top stimuli from the test set; optimal stimulus by numerical optimization.

[Histogram: frequency vs. feature value for faces and random distractors.]

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
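The "optimal stimulus by numerical optimization" can be sketched as projected gradient ascent on the input: maximize the neuron's activation while keeping the input on the unit sphere. Here `neuron_grad` is a stand-in for the gradient of the trained unit's activation with respect to its input.

```python
import numpy as np

def optimal_stimulus(neuron_grad, dim, steps=500, lr=0.1):
    """Find an input x maximizing a unit's activation subject to ||x|| = 1."""
    x = np.random.randn(dim)
    x /= np.linalg.norm(x)
    for _ in range(steps):
        x += lr * neuron_grad(x)        # ascend the activation
        x /= np.linalg.norm(x)          # project back onto the unit sphere
    return x

# Toy check: for a purely linear "neuron" f(x) = w^T x, the gradient is w,
# and the optimal stimulus recovered is w / ||w||.
w = np.random.randn(64)
x_star = optimal_stimulus(lambda x: w, dim=64)
```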
Invariance properties

[Plots: the face neuron's feature response under horizontal shifts (0-20 pixels), vertical shifts (0-20 pixels), 3D rotation angle (0° to 90°), and scale factor (0.4x to 1.6x).]

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
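Curves like these can be produced by sweeping a transformation parameter and recording the unit's response; a sketch for the horizontal-shift case, with `network_response` a stand-in for the trained model:

```python
import numpy as np

def shift_image(img, dx, dy):
    """Translate by (dx, dy) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def response_curve(img, network_response, max_shift=20):
    """Unit response at each horizontal offset from 0 to max_shift pixels."""
    return [network_response(shift_image(img, dx, 0))
            for dx in range(max_shift + 1)]
```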
The pedestrian neuron: top stimuli from the test set; optimal stimulus by numerical optimization.

[Histogram: frequency vs. feature value for pedestrians and random distractors.]

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

The cat face neuron: top stimuli from the test set; optimal stimulus by numerical optimization.

[Histogram: frequency vs. feature value for cat faces and random distractors.]

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
ImageNet classification: 22,000 categories, 14,000,000 images
Baselines: hand-engineered features (SIFT, HOG, LBP), spatial pyramids, sparse coding/compression

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

22,000 is a lot of categories…
… smoothhound, smoothhound shark, Mustelus mustelus; American smooth dogfish, Mustelus canis; Florida smoothhound, Mustelus norrisi; whitetip shark, reef whitetip shark, Triaenodon obesus; Atlantic spiny dogfish, Squalus acanthias; Pacific spiny dogfish, Squalus suckleyi; hammerhead, hammerhead shark; smooth hammerhead, Sphyrna zygaena; smalleye hammerhead, Sphyrna tudes; shovelhead, bonnethead, bonnet shark, Sphyrna tiburo; angel shark, angelfish, Squatina squatina, monkfish; electric ray, crampfish, numbfish, torpedo; smalltooth sawfish, Pristis pectinatus; guitarfish; roughtail stingray, Dasyatis centroura; butterfly ray; eagle ray; spotted eagle ray, spotted ray, Aetobatus narinari; cownose ray, cow-nosed ray, Rhinoptera bonasus; manta, manta ray, devilfish; Atlantic manta, Manta birostris; devil ray, Mobula hypostoma; grey skate, gray skate, Raja batis; little skate, Raja erinacea; …
Stingray
Manta ray
Best stimuli

[Image grids: the top activating test-set images for Features 1-13.]

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
ImageNet (22,000 categories) classification accuracy:
- Random guess: 0.005%
- State-of-the-art (Weston, Bengio '11): 9.5%
- Feature learning from raw pixels: 15.8%

ImageNet 2009 (10k categories): best published result 17% (Sanchez & Perronnin '11); our method: 20%. Using only 1,000 categories, our method achieves > 50%.

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
Other results
- We also have great features for:
  - Speech recognition
  - Word-vector embeddings for NLP
Conclusions
• RICA learns invariant features
• A face neuron emerges from totally unlabeled data, given enough training and data
• State-of-the-art performance on:
  – Action recognition
  – Cancer image classification
  – ImageNet
[Recap thumbnails: ImageNet (random guess 0.005%, best published result 9.5%, our method 15.8%), cancer classification, feature visualization, action recognition benchmarks, face neuron.]
References
• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features of Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle