Learning Visual Representations at Scale
April 2014
Vincent Vanhoucke

A quick introduction
 Tech Lead on the Deep Learning Infrastructure team at Google.

http://vincent.vanhoucke.com

Meet the Hammer 


The Hammer
 • ImageNet Classification with Deep Convolutional Neural Networks
 Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Many variations and improvements:
 • Visualizing and Understanding Convolutional Networks
 Matthew D. Zeiler, Rob Fergus
 • OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
 Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun

Thankfully 
 Nails abound...

The Nails
 Image Search

Photo OCR

Image Labeling

Video Annotation

Image Segmentation

Video Recommendation

Object Detection

Fine-grained Classification

Object Tracking

Robot Perception

See also: CNN Features off-the-shelf: an Astounding Baseline for Recognition
 Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson

Example: Fine Grained Classification
 https://sites.google.com/site/fgcomp2013/results

Secret recipe: Alex’s ImageNet + additional training with task-specific data = Tadaaa...

Agenda 
 1. A Better Hammer Factory
 2. Beyond the Hammer

Parallelizing Deep Network Training

Model Parallelism

[Figure label: Machine]

Exchange O(batch x edge nodes) Values

[Figure labels: Machine, Core]

Data Parallelism

[Figure labels: Model Workers; Data Subsets]

Exchange O(weights) Values

[Figure labels: Parameter Server; Model Workers; Data Subsets]

Distributed Asynchronous SGD

[Figure: Model Workers, each training on its own Data Subset, send updates ∆p to the Parameter Server, which applies p’ = p + ∆p and returns p’.]

Large Scale Distributed Deep Networks
 Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng, NIPS 2012
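
The update rule p’ = p + ∆p can be sketched in a few lines. What follows is a minimal single-machine illustration using Python threads, not the distributed system from the cited paper; the `ParameterServer` class, the toy least-squares problem, and all hyperparameters are assumptions made for illustration.

```python
import threading

import numpy as np


class ParameterServer:
    """Holds the shared parameters p and applies updates as they arrive."""

    def __init__(self, dim):
        self.p = np.zeros(dim)
        self._lock = threading.Lock()

    def fetch(self):
        # Workers read a (possibly stale) copy of the parameters.
        with self._lock:
            return self.p.copy()

    def push(self, delta_p):
        # p' = p + delta_p, applied in whatever order updates arrive.
        with self._lock:
            self.p += delta_p


def worker(server, x, y, steps, lr, batch, seed):
    """Runs minibatch SGD on one data subset of a toy least-squares model."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        p = server.fetch()
        idx = rng.integers(0, len(x), size=batch)
        residual = x[idx] @ p - y[idx]
        grad = 2.0 * x[idx].T @ residual / batch
        server.push(-lr * grad)


def train(num_workers=4, steps=300, lr=0.05, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    true_p = np.array([1.0, -2.0, 0.5])  # hypothetical target parameters
    x = rng.normal(size=(4000, 3))
    y = x @ true_p  # noise-free targets
    server = ParameterServer(dim=3)
    shards = zip(np.array_split(x, num_workers), np.array_split(y, num_workers))
    threads = [
        threading.Thread(
            target=worker, args=(server, xs, ys, steps, lr, batch, seed + i + 1)
        )
        for i, (xs, ys) in enumerate(shards)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.p, true_p


if __name__ == "__main__":
    learned, target = train()
    print("learned:", learned, "target:", target)
```

Workers never wait for each other: each one computes its update against whatever copy of p it last fetched, which is the source of both the speed and the staleness of this scheme.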

Problem Solved?
 • Not particularly efficient in terms of speedup per additional core.
 • Approach only works well for low compute density:
   Compute Density = FLOPs / MBITs
 • In a typical Google datacenter:
   • CPUs: low FLOPs compared to GPUs.
   • high MBITs: blazing fast networking.
 • Compare to a typical multi-GPU setup:
   • high FLOPs: 90+% theoretical efficiency.
   • low MBITs: GPUs behind PCI bus.

The Future
 All tradeoffs are constantly changing with technology!

 • NVIDIA Maxwell
 • Intel Knights Landing
 • GPUDirect
 • InfiniBand
 • ...

Can we design for heterogeneous environments, without strong assumptions about compute density?

Disclaimer / Credits
 All actual ideas and results by Alex Krizhevsky.

Arxiv paper forthcoming:
 One Weird Trick for Parallelizing Convolutional Neural Networks

Open Source Implementation forthcoming.

Two Ways to Parallelize

Model Parallelism
 • Workers train different subsets of the model.
 • Parameters (gradients) are local to one machine.
 • Data (activations) get shipped around workers.

Data Parallelism
 • Workers train on different data examples.
 • Parameters (gradients) get shipped around workers.
 • Data (activations) are local to one machine.

Key Idea from Alex


• Use data parallelism when we have a small parameters / activation ratio (hint: convolutions!).

• Use model parallelism when we have a large parameters / activation ratio (fully connected layers).
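
To make the ratio concrete, the sketch below counts parameters and per-example output activations for one convolutional and one fully connected layer. The 7x7 filter with input depth 3 and output depth 96 matches the first layer discussed later in this talk; the 110x110 output map and the 4096-to-4096 fully connected layer are illustrative assumptions, and biases are ignored.

```python
def conv_ratio(patch, in_depth, out_depth, out_height, out_width):
    """Parameters / activations for a convolutional layer (per example)."""
    params = patch * patch * in_depth * out_depth
    activations = out_height * out_width * out_depth
    return params / activations


def fc_ratio(in_units, out_units):
    """Parameters / activations for a fully connected layer (per example)."""
    params = in_units * out_units
    activations = out_units
    return params / activations


# Output map size and fully connected width are assumed, not from the slides.
conv = conv_ratio(patch=7, in_depth=3, out_depth=96, out_height=110, out_width=110)
fc = fc_ratio(in_units=4096, out_units=4096)
print(f"convolution:     {conv:.4f} parameters per activation")
print(f"fully connected: {fc:.1f} parameters per activation")
```

Under these assumptions the convolutional layer has far fewer parameters than activations, while the fully connected layer has thousands of parameters per activation.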

Hybrid Approach for Convolutions


What Happens Here?


?

Data Parallelism to Model Parallelism
 A. Broadcast all-to-all.
 Each convolution sends its data to all fully connected layers:
 • Big synchronization point.
 • LOTS of bursty network traffic.

 B. Broadcast one-to-all.
 Each convolution takes turns sending its data to all fully connected layers.
 • All but one of the data transfers can overlap with computation.
 • Communication / computation ratio can also be tuned by cutting up batches into smaller chunks.
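
The difference between the two schemes can be illustrated with a toy timing model. This is a sketch under strong assumptions rather than a measurement: `t_comm` is the time for one worker to send its batch of activations, `t_comp` is the time for the fully connected layers to process one such batch, all transfers share one link, and in scheme A no computation starts until every transfer has finished. The function names and the example numbers are hypothetical.

```python
def all_to_all_time(workers, t_comm, t_comp):
    """Scheme A: every transfer completes before any computation starts."""
    return workers * t_comm + workers * t_comp


def one_to_all_time(workers, t_comm, t_comp):
    """Scheme B: workers take turns; transfer k+1 overlaps with compute k."""
    # The first transfer cannot overlap with anything.
    total = t_comm
    # Each later transfer runs while the previous batch is being processed.
    total += (workers - 1) * max(t_comm, t_comp)
    # The last batch is processed once its transfer has arrived.
    total += t_comp
    return total


# Hypothetical timings in arbitrary units.
scheme_a = all_to_all_time(workers=4, t_comm=1.0, t_comp=2.0)
scheme_b = one_to_all_time(workers=4, t_comm=1.0, t_comp=2.0)
print(f"all-to-all: {scheme_a}, one-to-all: {scheme_b}")
```

In this model only the first transfer is exposed in scheme B, which is the sense in which all but one of the transfers overlap with computation.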

All-to-one Broadcast Training Procedure


Results on ILSVRC 2012 using K20s


• Almost 4x speedup for 4x the hardware!
• Compare to a 2.2x speedup in 115h on 4 Titans in:
  Multi-GPU Training of ConvNets
  Omry Yadan, Keith Adams, Yaniv Taigman, Marc'Aurelio Ranzato
• > 6x for 8x the hardware. Note that the 8-GPU setup is not on the same PCIe bus!
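
These figures are easier to compare as scaling efficiency, that is, speedup divided by the hardware multiple. The short sketch below does that arithmetic; it assumes each speedup is measured against a single GPU of the same type, and it treats "almost 4x" and "> 6x" as bounds (4.0 and 6.0) rather than exact measurements.

```python
def scaling_efficiency(speedup, hardware_multiple):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / hardware_multiple


# Speedups as quoted above; 4.0 and 6.0 are bounds, not measurements.
quoted = [
    ("4 K20s, at most", 4.0, 4),
    ("8 K20s, at least", 6.0, 8),
    ("4 Titans (Yadan et al.)", 2.2, 4),
]
for label, speedup, hardware in quoted:
    print(f"{label}: {scaling_efficiency(speedup, hardware):.0%}")
```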

Beyond the Hammer
 Separable Convolutions
 Class-independent detection
 Convolutional Nets for Video

Separable Convolutions
 Work by Laurent Sifre.

Drawing inspiration from:
 Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, CVPR 2013
 Laurent Sifre, Stephane Mallat

Key Idea: Convolutional Filters are Redundant
 First Convolution Layer of ImageNet:

Key Idea: Convolutional Filters are Redundant
 Second Convolution Layer of ImageNet:

 • Not a particularly new insight!
 • Well exploited in:
   Predicting Parameters in Deep Learning
   Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato and Nando de Freitas
 • Here is a simpler way to take advantage of this redundancy.
 • Related:
   Network In Network
   Min Lin, Qiang Chen and Shuicheng Yan

Typical Convolution


ID = Input Depth
OD = Output Depth
W = Patch Width
H = Patch Height

Separable Convolution
 DM = Depth Multiplier
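
Since the slide's diagram is not reproduced here, the following NumPy sketch shows one reading of a separable convolution: a per-channel (depthwise) spatial convolution that produces DM outputs per input channel, followed by a 1x1 convolution that mixes the ID x DM channels into OD outputs. This reading is consistent with the parameter counts reported in this talk (W x H x ID x DM + ID x DM x OD), but the function name, the 16x16 input, "valid" padding, and stride 1 are assumptions.

```python
import numpy as np


def separable_conv2d(image, depthwise, pointwise):
    """Depthwise spatial filtering followed by a 1x1 channel-mixing step.

    image:     (height, width, ID)
    depthwise: (H, W, ID, DM)  DM spatial filters per input channel
    pointwise: (ID * DM, OD)   1x1 convolution weights
    """
    height, width, in_depth = image.shape
    patch_h, patch_w, _, multiplier = depthwise.shape
    out_h = height - patch_h + 1
    out_w = width - patch_w + 1

    intermediate = np.zeros((out_h, out_w, in_depth, multiplier))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + patch_h, j:j + patch_w, :]  # (H, W, ID)
            # Each input channel is filtered on its own: no mixing across ID.
            intermediate[i, j] = np.einsum("hwc,hwcm->cm", patch, depthwise)

    # The 1x1 convolution mixes the ID * DM intermediate channels into OD.
    flat = intermediate.reshape(out_h, out_w, in_depth * multiplier)
    return flat @ pointwise


rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16, 3))        # hypothetical small input
depthwise = rng.normal(size=(7, 7, 3, 4))   # W = H = 7, ID = 3, DM = 4
pointwise = rng.normal(size=(3 * 4, 96))    # OD = 96
output = separable_conv2d(image, depthwise, pointwise)
print(output.shape)                         # (10, 10, 96)
print(depthwise.size + pointwise.size)      # 1740
```

The explicit loops favor readability over speed; the point is only that the spatial filtering and the channel mixing are two separate, cheaper steps.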

Separable Convolutions in Numbers (Zeiler & Fergus architecture)


Layer  ID   OD   W/H  DM  conv params  separable params  Difference
1st    3    96   7    4   14112        1740              87%
1st    3    96   7    8   14112        3480              75%
2nd    96   256  5    4   614k         107k              82%
2nd    96   256  5    8   614k         215k              64%
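
The counts reported above can be reproduced from the symbols defined earlier. The formulas below are inferred from the reported numbers rather than stated on the slides: W x H x ID x OD for a typical convolution, and W x H x ID x DM + ID x DM x OD for the separable version, with biases ignored. The function names are illustrative.

```python
def conv_params(patch, in_depth, out_depth):
    """Weights in a typical convolution: W * H * ID * OD."""
    return patch * patch * in_depth * out_depth


def separable_params(patch, in_depth, out_depth, multiplier):
    """Weights in a separable convolution: W * H * ID * DM + ID * DM * OD."""
    depthwise = patch * patch * in_depth * multiplier
    pointwise = in_depth * multiplier * out_depth
    return depthwise + pointwise


layers = [("1st", 3, 96, 7), ("2nd", 96, 256, 5)]
for name, in_depth, out_depth, patch in layers:
    full = conv_params(patch, in_depth, out_depth)
    for multiplier in (4, 8):
        sep = separable_params(patch, in_depth, out_depth, multiplier)
        saving = 100.0 * (1.0 - sep / full)
        print(f"{name} DM={multiplier}: conv={full} separable={sep} saving={saving:.1f}%")
```

The computed savings agree with the reported percentages once truncated to whole percent.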

Separable Convolutions


• Converges in 20% fewer steps on ImageNet.
• Faster inference.
• Identical to slightly better final accuracy.
• Very easy to implement.
• No benefits on smaller tasks (e.g. CIFAR-10).

Scaling Detection Tasks to Many Classes
 Work by Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov.

CNN-based Object Detection
 • Rules the world!
   Deep Neural Networks for Object Detection
   Szegedy, Toshev, Erhan, NIPS’13
   OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
   Sermanet, Eigen, Zhang, Mathieu, Fergus, LeCun
 • Is slow! (I miss AdaBoost cascades...)
 • Is difficult to scale to a large number of classes.

Scaling up Detection
 • Class-independent detector: Find ‘interesting’ things on the image!
 • No sliding window: Use a sparse set of proposals from a CNN
 • Competitive on VOC2007 and ILSVRC2012

Scalable Object Detection using Deep Neural Networks
 D. Erhan, C. Szegedy, A. Toshev, D. Anguelov, accepted at CVPR 2014

Convolutional Architectures for Video
 Work by Andrej Karpathy, Sanketh Shetty, George Toderici, Rahul Sukthankar, Thomas Leung and Li Fei-Fei.

What does the Video ‘Hammer’ Look Like?


• We don’t know (yet).
• But we’re getting an idea: temporal structure, multi-resolution, context, information fusion.
• Huge, interesting computational challenge.

Lots of Data + Transfer Learning


• Train on YouTube, fine-tune on UCF-101

Large-scale Video Classification using Convolutional Neural Networks
 A. Karpathy, S. Shetty, G. Toderici, R. Sukthankar, T. Leung and L. Fei-Fei, accepted at CVPR 2014.

Conclusion
 On running out of cheesy Hammer analogies...

Concluding Half-Bakery
 • Big models + task-specific transfer learning
   Amazingly competitive on many tasks: Image, Video, Speech…
   This is new! Machine Learning used to be very brittle.
   ‘That’s How The Brain Works’ ™
 • Computation is forever the bottleneck
   If I could train 10x bigger models for everything with 90% dropout, I would.
   On many tasks (video!), training speed is fungible with accuracy.
 • We’re hiring!

Google at ICLR
 • Deep Convolutional Ranking for Multilabel Image Annotation
   Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe
 • Zero-Shot Learning by Convex Combination of Semantic Embeddings
   Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean
 • Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks
   Ian J. Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, Vinay Shet
 • Unit Tests for Stochastic Optimization
   Tom Schaul, Ioannis Antonoglou, David Silver
 • Learning Factored Representations in a Deep Mixture of Experts
   David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever
 • Intriguing Properties of Neural Networks
   Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus

Thank You!

[email protected]
