Literature Review: ‘Learning What and Where to Draw’
Arthur Nishimoto
April 4, 2017

1 Main Innovations and Contribution

In their paper Learning What and Where to Draw, the authors build on the idea of using Generative Adversarial Networks (GANs) to synthesize realistic images from a text description or label. The authors argue that by conditioning on finer-grained details of the image, such as specific body parts, actions, and their locations in the image, a GAN can create more realistic and complex scenes. To accomplish this, the authors present a new model, the Generative Adversarial What-Where Network (GAWWN), whose primary goal is to generate images from text input describing what should appear in the image and where those elements should be located. The major contributions of this work are generating more realistic and higher-resolution images from textual input, a framework for controlling specific locations and features in the image, and a new dataset of text descriptions for human pose images for use with GAWWNs [2].

In the conditional GAN, the discriminator classifies an input as valid only if the image looks realistic and matches the input context. A convolutional and recurrent text encoder is used to learn a compatibility function between the images and their text descriptions. Because each image has multiple captions, the text embeddings of 4 captions are averaged. The GAWWN comes in two variants: a bounding-box-conditional model and a keypoint-conditional model. The bounding-box model takes input noise and the text embedding from the encoder and forms a feature map that is spatially warped to fit the normalized bounding-box coordinates. Convolution and pooling operations are then performed to reduce the spatial dimension back to 1 x 1 [2]. The keypoint-conditional text-to-image model instead processes the keypoint locations into a feature map in which each channel corresponds to a body part. After a stride-2 convolution is applied, the resulting vector carries information on both the content and the part locations.
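To make the bounding-box conditioning more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a text embedding is already available as a plain list of floats, averages several caption embeddings into one vector, and then replicates that vector over an m x m grid, keeping its values only in cells whose centres fall inside a normalized box (x0, y0, x1, y1) and zeroing everything outside. Since the replicated map is constant across space, filling the box this way stands in for warping it into the box. The function names, the cell-centre rule, and the grid size are illustrative assumptions, and the learned convolution and pooling layers are omitted.

```python
from typing import List, Sequence, Tuple

# Feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def average_caption_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average several caption embeddings element-wise into one text vector."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[k] for e in embeddings) / n for k in range(dim)]


def bbox_feature_map(text_vec: Sequence[float],
                     bbox: Tuple[float, float, float, float],
                     m: int) -> FeatureMap:
    """Replicate text_vec over an m x m grid, keeping it only inside the box.

    bbox = (x0, y0, x1, y1) in normalized [0, 1] image coordinates.
    Assumption: a cell counts as inside the box if its centre lies within it;
    every entry outside the box is set to zero.
    """
    x0, y0, x1, y1 = bbox
    fmap: FeatureMap = []
    for value in text_vec:  # one channel per embedding dimension
        channel = []
        for row in range(m):
            cy = (row + 0.5) / m  # normalized y of this cell's centre
            channel.append([
                value if (x0 <= (col + 0.5) / m <= x1 and y0 <= cy <= y1) else 0.0
                for col in range(m)
            ])
        fmap.append(channel)
    return fmap
```

For example, averaging four caption vectors and placing the result in the central quarter of a 4 x 4 grid leaves a 2 x 2 block of non-zero cells in each channel, one channel per embedding dimension.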

2 Datasets

The main datasets used are Caltech-UCSD Birds (CUB) and MPII Human Pose (MPH). CUB contains 11,788 images of birds across 200 species. The CUB images are augmented with 10 single-sentence text descriptions per image, collected in the authors' previous work on the dataset, which trained a text encoder to improve image recognition and classification [3]. Each image also comes with the location of the bird as a bounding box, the x, y coordinates of 15 bird parts, and whether each of the 15 parts is visible in the image. The MPH dataset contains around 25,000 images of over 40,000 people. Each image is annotated with the x, y positions of up to 16 body joints, and the images cover 410 human activities [1]. The authors used Amazon’s Mechanical Turk to crowdsource 3 single-sentence descriptions per image of the person and the activity being performed. Only images containing a single person were used, leaving around 19,000 of the original images.
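The part annotations described above (x, y coordinates plus a visibility flag for each of the 15 CUB parts) are the kind of input the keypoint-conditional model consumes. A minimal sketch of turning them into one spatial channel per part might look like the following. It assumes coordinates normalized to [0, 1], marks each visible part with a 1 at its grid cell in its own channel, and leaves the channel of an invisible part all zeros; this encoding and the hypothetical `keypoint_feature_map` helper are assumptions for illustration, not details taken from [2].

```python
from typing import List, Sequence, Tuple

# One annotation per part: (x, y, visible), with x and y normalized to [0, 1].
PartAnnotation = Tuple[float, float, bool]


def keypoint_feature_map(parts: Sequence[PartAnnotation], m: int) -> List[List[List[int]]]:
    """Encode K part annotations as a K x m x m binary map, one channel per part.

    Assumption: a visible part sets a single cell to 1 in its channel;
    an invisible part leaves its channel entirely zero.
    """
    fmap = []
    for x, y, visible in parts:
        channel = [[0] * m for _ in range(m)]
        if visible:
            # Clamp so that a coordinate of exactly 1.0 still lands in the last cell.
            col = min(int(x * m), m - 1)
            row = min(int(y * m), m - 1)
            channel[row][col] = 1
        fmap.append(channel)
    return fmap
```

For a CUB image this yields a 15-channel map; applied to an MPII image, the same function would give up to 16 channels, one per annotated body joint.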

3 Results Evaluation

The overall results of the model are impressive, particularly in the comparisons with the authors' previous work shown in Figure 8 [2]. The generated humans are noticeably blurrier, but they are generally recognizable from the given text description.

4 Contribution to Project 3

The overall concept of using GANs to generate new images from feature parameters is similar to what I want to accomplish in Project 3. The ability to positionally tag a feature in an image and feed that into the model would be useful for any image-based GAN, enabling new imagery of almost anything, not just birds or humans in poses, provided suitably annotated training data is available.

References

[1] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele. 2D human pose estimation: New benchmark and state of the art analysis. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 3686–3693, June 2014. doi: 10.1109/CVPR.2014.471.

[2] Scott E. Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, and Honglak Lee. Learning what and where to draw. CoRR, abs/1610.02454, 2016. URL http://arxiv.org/abs/1610.02454.

[3] Scott E. Reed, Zeynep Akata, Bernt Schiele, and Honglak Lee. Learning deep representations of fine-grained visual descriptions. CoRR, abs/1605.05395, 2016. URL http://arxiv.org/abs/1605.05395.
