Structured Prediction
Daniel DeTone, Ye Liu
EECS, University of Michigan
September 16th, 2014

Outline
• What is Structured Prediction
• Simple Example
• Efficient Inference in Fully Connected CRFs
• Learning Message-Passing Inference Machines for Structured Prediction

What is Structured Prediction
• Model - relates image to high-level info
  • Observation variables (usually describe the image)
  • Output variables (high-level labels)
• Parameters of model
  • Interaction of image and label
  • Use annotated data to learn a good mapping
    • AKA - train the model

Example - Image Segmentation

Example - Image Segmentation
• Observation x (image)
• Binary label yi for each pixel
  • Foreground (yi = 1), background (yi = 0)
• Two interaction terms
  • Does pixel i look like the foreground? -> gi(yi, x)
    • Get local features φi(x) around pixel i
    • e.g. "if pixels around i are green, more likely to be background"
  • Is this label locally consistent? -> gi,j(yi, yj)
    • Pairwise, 4-connected neighbors

[Figure: grid graphical model connecting observation features φi(x) to output labels yi]

Example - Image Segmentation
• Maximization problem over all n labels:
  y* = argmax_y Σi gi(yi, x) + Σ(i,j) gi,j(yi, yj)
[Figure: grid graphical model with features φi(x) and labels yi]
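To make the maximization concrete, below is a minimal brute-force sketch on a 2x2 "image" with binary labels. The unary scores g_i and the smoothness bonus g_ij are invented for illustration, not taken from the slides.

```python
import itertools

H, W = 2, 2
pixels = [(i, j) for i in range(H) for j in range(W)]
# Hypothetical unary scores g_i(y_i, x): row 0 "looks like" foreground.
g = {p: (0.2, 0.8) if p[0] == 0 else (0.7, 0.3) for p in pixels}

def score(labeling):
    """Total score: unary terms plus a 4-connected smoothness bonus."""
    y = dict(zip(pixels, labeling))
    s = sum(g[p][y[p]] for p in pixels)
    for (i, j) in pixels:
        for q in [(i, j + 1), (i + 1, j)]:   # right and down neighbors
            if q in y and y[(i, j)] == y[q]:
                s += 0.2                      # g_ij rewards agreement
    return s

# Enumerate all 2^n labelings -- exactly the blow-up that makes exact
# inference intractable on real images.
best = max(itertools.product([0, 1], repeat=len(pixels)), key=score)
print(best)  # (1, 1, 0, 0): foreground on the top row, background below
```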

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
• Summary:
  1. Exact inference on fully connected CRFs (FC-CRFs) is intractable, so it must be approximated.
  2. A mean field approximation is used together with a computational trick to make inference efficient.
  3. Incorporating long-range dependencies increases prediction accuracy.

Correlated Features
• Naïve Bayes assumes the features Xi1, ..., Xik are independent given the label Li - correlated features violate this assumption
[Figure: Naïve Bayes graphical model with label Li and features Xi1 ... Xik. Image source: coursera.com PGM lecture by Daphne Koller]

Conditional Random Fields
• Instead of modeling the joint P(I, X), a CRF models the conditional P(X|I)
• CRF representation (Gibbs distribution):
  P(X|I) = (1/Z(I)) exp(-E(X|I)), with energy E(X|I) = Σi ψu(xi) + Σi<j ψp(xi, xj)

CRF Learning
• Train the graphical model by maximum likelihood estimation:
  θ* = argmaxθ Πn P(X(n)|I(n); θ)   or, equivalently,   θ* = argmaxθ Σn log P(X(n)|I(n); θ)

Fully Connected CRF (FC-CRF)
[Figure: locally connected CRF (edges only between neighboring pixels) vs fully connected CRF (edges between all pixel pairs)]

Exact Inference on CRFs is HARD
• How hard is it and why?
  • The partition function Z(I) sums over L^N instances, where L is the size of the labeling space and N is the number of pixels
  • e.g. binary labels on a 100x100 image already give 2^10000 terms
• How to solve it? Use approximation
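A quick back-of-the-envelope check of that count (illustrative only):

```python
# The number of labelings the partition function sums over grows as
# L ** N; even a tiny image is hopeless for brute force.
L, N = 2, 100 * 100     # binary labels, 100x100 image
print(L ** N)           # a 3011-digit number
```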

Mean Field Approximation
• Use a tractable, fully factorized distribution Q(X) = Πi Qi(Xi) to approximate the original distribution P(X|I)
• Becomes an optimization problem: minimize the KL divergence D(Q ‖ P) over all factorized Q

Efficient Message Passing
• Key point of this paper: how to update Q efficiently, where each mean field update takes the form
  Qi(xi = l) ∝ exp{ -ψu(xi) - Σl' μ(l, l') Σm w(m) Σj≠i k(m)(fi, fj) Qj(l') }
• Bottleneck: the inner sum runs over all other pixels j, so a naïve implementation is quadratic in the number of pixels N

Efficient Message Passing Cont'd
• Overcome the bottleneck by expressing message passing as a convolution in feature space:
  Σj≠i k(m)(fi, fj) Qj(l) = [G(m) ⊗ Q(l)](fi) - Qi(l)
  where the convolution can be done in linear time using the permutohedral lattice data structure
• Brings the complexity down from quadratic to linear!
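To illustrate the update, here is a minimal NumPy/SciPy sketch that runs the mean field iteration using only the 2-D smoothness kernel, so the Gaussian message-passing step is an ordinary image blur (already linear time). The paper's full method additionally filters the 5-D appearance kernel with the permutohedral lattice; all weights and bandwidths below are placeholder values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mean_field_step(Q, unary, w=3.0, sigma=3.0):
    """One mean field update; Q and unary have shape (H, W, L)."""
    L = Q.shape[-1]
    # Message passing: Gaussian-filter each label's marginals in position
    # space (smoothness kernel only); subtract Q_i to exclude j == i.
    msg = np.stack([gaussian_filter(Q[..., l], sigma=sigma) - Q[..., l]
                    for l in range(L)], axis=-1)
    # Potts compatibility: label l is penalized by the mass on labels != l.
    pairwise = w * (msg.sum(axis=-1, keepdims=True) - msg)
    # Local normalization: softmax over labels of the negative energy.
    logits = -unary - pairwise
    logits -= logits.max(axis=-1, keepdims=True)
    Q_new = np.exp(logits)
    return Q_new / Q_new.sum(axis=-1, keepdims=True)

# Usage: initialize Q from the unaries and iterate a few steps.
unary = np.random.rand(64, 64, 3)                 # toy unary costs
Q = np.exp(-unary); Q /= Q.sum(axis=-1, keepdims=True)
for _ in range(10):
    Q = mean_field_step(Q, unary)
labels = Q.argmax(axis=-1)
```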

Learning
• Piecewise training
  • Train the unary classifiers independently
• Pairwise potential combines two Gaussian kernels:
  • Appearance kernel - nearby pixels with similar color likely share a label
  • Smoothness kernel - removes small isolated regions
  k(fi, fj) = w(1) exp(-|pi - pj|²/2θα² - |Ii - Ij|²/2θβ²) + w(2) exp(-|pi - pj|²/2θγ²)
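A direct transcription of the kernel above; the weights w(1), w(2) and bandwidths θα, θβ, θγ below are placeholder values, not the learned ones.

```python
import numpy as np

def edge_kernel(p_i, p_j, I_i, I_j,
                w1=10.0, w2=3.0, theta_a=60.0, theta_b=20.0, theta_g=3.0):
    """k(f_i, f_j) for two pixels: appearance term + smoothness term."""
    d_pos = np.sum((p_i - p_j) ** 2)     # squared position distance
    d_col = np.sum((I_i - I_j) ** 2)     # squared color distance
    appearance = w1 * np.exp(-d_pos / (2 * theta_a ** 2)
                             - d_col / (2 * theta_b ** 2))
    smoothness = w2 * np.exp(-d_pos / (2 * theta_g ** 2))
    return appearance + smoothness

# Nearby, similarly colored pixels get a large kernel value:
k = edge_kernel(np.array([10., 10.]), np.array([12., 11.]),
                np.array([200., 30., 30.]), np.array([195., 35., 28.]))
```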

Experimental Results
• Convergence

Experimental Results Cont'd
• Accuracy

Experimental Results Cont'd
• Long-Range Dependency

Recap
• CRFs are a powerful technique for structured prediction.
• Fully connected CRFs can be expensive to learn, so approximation is needed.
• Convolution in feature space brings the computational complexity down from quadratic to linear.

Learning Message-Passing Inference Machines for Structured Prediction

Message Passing
• Canonical method for approximate inference
  • xf: features of variables connected to f
  • yf: labels of variables connected to f
  • Nv: factors connected to v
  • Nv-f: factors connected to v except f
  • Nf: variables connected to f
  • Nf-v: variables connected to f except v
  • mvf: message from variable v to factor f
  • mfv: message from factor f to variable v
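To ground the notation, here is a self-contained sum-product example on a chain MRF, the special case where every factor connects two adjacent variables. The potentials are random placeholders.

```python
import numpy as np

n, L = 5, 3                                  # 5 variables, 3 labels
rng = np.random.default_rng(0)
psi_unary = rng.random((n, L))               # psi_i(x_i)
psi_pair = rng.random((L, L))                # psi_{i,i+1}(x_i, x_{i+1})

# Forward messages: fwd[i] = message arriving at variable i from the left.
fwd = [np.ones(L)]
for i in range(n - 1):
    m = (psi_unary[i] * fwd[-1]) @ psi_pair  # sum over x_i
    fwd.append(m / m.sum())
# Backward messages, symmetric.
bwd = [np.ones(L)]
for i in range(n - 1, 0, -1):
    m = psi_pair @ (psi_unary[i] * bwd[-1])
    bwd.append(m / m.sum())
bwd = bwd[::-1]

# Marginal of variable i: its unary times incoming messages from both sides.
marginals = np.array([psi_unary[i] * fwd[i] * bwd[i] for i in range(n)])
marginals /= marginals.sum(axis=1, keepdims=True)
print(marginals)
```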

How to Unroll a Graph into a Sequential Predictor
• mvf: marginal of variable v when factor f (and its influence) is removed
  • mvf is also known as a cavity marginal
• The outgoing message mvf classifies v using the incoming cavity marginals mv'f'
  • mv'f': cavity marginals reaching v through factors f' ≠ f

Unroll a Graph into a Sequential Predictor
• Learn parameters of the inference procedure, not the graphical model
  • Specifically, message-passing inference
• Similar to a deep neural network
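A hedged sketch of what "unrolling" means: each message-passing iteration becomes one layer of a feed-forward predictor stack, where iteration t classifies every node from its local features plus the iteration t-1 marginals of its neighbors. The graph, features, and logistic-regression predictors below are illustrative stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def unrolled_inference(predictors, feats, neighbors):
    """feats: (n_nodes, d) local features; neighbors: list of index lists."""
    n = feats.shape[0]
    marg = np.full(n, 0.5)                    # uninformative start
    for clf in predictors:                    # one "layer" per iteration
        nbr = np.array([marg[nb].mean() for nb in neighbors])
        X = np.column_stack([feats, nbr])
        marg = clf.predict_proba(X)[:, 1]     # synchronous update
    return marg

# Toy usage: 3 iterations of predictors trained on random data.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 2))
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]   # a chain
Xr = rng.normal(size=(50, 3)); yr = (Xr.sum(1) > 0).astype(int)
predictors = [LogisticRegression().fit(Xr, yr) for _ in range(3)]
print(unrolled_inference(predictors, feats, neighbors))
```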

Synchronous vs Asynchronous
• Which is this?

Training the Inference Model
• "Nice" properties for training
  • Assume the same predictor at each node - small number of parameters
  • We know the hidden variables' target labels
• "Not nice" properties
  • Non-i.i.d. - previous classifications are inputs for future classifications
  • Must hold previous classifications constant to get a marginal - similar to a neural net

Learning Synchronous Message-Passing - Local Training
• At iteration n, use the neighbors' previously learned predictors (from iteration n-1)
• Set the ideal marginals as the target
• Trick: average all N iterations' marginals as the final output
  • Error is no worse than the average true loss of the learned predictors h1:N

Learning Asynchronous Message-Passing - Local Training
• Messages are sent immediately
• Many more messages than synchronous -- impractical on large graphs
• Use DAgger (Dataset Aggregation)
  • Creates a new, aggregated dataset of inputs on each iteration
  • A fancy way of doing parameter sharing
• Use the same trick: average all iterations' marginals
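A toy DAgger loop in the spirit described above: run inference with the current predictor, record the inputs it actually encounters, aggregate them with all earlier passes, and retrain. The chain data and logistic-regression predictor are synthetic stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_chains, length = 200, 10
y = (rng.random((n_chains, length)) < 0.5).astype(int)
x = y + rng.normal(0, 0.8, y.shape)          # noisy local observation

def rollout(clf, x_chain):
    """Classify a chain left to right, feeding back predicted marginals."""
    prev, states = 0.5, []
    for t in range(length):
        feat = [x_chain[t], prev]            # input depends on the policy
        states.append(feat)
        prev = clf.predict_proba([feat])[0, 1] if clf else 0.5
    return np.array(states)

clf, X_agg, y_agg = None, [], []
for it in range(5):                          # DAgger iterations
    for c in range(n_chains):
        states = rollout(clf, x[c])          # states under current policy
        X_agg.append(states); y_agg.append(y[c])
    # Retrain on the union of all iterations' datasets.
    clf = LogisticRegression().fit(np.vstack(X_agg), np.concatenate(y_agg))
```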

Global Training
• If the predictors are differentiable, use back-propagation
• Use local training as the initialization point
  • Helps with the non-convexity of the network

Testing - 3D Point Cloud Classification
• Five labels
  • Building, ground, poles/tree trunks, vegetation, wires
• Creating the graphical model
  • Five nearest points (local consistency)
  • Two k-means clusterings over points (global consistency)
• Features describe local geometry: linear, planar, scattered, orientation

Testing
• BPIM - Asynchronous BP Inference Machine
  • BPIM-D: DAgger
  • BPIM-B: Backpropagation
  • BPIM-DB: DAgger + Backpropagation
• MFIM - Mean Field Inference Machine
• CRF - Baseline
• M3N-F - Max-Margin Markov Network (comparison method)

Results - 3D Point Cloud Classification
• Results aren't overly impressive
  • Performs as well as the state of the art
  • The baseline CRF performs quite well!
  • The proposed method gives a ~1% accuracy increase

Advantages
• We can use a broader class of learning techniques
• More control over the accuracy vs speed trade-off
  • e.g. compute new features or change the structure based on the results of partial inference
• No longer limited to the potentials of the original graphical model
• Can leverage local information for hidden nodes
• Rigorous guarantees on the performance of inference
  • Error is no worse than the average true loss of the learned predictors h1:N

Disadvantages
• Complex
  • Need to understand the relationship between local and global training
  • Poor local training = poor global training
• Performance?
• Other disadvantages?

End

Motivations of Paper
• Learning models with approximate inference is not well understood
  • Reduces the expressivity of the graphical model
  • Abandons many theoretical guarantees
• Some good results come from tying approximate inference to the graphical model at test time
