Introduction to Pattern Recognition
Jason Corso, SUNY at Buffalo

15 January 2013


Examples of Pattern Recognition in the Real World


Hand-Written Digit Recognition

Computational Finance and the Stock Market

Bioinformatics and Gene Expression Analysis

Biometrics

It is also a Novel by William Gibson!

Do let me know if you want to borrow it!


Pattern Recognition By Example

Example: Sorting Fish

[Images: a salmon and a sea bass.]

Pattern Recognition System Requirements
- Set up a camera to watch the fish coming through on the conveyor belt.
- Classify each fish as salmon or sea bass.
- Prefer to mistake sea bass for salmon.

FIGURE 1.1. The objects to be classified are first sensed by a transducer (camera), whose signals are preprocessed. Next the features are extracted and finally the classification is emitted, here either "salmon" or "sea bass." Although the information flow is often chosen to be from the source to the classifier, some systems employ information flow in which earlier levels of processing can be altered based on the tentative or preliminary categorization.


A Note On Preprocessing
Inevitably, preprocessing will be necessary. Preprocessing is the act of modifying the input data to simplify subsequent operations without losing relevant information.
Examples of preprocessing (for varying types of data):
- Noise removal.
- Element segmentation: spatial or temporal.
- Alignment or registration of the query to a canonical frame.
- Fixed transformations of the data: change of color space (image specific), wavelet decomposition.
- Transformation from a denumerable representation (e.g., text) to a 1-of-B vector space.
Preprocessing is a key part of our Pattern Recognition toolbox, but we will talk about it directly very little in this course. A minimal sketch of two such steps follows.
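To make the first two items concrete, here is a hedged sketch of noise removal and a fixed intensity transformation on a synthetic "camera frame." The image contents, filter size, and normalization range are illustrative assumptions, not course code.

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
image = rng.normal(loc=0.5, scale=0.05, size=(64, 64))  # stand-in for a sensed frame
image[20:40, 20:40] += 0.3                              # a bright "fish" region
image += (rng.random(image.shape) < 0.02) * 1.0         # impulsive (salt) noise

# Noise removal: a median filter suppresses impulsive noise without blurring edges much.
denoised = median_filter(image, size=3)

# Fixed transformation: rescale intensities to [0, 1] so later features are comparable.
lo, hi = denoised.min(), denoised.max()
normalized = (denoised - lo) / (hi - lo)
print(normalized.min(), normalized.max())
```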


Patterns and Models: The Ideal State Space
[Figure: the space of all fish, with the salmon and sea bass populations shown as distinct regions.]
Clear that the populations of salmon and sea bass are indeed distinct. The space of all fish is quite large. Each dimension is defined by some property of the fish, most of which we cannot even measure with the camera.


Patterns and Models: The Real State Space
[Figure: the space of all fish given a set of features, with overlapping salmon and sea bass regions.]
When we choose a set of possible features, we are projecting this very high dimensional space down into a lower dimensional space.


Patterns and Models: Features as Marginals
[Figure: the space of all fish given a set of features, with a marginal (a single feature) projected along one axis.]
And indeed, we can think of each individual feature as a single marginal distribution over the space; in other words, a projection down onto a single dimension. A minimal sketch of this idea follows.
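A hedged illustration of a feature as a marginal: project synthetic 2D "fish" samples onto one axis and histogram the result. The distribution parameters are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic (lightness, width) samples for the two classes; parameters are made up.
salmon  = rng.multivariate_normal([4.0, 16.0], [[1.0, 0.3], [0.3, 1.0]], size=500)
seabass = rng.multivariate_normal([7.0, 19.0], [[1.0, 0.3], [0.3, 1.0]], size=500)

# Marginalizing over width (column 1) leaves the lightness feature (column 0).
bins = np.linspace(0, 12, 25)
h_salmon, _  = np.histogram(salmon[:, 0],  bins=bins)
h_seabass, _ = np.histogram(seabass[:, 0], bins=bins)
print(h_salmon, h_seabass)  # two 1D marginal distributions over the same bins
```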


Patterns and Models: Models
[Figure: the space of all fish with approximate salmon and sea bass models overlaid.]
We build a model of each phenomenon we want to classify, which is an approximate representation given the features we've selected.


Patterns and Models: Models
"The overarching goal and approach in pattern classification is to hypothesize the class of these models, process the sensed data to eliminate noise (not due to the models), and for any sensed pattern choose the model that corresponds best." (DHS)


Modeling for the Fish Example

Selecting Feature(s) for the Fish
Suppose an expert at the fish packing plant tells us that length is the best feature. We cautiously trust this expert. We gather a few examples from our installation to analyze the length feature; these examples are our training set, and we want to be sure to gather a representative population of them. We analyze the length feature by building histograms: marginal distributions.
[Figure: histograms of the length feature for salmon and sea bass, with the threshold l* marked.]
FIGURE 1.2. Histograms for the length feature for the two categories. No single threshold value of the length will serve to unambiguously discriminate between the two categories; using length alone, we will have some errors. The value marked l* will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.


But this is a disappointing result. The sea bass length does exceed the salmon length on average, but clearly not always. A minimal threshold-selection sketch follows.
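A hedged sketch of how one might estimate the minimum-error threshold l* from training data, in the spirit of Fig. 1.2. The length distributions below are invented stand-ins, not the book's data.

```python
import numpy as np

rng = np.random.default_rng(2)
salmon_len  = rng.normal(10.0, 2.5, size=1000)   # salmon tend to be shorter...
seabass_len = rng.normal(14.0, 3.0, size=1000)   # ...but the overlap causes errors

def empirical_error(threshold):
    # Rule: predict sea bass if length > threshold, salmon otherwise.
    err_salmon  = np.mean(salmon_len  > threshold)   # salmon called sea bass
    err_seabass = np.mean(seabass_len <= threshold)  # sea bass called salmon
    return 0.5 * (err_salmon + err_seabass)          # assuming equal priors

candidates = np.linspace(5, 20, 301)
errors = [empirical_error(t) for t in candidates]
l_star = candidates[int(np.argmin(errors))]
print(f"l* ~ {l_star:.2f}, error ~ {min(errors):.3f}")  # nonzero: length alone is weak
```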


Selecting Feature(s) for the Fish: The Lightness Feature
Try another feature after inspecting the data: lightness.
[Figure: histograms of the lightness feature for salmon and sea bass, with the threshold x* marked.]
FIGURE 1.3. Histograms for the lightness feature for the two categories. No single threshold value x* (decision boundary) will serve to unambiguously discriminate between the two categories; using lightness alone, we will have some errors. The value x* marked will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

This feature exhibits a much better separation between the two classes.


Feature Combination
Seldom will one feature be enough in practice. In the fish example, perhaps lightness, x1, and width, x2, will jointly do better than either alone. This is an example of a 2D feature space (see the sketch after the figure):

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \tag{1} \]

[Figure: scatter plot of lightness vs. width for salmon and sea bass.]
FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier.
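As a minimal illustration of Eq. (1), the sketch below stacks the two measurements into a feature vector and applies a linear rule. The weights, bias, and sample point are hand-picked assumptions for illustration; they are not learned and not from the course.

```python
import numpy as np

def classify(x, w=np.array([1.0, -0.5]), b=-2.0):
    """Return 'sea bass' if w^T x + b > 0, else 'salmon' (illustrative rule)."""
    return "sea bass" if float(w @ x) + b > 0 else "salmon"

x = np.array([6.2, 15.0])   # x = [x1 (lightness), x2 (width)]^T, as in Eq. (1)
print(classify(x))
```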


Key Ideas in Pattern Recognition

Curse Of Dimensionality


The two features obviously separate the classes much better than one alone. This suggests adding a third feature, and a fourth, and so on. Key questions (a small empirical sketch follows):
- How many features are required?
- Is there a point where we have too many features?
- How do we know beforehand which features will work best?
- What happens when there is feature redundancy/correlation?
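One face of the curse of dimensionality can be seen in a tiny experiment: with a fixed number of samples, nearest-neighbor distances grow as the dimension increases, so each added feature makes the training set effectively sparser. This is a hedged sketch on uniform synthetic data, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
for d in (1, 2, 5, 10, 50):
    X = rng.random((n, d))                    # n samples in the unit hypercube
    diffs = X[:, None, :] - X[None, :, :]     # all pairwise differences
    dist = np.sqrt((diffs ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)            # ignore self-distances
    nn = dist.min(axis=1)                     # nearest-neighbor distance per point
    print(f"d={d:3d}  mean NN distance={nn.mean():.3f}")
```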


Decision Boundaries and Generalization

Decision Boundary

The decision boundary is the subspace in which classification among multiple possible outcomes is equal; off the decision boundary, all classification is unambiguous.

[Figures 1.3 and 1.4, side by side: the lightness histograms with threshold x*, and the lightness-width scatter plot with a linear decision boundary.]
FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.


Bias-Variance Dilemma
Depending on the available features, the complexity of the problem, and the classifier, the decision boundaries will also vary in complexity.
[Figures 1.4-1.6: lightness-width scatter plots for salmon and sea bass with a linear, an overly complex, and an intermediate decision boundary; a novel test point is marked ?.]
FIGURE 1.5. Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples, it would lead to poor performance on future patterns. The novel test point marked ? is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be classified as a sea bass. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
FIGURE 1.6. The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of classifier, thereby giving the highest accuracy on new patterns. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
Simple decision boundaries (e.g., linear) seem to miss some obvious trends in the data: bias. Complex decision boundaries seem to lock onto the idiosyncrasies of the training data set: variance. A central issue in pattern recognition is to build classifiers that can work properly on novel query data; hence, generalization is key. Can we predict how well our classifier will generalize to novel data? A minimal sketch of the tradeoff follows.
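The tradeoff can be seen numerically with a hand-rolled k-NN classifier on synthetic Gaussian "fish" data; both the data and the classifier choice are invented for this hedged sketch. k = 1 memorizes the training set (high variance), while a larger k smooths the boundary (more bias) and often generalizes better.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n):
    X0 = rng.normal([4.0, 16.0], 1.2, size=(n, 2))   # "salmon" (lightness, width)
    X1 = rng.normal([6.5, 18.5], 1.2, size=(n, 2))   # "sea bass"
    return np.vstack([X0, X1]), np.r_[np.zeros(n), np.ones(n)]

def knn_predict(Xtr, ytr, Xte, k):
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)  # squared distances
    idx = np.argsort(d, axis=1)[:, :k]                      # k nearest training points
    return (ytr[idx].mean(axis=1) > 0.5).astype(float)      # majority vote

Xtr, ytr = make_data(50)
Xte, yte = make_data(500)
for k in (1, 15):
    tr = np.mean(knn_predict(Xtr, ytr, Xtr, k) == ytr)   # k=1: perfect on training data
    te = np.mean(knn_predict(Xtr, ytr, Xte, k) == yte)   # ...but worse on novel data
    print(f"k={k:2d}  train acc={tr:.2f}  test acc={te:.2f}")
```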


Cost and Decision Theory

Decision Theory
In many situations, the consequences of our classifications are not equally costly. Recalling the fish example, it is acceptable to have tasty pieces of salmon in cans labeled sea bass, but the converse is not so. Hence, we need to adjust our decisions (decision boundaries) to incorporate these varying costs.

[Figure 1.3, repeated: histograms of the lightness feature with threshold x*.]

For the lightness feature on the fish, we would want to move the boundary to smaller values of lightness.


Our underlying goal is to establish a decision boundary that minimizes the overall cost; the study of how to do this is called decision theory. A minimal cost-sensitive sketch follows.
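A hedged sketch of moving the boundary under unequal costs: with Gaussian class-conditional lightness densities (salmon lower, sea bass higher) and a 10x cost on canning sea bass as salmon, the cost-minimizing threshold shifts to smaller lightness values, as the slide states. All densities, priors, and costs here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

p_salmon, p_seabass = 0.5, 0.5       # priors (assumed equal)
cost_bass_as_salmon = 10.0           # sea bass in a salmon can: expensive
cost_salmon_as_bass = 1.0            # salmon in a sea bass can: acceptable

def expected_cost(t):
    # Rule: predict salmon if lightness < t, sea bass otherwise.
    miss_salmon = norm.sf(t, loc=4.0, scale=1.0)   # salmon above t -> called bass
    miss_bass = norm.cdf(t, loc=7.0, scale=1.0)    # bass below t -> called salmon
    return (p_salmon * cost_salmon_as_bass * miss_salmon
            + p_seabass * cost_bass_as_salmon * miss_bass)

ts = np.linspace(2, 9, 701)
t_star = ts[int(np.argmin([expected_cost(t) for t in ts]))]
print(f"cost-minimizing threshold ~ {t_star:.2f}")  # below the equal-cost midpoint 5.5
```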


Definitions

Pattern Recognition

First in-class quiz: can you define Pattern Recognition?


DHS: Pattern recognition is the act of taking in raw data and taking an action based on the "category" of the pattern.

DHS: Pattern classification is to take in raw data, eliminate noise, and process it to select the most likely model that it represents.

Jordan: The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying data into different categories.


Types of Pattern Recognition Approaches

Statistical: focuses on the statistics of the patterns. The primary emphasis of our course.

Syntactic: classifiers are defined using a set of logical rules; grammars can group rules.


Feature Extraction and Classification
Feature Extraction — to characterize an object to be recognized by measurements whose values are very similar for objects in the same category, and very different for objects in different categories. Invariant features—those that are invariant to irrelevant transformations of the underlying data—are preferred.

Classification — to assign a category to the object based on the feature vector provided during feature extraction.


The perfect feature extractor would yield a representation that is trivial to classify; the perfect classifier would yield a perfect model from an arbitrary set of features. But these are seldom plausible.


Classification Objective Functions

For classification, there are numerous underlying objective functions that we can seek to optimize.
- Minimum-error-rate classification seeks to minimize the error rate: the percentage of new patterns assigned to the wrong category.
- Total expected cost, or risk, minimization is also often used.
Important underlying questions are:
- How do we map knowledge about costs to best affect our classification decision?
- Can we estimate the total risk and hence know if our classifier is acceptable even before we deploy it?
- Can we bound the risk?
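To make the risk objective concrete, here is the standard decision-theoretic statement in DHS notation (restated for reference; it is not printed on this slide): for actions \(\alpha_i\), classes \(\omega_j\), and loss \(\lambda(\alpha_i \mid \omega_j)\), the conditional risk of taking action \(\alpha_i\) given feature vector \(\mathbf{x}\) is

\[
R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}),
\]

and the Bayes decision rule picks \(\alpha^* = \arg\min_i R(\alpha_i \mid \mathbf{x})\). Minimum-error-rate classification is the special case of the zero-one loss \(\lambda(\alpha_i \mid \omega_j) = 1 - \delta_{ij}\), for which minimizing risk reduces to choosing the class with the maximum posterior probability.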


No Free Lunch Theorem

A question you’re probably asking is What is the best classifier? Any ideas?


A question you’re probably asking is What is the best classifier? Any ideas? We will learn that indeed no such generally best classifier exists. This is described in the No Free Lunch Theorem. If the goal is to obtain good generalization performance, there are no context-independent or usage-independent reasons to favor one learning or classification method over another. When confronting a new pattern recognition problem, appreciation of this thereom reminds us to focus on the aspects that matter most—prior information, data distribution, amount of training data, and cost or reward function.


Analysis By Synthesis

The availability of large collections of data on which to base our pattern recognition models is important. In the case of little data (and sometimes even in the case of much data), we can use analysis by synthesis to test our models: given a model, we can randomly sample examples from it and analyze how close they are to our few examples and to what we expect to see based on our knowledge of the problem. A minimal sketch follows.
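A hedged sketch of the idea: fit a simple Gaussian model to a handful of observed samples, synthesize data from the fitted model, and compare summary statistics. All numbers are invented for illustration; real use would compare richer statistics and draw on domain knowledge.

```python
import numpy as np

rng = np.random.default_rng(5)
observed = rng.normal(10.0, 2.0, size=12)    # only a dozen "real" examples

# "Analysis": fit the model (here, just the mean and std of a Gaussian).
mu_hat, sigma_hat = observed.mean(), observed.std(ddof=1)

# "Synthesis": sample from the fitted model and sanity-check against the data
# and against our prior knowledge of the problem.
synthetic = rng.normal(mu_hat, sigma_hat, size=1000)
print(f"observed : mean={observed.mean():.2f}  std={observed.std(ddof=1):.2f}")
print(f"synthetic: mean={synthetic.mean():.2f}  std={synthetic.std(ddof=1):.2f}")
# Gross mismatches (e.g., negative lengths, missing skew) would signal a bad model.
```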


Classifier Ensembles

Classifier combination is obvious: get the power of multiple models for a single decision. But:
- What happens when the different classifiers disagree?
- How do we separate the available training data for each classifier?
- Should the classifiers be learned jointly or in silos?
Examples: bagging, boosting, neural networks (?). A minimal bagging sketch follows.
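A hedged sketch of one of these ideas, bagging: train a weak learner (here a one-feature decision stump) on bootstrap resamples and combine by majority vote. The data and stump learner are invented for illustration; this is not the course's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.r_[rng.normal(4.0, 1.0, 200), rng.normal(7.0, 1.0, 200)].reshape(-1, 1)
y = np.r_[np.zeros(200), np.ones(200)]

def fit_stump(X, y):
    # Pick the threshold on feature 0 that minimizes training error.
    ts = np.unique(X[:, 0])
    errs = [np.mean((X[:, 0] > t).astype(float) != y) for t in ts]
    return ts[int(np.argmin(errs))]

stumps = []
for _ in range(25):                        # 25 bootstrap rounds
    idx = rng.integers(0, len(y), len(y))  # sample with replacement
    stumps.append(fit_stump(X[idx], y[idx]))

# Majority vote over the ensemble of thresholds.
votes = np.mean([(X[:, 0] > t).astype(float) for t in stumps], axis=0)
y_hat = (votes > 0.5).astype(float)
print(f"bagged training accuracy: {np.mean(y_hat == y):.2f}")
```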


SO MANY QUESTIONS...


Schedule of Topics

1. Introduction to Pattern Recognition (grounding our inquiry)
2. Tree Classifiers (getting our feet wet with real classifiers)
   1. Decision Trees: CART, C4.5, ID3
   2. Random Forests
3. Bayesian Decision Theory
4. Linear Discriminants (discriminative classifiers: the decision boundary)
   1. Separability
   2. Perceptrons
   3. Support Vector Machines
5. Parametric Techniques (generative methods grounded in Bayesian decision theory)
   1. Maximum Likelihood Estimation
   2. Bayesian Parameter Estimation
   3. Sufficient Statistics


6. Non-Parametric Techniques
   1. Kernel Density Estimators
   2. Parzen Window
   3. Nearest Neighbor Methods
7. Unsupervised Methods (exploring the data for latent structure)
   1. Component Analysis and Dimension Reduction
      1. The Curse of Dimensionality
      2. Principal Component Analysis
      3. Fisher Linear Discriminant
      4. Locally Linear Embedding
   2. Clustering
      1. K-Means
      2. Expectation Maximization
      3. Mean Shift
8. Classifier Ensembles (Bagging and Boosting)
   1. Bagging
   2. Boosting / AdaBoost


9. Graphical Models (the modern language of pattern recognition and machine learning)
   1. Introductory ideas and relation back to earlier topics
   2. Bayesian Networks
   3. Sequential Models
      1. State-Space Models
      2. Hidden Markov Models
      3. Dynamic Bayesian Networks
10. Algorithm Independent Topics (theoretical treatments in the context of learned tools)
   1. No Free Lunch Theorem
   2. Ugly Duckling Theorem
   3. Bias-Variance Dilemma
   4. Jackknife and Bootstrap Methods
11. Other Items (time permitting)
   1. Syntactic Methods
   2. Neural Networks


Coding and Experiment Environments

Code / Environments
Course material will be enriched with code examples and problems. We will use both Matlab and Python.


Why Python (and Matlab)?
1. Matlab is the language of the PR/ML/CVIP realm. You will get exposed to it outside of this course.
2. Python is maturing and becoming increasingly popular for projects both within PR/ML/CVIP and beyond, so I want to expose you to this alternate reality.
3. Preparation with Python in 555 may be more useful to a graduate in the job-hunt than some of the 555 material itself; e.g., Google does a lot with Python.
4. Python is free as in beer.
5. Some of the constructs in Python are easier to work with than in other high-level languages, such as Matlab or Perl.
6. Python is cross-platform.
7. NumPy and SciPy are available.


Python
- Introduction to Python slides (from the inventor of Python)
- Introduction to NumPy/SciPy:
  http://www.scipy.org/Getting_Started
  http://www.scipy.org/NumPy_for_Matlab_Users

We will use the Enthought Python Distribution (version 7.3) as our primary distribution: http://enthought.com/products/epd.php
- Available on the CSE network: https://wiki.cse.buffalo.edu/services/content/enthought-python-distribution
- Python 2.7.
- Packages up everything we need into one simple, cross-platform package.
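In the spirit of the NumPy-for-Matlab-users page linked above, here is a hedged, minimal set of NumPy idioms next to their Matlab equivalents; the specific arrays are invented for illustration.

```python
import numpy as np

A = np.eye(3)            # Matlab: eye(3)
v = np.arange(1, 6)      # Matlab: 1:5
B = A @ A                # Matlab: A * A (matrix product)
C = A * 2.0              # Matlab: A .* 2 (elementwise)
col = v.reshape(-1, 1)   # Matlab: v' (column vector)
print(B.shape, C.sum(), col.shape)
```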


You should become Python-capable so you can work with many of the examples I give.


Wrap-Up

Logistical Things

Read the course webpage (now and regularly): http://www.cse.buffalo.edu/~jcorso/t/CSE555
Read the syllabus: http://www.cse.buffalo.edu/~jcorso/t/CSE555/files/syllabus.pdf
Read the course mailing list: [email protected]


Policy on reading and lecture notes
- Lecture notes are provided (mostly) via PDF linked from the course website. For lectures that are given primarily on the board, no notes are provided.
- It is always in your best interest to attend the lectures rather than exclusively read the book and notes. The notes are provided for reference.
- In the interest of the environment, I request that you do NOT print out the lecture notes.
- The lecture notes linked from the website may be updated from time to time based on the lecture progress, questions, and errors. Check back regularly.


Grading and Course Evaluation

- There will be homeworks posted after each topic. The homeworks are to be done alone or in groups. Solutions will be posted. No homeworks will be turned in or graded.
- There will be a quiz once a week. Each quiz will have one rote question and one longer question; ten minutes of class time will be allotted to quizzes each week.
- 14 quizzes will be given. The 2 lowest will be dropped.
- Quizzes will be on Tuesday or Thursday; you will not know in advance.
- Quizzes will be in-class, independent, closed-book, and will not require a calculator.
- Assessments of this type force you to study continuously throughout the term.
- See the syllabus for more information.


Testimonials (err, Evaluation Comments): 455
"Slightly advanced math for an undergrad CSE student"; "I felt bombarded with math"; "This is a statistics class."
"I would have liked to see more in-depth walkthroughs. . . cemented with real numbers."
This will rarely happen in the course. First, there is a lot of material to cover. Second, you can work through these while you study: active study. Third, there are recitations/hours with the TA to work through these.

I appreciated the balance between powerpoint and blackboard; there was good reason to attend class.


"The (hands-down) most interesting class I've taken to-date"; "Very cool course. Really cool field."


Testimonials (err, Evaluation Comments): 555
"The course requires a very strong foundation in probability theory. . . it would have been a lot easier if the professor [reviewed this material in the beginning of the semester]."
Students are expected to be fluent in probability theory and to have a fresh review of the material. Take responsibility.

"I need more detailed examples on the course material"; "More time should be spent with examples."
High-level examples on plausible data sets are indeed shown throughout the course. Source code is also given to allow self-experimentation.


"This is the best course I have taken so far in UB"; "This course is great"; "This class stimulated me to go into the field of Machine Learning."


Parting Comments, Online Materials
The nature of lecture courses in higher ed is in flux. Free, online courses are abundant, e.g., https://www.coursera.org/course/ml


So, why are you here?


I will run this course to best embrace the worthwhile material available, yet make good use of my own time. I pay very specific attention to the material selected in my course and marry it well with the other courses here at Buffalo. I will link to online video lectures and related material when possible. The in-class time will be rich with interactive questions and discussion, which is crucial to understanding the material.

