Forecaster’s dilemma: Extreme events and forecast evaluation

Sebastian Lerch, Thordis Thorarinsdottir, Francesco Ravazzolo and Tilmann Gneiting

Conference on Predictability and Multi-Scale Prediction of High Impact Weather, Landshut, October 2017

Motivation

http://www.spectator.co.uk/features/8959941/whats-wrong-with-the-met-office/

Outline

1. Probabilistic forecasting and forecast evaluation
2. The forecaster’s dilemma
3. Proper forecast evaluation for extreme events

Evaluation of probabilistic forecasts: Proper scoring rules

A (negatively oriented, i.e., smaller is better) proper scoring rule is any function S(F, y) such that, for all F and G,

$$\mathbb{E}_{Y \sim G}\, S(G, Y) \le \mathbb{E}_{Y \sim G}\, S(F, Y).$$

Popular examples include the logarithmic score

$$\mathrm{LogS}(F, y) = -\log f(y),$$

where f denotes the density of F, and the continuous ranked probability score

$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big(F(z) - \mathbb{1}\{y \le z\}\big)^2 \, dz.$$
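For a Gaussian forecast F = N(μ, σ²) both scores have closed forms. The Python sketch below is our own illustration, not code from the talk: it implements LogS on the log scale and the standard closed-form normal CRPS, and checks the latter against a direct midpoint-rule evaluation of the CRPS integral. The helper names and the integration range [-20, 20] are assumptions chosen for illustration.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logs_normal(y, mu, sigma):
    """LogS(F, y) = -log f(y) for F = N(mu, sigma^2), computed on the log scale."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


def crps_normal(y, mu, sigma):
    """Closed-form CRPS for F = N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0) + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi))


def crps_numeric(cdf, y, lo=-20.0, hi=20.0, n=40_000):
    """Midpoint-rule approximation of the CRPS integral over the assumed range [lo, hi]."""
    dz = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        indicator = 1.0 if y <= z else 0.0
        total += (cdf(z) - indicator) ** 2
    return total * dz


y = 0.7
print(logs_normal(y, 0.0, 1.0))
print(crps_normal(y, 0.0, 1.0), crps_numeric(norm_cdf, y))  # the two values should agree closely
```

For the perfect forecast N(0, 1) and y = 0, crps_normal returns (√2 − 1)/√π ≈ 0.2337.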

2. The forecaster’s dilemma

Media attention often falls exclusively on prediction performance in the case of extreme events.

http://www.theguardian.com/business/2009/jan/24/nouriel-roubini-credit-crunch

Toy example

We compare Alice’s and Bob’s forecasts for Y ∼ N(0, 1): F_Alice = N(0, 1) and F_Bob = N(4, 1).

Based on all 10 000 replicates:

Forecaster   CRPS   LogS
Alice        0.56   1.42
Bob          3.53   9.36

When the evaluation is restricted to the largest ten observations:

Forecaster   R-CRPS   R-LogS
Alice        2.70     6.29
Bob          0.46     1.21
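The toy example can be reproduced approximately. The sketch below, an illustration under stated assumptions rather than the authors' code, draws 10 000 values from N(0, 1), scores both forecasters with the closed-form normal CRPS and LogS, and repeats the evaluation on the ten largest observations only. The random seed is arbitrary, so the numbers will differ somewhat from the slide; the point is the reversal of the ranking.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_normal(y, mu, sigma=1.0):
    """Closed-form CRPS of N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0) + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi))


def logs_normal(y, mu, sigma=1.0):
    """LogS = -log density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


def mean(values):
    return sum(values) / len(values)


rng = random.Random(42)  # arbitrary seed (assumption); results differ slightly from the slide
ys = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # Y ~ N(0, 1)
largest = sorted(ys)[-10:]                          # the ten largest observations

forecast_means = {"Alice": 0.0, "Bob": 4.0}         # both forecasts have unit variance
results = {}
for name, mu in forecast_means.items():
    results[name] = {
        "CRPS": mean([crps_normal(y, mu) for y in ys]),
        "LogS": mean([logs_normal(y, mu) for y in ys]),
        "R-CRPS": mean([crps_normal(y, mu) for y in largest]),
        "R-LogS": mean([logs_normal(y, mu) for y in largest]),
    }

for name, row in results.items():
    print(name, {label: round(value, 2) for label, value in row.items()})
```

The reversal is not a sampling accident: for unit-variance normal forecasts both scores depend only on |y − μ|, so Bob beats Alice at every observation y > 2, and the ten largest of 10 000 standard normal draws lie far above 2.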

Verifying only the extremes erases propriety

Some econometric papers use the restricted logarithmic score

$$\mathrm{R\text{-}LogS}_{\ge r}(F, y) = -\mathbb{1}\{y \ge r\} \log f(y).$$

However, if h(x) > f(x) for all x ≥ r, then

$$\mathbb{E}\, \mathrm{R\text{-}LogS}_{\ge r}(H, Y) < \mathbb{E}\, \mathrm{R\text{-}LogS}_{\ge r}(F, Y)$$

independently of the true density, provided that Y exceeds r with positive probability.

[Figure: densities f and h; horizontal axis x, vertical axis Density]
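The impropriety can be checked numerically. In this sketch the true distribution and the forecast F are both N(0, 1), the competing forecast H is N(1, 1), and r = 1; these choices are assumptions for illustration and are not necessarily the densities shown in the figure. Since h(x) > f(x) whenever x > 0.5, the restricted score prefers H although F is the true distribution, while the unrestricted LogS correctly prefers F.

```python
import math


def log_norm_pdf(x, mu, sigma=1.0):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def expected_logs(mu_forecast, r=None, lo=-12.0, hi=12.0, n=24_000):
    """Expectation over Y ~ N(0, 1) of LogS (r is None) or R-LogS_{>= r} for the forecast N(mu_forecast, 1).

    Midpoint rule on [lo, hi]; grid points below r are skipped because 1{y >= r} = 0 there.
    """
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        if r is not None and x < r:
            continue
        g = math.exp(log_norm_pdf(x, 0.0))          # true density N(0, 1)
        total += g * -log_norm_pdf(x, mu_forecast)  # score of the forecast at x
    return total * dx


r = 1.0
# Assumed forecasts: F = N(0, 1) (the truth) and H = N(1, 1); h(x) > f(x) for all x > 0.5.
full_f, full_h = expected_logs(0.0), expected_logs(1.0)
restricted_f, restricted_h = expected_logs(0.0, r), expected_logs(1.0, r)

print(f"LogS:      F = {full_f:.3f}, H = {full_h:.3f}")              # F has the lower score
print(f"R-LogS>=1: F = {restricted_f:.3f}, H = {restricted_h:.3f}")  # H has the lower score
```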

The forecaster’s dilemma

Given any (non-trivial) proper scoring rule S and any non-constant weight function w, any scoring rule of the form S*(F, y) = w(y) S(F, y) is improper.

Forecaster’s dilemma: Forecast evaluation based on a subset of extreme observations only corresponds to the use of an improper scoring rule and is bound to discredit skillful forecasters.
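A small numerical illustration (not a proof) of the statement above: take the logarithmic score, Y ∼ N(0, 1) and candidate forecasts N(μ, 1). Under a non-constant weight the expected weighted score is minimized at some μ ≠ 0, so the true distribution is not the optimal forecast. The weights, the Gaussian forecast family and the grid of candidate means are assumptions chosen for this sketch. For this family the minimizer is the w-weighted mean of Y: about 1.525 for w(y) = 1{y ≥ 1} and 1/√π ≈ 0.564 for w(y) = Φ(y).

```python
import math


def norm_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def norm_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def weighted_moments(w, lo=-12.0, hi=12.0, n=24_000):
    """Midpoint-rule values of E[w(Y)], E[w(Y) Y], E[w(Y) Y^2] for Y ~ N(0, 1)."""
    dy = (hi - lo) / n
    m0 = m1 = m2 = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        mass = w(y) * norm_pdf(y) * dy
        m0 += mass
        m1 += mass * y
        m2 += mass * y * y
    return m0, m1, m2


def expected_weighted_logs(mu, moments):
    """E[w(Y) LogS(N(mu, 1), Y)], using LogS = 0.5 log(2 pi) + (y - mu)^2 / 2."""
    m0, m1, m2 = moments
    return 0.5 * math.log(2.0 * math.pi) * m0 + 0.5 * (m2 - 2.0 * mu * m1 + mu * mu * m0)


def best_forecast_mean(w, candidates=None):
    """Grid search for the mean mu of N(mu, 1) minimizing the expected weighted score."""
    if candidates is None:
        candidates = [i / 1000 for i in range(-1000, 3001)]  # assumed grid: mu in [-1, 3]
    moments = weighted_moments(w)
    return min(candidates, key=lambda mu: expected_weighted_logs(mu, moments))


weights = {
    "w(y) = 1 (proper LogS)": lambda y: 1.0,
    "w(y) = 1{y >= 1}": lambda y: 1.0 if y >= 1.0 else 0.0,
    "w(y) = Phi(y)": norm_cdf,
}
for label, w in weights.items():
    print(f"{label}: optimal mu = {best_forecast_mean(w):.3f}")
```

Only the constant weight recovers μ = 0, i.e. the true distribution.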

3. Proper forecast evaluation for extreme events

Proper weighted scoring rules provide suitable alternatives

Gneiting and Ranjan (2011) propose the threshold-weighted CRPS

$$\mathrm{twCRPS}(F, y) = \int_{-\infty}^{\infty} \big(F(z) - \mathbb{1}\{y \le z\}\big)^2 \, w(z) \, dz.$$

The weight function w(z) can be tailored to the situation of interest. For example, to emphasize the right tail, one can use w_indicator(z) = 1{z ≥ r} or w_Gaussian(z) = Φ(z | μ_r, σ_r²), where Φ(· | μ_r, σ_r²) denotes the cumulative distribution function of N(μ_r, σ_r²). The parameters r, μ_r and σ_r can be motivated by the application at hand.

Gneiting, T. and Ranjan, R. (2011) Comparing density forecasts using threshold- and quantile-weighted scoring rules. Journal of Business and Economic Statistics, 29, 411–422.
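The threshold-weighted CRPS can be evaluated by numerical integration for any forecast CDF and weight function. The Python sketch below uses a midpoint rule on an assumed range [-20, 20] together with the two weight functions above; the helper names tw_crps, w_indicator and w_gaussian are hypothetical. With w ≡ 1 it should reproduce the closed-form CRPS of a normal forecast, which serves as a check.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def crps_normal(y, mu=0.0, sigma=1.0):
    """Closed-form (unweighted) CRPS of N(mu, sigma^2), used only as a check."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0) + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi))


def tw_crps(cdf, y, w, lo=-20.0, hi=20.0, n=40_000):
    """Midpoint-rule approximation of twCRPS(F, y) = integral of (F(z) - 1{y <= z})^2 w(z) dz."""
    dz = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        indicator = 1.0 if y <= z else 0.0
        total += (cdf(z) - indicator) ** 2 * w(z)
    return total * dz


def w_indicator(r):
    """w(z) = 1{z >= r}."""
    return lambda z: 1.0 if z >= r else 0.0


def w_gaussian(mu_r, sigma_r):
    """w(z) = Phi(z | mu_r, sigma_r^2), the N(mu_r, sigma_r^2) CDF."""
    return lambda z: norm_cdf(z, mu_r, sigma_r)


y = 2.5
print(tw_crps(norm_cdf, y, lambda z: 1.0), crps_normal(y))  # should agree closely
print(tw_crps(norm_cdf, y, w_indicator(2.0)))
print(tw_crps(norm_cdf, y, w_gaussian(2.0, 1.0)))
```

Because both weights take values in [0, 1], the weighted scores never exceed the unweighted CRPS.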

Toy example revisited

Recall Alice’s and Bob’s forecasts for Y ∼ N(0, 1): F_Alice = N(0, 1) and F_Bob = N(4, 1).

Based on all 10 000 replicates:

Forecaster   CRPS   LogS
Alice        0.56   1.42
Bob          3.53   9.36

Based on the largest 10 observations:

Forecaster   R-CRPS   R-LogS
Alice        2.70     6.29
Bob          0.46     1.21

Threshold-weighted CRPS, with indicator weight w(z) = 1{z ≥ 2} and Gaussian weight w(z) = Φ(z | μ_r = 2, σ_r = 1):

Forecaster   twCRPS, w_indicator   twCRPS, w_Gaussian
Alice        0.076                 0.129
Bob          2.355                 2.255
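The threshold-weighted comparison can be simulated in the same way. This sketch is an assumption-laden illustration, not the authors' code: it approximates twCRPS on a fixed midpoint grid over [-10, 14] and precomputes prefix and suffix sums so that each of the 10 000 observations costs one binary search. The seed and grid are arbitrary choices, so the numbers will not match the table exactly; with proper weighted scores Alice should come out ahead under both weights.

```python
import bisect
import math
import random


def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Assumed integration grid (cell midpoints), wide enough for N(0, 1) and N(4, 1).
LO, HI, N = -10.0, 14.0, 24_000
DZ = (HI - LO) / N
GRID = [LO + (i + 0.5) * DZ for i in range(N)]


def make_tw_crps(mu, w):
    """Return a function y -> twCRPS(N(mu, 1), y), approximated on GRID.

    Grid points with z < y contribute F(z)^2 w(z) dz and points with z >= y
    contribute (F(z) - 1)^2 w(z) dz, so prefix and suffix sums are computed once.
    """
    cdf_values = [norm_cdf(z, mu) for z in GRID]
    weight_values = [w(z) for z in GRID]
    prefix = [0.0]
    for F, wz in zip(cdf_values, weight_values):
        prefix.append(prefix[-1] + F * F * wz * DZ)
    suffix = [0.0] * (N + 1)
    for i in range(N - 1, -1, -1):
        suffix[i] = suffix[i + 1] + (cdf_values[i] - 1.0) ** 2 * weight_values[i] * DZ

    def score(y):
        k = bisect.bisect_left(GRID, y)  # first grid index with z >= y
        return prefix[k] + suffix[k]

    return score


rng = random.Random(2017)  # arbitrary seed (assumption)
ys = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

weights = {
    "w_indicator": lambda z: 1.0 if z >= 2.0 else 0.0,  # 1{z >= 2}
    "w_Gaussian": lambda z: norm_cdf(z, 2.0, 1.0),      # Phi(z | 2, 1)
}
forecast_means = {"Alice": 0.0, "Bob": 4.0}

table = {}
for name, mu in forecast_means.items():
    table[name] = {}
    for label, w in weights.items():
        score = make_tw_crps(mu, w)
        table[name][label] = sum(score(y) for y in ys) / len(ys)

for name, row in table.items():
    print(name, {label: round(value, 3) for label, value in row.items()})
```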

Case study: Probabilistic wind speed forecasting

▶ Forecasts and observations of daily maximum wind speed
▶ Prediction horizon: 1 day ahead
▶ 228 observation stations over Germany
▶ Evaluation period: May 2010 – April 2011
▶ Probabilistic forecasts:
  ▶ ECMWF ensemble (maximum over forecast period)
  ▶ Bob: for every forecast case, F = N(15, 1)
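The ECMWF forecasts are ensembles, so their CRPS is that of the empirical distribution of the members. A standard way to compute it uses the identity CRPS(F, y) = E|X − y| − ½ E|X − X′| for independent X, X′ ∼ F. The sketch below implements this kernel form and a threshold-weighted version obtained by numerically integrating the ensemble's empirical CDF; the function names, the ensemble values and the observation are made up for illustration and are not data from the case study.

```python
import bisect


def crps_ensemble(members, y):
    """CRPS of the ensemble's empirical distribution: mean|x_i - y| - 0.5 * mean|x_i - x_j|."""
    m = len(members)
    spread_to_obs = sum(abs(x - y) for x in members) / m
    spread_within = sum(abs(a - b) for a in members for b in members) / (m * m)
    return spread_to_obs - 0.5 * spread_within


def tw_crps_ensemble(members, y, w, lo, hi, n=60_000):
    """Midpoint-rule twCRPS for the empirical CDF; [lo, hi] must cover all members and y."""
    xs = sorted(members)
    m = len(xs)
    dz = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        F = bisect.bisect_right(xs, z) / m  # fraction of members <= z
        indicator = 1.0 if y <= z else 0.0
        total += (F - indicator) ** 2 * w(z)
    return total * dz


# Made-up ensemble of daily maximum wind speeds (m/s) and a made-up observation.
members = [9.8, 11.2, 12.5, 10.4, 13.1, 8.9, 11.7, 12.0, 10.9, 14.3]
y = 15.2

print(crps_ensemble(members, y))
print(tw_crps_ensemble(members, y, lambda z: 1.0, 0.0, 30.0))  # should match the line above closely
print(tw_crps_ensemble(members, y, lambda z: 1.0 if z >= 14.0 else 0.0, 0.0, 30.0))  # w(z) = 1{z >= 14}
```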



Case study: Results

Based on all observations:

Forecaster   CRPS
ECMWF        1.26
Bob          8.49

Based on observations > 14 m/s:

Forecaster   R-CRPS
ECMWF        6.87
Bob          1.80

Threshold-weighted CRPS, with indicator weight w(z) = 1{z ≥ 14} and Gaussian weight w(z) = Φ(z | μ_r = 14, σ_r = 1):

Forecaster   twCRPS, w_indicator   twCRPS, w_Gaussian
ECMWF        0.059                 0.063
Bob          0.653                 0.761

Summary and conclusions

▶ Forecaster’s dilemma: Verification on extreme events only is bound to discredit skillful forecasters.
▶ The only remedy is to consider all available cases when evaluating predictive performance.
▶ Proper weighted scoring rules emphasize specific regions of interest, such as tails, and facilitate interpretation, while avoiding the forecaster’s dilemma.
▶ In particular, the weighted versions of the CRPS share (almost all of) the desirable properties of the unweighted CRPS.

Lerch, S., Thorarinsdottir, T. L., Ravazzolo, F. and Gneiting, T. (2017) Forecaster’s dilemma: Extreme events and forecast evaluation. Statistical Science, 32, 106–127.

Thank you for your attention!

May 25, 2010 - include quarterly inflation fan charts published by the Bank of ... in Wiley Online Library ... Present value asset pricing models for exchange rates (e.g., Engel .... has also shown that the ANW estimator has good boundary ...