Predicting Response in Mobile Advertising with Hierarchical Importance-Aware Factorization Machine Richard Oentaryo, Ee-Peng Lim, Jia-Wei Low, Mike Finegold WSDM 2014
PART 1: PROBLEM
Mobile Advertising

[Diagram, adapted from Agarwal & Chen, 2011: advertisers submit ads and bids to an ad network, which chooses the best ads, i.e. arg max (bid × response rate), to place on a publisher's page shown to mobile users; responses (clicks, conversions) occur on the landing page. Our focus: the ad network.]
Response Prediction Task

  pCTR = (# clicks) / (# exposes)

• Goal: Predict click-through rate (pCTR) for a given (page, ad) pair at a particular time
• Challenges:
  – Temporal dynamics
  – Cost-varying responses
  – Sparse data / cold-start
• Entities known to us:
  – Webpage (= page ID)
  – Ad (= campaign ID)
  – Time (= day of week)
• Desiderata:
  – Accurate pCTR estimate → crucial for ad price auction
  – Good pCTR ranking → effective placement of ads in a page
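The pCTR definition above can be sketched as a simple aggregation over an impression log. This is a minimal illustration, not the paper's code; the log layout and field names are assumptions.

```python
# Hedged sketch: empirical pCTR = #clicks / #exposes per (page, ad, day) triplet.
from collections import defaultdict

def empirical_pctr(log):
    """log: iterable of (page_id, ad_id, day, clicked) records."""
    clicks = defaultdict(int)
    exposes = defaultdict(int)
    for page, ad, day, clicked in log:
        key = (page, ad, day)
        exposes[key] += 1          # every record is one expose
        clicks[key] += int(clicked)
    return {k: clicks[k] / exposes[k] for k in exposes}

log = [("p1", "a1", "Mon", 1), ("p1", "a1", "Mon", 0),
       ("p1", "a1", "Mon", 0), ("p2", "a1", "Tue", 1)]
print(empirical_pctr(log))  # pCTR ≈ 0.333 for the first triplet, 1.0 for the second
```

With sparse data many triplets have few exposes, which is exactly why these raw ratios are unreliable and a model is needed.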
Related Works: A Rough Taxonomy

Response prediction
• Explicit feature-based
  – Richardson et al., WWW 2007; Dave & Varma, SIGIR 2010
  – (Log-)linear models: Craswell et al., WSDM 2008; Agarwal et al., WWW 2009; Agarwal et al., KDD 2010
• Latent feature-based
  – Matrix factorization: Menon et al., KDD 2011; Shen et al., WSDM 2012
  – Tensor factorization: Yin et al., WSDM 2014; this work
PART 2: DATASET
BuzzCity Ad Network

Dataset: 05-31 October 2012

Entity                 emin = 1     emin = 10    emin = 100   emin = 1000
                       (emin = minimum no. of ad exposes)
No. of records         24,172,134   10,535,658   3,587,160    931,032
Page hierarchy:
  No. of webpages      244,341      138,351      55,260       16,374
  No. of publishers    3,945        3,539        2,654        1,643
  No. of countries     243          239          226          199
  No. of channels      8            8            8            8
Ad hierarchy:
  No. of ads           23,500       18,365       15,600       10,877
  No. of advertisers   1,989        1,406        1,245        1,124
  No. of banner types  5            4            3            3
PART 3: APPROACH
Response Prediction Framework

Goal: Predict unknown pCTR
Challenge: Sparse data, cold-start cases

[Diagram: the (page p, ad a, day d) response tensor, with unknown pCTR entries marked "?", is factorized into K latent factors per entity via tensor factorization, coupling the pairwise slices page × ad, page × day, and ad × day.]
Unifying Model: Factorization Machine

A factorization machine is a generic bilinear model (Rendle, 2012):

  ŷ(x) = w0 + Σ_i w_i x_i  +  Σ_i Σ_{j>i} ( Σ_{k=1}^{K} v_{k,i} v_{k,j} ) x_i x_j

  (first two terms: linear regression; last term: two-way interaction)

In this work, we use the following feature representation:

  x = (0,...,0,1,0,...,0, 0,...,0,1,0,...,0, 0,...,0,1,0,...,0)
      with one 1 in each of the #pages, #ads, and #days blocks

Result: pairwise tensor factorization
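The concatenated one-hot representation can be sketched as follows; index offsets and function names are illustrative, not taken from the authors' code.

```python
# Hedged sketch: concatenated one-hot blocks for page, ad, and day-of-week.
import numpy as np

def encode(page_idx, ad_idx, day_idx, n_pages, n_ads, n_days=7):
    x = np.zeros(n_pages + n_ads + n_days)
    x[page_idx] = 1.0                   # one 1 among #pages
    x[n_pages + ad_idx] = 1.0           # one 1 among #ads
    x[n_pages + n_ads + day_idx] = 1.0  # one 1 among #days
    return x

# Toy sizes: 4 pages, 3 ads, 7 days -> a length-14 feature vector.
x = encode(page_idx=2, ad_idx=0, day_idx=5, n_pages=4, n_ads=3)
print(x)  # 1s at positions 2, 4, and 12; 0s elsewhere
```

With exactly three active features, all FM interaction terms except the three cross-block products vanish, which is what yields the pairwise tensor factorization.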
Model Interpretation

For a one-hot x with active page p, ad a, and day d, the model reduces to:

  ŷ = w0 + w_p + w_a + w_d
     + Σ_{k=1}^{K} v_{k,p} v_{k,a} + Σ_{k=1}^{K} v_{k,p} v_{k,d} + Σ_{k=1}^{K} v_{k,a} v_{k,d}

  w0: global bias;  w_p, w_a, w_d: local biases;  the three sums: pairwise interactions
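The reduced prediction above is cheap to compute directly. A minimal NumPy sketch, assuming parameter names (w0, w, V) that are illustrative rather than the authors':

```python
# Hedged sketch: global bias + three local biases + three pairwise factor products.
import numpy as np

def predict(w0, w, V, p, a, d):
    """w: per-feature biases; V: K x n_features latent factors;
    p, a, d: column indices of the active page, ad, and day features."""
    pairwise = (V[:, p] @ V[:, a]) + (V[:, p] @ V[:, d]) + (V[:, a] @ V[:, d])
    return w0 + w[p] + w[a] + w[d] + pairwise

rng = np.random.default_rng(0)
K, n_feat = 4, 14
w0, w, V = 0.01, rng.normal(0, 0.01, n_feat), rng.normal(0, 0.1, (K, n_feat))
print(predict(w0, w, V, p=2, a=4, d=12))
```

Each of the three dot products is one Σ_k term of the equation, so the cost per prediction is O(K).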
Importance-Aware Learning: Weighted Cost Functions

• Individual = (page, ad, day) triplet
• Share (weight) = no. of exposures of that individual
• Each individual's loss term is weighted by its share in the cost function
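A minimal sketch of such a weighted cost, assuming a squared loss and normalized exposure shares (the exact weighting scheme here is illustrative):

```python
# Hedged sketch: each (page, ad, day) individual's squared error is weighted
# by its share, i.e. its number of exposures.
import numpy as np

def weighted_squared_cost(y_true, y_pred, exposes):
    w = np.asarray(exposes, dtype=float)
    w = w / w.sum()  # normalize shares into weights summing to 1
    return float(np.sum(w * (np.asarray(y_true) - np.asarray(y_pred)) ** 2))

cost = weighted_squared_cost([0.1, 0.0], [0.2, 0.1], exposes=[900, 100])
print(cost)  # ≈ 0.01; the 900-expose triplet contributes 9x the other's loss
```

Heavily-exposed triplets thus dominate the fit, matching their economic importance.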
Handling Cold Start: Hierarchy Structure

• Page hierarchy: Channel, Publisher, Country, Page
• Ad hierarchy: Advertiser, Banner Type, Ad

Hierarchical structures serve as prior probability and a means for back-off
Hierarchical Learning

1. Regularization
   Intuition: each latent factor should have a prior that makes it more similar to its parents'

2. Fitting
   Idea: a reasonable prior for the parents can be obtained by aggregating click/expose data
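The fitting idea can be sketched as a simple roll-up of child click/expose counts to their parents (e.g. pages to publishers); data layout and names below are assumptions for illustration.

```python
# Hedged sketch: a parent's prior CTR is obtained by aggregating the
# click/expose counts of its children.
from collections import defaultdict

def parent_priors(child_stats, parent_of):
    """child_stats: {child: (clicks, exposes)}; parent_of: {child: parent}."""
    clicks = defaultdict(int)
    exposes = defaultdict(int)
    for child, (c, e) in child_stats.items():
        clicks[parent_of[child]] += c
        exposes[parent_of[child]] += e
    return {p: clicks[p] / exposes[p] for p in exposes}

pages = {"page1": (3, 100), "page2": (1, 100), "page3": (0, 50)}
pubs = {"page1": "pub1", "page2": "pub1", "page3": "pub2"}
print(parent_priors(pages, pubs))  # {'pub1': 0.02, 'pub2': 0.0}
```

A cold-start page with no data can then back off to its publisher's (or channel's) aggregated prior.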
Hierarchical + Importance-Aware Learning

• Stochastic gradient descent (square and logistic loss): weighted update + hierarchical regularization
• Coordinate descent (cyclic and stochastic)
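One SGD step combining the two ideas might look as follows. This is a sketch under assumed hyperparameter names (eta, lam) and a generic loss gradient, not the paper's exact update rule:

```python
# Hedged sketch: the loss gradient is scaled by the record's importance
# weight, and the latent factor is additionally pulled toward its parent's
# factor (hierarchical regularization).
import numpy as np

def sgd_step(v_child, v_parent, grad_loss, weight, eta=0.01, lam=0.1):
    """One update of a child's latent factor vector."""
    reg_grad = lam * (v_child - v_parent)  # pull child toward its parent
    return v_child - eta * (weight * grad_loss + reg_grad)

v_page, v_pub = np.array([0.5, -0.2]), np.array([0.4, 0.0])
v_new = sgd_step(v_page, v_pub, grad_loss=np.array([0.1, 0.1]), weight=2.0)
print(v_new)
```

Setting lam to zero recovers plain weighted SGD; setting weight to 1 recovers unweighted hierarchical learning.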
PART 4: EXPERIMENTS
Experiment Setup

Validation: 10 sliding-window trials, each with a 7-day training set and a 2-day test set

Trial   Training set   Test set
1       05-11 Oct      12-13 Oct
2       07-13 Oct      14-15 Oct
3       09-15 Oct      16-17 Oct
4       11-17 Oct      18-19 Oct
5       13-19 Oct      20-21 Oct
6       15-21 Oct      22-23 Oct
7       17-23 Oct      24-25 Oct
8       19-25 Oct      26-27 Oct
9       21-27 Oct      28-29 Oct
10      23-29 Oct      30-31 Oct

Evaluation metrics:
1. Prediction: root mean square error, negative log-likelihood
2. Ranking: area under ROC curve
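The two prediction metrics can be sketched as follows; the paper's weighted variants (wRMSE) would additionally weight each term by exposures, and the Bernoulli form of the NLL here is an assumption:

```python
# Hedged sketch: root mean square error and per-expose negative
# log-likelihood of observed clicks given predicted pCTR.
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def neg_log_likelihood(clicks, exposes, pctr, eps=1e-12):
    c, e, p = map(np.asarray, (clicks, exposes, pctr))
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return float(-np.sum(c * np.log(p) + (e - c) * np.log(1 - p)) / e.sum())

print(rmse([0.1, 0.0], [0.1, 0.1]))  # ≈ 0.0707
```

Lower is better for both; AUC (the ranking metric) is computed over the same predictions but only uses their ordering.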
Performance Benchmark (min. exposure emin = 1000)

[Charts: pCTR prediction error (wRMSE, regression) and ad ranking quality (wAUC, ranking) for the baseline vs. the proposed methods]
Performances in Cold Start Situations (min. exposure emin = 1000)

Observations:
1) Results improve when using hierarchical regularization and fitting
2) Results for importance-aware and unweighted learning are not directly comparable
PART 5: CONCLUSION
Main Contributions
• A unified tensor factorization model catering for temporal dynamics, importance-aware and hierarchical learning
• Simple yet effective extensions of SGD and CD algorithms for handling importance weights and hierarchical regularization
• Practical usage and result improvements showcased on real mobile advertising data
FUTURE WORKS
• Scale up with parallelization or a more efficient data representation
• Enhance interpretability of the model, e.g., non-negativity constraints on the latent factors
• Explore model applications in other tasks, e.g., item adoption
Variations at Different Hierarchy Levels

[Plots: daily CTR for page × advertiser, publisher × ad, channel × advertiser, and channel × ad]

Observations:
1) CTR values vary across different days
2) CTR variations increase as we go up in the hierarchy → possibility for "back-off"
Performance Benchmark (min. exposure emin = 10) [charts]

Performance Benchmark (min. exposure emin = 100) [charts]

Performance Benchmark (min. exposure emin = 1000) [charts]

Performances in Cold Start Situations (min. exposure emin = 100) [charts]

Performances in Cold Start Situations (min. exposure emin = 10) [charts]