Predicting Response in Mobile Advertising with Hierarchical Importance-Aware Factorization Machine
Richard Oentaryo, Ee-Peng Lim, Jia-Wei Low, Mike Finegold
WSDM 2014

PART 1: PROBLEM

Our focus: Mobile Advertising

[Figure, adapted from Agarwal & Chen, 2011. Elements: mobile user, publisher (page), ad network, advertisers (ads, landing page). The ad network chooses the best ads from bids and response rates (click, conversion): arg max (bid x rate).]

Response Prediction Task

pCTR = # clicks / # exposes

• Goal: Predict the click-through rate (pCTR) for a given (page, ad) pair at a particular time
• Challenges:
  – Temporal dynamics
  – Cost-varying
  – Sparse data / cold-start
• Entities known to us:
  – Webpage (= page ID)
  – Ad (= campaign ID)
  – Time (= day of week)
• Desiderata:
  – Accurate pCTR estimate → crucial for ad price auction
  – Good pCTR ranking → effective placement of ads in a page
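As a minimal sketch of the two definitions on these slides, the Python below computes an empirical CTR as clicks over exposes and picks the ad that maximizes bid x rate. The function names and the toy numbers are illustrative assumptions, not taken from the slides.

```python
def empirical_ctr(clicks, exposes):
    """Empirical click-through rate: #clicks / #exposes (0.0 if never exposed)."""
    return clicks / exposes if exposes > 0 else 0.0


def choose_best_ad(candidates):
    """Return the ad ID that maximizes bid x rate.

    candidates: dict mapping ad ID -> (bid, predicted response rate).
    """
    return max(candidates, key=lambda ad: candidates[ad][0] * candidates[ad][1])


# Hypothetical toy data: ad ID -> (bid, pCTR)
ads = {"ad_a": (0.50, 0.010), "ad_b": (0.20, 0.040), "ad_c": (1.00, 0.002)}
best = choose_best_ad(ads)
```

The highest bid does not necessarily win: an ad with a lower bid but a higher predicted rate can have the larger expected value, which is why an accurate pCTR estimate matters for the auction.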

Related Works: A Rough Taxonomy

Response prediction
  Explicit feature-based
  Latent feature-based

• Richardson et al., WWW, 2007
• Dave & Varma, SIGIR, 2010

(Log)-linear model
• Craswell et al., WSDM, 2008
• Agarwal et al., WWW, 2009
• Agarwal et al., KDD, 2010

Matrix factorization

Tensor factorization

• Menon et al., KDD, 2011
• Shen et al., WSDM, 2012
• Yin et al., WSDM, 2014
• This work

PART 2: DATASET

BuzzCity Ad Network
Dataset: 05-31 October 2012

Entity                  Minimum no. of ad exposes (emin)
                        emin = 1      emin = 10     emin = 100    emin = 1000
No. of records          24,172,134    10,535,658    3,587,160     931,032
Page hierarchy
  No. of webpages       244,341       138,351       55,260        16,374
  No. of publishers     3,945         3,539         2,654         1,643
  No. of countries      243           239           226           199
  No. of channels       8             8             8             8
Ad hierarchy
  No. of ads            23,500        18,365        15,600        10,877
  No. of advertisers    1,989         1,406         1,245         1,124
  No. of banner types   5             4             3             3

PART 3: APPROACH

Response Prediction Framework

Goal: Predict unknown pCTR
Challenge: Sparse data, cold-start cases

[Figure: the pCTR values form a tensor indexed by page p, ad a, and day d, with unknown entries marked "?". Tensor factorization uses K latent factors per mode to model the page x ad, page x day, and ad x day interactions.]

Unifying Model: Factorization Machine

Factorization machine is a generic bilinear model (Rendle, 2012) that combines a linear regression term with two-way interaction terms.

In this work, we use the following feature representation, a concatenation of three one-hot blocks:

x = (0,...,0,1,0,...,0 | 0,...,0,1,0,...,0 | 0,...,0,1,0,...,0)
     [#pages]            [#ads]              [#days]

Result: Pairwise tensor factorization

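Model Interpretation

x = (0,...,0,1,0,...,0 | 0,...,0,1,0,...,0 | 0,...,0,1,0,...,0)
     [#pages]            [#ads]              [#days]

$$\hat{y} = w_0 + w_p + w_a + w_d + \sum_{k=1}^{K} v_{k,p} v_{k,a} + \sum_{k=1}^{K} v_{k,p} v_{k,d} + \sum_{k=1}^{K} v_{k,a} v_{k,d}$$

Global bias: $w_0$
Local bias: $w_p + w_a + w_d$
Pairwise interactions: the three sums over the K latent factors (page x ad, page x day, ad x day)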
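The reduction from the generic factorization machine to the pairwise form can be checked numerically. The sketch below, in plain Python, evaluates a standard second-order factorization machine on a one-hot (page, ad, day) vector and compares it with the written-out pairwise expression. All sizes, parameter values, and function names are hypothetical and chosen only for illustration.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def fm_predict(x, w0, w, V):
    """Generic second-order factorization machine.

    x: list of n features, w: n linear weights,
    V: n rows of K latent factors (one row per feature).
    """
    n = len(x)
    y = w0 + dot(w, x)  # global bias + linear regression term
    for i in range(n):  # two-way interaction term
        for j in range(i + 1, n):
            y += dot(V[i], V[j]) * x[i] * x[j]
    return y


def pairwise_predict(p, a, d, w0, w, V, n_pages, n_ads):
    """The same prediction written out for one (page, ad, day) triplet."""
    ip, ia, iday = p, n_pages + a, n_pages + n_ads + d  # positions of the three 1s
    return (w0 + w[ip] + w[ia] + w[iday]
            + dot(V[ip], V[ia]) + dot(V[ip], V[iday]) + dot(V[ia], V[iday]))


# Hypothetical sizes and random parameters, for illustration only
n_pages, n_ads, n_days, K = 4, 3, 7, 2
n = n_pages + n_ads + n_days
rng = random.Random(0)
w0 = 0.1
w = [rng.gauss(0, 1) for _ in range(n)]
V = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(n)]

p, a, d = 2, 1, 5
x = [0.0] * n
for idx in (p, n_pages + a, n_pages + n_ads + d):
    x[idx] = 1.0

y_fm = fm_predict(x, w0, w, V)
y_pair = pairwise_predict(p, a, d, w0, w, V, n_pages, n_ads)
```

Because exactly three entries of x are non-zero, only three of the pairwise products survive, which is what turns the generic model into pairwise tensor factorization.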

Importance-Aware Learning

Weighted cost functions:
• Individual = (page, ad, day) triplet
• Share (weight) = no. of exposures
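The slide does not spell out the weighted cost functions, so the following is only a sketch of one common formulation: each (page, ad, day) triplet contributes its loss scaled by its number of exposures. The function names and toy values are assumptions made for illustration.

```python
import math


def weighted_square_loss(y_true, y_pred, weights):
    """Sum of squared errors, each scaled by its weight (no. of exposures)."""
    return sum(e * (y - yh) ** 2 for y, yh, e in zip(y_true, y_pred, weights))


def weighted_logistic_loss(y_true, scores, weights):
    """Weighted negative log-likelihood, with pCTR = sigmoid(score)."""
    total = 0.0
    for y, s, e in zip(y_true, scores, weights):
        p = 1.0 / (1.0 + math.exp(-s))
        total -= e * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total


# Hypothetical toy data: observed CTRs, predictions, and exposure counts
ctr = [0.02, 0.10]
pred = [0.03, 0.05]
exposes = [1000, 10]
loss = weighted_square_loss(ctr, pred, exposes)
```

With these numbers the heavily exposed triplet dominates the cost even though its error is smaller, which is the intended effect of importance weighting.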

Handling Cold Start: Hierarchy Structure

Page hierarchy: Channel, Publisher, Country, Page
Ad hierarchy: Advertiser, Banner Type, Ad

Hierarchical structures serve as prior probability and a means for back-off

Hierarchical Learning

1. Regularization
   Intuition: Each latent factor should have a prior that makes it more similar to its parents

2. Fitting
   Idea: Reasonable prior of the parents can be obtained by aggregating click/expose data
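Both ideas can be sketched in a few lines of Python. The first function aggregates click/expose counts of children (for example pages) up to their parent (for example publisher); the second computes a squared-distance penalty that pulls each child's latent factors toward its parent's. The squared-distance form, the names, and the toy data are assumptions for illustration; the slides do not give the exact formulas.

```python
from collections import defaultdict


def aggregate_to_parent(records, parent_of):
    """Sum clicks and exposes of child entities per parent.

    records: iterable of (child, clicks, exposes); parent_of: dict child -> parent.
    Returns dict parent -> aggregated CTR.
    """
    clicks, exposes = defaultdict(int), defaultdict(int)
    for child, c, e in records:
        parent = parent_of[child]
        clicks[parent] += c
        exposes[parent] += e
    return {par: clicks[par] / exposes[par] for par in exposes if exposes[par] > 0}


def hierarchical_penalty(V, parent_of, lam):
    """lam * sum over children of ||v_child - v_parent||^2 (assumed form)."""
    total = 0.0
    for child, parent in parent_of.items():
        total += sum((vc - vp) ** 2 for vc, vp in zip(V[child], V[parent]))
    return lam * total


# Hypothetical toy data
parent_of = {"page1": "pubA", "page2": "pubA", "page3": "pubB"}
records = [("page1", 2, 100), ("page2", 0, 5), ("page3", 9, 300)]
parent_ctr = aggregate_to_parent(records, parent_of)

V = {"page1": [1.0, 0.0], "page2": [0.0, 0.0], "page3": [0.5, 0.5],
     "pubA": [0.0, 0.0], "pubB": [0.5, 0.5]}
penalty = hierarchical_penalty(V, parent_of, lam=0.1)
```

Here page2 has only 5 exposes and no clicks, so its own CTR is unreliable, while the aggregated publisher-level CTR gives a more stable value to back off to.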

Hierarchical + Importance-Aware Learning

Weighted update + hierarchical regularization, applied to:
• Stochastic gradient descent (square and logistic loss)
• Coordinate descent (cyclic and stochastic)
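As a rough illustration of a weighted update with hierarchical regularization, the sketch below performs one stochastic gradient step on the square loss for a single (page, ad, day) triplet. It assumes the pairwise model with biases and K latent factors per entity, an exposure weight on the error, and a squared-distance pull of each factor toward its parent's factor. These choices, the hyperparameters, and all names are assumptions; the coordinate descent variants are not shown.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def predict(w0, w, V, p, a, d):
    """Pairwise model: biases plus page-ad, page-day, ad-day interactions."""
    return (w0 + w[p] + w[a] + w[d]
            + dot(V[p], V[a]) + dot(V[p], V[d]) + dot(V[a], V[d]))


def sgd_step(w0, w, V, triplet, y, weight, parent_factor, lr=0.01, lam=0.1):
    """One weighted SGD step on the square loss; updates w and V in place.

    weight: importance of the triplet (no. of exposures).
    parent_factor: dict entity -> parent's latent vector; entities without
    an entry are shrunk toward zero. Returns the updated global bias.
    """
    p, a, d = triplet
    err = predict(w0, w, V, p, a, d) - y
    g = 2.0 * weight * err  # derivative of weight * err^2 w.r.t. the prediction
    K = len(V[p])
    # The prediction's gradient w.r.t. one entity's factors is the sum of the other two.
    others = {p: (a, d), a: (p, d), d: (p, a)}
    old = {e: list(V[e]) for e in (p, a, d)}  # use pre-update values throughout
    for e, (o1, o2) in others.items():
        prior = parent_factor.get(e, [0.0] * K)
        for k in range(K):
            grad = g * (old[o1][k] + old[o2][k]) + 2.0 * lam * (old[e][k] - prior[k])
            V[e][k] = old[e][k] - lr * grad
        w[e] -= lr * g
    return w0 - lr * g


# Hypothetical toy parameters
w = {"page1": 0.0, "ad1": 0.0, "mon": 0.0}
V = {"page1": [0.1, 0.1], "ad1": [0.1, -0.1], "mon": [0.2, 0.0]}
w0, y = 0.0, 0.05
triplet = ("page1", "ad1", "mon")

before = predict(w0, w, V, *triplet)
w0 = sgd_step(w0, w, V, triplet, y, weight=10, parent_factor={})
after = predict(w0, w, V, *triplet)
```

In practice the learning rate would need to be tuned against the scale of the weights, since large exposure counts multiply the gradient directly.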

PART 4: EXPERIMENTS

Experiment Setup

Validation:

Trial       Training set    Test set
Trial 1     05-11 Oct       12-13 Oct
Trial 2     07-13 Oct       14-15 Oct
Trial 3     09-15 Oct       16-17 Oct
Trial 4     11-17 Oct       18-19 Oct
Trial 5     13-19 Oct       20-21 Oct
Trial 6     15-21 Oct       22-23 Oct
Trial 7     17-23 Oct       24-25 Oct
Trial 8     19-25 Oct       26-27 Oct
Trial 9     21-27 Oct       28-29 Oct
Trial 10    23-29 Oct       30-31 Oct

Evaluation metrics:
1. Prediction metric:
   • Root mean square error
   • Negative log-likelihood
2. Ranking metric:
   • Area under ROC curve
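The ten validation trials follow a sliding-window pattern: 7 training days followed by 2 test days, with the window shifted by 2 days per trial. A small Python sketch that reproduces these date ranges is shown below; the function name and parameter names are illustrative, and the window lengths are read off the trial dates rather than stated explicitly on the slide.

```python
from datetime import date, timedelta


def sliding_trials(first_day, n_trials, train_days=7, test_days=2, step_days=2):
    """Yield (train_start, train_end, test_start, test_end) for each trial."""
    for i in range(n_trials):
        train_start = first_day + timedelta(days=i * step_days)
        train_end = train_start + timedelta(days=train_days - 1)
        test_start = train_end + timedelta(days=1)
        test_end = test_start + timedelta(days=test_days - 1)
        yield train_start, train_end, test_start, test_end


trials = list(sliding_trials(date(2012, 10, 5), n_trials=10))
```

Each test window lies strictly after its training window, so the evaluation always predicts forward in time.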

Performance Benchmark (min. exposure emin = 1000)

[Figure, two panels comparing the baseline with the proposed methods. Regression: pCTR prediction error (wRMSE). Ranking: ad ranking (wAUC).]
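The slides name wRMSE without defining it. Assuming it denotes a root mean square error in which each triplet is weighted by its importance (for example its number of exposures), a minimal Python sketch would be the following; the function name and toy values are hypothetical.

```python
import math


def weighted_rmse(y_true, y_pred, weights):
    """sqrt( sum_i w_i * (y_i - yhat_i)^2 / sum_i w_i )."""
    num = sum(w * (y - yh) ** 2 for y, yh, w in zip(y_true, y_pred, weights))
    return math.sqrt(num / sum(weights))


# Hypothetical toy data: observed CTRs, predictions, and weights
score = weighted_rmse([0.0, 0.1], [0.1, 0.1], [1, 3])
```

With all weights equal, this reduces to the ordinary root mean square error.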

Performances in Cold Start Situations (min. exposure emin = 1000)

Observations:
1) Results improve when using hierarchical regularization and fitting
2) Results for importance-aware and unweighted learning are not directly comparable

PART 5: CONCLUSION

Main Contributions
• A unified tensor factorization model catering for temporal dynamics, importance-aware and hierarchical learning
• Simple yet effective extensions of SGD and CD algorithms for handling importance weights and hierarchical regularization
• Practical usage and result improvements showcased on real mobile advertising data

Future Works
• Need to scale up with parallelization or more efficient data representation
• Enhance interpretability of the model, e.g., non-negativity constraints in the latent factors
• Explore model applications in other tasks, e.g., item adoption

Variations at Different Hierarchy Levels

[Figure panels: Page x advertiser, Channel x advertiser, Publisher x ad, Channel x ad]

Observations:
1) CTR values vary across different days
2) CTR variations increase as we go up in the hierarchy → possibility for "back-off"

Performance Benchmark
[Figures for min. exposure emin = 10, emin = 100, and emin = 1000]

Performances in Cold Start Situations
[Figures for min. exposure emin = 100 and emin = 10]
