Dynamic Tournament Design: An Application to Prediction Contests∗
Jorge Lemus†
Guillermo Marshall‡
May 14, 2018
Abstract
Online contests have become a prominent form of innovation procurement. How do elements of contest design shape players’ incentives throughout the competition? Does a real-time leaderboard improve outcomes? We provide two complementary approaches to answer these questions. First, we build a tractable dynamic model of competition and estimate it using data from the online platform Kaggle. We evaluate contest outcomes under counterfactual contest designs, which modify information disclosure, allocation of prizes, and participation restrictions. And second, we present experimental evidence from student competitions hosted on Kaggle. Our main finding is that a public leaderboard improves contest outcomes: it increases the total number of submissions and the score of the best submission.
Keywords: Dynamic contest, contest design, dynamic games, Kaggle, big data.
JEL codes: C51, C57, C72, O31.
∗ We thank participants and discussants at the Conference on Internet Commerce and Innovation (Northwestern), IIOC 2017, Rob Porter Conference (Northwestern), Second Triangle Microeconomics Conference (UNC), University of Georgia, Cornell (Dyson), and University of Technology Sydney for helpful suggestions. Approval from the University of Illinois Human Subjects Committee, IRB18644.
† University of Illinois at Urbana-Champaign, Department of Economics; [email protected]
‡ University of Illinois at Urbana-Champaign, Department of Economics; [email protected]
1 Introduction
Online competitions have become a valuable resource for government agencies and private companies to procure innovation. For instance, U.S. government agencies have sponsored over 830 competitions that have awarded over $250 million in prizes for software, ideas, or designs through the website www.challenge.gov; e.g., DARPA sponsored a $500,000 competition to accurately predict cases of chikungunya virus.[1] In the UK, the website www.datasciencechallenge.org was created to “drive innovation that will help to keep the UK safe and prosperous in the future.” Multiple platforms allow private companies to sponsor online competitions.[2] Given that the design of online competitions varies across platforms, several economic questions arise. How are players’ incentives shaped by the design of a competition? Does a real-time public leaderboard encourage or discourage participation? Is a winner-takes-all competition better than one that allocates multiple prizes? Our contribution is to empirically study how contest design affects players’ incentives. Although the literature on contest design has advanced our knowledge of static settings, research on dynamic contest design with heterogeneous players is still limited. We advance this literature by presenting results from two complementary approaches. First, we build and estimate a tractable structural model using data on online competitions. And second, we run a randomized control trial to provide an answer, independent of our modeling assumptions, to the question of how contest design impacts competition outcomes. Our setting is Kaggle,[3] an online platform that hosts prediction contests, i.e., competitions where the winner is the player with the most accurate prediction of some random variable.[4] We use Kaggle to create and host 44 student competitions for our randomized control trial; additionally, we use data from 57 large Kaggle competitions to estimate the structural parameters of our model.
Participants in Kaggle competitions have access to a training dataset and a test dataset. An observation in the training dataset includes both an outcome variable and covariates;
[1] http://www.darpa.mil/news-events/2015-05-27
[2] Examples include CrowdAnalytix, Tunedit, InnoCentive, Topcoder, HackerRank, and Kaggle.
[3] https://www.kaggle.com/
[4] For instance, IEEE sponsored a $60,000 contest to diagnose schizophrenia; The National Data Science Bowl sponsored a $175,000 contest to identify plankton species from multiple images.
whereas the test dataset only includes covariates. A valid submission must include a prediction of the outcome variable for each observation in the test dataset. To avoid overfitting, Kaggle partitions the test dataset into two subsets and does not inform participants which observations belong to each subset. The first subset is used to generate a public score that is posted in real time on a public leaderboard. The second subset is used to generate a private score that is never made public during the contest; it is revealed only at the end of the competition. Prizes are awarded according to the private score ranking. Thus, the public score, which is highly correlated with the private score, provides a noisy signal about performance.[5] In Kaggle competitions, players can submit multiple solutions and observe in real time a public leaderboard that displays the public score of each submission throughout the contest.[6] Modeling this class of dynamic contests poses some technical challenges. First, participants’ final standings are uncertain, because the public-score ranking provides only a noisy signal of the private-score ranking. Players thus need to keep track of the complete public history to compute the benefit of an extra submission, and a state space that keeps track of the complete public history is computationally intractable. Second, each competition has a large number of heterogeneous participants sending thousands of submissions. An analytic solution for the equilibrium of a dynamic model with heterogeneous players is cumbersome and computationally expensive. Our descriptive evidence indicates that each player sends multiple submissions and that players are heterogeneous in their ability to produce high scores. To capture these features in our model, we assume that players work on at most one submission at a time, and that a player’s type determines the distribution from which scores are drawn.
After entering the contest, a player draws a cost from a distribution, which represents the cost of making a new submission. The player then decides whether to make a new submission or to exit the competition. To make this decision, the player compares the expected payoff of a new submission, net of its cost, with the payoff of finishing the competition with her current set of submissions. If the player decides to make a new submission, then she works on that submission (and only that submission) for a random amount of time. Immediately after the submission is completed, the submission is evaluated, and its public score is revealed on the public leaderboard.[7] At this point, and after observing the public leaderboard, the player draws a new cost and again decides whether to make a new submission or to quit. In computing the benefit of a new submission, a player considers the chances of winning a prize at the end of the contest given the current public leaderboard, her type, the current scores, and the expected number of rival submissions that will arrive before the end of the contest; more rival submissions lower the player’s chance of winning a prize. To deal with the problem of a computationally unmanageable state space, we assume that players are small (i.e., a player’s belief about how many rival submissions will arrive in the future is unaffected by the action of sending a new submission), and we limit the amount of information that players believe is relevant for computing their chances of winning the contest. Under these assumptions, our model can be tractably estimated and used to generate a series of counterfactual contest designs. Our counterfactual simulations show that different contest designs have heterogeneous effects on the incentives to make a submission and on contest outcomes. We evaluate counterfactual contest designs using several performance indicators: the total number of submissions, the number of submissions by player type, and the upper-tail statistics of the score distribution. We find that information disclosure has an economically significant effect on both the number and the quality of submissions. Without a public leaderboard, the number of submissions would decrease on average by 23.6 percent. This decline is explained mostly by a reduction in the number of submissions by high-type players. Consistent with this finding, the maximum score decreases by 1.3 percent without a leaderboard.
[5] In our data, the correlation between public and private scores is 0.99, and 79 percent of contest winners finish in the top 3 of the public leaderboard.
[6] Other websites (e.g., www.datasciencechallenge.org and www.drivendata.org) share these features.
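The per-submission decision process described above can be sketched as a simple stopping rule. This is a stylized illustration only: the cost distribution, the benefit function, and the score distribution below are hypothetical placeholders, not the paper’s estimated model.

```python
import random

def simulate_player(expected_prize, cost_mean=0.1, max_iters=200, seed=0):
    """Stylized sketch of the submission/quit decision (hypothetical forms).

    Each round the player draws a cost of producing a new submission and
    compares it with a hypothetical expected benefit: the prize at stake
    times the chance that a fresh score draw improves on her current best.
    She quits the first time the cost draw exceeds the expected benefit.
    """
    rng = random.Random(seed)
    best_score, n_submissions = 0.0, 0
    for _ in range(max_iters):
        cost = rng.expovariate(1.0 / cost_mean)        # cost draw for a new submission
        benefit = expected_prize * (1.0 - best_score)  # hypothetical expected gain
        if benefit < cost:
            break                                      # exit the competition
        n_submissions += 1
        best_score = max(best_score, rng.random())     # public score of the new submission
    return n_submissions, best_score
```

The stopping rule captures the qualitative mechanism: a larger prize or a weaker current position raises the value of one more submission, while a high cost draw triggers exit.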
Reducing the noise between public and private scores, limiting the number of players, or changing the number of prizes has smaller effects on competition performance. First, a fully informative leaderboard increases the number of submissions by 3.3 percent and the maximum score by 0.2 percent. Second, limiting entry to 90 percent of the actual participants reduces the number of submissions by 8.2 percent and the maximum score by 0.2 percent. And third, whether a single prize or multiple prizes are awarded does not have a significant effect on any of our indicators of contest performance.
[7] We do not model the choice of keeping a submission secret. As we explain in Section 2, the evidence does not indicate that players are strategic in the timing of their submissions.
Finally, the results from our randomized control trials corroborate the main finding of our structural estimates. We created and hosted 44 student competitions on Kaggle to study how a public leaderboard impacts incentives and contest outcomes.[8] Half of the competitions were randomly assigned to the control group (i.e., no public leaderboard) and the other half to the treatment group (i.e., public leaderboard), with competitions being otherwise identical. The results suggest that displaying a public leaderboard has a significant and positive effect on both the number of submissions and the maximum score. These “model free” results provide further evidence that a public leaderboard improves competition outcomes.
1.1 Related Literature
Contests are a widely used open innovation mechanism (Chesbrough et al., 2006). They attract talented individuals with diverse backgrounds (Jeppesen and Lakhani, 2010; Lakhani et al., 2013) and procure a diverse set of solutions (Terwiesch and Xu, 2008). An extensive literature on static contests has focused on design features such as the number and allocation of prizes and the number of participants. Work on the optimal allocation of prizes includes Lazear and Rosen (1981), Taylor (1995), Moldovanu and Sela (2001), Che and Gale (2003), Cohen et al. (2008), Sisak (2009), Olszewski and Siegel (2015), Kireyev (2016), Xiao (2016), Strack (2016), and Balafoutas et al. (2017). This literature, surveyed by Sisak (2009), finds that the convexity of the cost of effort plays an important role in determining the optimal allocation of prizes. Taylor (1995) and Fullerton and McAfee (1999), among others, show that restricting the number of competitors in winner-takes-all tournaments increases the equilibrium level of effort. Intuitively, players have weaker incentives to exert costly effort when they face many competitors, because each has a smaller chance of winning. In dynamic settings, the role of information disclosure and feedback has only recently been explored. Aoyagi (2010) compares the provision of effort by agents in a dynamic tournament under full disclosure of information (i.e., players observe their relative position) versus no information disclosure. Ederer (2010) adds private information to this setting, whereas Klein and Schmutzler (2016) add different forms of performance evaluation. Goltsman and Mukherjee (2011) study when to disclose workers’ performance. Other recent articles studying dynamic contest design include Halac et al. (2014), Bimpikis et al. (2014), Benkert and Letina (2016), and Hinnosaar (2017). Some authors have analyzed design tools other than prizes, limited entry, or feedback. Megidish and Sela (2013) consider contests that require players to exert an exogenously given minimal level of effort to participate. They show that a single prize is dominated by giving each participant an equal share of the prize when the required minimal level of effort is high. Moldovanu and Sela (2006) show that it is optimal to split competitors into two divisions when the number of competitors is large: in the first round, participants compete within each division, and in the second round the division winners compete to determine the final winner. A growing empirical literature on contests includes Boudreau et al. (2011), Genakos and Pagliero (2012), Takahashi (2015), Boudreau et al. (2016), Bhattacharya (2016), and Zivin and Lyons (2018). Gross (2015) studies how the number of participants changes the incentives for creating novel solutions versus marginally better ones. In a static environment, Kireyev (2016) uses an empirical model to study how elements of contest design affect participation and the quality of outcomes. Huang et al. (2014) estimate a dynamic structural model to study individual behavior and outcomes in a platform where individuals can contribute ideas, some of which will be implemented at the end of the contest; that paper focuses on learning the value of ideas rather than on contest design.
[8] All of the participants were University of Illinois at Urbana-Champaign students.
Finally, Gross (2017) studies how performance feedback impacts participation in design contests, but the analysis abstracts away from the dynamics of competition: stopping decisions are based on each player’s past outcomes and not on a dynamic leaderboard. This is in contrast with our paper, where we allow for sequential participation and dynamic feedback based on other competitors’ results. Also related to our paper is the “gamification” literature, which studies the application of game-design elements (e.g., leaderboards) to areas such as education, marketing, health, and labor markets, among others. Most of this research is conducted with experiments. Landers and Landers (2014) show that adding a leaderboard improves “time-on-task” in an education setting. Landers et al. (2017) show that a leaderboard motivates agents to set more ambitious goals. Athanasopoulos and Hyndman (2011) find that a leaderboard improves forecasting accuracy. The literature on effort provision for non-pecuniary motives is also related: Lerner and Tirole (2002) argue that good-quality contributions are a signal of ability to potential employers, and Moldovanu et al. (2007) study a setting where status motivates participation. Finally, it is possible to draw a parallel between a contest and an auction. While there is a well-established empirical literature on bidding behavior in auctions (Hendricks and Porter, 1988; Li et al., 2002; Bajari and Hortacsu, 2003, among others), only a few papers analyze dynamic behavior in contests.
2 Background, Data, and Motivating Facts
2.1 Background and Data
To motivate and estimate our model, we use publicly available information on 57 featured competitions hosted by Kaggle.[9] These competitions offered a monetary prize of at least $1,000, received at least 1,000 submissions, used between 10 and 90 percent of the test dataset to generate public scores, and evaluated submissions according to a well-defined rule. In these competitions, there was an average of 894 teams per contest, competing for rewards that ranged between $1,000 and $500,000 and averaged $30,489. A partial list of competition characteristics is summarized in Table 1 (see Table A.1 in the Online Appendix for the full list). All of these competitions, with the exception of the Heritage Health Prize, granted prizes to the top three scores.[10] For example, in the Coupon Purchase Prediction competition, the three submissions with the highest scores were awarded $30,000, $15,000, and $5,000, respectively. Kaggle computes the public score and the private score by evaluating a player’s submission on two subsamples of a test dataset. For example, in the Heritage Health Prize, the
[9] https://www.kaggle.com/kaggle/meta-kaggle
[10] The following contests also granted a prize to the fourth position: Don’t Get Kicked!, Springleaf Marketing Response, and KDD Cup 2013 - Author Disambiguation Challenge (Track 2).
Name of the Competition | Total Reward | Number of Submissions | Teams | Start Date | Deadline
Heritage Health Prize | 500,000 | 23,421 | 1,221 | 04/04/2011 | 04/04/2013
Allstate Purchase Prediction Challenge | 50,000 | 24,526 | 1,568 | 02/18/2014 | 05/19/2014
Higgs Boson Machine Learning Challenge | 13,000 | 35,772 | 1,785 | 05/12/2014 | 09/15/2014
Acquire Valued Shoppers Challenge | 30,000 | 25,138 | 952 | 04/10/2014 | 07/14/2014
Liberty Mutual Group - Fire Peril Loss Cost | 25,000 | 14,751 | 634 | 07/08/2014 | 09/02/2014
Driver Telematics Analysis | 30,000 | 36,065 | 1,528 | 12/15/2014 | 03/16/2015
Crowdflower Search Results Relevance | 20,000 | 23,237 | 1,326 | 05/11/2015 | 07/06/2015
Caterpillar Tube Pricing | 30,000 | 23,834 | 1,187 | 06/29/2015 | 08/31/2015
Liberty Mutual Group: Property Inspection Prediction | 25,000 | 40,594 | 2,054 | 07/06/2015 | 08/28/2015
Coupon Purchase Prediction | 50,000 | 18,477 | 1,076 | 07/16/2015 | 09/30/2015
Springleaf Marketing Response | 100,000 | 34,861 | 1,914 | 08/14/2015 | 10/19/2015
Homesite Quote Conversion | 20,000 | 28,571 | 1,334 | 11/09/2015 | 02/08/2016
Prudential Life Insurance Assessment | 30,000 | 42,336 | 2,452 | 11/23/2015 | 02/15/2016
Santander Customer Satisfaction | 60,000 | 93,031 | 5,117 | 03/02/2016 | 05/02/2016
Expedia Hotel Recommendations | 25,000 | 22,709 | 1,974 | 04/15/2016 | 06/10/2016

Table 1: Summary of the Competitions in the Data (Partial List)
Note: The table only considers submissions that received a score. The total reward is measured in US dollars at the moment of the competition. See Table A.1 in the Online Appendix for the complete list of competitions.
test data was divided into a 30 percent subsample to compute the public scores and a 70 percent subsample to compute the private scores. Kaggle discloses the percentage of the data in each subsample, but players do not know which observations belong to each, which creates imperfect correlation between public and private scores. All the competitions we consider display, in real time, a public leaderboard containing the public score of every submission made up to each point in time. Because public scores are calculated using only part of the test dataset (e.g., 30 percent in the Heritage Health Prize competition), players’ final standings may differ from the standings displayed on the public leaderboard. Although the correlation between public and private scores is very high in our sample (the coefficient of correlation is 0.99), the rankings in the public and private leaderboards may diverge. Hence, the public leaderboard provides informative, yet noisy, signals about the performance of all players throughout the contest. In about 79 percent of the competitions, the winner finished within the top three of the final public leaderboard (see Table A.2 in the Online Appendix).
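A minimal sketch of this public/private scoring scheme, assuming a hidden random partition of the test set and using accuracy as a stand-in metric (the function name and the metric are illustrative, not Kaggle’s actual implementation):

```python
import random

def split_scores(predictions, truth, public_share=0.3, seed=42):
    """Illustrative sketch of public/private scoring (names hypothetical).

    A hidden random subset of the test observations (here 30 percent, as in
    the Heritage Health Prize) generates the public score shown on the
    leaderboard; the remaining observations generate the private score used
    to award prizes. Accuracy is used as the metric purely for illustration.
    """
    rng = random.Random(seed)                  # fixed partition, hidden from players
    idx = list(range(len(truth)))
    rng.shuffle(idx)
    cut = int(public_share * len(idx))
    public_idx, private_idx = idx[:cut], idx[cut:]
    acc = lambda ids: sum(predictions[i] == truth[i] for i in ids) / len(ids)
    return acc(public_idx), acc(private_idx)
```

Because both scores are computed from random subsets of the same test set, they are highly correlated but not identical, which is the source of the noise discussed in the text.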
2.2 Motivating Facts
Our modeling choices are guided by a series of empirical facts. For each contest, we observe information on all submissions, including when they were made (time of submission), who made them (team identity), and their scores (public and private). Using this information, we reconstruct both the public and the private leaderboard at every instant of time. To make comparisons across contests, we normalize the contest length and the total prize to one, and we standardize public and private scores. In each competition, the score heterogeneity at the lower end of the distribution is driven by participants who may not be trying to win the competition, but are instead participating for non-pecuniary motives such as learning or recreation. Given that we are interested in modeling competitive players, those who are trying to win the competition, we divide teams into two categories: “competitive” and “non-competitive.” Competitive teams are defined as teams that obtain scores above the 75th percentile of the score distribution in a competition.[11] Table 2 presents summary statistics at the competition, team, and submission levels. Panel A presents summary statistics at the competition level: on average, there are 893.7 teams per competition, the reward is about $30,489, and competitions last about 81.69 days. Panels B and C show summary statistics for all teams and for competitive teams, respectively. About 25 percent of teams are competitive, and these teams send an average of 40 submissions per competition, which exceeds the overall sample average of 16.531 submissions. A competitive team has on average 1.2 members, which is not significantly different from the average team size in the full sample. Panels D and E present summary statistics for all submissions and for submissions by competitive teams, respectively.
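The normalization and the competitive-team classification can be sketched as follows (a hypothetical implementation with made-up column names, not the authors’ code):

```python
import pandas as pd

def standardize_and_flag(df):
    """Sketch of the cross-contest normalization (column names hypothetical).

    Scores are standardized within each contest, submission times are rescaled
    to the unit interval, and teams whose best public score exceeds their
    contest's 75th percentile are flagged as competitive.
    """
    g = df.groupby("contest")
    df = df.assign(
        score_std=(df["public_score"] - g["public_score"].transform("mean"))
        / g["public_score"].transform("std"),
        time_norm=(df["time"] - g["time"].transform("min"))
        / (g["time"].transform("max") - g["time"].transform("min")),
    )
    best = df.groupby(["contest", "team"])["public_score"].transform("max")
    q75 = df.groupby("contest")["public_score"].transform(lambda s: s.quantile(0.75))
    return df.assign(competitive=best > q75)
```

Standardizing within contest is what allows scores from contests with very different metrics (log loss, AUC, RMSE) to be pooled in the summary statistics below.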
Public and private scores are standardized and range between -4 and 5.659 and between -4 and 5.432, respectively. Competitive teams obtain higher public and private scores on average: the average public score is 0.358 for competitive teams versus 0.004 in the overall sample. Competitive teams also present significant variation in their scores (standard deviation of 0.75) and play more frequently than the rest of the teams: the average time between submissions is 1.2 percent of the contest time for competitive teams versus 1.5 percent for all teams.
[11] Table A.3 in the Online Appendix shows that competitive teams are more experienced: 63 percent participate in more than one competition.

Panel A: Competition-level statistics
Variable | N | Mean | St. Deviation | Min | Max
Number of teams | 57 | 893.702 | 963.081 | 79 | 5,117
Reward quantity | 57 | 30,488.596 | 66,736.377 | 1,000 | 500,000
Length (days) | 57 | 81.69 | 87.90 | 1 | 700

Panel B: Overall team-level statistics
Number of submissions | 50,941 | 16.531 | 29.538 | 1 | 665
Number of members | 50,941 | 1.127 | 0.604 | 1 | 40
Competitive team (indicator) | 50,941 | 0.247 | 0.431 | 0 | 1

Panel C: Team-level statistics, competitive teams
Number of submissions | 12,591 | 40.078 | 47.904 | 1 | 665
Number of members | 12,591 | 1.228 | 0.881 | 1 | 24

Panel D: Overall submission statistics
Public score | 842,089 | 0.004 | 0.991 | -4.000 | 5.659
Private score | 842,089 | 0.005 | 0.991 | -4.000 | 5.432
Time of submission | 842,089 | 0.601 | 0.289 | 0.000 | 1.000
Time between submissions | 791,146 | 0.015 | 0.053 | 0.000 | 0.998

Panel E: Submission statistics, competitive teams
Public score | 504,621 | 0.358 | 0.751 | -3.999 | 5.659
Private score | 504,621 | 0.355 | 0.749 | -4.000 | 5.432
Time of submission | 504,621 | 0.623 | 0.281 | 0.000 | 1.000
Time between submissions | 492,030 | 0.012 | 0.044 | 0.000 | 0.985

Table 2: Summary Statistics
Note: An observation in Panels D and E is a submission; an observation in Panels B and C is a team-competition combination; an observation in Panel A is a contest. Scores are standardized, and time is rescaled to be contained in the unit interval. Time between submissions is the time between two consecutive submissions by the same team. Competitive teams are teams that achieved a public score above the 75th percentile of a contest's final distribution of scores.

Observation 1. Most teams are composed of a single member.

Figure 1 shows the evolution of the number of submissions and teams over time. Panel (a) partitions all submissions into time intervals based on their submission time. The figure shows that the number of submissions increases over time: roughly 20 percent of submissions arrive when only 10 percent of the contest time remains, while just 6 percent of submissions arrive in the first 10 percent of the contest time. Panel (b) shows the timing of entry of new teams into the competition. The rate of entry is roughly constant over time, with about 20 percent of teams making their first submission when 20 percent of the contest time remains.

[Figure 1: Submissions and Entry of Teams Over Time Across All Competitions. Note: An observation is a submission. Panel (a) shows a histogram of submissions by elapsed-time categories. Panel (b) shows a local polynomial regression of the share of teams with one or more submissions as a function of time.]

Observation 2. New teams enter at a constant rate throughout the contest.

We also explore the time between submissions at the team level. Figure 2 shows a local polynomial regression of the average time between submissions as a function of time. The average time between submissions increases over time, suggesting either that teams experiment when they enter the contest or that building a new submission becomes increasingly difficult over time. Combined, Figures 1 and 2 suggest that the increase in submissions at the end of contests is driven not by all teams submitting at a faster pace, but by there being more active teams at the end of the contest and potentially stronger incentives to play.

[Figure 2: Time Between Submissions. Note: An observation is a submission. The figure shows a local polynomial regression of the time between submissions as a function of time.]

Observation 3. The rate of arrival of submissions increases with time.

Table 3 decomposes the variance of public scores using a regression analysis. In column 1, we find that 49 percent of the variation in public scores is between-team variation, suggesting that teams differ systematically in the scores they achieve. In column 2, we control for the number of submissions that a team has made up to the time of each submission (e.g., the variable takes the value n - 1 for a team's nth submission). This variable allows us to capture whether learning can explain some of the variation in scores. Column 2 shows that later submissions obtain higher scores, but this control explains only an extra 2.3 percent of the variance in scores. This suggests that while learning may be present, between-team variation explains the majority of the systematic variation in scores. Columns 3 and 4 repeat the analysis for competitive teams. In this restricted sample, teams are more homogeneous, so team fixed effects explain less of the variation than in the whole sample.
Dependent variable: Public Score
Variable | (1) All teams | (2) All teams | (3) Competitive teams | (4) Competitive teams
Submission number | | 0.0047*** (0.0000) | | 0.0041*** (0.0000)
Competition × Team FE | Yes | Yes | Yes | Yes
Observations | 833,970 | 833,970 | 504,410 | 504,410
R2 | 0.490 | 0.513 | 0.226 | 0.270

Table 3: Decomposing the Public Score Variance
Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. An observation is a submission. Submission number is defined at the competition-team-submission level and measures the number of submissions made by a team up to the time of a submission.
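The between-team share of variance in column 1 can be computed without a regression package: with only contest-by-team fixed effects, the fitted value of each submission is its team’s mean score, so the regression R-squared equals the between-team share of score variance. A sketch, with hypothetical column names:

```python
import pandas as pd

def between_team_r2(df):
    """Sketch of the Table 3, column 1 decomposition (column names hypothetical).

    Regressing scores on contest-by-team fixed effects alone yields an
    R-squared equal to the between-team share of variance: the fitted value
    for each submission is simply its team's mean score.
    """
    fitted = df.groupby(["contest", "team"])["score"].transform("mean")
    resid = df["score"] - fitted                       # within-team deviations
    sst = ((df["score"] - df["score"].mean()) ** 2).sum()
    return 1.0 - (resid ** 2).sum() / sst
```

The statistic is 1 when each team always scores the same (pure between-team variation) and 0 when all teams share the same mean (pure within-team variation).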
Observation 4. Teams systematically differ in their ability to produce high scores.

To understand how the public leaderboard shapes incentives to participate, we regress an indicator for whether a given submission was a team's last submission on the distance between the team's best public score up to that time and the best public score across all teams up to that time. Table 4 (column 1) shows that teams are more likely to drop out of the competition when they start falling behind on the public leaderboard: a one standard deviation increase in a team's deviation from the maximum public score at time t is associated with a 2.84 percentage point increase in the likelihood of the team dropping out at time t. Column 2 explores whether this result differs between competitive and non-competitive teams, and shows that competitive teams are less discouraged by falling behind than non-competitive teams.

Dependent variable: Last submission (indicator)
Variable | (1) | (2)
Deviation from max public score (standardized) | 0.0284*** (0.0003) | 0.0138*** (0.0005)
Deviation × Competitive | | -0.0156*** (0.0005)
Competitive team | | -0.0646*** (0.0009)
Competition FE | Yes | Yes
Observations | 842,089 | 842,089
R2 | 0.020 | 0.040
p-value F-test | | 0.0000

Table 4: Indicator for Last Submission as a Function of a Team's Deviation from the Maximum Public Score
Note: Robust standard errors in parentheses. * (p < 0.1), ** (p < 0.05), *** (p < 0.01). Deviation from max public score is the difference between the maximum public score and the score of a submission, at the time of that submission. We standardize this variable using its competition-level standard deviation. See Table 2 for the definition of competitive team.

In Table 5, we analyze how the incentives to make a new submission are affected by a submission that increases the maximum public score by a sufficient amount (0.01 in our analysis). We call such a submission disruptive. To measure how a disruptive submission affects the incentives to make new submissions, we first partition time into intervals of length 0.001 and compute the number of submissions in each interval. We then compare the number of submissions before and after the arrival of the disruptive submission, restricting attention to periods within 0.05 time units of the disruptive submission. Column 1 in Table 5 shows that the number of submissions decreases immediately after a disruptive submission by an average of 2.24 percent. We take this as further evidence that players use the public leaderboard in their decisions to continue or to quit. Column 2 shows a positive discouragement effect for non-competitive teams and a near-zero discouragement effect for competitive teams. Column 3 repeats the exercise using instead an indicator for teams that ended in the top 50, and shows similar results. Column 4 uses an indicator for whether the team ended in the top 10, and shows that top 10 teams are encouraged by a disruptive submission (i.e., they increase their number of submissions after a disruptive submission). Table 5 complements Table 4 in showing that the leaderboard shapes participation incentives and that its effects are heterogeneous across players.
Dependent variable: Number of submissions (in logs)
Variable | (1) | (2) | (3) | (4)
After disruptive submission | -0.0224*** (0.0081) | -0.0373*** (0.0095) | -0.0506*** (0.0107) | -0.0395*** (0.0097)
After × Competitive | | 0.0196 (0.0129) | |
After × Top 50 | | | 0.0586*** (0.0157) |
After × Top 10 | | | | 0.0885*** (0.0195)
Competition FE | Yes | Yes | Yes | Yes
Observations | 21,545 | 37,666 | 36,701 | 31,657
R2 | 0.819 | 0.729 | 0.637 | 0.694
p-value F-test | | 0.0522 | 0.4926 | 0.0045

Table 5: The Impact of Disruptive Submissions on Participation
Note: Robust standard errors in parentheses. * (p < 0.1), ** (p < 0.05), *** (p < 0.01). Disruptive submissions are those that increase the maximum public score by at least 0.01. Number of submissions is the number of submissions in time intervals of length 0.001. The regressions restrict the sample to within 0.05 time units before and after the disruptive submission. All specifications control for time and time squared. See Table 2 for the definition of competitive team. Top 50 and Top 10 are indicators for whether the team ended the competition within the top 50 and top 10 participants, respectively.
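The before-and-after comparison around disruptive submissions can be sketched as follows (illustrative only; column names are hypothetical, and only the first disruptive submission in the sample is used here):

```python
import numpy as np
import pandas as pd

def event_study_counts(subs, jump=0.01, binwidth=0.001, window=0.05):
    """Sketch of the disruptive-submission comparison (columns hypothetical).

    `subs` holds one row per submission, sorted by normalized 'time', with a
    'public_score' column. A submission is disruptive if it raises the running
    maximum public score by at least `jump`. Submissions are counted in bins
    of length `binwidth` within `window` time units of the first disruptive
    submission, and mean counts before and after are returned.
    """
    running_max = subs["public_score"].cummax()
    disruptive = running_max.diff().fillna(0.0) >= jump
    if not disruptive.any():
        return None
    t0 = subs.loc[disruptive, "time"].iloc[0]
    near = subs[(subs["time"] >= t0 - window) & (subs["time"] <= t0 + window)]
    bins = np.floor((near["time"] - t0) / binwidth).astype(int)
    counts = near.groupby(bins).size()
    before = counts[counts.index < 0].mean()
    after = counts[counts.index >= 0].mean()
    return before, after
```

The paper's regressions add competition fixed effects and time controls on top of this raw comparison; the sketch only conveys the binning-and-windowing logic.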
Observation 5. The public leaderboard shapes participation incentives. This effect is heterogeneous among players.

Players might strategically choose when to release a disruptive submission if they knew in advance that a submission would be disruptive. In that case, teams would have an incentive to release a disruptive submission as late as possible in the competition, to avoid encouraging players who are capable of generating good scores (Column 4 in Table 5). Empirically, however, we do not find this effect. Figure 3 plots the timing of submissions that increased the maximum public score by at least 0.01. In the figure we restrict attention to submissions made during the final 75 percent of the contest time, because score processes are noisier earlier in contests. The figure suggests that disruptive submissions arrive uniformly over time, a pattern consistent with teams either not being strategic or not knowing when a submission will be disruptive. This may be driven by
the fact that teams only learn about the out-of-sample performance of a submission after Kaggle has evaluated it. That is, before making a submission, the teams can only evaluate the solution using the training data, which is not fully informative about its out-of-sample performance. Observation 6. Submissions that disrupt the public leaderboard are submitted uniformly over time.
[Figure: cumulative probability function; y-axis: Cumulative Probability (0 to 1), x-axis: Time of submission (0.2 to 1).]

Figure 3: Timing of Drastic Changes in the Public Leaderboard's Maximum Score (i.e., Disruptive Submissions): Cumulative Probability Functions
Note: An observation is a submission that increases the maximum public score by at least 0.01. The figure plots submissions that were made when at least 25 percent of the contest time had elapsed.
3 Empirical Model
We consider a contest in which a number of players enter over time. Observation 2 does not suggest that players strategically choose their time of entry; rather, they enter at a random time, possibly related to idiosyncratic shocks such as when they find out about the contest. We model the time of entry of a player as a random variable, τ_entry, drawn from an exponential distribution with parameter µ > 0. Players have heterogeneous ability (Observation 4),12 which is captured by the set of types Θ = {θ1, ..., θp}. The distribution of types, κ(θk) = Pr(θ = θk), k = 1, ..., p, is
12 We ignore team incentives and we treat each team as a single player (Observation 1).
known by all players. A player's type reflects the player's ability to produce good results in the competition. Players can send multiple submissions throughout the contest, but they can work on at most one submission at a time. We assume that the rate of arrival of submissions is constant (Observation 3) and that finishing a submission takes a random time τ distributed according to an exponential distribution with constant parameter λ. After the arrival of a submission, players immediately decide whether to continue playing or to quit forever. The cost of building a new submission, c ∈ [0, 1], is an independent draw from the distribution K(c) = c^σ. Figure 4 shows the timing of the game before the end of the competition at T.
[Figure: timeline from 0 to T, with entry time τ_entry ∼ exp(µ) realized at t1 and build time τ ∼ exp(λ) realized between t1 and t2.]

Figure 4: Timing of the game. A player enters at time t1. At this time, the player decides to continue playing. The next submission takes time t2 − t1 to arrive. At time t2 the player again decides to quit or to play again.
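The timing just described can be simulated directly. The sketch below is illustrative only: parameter magnitudes are loosely in the range of the estimates reported later, and the constant cutoff is a hypothetical stand-in for the state-dependent equilibrium stopping rule derived in this section.

```python
import random

def simulate_player(mu=2.5, lam=100.0, sigma=0.001, T=1.0, seed=0):
    """Simulate one player's timeline under the model's timing assumptions:
    entry time ~ exp(mu), submission build times ~ exp(lambda), and a fresh
    cost draw c from K(c) = c**sigma (inverse-CDF sampling: c = u**(1/sigma)).
    The constant threshold below is a hypothetical stand-in for the model's
    state-dependent equilibrium cutoff."""
    rng = random.Random(seed)
    t = rng.expovariate(mu)                 # entry time tau_entry ~ exp(mu)
    submission_times = []
    threshold = 0.5                         # hypothetical cutoff, illustration only
    while t < T:
        c = rng.random() ** (1.0 / sigma)   # cost draw from K(c) = c^sigma
        if c >= threshold:                  # quit forever if the cost is too high
            break
        t += rng.expovariate(lam)           # build time tau ~ exp(lambda)
        if t < T:                           # submission counts only if it lands by T
            submission_times.append(t)
    return submission_times
```

Each call returns the (ordered) arrival times of one player's submissions within [0, T]; a player who enters after T, or who draws a high cost at entry, contributes no submissions.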
We model the score of a submission as a type-dependent random variable. A player of type θ draws a public-private score pair (p_public,θ, p_private,θ) from a joint distribution Hθ. Players know the joint distribution Hθ, but they do not observe the realization (p_public,θ, p_private,θ); this pair of scores is private information of the contest designer. In the baseline case, the contest designer discloses, in real time, only the public score p_public,θ but not the private score p_private,θ. The final ranking, however, is constructed with the private scores.13 At the end of the contest, players are ranked according to their private scores, and the first j players in the ranking receive prizes of value V_P1 ≥ ... ≥ V_Pj; we normalize Σ_{i=1}^{j} V_Pi = 1.
The collection of pairs (identity, score) from the beginning of the contest until instant t forms the public leaderboard, denoted by L_t = {(identity, score)_j}_{j=1}^{J_t}, where J_t is the total number of submissions up to time t. Conditional on the terminal public history L_T, player i is able to compute p^final_{ℓ,i} = Pr(i's private ranking is ℓ | L_T), which is the probability of player i being ranked in position ℓ in the private leaderboard at the end of the contest, conditional on the final public leaderboard L_T.

The public leaderboard is just a noisy ranking. It is possible that p^final_{1,i} > 0 even if player i is ranked last in the public leaderboard, albeit this is a low-probability event. Hence, all of the information in the public leaderboard is relevant for deciding whether to play or to quit. Keeping track of the complete history of submissions, with as many as 15,000 submissions in some competitions, is computationally intractable. In contrast, in a competition with a fully informative ranking, players would only need to keep track of a single number (e.g., the current jth-highest public score) to make their investment decision, which can be captured by a low-dimensional state space. This is because in our model draws are independent (there is no effort choice), so players only need to "beat a benchmark." Conditional on a type, players outside the top j have the same incentives to play regardless of their distance to the leaders. For this reason, if there is no noise, the j highest public scores are a sufficient statistic for all players.

To overcome the computational difficulty posed by keeping track of the whole public history, we assume that p^final_{ℓ,i} > 0 for ℓ = 1, 2, 3 if and only if player i is among the three highest scores in the final public leaderboard. In other words, we assume that the final three highest private scores are a permutation of the final three highest public scores.
13 Players are allowed to send multiple submissions—each player sends about 20 submissions on average. However, the final ranking is computed with at most two submissions selected by each player. About 50 percent of the players do not make a choice, in which case Kaggle picks the two submissions with the largest public scores. Out of the 50 percent that do choose, 70 percent choose the two submissions with the highest public scores.
Table A.2 in the Online Appendix shows that in 79 percent of the contests that we study, the winner is among the three highest public scores, suggesting that this assumption is not too restrictive.14
14 This could be relaxed with more computational power.

Small and Myopic Players

There are thousands of submissions and players in each contest. Fully rational players would take into account the effect of their submissions on the strategies of rival players. However, solving analytically or computationally a dynamic model with fully rational and heterogeneous players turns out to be infeasible. As a simplification, we assume that players are small, i.e., they do not consider how their actions affect the
incentives of other players. This price-taking-like assumption is not unreasonable for our application, and it does not contradict Observations 5 and 6. Our model captures competitive effects through beliefs over the number of submissions that rival players will send in the competition. We derive these beliefs as an equilibrium object by equating the actual number of submissions in a contest with the realized number of submissions in the model. In the counterfactual simulations, we find these beliefs as a fixed point: given a belief about the total number of submissions, the model predicts a certain number of submissions, and in equilibrium these two quantities must coincide.15

In addition to assuming that players are small, we assume that when players decide to play or to quit, they expect more submissions in the future from rival players, but not from themselves. That is, myopic players think the current opportunity to play is their last one. Under this assumption players might still play multiple times; however, at each opportunity they believe they will never have a future opportunity to play, or that if they do, they will choose not to play. This assumption can be relaxed with more computational power.

State Space and Incentives to Play

The relevant state space is defined by three sets. First, we define the set of scores Y = {y = (y1, y2, y3) ∈ [−4, 6]^3 : y1 ≥ y2 ≥ y3}. Second, we define the set of score ownership RS = {∅, 1, 2, 3, (1, 2), (1, 3), (2, 3)}. An element r ∈ RS indicates which of the top 3 public scores (if any) belong to a player. And third, T = [0, 1] represents the contest's time. With a public leaderboard, y ∈ Y and t ∈ T are public information common to all players. Under the small-player assumption, the relevant state for each player is characterized by si = (t, ri, y) ∈ S ≡ T × RS × Y.
To be precise, s = (t, ri, y) ∈ S means that at time t the three scores on the leaderboard are given by y, and player i owns the components of vector y indicated by ri. For example, if player i's state is (t, (1, 3), (0.6, 0.25, 0.1)), then at time t player i owns components one and three of vector y; that is, two of the three highest public scores, 0.6 and 0.1, belong to player i.

The small-player assumption reduces the dimensionality of the state space, because players care only about the three highest public scores and which of them they
15 Similar assumptions are made in Bhattacharya (2016).
own. Players do not observe the private scores, but they are able to compute the conditional distribution of private scores given the set of public scores. Because prizes are allocated at the end of the contest, the payoff-relevant states are the terminal states s ∈ {T} × RS × Y. We denote by π(s) the payoff of a player at terminal state s. In vector notation, we denote the vector of terminal payoffs by π. We consider a finite grid of m values for the public scores, Y = {y^1, ..., y^m}. If a player of type θ decides to play and sends a new submission, the public score of that submission is distributed according to qθ(k) ≡ Pr(y = y^k | θ), k = 1, ..., m.

Although players are small, they form beliefs over the number of future submissions sent by their rivals. At time t, a player believes that with probability p_t(n) exactly n additional rival submissions will arrive before the end of the competition, with scores independently drawn from the distribution G, where Pr_G(y = y^k) = Σ_{θ∈Θ} κ(θ) qθ(k). We assume

p_t(n) = γ_t^n e^{−γ_t} / n!,   n = 0, 1, ...   (1)

Under this functional form, players believe that the expected number of remaining rival submissions at time t is γ_t. We impose γ_t to be a decreasing function of t.

To derive the expected payoff of sending an additional submission we proceed in two steps. First, we solve for the case in which a player thinks she is the last one to play, i.e., p_t(0) = 1, and then we solve for the belief p_t(n) given in Equation 1. Denote by B_t^θ(s) the expected benefit of building a new submission for a player of type θ at state s, when p_t(0) = 1. For clarification, consider the following example. A player of type θ is currently at a state s = (t, r = (1, 2), y = (y1, y2, y3)) and has an opportunity to play. If she plays and the new submission arrives before T (which happens with probability 1 − e^{−(T−t)λ}), the transition of the state depends on the score of the new submission ỹ. The state (r, y) can transition to (r′, y′) where: r′ = (1, 2) and y′ = (y1, y2, y3) when ỹ < y2;16 or r′ = (1, 2) and y′ = (y1, ỹ, y3) when y2 ≤ ỹ < y1; or r′ = (1, 2) and y′ = (ỹ, y1, y3) when y1 ≤ ỹ. More generally, we can repeat this exercise for all states s ∈ S and put all these transition probabilities in a |RS × Y| × |RS × Y| matrix denoted by Ωθ. Each row of this matrix corresponds to the probability distribution over states (r′, y′) starting from state (r, y), conditional on
16 See footnote 13.
the arrival of a new submission. If the new submission does not arrive, then there is no transition and the state remains (r, y). In matrix notation, where each row is a different state, the expected benefit of sending one extra submission is given by

B_t^θ = (1 − e^{−(T−t)λ}) Ωθ π + e^{−(T−t)λ} π.

Consider a given state s. With probability (1 − e^{−(T−t)λ}) the new submission arrives before the end of the contest. The score of that submission (drawn from qθ) determines the probability distribution over final payoffs, which is given by the s-row of the matrix Ωθ. The expected payoff is computed as (Ωθ)_{s•} · π, the dot product between the probability distribution over final states starting from state s and the payoff of each terminal state. With probability e^{−(T−t)λ} the new submission is not finished in time, and therefore the final payoff for the player is given by π_s (the transition matrix is the identity matrix). A player chooses to play if and only if the expected benefit of playing, net of the cost of building a submission, is larger than the expected payoff of not playing, i.e.,

B_t^θ − c ≥ π   ⟺   (1 − e^{−(T−t)λ})[Ωθ − I]π ≥ c.   (2)
We can now easily incorporate the belief p_t(n) into Equation 2. With myopic players, the final state does not depend on the order of submissions, because payoffs are realized at the end of the competition,17 so each player cares only about her ownership at the final state. Thus, we can replace the final payoff by the expected payoff after n rival submissions and then let the agent decide whether to make her last submission considering this new expected payoff. That is, from state s, there is a probability distribution over S after n rival submissions (with scores drawn from the distribution G) given by the s-th row of the matrix Ω̂^n, where Ω̂ is constructed similarly to Ωθ but replacing qθ(·) by the mixture probability g(·). Instead of considering the payoff π before the last play, the player considers the expected payoff Ω̂^n π with probability p_t(n). Hence, the player plays if and only if:

(1 − e^{−(T−t)λ})[Ωθ − I] Σ_{n=0}^{∞} Ω̂^n π p_t(n) ≥ c.   (3)
Equation 3 is similar to Equation 2, except that now the final payoff depends on the player's belief about the number of submissions made by rival players in the future.
17 Except for ties, but we deal with this issue in the numerical implementation.
Using the definition of p_t(n) and the exponential of a matrix, we obtain18

Γθ,t ≡ (1 − e^{−(T−t)λ})[Ωθ − I] e^{γ_t [Ω̂−I]} π ≥ c.   (4)
Equation 4 provides a tractable condition that can be used for estimation, by making use of efficient algorithms to compute the exponential of a matrix. Future competition in this equation is captured through γ_t, the belief over the total number of remaining rival submissions in the contest. Conditional on a state s = (t, r, y), two effects drive the comparative statics with respect to t. On the one hand, as the competition approaches its end, a player has less incentive to make an extra submission because she is less likely to finish building it before the end of the competition. On the other hand, she faces fewer rival submissions, which gives her more incentive to send an extra submission later in the contest. The comparative statics with respect to γ are intuitively clear: when γ_t > γ′_t, players have less incentive to play because they expect more competition in the future. Finally, given that higher types draw better scores than lower types, high types have larger incentives to play conditional on a given state.
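To make Equation 4 concrete, the sketch below evaluates the net-benefit vector Γθ,t for a toy three-state example, computing the matrix exponential with a truncated power series. The transition matrices and payoff vector are made-up inputs for illustration, not objects from our estimation.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices represented as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_expm(A, terms=30):
    """Truncated power series e^A = sum_n A^n / n! (adequate for small matrices)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    power = [row[:] for row in result]
    for k in range(1, terms):
        power = mat_mul(power, A)
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / math.factorial(k)
    return result

def net_benefit(Omega_theta, Omega_hat, pi, lam, gamma_t, t, T=1.0):
    """Gamma_{theta,t} = (1 - e^{-(T-t)lam}) [Omega_theta - I] e^{gamma_t (Omega_hat - I)} pi."""
    n = len(pi)
    A = [[gamma_t * (Omega_hat[i][j] - (1.0 if i == j else 0.0)) for j in range(n)] for i in range(n)]
    E = mat_expm(A)
    Epi = [sum(E[i][j] * pi[j] for j in range(n)) for i in range(n)]
    diff = [[Omega_theta[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    scale = 1.0 - math.exp(-(T - t) * lam)
    return [scale * sum(diff[i][j] * Epi[j] for j in range(n)) for i in range(n)]

# Toy 3-state example (hypothetical): state 0 is absorbing ("own the top score"),
# pi gives terminal payoffs by state.
Omega_theta = [[1.0, 0.0, 0.0],
               [0.5, 0.5, 0.0],
               [0.3, 0.3, 0.4]]
Omega_hat = [[1.0, 0.0, 0.0],
             [0.4, 0.6, 0.0],
             [0.2, 0.4, 0.4]]
pi = [1.0, 0.3, 0.0]
g = net_benefit(Omega_theta, Omega_hat, pi, lam=100.0, gamma_t=5.0, t=0.9)
```

A player in state s then plays if and only if g[s] ≥ c. In the toy example, a player already in the absorbing top state gains nothing from playing (g[0] = 0, since her row of Ωθ − I is zero), while lower states have a strictly positive net benefit.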
3.1 Discussion of Model Assumptions
Some of the assumptions in our model are made for computational tractability or to keep the model parsimonious, whereas others are justified by empirical observations. Our analysis does not incorporate learning. Teams might experiment (Figure 2) and get a better understanding of the problem over time, which may lead them to improve their performance over time.19 We do not incorporate learning for two reasons. First, learning would make the model more involved. And second, Table 3 shows that between-team differences explain the majority of the systematic variation in scores.

A second assumption of our model is that entry is exogenous. In reality, players choose which contests to participate in. Azmat and Möller (2009) show that contest design (in particular, the allocation of prizes) affects players' decisions when they choose among multiple contests. Levin and Smith (1994), Bajari and Hortacsu (2003), and Krasnokutskaya and Seim (2011) also explore how endogenous entry affects equilibrium outcomes
18 The exponential of a matrix A is defined by e^A ≡ Σ_{n=0}^{∞} A^n/n!
19 Clark and Nilssen (2013), for example, present a theory of learning by doing in contests.
and optimal design. Although we acknowledge this shortcoming of our analysis, we have several reasons to make this assumption. First, in our data most players participate in a single contest (see Table A.3 in the Online Appendix), so it is hard to define a group of potential entrants. Second, all contests on Kaggle display a leaderboard, so we cannot identify how this feature of contest design affects entry using the observational data. Finally, and as we discuss below, our experiment reveals that contests with and without a public leaderboard draw the same number of participants on average, which alleviates the concern of endogenous entry.

A third potential concern is the assumption that players do not strategically choose when to send their submissions. Ding and Wolfstetter (2011) show that players could withhold their best solutions and negotiate with the sponsor of the contest after the contest has ended. This selection introduces a bias in the quality of submitted solutions. In our setting, players benefit from sending a submission, because they receive a noisy signal about the performance of the submission. We also find that the timing of disruptive submissions is roughly uniform over time (as shown in Figure 3), which alleviates the concern of strategic timing of submissions.

Also related is the assumption that players decide whether to continue or to quit immediately after the arrival of a submission. If we observe two submissions by a player at times t1 and t2, we know that this player must have spent some time t ∈ [0, t2 − t1] working on the submission. Instead of modeling the distribution of idle time between submissions—similar to the random time of play assumption in Arcidiacono et al. (2016)—we assume that the idle time is zero, i.e., t = t2 − t1. We make this assumption because we observe short times between submissions.
Thus, the effect of idle time is likely to be small, and adding it would impose an extra burden in the estimation of our model.

We present a model of myopic players, motivated purely by computational tractability. The problem is that estimating a contest model by backward induction takes a long time given the size of our state space.20 Myopic players bias the results toward less participation and more exit than a model with fully rational players would predict. A non-myopic player would expect a higher continuation payoff in the future (because of the option value of playing again), so conditional on a cost realization the myopic
20 In an earlier draft, we included estimates for the dynamic model for a handful of contests. The estimates suggested that the myopic assumption only caused a small bias in the cost estimates.
player has less incentive to play than a forward-looking agent.
4 Estimation
We estimate the parameters of the model in two steps. First, we estimate a number of primitives directly from the data. Second, using the estimates of the first step, we estimate the remaining parameters using a likelihood function constructed from the model. We repeat this procedure for each contest. When estimating the model, we restrict attention to the subsample of competitive teams (see Table 2).

The full set of parameters for a given contest includes: i) the distribution of new-player arrival times, which we assume follows an exponential distribution with parameter µ; ii) the distribution of submission arrival times, which we assume follows an exponential distribution with parameter λ; iii) the distribution of the private score conditional on the public score, H(·|p_public), which we assume is given by p_private = α + β p_public + ε, with ε distributed according to a double exponential distribution; iv) the type-specific cumulative distribution of public scores, which we assume is given by Q_j(x) = Φ((x − θ_j^mean)/θ_j^st.dev) for type θ_j, where Φ is the standard normal distribution; v) the distribution of types, κ, which we assume is a discrete distribution over the set of player types, Θ; vi) the time-specific distribution of the number of submissions that will be made in the remainder of the contest, p_t(n), which we assume follows a Poisson distribution with parameter γ_t = γ · (T − t); and, lastly, vii) the distribution of submission costs, which we assume has support bounded above by 1 (i.e., the normalized value of the total prize money) and cumulative distribution function K(c; σ) = c^σ (with σ > 0).

We estimate primitives i) through vi) in the first step, and vii) using the likelihood function implied by the model. i), ii), and iii) are estimated using the maximum likelihood estimators for µ, λ, and (α, β), respectively. We estimate iv) and v) using a Gaussian mixture model, which we fit with the EM algorithm. The EM algorithm estimates the k Gaussian distributions (and their weights, κ(θk)) that best predict the observed distribution of public scores. Throughout our empirical analysis we assume that there are k = 2 player types.21 Appendix B in the Online Appendix provides
21 We experimented with different numbers of types; k = 2 is parsimonious and gave us a good fit.
additional details of the estimation of these objects. Lastly, for vi), and as discussed above, we impose that γ must equal the observed number of submissions in each contest (see Table 2), as a way of capturing γ as an equilibrium object. The linearity assumption, γ_t = γ · (T − t), is made to simplify the computation of the equilibrium. Under this assumption, finding an equilibrium entails finding a single number as a fixed point, rather than a function. In each of the counterfactuals, we recompute γ as an equilibrium object.

The likelihood function implied by the model is based on the decision of a player to make a new submission. Recall that a player chooses whether to make a new submission immediately after the arrival of each of her submissions. A player facing state variables s chooses to make a new submission at time t if and only if

Γθ,t(s) ≥ c,   (5)

where Γθ,t = (1 − e^{−(T−t)λ})[Ωθ − I] e^{γ(T−t)[Ω̂−I]} π is the vector of net benefits of making a new submission at time t for all possible states s (before deducting the cost of making a submission), and c is the cost of a submission. Γθ,t depends only on primitives estimated in the first step of the estimation, which simplifies the rest of the estimation. When computing Γθ,t we partitioned the contest time [0, 1] into 200 time intervals.

Based on Equation 5, a θ-type player facing state variables s plays at time t with probability Pr(Γθ,t(s) > c), so we have Pr(play|s, t, θ) = K(Γθ,t(s)). Given that we do not observe the player's type, we take the expectation with respect to θ, which yields

Pr(play|s, t) = Σ_θ κ(θ) K(Γθ,t(s)),
where κ(θ) is the probability of a player being of type θ.

The likelihood is constructed using tuples {(si, ti, t′i)}_{i∈N}, where i is a submission, si is the vector of state variables at the moment of making the submission, ti is the submission time, and t′i is the arrival time of the next submission, which may or may not be observed. If the next submission is observed, then ti < t′i ≤ T; if not, t′i > T. If the new submission arrives at t′i ≤ T, then the player must have chosen to make a new submission at ti, and the likelihood of the observation (si, ti, t′i) is given by

l(si, ti, t′i) = Pr(play|si, ti) · λ e^{−λ(t′i − ti)},

where λ e^{−λ(t′i − ti)} is the density of the submission arrival time. If we do not observe a new submission after the player's decision at time ti (i.e., t′i > T), then the likelihood of (si, ti, t′i > T) is given by

l(si, ti, t′i > T) = Pr(play|si, ti) · e^{−λ(T − ti)} + 1 − Pr(play|si, ti),

which considers both i) the event of the player choosing to make a new submission at ti with the submission arriving after the end of the contest, and ii) the event of the player choosing not to make a new submission. The log-likelihood function is then given by

L(δ) = Σ_{i∈N} log l(si, ti, t′i),

where δ is the vector of structural parameters. We perform inference using the asymptotic distribution of the maximum likelihood estimator.
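In code, the two likelihood contributions and their sum can be sketched as follows. The tuples below are made-up data, and play_prob stands in for the model quantity K(Γθ,t(s)).

```python
import math

def loglik(obs, lam, T=1.0):
    """Sum the log-likelihood over submission tuples (play_prob, t, t_next),
    where t_next is None when no further submission is observed by T."""
    total = 0.0
    for play_prob, t, t_next in obs:
        if t_next is not None:
            # Next submission observed (t < t_next <= T): played, and the
            # exponential build time realized at t_next - t.
            li = play_prob * lam * math.exp(-lam * (t_next - t))
        else:
            # Censored: either played but the submission arrived after T,
            # or chose not to play at all.
            li = play_prob * math.exp(-lam * (T - t)) + (1.0 - play_prob)
        total += math.log(li)
    return total

# Toy data: (K(Gamma) at the decision, submission time, next arrival or None)
obs = [(0.9, 0.10, 0.12), (0.8, 0.50, 0.53), (0.7, 0.95, None)]
ll = loglik(obs, lam=50.0)
```

Maximizing this sum over the cost parameter σ (which enters through play_prob) would deliver the second-step estimate; here the function only illustrates how observed and censored spells enter the objective.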
4.1 Model Estimates
Table 6 presents the maximum likelihood estimates for the submission-cost distribution as well as for the distributions of entry time and submission arrival time. Table A.5 in the Online Appendix presents the EM algorithm estimates for the type-specific distributions of scores, and Table A.6 in the Online Appendix presents estimates for the distribution of private scores conditional on public scores. The model was estimated separately for each contest. Table 6 (Column 1) shows estimates for the players' rate of entry in a given competition. The estimates imply that the average entry time ranges between 22 and 63 percent of the contest time, and the mean average entry time across all contests is 41 percent of the contest time. Table 6 (Column 3) presents the estimates for the rate at which submissions are completed. In line with Table 2, the estimates suggest that the average time between submissions ranges between 0.5 and 5.5 percent of the contest time, and the mean average time between submissions across all contests is 1.5 percent of the contest time.
Table 6 (Column 5) presents estimates for the coefficient governing the distribution of submission costs. These estimates imply that the expected submission cost ranges between $1.89 and $649.22 Figure 5 shows some implications of our estimates. Figure 5(a) shows the distribution of the expected cost of making a submission (in dollars), and Figure 5(b) shows the daily cost of working on a submission (in dollars). The average values for the expected cost of a submission and the daily cost of a submission are $57.1 and $47.4, respectively. Figure 5(c) shows a scatter plot of the total expected cost incurred by all participants of a contest against the prize, both measured in logs. In the majority of the contests the total expected cost is greater than the prize, suggesting rent dissipation.

Contest                                              µ       SE      λ         SE      σ       SE      log L(δ̂)/N  N
hhp                                                  2.585   0.1701  191.9182  1.6088  0.0013  0.0001  -4.123      14231
allstate-purchase-prediction-challenge               1.9824  0.1274  125.6546  1.1708  0.0045  0.0005  -3.6895     11519
higgs-boson                                          2.3698  0.114   122.1003  0.8177  0.0013  0.0001  -3.6756     22298
acquire-valued-shoppers-challenge                    2.0772  0.1316  165.2723  1.2866  0.0011  0.0001  -3.9934     16500
liberty-mutual-fire-peril                            3.3344  0.3194  122.6742  1.3403  0.0008  0.0002  -3.7291     8377
axa-driver-telematics-analysis                       2.4383  0.1405  127.6736  0.8938  0.0009  0.0001  -3.7595     20405
crowdflower-search-relevance                         2.1332  0.1093  83.3317   0.6605  0.0021  0.0002  -3.2883     15919
caterpillar-tube-pricing                             3.2674  0.1757  68.2579   0.5565  0.001   0.0001  -3.0972     15047
liberty-mutual-group-property-inspection-prediction  3.1055  0.1152  67.0227   0.4112  0.0062  0.0004  -3.0375     26573
coupon-purchase-prediction                           2.0586  0.1093  73.1853   0.6539  0.0009  0.0001  -3.146      12526
springleaf-marketing-response                        3.0405  0.1689  97.3279   0.7153  0.0008  0.0001  -3.4726     18513
homesite-quote-conversion                            2.4958  0.1548  128.5381  0.9678  0.0004  0.0001  -3.7599     17638
prudential-life-insurance-assessment                 2.1598  0.0799  78.0817   0.4707  0.004   0.0003  -3.2007     27512
santander-customer-satisfaction                      2.3098  0.0563  75.1579   0.3048  0.0055  0.0002  -3.1614     60816
expedia-hotel-recommendations                        2.2792  0.086   43.2208   0.3422  0.0009  0.0001  -2.564      15948

Table 6: Maximum Likelihood Estimates of the Cost and Arrival Distributions (partial list).
Note: The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.' See Table A.4 in the Online Appendix for the full table.

22 The expected cost in dollars is given by σV/(1 + σ), where V is the total reward.
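The footnote's formula follows from K(c) = c^σ on [0, 1]: E[c] = ∫_0^1 c · σ c^{σ−1} dc = σ/(1 + σ), which is then scaled by the prize V. A quick numerical check, with an illustrative value of σ:

```python
def expected_cost(sigma, grid=100000):
    """Midpoint-rule integral of c * sigma * c**(sigma - 1) over [0, 1],
    i.e., the mean of the cost distribution K(c) = c**sigma."""
    h = 1.0 / grid
    return sum(
        (i + 0.5) * h * sigma * ((i + 0.5) * h) ** (sigma - 1.0) * h
        for i in range(grid)
    )

sigma = 0.5  # illustrative value; the estimated sigmas in Table 6 are much smaller
```

The numerical integral agrees with the closed form σ/(1 + σ); with the small estimated σ values in Table 6, E[c] is a tiny fraction of the prize, consistent with the dollar magnitudes in the text.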
[Figure 5 panels: (a) Cost of a submission (in dollars); (b) Cost of a submission per day (in dollars); (c) Rent dissipation: prize (in logs) versus expected cost of all submissions (in logs).]

Figure 5: Estimates for the cost of making a submission
Note: An observation is a contest. Cost of a submission per day is the expected cost divided by the average number of days between submissions. The average values for the expected cost of a submission and the daily cost of a submission are 57.1 and 47.4 dollars, respectively. The expected cost of all submissions is the expected cost of a submission multiplied by the predicted number of submissions for each contest. The predicted number of submissions is based on 2,000 simulations of each contest using our model estimates.
Table 7 studies how the estimates for the average cost of making a submission, the rate of team entry, and the rate of arrival of submissions vary as a function of the contest prize. The table shows a positive correlation between the contest reward and both the average cost of making a submission and the rate of team entry. This suggests that contests with larger prizes are more difficult and lead teams to enter sooner. The greater difficulty is consistent with the empirical observation that teams remain active for less time in
                      (1)          (2)          (3)
                      log E[c]     log λ        log µ
log Prize (in USD)    0.8906***    0.2085***    -0.0021
                      (0.1040)     (0.0388)     (0.0228)
Observations          57           57           57
R2                    0.494        0.232        0.000

Table 7: Parameter estimates and contest observables
Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. An observation is a contest. Expected cost is given by σ/(1 + σ). The values of σ, µ, λ are reported in Table A.4.
competitions with greater rewards (i.e., the exit rate is higher). To capture this pattern in the data, the model needs a larger cost in order to fit the larger exit rate.23 Finally, Figure A.1 in the Online Appendix presents a scatter plot of the entry rate of teams against the arrival rate of submissions, and shows a weak negative correlation.

With respect to how well the model fits the data, Figure 6 plots the actual versus the predicted number of submissions in each contest. The predicted number of submissions in a contest is computed by averaging the number of submissions across 2,000 simulations of the contest. The simulations make use of the estimates of the model and take the number of teams that participate in each contest as given. The correlation between the actual and the predicted number of submissions is 0.97. The figure shows that the model does not systematically over- or under-predict participation. Figure 7 shows the fit of the EM algorithm for one of the competitions. These figures suggest a good fit along the dimensions of the number of submissions and the distribution of scores.
23 Almost every competition in our data lasts three months, so the data offers little variation in contest length to establish relationships between estimates and contest length.
[Figure 6: scatter plot of predicted participation (in logs) against actual participation (in logs).]
Figure 6: Number of Submissions Predicted by the Model Versus Actual Number of Submissions. Note: An observation is a contest. The coefficient of correlation between the actual and predicted number of submissions is 0.97. The predicted number of submissions is based on 2,000 simulations of each contest using our model estimates.
[Figure 7: two density plots. Panel (a): distribution of scores by type (Type 1 and Type 2). Panel (b): distribution of scores in the data and as predicted by the EM algorithm.]
Figure 7: Estimates of the distribution of scores by type for the contest ‘Homesite Quote Conversion’
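For readers unfamiliar with the estimation step behind Figure 7, a textbook two-component EM loop looks roughly as follows. This is a generic sketch under an assumed Gaussian mixture, not the paper's likelihood; the initialization and the normality assumption are illustrative.

```python
import math

def em_two_gaussians(xs, iters=200):
    """Fit a two-component Gaussian mixture to scores xs by EM."""
    mu1, mu2 = min(xs), max(xs)               # illustrative starting values
    spread = (max(xs) - min(xs)) / 4 or 1.0
    s1 = s2 = spread
    w = 0.5                                   # mixture weight of component 1

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(iters):
        # E-step: posterior probability that each score came from type 1
        r = []
        for x in xs:
            p1 = w * pdf(x, mu1, s1)
            p2 = (1.0 - w) * pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2 + 1e-300))
        # M-step: update weight, means, and standard deviations
        n1 = max(sum(r), 1e-12)
        n2 = max(len(xs) - sum(r), 1e-12)
        w = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = max(math.sqrt(sum(ri * (x - mu1) ** 2
                               for ri, x in zip(r, xs)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1 - ri) * (x - mu2) ** 2
                               for ri, x in zip(r, xs)) / n2), 1e-6)
    return w, (mu1, s1), (mu2, s2)
```

The fitted component densities, weighted by w and 1 − w, are what a plot like panel (b) overlays on the empirical score distribution.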
5 Counterfactual Contest Designs

In this section, we use our model estimates to compare the baseline contest design with a number of counterfactual designs. We evaluate each counterfactual contest design by three performance indicators: the total number of submissions, the 99th percentile of the distribution of scores, and the maximum score. The total number of submissions is a proxy for diversity, whereas the moments of the score distribution measure the quality of the "best solutions." In the counterfactual exercises, the expected number of submissions, γ, is estimated as an equilibrium object. An equilibrium γ* equates the number of submissions implied by the model, N(γ*), with the total expected number of submissions, γ*T. The equilibrium exists and is uniquely determined because, conditional on all the other parameters of the model (including the information design), the expected number of submissions N(γ) is decreasing in γ—players have fewer incentives to play when they expect more rival submissions—and γT is increasing in γ. Therefore, there is a unique fixed point.
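Given the monotonicity argument above, γ* can be computed by simple bisection. A minimal sketch, where the decreasing map N(·) is an assumed illustrative functional form rather than the paper's estimated model:

```python
def solve_equilibrium(N, T, gamma_lo=1e-9, gamma_hi=1e6, tol=1e-10):
    """Find gamma* solving N(gamma*) = gamma* * T by bisection.

    Works because N is decreasing in gamma while gamma*T is increasing,
    so f(gamma) = N(gamma) - gamma*T is strictly decreasing and crosses
    zero exactly once.
    """
    def f(g):
        return N(g) - g * T

    lo, hi = gamma_lo, gamma_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # model still implies more submissions than expected
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative decreasing N (an assumed functional form, not the
# paper's model): more expected rival submissions reduce the number
# of submissions the model implies.
N = lambda g: 100.0 / (1.0 + g)
T = 10.0
gamma_star = solve_equilibrium(N, T)  # solves 100/(1 + g) = 10g
```

Any root-finder would do here; bisection is used only because the single-crossing property makes it trivially reliable.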
5.1 Information Design

We first study the role of information disclosure. Kaggle competitions score each submission according to a partition of the test dataset—e.g., 60 percent of the test data is used to generate public scores and 40 percent to generate private scores. The contest designer chooses the size of this partition and whether to disclose public scores. In the competitions in our data, the contest designer discloses the size of each of the subsets that partition the test dataset, as well as the public scores. However, the final standings are computed using the private scores, which are revealed only at the end of the competition. We consider two counterfactual designs. In the first one, the sponsor does not display a leaderboard: participants observe their own scores but do not observe their rivals' scores. In the second one, a public leaderboard is displayed but the sponsor eliminates the noise in the evaluation (the public score equals the private score).

No Public Leaderboard
With a public leaderboard, at any given time t, players have more incentives to play after histories where the maximum score is low, relative to histories where the maximum score is high. Without a public leaderboard, players' decisions to continue playing or to quit are based on their beliefs about the current history, which average across all feasible histories. When taking this average, the favorable histories "cross-subsidize" unfavorable ones. Depending on the strength of this cross-subsidization, a public leaderboard may encourage or discourage participation. We simulate the counterfactual case where all competitions are held without a public leaderboard.

Perfect Correlation between Public and Private Scores

The sponsor of a competition decides the correlation between public and private scores. All of the competitions in our data feature imperfect correlation between public and private scores (although this correlation is 0.99). We simulate a counterfactual where the sponsor removes the noise between public and private scores, perfectly informing participants about their current ranking. Noisy signals can encourage or discourage participation—players' incentives in a competition with an extremely noisy leaderboard are the same as players' incentives in a competition without a leaderboard.
5.2 Number of Prizes

Most of the competitions in our data award three prizes. Instead, the sponsor of a competition could decide to award a single prize. We study the counterfactual where the sponsor allocates a single prize to the winner, keeping the total reward fixed. Relative to a single prize, multiple prizes strengthen the incentives to be "at the top" of the ranking, but weaken the incentives to reach first place, conditional on being at the top of the ranking. Hence, total participation in a contest may increase or decrease when there are multiple prizes rather than a single prize.
5.3 Limiting the Number of Participants

The majority of Kaggle competitions are open to anyone willing to participate. Some competitions, however, restrict participation. We consider the counterfactual case where the number of entrants is limited to 90 percent of the actual number of entrants in each competition. Limiting participation has a direct effect (fewer participants) that could worsen outcomes, and an indirect equilibrium effect (less competition) that could improve outcomes.
5.4 Estimation Results for Counterfactual Designs

Table 8 reports estimates for how contest outcomes change in each counterfactual contest design, measured relative to the contest outcomes in the baseline case with the public leaderboard.

                               Contest-level outcomes (in logs)   Team-level outcomes (in logs):
                                                                  av. number of submissions
                               (1)         (2)       (3)          (4)        (5)        (6)
                               Number of   Max       99th pct     All        Low-       High-
                               submissions score     score        teams      type       type
No leaderboard                 -0.236***  -0.013*** -0.008***    -0.236***  -0.015     -0.322***
                               (0.036)    (0.002)   (0.002)      (0.036)    (0.051)    (0.033)
Fully informative leaderboard   0.033**    0.002**   0.000        0.033**    0.012      0.037**
                               (0.014)    (0.001)   (0.001)      (0.014)    (0.021)    (0.014)
Limited participation          -0.082***  -0.002**   0.002**     -0.082***  -0.247***  -0.052***
                               (0.011)    (0.001)   (0.001)      (0.011)    (0.042)    (0.014)
Single prize                    0.005      0.000     0.000        0.005      0.002      0.006
                               (0.011)    (0.000)   (0.001)      (0.011)    (0.019)    (0.010)
N                               285        285       285          285        285        285
R²                              0.992      0.999     0.999        0.961      0.937      0.957

Table 8: Contest outcomes on contest design. Notes: An observation is a contest–design combination. All specifications include contest fixed effects. The definition of the variables is as follows: 'all' is total submissions, 'av' is the average number of submissions per team, 'sm' is the max score in a competition, 's99p' is the score at the 99th percentile in a competition, 'm' is the average number of submissions per low-type team, and 'h' is the average number of submissions per high-type team. Type θ_i is the high type if θ_i^mean + 3 θ_i^st.dev > θ_j^mean + 3 θ_j^st.dev.
The first row of Table 8 presents the results of comparing the contest performance indicators in the counterfactual case without a public leaderboard relative to the baseline case with a public leaderboard. The results in columns 1 to 3 indicate that hiding the public leaderboard on average leads to a lower maximum score as well as fewer submissions. Columns 4 to 6 explore heterogeneity and suggest that these results are driven by an average decrease in the number of submissions made by high-type teams. We define type θ_i to be the high type if θ_i^mean + 3 θ_i^st.dev > θ_j^mean + 3 θ_j^st.dev (see Table A.5 for the type-specific parameter estimates). Figure A.2 in the Online Appendix plots the distribution of effects across contests, and reveals that the number of submissions and the maximum score decrease for the great majority of contests without a leaderboard. When comparing participation with and without a leaderboard, there are several forces at play: the rate at which players enter, the rate at which submissions arrive, the distribution of costs, and the distribution of players' types. All of these effects combined determine the distribution of histories, which in turn determines participation decisions. Without a leaderboard, and given that prizes are awarded at the end of the competition, at any time t players consider a fixed benchmark—the expected final maximum score of their rivals. With a leaderboard, players update their beliefs about the maximum score at the end of the competition, considering the remaining contest time and the current maximum score. Participation improves with a leaderboard when the increase in participation at favorable histories (those with a low maximum score) outweighs the decrease in participation at unfavorable histories (those with a high maximum score). Table A.7 in the Online Appendix shows that at any moment in time players send more submissions when the maximum score is lower, which is in line with our descriptive analysis (e.g., see Table 4 and Table 5).
Furthermore, the table also shows an asymmetric response to the maximum score depending on whether it is below or above the expected maximum score: a maximum score that is below the expected maximum score by x increases the number of submissions by more than the decrease in submissions caused by a maximum score that is above the expected maximum score by x. This asymmetry suggests that the increase in participation in favorable histories outweighs the decrease in participation in unfavorable histories, explaining the result of increased participation with a public leaderboard. In explaining this asymmetry, we note that exit decisions are irreversible in our model, so observing a favorable history on the leaderboard may encourage a player to stay in the competition, adding future opportunities to play for that player, whereas without a leaderboard the same player may be discouraged and quit the competition. Another beneficial aspect of displaying a leaderboard is non-pecuniary incentives, which we do not model. For instance, Lerner and Tirole (2002) rationalize collaboration in open-source software as a signaling mechanism to potential employers, Moldovanu et al. (2007) provide examples where players care about their relative position (status), and Brunt et al. (2012) present evidence that non-monetary awards (such as medals) are more important than monetary awards in encouraging competition. The leaderboard in a competition may provide all of these elements: a signaling device, status, and a progression system.24 If anything, modeling non-pecuniary motives would magnify our estimated benefits of showing a leaderboard.

The second row of Table 8 presents the results of comparing the contest performance indicators in the counterfactual case where the public leaderboard is fully informative relative to the baseline case with a noisy public leaderboard. This row shows that a fully informative leaderboard on average improves performance. The direction of this effect is consistent with the previous result (no leaderboard and large noise should move outcomes in the same direction), although the magnitude of the change in participation and upper-tail outcomes is small. A fully informative leaderboard, however, increases the risk of participants engaging in overfitting—working on solutions that maximize the public score ranking but are not robust outside of the test data. Our model does not incorporate overfitting, but knowing that a fully informative leaderboard does not have a large impact on outcomes provides an argument in favor of a noisy public leaderboard.

The third row of Table 8 compares outcomes in the case where the number of participants in each contest is decreased by 10 percent relative to the baseline case.
We find that the measures of contest performance worsen, implying that the encouragement players get from facing fewer competitors does not outweigh the direct effect of 10 percent fewer players.25

24 The Kaggle website states: "Kaggle's Progression System uses performance tiers to track your growth as a data scientist on Kaggle. Along the way, you'll earn medals for your achievements and compete for data science glory on live leaderboards."
25 Table A.8 in the Online Appendix further explores counterfactual designs that limit participation. The table shows that decreasing the number of teams by between 1 and 9 percent weakly decreases the total number of submissions (in a statistical sense).
The last row of Table 8 shows how contest performance changes in the counterfactual case where a single prize is awarded. Our results show that changing the allocation of prizes has a small and statistically insignificant overall effect on participation. In our model, conditional on a type, players outside the top 3 have the same incentives to play, because each new submission is a random draw from a distribution (and there is no learning). And because the difference between the first and third scores at the end of every competition is very small, the allocation of prizes has only a second-order effect on incentives.
6 Experimental Evidence

To complement our structural estimates, we ran a randomized control trial on Kaggle.26 The objective of the experiment is to provide additional evidence—independent of our model's assumptions—on how information disclosure impacts participation and outcomes. The experiment allowed us to observe contest outcomes in competitions with and without a leaderboard, keeping other aspects of the contest fixed (e.g., difficulty, prize, duration, number of participants).
6.1 Description of the Experiment
We hosted 44 competitions on Kaggle, and each competition was randomly assigned to the treatment or control group. Our treatment competitions displayed a real-time leaderboard, providing information about the performance of all participants, whereas our control competitions did not provide feedback to players. The competitions were identical in all other aspects of design. The competitions were run simultaneously and lasted for 10 days. Each competition consisted of solving a simple prediction problem: interpolating a function (see Online Appendix C for details). Participants were allowed to submit up to 10 sets of predictions per day. The most accurate predictions in each competition were awarded an Amazon gift card worth $50. We recruited 220 students (both undergraduates and graduates) from the University of Illinois at Urbana-Champaign via emails, department newsletters, and flyers. Participants were asked to complete an initial survey from which we obtained information such as past experience with online competitions and data analysis. They were also asked to create a Kaggle username. With this pool of potential players, we formed 44 competitions of 5 players each. Participants were randomly allocated to these 44 competitions.

26 Approval from the University of Illinois Human Subjects Committee, IRB18644.

Table 9 shows the outcome of the randomization. The left panel ("Invited players") shows the balance of covariates across competitions in the treatment and control groups. The table shows no statistically significant differences across groups in a number of covariates related to the participants' knowledge of statistical tools and experience. The right panel ("Entrants") repeats the analysis, but restricts attention to the participants who submitted at least one solution during the competition. This second panel shows no statistically significant differences across contests in the control and treatment groups, both in the number of players who submitted at least one solution and in the composition of participants.

                         Invited players                Entrants
Variable             Control  Treatment  t-stat    Control  Treatment  t-stat
Participants           5        5          -        3.227    3.545      1.151
participated_past      0.236    0.191    -0.954     0.202    0.244      0.507
software_code          0.964    0.973     0.403     0.968    0.987      0.909
stat_tools             0.882    0.836    -0.883     0.887    0.934      0.822
mach_learning          0.536    0.518    -0.276     0.615    0.607     -0.082
regression             0.736    0.709    -0.487     0.808    0.747     -0.770

Table 9: Average covariates at the contest level: Randomization results. Notes: An observation is a contest. 'Invited players' is the pool of players who were invited to enter a competition, and 'Entrants' is the pool of players who made at least one submission during the competition. Treated contests are the contests where a leaderboard was displayed. All variables are defined at the contest level as follows: 'Participants' is the number of players in a competition, 'participated_past' is the share of players who have participated in a prediction contest in the past, 'software_code' is the share of players who know how to use a statistical software, 'stat_tools' is the share of players who have statistical skills, 'mach_learning' is the share of players who have machine learning skills, and 'regression' is the share of players who have regression analysis skills.
6.2 Experimental Results

Table 10 shows the main results of the experiment, which are in line with the main finding of our structural estimates: participation and outcomes improve when a real-time leaderboard is displayed. Columns 1, 2, and 3 in Table 10 show that outcomes are worse in competitions that do not display a leaderboard, relative to competitions with a leaderboard. Column 1 shows that the maximum score was on average 0.057 lower in competitions without a leaderboard, a magnitude that is 29.68 percent of the average maximum score across all contests. This result is robust to controlling for the number of entrants in each competition (column 2) and to controlling for player covariates (column 3).

                              Maximum score                   Number of submissions
                        (1)        (2)        (3)        (4)          (5)          (6)
No leaderboard        -0.057**   -0.045**   -0.050**  -31.636***   -26.767***   -26.758***
                      (0.022)    (0.019)    (0.020)    (7.073)      (5.250)      (5.973)
                      [0.014]    [0.030]    [0.042]    [0.000]      [0.000]      [0.000]
Entrants                          0.038***   0.038***               15.302***    15.843***
                                 (0.013)    (0.014)                 (3.050)      (3.244)
Controls                No         No         Yes        No           No           Yes
Observations            44         44         44         44           44           44
R²                     0.135      0.332      0.398      0.323        0.565        0.610
Dep. variable mean     0.192      0.192      0.192     23.636       23.636       23.636

Table 10: The effect of the leaderboard on contest outcomes: Experimental results. Notes: Robust standard errors in parentheses. p-values for Monte Carlo permutation tests, which allow for arbitrary randomization procedures, in square brackets (based on 1,000 replications). * p < 0.1, ** p < 0.05, *** p < 0.01. An observation is a contest. The definition of the variables is as follows: 'No leaderboard' is an indicator for contests without a leaderboard and 'Entrants' is the number of entrants. Controls include the shares of participants in the contest who i) have participated in a prediction contest in the past, ii) know how to use a statistical software, iii) have statistical skills, iv) have machine learning skills, and v) have regression analysis skills.
Columns 4, 5, and 6 in Table 10 show that the number of submissions is on average lower in competitions without a leaderboard than in competitions with a leaderboard. Column 4 shows that competitions without a leaderboard received an average of 31.636 fewer submissions than competitions with a leaderboard, which is a large effect relative to the average number of submissions across all contests. This result is also robust to controlling for the number of entrants in each competition (column 5) and player covariates (column 6).
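The permutation p-values reported in square brackets in Table 10 can be reproduced mechanically with a routine of this kind. This is a generic sketch of a Monte Carlo permutation test for a difference in means (the paper reshuffles treatment assignment across its 44 contests over 1,000 replications); the data passed in below would be the contest-level outcomes.

```python
import random

def permutation_pvalue(treated, control, n_reps=1000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Reshuffles the treatment labels n_reps times and reports the share
    of reshuffles whose mean difference is at least as extreme as the
    observed one.
    """
    rng = random.Random(seed)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    observed = sum(treated) / n_t - sum(control) / len(control)
    extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # random reassignment of treatment labels
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / len(control)
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_reps
```

Because the test conditions only on the pooled outcomes, it is valid under arbitrary randomization procedures, which is why the paper reports it alongside conventional robust standard errors.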
7 Discussion

We contribute to the literature on competition design by investigating how different elements of design affect players' incentives in a dynamic environment. We use field data from Kaggle.com to build and estimate a structural model used for counterfactual analysis, and we complement this analysis with experimental evidence from competitions also run on Kaggle.com. To build a computationally tractable model, we rely on various simplifications, most of them motivated by empirical evidence. We assume that players enter the contest at some exogenous time and, after observing the cost of building a submission, decide either to exit or to build a new submission. If they exit, they cannot reenter the competition. If they play, they must wait a random time until the next opportunity to play. Different contest designs affect players' expectations about the benefit of continuing to play, hence distorting players' participation incentives. Our counterfactual simulations show that a public leaderboard improves performance in a competition: the total number of submissions, the 99th percentile of the distribution of scores, and the maximum score all increase relative to the counterfactual competition without a leaderboard. Moreover, when a real-time leaderboard increases participation, it is because it encourages high-type players to stay in the competition. The intuition behind this result is that the increase in participation created by favorable histories (those with a low maximum score) outweighs the decrease in participation created by unfavorable histories (those with a high maximum score). This asymmetric effect is caused by irreversible exit decisions, because players who stay in the competition face multiple future opportunities to play.
To complement our analysis, we ran an experiment in which we randomly allocated participants into competitions without a leaderboard (control group) or with a leaderboard (treatment group), to measure the impact of a public leaderboard on participation and contest outcomes. The experiment is independent of our modeling assumptions, so it serves the double purpose of providing experimental evidence on the relationship between contest design and contest outcomes and testing the predictions of our model. Our experimental findings are consistent with our counterfactual simulation results: the number of submissions and the maximum score increase in competitions that display a public leaderboard.
8 References
Aoyagi, Masaki (2010) “Information feedback in a dynamic tournament,” Games and Economic Behavior, Vol. 70, pp. 242–260. Arcidiacono, Peter, Patrick Bayer, Jason R Blevins, and Paul B Ellickson (2016) “Estimation of dynamic discrete choice models in continuous time with an application to retail competition,” The Review of Economic Studies, Vol. 83, pp. 889–931. Athanasopoulos, George and Rob J Hyndman (2011) “The value of feedback in forecasting competitions,” International Journal of Forecasting, Vol. 27, pp. 845–849. Azmat, Ghazala and Marc Möller (2009) “Competition among contests,” The RAND Journal of Economics, Vol. 40, pp. 743–768. Bajari, Patrick and Ali Hortacsu (2003) “The winner’s curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions,” RAND Journal of Economics, pp. 329–355. Balafoutas, Loukas, E Glenn Dutcher, Florian Lindner, and Dmitry Ryvkin (2017) “The Optimal Allocation of Prizes in Tournaments of Heterogeneous Agents,” Economic Inquiry, Vol. 55, pp. 461–478. Benkert, Jean-Michel and Igor Letina (2016) “Designing dynamic research contests,” University of Zurich, Department of Economics, Working Paper.
Bhattacharya, Vivek (2016) “An Empirical Model of R&D Procurement Contests: An Analysis of the DOD SBIR Program,” MIT, Department of Economics, Working Paper. Bimpikis, Kostas, Shayan Ehsani, and Mohamed Mostagir (2014) “Designing dynamic contests,” Working paper, Stanford University. Boudreau, Kevin J, Nicola Lacetera, and Karim R Lakhani (2011) “Incentives and problem uncertainty in innovation contests: An empirical analysis,” Management Science, Vol. 57, pp. 843–863. Boudreau, Kevin J, Karim R Lakhani, and Michael Menietti (2016) “Performance responses to competition across skill levels in rank-order tournaments: field evidence and implications for tournament design,” The RAND Journal of Economics, Vol. 47, pp. 140–165. Brunt, Liam, Josh Lerner, and Tom Nicholas (2012) “Inducement prizes and innovation,” The Journal of Industrial Economics, Vol. 60, pp. 657–696. Che, Yeon-Koo and Ian Gale (2003) “Optimal design of research contests,” The American Economic Review, Vol. 93, pp. 646–671. Chesbrough, Henry, Wim Vanhaverbeke, and Joel West (2006) Open innovation: Researching a new paradigm: Oxford University Press on Demand. Clark, Derek J and Tore Nilssen (2013) “Learning by doing in contests,” Public Choice, Vol. 156, pp. 329–343. Cohen, Chen, Todd R Kaplan, and Aner Sela (2008) “Optimal rewards in contests,” The RAND Journal of Economics, Vol. 39, pp. 434–451. Ding, Wei and Elmar G Wolfstetter (2011) “Prizes and lemons: procurement of innovation under imperfect commitment,” The RAND Journal of Economics, Vol. 42, pp. 664–680. Ederer, Florian (2010) “Feedback and motivation in dynamic tournaments,” Journal of Economics & Management Strategy, Vol. 19, pp. 733–769.
Fullerton, Richard L and R Preston McAfee (1999) “Auctioning entry into tournaments,” Journal of Political Economy, Vol. 107, pp. 573–605. Genakos, Christos and Mario Pagliero (2012) “Interim rank, risk taking, and performance in dynamic tournaments,” Journal of Political Economy, Vol. 120, pp. 782–813. Goltsman, Maria and Arijit Mukherjee (2011) “Interim performance feedback in multistage tournaments: The optimality of partial disclosure,” Journal of Labor Economics, Vol. 29, pp. 229–265. Gross, Daniel P (2015) “Creativity Under Fire: The Effects of Competition on Creative Production,” Available at SSRN 2520123. (2017) “Performance feedback in competitive product development,” The RAND Journal of Economics, Vol. 48, pp. 438–466. Halac, Marina, Navin Kartik, and Qingmin Liu (2014) “Contests for experimentation,” Journal of Political Economy, Forthcoming. Hendricks, Kenneth and Robert H Porter (1988) “An empirical study of an auction with asymmetric information,” The American Economic Review, pp. 865–883. Hinnosaar, Toomas (2017) “Dynamic common-value contests.” Huang, Yan, Param Vir Singh, and Kannan Srinivasan (2014) “Crowdsourcing new product ideas under consumer learning,” Management Science, Vol. 60, pp. 2138–2159. Jeppesen, Lars Bo and Karim R Lakhani (2010) “Marginality and problem-solving effectiveness in broadcast search,” Organization Science, Vol. 21, pp. 1016–1033. Kireyev, Pavel (2016) “Markets for Ideas: Prize Structure, Entry Limits, and the Design of Ideation Contests,” HBS, Working Paper. Klein, Arnd Heinrich and Armin Schmutzler (2016) “Optimal effort incentives in dynamic tournaments,” Games and Economic Behavior.
Krasnokutskaya, Elena and Katja Seim (2011) “Bid preference programs and participation in highway procurement auctions,” The American Economic Review, Vol. 101, pp. 2653–2686. Lakhani, Karim R, Kevin J Boudreau, Po-Ru Loh, Lars Backstrom, Carliss Baldwin, Eric Lonstein, Mike Lydon, Alan MacCormack, Ramy A Arnaout, and Eva C Guinan (2013) “Prize-based contests can provide solutions to computational biology problems,” Nature Biotechnology, Vol. 31, pp. 108–111. Landers, Richard N, Kristina N Bauer, and Rachel C Callan (2017) “Gamification of task performance with leaderboards: A goal setting experiment,” Computers in Human Behavior, Vol. 71, pp. 508–515. Landers, Richard N and Amy K Landers (2014) “An empirical test of the theory of gamified learning: The effect of leaderboards on time-on-task and academic performance,” Simulation & Gaming, Vol. 45, pp. 769–785. Lazear, Edward P and Sherwin Rosen (1979) “Rank-order tournaments as optimum labor contracts.” Lerner, Josh and Jean Tirole (2002) “Some simple economics of open source,” The Journal of Industrial Economics, Vol. 50, pp. 197–234. Levin, Dan and James L Smith (1994) “Equilibrium in auctions with entry,” The American Economic Review, pp. 585–599. Li, Tong, Isabelle Perrigne, and Quang Vuong (2002) “Structural estimation of the affiliated private value auction model,” RAND Journal of Economics, pp. 171–193. Megidish, Reut and Aner Sela (2013) “Allocation of Prizes in Contests with Participation Constraints,” Journal of Economics & Management Strategy, Vol. 22, pp. 713–727. Moldovanu, Benny and Aner Sela (2001) “The optimal allocation of prizes in contests,” American Economic Review, pp. 542–558. (2006) “Contest architecture,” Journal of Economic Theory, Vol. 126, pp. 70–96.
Moldovanu, Benny, Aner Sela, and Xianwen Shi (2007) “Contests for status,” Journal of Political Economy, Vol. 115, pp. 338–363. Olszewski, Wojciech and Ron Siegel (2015) “Effort-Maximizing Contests.” Sisak, Dana (2009) “Multiple-prize contests–the optimal allocation of prizes,” Journal of Economic Surveys, Vol. 23, pp. 82–114. Strack, Philipp (2016) “Risk-Taking in Contests: The Impact of Fund-Manager Compensation on Investor Welfare.” Takahashi, Yuya (2015) “Estimating a war of attrition: The case of the U.S. movie theater industry,” The American Economic Review, Vol. 105, pp. 2204–2241. Taylor, Curtis R (1995) “Digging for golden carrots: an analysis of research tournaments,” The American Economic Review, pp. 872–890. Terwiesch, Christian and Yi Xu (2008) “Innovation contests, open innovation, and multiagent problem solving,” Management Science, Vol. 54, pp. 1529–1543. Xiao, Jun (2016) “Asymmetric all-pay contests with heterogeneous prizes,” Journal of Economic Theory, Vol. 163, pp. 178–221. Zivin, Joshua S Graff and Elizabeth Lyons (2018) “Can Innovators be Created? Experimental Evidence from an Innovation Contest,” Technical report, National Bureau of Economic Research.
Online Appendix: Not For Publication Dynamic Tournament Design: An Application to Prediction Contests Jorge Lemus and Guillermo Marshall
A Additional Tables and Figures

Contest Number | Name of the Competition | Total Reward | Number of Submissions | Teams | Start Date | Deadline
1 | Predict Grant Applications | 5,000 | 2,800 | 204 | 12/13/2010 | 02/20/2011
2 | RTA Freeway Travel Time Prediction | 10,000 | 2,958 | 348 | 11/23/2010 | 02/13/2011
3 | Deloitte/FIDE Chess Rating Challenge | 10,000 | 1,428 | 167 | 02/07/2011 | 05/04/2011
4 | Heritage Health Prize | 500,000 | 23,421 | 1,221 | 04/04/2011 | 04/04/2013
5 | Wikipedia’s Participation Challenge | 10,000 | 995 | 88 | 06/28/2011 | 09/20/2011
6 | Allstate Claim Prediction Challenge | 10,000 | 1,278 | 102 | 07/13/2011 | 10/12/2011
7 | dunnhumby’s Shopper Challenge | 10,000 | 1,872 | 277 | 07/29/2011 | 09/30/2011
8 | Give Me Some Credit | 5,000 | 7,658 | 920 | 09/19/2011 | 12/15/2011
9 | Don’t Get Kicked! | 10,000 | 7,167 | 570 | 09/30/2011 | 01/05/2012
10 | Algorithmic Trading Challenge | 10,000 | 1,169 | 95 | 11/11/2011 | 01/08/2012
11 | What Do You Know? | 5,000 | 1,616 | 228 | 11/18/2011 | 02/29/2012
12 | Photo Quality Prediction | 5,000 | 1,315 | 194 | 10/29/2011 | 11/20/2011
13 | Benchmark Bond Trade Price Challenge | 17,500 | 2,487 | 248 | 01/27/2012 | 04/30/2012
14 | KDD Cup 2012, Track 1 | 8,000 | 13,076 | 657 | 02/20/2012 | 06/01/2012
15 | KDD Cup 2012, Track 2 | 8,000 | 5,276 | 163 | 02/20/2012 | 06/01/2012
16 | Predicting a Biological Response | 20,000 | 7,668 | 647 | 03/16/2012 | 06/15/2012
17 | Online Product Sales | 22,500 | 3,532 | 346 | 05/04/2012 | 07/03/2012
18 | EMI Music Data Science Hackathon - July 21st - 24 hours | 10,000 | 1,282 | 132 | 07/21/2012 | 07/22/2012
19 | Belkin Energy Disaggregation Competition | 25,000 | 1,399 | 160 | 07/02/2013 | 10/30/2013
20 | Merck Molecular Activity Challenge | 40,000 | 2,979 | 236 | 08/16/2012 | 10/16/2012
21 | U.S. Census Return Rate Challenge | 25,000 | 2,385 | 231 | 08/31/2012 | 11/11/2012
22 | Amazon.com - Employee Access Challenge | 5,000 | 16,872 | 1,687 | 05/29/2013 | 07/31/2013
23 | The Marinexplore and Cornell University Whale Detection Challenge | 10,000 | 3,282 | 244 | 02/08/2013 | 04/08/2013
24 | See Click Predict Fix - Hackathon | 1,000 | 1,001 | 79 | 09/28/2013 | 09/29/2013
25 | KDD Cup 2013 - Author Disambiguation Challenge (Track 2) | 7,500 | 2,216 | 235 | 04/19/2013 | 06/12/2013
26 | Influencers in Social Networks | 2,350 | 2,004 | 129 | 04/13/2013 | 04/14/2013
27 | Personalize Expedia Hotel Searches - ICDM 2013 | 25,000 | 3,409 | 331 | 09/03/2013 | 11/04/2013
28 | StumbleUpon Evergreen Classification Challenge | 5,000 | 7,123 | 593 | 08/16/2013 | 10/31/2013
29 | Personalized Web Search Challenge | 9,000 | 3,021 | 177 | 10/11/2013 | 01/10/2014
30 | See Click Predict Fix | 4,000 | 5,314 | 517 | 09/29/2013 | 11/27/2013
31 | Allstate Purchase Prediction Challenge | 50,000 | 24,526 | 1,568 | 02/18/2014 | 05/19/2014
32 | Higgs Boson Machine Learning Challenge | 13,000 | 35,772 | 1,785 | 05/12/2014 | 09/15/2014
33 | Acquire Valued Shoppers Challenge | 30,000 | 25,138 | 952 | 04/10/2014 | 07/14/2014
34 | The Hunt for Prohibited Content | 25,000 | 4,992 | 285 | 06/24/2014 | 08/31/2014
35 | Liberty Mutual Group - Fire Peril Loss Cost | 25,000 | 14,751 | 634 | 07/08/2014 | 09/02/2014
36 | Tradeshift Text Classification | 5,000 | 4,632 | 296 | 10/02/2014 | 11/10/2014
37 | Driver Telematics Analysis | 30,000 | 36,065 | 1,528 | 12/15/2014 | 03/16/2015
38 | Diabetic Retinopathy Detection | 100,000 | 7,002 | 661 | 02/17/2015 | 07/27/2015
39 | Click-Through Rate Prediction | 15,000 | 27,202 | 1,417 | 11/18/2014 | 02/09/2015
40 | Otto Group Product Classification Challenge | 10,000 | 34,300 | 2,734 | 03/17/2015 | 05/18/2015
41 | Crowdflower Search Results Relevance | 20,000 | 23,237 | 1,326 | 05/11/2015 | 07/06/2015
42 | Avito Context Ad Clicks | 20,000 | 5,317 | 360 | 06/02/2015 | 07/28/2015
43 | ICDM 2015: Drawbridge Cross-Device Connections | 10,000 | 2,355 | 340 | 06/01/2015 | 08/24/2015
44 | Caterpillar Tube Pricing | 30,000 | 23,834 | 1,187 | 06/29/2015 | 08/31/2015
45 | Liberty Mutual Group: Property Inspection Prediction | 25,000 | 40,594 | 2,054 | 07/06/2015 | 08/28/2015
46 | Coupon Purchase Prediction | 50,000 | 18,477 | 1,076 | 07/16/2015 | 09/30/2015
47 | Springleaf Marketing Response | 100,000 | 34,861 | 1,914 | 08/14/2015 | 10/19/2015
48 | Truly Native? | 10,000 | 3,222 | 274 | 08/06/2015 | 10/14/2015
49 | Rossmann Store Sales | 35,000 | 58,915 | 2,861 | 09/30/2015 | 12/14/2015
50 | Homesite Quote Conversion | 20,000 | 28,571 | 1,334 | 11/09/2015 | 02/08/2016
51 | Prudential Life Insurance Assessment | 30,000 | 42,336 | 2,452 | 11/23/2015 | 02/15/2016
52 | BNP Paribas Cardif Claims Management | 30,000 | 48,442 | 2,702 | 02/03/2016 | 04/18/2016
53 | Home Depot Product Search Relevance | 40,000 | 32,937 | 1,935 | 01/18/2016 | 04/25/2016
54 | Santander Customer Satisfaction | 60,000 | 93,031 | 5,117 | 03/02/2016 | 05/02/2016
55 | Expedia Hotel Recommendations | 25,000 | 22,709 | 1,974 | 04/15/2016 | 06/10/2016
56 | Avito Duplicate Ads Detection | 20,000 | 8,134 | 548 | 05/06/2016 | 07/11/2016
57 | Draper Satellite Image Chronology | 75,000 | 2,734 | 401 | 04/29/2016 | 06/27/2016

Table A.1: Summary of the Competitions in the Data (Full List). Note: The table only considers submissions that received a score. The total reward is measured in US dollars at the moment of the competition.
| Public Ranking of Winner | Frequency | Probability | Cumulative Probability |
|---|---|---|---|
| 1 | 29 | 50.88 | 50.88 |
| 2 | 13 | 22.81 | 73.68 |
| 3 | 3 | 5.26 | 78.95 |
| 4 | 5 | 8.77 | 87.72 |
| 5 | 1 | 1.75 | 89.47 |
| 6 | 2 | 3.51 | 92.98 |
| 11 | 3 | 5.26 | 98.25 |
| 54 | 1 | 1.75 | 100.00 |

Table A.2: Public Leaderboard Ranking of Competition Winners Note: An observation is a contest.
| Number of Competitions | Frequency (Overall) | Probability (Overall) | Frequency (Competitive) | Probability (Competitive) |
|---|---|---|---|---|
| 1 | 22,034 | 71.26 | 3,556 | 57.78 |
| 2 | 4,350 | 14.08 | 1,024 | 16.64 |
| 3 | 1,835 | 5.70 | 510 | 8.29 |
| 4 | 908 | 2.82 | 275 | 4.47 |
| 5 or more | 1,976 | 6.14 | 789 | 12.82 |

Table A.3: Number of Competitions by User Note: An observation is a team member.
| Contest | µ | SE | λ | SE | σ | SE | log L(δ̂)/N | N |
|---|---|---|---|---|---|---|---|---|
| unimelb | 2.2518 | 0.2883 | 57.818 | 1.5502 | 0.0095 | 0.0018 | -2.7902 | 1391 |
| RTA | 1.58 | 0.1621 | 55.4337 | 1.4303 | 0.0018 | 0.0004 | -2.7296 | 1502 |
| ChessRatings2 | 2.7814 | 0.6219 | 51.1858 | 2.2006 | 0.0014 | 0.0006 | -2.753 | 541 |
| hhp | 2.585 | 0.1701 | 191.9182 | 1.6088 | 0.0013 | 0.0001 | -4.123 | 14231 |
| wikichallenge | 2.3922 | 0.5349 | 63.9345 | 2.7114 | 0.0014 | 0.0006 | -2.9593 | 556 |
| ClaimPredictionChallenge | 1.9972 | 0.4844 | 75.7161 | 3.1278 | 0.0008 | 0.0006 | -3.1885 | 586 |
| dunnhumbychallenge | 2.1786 | 0.3112 | 43.0186 | 1.4799 | 0.0041 | 0.001 | -2.4817 | 845 |
| GiveMeSomeCredit | 1.7886 | 0.115 | 62.0622 | 0.9749 | 0.0177 | 0.0015 | -2.7498 | 4053 |
| DontGetKicked | 1.9214 | 0.1912 | 88.2444 | 1.5446 | 0.0022 | 0.0004 | -3.2978 | 3264 |
| AlgorithmicTradingChallenge | 3.9965 | 1.0319 | 61.9513 | 2.5994 | 0.0015 | 0.0007 | -2.9833 | 568 |
| WhatDoYouKnow | 2.5017 | 0.4906 | 59.9231 | 2.2649 | 0.003 | 0.0009 | -2.8836 | 700 |
| PhotoQualityPrediction | 2.2782 | 0.3648 | 26.8969 | 1.1407 | 0.0029 | 0.0015 | -2.0296 | 556 |
| benchmark-bond-trade-price-challenge | 3.062 | 0.5103 | 61.0601 | 1.7959 | 0.0039 | 0.0011 | -2.9404 | 1156 |
| kddcup2012-track1 | 2.9583 | 0.2392 | 100.2066 | 1.1482 | 0.0019 | 0.0002 | -3.4658 | 7617 |
| kddcup2012-track2 | 2.3511 | 0.3865 | 131.4569 | 2.5196 | 0.0003 | 0.0002 | -3.8078 | 2722 |
| bioresponse | 2.0278 | 0.1883 | 79.7233 | 1.2755 | 0.002 | 0.0003 | -3.2068 | 3907 |
| online-sales | 2.2814 | 0.2727 | 46.4402 | 1.1694 | 0.0021 | 0.0006 | -2.6393 | 1577 |
| MusicHackathon | 3.8714 | 0.5974 | 18.0368 | 0.7602 | 0.001 | 0.0007 | -1.6941 | 563 |
| belkin-energy-disaggregation-competition | 2.1421 | 0.5941 | 127.443 | 4.7397 | 0.0005 | 0.0003 | -3.7337 | 723 |
| MerckActivity | 2.4988 | 0.34 | 53.0883 | 1.3046 | 0.001 | 0.0004 | -2.8338 | 1656 |
| us-census-challenge | 2.1093 | 0.2723 | 54.9508 | 1.5675 | 0.0108 | 0.0014 | -2.6729 | 1229 |
| amazon-employee-access-challenge | 2.5534 | 0.1252 | 49.1933 | 0.5004 | 0.0091 | 0.0007 | -2.6679 | 9663 |
| whale-detection-challenge | 2.143 | 0.2916 | 56.0026 | 1.4322 | 0.0004 | 0.0002 | -2.8826 | 1529 |
| the-seeclickfix-311-challenge | 2.5648 | 0.5884 | 40.8782 | 1.9122 | 0.0019 | 0.0016 | -2.5319 | 457 |
| kdd-cup-2013-author-disambiguation | 2.6054 | 0.5318 | 70.4564 | 2.2082 | 0.0005 | 0.0004 | -3.1417 | 1018 |
| predict-who-is-more-influential-in-a-social-network | 2.8405 | 0.4608 | 45.0543 | 1.4734 | 0.0022 | 0.0006 | -2.5965 | 935 |
| expedia-personalized-sort | 2.2804 | 0.2944 | 46.9099 | 1.2169 | 0.0016 | 0.0005 | -2.6658 | 1486 |
| stumbleupon | 2.6773 | 0.2078 | 52.0039 | 0.7794 | 0.0024 | 0.0003 | -2.7523 | 4452 |
| yandex-personalized-web-search-challenge | 1.7275 | 0.2766 | 120.5711 | 2.92 | 0.0006 | 0.0002 | -3.6677 | 1705 |
| see-click-predict-fix | 1.8024 | 0.1889 | 71.1992 | 1.3575 | 0.0018 | 0.0006 | -3.1106 | 2751 |
| allstate-purchase-prediction-challenge | 1.9824 | 0.1274 | 125.6546 | 1.1708 | 0.0045 | 0.0005 | -3.6895 | 11519 |
| higgs-boson | 2.3698 | 0.114 | 122.1003 | 0.8177 | 0.0013 | 0.0001 | -3.6756 | 22298 |
| acquire-valued-shoppers-challenge | 2.0772 | 0.1316 | 165.2723 | 1.2866 | 0.0011 | 0.0001 | -3.9934 | 16500 |
| avito-prohibited-content | 2.5922 | 0.3953 | 106.7729 | 2.0667 | 0.0005 | 0.0002 | -3.5868 | 2669 |
| liberty-mutual-fire-peril | 3.3344 | 0.3194 | 122.6742 | 1.3403 | 0.0008 | 0.0002 | -3.7291 | 8377 |
| tradeshift-text-classification | 3.0683 | 0.3897 | 63.2299 | 1.2398 | 0.0005 | 0.0002 | -3.0464 | 2601 |
| axa-driver-telematics-analysis | 2.4383 | 0.1405 | 127.6736 | 0.8938 | 0.0009 | 0.0001 | -3.7595 | 20405 |
| diabetic-retinopathy-detection | 1.7712 | 0.2073 | 107.1837 | 2.0274 | 0.0025 | 0.0005 | -3.5187 | 2795 |
| avazu-ctr-prediction | 3.2027 | 0.1979 | 109.1072 | 0.8988 | 0.0008 | 0.0001 | -3.5696 | 14735 |
| otto-group-product-classification-challenge | 3.2732 | 0.1397 | 55.4236 | 0.3993 | 0.0011 | 0.0001 | -2.8812 | 19269 |
| crowdflower-search-relevance | 2.1332 | 0.1093 | 83.3317 | 0.6605 | 0.0021 | 0.0002 | -3.2883 | 15919 |
| avito-context-ad-clicks | 1.8952 | 0.2188 | 81.4574 | 1.5926 | 0.0006 | 0.0002 | -3.2495 | 2616 |
| icdm-2015-drawbridge-cross-device-connections | 2.0972 | 0.3448 | 57.2772 | 1.7643 | 0.0007 | 0.0004 | -2.8946 | 1054 |
| caterpillar-tube-pricing | 3.2674 | 0.1757 | 68.2579 | 0.5565 | 0.001 | 0.0001 | -3.0972 | 15047 |
| liberty-mutual-group-property-inspection-prediction | 3.1055 | 0.1152 | 67.0227 | 0.4112 | 0.0062 | 0.0004 | -3.0375 | 26573 |
| coupon-purchase-prediction | 2.0586 | 0.1093 | 73.1853 | 0.6539 | 0.0009 | 0.0001 | -3.146 | 12526 |
| springleaf-marketing-response | 3.0405 | 0.1689 | 97.3279 | 0.7153 | 0.0008 | 0.0001 | -3.4726 | 18513 |
| dato-native | 4.4816 | 0.7802 | 50.9714 | 1.4203 | 0.0004 | 0.0004 | -2.8317 | 1288 |
| rossmann-store-sales | 2.8735 | 0.0926 | 89.6868 | 0.4478 | 0.0019 | 0.0001 | -3.3334 | 40105 |
| homesite-quote-conversion | 2.4958 | 0.1548 | 128.5381 | 0.9678 | 0.0004 | 0.0001 | -3.7599 | 17638 |
| prudential-life-insurance-assessment | 2.1598 | 0.0799 | 78.0817 | 0.4707 | 0.004 | 0.0003 | -3.2007 | 27512 |
| bnp-paribas-cardif-claims-management | 2.6826 | 0.0903 | 63.8265 | 0.3564 | 0.0005 | 0.0001 | -3.0155 | 32069 |
| home-depot-product-search-relevance | 2.4153 | 0.1501 | 124.6592 | 0.9777 | 0.0005 | 0.0001 | -3.724 | 16258 |
| santander-customer-satisfaction | 2.3098 | 0.0563 | 75.1579 | 0.3048 | 0.0055 | 0.0002 | -3.1614 | 60816 |
| expedia-hotel-recommendations | 2.2792 | 0.086 | 43.2208 | 0.3422 | 0.0009 | 0.0001 | -2.564 | 15948 |
| avito-duplicate-ads-detection | 3.565 | 0.4807 | 109.0688 | 1.6756 | 0.0004 | 0.0002 | -3.6251 | 4237 |
| draper-satellite-image-chronology | 3.0409 | 0.4177 | 53.1817 | 1.4239 | 0.0019 | 0.0004 | -2.7768 | 1395 |

Table A.4: Maximum Likelihood Estimates of the Cost and Arrival Distributions. Note: The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.'
Contest | Type 1: θ1mean, θ1st.dev, κ1 | Type 2: θ2mean, θ2st.dev, κ2 | log L(δ̂)/N | N
unimelb
0.9394
0.0347
0.629
0.4085
0.8215
0.371
0.556
1391
RTA
0.7156
0.3742 0.8024 0.5208
0.092
0.1976
-0.3613
1502
ChessRatings2
0.6239
0.108
1.0804
0.059
0.2
0.4568
541
hhp
0.7286
0.0449
0.2079 0.6986
0.2455
0.7921
0.2902
14231
wikichallenge
0.7942
0.0684 0.3946
ClaimPredictionChallenge
0.4819
0.5208
dunnhumbychallenge
0.8162
0.1949
0.55
1.1991
0.14
0.45
-0.0137
845
0.52
0.0118
0.5788
0.5031
0.1241
0.4212
2.1073
4053
DontGetKicked
0.7161
0.0223
0.0433 0.7871 0.1468 0.9567
0.5451
3264
AlgorithmicTradingChallenge
0.7443
0.1452
0.8667 0.9965 0.0554
0.1333
0.3943
568
WhatDoYouKnow
0.714
0.1427 0.8079 0.9667 0.0483 0.1921
0.4415
700
PhotoQualityPrediction
0.6217
0.0329 0.4996 0.5347 0.0599 0.5004
benchmark-bond-trade-price-challenge
0.922
0.2442 0.9966
kddcup2012-track1
0.7519
kddcup2012-track2
0.8841
bioresponse online-sales
GiveMeSomeCredit
0.8
0.1654 0.6054
0.6672
556
0.4878 1.8202 0.3327 0.5122
0.639
-1.1033
586
1.4271
556
1.0822
0.0238
0.0034
-0.0021
1156
0.1488 0.6644
0.649
0.686
0.3356
-0.3581
7617
0.1323
0.6051 0.2358
0.393
0.187
2722
0.8702
0.1237 0.5218 0.5964 0.1821 0.4782
0.2207
3907
0.9105
0.096
0.5648 0.6489 0.1883 0.4352
0.3832
1577
MusicHackathon
0.955
0.1167 0.7317 0.5854 0.2242 0.2683
0.1575
563
belkin-energy-disaggregation-competition
0.342
0.2098
0.6154
-1.0782
723
MerckActivity
0.7758
0.1061 0.4942 0.5442 0.1847 0.5058
0.3299
1656
us-census-challenge
0.9404
0.4693 0.9663
0.8504
0.0927
0.0337
-0.6457
1229
amazon-employee-access-challenge
0.7641
0.0278 0.4149
0.7207
0.2298
0.5851
0.8953
9663
whale-detection-challenge
0.7301
0.0421
0.533
0.6181
0.068
0.467
1.1603
1529
the-seeclickfix-311-challenge
0.8266
0.1922
0.898
0.9736 0.0078
0.102
0.5234
457
kdd-cup-2013-author-disambiguation
1.5685
0.1976 0.3426
0.7627 0.4839 0.6574
-0.6579
1018
0.607
0.3846 1.0475
1.3003
predict-who-is-more-influential-in-a-social-network
0.7024
0.032
0.3158 0.6342 0.1171 0.6842
0.8856
935
expedia-personalized-sort
0.9952
0.2893 0.9124 0.7866 0.0437 0.0876
-0.1245
1486
stumbleupon
0.6631
0.0491 0.3801 0.6356 0.1601 0.6199
0.7829
4452
yandex-personalized-web-search-challenge
0.7436
0.1186
0.5998 0.9164
-0.8077
1705
see-click-predict-fix
0.7686
0.0342 0.5132 0.6697 0.3057 0.4868
0.9367
2751
allstate-purchase-prediction-challenge
0.5064
0.0054
0.566
2.7542
11519
higgs-boson
0.6729
0.0261 0.1342
0.6621 0.1811 0.8658
0.7565
22298
0.0836 0.434
0.963 0.5029
0.0608
acquire-valued-shoppers-challenge
0.937
0.0719 0.1384
0.8149
0.3858
0.8616
-0.2913
16500
avito-prohibited-content
0.4962
0.0079 0.3469
0.4646
0.0341
0.6531
2.4984
2669
liberty-mutual-fire-peril
0.832
0.1419 0.6025
0.5377
0.1638
0.3975
0.2251
8377
tradeshift-text-classification
0.8762
0.1204
0.4814 0.6032 0.1887 0.5186
0.2155
2601
0.5144
-0.093
20405
axa-driver-telematics-analysis
1.1396
0.1601
diabetic-retinopathy-detection
1.0707
0.3831 0.8415 1.7968 0.0898
0.7538
0.2585 0.4856
avazu-ctr-prediction
0.5801
0.121
0.81
otto-group-product-classification-challenge
0.8747
0.1304
crowdflower-search-relevance
0.7604
0.1151
0.1585
-0.5556
2795
0.19
0.2011
14735
0.8578
0.2151
0.5775
0.662
0.2334 0.4225
0.538
0.5133 0.2216
0.2187
19269
0.462
0.2767
15919 2616
avito-context-ad-clicks
0.7291
0.1534
0.1906 1.0586 0.5576 0.8094
-0.7497
icdm-2015-drawbridge-cross-device-connections
0.9903
0.2175
0.7308 1.4471
0.2692
0.0418
1054
caterpillar-tube-pricing
0.6515
0.0455
0.4406 0.5762 0.1804 0.5594
0.8269
15047
0.6398
0.063
liberty-mutual-group-property-inspection-prediction
0.8021
0.0346 0.3201
0.6799
0.4689
26573
coupon-purchase-prediction
0.7612
0.0026 0.0811 0.8071 0.6182 0.9189
0.2638
-0.538
12526 18513
springleaf-marketing-response
0.8698
0.041
0.1599 0.7947
0.218
0.8401
0.3295
dato-native
0.6013
0.074
0.6684
0.0118 0.3316
1.4778
1288
rossmann-store-sales
0.7237
0.0766
0.4658 0.5519
0.2002
0.5342
0.5379
40105
homesite-quote-conversion
0.5532
0.0269 0.2442
0.6472
0.2076
0.7558
0.5174
17638
prudential-life-insurance-assessment
0.7489
0.0409 0.4956
0.6576
0.2136
0.5044
0.8763
27512
bnp-paribas-cardif-claims-management
0.5618
0.0905
0.686
0.3672
0.671
-0.1163
32069
0.329
0.6838
home-depot-product-search-relevance
1.1326
0.4183 0.5038 0.5873 0.1628 0.4962
-0.5663
16258
santander-customer-satisfaction
0.4623
0.0083 0.4581 0.4469 0.1328
0.5419
2.4771
60816
expedia-hotel-recommendations
0.6841
0.0063
0.5284 0.5805
0.377
0.4716
1.7034
15948
avito-duplicate-ads-detection
0.975
0.063
0.4236
0.2305 0.5764
0.636
4237
-1.4304
1395
draper-satellite-image-chronology
-0.0352
0.1573 0.1485
0.8644 1.07
1.1763
0.8515
Table A.5: EM Algorithm Estimates for the Type-specific Distribution of Scores, qθ. Note: The model is estimated separately for each contest. $\theta_i^{mean}$ and $\theta_i^{st.dev}$ are the parameters of type i's distribution of scores, $Q_i(s) = \Phi\big((s - \theta_i^{mean})/\theta_i^{st.dev}\big)$. $\kappa_i$ is the fraction of players of type i. log L(δ̂)/N is the value of the log-likelihood function evaluated at the EM estimates. Standard errors are available.
| Contest | α | SE | β | SE | N |
|---|---|---|---|---|---|
| unimelb | -0.0115 | 0.0413 | 1.0233 | 0.0561 | 2800 |
| RTA | -0.0023 | 0.8605 | 1.0021 | 0.8608 | 3129 |
| ChessRatings2 | -0.0177 | 0.1823 | 1.0176 | 0.1908 | 1563 |
| hhp | 0.0022 | 0.0717 | 1.0025 | 0.0737 | 25316 |
| wikichallenge | 0.0001 | 0.3367 | 0.9998 | 0.3497 | 1020 |
| ClaimPredictionChallenge | 0.0244 | 0.0787 | 0.9437 | 0.1388 | 1278 |
| dunnhumbychallenge | 0.0047 | 0.073 | 1.0153 | 0.1041 | 1872 |
| GiveMeSomeCredit | 0.0016 | 0.0609 | 0.9989 | 0.066 | 7730 |
| DontGetKicked | 0.0019 | 0.0611 | 1.0013 | 0.0716 | 7261 |
| AlgorithmicTradingChallenge | 0.0013 | 0.3561 | 0.9987 | 0.3581 | 1406 |
| WhatDoYouKnow | 0.006 | 0.1801 | 0.994 | 0.1893 | 1747 |
| PhotoQualityPrediction | -0.0156 | 0.2402 | 1.0174 | 0.254 | 1356 |
| kddcup2012-track1 | -0.0149 | 0.0259 | 0.9655 | 0.0379 | 13076 |
| kddcup2012-track2 | -0.0057 | 0.0444 | 0.9968 | 0.0587 | 5276 |
| bioresponse | 0.0374 | 0.127 | 0.9635 | 0.1293 | 8837 |
| online-sales | -0.0321 | 0.4462 | 1.0323 | 0.4524 | 3755 |
| MusicHackathon | 0.0003 | 0.8621 | 0.9997 | 0.8634 | 1319 |
| belkin-energy-disaggregation-competition | 0.0128 | 0.2171 | 1.0157 | 0.2346 | 1526 |
| MerckActivity | -0.0074 | 0.0731 | 1.0083 | 0.0908 | 2979 |
| us-census-challenge | 0.0082 | 0.5591 | 0.9918 | 0.5601 | 2666 |
| amazon-employee-access-challenge | 0.0263 | 0.0306 | 0.9748 | 0.0368 | 16872 |
| whale-detection-challenge | 0.0039 | 0.0806 | 0.9961 | 0.0928 | 3293 |
| the-seeclickfix-311-challenge | 0.0121 | 0.3505 | 0.9877 | 0.3701 | 1051 |
| kdd-cup-2013-author-disambiguation | -0.0003 | 0.2136 | 0.9996 | 0.2228 | 2304 |
| predict-who-is-more-influential-in-a-social-network | 0.0479 | 0.1612 | 0.9494 | 0.1741 | 2105 |
| expedia-personalized-sort | 0.0496 | 0.0908 | 0.9407 | 0.1078 | 3502 |
| stumbleupon | 0.0297 | 0.0736 | 0.9875 | 0.0815 | 7495 |
| yandex-personalized-web-search-challenge | -0.0002 | 0.1021 | 1.0004 | 0.1074 | 3570 |
| see-click-predict-fix | -0.0023 | 0.9291 | 1.0022 | 0.9321 | 5570 |
| allstate-purchase-prediction-challenge | 0.0005 | 0.0197 | 1.0043 | 0.0221 | 24526 |
| higgs-boson | -0.0002 | 0.0183 | 1.0168 | 0.0224 | 35772 |
| acquire-valued-shoppers-challenge | -0.0139 | 0.033 | 1.0105 | 0.043 | 25195 |
| avito-prohibited-content | -0.0003 | 0.0521 | 0.9999 | 0.0574 | 4992 |
| liberty-mutual-fire-peril | 0.044 | 0.0449 | 0.9083 | 0.054 | 14812 |
| tradeshift-text-classification | 0.0004 | 0.4734 | 0.9996 | 0.4746 | 5648 |
| axa-driver-telematics-analysis | -0.0019 | 0.0179 | 1.0019 | 0.0253 | 36065 |
| diabetic-retinopathy-detection | -0.0082 | 0.0285 | 1.0106 | 0.0489 | 7002 |
| avazu-ctr-prediction | 0.0006 | 0.0683 | 0.9994 | 0.0688 | 31015 |
| otto-group-product-classification-challenge | 0.0002 | 0.0309 | 0.9997 | 0.0321 | 43525 |
| crowdflower-search-relevance | 0.0174 | 0.0362 | 0.986 | 0.0426 | 23244 |
| avito-context-ad-clicks | -0.0001 | 0.2656 | 1.0001 | 0.2665 | 5949 |
| icdm-2015-drawbridge-cross-device-connections | -0.0007 | 0.0326 | 1.0014 | 0.0579 | 2355 |
| caterpillar-tube-pricing | -0.014 | 0.3919 | 1.014 | 0.393 | 26360 |
| liberty-mutual-group-property-inspection-prediction | 0.0061 | 0.0383 | 0.9961 | 0.0407 | 45875 |
| coupon-purchase-prediction | 0.033 | 0.0134 | 0.9022 | 0.0279 | 18477 |
| springleaf-marketing-response | 0.0092 | 0.052 | 0.9894 | 0.0553 | 39444 |
| dato-native | 0.008 | 0.0721 | 0.9931 | 0.0823 | 3223 |
| homesite-quote-conversion | 0.0028 | 0.0401 | 0.997 | 0.0417 | 36368 |
| prudential-life-insurance-assessment | 0.0092 | 0.042 | 0.9933 | 0.0447 | 45490 |
| bnp-paribas-cardif-claims-management | 0.0036 | 0.073 | 0.9964 | 0.0737 | 54516 |
| home-depot-product-search-relevance | -0.0002 | 0.044 | 0.9999 | 0.047 | 35619 |
| santander-customer-satisfaction | 0.028 | 0.0267 | 0.972 | 0.0282 | 93559 |
| expedia-hotel-recommendations | 0.0006 | 0.019 | 0.9983 | 0.0269 | 22709 |
| avito-duplicate-ads-detection | 0.0016 | 0.0632 | 0.9985 | 0.0754 | 8153 |
| draper-satellite-image-chronology | 0.1325 | 0.0643 | 0.8211 | 0.1141 | 2734 |

Table A.6: Maximum Likelihood Estimates of the Distribution of Private Scores Conditional on Public Scores. Note: The conditional distribution is assumed to be given by $p^{private} = \alpha + \beta p^{public} + \varepsilon$, with $\varepsilon$ distributed according to a double exponential distribution. The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.'
Dependent variable: Number of submissions per time interval (with leaderboard)

| | (1) higgs-boson | (2) hhp | (3) avazu-ctr-prediction |
|---|---|---|---|
| max(Deviation of max score relative to expected max score, 0) | -12.3462*** (0.9047) | -11.6532*** (0.8924) | -7.4416*** (0.5777) |
| min(Deviation of max score relative to expected max score, 0) | -30.9281*** (1.2164) | -24.4146*** (1.1899) | -16.4589*** (0.7692) |
| Observations | 20,000 | 20,000 | 20,000 |
| R2 | 0.911 | 0.818 | 0.860 |
| p-value F-test | 0.0000 | 0.0000 | 0.0000 |

Table A.7: The effect of the leaderboard on participation over time (selected contests) Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. 'p-value F-test' reports the p-value for a test of equality of coefficients. Data were simulated using the model estimates. Each contest was simulated 1,000 times. An observation is a contest simulation–time interval combination, where each time interval captures 5 percent of the overall contest length. The expected max score was obtained by averaging the max score across all simulations for each time interval. The model estimates suggest that a leaderboard increases the number of submissions (relative to the case without a leaderboard) in the contests higgs-boson and hhp, while it would decrease the number of submissions in the contest avazu-ctr-prediction.
| Fraction of teams | hhp | higgs-boson | amazon-employee-access |
|---|---|---|---|
| 91 percent | 12400.04** (12354.45, 12445.63) | 18582.17** (18534.46, 18629.89) | 3612.16** (3599.28, 3625.05) |
| 93 percent | 12499.77** (12454.43, 12545.11) | 18754.15** (18706.82, 18801.49) | 3641.93** (3629.23, 3654.64) |
| 95 percent | 12605.75** (12560.56, 12650.93) | 18934.70** (18888.13, 18981.27) | 3670.92 (3658.01, 3683.83) |
| 97 percent | 12695.48** (12650.72, 12740.25) | 19102.17** (19056.00, 19148.35) | 3700.74 (3687.98, 3713.49) |
| 99 percent | 12790.69 (12746.16, 12835.22) | 19271.08 (19225.18, 19316.98) | 3720.23 (3707.60, 3732.86) |
| 100 percent | 12780.99 (12737.08, 12824.90) | 19294.11 (19248.08, 19340.14) | 3694.61 (3681.27, 3707.95) |

Table A.8: Number of submissions with limited participation, by fraction of teams Note: The contest outcomes are contest-level averages computed using 2,000 simulations for each contest using our model estimates. 95-percent confidence intervals in parentheses. ** indicates that the estimate with x percent of the teams is statistically different from the estimate with 100 percent of the teams (at a significance level of 5 percent).
Figure A.1: Correlation between µ (arrival rate of new teams) and λ (arrival rate of new submissions) Note: The estimates of λ and µ are reported in Table A.4. The coefficient of correlation between λ and µ is -0.18.
[Figure A.2, Panel A: No leaderboard; Panel B: No noise. Each panel shows histograms of the percentage-point change in total submissions, max score, average submissions per team, and average submissions per high-type team.]

Figure A.2: Equilibrium outcome comparison Note: An observation is a contest. The contest outcomes are contest-level averages computed using 2,000 simulations for each contest using our model estimates.
[Figure A.2 (continued), Panel C: Limited participation; Panel D: Single prize. Each panel shows histograms of the percentage-point change in total submissions, max score, average submissions per team, and average submissions per high-type team.]

Figure A.2 (continued): Equilibrium outcome comparison Note: An observation is a contest. The contest outcomes are contest-level averages computed using 2,000 simulations for each contest using our model estimates.
B    Estimation Details
In this section, we provide an overview of the estimation procedure used for a subset of the primitives of the model.
B.1    Distribution of entry times – ML
We assume that the time at which a player enters a competition follows an exponential distribution with a contest-specific parameter, µ. Given the vector of entry times for the set of players I in a given contest, {ti }i∈I , we estimate µ by using the maximum likelihood estimator:
$$\hat{\mu} = \arg\max_{\mu} \log L(\mu) = \arg\max_{\mu} \sum_{i \in I} \big( \log(\mu) - \mu t_i \big) = \frac{1}{\bar{t}},$$

where $\bar{t} = \sum_{i \in I} t_i / |I|$.
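Since the estimator has the closed form µ̂ = 1/t̄, it can be checked against simulated entry times. A minimal sketch (the true rate of 2 is an arbitrary illustrative choice, not a value from the paper):

```python
import random

def mle_entry_rate(entry_times):
    """Closed-form MLE of an exponential rate: the reciprocal of the sample mean."""
    return len(entry_times) / sum(entry_times)

# With a true rate of 2, entry times average 1/2, so the estimate should be near 2.
random.seed(0)
entry_times = [random.expovariate(2.0) for _ in range(100_000)]
mu_hat = mle_entry_rate(entry_times)
```

The estimator for λ in Section B.2 has the same form, applied to the pooled times between submissions.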
B.2    Distribution of time between submissions – ML
We assume that the time between submissions follows an exponential distribution with a contest-specific parameter, λ. Given the vector of times between submissions for the set of players I in a given contest, $\{t_{i,m}\}_{m \in M_i, i \in I}$, we estimate λ by using the maximum likelihood estimator:

$$\hat{\lambda} = \arg\max_{\lambda} \log L(\lambda) = \arg\max_{\lambda} \sum_{i \in I} \sum_{m \in M_i} \big( \log(\lambda) - \lambda t_{i,m} \big) = \frac{1}{\bar{t}},$$

where $\bar{t} = \sum_{i \in I} \sum_{m \in M_i} t_{i,m} / |\{t_{i,m}\}_{m \in M_i, i \in I}|$.
B.3    Type-specific distribution of scores – EM Algorithm
The probability that player i is of type $\theta_j \in \Theta$, given player i's observed scores $s_i = \{s_{i1}, \ldots, s_{iM_i}\}$, is (by Bayes' identity)

$$h(\theta_j \mid s_i) = \frac{\kappa_j \prod_{m=1}^{M_i} f(s_{im} \mid \boldsymbol{\theta}_j)}{\sum_{\theta_k \in \Theta} \kappa_k \prod_{m=1}^{M_i} f(s_{im} \mid \boldsymbol{\theta}_k)},$$

where $f(\cdot \mid \boldsymbol{\theta}_j)$ is the density of scores of players of type $\theta_j$—which depends on a vector of parameters, $\boldsymbol{\theta}_j$—and $\kappa_j$ is the fraction of players of type $\theta_j$. We assume $f(\cdot \mid \boldsymbol{\theta}_j)$ is the density function of a normal distribution with mean $\theta_j^{mean}$ and standard deviation $\theta_j^{st.dev}$, where $\boldsymbol{\theta} = \{(\theta_j^{mean}, \theta_j^{st.dev})\}_{j=1,\ldots,k}$ and $\boldsymbol{\kappa} = \{\kappa_j\}_{j=1,\ldots,k}$. The expectation for the EM algorithm is given by

$$E(\boldsymbol{\theta}, \boldsymbol{\kappa} \mid \boldsymbol{\theta}^t, \boldsymbol{\kappa}^t) = \sum_{i} \sum_{\theta_k \in \Theta} \sum_{m=1}^{M_i} h(\theta_k \mid s_i, \boldsymbol{\theta}^t, \boldsymbol{\kappa}^t) \log\big(\kappa_k f(s_{im} \mid \theta_k)\big).$$

Given $(\boldsymbol{\theta}^t, \boldsymbol{\kappa}^t)$, $E(\boldsymbol{\theta}, \boldsymbol{\kappa} \mid \boldsymbol{\theta}^t, \boldsymbol{\kappa}^t)$ has a unique maximum, $(\boldsymbol{\theta}^{t+1}, \boldsymbol{\kappa}^{t+1})$. Given our assumptions, one can obtain an analytic solution for $(\boldsymbol{\theta}^{t+1}, \boldsymbol{\kappa}^{t+1})$. The estimates of the model are obtained by iterating over the expectation and maximization steps until convergence of the estimates: $\rho((\boldsymbol{\theta}^t, \boldsymbol{\kappa}^t), (\boldsymbol{\theta}^{t+1}, \boldsymbol{\kappa}^{t+1})) < \varepsilon$, where $\rho(\cdot)$ is the Euclidean metric. We use a tolerance level of 1E-8.27
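The E- and M-steps can be sketched for the two-type case as follows. The simulated data, the initialization, and the fixed iteration count are illustrative assumptions (the paper instead iterates until the Euclidean distance between successive estimates falls below 1E-8); with normal type densities, the M-step maxima are the posterior-weighted means, standard deviations, and type shares computed below:

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_two_types(players, iters=200):
    """EM for a two-type normal mixture in which all of a player's scores
    share the player's latent type. players: list of per-player score lists."""
    pooled = sorted(s for p in players for s in p)
    half = len(pooled) // 2
    # Initialize the two type means from the lower/upper halves of the pooled scores.
    theta = [[sum(pooled[:half]) / half, 1.0],
             [sum(pooled[half:]) / (len(pooled) - half), 1.0]]
    kappa = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior type probabilities per player (Bayes' identity,
        # with the product over the player's scores taken in logs for stability).
        H = []
        for scores in players:
            logw = [math.log(kappa[j]) +
                    sum(math.log(normal_pdf(s, *theta[j])) for s in scores)
                    for j in (0, 1)]
            mx = max(logw)
            w = [math.exp(v - mx) for v in logw]
            H.append([v / sum(w) for v in w])
        # M-step: posterior-weighted means, standard deviations, and type shares.
        for j in (0, 1):
            wsum = sum(h[j] * len(p) for h, p in zip(H, players))
            mean = sum(h[j] * s for h, p in zip(H, players) for s in p) / wsum
            var = sum(h[j] * (s - mean) ** 2 for h, p in zip(H, players) for s in p) / wsum
            theta[j] = [mean, max(math.sqrt(var), 1e-6)]
            kappa[j] = sum(h[j] for h in H) / len(players)
    return theta, kappa

# Simulated contest: 120 low-type players (mean 0) and 80 high-type players (mean 3).
random.seed(1)
players = [[random.gauss(0.0 if i < 120 else 3.0, 0.5) for _ in range(5)]
           for i in range(200)]
theta_hat, kappa_hat = em_two_types(players)
```

On well-separated simulated types, the estimated means and shares should come out close to the truth (here, means near 0 and 3, and shares near 0.6 and 0.4).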
B.4    Conditional distribution of the private scores – ML
We assume that the relationship between private and public scores is given by $p^{private} = \alpha + \beta p^{public} + \varepsilon$, where $\varepsilon$ is distributed according to a standard double exponential distribution, and α and β are contest-specific parameters. Given the pairs of scores for all M submissions in a contest, $\{(p_m^{public}, p_m^{private})\}_{m \in M}$, we estimate (α, β) by using the maximum likelihood estimator:

$$(\hat{\alpha}, \hat{\beta}) = \arg\max_{\alpha, \beta} \log L(\alpha, \beta) = \arg\max_{\alpha, \beta} \sum_{m \in M} \big( -\varepsilon_m - \exp\{-\varepsilon_m\} \big),$$

where $\varepsilon_m = p_m^{private} - \alpha - \beta p_m^{public}$.

27 Alternatively, we can iterate until the log-likelihood converges.
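A minimal sketch of this estimator, using simulated score pairs and plain gradient ascent on the log-likelihood (which is globally concave in (α, β)); the true values (α, β) = (0.5, 0.9), the learning rate, and the iteration count are illustrative choices, not values from the paper:

```python
import math
import random

def fit_gumbel_regression(x, y, lr=0.1, iters=2000):
    """MLE of (alpha, beta) in y = alpha + beta*x + eps, eps ~ standard Gumbel
    (double exponential), via gradient ascent on sum(-eps - exp(-eps))."""
    a = b = 0.0
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            e = yi - a - b * xi
            s = 1.0 - math.exp(-e)  # per-observation score for the intercept
            ga += s
            gb += s * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Simulated public/private score pairs; -log(-log(U)) draws a standard Gumbel.
random.seed(2)
x = [random.random() for _ in range(1000)]
y = [0.5 + 0.9 * xi - math.log(-math.log(random.random())) for xi in x]
alpha_hat, beta_hat = fit_gumbel_regression(x, y)
```

With a standard Gumbel error, E[exp(-ε)] = 1 at the true parameters, so the intercept is identified without a location correction.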
C    Description of the Experiment
Description of the Competition

A large restaurant chain owns restaurants located along major highways. The average revenue of a restaurant located at distance x from the highway is R(x). For simplicity, the distance to the highway is normalized to lie in the interval [1, 2]. The function R(x) is unknown. The goal of this competition is to predict the value of R(x) for several distances to the highway.

Currently, the restaurant chain operates in 40 different locations. You will have access to $\{(x_i, R(x_i))\}_{i=1}^{30}$, i.e., the distance to the highway and the average revenue for 30 of these restaurants. Using these data, you must submit a prediction of the average revenue for the remaining 10 restaurants, using their distances to the highway. You will find the necessary datasets in the Data tab. You can send up to 10 different submissions each day until the end of the competition. The deadline of the competition is Sunday, April 15th, at 23:59:59.

Evaluation

We will compare the actual revenues and the revenue predictions for each value $(x_j)_{j=31}^{40}$. The score will be calculated according to the Root Mean Square Deviation,

$$\mathrm{RMSD} = \sqrt{\frac{\sum_{j=31}^{40} \big( R(x_j) - \hat{R}(x_j) \big)^2}{10}},$$

which is a measure of the distance between your predictions and the actual values R(x).

Note. Following the convention used throughout the paper, we multiplied the RMSD scores by minus one, so that the winning submission is the one with the highest private score in the competition.
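The scoring rule can be sketched as follows; the revenue values below are hypothetical, used only to illustrate the computation:

```python
import math

def rmsd(predicted, actual):
    """Root mean square deviation between predicted and actual revenues."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for p, a in zip(predicted, actual)) / n)

# Hypothetical revenues for the 10 held-out restaurants.
actual = [34.4, 36.8, 35.1, 33.9, 36.2, 34.7, 35.5, 36.0, 34.1, 35.8]
score_perfect = rmsd(actual, actual)                      # 0.0
score_off_by_one = rmsd([a + 1 for a in actual], actual)  # approximately 1.0
```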
Description of the Data

The goal of this competition is to predict the value of R(x) for a number of values of the distance to the highway. The csv file “train” contains data on the distance to the highway and the average revenue for 30 restaurants, $\{(x_i, R(x_i))\}_{i=1}^{30}$. You can use these data to create predictions of the average revenue for the remaining 10 restaurants. For these 10 restaurants, you only observe their distances to the highway, in the csv file “test.” You can find an example of what your submission must look like in the csv file “sample_submission.” File descriptions:
• train.csv - the training set
• test.csv - the test set
• sample_submission.csv - an example of a submission file in the correct format

Submission File: The submission file must be in csv format. For each of the 10 restaurants, your submission file should contain two columns: distance to the highway (x) and predicted average revenue (R). The file should contain a header and have the following format:

x          R
1.047579   34.43375
1.926801   36.83077
etc.

A correct submission must be a csv file with one row of headers and 10 rows of numerical data, as displayed above. To ensure that you are uploading your predictions in the correct format, we recommend that you create your submission by editing the sample submission file. There is a limit of 10 submissions per day. Figure A.3 shows a screenshot of the leaderboard in one of our student competitions hosted on Kaggle.
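A minimal sketch of producing a submission in the required format with Python's csv module; the test distances and revenue predictions below are hypothetical:

```python
import csv
import io

# Hypothetical test distances and revenue predictions for the 10 restaurants.
test_x = [1.047579, 1.926801, 1.21, 1.35, 1.48, 1.55, 1.63, 1.77, 1.84, 1.95]
pred_r = [34.43375, 36.83077, 34.9, 35.2, 35.5, 35.7, 36.0, 36.3, 36.5, 36.7]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["x", "R"])  # one row of headers, as required
writer.writerows(zip(test_x, pred_r))
submission = buffer.getvalue()  # write this string to a .csv file to upload
```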
Figure A.3: Snapshot of the leaderboard in one of our competitions with leaderboard. Names are hidden for privacy reasons.