Dynamic Tournament Design: An Application to Prediction Contests∗
Jorge Lemus†
Guillermo Marshall‡
May 14, 2018
Abstract
Online contests have become a prominent form of innovation procurement. How do elements of contest design shape players’ incentives throughout the competition? Does a real-time leaderboard improve outcomes? We provide two complementary approaches to answer these questions. First, we build a tractable dynamic model of competition and estimate it using data from the online platform Kaggle. We evaluate contest outcomes under counterfactual contest designs, which modify information disclosure, allocation of prizes, and participation restrictions. And second, we present experimental evidence from student competitions hosted on Kaggle. Our main finding is that a public leaderboard improves contest outcomes: it increases the total number of submissions and the score of the best submission.
Keywords: Dynamic contest, contest design, dynamic games, Kaggle, big data.
JEL codes: C51, C57, C72, O31.
∗ We thank participants and discussants at the Conference on Internet Commerce and Innovation (Northwestern), IIOC 2017, Rob Porter Conference (Northwestern), Second Triangle Microeconomics Conference (UNC), University of Georgia, Cornell (Dyson), and University of Technology Sydney for helpful suggestions. Approval from the University of Illinois Human Subjects Committee, IRB18644.
† University of Illinois at Urbana-Champaign, Department of Economics; [email protected]
‡ University of Illinois at Urbana-Champaign, Department of Economics; [email protected]
1 Introduction
Online competitions have become a valuable resource for government agencies and private companies to procure innovation. For instance, U.S. government agencies have sponsored over 830 competitions that have awarded over $250 million in prizes for software, ideas, or designs through the website www.challenge.gov; e.g., DARPA sponsored a $500,000 competition to accurately predict cases of chikungunya virus.[1] In the UK, the website www.datasciencechallenge.org was created to “drive innovation that will help to keep the UK safe and prosperous in the future.” Multiple platforms allow private companies to sponsor online competitions.[2] Given that the design of online competitions varies across platforms, several economic questions arise. How are players’ incentives shaped by the design of a competition? Does a real-time public leaderboard encourage or discourage participation? Is a winner-takes-all competition better than one that allocates multiple prizes? Our contribution is to empirically study how contest design affects players’ incentives. Although the literature on contest design has advanced our knowledge of static settings, research on dynamic contest design with heterogeneous players is still limited. We advance this literature by presenting results from two complementary approaches. First, we build and estimate a tractable structural model using data on online competitions. And second, we run a randomized control trial to provide an answer, independent of our modeling assumptions, to the question of how contest design impacts competition outcomes. Our setting is Kaggle,[3] an online platform that hosts prediction contests, i.e., competitions where the winner is the player with the most accurate prediction of some random variable.[4] We use Kaggle to create and host 44 student competitions for our randomized control trial; additionally, we use data from 57 large Kaggle competitions to estimate the structural parameters of our model.
Participants in Kaggle competitions have access to a training dataset and a test dataset. An observation in the training dataset includes both an outcome variable and covariates;
[1] http://www.darpa.mil/news-events/2015-05-27
[2] Examples include CrowdAnalytix, Tunedit, InnoCentive, Topcoder, HackerRank, and Kaggle.
[3] https://www.kaggle.com/
[4] For instance, IEEE sponsored a $60,000 contest to diagnose schizophrenia; The National Data Science Bowl sponsored a $175,000 contest to identify plankton species from multiple images.
whereas the test dataset only includes covariates. A valid submission must include a prediction of the outcome variable for each observation in the test dataset. To avoid overfitting, Kaggle partitions the test dataset into two subsets and does not inform participants which observations belong to each subset. The first subset is used to generate a public score that is posted in real time on a public leaderboard. The second subset is used to generate a private score that is never made public during the contest; it is revealed only at the end of the competition. Prizes are awarded according to the private score ranking. Thus, the public score, which is highly correlated with the private score, provides a noisy signal about performance.[5] In Kaggle competitions, players can submit multiple solutions and observe in real time a public leaderboard that displays the public score of each submission throughout the contest.[6] Modeling this class of dynamic contests poses some technical challenges. First, participants’ final standings are uncertain, because the public-score ranking provides only a noisy signal of the private-score ranking. Players thus need to keep track of the complete public history to compute the benefit of an extra submission, and a state space that keeps track of the complete public history is computationally intractable. Second, each competition has a large number of heterogeneous participants sending thousands of submissions. An analytic solution for the equilibrium of a dynamic model with heterogeneous players is cumbersome and computationally expensive. Our descriptive evidence indicates that each player sends multiple submissions and that players are heterogeneous in their ability to produce high scores. To capture these features in our model, we assume that players work on at most one submission at a time, and that a player’s type determines the distribution from which scores are drawn.
After entering the contest, a player draws a cost from a distribution, which represents the cost of making a new submission. The player then decides whether to make a new submission or to exit the competition. To make this decision, the player compares the expected payoff of a new submission, net of its cost, with the payoff of finishing the competition with her current set of submissions. If the player decides to make a new submission, then she works on that submission (and only that submission) for a random amount of time. Immediately after the submission is completed, the submission is evaluated, and its public score is revealed on the public leaderboard.[7] At this point, and after observing the public leaderboard, the player draws a new cost and again decides whether to make a new submission or to quit. In computing the benefit of a new submission, a player considers the chances of winning a prize at the end of the contest given the current public leaderboard, her type, the current scores, and the expected number of rival submissions that will arrive before the end of the contest; more rival submissions lower the player’s chance of winning a prize. To deal with the problem of a computationally unmanageable state space, we assume that players are small (i.e., a player’s belief about how many rival submissions will arrive in the future is unaffected by the action of sending a new submission), and we limit the amount of information that players believe is relevant for computing their chances of winning the contest. Under these assumptions, our model can be tractably estimated and used to generate a series of counterfactual contest designs. Our counterfactual simulations show that different contest designs have heterogeneous effects on the incentives to make a submission and on contest outcomes. We evaluate counterfactual contest designs using several performance indicators: the total number of submissions, the number of submissions by player type, and the upper-tail statistics of the score distribution. We find that information disclosure has an economically significant effect on both the number and the quality of submissions. Without a public leaderboard, the number of submissions would decrease on average by 23.6 percent. This decline is explained mostly by a reduction in the number of submissions by high-type players. Consistent with this finding, the maximum score decreases by 1.3 percent without a leaderboard.
[5] In our data, the correlation between public and private scores is 0.99, and 79 percent of contest winners finish in the top 3 of the public leaderboard.
[6] Other websites (e.g., www.datasciencechallenge.org and www.drivendata.org) share these features.
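The per-submission decision process described above can be sketched as a simple stopping rule. This is a stylized illustration only: the cost distribution, the benefit function, and the score distribution below are hypothetical placeholders, not the paper’s estimated model.

```python
import random

def simulate_player(expected_prize, cost_mean=0.1, max_iters=200, seed=0):
    """Stylized sketch of the submission/quit decision (hypothetical forms).

    Each round the player draws a cost of producing a new submission and
    compares it with a hypothetical expected benefit: the prize at stake
    times the chance that a fresh score draw improves on her current best.
    She quits the first time the cost draw exceeds the expected benefit.
    """
    rng = random.Random(seed)
    best_score, n_submissions = 0.0, 0
    for _ in range(max_iters):
        cost = rng.expovariate(1.0 / cost_mean)        # cost draw for a new submission
        benefit = expected_prize * (1.0 - best_score)  # hypothetical expected gain
        if benefit < cost:
            break                                      # exit the competition
        n_submissions += 1
        best_score = max(best_score, rng.random())     # public score of the new submission
    return n_submissions, best_score
```

The stopping rule captures the qualitative mechanism: a larger prize or a weaker current position raises the value of one more submission, while a high cost draw triggers exit.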
Reducing the noise between public and private scores, limiting the number of players, or changing the number of prizes has smaller effects on competition performance. First, a fully informative leaderboard increases the number of submissions by 3.3 percent and the maximum score by 0.2 percent. Second, limiting entry to 90 percent of the actual participants reduces the number of submissions by 8.2 percent and the maximum score by 0.2 percent. And third, whether a single prize or multiple prizes are awarded does not have a significant effect on any of our indicators of contest performance.
[7] We do not model the choice of keeping a submission secret. As we explain in Section 2, the evidence does not indicate that players are strategic in the timing of their submissions.
Finally, the results from our randomized control trials corroborate the main finding of our structural estimates. We created and hosted 44 student competitions on Kaggle to study how a public leaderboard impacts incentives and contest outcomes.[8] Half of the competitions were randomly assigned to the control group (i.e., no public leaderboard) and the other half to the treatment group (i.e., public leaderboard), with competitions being otherwise identical. The results suggest that displaying a public leaderboard has a significant and positive effect on both the number of submissions and the maximum score. These “model free” results provide further evidence that a public leaderboard improves competition outcomes.
1.1 Related Literature
Contests are a widely used open innovation mechanism (Chesbrough et al., 2006). They attract talented individuals with diverse backgrounds (Jeppesen and Lakhani, 2010; Lakhani et al., 2013) and procure a diverse set of solutions (Terwiesch and Xu, 2008). An extensive literature on static contests has focused on design features such as the number and allocation of prizes and the number of participants. Work on the optimal allocation of prizes includes Lazear and Rosen (1981), Taylor (1995), Moldovanu and Sela (2001), Che and Gale (2003), Cohen et al. (2008), Sisak (2009), Olszewski and Siegel (2015), Kireyev (2016), Xiao (2016), Strack (2016), and Balafoutas et al. (2017). This literature, surveyed by Sisak (2009), finds that the convexity of the cost of effort plays an important role in determining the optimal allocation of prizes. Taylor (1995) and Fullerton and McAfee (1999), among others, show that restricting the number of competitors in winner-takes-all tournaments increases the equilibrium level of effort. Intuitively, players have weaker incentives to exert costly effort when they face many competitors, because each has a smaller chance of winning. In dynamic settings, the role of information disclosure and feedback has only recently been explored. Aoyagi (2010) compares the provision of effort by agents in a dynamic tournament under full disclosure of information (i.e., players observe their relative position) versus no information disclosure. Ederer (2010) adds private information to this setting, whereas Klein and Schmutzler (2016) add different forms of performance evaluation. Goltsman and Mukherjee (2011) study when to disclose workers’ performance. Other recent articles studying dynamic contest design include Halac et al. (2014), Bimpikis et al. (2014), Benkert and Letina (2016), and Hinnosaar (2017). Some authors have analyzed design tools other than prizes, limited entry, or feedback. Megidish and Sela (2013) consider contests that require players to exert an exogenously given minimal level of effort to participate. They show that a single prize is dominated by giving each participant an equal share of the prize when the required minimal level of effort is high. Moldovanu and Sela (2006) show that it is optimal to split competitors into two divisions when the number of competitors is large: in the first round, participants compete within each division, and in the second round the division winners compete to determine the final winner. A growing empirical literature on contests includes Boudreau et al. (2011), Genakos and Pagliero (2012), Takahashi (2015), Boudreau et al. (2016), Bhattacharya (2016), and Zivin and Lyons (2018). Gross (2015) studies how the number of participants changes the incentives for creating novel solutions versus marginally better ones. In a static environment, Kireyev (2016) uses an empirical model to study how elements of contest design affect participation and the quality of outcomes. Huang et al. (2014) estimate a dynamic structural model to study individual behavior and outcomes in a platform where individuals can contribute ideas, some of which will be implemented at the end of the contest; that paper focuses on learning the value of ideas rather than on contest design.
[8] All of the participants were University of Illinois at Urbana-Champaign students.
Finally, Gross (2017) studies how performance feedback impacts participation in design contests, but the analysis abstracts away from the dynamics of competition: stopping decisions are based on each player’s past outcomes and not on a dynamic leaderboard. This is in contrast with our paper, where we allow for sequential participation and dynamic feedback based on other competitors’ results. Also related to our paper is the “gamification” literature, which studies the application of game-design elements (e.g., leaderboards) to areas such as education, marketing, health, and labor markets, among others. Most of this research is conducted with experiments. Landers and Landers (2014) show that adding a leaderboard improves “time-on-task” in an education setting. Landers et al. (2017) show that a leaderboard motivates agents to set more ambitious goals. Athanasopoulos and Hyndman (2011) find that a leaderboard improves forecasting accuracy. The literature on effort provision for non-pecuniary motives is also related: Lerner and Tirole (2002) argue that good-quality contributions are a signal of ability to potential employers, and Moldovanu et al. (2007) study a setting where status motivates participation. Finally, it is possible to draw a parallel between a contest and an auction. While there is a well-established empirical literature on bidding behavior in auctions (Hendricks and Porter, 1988; Li et al., 2002; Bajari and Hortacsu, 2003, among others), only a few papers analyze dynamic behavior in contests.
2 Background, Data, and Motivating Facts
2.1 Background and Data
To motivate and estimate our model, we use publicly available information on 57 featured competitions hosted by Kaggle.[9] These competitions offered a monetary prize of at least $1,000, received at least 1,000 submissions, used between 10 and 90 percent of the test dataset to generate public scores, and evaluated submissions according to a well-defined rule. In these competitions, there was an average of 894 teams per contest, competing for rewards that ranged between $1,000 and $500,000 and averaged $30,489. A partial list of competition characteristics is summarized in Table 1 (see Table A.1 in the Online Appendix for the full list). All of these competitions, with the exception of the Heritage Health Prize, granted prizes to the top three scores.[10] For example, in the Coupon Purchase Prediction competition, the three submissions with the highest scores were awarded $30,000, $15,000, and $5,000, respectively. Kaggle computes the public score and the private score by evaluating a player’s submission on two subsamples of a test dataset. For example, in the Heritage Health Prize, the
[9] https://www.kaggle.com/kaggle/meta-kaggle
[10] The following contests also granted a prize to the fourth position: Don’t Get Kicked!, Springleaf Marketing Response, and KDD Cup 2013 - Author Disambiguation Challenge (Track 2).
Name of the Competition | Total Reward | Number of Submissions | Teams | Start Date | Deadline
Heritage Health Prize | 500,000 | 23,421 | 1,221 | 04/04/2011 | 04/04/2013
Allstate Purchase Prediction Challenge | 50,000 | 24,526 | 1,568 | 02/18/2014 | 05/19/2014
Higgs Boson Machine Learning Challenge | 13,000 | 35,772 | 1,785 | 05/12/2014 | 09/15/2014
Acquire Valued Shoppers Challenge | 30,000 | 25,138 | 952 | 04/10/2014 | 07/14/2014
Liberty Mutual Group - Fire Peril Loss Cost | 25,000 | 14,751 | 634 | 07/08/2014 | 09/02/2014
Driver Telematics Analysis | 30,000 | 36,065 | 1,528 | 12/15/2014 | 03/16/2015
Crowdflower Search Results Relevance | 20,000 | 23,237 | 1,326 | 05/11/2015 | 07/06/2015
Caterpillar Tube Pricing | 30,000 | 23,834 | 1,187 | 06/29/2015 | 08/31/2015
Liberty Mutual Group: Property Inspection Prediction | 25,000 | 40,594 | 2,054 | 07/06/2015 | 08/28/2015
Coupon Purchase Prediction | 50,000 | 18,477 | 1,076 | 07/16/2015 | 09/30/2015
Springleaf Marketing Response | 100,000 | 34,861 | 1,914 | 08/14/2015 | 10/19/2015
Homesite Quote Conversion | 20,000 | 28,571 | 1,334 | 11/09/2015 | 02/08/2016
Prudential Life Insurance Assessment | 30,000 | 42,336 | 2,452 | 11/23/2015 | 02/15/2016
Santander Customer Satisfaction | 60,000 | 93,031 | 5,117 | 03/02/2016 | 05/02/2016
Expedia Hotel Recommendations | 25,000 | 22,709 | 1,974 | 04/15/2016 | 06/10/2016

Table 1: Summary of the Competitions in the Data (Partial List)
Note: The table only considers submissions that received a score. The total reward is measured in US dollars at the moment of the competition. See Table A.1 in the Online Appendix for the complete list of competitions.
test data was divided into a 30 percent subsample to compute the public scores and a 70 percent subsample to compute the private scores. Kaggle discloses the percentage of the data in each subsample, but players do not know which observations belong to each, which creates imperfect correlation between public and private scores. All the competitions we consider display, in real time, a public leaderboard containing the public score of every submission made up to each point in time. Because public scores are calculated using only part of the test dataset (e.g., 30 percent in the Heritage Health Prize competition), players’ final standings may differ from the standings displayed on the public leaderboard. Although the correlation between public and private scores is very high in our sample (the coefficient of correlation is 0.99), the rankings in the public and private leaderboards may diverge. Hence, the public leaderboard provides informative, yet noisy, signals about the performance of all players throughout the contest. In about 79 percent of the competitions, the winner finished within the top three of the final public leaderboard (see Table A.2 in the Online Appendix).
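A minimal sketch of this public/private scoring scheme, assuming a hidden random partition of the test set and using accuracy as a stand-in metric (the function name and the metric are illustrative, not Kaggle’s actual implementation):

```python
import random

def split_scores(predictions, truth, public_share=0.3, seed=42):
    """Illustrative sketch of public/private scoring (names hypothetical).

    A hidden random subset of the test observations (here 30 percent, as in
    the Heritage Health Prize) generates the public score shown on the
    leaderboard; the remaining observations generate the private score used
    to award prizes. Accuracy is used as the metric purely for illustration.
    """
    rng = random.Random(seed)                  # fixed partition, hidden from players
    idx = list(range(len(truth)))
    rng.shuffle(idx)
    cut = int(public_share * len(idx))
    public_idx, private_idx = idx[:cut], idx[cut:]
    acc = lambda ids: sum(predictions[i] == truth[i] for i in ids) / len(ids)
    return acc(public_idx), acc(private_idx)
```

Because both scores are computed from random subsets of the same test set, they are highly correlated but not identical, which is the source of the noise discussed in the text.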
2.2 Motivating Facts
Our modeling choices are guided by a series of empirical facts. For each contest, we observe information on all submissions, including when they were made (time of submission), who made them (team identity), and their scores (public and private). Using this information, we reconstruct both the public and the private leaderboard at every instant of time. To make comparisons across contests, we normalize the contest length and the total prize to one, and we standardize public and private scores. In each competition, the score heterogeneity at the lower end of the distribution is driven by participants who may not be trying to win the competition, but are instead participating for non-pecuniary motives such as learning or recreation. Given that we are interested in modeling competitive players, those who are trying to win the competition, we divide teams into two categories: “competitive” and “non-competitive.” Competitive teams are defined as teams that obtain scores above the 75th percentile of the score distribution in a competition.[11] Table 2 presents summary statistics at the competition, team, and submission levels. Panel A presents summary statistics at the competition level: on average, there are 893.7 teams per competition, the reward is about $30,489, and competitions last about 81.69 days. Panels B and C show summary statistics for all teams and for competitive teams, respectively. About 25 percent of teams are competitive, and these teams send an average of 40 submissions per competition, which exceeds the overall sample average of 16.531 submissions. A competitive team has on average 1.2 members, which is not significantly different from the average team size in the full sample. Panels D and E present summary statistics for all submissions and for submissions by competitive teams, respectively.
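The normalization and the competitive-team classification can be sketched as follows (a hypothetical implementation with made-up column names, not the authors’ code):

```python
import pandas as pd

def standardize_and_flag(df):
    """Sketch of the cross-contest normalization (column names hypothetical).

    Scores are standardized within each contest, submission times are rescaled
    to the unit interval, and teams whose best public score exceeds their
    contest's 75th percentile are flagged as competitive.
    """
    g = df.groupby("contest")
    df = df.assign(
        score_std=(df["public_score"] - g["public_score"].transform("mean"))
        / g["public_score"].transform("std"),
        time_norm=(df["time"] - g["time"].transform("min"))
        / (g["time"].transform("max") - g["time"].transform("min")),
    )
    best = df.groupby(["contest", "team"])["public_score"].transform("max")
    q75 = df.groupby("contest")["public_score"].transform(lambda s: s.quantile(0.75))
    return df.assign(competitive=best > q75)
```

Standardizing within contest is what allows scores from contests with very different metrics (log loss, AUC, RMSE) to be pooled in the summary statistics below.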
Public and private scores are standardized and range between -4 and 5.659 and between -4 and 5.432, respectively. Competitive teams obtain higher public and private scores on average: the average public score is 0.358 for competitive teams versus 0.004 in the overall sample. Competitive teams also present significant variation in their scores (standard deviation of 0.75) and play more frequently than the rest of the teams: the average time between submissions is 1.2 percent of the contest time for competitive teams versus 1.5 percent for all teams.
[11] Table A.3 in the Online Appendix shows that competitive teams are more experienced: 63 percent participate in more than one competition.

Panel A: Competition-level statistics
Variable | N | Mean | St. Deviation | Min | Max
Number of teams | 57 | 893.702 | 963.081 | 79 | 5,117
Reward quantity | 57 | 30,488.596 | 66,736.377 | 1,000 | 500,000
Length (days) | 57 | 81.69 | 87.90 | 1 | 700

Panel B: Overall team-level statistics
Number of submissions | 50,941 | 16.531 | 29.538 | 1 | 665
Number of members | 50,941 | 1.127 | 0.604 | 1 | 40
Competitive team (indicator) | 50,941 | 0.247 | 0.431 | 0 | 1

Panel C: Team-level statistics, competitive teams
Number of submissions | 12,591 | 40.078 | 47.904 | 1 | 665
Number of members | 12,591 | 1.228 | 0.881 | 1 | 24

Panel D: Overall submission statistics
Public score | 842,089 | 0.004 | 0.991 | -4.000 | 5.659
Private score | 842,089 | 0.005 | 0.991 | -4.000 | 5.432
Time of submission | 842,089 | 0.601 | 0.289 | 0.000 | 1.000
Time between submissions | 791,146 | 0.015 | 0.053 | 0.000 | 0.998

Panel E: Submission statistics, competitive teams
Public score | 504,621 | 0.358 | 0.751 | -3.999 | 5.659
Private score | 504,621 | 0.355 | 0.749 | -4.000 | 5.432
Time of submission | 504,621 | 0.623 | 0.281 | 0.000 | 1.000
Time between submissions | 492,030 | 0.012 | 0.044 | 0.000 | 0.985

Table 2: Summary Statistics
Note: An observation in Panels D and E is a submission; an observation in Panels B and C is a team-competition combination; an observation in Panel A is a contest. Scores are standardized, and time is rescaled to be contained in the unit interval. Time between submissions is the time between two consecutive submissions by the same team. Competitive teams are teams that achieved a public score above the 75th percentile of a contest's final distribution of scores.

Observation 1. Most teams are composed of a single member.

Figure 1 shows the evolution of the number of submissions and teams over time. Panel (a) partitions all submissions into time intervals based on their submission time. The figure shows that the number of submissions increases over time: roughly 20 percent of submissions arrive when only 10 percent of the contest time remains, while just 6 percent of submissions arrive in the first 10 percent of the contest time. Panel (b) shows the timing of entry of new teams into the competition. The rate of entry is roughly constant over time, with about 20 percent of teams making their first submission when 20 percent of the contest time remains.

[Figure 1: Submissions and Entry of Teams Over Time Across All Competitions. Note: An observation is a submission. Panel (a) shows a histogram of submissions by elapsed-time categories. Panel (b) shows a local polynomial regression of the share of teams with one or more submissions as a function of time.]

Observation 2. New teams enter at a constant rate throughout the contest.

We also explore the time between submissions at the team level. Figure 2 shows a local polynomial regression of the average time between submissions as a function of time. The average time between submissions increases over time, suggesting either that teams experiment when they enter the contest or that building a new submission becomes increasingly difficult over time. Combined, Figures 1 and 2 suggest that the increase in submissions at the end of contests is driven not by all teams submitting at a faster pace, but by there being more active teams at the end of the contest and potentially stronger incentives to play.

[Figure 2: Time Between Submissions. Note: An observation is a submission. The figure shows a local polynomial regression of the time between submissions as a function of time.]

Observation 3. The rate of arrival of submissions increases with time.

Table 3 decomposes the variance of public scores using a regression analysis. In column 1, we find that 49 percent of the variation in public scores is between-team variation, suggesting that teams differ systematically in the scores they achieve. In column 2, we control for the number of submissions that a team has made up to the time of each submission (e.g., the variable takes the value n - 1 for a team's nth submission). This variable allows us to capture whether learning can explain some of the variation in scores. Column 2 shows that later submissions obtain higher scores, but this control explains only an extra 2.3 percent of the variance in scores. This suggests that while learning may be present, between-team variation explains the majority of the systematic variation in scores. Columns 3 and 4 repeat the analysis for competitive teams. In this restricted sample, teams are more homogeneous, so team fixed effects explain less of the variation than in the whole sample.
Dependent variable: Public Score
Variable | (1) All teams | (2) All teams | (3) Competitive teams | (4) Competitive teams
Submission number | | 0.0047*** (0.0000) | | 0.0041*** (0.0000)
Competition × Team FE | Yes | Yes | Yes | Yes
Observations | 833,970 | 833,970 | 504,410 | 504,410
R2 | 0.490 | 0.513 | 0.226 | 0.270

Table 3: Decomposing the Public Score Variance
Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. An observation is a submission. Submission number is defined at the competition-team-submission level and measures the number of submissions made by a team up to the time of a submission.
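The between-team share of variance in column 1 can be computed without a regression package: with only contest-by-team fixed effects, the fitted value of each submission is its team’s mean score, so the regression R-squared equals the between-team share of score variance. A sketch, with hypothetical column names:

```python
import pandas as pd

def between_team_r2(df):
    """Sketch of the Table 3, column 1 decomposition (column names hypothetical).

    Regressing scores on contest-by-team fixed effects alone yields an
    R-squared equal to the between-team share of variance: the fitted value
    for each submission is simply its team's mean score.
    """
    fitted = df.groupby(["contest", "team"])["score"].transform("mean")
    resid = df["score"] - fitted                       # within-team deviations
    sst = ((df["score"] - df["score"].mean()) ** 2).sum()
    return 1.0 - (resid ** 2).sum() / sst
```

The statistic is 1 when each team always scores the same (pure between-team variation) and 0 when all teams share the same mean (pure within-team variation).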
Observation 4. Teams systematically differ in their ability to produce high scores.

To understand how the public leaderboard shapes incentives to participate, we regress an indicator for whether a given submission was a team's last submission on the distance between the team's best public score up to that time and the best public score across all teams up to that time. Table 4 (column 1) shows that teams are more likely to drop out of the competition when they start falling behind on the public leaderboard: a one standard deviation increase in a team's deviation from the maximum public score at time t is associated with a 2.84 percentage point increase in the likelihood of the team dropping out at time t. Column 2 explores whether this result differs between competitive and non-competitive teams, and shows that competitive teams are less discouraged by falling behind than non-competitive teams.

Dependent variable: Last submission (indicator)
Variable | (1) | (2)
Deviation from max public score (standardized) | 0.0284*** (0.0003) | 0.0138*** (0.0005)
Deviation × Competitive | | -0.0156*** (0.0005)
Competitive team | | -0.0646*** (0.0009)
Competition FE | Yes | Yes
Observations | 842,089 | 842,089
R2 | 0.020 | 0.040
p-value F-test | | 0.0000

Table 4: Indicator for Last Submission as a Function of a Team's Deviation from the Maximum Public Score
Note: Robust standard errors in parentheses. * (p < 0.1), ** (p < 0.05), *** (p < 0.01). Deviation from max public score is the difference between the maximum public score and the score of a submission, at the time of that submission. We standardize this variable using its competition-level standard deviation. See Table 2 for the definition of competitive team.

In Table 5, we analyze how the incentives to make a new submission are affected by a submission that increases the maximum public score by a sufficient amount (0.01 in our analysis). We call such a submission disruptive. To measure how a disruptive submission affects the incentives to make new submissions, we first partition time into intervals of length 0.001 and compute the number of submissions in each interval. We then compare the number of submissions before and after the arrival of the disruptive submission, restricting attention to periods within 0.05 time units of the disruptive submission. Column 1 in Table 5 shows that the number of submissions decreases immediately after a disruptive submission by an average of 2.24 percent. We take this as further evidence that players use the public leaderboard in their decisions to continue or to quit. Column 2 shows a positive discouragement effect for non-competitive teams and a near-zero discouragement effect for competitive teams. Column 3 repeats the exercise using instead an indicator for teams that ended in the top 50, and shows similar results. Column 4 uses an indicator for whether the team ended in the top 10, and shows that top 10 teams are encouraged by a disruptive submission (i.e., they increase their number of submissions after a disruptive submission). Table 5 complements Table 4 in showing that the leaderboard shapes participation incentives and that its effects are heterogeneous across players.
Dependent variable: Number of submissions (in logs)
Variable | (1) | (2) | (3) | (4)
After disruptive submission | -0.0224*** (0.0081) | -0.0373*** (0.0095) | -0.0506*** (0.0107) | -0.0395*** (0.0097)
After × Competitive | | 0.0196 (0.0129) | |
After × Top 50 | | | 0.0586*** (0.0157) |
After × Top 10 | | | | 0.0885*** (0.0195)
Competition FE | Yes | Yes | Yes | Yes
Observations | 21,545 | 37,666 | 36,701 | 31,657
R2 | 0.819 | 0.729 | 0.637 | 0.694
p-value F-test | | 0.0522 | 0.4926 | 0.0045

Table 5: The Impact of Disruptive Submissions on Participation
Note: Robust standard errors in parentheses. * (p < 0.1), ** (p < 0.05), *** (p < 0.01). Disruptive submissions are those that increase the maximum public score by at least 0.01. Number of submissions is the number of submissions in time intervals of length 0.001. The regressions restrict the sample to within 0.05 time units before and after the disruptive submission. All specifications control for time and time squared. See Table 2 for the definition of competitive team. Top 50 and Top 10 are indicators for whether the team ended the competition within the top 50 and top 10 participants, respectively.
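The before-and-after comparison around disruptive submissions can be sketched as follows (illustrative only; column names are hypothetical, and only the first disruptive submission in the sample is used here):

```python
import numpy as np
import pandas as pd

def event_study_counts(subs, jump=0.01, binwidth=0.001, window=0.05):
    """Sketch of the disruptive-submission comparison (columns hypothetical).

    `subs` holds one row per submission, sorted by normalized 'time', with a
    'public_score' column. A submission is disruptive if it raises the running
    maximum public score by at least `jump`. Submissions are counted in bins
    of length `binwidth` within `window` time units of the first disruptive
    submission, and mean counts before and after are returned.
    """
    running_max = subs["public_score"].cummax()
    disruptive = running_max.diff().fillna(0.0) >= jump
    if not disruptive.any():
        return None
    t0 = subs.loc[disruptive, "time"].iloc[0]
    near = subs[(subs["time"] >= t0 - window) & (subs["time"] <= t0 + window)]
    bins = np.floor((near["time"] - t0) / binwidth).astype(int)
    counts = near.groupby(bins).size()
    before = counts[counts.index < 0].mean()
    after = counts[counts.index >= 0].mean()
    return before, after
```

The paper's regressions add competition fixed effects and time controls on top of this raw comparison; the sketch only conveys the binning-and-windowing logic.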
Observation 5. The public leaderboard shapes participation incentives. This effect is heterogeneous among players.

Players might strategically choose when to release a disruptive submission if they knew in advance that a submission would be disruptive. In that case, teams would have an incentive to release a disruptive submission as late as possible in the competition, to avoid encouraging players who are capable of generating good scores (Column 4 in Table 5). Empirically, however, we do not find this effect. Figure 3 plots the timing of submissions that increased the maximum public score by at least 0.01. In the figure we restrict attention to submissions made during the final 75 percent of the contest time, because score processes are noisier earlier in contests. The figure suggests that disruptive submissions arrive uniformly over time, a pattern consistent with teams either not being strategic or not knowing when a submission will be disruptive. This may be driven by
the fact that teams only learn about the out-of-sample performance of a submission after Kaggle has evaluated it. That is, before making a submission, the teams can only evaluate the solution using the training data, which is not fully informative about its out-of-sample performance. Observation 6. Submissions that disrupt the public leaderboard are submitted uniformly over time.
[Figure: cumulative probability function; y-axis: Cumulative Probability (0 to 1), x-axis: Time of submission (0.2 to 1).]

Figure 3: Timing of Drastic Changes in the Public Leaderboard's Maximum Score (i.e., Disruptive Submissions): Cumulative Probability Functions
Note: An observation is a submission that increases the maximum public score by at least 0.01. The figure plots submissions that were made when at least 25 percent of the contest time had elapsed.
3 Empirical Model
We consider a contest in which a number of players enter over time. Observation 2 does not suggest that players strategically choose their time of entry; rather, they enter at a random time, possibly related to idiosyncratic shocks such as when they find out about the contest. We model the time of entry of a player as a random variable, τ_entry, drawn from an exponential distribution with parameter µ > 0. Players have heterogeneous ability (Observation 4),12 which is captured by the set of types Θ = {θ1, ..., θp}. The distribution of types, κ(θk) = Pr(θ = θk), k = 1, ..., p, is
12 We ignore team incentives and we treat each team as a single player (Observation 1).
known by all players. A player's type reflects the player's ability to produce good results in the competition. Players can send multiple submissions throughout the contest, but they can work on at most one submission at a time. We assume that the rate of arrival of submissions is constant (Observation 3) and that finishing a submission takes a random time τ distributed according to an exponential distribution with constant parameter λ. After the arrival of a submission, players immediately decide whether to continue playing or to quit forever. The cost of building a new submission, c ∈ [0, 1], is an independent draw from the distribution K(c) = c^σ. Figure 4 shows the timing of the game before the end of the competition at T.
[Figure: timeline from 0 to T, with entry time τ_entry ∼ exp(µ) realized at t1 and build time τ ∼ exp(λ) realized between t1 and t2.]

Figure 4: Timing of the game. A player enters at time t1. At this time, the player decides to continue playing. The next submission takes time t2 − t1 to arrive. At time t2 the player again decides to quit or to play again.
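The timing just described can be simulated directly. The sketch below is illustrative only: parameter magnitudes are loosely in the range of the estimates reported later, and the constant cutoff is a hypothetical stand-in for the state-dependent equilibrium stopping rule derived in this section.

```python
import random

def simulate_player(mu=2.5, lam=100.0, sigma=0.001, T=1.0, seed=0):
    """Simulate one player's timeline under the model's timing assumptions:
    entry time ~ exp(mu), submission build times ~ exp(lambda), and a fresh
    cost draw c from K(c) = c**sigma (inverse-CDF sampling: c = u**(1/sigma)).
    The constant threshold below is a hypothetical stand-in for the model's
    state-dependent equilibrium cutoff."""
    rng = random.Random(seed)
    t = rng.expovariate(mu)                 # entry time tau_entry ~ exp(mu)
    submission_times = []
    threshold = 0.5                         # hypothetical cutoff, illustration only
    while t < T:
        c = rng.random() ** (1.0 / sigma)   # cost draw from K(c) = c^sigma
        if c >= threshold:                  # quit forever if the cost is too high
            break
        t += rng.expovariate(lam)           # build time tau ~ exp(lambda)
        if t < T:                           # submission counts only if it lands by T
            submission_times.append(t)
    return submission_times
```

Each call returns the (ordered) arrival times of one player's submissions within [0, T]; a player who enters after T, or who draws a high cost at entry, contributes no submissions.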
We model the score of a submission as a type-dependent random variable. A player of type θ draws a public-private score pair (p_public,θ, p_private,θ) from a joint distribution Hθ. Players know the joint distribution Hθ, but they do not observe the realization (p_public,θ, p_private,θ); this pair of scores is private information of the contest designer. In the baseline case, the contest designer discloses, in real time, only the public score p_public,θ but not the private score p_private,θ. The final ranking, however, is constructed with the private scores.13 At the end of the contest, players are ranked according to their private scores, and the first j players in the ranking receive prizes of value V_P1 ≥ ... ≥ V_Pj; we normalize Σ_{i=1}^{j} V_Pi = 1.
The collection of pairs (identity, score) from the beginning of the contest until instant t forms the public leaderboard, denoted by L_t = {(identity, score)_j}_{j=1}^{J_t}, where J_t is the total number of submissions up to time t. Conditional on the terminal public history L_T, player i is able to compute p^final_{ℓ,i} = Pr(i's private ranking is ℓ | L_T), which is the probability of player i being ranked in position ℓ in the private leaderboard at the end of the contest, conditional on the final public leaderboard L_T.

The public leaderboard is just a noisy ranking. It is possible that p^final_{1,i} > 0 even if player i is ranked last in the public leaderboard, albeit this is a low-probability event. Hence, all of the information in the public leaderboard is relevant for deciding whether to play or to quit. Keeping track of the complete history of submissions, with as many as 15,000 submissions in some competitions, is computationally intractable. In contrast, in a competition with a fully informative ranking, players would only need to keep track of a single number (e.g., the current jth-highest public score) to make their investment decision, which can be captured by a low-dimensional state space. This is because in our model draws are independent (there is no effort choice), so players only need to "beat a benchmark." Conditional on a type, players outside the top j have the same incentives to play regardless of their distance to the leaders. For this reason, if there is no noise, the j highest public scores are a sufficient statistic for all players.

To overcome the computational difficulty posed by keeping track of the whole public history, we assume that p^final_{ℓ,i} > 0 for ℓ = 1, 2, 3 if and only if player i is among the three highest scores in the final public leaderboard. In other words, we assume that the final three highest private scores are a permutation of the final three highest public scores.
13 Players are allowed to send multiple submissions—each player sends about 20 submissions on average. However, the final ranking is computed with at most two submissions selected by each player. About 50 percent of the players do not make a choice, in which case Kaggle picks the two submissions with the largest public scores. Out of the 50 percent that do choose, 70 percent choose the two submissions with the highest public scores.
Table A.2 in the Online Appendix shows that in 79 percent of the contests that we study, the winner is among the three highest public scores, suggesting that this assumption is not too restrictive.14
14 This could be relaxed with more computational power.

Small and Myopic Players

There are thousands of submissions and players in each contest. Fully rational players would take into account the effect of their submissions on the strategies of rival players. However, solving analytically or computationally a dynamic model with fully rational and heterogeneous players turns out to be infeasible. As a simplification, we assume that players are small, i.e., they do not consider how their actions affect the
incentives of other players. This price-taking-like assumption is not unreasonable for our application, and it does not contradict Observations 5 and 6. Our model captures competitive effects through beliefs over the number of submissions that rival players will send in the competition. We derive these beliefs as an equilibrium object by equating the actual number of submissions in a contest with the realized number of submissions in the model. In the counterfactual simulations, we find these beliefs as a fixed point: given a belief about the total number of submissions, the model predicts a certain number of submissions, and in equilibrium these two quantities must coincide.15

In addition to assuming that players are small, we assume that when players decide to play or to quit, they expect more submissions in the future from rival players, but not from themselves. That is, myopic players think the current opportunity to play is their last one. Under this assumption players might still play multiple times; however, at each opportunity they believe they will never have a future opportunity to play, or that if they do, they will choose not to play. This assumption can be relaxed with more computational power.

State Space and Incentives to Play

The relevant state space is defined by three sets. First, we define the set of scores Y = {y = (y1, y2, y3) ∈ [−4, 6]^3 : y1 ≥ y2 ≥ y3}. Second, we define the set of score ownership RS = {∅, 1, 2, 3, (1, 2), (1, 3), (2, 3)}. An element r ∈ RS indicates which of the top 3 public scores (if any) belong to a player. And third, T = [0, 1] represents the contest's time. With a public leaderboard, y ∈ Y and t ∈ T are public information common to all players. Under the small-player assumption, the relevant state for each player is characterized by si = (t, ri, y) ∈ S ≡ T × RS × Y.
To be precise, s = (t, ri, y) ∈ S means that at time t the three scores on the leaderboard are given by y, and player i owns the components of vector y indicated by ri. For example, if player i's state is (t, (1, 3), (0.6, 0.25, 0.1)), then at time t player i owns components one and three of vector y; that is, two of the three highest public scores, 0.6 and 0.1, belong to player i.

The small-player assumption reduces the dimensionality of the state space, because players care only about the three highest public scores and which of them they
15 Similar assumptions are made in Bhattacharya (2016).
own. Players do not observe the private scores, but they are able to compute the conditional distribution of private scores given the set of public scores. Because prizes are allocated at the end of the contest, the payoff-relevant states are the terminal states s ∈ {T} × RS × Y. We denote by π(s) the payoff of a player at terminal state s. In vector notation, we denote the vector of terminal payoffs by π. We consider a finite grid of m values for the public scores, Y = {y^1, ..., y^m}. If a player of type θ decides to play and sends a new submission, the public score of that submission is distributed according to qθ(k) ≡ Pr(y = y^k | θ), k = 1, ..., m.

Although players are small, they form beliefs over the number of future submissions sent by their rivals. At time t, a player believes that with probability p_t(n) exactly n additional rival submissions will arrive before the end of the competition, with scores independently drawn from the distribution G, where Pr_G(y = y^k) = Σ_{θ∈Θ} κ(θ) qθ(k). We assume

p_t(n) = γ_t^n e^{−γ_t} / n!,   n = 0, 1, ...   (1)

Under this functional form, players believe that the expected number of remaining rival submissions at time t is γ_t. We impose γ_t to be a decreasing function of t.

To derive the expected payoff of sending an additional submission we proceed in two steps. First, we solve for the case in which a player thinks she is the last one to play, i.e., p_t(0) = 1, and then we solve for the belief p_t(n) given in Equation 1. Denote by B_t^θ(s) the expected benefit of building a new submission for a player of type θ at state s, when p_t(0) = 1. For clarification, consider the following example. A player of type θ is currently at a state s = (t, r = (1, 2), y = (y1, y2, y3)) and has an opportunity to play. If she plays and the new submission arrives before T (which happens with probability 1 − e^{−(T−t)λ}), the transition of the state depends on the score of the new submission ỹ. The state (r, y) can transition to (r′, y′) where: r′ = (1, 2) and y′ = (y1, y2, y3) when ỹ < y2;16 or r′ = (1, 2) and y′ = (y1, ỹ, y3) when y2 ≤ ỹ < y1; or r′ = (1, 2) and y′ = (ỹ, y1, y3) when y1 ≤ ỹ. More generally, we can repeat this exercise for all states s ∈ S and put all these transition probabilities in a |RS × Y| × |RS × Y| matrix denoted by Ωθ. Each row of this matrix corresponds to the probability distribution over states (r′, y′) starting from state (r, y), conditional on
16 See footnote 13.
the arrival of a new submission. If the new submission does not arrive, then there is no transition and the state remains (r, y). In matrix notation, where each row is a different state, the expected benefit of sending one extra submission is given by

B_t^θ = (1 − e^{−(T−t)λ}) Ωθ π + e^{−(T−t)λ} π.

Consider a given state s. With probability (1 − e^{−(T−t)λ}) the new submission arrives before the end of the contest. The score of that submission (drawn from qθ) determines the probability distribution over final payoffs, which is given by the s-row of the matrix Ωθ. The expected payoff is computed as (Ωθ)_{s•} · π, the dot product between the probability distribution over final states starting from state s and the payoff of each terminal state. With probability e^{−(T−t)λ} the new submission is not finished in time, and therefore the final payoff for the player is given by π_s (the transition matrix is the identity matrix). A player chooses to play if and only if the expected benefit of playing, net of the cost of building a submission, is larger than the expected payoff of not playing, i.e.,

B_t^θ − c ≥ π   ⟺   (1 − e^{−(T−t)λ})[Ωθ − I]π ≥ c.   (2)
We can now easily incorporate the belief p_t(n) into Equation 2. With myopic players, the final state does not depend on the order of submissions, because payoffs are realized at the end of the competition,17 so each player cares only about her ownership at the final state. Thus, we can replace the final payoff by the expected payoff after n rival submissions and then let the agent decide whether to make her last submission considering this new expected payoff. That is, from state s, there is a probability distribution over S after n rival submissions (with scores drawn from the distribution G) given by the s-th row of the matrix Ω̂^n, where Ω̂ is constructed similarly to Ωθ but replacing qθ(·) by the mixture probability g(·). Instead of considering the payoff π before the last play, the player considers the expected payoff Ω̂^n π with probability p_t(n). Hence, the player plays if and only if:

(1 − e^{−(T−t)λ})[Ωθ − I] Σ_{n=0}^{∞} Ω̂^n π p_t(n) ≥ c.   (3)
Equation 3 is similar to Equation 2, except that now the final payoff depends on the player's belief about the number of submissions made by rival players in the future.
17 Except for ties, but we deal with this issue in the numerical implementation.
Using the definition of p_t(n) and the exponential of a matrix, we obtain18

Γθ,t ≡ (1 − e^{−(T−t)λ})[Ωθ − I] e^{γ_t [Ω̂−I]} π ≥ c.   (4)
Equation 4 provides a tractable condition that can be used for estimation, by making use of efficient algorithms to compute the exponential of a matrix. Future competition in this equation is captured through γ_t, the belief over the total number of remaining rival submissions in the contest. Conditional on a state s = (t, r, y), two effects drive the comparative statics with respect to t. On the one hand, as the competition approaches its end, a player has less incentive to make an extra submission because she is less likely to finish building it before the end of the competition. On the other hand, she faces fewer rival submissions, which gives her more incentive to send an extra submission later in the contest. The comparative statics with respect to γ are intuitively clear: when γ_t > γ′_t, players have less incentive to play because they expect more competition in the future. Finally, given that higher types draw better scores than lower types, high types have larger incentives to play conditional on a given state.
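To make Equation 4 concrete, the sketch below evaluates the net-benefit vector Γθ,t for a toy three-state example, computing the matrix exponential with a truncated power series. The transition matrices and payoff vector are made-up inputs for illustration, not objects from our estimation.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices represented as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_expm(A, terms=30):
    """Truncated power series e^A = sum_n A^n / n! (adequate for small matrices)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    power = [row[:] for row in result]
    for k in range(1, terms):
        power = mat_mul(power, A)
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / math.factorial(k)
    return result

def net_benefit(Omega_theta, Omega_hat, pi, lam, gamma_t, t, T=1.0):
    """Gamma_{theta,t} = (1 - e^{-(T-t)lam}) [Omega_theta - I] e^{gamma_t (Omega_hat - I)} pi."""
    n = len(pi)
    A = [[gamma_t * (Omega_hat[i][j] - (1.0 if i == j else 0.0)) for j in range(n)] for i in range(n)]
    E = mat_expm(A)
    Epi = [sum(E[i][j] * pi[j] for j in range(n)) for i in range(n)]
    diff = [[Omega_theta[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    scale = 1.0 - math.exp(-(T - t) * lam)
    return [scale * sum(diff[i][j] * Epi[j] for j in range(n)) for i in range(n)]

# Toy 3-state example (hypothetical): state 0 is absorbing ("own the top score"),
# pi gives terminal payoffs by state.
Omega_theta = [[1.0, 0.0, 0.0],
               [0.5, 0.5, 0.0],
               [0.3, 0.3, 0.4]]
Omega_hat = [[1.0, 0.0, 0.0],
             [0.4, 0.6, 0.0],
             [0.2, 0.4, 0.4]]
pi = [1.0, 0.3, 0.0]
g = net_benefit(Omega_theta, Omega_hat, pi, lam=100.0, gamma_t=5.0, t=0.9)
```

A player in state s then plays if and only if g[s] ≥ c. In the toy example, a player already in the absorbing top state gains nothing from playing (g[0] = 0, since her row of Ωθ − I is zero), while lower states have a strictly positive net benefit.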
3.1 Discussion of Model Assumptions
Some of the assumptions in our model are made for computational tractability or to keep the model parsimonious, whereas others are justified by empirical observations. Our analysis does not incorporate learning. Teams might experiment (Figure 2) and get a better understanding of the problem over time, which may lead them to improve their performance over time.19 We do not incorporate learning for two reasons. First, learning would make the model more involved. And second, Table 3 shows that between-team differences explain the majority of the systematic variation in scores.

A second assumption of our model is that entry is exogenous. In reality, players choose which contests to participate in. Azmat and Möller (2009) show that contest design (in particular, the allocation of prizes) affects players' decisions when they choose among multiple contests. Levin and Smith (1994), Bajari and Hortacsu (2003), and Krasnokutskaya and Seim (2011) also explore how endogenous entry affects equilibrium outcomes
18 The exponential of a matrix A is defined by e^A ≡ Σ_{n=0}^{∞} A^n/n!
19 Clark and Nilssen (2013), for example, present a theory of learning by doing in contests.
and optimal design. Although we acknowledge this shortcoming of our analysis, we have several reasons to make this assumption. First, in our data most players participate in a single contest (see Table A.3 in the Online Appendix), so it is hard to define a group of potential entrants. Second, all contests on Kaggle display a leaderboard, so we cannot identify how this feature of contest design affects entry using the observational data. Finally, and as we discuss below, our experiment reveals that contests with and without a public leaderboard draw the same number of participants on average, which alleviates the concern of endogenous entry.

A third potential concern is the assumption that players do not strategically choose when to send their submissions. Ding and Wolfstetter (2011) show that players could withhold their best solutions and negotiate with the sponsor of the contest after the contest has ended. This selection introduces a bias in the quality of submitted solutions. In our setting, players benefit from sending a submission, because they receive a noisy signal about the performance of the submission. We also find that the timing of disruptive submissions is roughly uniform over time (as shown in Figure 3), which alleviates the concern of strategic timing of submissions.

Also related is the assumption that players decide whether to continue or to quit immediately after the arrival of a submission. If we observe two submissions by a player at times t1 and t2, we know that this player must have spent some time t ∈ [0, t2 − t1] working on the submission. Instead of modeling the distribution of idle time between submissions—similar to the random time of play assumption in Arcidiacono et al. (2016)—we assume that the idle time is zero, i.e., t = t2 − t1. We make this assumption because we observe short times between submissions.
Thus, the effect of idle time is likely to be small, and adding it would impose an extra burden in the estimation of our model.

We present a model of myopic players, motivated purely by computational tractability. The problem is that estimating a contest model by backward induction takes a long time given the size of our state space.20 Myopic players bias the results toward less participation and more exit than a model with fully rational players would predict. A non-myopic player would expect a higher continuation payoff in the future (because of the option value of playing again), so conditional on a cost realization the myopic
20 In an earlier draft, we included estimates for the dynamic model for a handful of contests. The estimates suggested that the myopic assumption only caused a small bias in the cost estimates.
player has less incentive to play than a forward-looking agent.
4 Estimation
We estimate the parameters of the model in two steps. First, we estimate a number of primitives directly from the data. Second, using the estimates of the first step, we estimate the remaining parameters using a likelihood function constructed from the model. We repeat this procedure for each contest. When estimating the model, we restrict attention to the subsample of competitive teams (see Table 2).

The full set of parameters for a given contest includes: i) the distribution of new-player arrival times, which we assume follows an exponential distribution with parameter µ; ii) the distribution of submission arrival times, which we assume follows an exponential distribution with parameter λ; iii) the distribution of the private score conditional on the public score, H(·|p_public), which we assume is given by p_private = α + β p_public + ε, with ε distributed according to a double exponential distribution; iv) the type-specific cumulative distribution of public scores, which we assume is given by Q_j(x) = Φ((x − θ_j^mean)/θ_j^st.dev) for type θ_j, where Φ is the standard normal distribution; v) the distribution of types, κ, which we assume is a discrete distribution over the set of player types, Θ; vi) the time-specific distribution of the number of submissions that will be made in the remainder of the contest, p_t(n), which we assume follows a Poisson distribution with parameter γ_t = γ · (T − t); and, lastly, vii) the distribution of submission costs, which we assume has support bounded above by 1 (i.e., the normalized value of the total prize money) and cumulative distribution function K(c; σ) = c^σ (with σ > 0).

We estimate primitives i) through vi) in the first step, and vii) using the likelihood function implied by the model. i), ii), and iii) are estimated using the maximum likelihood estimators for µ, λ, and (α, β), respectively. We estimate iv) and v) using a Gaussian mixture model, which we fit with the EM algorithm. The EM algorithm estimates the k Gaussian distributions (and their weights, κ(θk)) that best predict the observed distribution of public scores. Throughout our empirical analysis we assume that there are k = 2 player types.21 Appendix B in the Online Appendix provides
21 We experimented with different numbers of types; k = 2 is parsimonious and gave us a good fit.
additional details of the estimation of these objects. Lastly, for vi), and as discussed above, we impose that γ must equal the observed number of submissions in each contest (see Table 2), as a way of capturing γ as an equilibrium object. The linearity assumption, γ_t = γ · (T − t), is made to simplify the computation of the equilibrium. Under this assumption, finding an equilibrium entails finding a single number as a fixed point, rather than a function. In each of the counterfactuals, we recompute γ as an equilibrium object.

The likelihood function implied by the model is based on the decision of a player to make a new submission. Recall that a player chooses whether to make a new submission immediately after the arrival of each of her submissions. A player facing state variables s chooses to make a new submission at time t if and only if

Γθ,t(s) ≥ c,   (5)

where Γθ,t = (1 − e^{−(T−t)λ})[Ωθ − I] e^{γ(T−t)[Ω̂−I]} π is the vector of net benefits of making a new submission at time t for all possible states s (before deducting the cost of making a submission), and c is the cost of a submission. Γθ,t depends only on primitives estimated in the first step of the estimation, which simplifies the rest of the estimation. When computing Γθ,t we partitioned the contest time [0, 1] into 200 time intervals.

Based on Equation 5, a θ-type player facing state variables s plays at time t with probability Pr(Γθ,t(s) > c), so we have Pr(play|s, t, θ) = K(Γθ,t(s)). Given that we do not observe the player's type, we take the expectation with respect to θ, which yields

Pr(play|s, t) = Σ_θ κ(θ) K(Γθ,t(s)),
where κ(θ) is the probability of a player being of type θ.

The likelihood is constructed using tuples {(si, ti, t′i)}_{i∈N}, where i is a submission, si is the vector of state variables at the moment of making the submission, ti is the submission time, and t′i is the arrival time of the next submission, which may or may not be observed. If the next submission is observed, then ti < t′i ≤ T; if not, t′i > T. If the new submission arrives at t′i ≤ T, then the player must have chosen to make a new submission at ti, and the likelihood of the observation (si, ti, t′i) is given by

l(si, ti, t′i) = Pr(play|si, ti) · λ e^{−λ(t′i − ti)},

where λ e^{−λ(t′i − ti)} is the density of the submission arrival time. If we do not observe a new submission after the player's decision at time ti (i.e., t′i > T), then the likelihood of (si, ti, t′i > T) is given by

l(si, ti, t′i > T) = Pr(play|si, ti) · e^{−λ(T − ti)} + 1 − Pr(play|si, ti),

which considers both i) the event of the player choosing to make a new submission at ti with the submission arriving after the end of the contest, and ii) the event of the player choosing not to make a new submission. The log-likelihood function is then given by

L(δ) = Σ_{i∈N} log l(si, ti, t′i),

where δ is the vector of structural parameters. We perform inference using the asymptotic distribution of the maximum likelihood estimator.
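In code, the two likelihood contributions and their sum can be sketched as follows. The tuples below are made-up data, and play_prob stands in for the model quantity K(Γθ,t(s)).

```python
import math

def loglik(obs, lam, T=1.0):
    """Sum the log-likelihood over submission tuples (play_prob, t, t_next),
    where t_next is None when no further submission is observed by T."""
    total = 0.0
    for play_prob, t, t_next in obs:
        if t_next is not None:
            # Next submission observed (t < t_next <= T): played, and the
            # exponential build time realized at t_next - t.
            li = play_prob * lam * math.exp(-lam * (t_next - t))
        else:
            # Censored: either played but the submission arrived after T,
            # or chose not to play at all.
            li = play_prob * math.exp(-lam * (T - t)) + (1.0 - play_prob)
        total += math.log(li)
    return total

# Toy data: (K(Gamma) at the decision, submission time, next arrival or None)
obs = [(0.9, 0.10, 0.12), (0.8, 0.50, 0.53), (0.7, 0.95, None)]
ll = loglik(obs, lam=50.0)
```

Maximizing this sum over the cost parameter σ (which enters through play_prob) would deliver the second-step estimate; here the function only illustrates how observed and censored spells enter the objective.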
4.1 Model Estimates
Table 6 presents the maximum likelihood estimates for the submission-cost distribution as well as for the distributions of entry time and submission arrival time. Table A.5 in the Online Appendix presents the EM algorithm estimates for the type-specific distributions of scores, and Table A.6 in the Online Appendix presents estimates for the distribution of private scores conditional on public scores. The model was estimated separately for each contest. Table 6 (Column 1) shows estimates for the players' rate of entry in a given competition. The estimates imply that the average entry time ranges between 22 and 63 percent of the contest time, and the mean average entry time across all contests is 41 percent of the contest time. Table 6 (Column 3) presents the estimates for the rate at which submissions are completed. In line with Table 2, the estimates suggest that the average time between submissions ranges between 0.5 and 5.5 percent of the contest time, and the mean average time between submissions across all contests is 1.5 percent of the contest time.
Table 6 (Column 5) presents estimates for the coefficient governing the distribution of submission costs. These estimates imply that the expected submission cost ranges between $1.89 and $649.22 Figure 5 shows some implications of our estimates. Figure 5(a) shows the distribution of the expected cost of making a submission (in dollars), and Figure 5(b) shows the daily cost of working on a submission (in dollars). The average values for the expected cost of a submission and the daily cost of a submission are $57.1 and $47.4, respectively. Figure 5(c) shows a scatter plot of the total expected cost incurred by all participants of a contest against the prize, both measured in logs. In the majority of the contests the total expected cost is greater than the prize, suggesting rent dissipation.

Contest                                              µ       SE      λ         SE      σ       SE      log L(δ̂)/N  N
hhp                                                  2.585   0.1701  191.9182  1.6088  0.0013  0.0001  -4.123      14231
allstate-purchase-prediction-challenge               1.9824  0.1274  125.6546  1.1708  0.0045  0.0005  -3.6895     11519
higgs-boson                                          2.3698  0.114   122.1003  0.8177  0.0013  0.0001  -3.6756     22298
acquire-valued-shoppers-challenge                    2.0772  0.1316  165.2723  1.2866  0.0011  0.0001  -3.9934     16500
liberty-mutual-fire-peril                            3.3344  0.3194  122.6742  1.3403  0.0008  0.0002  -3.7291     8377
axa-driver-telematics-analysis                       2.4383  0.1405  127.6736  0.8938  0.0009  0.0001  -3.7595     20405
crowdflower-search-relevance                         2.1332  0.1093  83.3317   0.6605  0.0021  0.0002  -3.2883     15919
caterpillar-tube-pricing                             3.2674  0.1757  68.2579   0.5565  0.001   0.0001  -3.0972     15047
liberty-mutual-group-property-inspection-prediction  3.1055  0.1152  67.0227   0.4112  0.0062  0.0004  -3.0375     26573
coupon-purchase-prediction                           2.0586  0.1093  73.1853   0.6539  0.0009  0.0001  -3.146      12526
springleaf-marketing-response                        3.0405  0.1689  97.3279   0.7153  0.0008  0.0001  -3.4726     18513
homesite-quote-conversion                            2.4958  0.1548  128.5381  0.9678  0.0004  0.0001  -3.7599     17638
prudential-life-insurance-assessment                 2.1598  0.0799  78.0817   0.4707  0.004   0.0003  -3.2007     27512
santander-customer-satisfaction                      2.3098  0.0563  75.1579   0.3048  0.0055  0.0002  -3.1614     60816
expedia-hotel-recommendations                        2.2792  0.086   43.2208   0.3422  0.0009  0.0001  -2.564      15948

Table 6: Maximum Likelihood Estimates of the Cost and Arrival Distributions (partial list).
Note: The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.' See Table A.4 in the Online Appendix for the full table.

22 The expected cost in dollars is given by σV/(1 + σ), where V is the total reward.
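The footnote's formula follows from K(c) = c^σ on [0, 1]: E[c] = ∫_0^1 c · σ c^{σ−1} dc = σ/(1 + σ), which is then scaled by the prize V. A quick numerical check, with an illustrative value of σ:

```python
def expected_cost(sigma, grid=100000):
    """Midpoint-rule integral of c * sigma * c**(sigma - 1) over [0, 1],
    i.e., the mean of the cost distribution K(c) = c**sigma."""
    h = 1.0 / grid
    return sum(
        (i + 0.5) * h * sigma * ((i + 0.5) * h) ** (sigma - 1.0) * h
        for i in range(grid)
    )

sigma = 0.5  # illustrative value; the estimated sigmas in Table 6 are much smaller
```

The numerical integral agrees with the closed form σ/(1 + σ); with the small estimated σ values in Table 6, E[c] is a tiny fraction of the prize, consistent with the dollar magnitudes in the text.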
[Figure 5 panels: (a) Cost of a submission (in dollars); (b) Cost of a submission per day (in dollars); (c) Rent dissipation: prize (in logs) versus expected cost of all submissions (in logs).]

Figure 5: Estimates for the cost of making a submission
Note: An observation is a contest. Cost of a submission per day is the expected cost divided by the average number of days between submissions. The average values for the expected cost of a submission and the daily cost of a submission are 57.1 and 47.4 dollars, respectively. The expected cost of all submissions is the expected cost of a submission multiplied by the predicted number of submissions for each contest. The predicted number of submissions is based on 2,000 simulations of each contest using our model estimates.
Table 7 studies how the estimates for the average cost of making a submission, the rate of team entry, and the rate of arrival of submissions vary as a function of the contest prize. The table shows a positive correlation between the contest reward and both the average cost of making a submission and the rate of team entry. This suggests that contests with larger prizes are more difficult and lead teams to enter sooner. The greater difficulty is consistent with the empirical observation that teams remain active for less time in
                      (1)          (2)          (3)
                      log E[c]     log λ        log µ
log Prize (in USD)    0.8906***    0.2085***    -0.0021
                      (0.1040)     (0.0388)     (0.0228)
Observations          57           57           57
R2                    0.494        0.232        0.000

Table 7: Parameter estimates and contest observables
Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. An observation is a contest. Expected cost is given by σ/(1 + σ). The values of σ, µ, λ are reported in Table A.4.
competitions with greater rewards (i.e., the exit rate is higher). To capture this pattern in the data, the model needs a larger cost in order to fit the larger exit rate.23 Finally, Figure A.1 in the Online Appendix presents a scatter plot of the entry rate of teams against the arrival rate of submissions, and shows a weak negative correlation.

With respect to how well the model fits the data, Figure 6 plots the actual versus the predicted number of submissions in each contest. The predicted number of submissions in a contest is computed by averaging the number of submissions across 2,000 simulations of the contest. The simulations make use of the estimates of the model and take the number of teams that participate in each contest as given. The correlation between the actual and the predicted number of submissions is 0.97. The figure shows that the model does not systematically over- or under-predict participation. Figure 7 shows the fit of the EM algorithm for one of the competitions. These figures suggest a good fit along the dimensions of the number of submissions and the distribution of scores.
23 Almost every competition in our data lasts three months, so the data offers little variation in contest length to establish relationships between estimates and contest length.
[Figure 6: scatter plot of predicted participation (in logs) against actual participation (in logs).]
Figure 6: Number of Submissions Predicted by the Model Versus Actual Number of Submissions. Note: An observation is a contest. The coefficient of correlation between the actual and predicted number of submissions is 0.97. The predicted number of submissions is based on 2,000 simulations of each contest using our model estimates.
[Figure 7: two density plots. Panel (a): distribution of scores by type (Type 1 and Type 2). Panel (b): distribution of scores in the data and as predicted by the EM algorithm.]
Figure 7: Estimates of the distribution of scores by type for the contest ‘Homesite Quote Conversion’
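For readers unfamiliar with the estimation step behind Figure 7, a textbook two-component EM loop looks roughly as follows. This is a generic sketch under an assumed Gaussian mixture, not the paper's likelihood; the initialization and the normality assumption are illustrative.

```python
import math

def em_two_gaussians(xs, iters=200):
    """Fit a two-component Gaussian mixture to scores xs by EM."""
    mu1, mu2 = min(xs), max(xs)               # illustrative starting values
    spread = (max(xs) - min(xs)) / 4 or 1.0
    s1 = s2 = spread
    w = 0.5                                   # mixture weight of component 1

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(iters):
        # E-step: posterior probability that each score came from type 1
        r = []
        for x in xs:
            p1 = w * pdf(x, mu1, s1)
            p2 = (1.0 - w) * pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2 + 1e-300))
        # M-step: update weight, means, and standard deviations
        n1 = max(sum(r), 1e-12)
        n2 = max(len(xs) - sum(r), 1e-12)
        w = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = max(math.sqrt(sum(ri * (x - mu1) ** 2
                               for ri, x in zip(r, xs)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1 - ri) * (x - mu2) ** 2
                               for ri, x in zip(r, xs)) / n2), 1e-6)
    return w, (mu1, s1), (mu2, s2)
```

The fitted component densities, weighted by w and 1 − w, are what a plot like panel (b) overlays on the empirical score distribution.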
5 Counterfactual Contest Designs

In this section, we use our model estimates to compare the baseline contest design with a number of counterfactual designs. We evaluate each counterfactual contest design by three performance indicators: the total number of submissions, the 99th percentile of the distribution of scores, and the maximum score. The total number of submissions is a proxy for diversity, whereas the moments of the score distribution measure the quality of the "best solutions." In the counterfactual exercises, the expected number of submissions, γ, is estimated as an equilibrium object. An equilibrium γ* equates the number of submissions implied by the model, N(γ*), with the total expected number of submissions, γ*T. The equilibrium exists and is uniquely determined because, conditional on all the other parameters of the model (including the information design), the expected number of submissions N(γ) is decreasing in γ—players have fewer incentives to play when they expect more rival submissions—and γT is increasing in γ. Therefore, there is a unique fixed point.
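Given the monotonicity argument above, γ* can be computed by simple bisection. A minimal sketch, where the decreasing map N(·) is an assumed illustrative functional form rather than the paper's estimated model:

```python
def solve_equilibrium(N, T, gamma_lo=1e-9, gamma_hi=1e6, tol=1e-10):
    """Find gamma* solving N(gamma*) = gamma* * T by bisection.

    Works because N is decreasing in gamma while gamma*T is increasing,
    so f(gamma) = N(gamma) - gamma*T is strictly decreasing and crosses
    zero exactly once.
    """
    def f(g):
        return N(g) - g * T

    lo, hi = gamma_lo, gamma_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # model still implies more submissions than expected
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative decreasing N (an assumed functional form, not the
# paper's model): more expected rival submissions reduce the number
# of submissions the model implies.
N = lambda g: 100.0 / (1.0 + g)
T = 10.0
gamma_star = solve_equilibrium(N, T)  # solves 100/(1 + g) = 10g
```

Any root-finder would do here; bisection is used only because the single-crossing property makes it trivially reliable.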
5.1 Information Design

We first study the role of information disclosure. Kaggle competitions score each submission according to a partition of the test dataset—e.g., 60 percent of the test data is used to generate public scores and 40 percent to generate private scores. The contest designer chooses the size of this partition and whether to disclose public scores. In the competitions in our data, the contest designer discloses the size of each of the subsets that partition the test dataset, as well as the public scores. However, the final standings are computed using the private scores, which are revealed only at the end of the competition. We consider two counterfactual designs. In the first one, the sponsor does not display a leaderboard: participants observe their own scores but do not observe their rivals' scores. In the second one, a public leaderboard is displayed but the sponsor eliminates the noise in the evaluation (the public score equals the private score).

No Public Leaderboard
With a public leaderboard, at any given time t, players have more incentives to play after histories where the maximum score is low, relative to histories where the maximum score is high. Without a public leaderboard, players' decisions to continue playing or to quit are based on their beliefs about the current history, which average across all feasible histories. When taking this average, the favorable histories "cross-subsidize" unfavorable ones. Depending on the strength of this cross-subsidization, a public leaderboard may encourage or discourage participation. We simulate the counterfactual case where all competitions are held without a public leaderboard.

Perfect Correlation between Public and Private Scores

The sponsor of a competition decides the correlation between public and private scores. All of the competitions in our data feature imperfect correlation between public and private scores (although this correlation is 0.99). We simulate a counterfactual where the sponsor removes the noise between public and private scores, perfectly informing participants about their current ranking. Noisy signals can encourage or discourage participation—players' incentives in a competition with an extremely noisy leaderboard are the same as players' incentives in a competition without a leaderboard.
5.2 Number of Prizes

Most of the competitions in our data award three prizes. Instead, the sponsor of a competition could decide to award a single prize. We study the counterfactual where the sponsor allocates a single prize to the winner, keeping the total reward fixed. Relative to a single prize, multiple prizes strengthen the incentives to be "at the top" of the ranking, but weaken the incentives to reach first place, conditional on being at the top of the ranking. Hence, total participation in a contest may increase or decrease when there are multiple prizes rather than a single prize.
5.3 Limiting the Number of Participants

The majority of Kaggle competitions are open to anyone willing to participate. Some competitions, however, restrict participation. We consider the counterfactual case where the number of entrants is limited to 90 percent of the actual number of entrants in each competition. Limiting participation has a direct effect (fewer participants) that could worsen outcomes, and an indirect equilibrium effect (less competition) that could improve outcomes.
5.4 Estimation Results for Counterfactual Designs

Table 8 reports estimates for how contest outcomes change in each counterfactual contest design, measured relative to the contest outcomes in the baseline case with the public leaderboard.

                               Contest-level outcomes (in logs)   Team-level outcomes (in logs):
                                                                  av. number of submissions
                               (1)         (2)       (3)          (4)        (5)        (6)
                               Number of   Max       99th pct     All        Low-       High-
                               submissions score     score        teams      type       type
No leaderboard                 -0.236***  -0.013*** -0.008***    -0.236***  -0.015     -0.322***
                               (0.036)    (0.002)   (0.002)      (0.036)    (0.051)    (0.033)
Fully informative leaderboard   0.033**    0.002**   0.000        0.033**    0.012      0.037**
                               (0.014)    (0.001)   (0.001)      (0.014)    (0.021)    (0.014)
Limited participation          -0.082***  -0.002**   0.002**     -0.082***  -0.247***  -0.052***
                               (0.011)    (0.001)   (0.001)      (0.011)    (0.042)    (0.014)
Single prize                    0.005      0.000     0.000        0.005      0.002      0.006
                               (0.011)    (0.000)   (0.001)      (0.011)    (0.019)    (0.010)
N                               285        285       285          285        285        285
R²                              0.992      0.999     0.999        0.961      0.937      0.957

Table 8: Contest outcomes on contest design. Notes: An observation is a contest–design combination. All specifications include contest fixed effects. The definition of the variables is as follows: 'all' is total submissions, 'av' is the average number of submissions per team, 'sm' is the max score in a competition, 's99p' is the score at the 99th percentile in a competition, 'm' is the average number of submissions per low-type team, and 'h' is the average number of submissions per high-type team. Type θ_i is the high type if θ_i^mean + 3 θ_i^st.dev > θ_j^mean + 3 θ_j^st.dev.
The first row of Table 8 presents the results of comparing the contest performance indicators in the counterfactual case without a public leaderboard relative to the baseline case with a public leaderboard. The results in columns 1 to 3 indicate that hiding the public leaderboard on average leads to a lower maximum score as well as fewer submissions. Columns 4 to 6 explore heterogeneity and suggest that these results are driven by an average decrease in the number of submissions made by high-type teams. We define type θ_i to be the high type if θ_i^mean + 3 θ_i^st.dev > θ_j^mean + 3 θ_j^st.dev (see Table A.5 for the type-specific parameter estimates). Figure A.2 in the Online Appendix plots the distribution of effects across contests, and reveals that the number of submissions and the maximum score decrease for the great majority of contests without a leaderboard. When comparing participation with and without a leaderboard, there are several forces at play: the rate at which players enter, the rate at which submissions arrive, the distribution of costs, and the distribution of players' types. All of these effects combined determine the distribution of histories, which in turn determines participation decisions. Without a leaderboard, and given that prizes are awarded at the end of the competition, at any time t players consider a fixed benchmark—the expected final maximum score of their rivals. With a leaderboard, players update their beliefs about the maximum score at the end of the competition, considering the remaining contest time and the current maximum score. Participation improves with a leaderboard when the increase in participation at favorable histories (those with a low maximum score) outweighs the decrease in participation at unfavorable histories (those with a high maximum score). Table A.7 in the Online Appendix shows that at any moment in time players send more submissions when the maximum score is lower, which is in line with our descriptive analysis (e.g., see Table 4 and Table 5).
Furthermore, the table also shows an asymmetric response to the maximum score depending on whether it is below or above the expected maximum score: a maximum score that is below the expected maximum score by x increases the number of submissions by more than the decrease in submissions caused by a maximum score that is above the expected maximum score by x. This asymmetry suggests that the increase in participation in favorable histories outweighs the decrease in participation in unfavorable histories, explaining the result of increased participation with a public leaderboard. In explaining this asymmetry, we note that exit decisions are irreversible in our model, so observing a favorable history on the leaderboard may encourage a player to stay in the competition, adding future opportunities to play for that player, whereas without a leaderboard the same player may be discouraged and quit the competition. Another beneficial aspect of displaying a leaderboard is non-pecuniary incentives, which we do not model. For instance, Lerner and Tirole (2002) rationalize collaboration in open-source software as a signaling mechanism to potential employers, Moldovanu et al. (2007) provide examples where players care about their relative position (status), and Brunt et al. (2012) present evidence that non-monetary awards (such as medals) are more important than monetary awards in encouraging competition. The leaderboard in a competition may provide all of these elements: a signaling device, status, and a progression system.24 If anything, modeling non-pecuniary motives would magnify our estimated benefits of showing a leaderboard.

The second row of Table 8 presents the results of comparing the contest performance indicators in the counterfactual case where the public leaderboard is fully informative relative to the baseline case with a noisy public leaderboard. This row shows that a fully informative leaderboard on average improves performance. The direction of this effect is consistent with the previous result (no leaderboard and large noise should move outcomes in the same direction), although the magnitude of the change in participation and upper-tail outcomes is small. A fully informative leaderboard, however, increases the risk of participants engaging in overfitting—working on solutions that maximize the public score ranking but are not robust outside of the test data. Our model does not incorporate overfitting, but knowing that a fully informative leaderboard does not have a large impact on outcomes provides an argument in favor of a noisy public leaderboard.

The third row of Table 8 compares outcomes in the case where the number of participants in each contest is decreased by 10 percent relative to the baseline case.
We find that the measures of contest performance worsen, implying that the encouragement players get from facing fewer competitors does not outweigh the direct effect of 10 percent fewer players.25

24 The Kaggle website states: "Kaggle's Progression System uses performance tiers to track your growth as a data scientist on Kaggle. Along the way, you'll earn medals for your achievements and compete for data science glory on live leaderboards."
25 Table A.8 in the Online Appendix further explores counterfactual designs that limit participation. The table shows that decreasing the number of teams by between 1 and 9 percent weakly decreases the total number of submissions (in a statistical sense).
The last row of Table 8 shows how contest performance changes in the counterfactual case where a single prize is awarded. Our results show that changing the allocation of prizes has a small and statistically insignificant overall effect on participation. In our model, conditional on a type, players outside the top 3 have the same incentives to play, because each new submission is a random draw from a distribution (and there is no learning). And because the difference between the first and third scores at the end of every competition is very small, the allocation of prizes has only a second-order effect on incentives.
6 Experimental Evidence

To complement our structural estimates, we ran a randomized control trial on Kaggle.26 The objective of the experiment is to provide additional evidence—independent of our model's assumptions—on how information disclosure impacts participation and outcomes. The experiment allowed us to observe contest outcomes in competitions with and without a leaderboard, keeping other aspects of the contest fixed (e.g., difficulty, prize, duration, number of participants).
6.1 Description of the Experiment
We hosted 44 competitions on Kaggle, and each competition was randomly assigned to the treatment or control group. Our treatment competitions displayed a real-time leaderboard, providing information about the performance of all participants, whereas our control competitions did not provide feedback to players. The competitions were identical in all other aspects of design. The competitions were run simultaneously and lasted for 10 days. Each competition consisted of solving a simple prediction problem: interpolating a function (see Online Appendix C for details). Participants were allowed to submit up to 10 sets of predictions per day. The most accurate predictions in each competition were awarded an Amazon gift card worth $50. We recruited 220 students (both undergraduates and graduates) from the University of Illinois at Urbana-Champaign via emails, department newsletters, and flyers. Participants were asked to complete an initial survey from which we obtained information such as past experience with online competitions and data analysis. They were also asked to create a Kaggle username. With this pool of potential players, we formed 44 competitions of 5 players each. Participants were randomly allocated to these 44 competitions.

26 Approval from the University of Illinois Human Subjects Committee, IRB18644.

Table 9 shows the outcome of the randomization. The left panel ("Invited players") shows the balance of covariates across competitions in the treatment and control groups. The table shows no statistically significant differences across groups in a number of covariates related to the participants' knowledge of statistical tools and experience. The right panel ("Entrants") repeats the analysis, but restricts attention to the participants who submitted at least one solution during the competition. This second panel shows no statistically significant differences across contests in the control and treatment groups, both in the number of players who submitted at least one solution and in the composition of participants.

                         Invited players                Entrants
Variable             Control  Treatment  t-stat    Control  Treatment  t-stat
Participants           5        5          -        3.227    3.545      1.151
participated_past      0.236    0.191    -0.954     0.202    0.244      0.507
software_code          0.964    0.973     0.403     0.968    0.987      0.909
stat_tools             0.882    0.836    -0.883     0.887    0.934      0.822
mach_learning          0.536    0.518    -0.276     0.615    0.607     -0.082
regression             0.736    0.709    -0.487     0.808    0.747     -0.770

Table 9: Average covariates at the contest level: Randomization results. Notes: An observation is a contest. 'Invited players' is the pool of players who were invited to enter a competition, and 'Entrants' is the pool of players who made at least one submission during the competition. Treated contests are the contests where a leaderboard was displayed. All variables are defined at the contest level as follows: 'Participants' is the number of players in a competition, 'participated_past' is the share of players who have participated in a prediction contest in the past, 'software_code' is the share of players who know how to use a statistical software, 'stat_tools' is the share of players who have statistical skills, 'mach_learning' is the share of players who have machine learning skills, and 'regression' is the share of players who have regression analysis skills.
6.2 Experimental Results

Table 10 shows the main results of the experiment, which are in line with the main finding of our structural estimates: participation and outcomes improve when a real-time leaderboard is displayed. Columns 1, 2, and 3 in Table 10 show that outcomes are worse in competitions that do not display a leaderboard, relative to competitions with a leaderboard. Column 1 shows that the maximum score was on average 0.057 lower in competitions without a leaderboard, a magnitude that is 29.68 percent of the average maximum score across all contests. This result is robust to controlling for the number of entrants in each competition (column 2) and to controlling for player covariates (column 3).

                              Maximum score                   Number of submissions
                        (1)        (2)        (3)        (4)          (5)          (6)
No leaderboard        -0.057**   -0.045**   -0.050**  -31.636***   -26.767***   -26.758***
                      (0.022)    (0.019)    (0.020)    (7.073)      (5.250)      (5.973)
                      [0.014]    [0.030]    [0.042]    [0.000]      [0.000]      [0.000]
Entrants                          0.038***   0.038***               15.302***    15.843***
                                 (0.013)    (0.014)                 (3.050)      (3.244)
Controls                No         No         Yes        No           No           Yes
Observations            44         44         44         44           44           44
R²                     0.135      0.332      0.398      0.323        0.565        0.610
Dep. variable mean     0.192      0.192      0.192     23.636       23.636       23.636

Table 10: The effect of the leaderboard on contest outcomes: Experimental results. Notes: Robust standard errors in parentheses. p-values for Monte Carlo permutation tests, which allow for arbitrary randomization procedures, in square brackets (based on 1,000 replications). * p < 0.1, ** p < 0.05, *** p < 0.01. An observation is a contest. The definition of the variables is as follows: 'No leaderboard' is an indicator for contests without a leaderboard and 'Entrants' is the number of entrants. Controls include the shares of participants in the contest who i) have participated in a prediction contest in the past, ii) know how to use a statistical software, iii) have statistical skills, iv) have machine learning skills, and v) have regression analysis skills.
Columns 4, 5, and 6 in Table 10 show that the number of submissions is on average lower in competitions without a leaderboard than in competitions with a leaderboard. Column 4 shows that competitions without a leaderboard received an average of 31.636 fewer submissions than competitions with a leaderboard, which is a large effect relative to the average number of submissions across all contests. This result is also robust to controlling for the number of entrants in each competition (column 5) and player covariates (column 6).
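The permutation p-values reported in square brackets in Table 10 can be reproduced mechanically with a routine of this kind. This is a generic sketch of a Monte Carlo permutation test for a difference in means (the paper reshuffles treatment assignment across its 44 contests over 1,000 replications); the data passed in below would be the contest-level outcomes.

```python
import random

def permutation_pvalue(treated, control, n_reps=1000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Reshuffles the treatment labels n_reps times and reports the share
    of reshuffles whose mean difference is at least as extreme as the
    observed one.
    """
    rng = random.Random(seed)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    observed = sum(treated) / n_t - sum(control) / len(control)
    extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # random reassignment of treatment labels
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / len(control)
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_reps
```

Because the test conditions only on the pooled outcomes, it is valid under arbitrary randomization procedures, which is why the paper reports it alongside conventional robust standard errors.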
7 Discussion

We contribute to the literature on competition design by investigating how different elements of design affect players' incentives in a dynamic environment. We use field data from Kaggle.com to build and estimate a structural model used for counterfactual analysis, and we complement this analysis with experimental evidence from competitions also run on Kaggle.com. To build a computationally tractable model, we rely on various simplifications, most of them motivated by empirical evidence. We assume that players enter the contest at some exogenous time and, after observing the cost of building a submission, decide either to exit or to build a new submission. If they exit, they cannot reenter the competition. If they play, they must wait a random time until the next opportunity to play. Different contest designs affect players' expectations about the benefit of continuing to play, hence distorting players' participation incentives. Our counterfactual simulations show that a public leaderboard improves performance in a competition: the total number of submissions, the 99th percentile of the distribution of scores, and the maximum score all increase relative to the counterfactual competition without a leaderboard. Moreover, when a real-time leaderboard increases participation, it is because it encourages high-type players to stay in the competition. The intuition behind this result is that the increase in participation created by favorable histories (those with a low maximum score) outweighs the decrease in participation created by unfavorable histories (those with a high maximum score). This asymmetric effect is caused by irreversible exit decisions, because players who stay in the competition face multiple future opportunities to play.
To complement our analysis, we ran an experiment in which we randomly allocated participants into competitions without a leaderboard (control group) or with a leaderboard (treatment group), to measure the impact of a public leaderboard on participation and contest outcomes. The experiment is independent of our modeling assumptions, so it serves the double purpose of providing experimental evidence on the relationship between contest design and contest outcomes and testing the predictions of our model. Our experimental findings are consistent with our counterfactual simulation results: the number of submissions and the maximum score increase in competitions that display a public leaderboard.
8 References
Aoyagi, Masaki (2010) “Information feedback in a dynamic tournament,” Games and Economic Behavior, Vol. 70, pp. 242–260. Arcidiacono, Peter, Patrick Bayer, Jason R Blevins, and Paul B Ellickson (2016) “Estimation of dynamic discrete choice models in continuous time with an application to retail competition,” The Review of Economic Studies, Vol. 83, pp. 889–931. Athanasopoulos, George and Rob J Hyndman (2011) “The value of feedback in forecasting competitions,” International Journal of Forecasting, Vol. 27, pp. 845–849. Azmat, Ghazala and Marc Möller (2009) “Competition among contests,” The RAND Journal of Economics, Vol. 40, pp. 743–768. Bajari, Patrick and Ali Hortacsu (2003) “The winner’s curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions,” RAND Journal of Economics, pp. 329–355. Balafoutas, Loukas, E Glenn Dutcher, Florian Lindner, and Dmitry Ryvkin (2017) “The Optimal Allocation of Prizes in Tournaments of Heterogeneous Agents,” Economic Inquiry, Vol. 55, pp. 461–478. Benkert, Jean-Michel and Igor Letina (2016) “Designing dynamic research contests,” University of Zurich, Department of Economics, Working Paper.
Bhattacharya, Vivek (2016) “An Empirical Model of R&D Procurement Contests: An Analysis of the DOD SBIR Program,” MIT, Department of Economics, Working Paper. Bimpikis, Kostas, Shayan Ehsani, and Mohamed Mostagir (2014) “Designing dynamic contests,” Working paper, Stanford University. Boudreau, Kevin J, Nicola Lacetera, and Karim R Lakhani (2011) “Incentives and problem uncertainty in innovation contests: An empirical analysis,” Management Science, Vol. 57, pp. 843–863. Boudreau, Kevin J, Karim R Lakhani, and Michael Menietti (2016) “Performance responses to competition across skill levels in rank-order tournaments: field evidence and implications for tournament design,” The RAND Journal of Economics, Vol. 47, pp. 140–165. Brunt, Liam, Josh Lerner, and Tom Nicholas (2012) “Inducement prizes and innovation,” The Journal of Industrial Economics, Vol. 60, pp. 657–696. Che, Yeon-Koo and Ian Gale (2003) “Optimal design of research contests,” The American Economic Review, Vol. 93, pp. 646–671. Chesbrough, Henry, Wim Vanhaverbeke, and Joel West (2006) Open innovation: Researching a new paradigm: Oxford University Press on Demand. Clark, Derek J and Tore Nilssen (2013) “Learning by doing in contests,” Public Choice, Vol. 156, pp. 329–343. Cohen, Chen, Todd R Kaplan, and Aner Sela (2008) “Optimal rewards in contests,” The RAND Journal of Economics, Vol. 39, pp. 434–451. Ding, Wei and Elmar G Wolfstetter (2011) “Prizes and lemons: procurement of innovation under imperfect commitment,” The RAND Journal of Economics, Vol. 42, pp. 664–680. Ederer, Florian (2010) “Feedback and motivation in dynamic tournaments,” Journal of Economics & Management Strategy, Vol. 19, pp. 733–769.
Fullerton, Richard L and R Preston McAfee (1999) “Auctioning entry into tournaments,” Journal of Political Economy, Vol. 107, pp. 573–605. Genakos, Christos and Mario Pagliero (2012) “Interim rank, risk taking, and performance in dynamic tournaments,” Journal of Political Economy, Vol. 120, pp. 782–813. Goltsman, Maria and Arijit Mukherjee (2011) “Interim performance feedback in multistage tournaments: The optimality of partial disclosure,” Journal of Labor Economics, Vol. 29, pp. 229–265. Gross, Daniel P (2015) “Creativity Under Fire: The Effects of Competition on Creative Production,” Available at SSRN 2520123. (2017) “Performance feedback in competitive product development,” The RAND Journal of Economics, Vol. 48, pp. 438–466. Halac, Marina, Navin Kartik, and Qingmin Liu (2014) “Contests for experimentation,” Journal of Political Economy, Forthcoming. Hendricks, Kenneth and Robert H Porter (1988) “An empirical study of an auction with asymmetric information,” The American Economic Review, pp. 865–883. Hinnosaar, Toomas (2017) “Dynamic common-value contests.” Huang, Yan, Param Vir Singh, and Kannan Srinivasan (2014) “Crowdsourcing new product ideas under consumer learning,” Management Science, Vol. 60, pp. 2138–2159. Jeppesen, Lars Bo and Karim R Lakhani (2010) “Marginality and problem-solving effectiveness in broadcast search,” Organization Science, Vol. 21, pp. 1016–1033. Kireyev, Pavel (2016) “Markets for Ideas: Prize Structure, Entry Limits, and the Design of Ideation Contests,” HBS, Working Paper. Klein, Arnd Heinrich and Armin Schmutzler (2016) “Optimal effort incentives in dynamic tournaments,” Games and Economic Behavior.
Krasnokutskaya, Elena and Katja Seim (2011) “Bid preference programs and participation in highway procurement auctions,” The American Economic Review, Vol. 101, pp. 2653–2686. Lakhani, Karim R, Kevin J Boudreau, Po-Ru Loh, Lars Backstrom, Carliss Baldwin, Eric Lonstein, Mike Lydon, Alan MacCormack, Ramy A Arnaout, and Eva C Guinan (2013) “Prize-based contests can provide solutions to computational biology problems,” Nature Biotechnology, Vol. 31, pp. 108–111. Landers, Richard N, Kristina N Bauer, and Rachel C Callan (2017) “Gamification of task performance with leaderboards: A goal setting experiment,” Computers in Human Behavior, Vol. 71, pp. 508–515. Landers, Richard N and Amy K Landers (2014) “An empirical test of the theory of gamified learning: The effect of leaderboards on time-on-task and academic performance,” Simulation & Gaming, Vol. 45, pp. 769–785. Lazear, Edward P and Sherwin Rosen (1979) “Rank-order tournaments as optimum labor contracts.” Lerner, Josh and Jean Tirole (2002) “Some simple economics of open source,” The Journal of Industrial Economics, Vol. 50, pp. 197–234. Levin, Dan and James L Smith (1994) “Equilibrium in auctions with entry,” The American Economic Review, pp. 585–599. Li, Tong, Isabelle Perrigne, and Quang Vuong (2002) “Structural estimation of the affiliated private value auction model,” RAND Journal of Economics, pp. 171–193. Megidish, Reut and Aner Sela (2013) “Allocation of Prizes in Contests with Participation Constraints,” Journal of Economics & Management Strategy, Vol. 22, pp. 713–727. Moldovanu, Benny and Aner Sela (2001) “The optimal allocation of prizes in contests,” American Economic Review, pp. 542–558. (2006) “Contest architecture,” Journal of Economic Theory, Vol. 126, pp. 70–96.
Moldovanu, Benny, Aner Sela, and Xianwen Shi (2007) “Contests for status,” Journal of Political Economy, Vol. 115, pp. 338–363. Olszewski, Wojciech and Ron Siegel (2015) “Effort-Maximizing Contests.” Sisak, Dana (2009) “Multiple-prize contests–the optimal allocation of prizes,” Journal of Economic Surveys, Vol. 23, pp. 82–114. Strack, Philipp (2016) “Risk-Taking in Contests: The Impact of Fund-Manager Compensation on Investor Welfare.” Takahashi, Yuya (2015) “Estimating a war of attrition: The case of the U.S. movie theater industry,” The American Economic Review, Vol. 105, pp. 2204–2241. Taylor, Curtis R (1995) “Digging for golden carrots: an analysis of research tournaments,” The American Economic Review, pp. 872–890. Terwiesch, Christian and Yi Xu (2008) “Innovation contests, open innovation, and multiagent problem solving,” Management Science, Vol. 54, pp. 1529–1543. Xiao, Jun (2016) “Asymmetric all-pay contests with heterogeneous prizes,” Journal of Economic Theory, Vol. 163, pp. 178–221. Zivin, Joshua S Graff and Elizabeth Lyons (2018) “Can Innovators be Created? Experimental Evidence from an Innovation Contest,” Technical report, National Bureau of Economic Research.
Online Appendix: Not For Publication Dynamic Tournament Design: An Application to Prediction Contests Jorge Lemus and Guillermo Marshall
A Additional Tables and Figures

Contest Number | Name of the Competition | Total Reward | Number of Submissions | Teams | Start Date | Deadline
1 | Predict Grant Applications | 5,000 | 2,800 | 204 | 12/13/2010 | 02/20/2011
2 | RTA Freeway Travel Time Prediction | 10,000 | 2,958 | 348 | 11/23/2010 | 02/13/2011
3 | Deloitte/FIDE Chess Rating Challenge | 10,000 | 1,428 | 167 | 02/07/2011 | 05/04/2011
4 | Heritage Health Prize | 500,000 | 23,421 | 1,221 | 04/04/2011 | 04/04/2013
5 | Wikipedia’s Participation Challenge | 10,000 | 995 | 88 | 06/28/2011 | 09/20/2011
6 | Allstate Claim Prediction Challenge | 10,000 | 1,278 | 102 | 07/13/2011 | 10/12/2011
7 | dunnhumby’s Shopper Challenge | 10,000 | 1,872 | 277 | 07/29/2011 | 09/30/2011
8 | Give Me Some Credit | 5,000 | 7,658 | 920 | 09/19/2011 | 12/15/2011
9 | Don’t Get Kicked! | 10,000 | 7,167 | 570 | 09/30/2011 | 01/05/2012
10 | Algorithmic Trading Challenge | 10,000 | 1,169 | 95 | 11/11/2011 | 01/08/2012
11 | What Do You Know? | 5,000 | 1,616 | 228 | 11/18/2011 | 02/29/2012
12 | Photo Quality Prediction | 5,000 | 1,315 | 194 | 10/29/2011 | 11/20/2011
13 | Benchmark Bond Trade Price Challenge | 17,500 | 2,487 | 248 | 01/27/2012 | 04/30/2012
14 | KDD Cup 2012, Track 1 | 8,000 | 13,076 | 657 | 02/20/2012 | 06/01/2012
15 | KDD Cup 2012, Track 2 | 8,000 | 5,276 | 163 | 02/20/2012 | 06/01/2012
16 | Predicting a Biological Response | 20,000 | 7,668 | 647 | 03/16/2012 | 06/15/2012
17 | Online Product Sales | 22,500 | 3,532 | 346 | 05/04/2012 | 07/03/2012
18 | EMI Music Data Science Hackathon - July 21st - 24 hours | 10,000 | 1,282 | 132 | 07/21/2012 | 07/22/2012
19 | Belkin Energy Disaggregation Competition | 25,000 | 1,399 | 160 | 07/02/2013 | 10/30/2013
20 | Merck Molecular Activity Challenge | 40,000 | 2,979 | 236 | 08/16/2012 | 10/16/2012
21 | U.S. Census Return Rate Challenge | 25,000 | 2,385 | 231 | 08/31/2012 | 11/11/2012
22 | Amazon.com - Employee Access Challenge | 5,000 | 16,872 | 1,687 | 05/29/2013 | 07/31/2013
23 | The Marinexplore and Cornell University Whale Detection Challenge | 10,000 | 3,282 | 244 | 02/08/2013 | 04/08/2013
24 | See Click Predict Fix - Hackathon | 1,000 | 1,001 | 79 | 09/28/2013 | 09/29/2013
25 | KDD Cup 2013 - Author Disambiguation Challenge (Track 2) | 7,500 | 2,216 | 235 | 04/19/2013 | 06/12/2013
26 | Influencers in Social Networks | 2,350 | 2,004 | 129 | 04/13/2013 | 04/14/2013
27 | Personalize Expedia Hotel Searches - ICDM 2013 | 25,000 | 3,409 | 331 | 09/03/2013 | 11/04/2013
28 | StumbleUpon Evergreen Classification Challenge | 5,000 | 7,123 | 593 | 08/16/2013 | 10/31/2013
29 | Personalized Web Search Challenge | 9,000 | 3,021 | 177 | 10/11/2013 | 01/10/2014
30 | See Click Predict Fix | 4,000 | 5,314 | 517 | 09/29/2013 | 11/27/2013
31 | Allstate Purchase Prediction Challenge | 50,000 | 24,526 | 1,568 | 02/18/2014 | 05/19/2014
32 | Higgs Boson Machine Learning Challenge | 13,000 | 35,772 | 1,785 | 05/12/2014 | 09/15/2014
33 | Acquire Valued Shoppers Challenge | 30,000 | 25,138 | 952 | 04/10/2014 | 07/14/2014
34 | The Hunt for Prohibited Content | 25,000 | 4,992 | 285 | 06/24/2014 | 08/31/2014
35 | Liberty Mutual Group - Fire Peril Loss Cost | 25,000 | 14,751 | 634 | 07/08/2014 | 09/02/2014
36 | Tradeshift Text Classification | 5,000 | 4,632 | 296 | 10/02/2014 | 11/10/2014
37 | Driver Telematics Analysis | 30,000 | 36,065 | 1,528 | 12/15/2014 | 03/16/2015
38 | Diabetic Retinopathy Detection | 100,000 | 7,002 | 661 | 02/17/2015 | 07/27/2015
39 | Click-Through Rate Prediction | 15,000 | 27,202 | 1,417 | 11/18/2014 | 02/09/2015
40 | Otto Group Product Classification Challenge | 10,000 | 34,300 | 2,734 | 03/17/2015 | 05/18/2015
41 | Crowdflower Search Results Relevance | 20,000 | 23,237 | 1,326 | 05/11/2015 | 07/06/2015
42 | Avito Context Ad Clicks | 20,000 | 5,317 | 360 | 06/02/2015 | 07/28/2015
43 | ICDM 2015: Drawbridge Cross-Device Connections | 10,000 | 2,355 | 340 | 06/01/2015 | 08/24/2015
44 | Caterpillar Tube Pricing | 30,000 | 23,834 | 1,187 | 06/29/2015 | 08/31/2015
45 | Liberty Mutual Group: Property Inspection Prediction | 25,000 | 40,594 | 2,054 | 07/06/2015 | 08/28/2015
46 | Coupon Purchase Prediction | 50,000 | 18,477 | 1,076 | 07/16/2015 | 09/30/2015
47 | Springleaf Marketing Response | 100,000 | 34,861 | 1,914 | 08/14/2015 | 10/19/2015
48 | Truly Native? | 10,000 | 3,222 | 274 | 08/06/2015 | 10/14/2015
49 | Rossmann Store Sales | 35,000 | 58,915 | 2,861 | 09/30/2015 | 12/14/2015
50 | Homesite Quote Conversion | 20,000 | 28,571 | 1,334 | 11/09/2015 | 02/08/2016
51 | Prudential Life Insurance Assessment | 30,000 | 42,336 | 2,452 | 11/23/2015 | 02/15/2016
52 | BNP Paribas Cardif Claims Management | 30,000 | 48,442 | 2,702 | 02/03/2016 | 04/18/2016
53 | Home Depot Product Search Relevance | 40,000 | 32,937 | 1,935 | 01/18/2016 | 04/25/2016
54 | Santander Customer Satisfaction | 60,000 | 93,031 | 5,117 | 03/02/2016 | 05/02/2016
55 | Expedia Hotel Recommendations | 25,000 | 22,709 | 1,974 | 04/15/2016 | 06/10/2016
56 | Avito Duplicate Ads Detection | 20,000 | 8,134 | 548 | 05/06/2016 | 07/11/2016
57 | Draper Satellite Image Chronology | 75,000 | 2,734 | 401 | 04/29/2016 | 06/27/2016

Table A.1: Summary of the Competitions in the Data (Full List). Note: The table only considers submissions that received a score. The total reward is measured in US dollars at the moment of the competition.
| Public Ranking of Winner | Frequency | Probability | Cumulative Probability |
|---|---|---|---|
| 1 | 29 | 50.88 | 50.88 |
| 2 | 13 | 22.81 | 73.68 |
| 3 | 3 | 5.26 | 78.95 |
| 4 | 5 | 8.77 | 87.72 |
| 5 | 1 | 1.75 | 89.47 |
| 6 | 2 | 3.51 | 92.98 |
| 11 | 3 | 5.26 | 98.25 |
| 54 | 1 | 1.75 | 100.00 |

Table A.2: Public Leaderboard Ranking of Competition Winners Note: An observation is a contest.
| Number of Competitions | Frequency (Overall) | Probability (Overall) | Frequency (Competitive) | Probability (Competitive) |
|---|---|---|---|---|
| 1 | 22,034 | 71.26 | 3,556 | 57.78 |
| 2 | 4,350 | 14.08 | 1,024 | 16.64 |
| 3 | 1,835 | 5.70 | 510 | 8.29 |
| 4 | 908 | 2.82 | 275 | 4.47 |
| 5 or more | 1,976 | 6.14 | 789 | 12.82 |

Table A.3: Number of Competitions by User Note: An observation is a team member.
| Contest | µ | SE | λ | SE | σ | SE | log L(δ̂)/N | N |
|---|---|---|---|---|---|---|---|---|
| unimelb | 2.2518 | 0.2883 | 57.818 | 1.5502 | 0.0095 | 0.0018 | -2.7902 | 1391 |
| RTA | 1.58 | 0.1621 | 55.4337 | 1.4303 | 0.0018 | 0.0004 | -2.7296 | 1502 |
| ChessRatings2 | 2.7814 | 0.6219 | 51.1858 | 2.2006 | 0.0014 | 0.0006 | -2.753 | 541 |
| hhp | 2.585 | 0.1701 | 191.9182 | 1.6088 | 0.0013 | 0.0001 | -4.123 | 14231 |
| wikichallenge | 2.3922 | 0.5349 | 63.9345 | 2.7114 | 0.0014 | 0.0006 | -2.9593 | 556 |
| ClaimPredictionChallenge | 1.9972 | 0.4844 | 75.7161 | 3.1278 | 0.0008 | 0.0006 | -3.1885 | 586 |
| dunnhumbychallenge | 2.1786 | 0.3112 | 43.0186 | 1.4799 | 0.0041 | 0.001 | -2.4817 | 845 |
| GiveMeSomeCredit | 1.7886 | 0.115 | 62.0622 | 0.9749 | 0.0177 | 0.0015 | -2.7498 | 4053 |
| DontGetKicked | 1.9214 | 0.1912 | 88.2444 | 1.5446 | 0.0022 | 0.0004 | -3.2978 | 3264 |
| AlgorithmicTradingChallenge | 3.9965 | 1.0319 | 61.9513 | 2.5994 | 0.0015 | 0.0007 | -2.9833 | 568 |
| WhatDoYouKnow | 2.5017 | 0.4906 | 59.9231 | 2.2649 | 0.003 | 0.0009 | -2.8836 | 700 |
| PhotoQualityPrediction | 2.2782 | 0.3648 | 26.8969 | 1.1407 | 0.0029 | 0.0015 | -2.0296 | 556 |
| benchmark-bond-trade-price-challenge | 3.062 | 0.5103 | 61.0601 | 1.7959 | 0.0039 | 0.0011 | -2.9404 | 1156 |
| kddcup2012-track1 | 2.9583 | 0.2392 | 100.2066 | 1.1482 | 0.0019 | 0.0002 | -3.4658 | 7617 |
| kddcup2012-track2 | 2.3511 | 0.3865 | 131.4569 | 2.5196 | 0.0003 | 0.0002 | -3.8078 | 2722 |
| bioresponse | 2.0278 | 0.1883 | 79.7233 | 1.2755 | 0.002 | 0.0003 | -3.2068 | 3907 |
| online-sales | 2.2814 | 0.2727 | 46.4402 | 1.1694 | 0.0021 | 0.0006 | -2.6393 | 1577 |
| MusicHackathon | 3.8714 | 0.5974 | 18.0368 | 0.7602 | 0.001 | 0.0007 | -1.6941 | 563 |
| belkin-energy-disaggregation-competition | 2.1421 | 0.5941 | 127.443 | 4.7397 | 0.0005 | 0.0003 | -3.7337 | 723 |
| MerckActivity | 2.4988 | 0.34 | 53.0883 | 1.3046 | 0.001 | 0.0004 | -2.8338 | 1656 |
| us-census-challenge | 2.1093 | 0.2723 | 54.9508 | 1.5675 | 0.0108 | 0.0014 | -2.6729 | 1229 |
| amazon-employee-access-challenge | 2.5534 | 0.1252 | 49.1933 | 0.5004 | 0.0091 | 0.0007 | -2.6679 | 9663 |
| whale-detection-challenge | 2.143 | 0.2916 | 56.0026 | 1.4322 | 0.0004 | 0.0002 | -2.8826 | 1529 |
| the-seeclickfix-311-challenge | 2.5648 | 0.5884 | 40.8782 | 1.9122 | 0.0019 | 0.0016 | -2.5319 | 457 |
| kdd-cup-2013-author-disambiguation | 2.6054 | 0.5318 | 70.4564 | 2.2082 | 0.0005 | 0.0004 | -3.1417 | 1018 |
| predict-who-is-more-influential-in-a-social-network | 2.8405 | 0.4608 | 45.0543 | 1.4734 | 0.0022 | 0.0006 | -2.5965 | 935 |
| expedia-personalized-sort | 2.2804 | 0.2944 | 46.9099 | 1.2169 | 0.0016 | 0.0005 | -2.6658 | 1486 |
| stumbleupon | 2.6773 | 0.2078 | 52.0039 | 0.7794 | 0.0024 | 0.0003 | -2.7523 | 4452 |
| yandex-personalized-web-search-challenge | 1.7275 | 0.2766 | 120.5711 | 2.92 | 0.0006 | 0.0002 | -3.6677 | 1705 |
| see-click-predict-fix | 1.8024 | 0.1889 | 71.1992 | 1.3575 | 0.0018 | 0.0006 | -3.1106 | 2751 |
| allstate-purchase-prediction-challenge | 1.9824 | 0.1274 | 125.6546 | 1.1708 | 0.0045 | 0.0005 | -3.6895 | 11519 |
| higgs-boson | 2.3698 | 0.114 | 122.1003 | 0.8177 | 0.0013 | 0.0001 | -3.6756 | 22298 |
| acquire-valued-shoppers-challenge | 2.0772 | 0.1316 | 165.2723 | 1.2866 | 0.0011 | 0.0001 | -3.9934 | 16500 |
| avito-prohibited-content | 2.5922 | 0.3953 | 106.7729 | 2.0667 | 0.0005 | 0.0002 | -3.5868 | 2669 |
| liberty-mutual-fire-peril | 3.3344 | 0.3194 | 122.6742 | 1.3403 | 0.0008 | 0.0002 | -3.7291 | 8377 |
| tradeshift-text-classification | 3.0683 | 0.3897 | 63.2299 | 1.2398 | 0.0005 | 0.0002 | -3.0464 | 2601 |
| axa-driver-telematics-analysis | 2.4383 | 0.1405 | 127.6736 | 0.8938 | 0.0009 | 0.0001 | -3.7595 | 20405 |
| diabetic-retinopathy-detection | 1.7712 | 0.2073 | 107.1837 | 2.0274 | 0.0025 | 0.0005 | -3.5187 | 2795 |
| avazu-ctr-prediction | 3.2027 | 0.1979 | 109.1072 | 0.8988 | 0.0008 | 0.0001 | -3.5696 | 14735 |
| otto-group-product-classification-challenge | 3.2732 | 0.1397 | 55.4236 | 0.3993 | 0.0011 | 0.0001 | -2.8812 | 19269 |
| crowdflower-search-relevance | 2.1332 | 0.1093 | 83.3317 | 0.6605 | 0.0021 | 0.0002 | -3.2883 | 15919 |
| avito-context-ad-clicks | 1.8952 | 0.2188 | 81.4574 | 1.5926 | 0.0006 | 0.0002 | -3.2495 | 2616 |
| icdm-2015-drawbridge-cross-device-connections | 2.0972 | 0.3448 | 57.2772 | 1.7643 | 0.0007 | 0.0004 | -2.8946 | 1054 |
| caterpillar-tube-pricing | 3.2674 | 0.1757 | 68.2579 | 0.5565 | 0.001 | 0.0001 | -3.0972 | 15047 |
| liberty-mutual-group-property-inspection-prediction | 3.1055 | 0.1152 | 67.0227 | 0.4112 | 0.0062 | 0.0004 | -3.0375 | 26573 |
| coupon-purchase-prediction | 2.0586 | 0.1093 | 73.1853 | 0.6539 | 0.0009 | 0.0001 | -3.146 | 12526 |
| springleaf-marketing-response | 3.0405 | 0.1689 | 97.3279 | 0.7153 | 0.0008 | 0.0001 | -3.4726 | 18513 |
| dato-native | 4.4816 | 0.7802 | 50.9714 | 1.4203 | 0.0004 | 0.0004 | -2.8317 | 1288 |
| rossmann-store-sales | 2.8735 | 0.0926 | 89.6868 | 0.4478 | 0.0019 | 0.0001 | -3.3334 | 40105 |
| homesite-quote-conversion | 2.4958 | 0.1548 | 128.5381 | 0.9678 | 0.0004 | 0.0001 | -3.7599 | 17638 |
| prudential-life-insurance-assessment | 2.1598 | 0.0799 | 78.0817 | 0.4707 | 0.004 | 0.0003 | -3.2007 | 27512 |
| bnp-paribas-cardif-claims-management | 2.6826 | 0.0903 | 63.8265 | 0.3564 | 0.0005 | 0.0001 | -3.0155 | 32069 |
| home-depot-product-search-relevance | 2.4153 | 0.1501 | 124.6592 | 0.9777 | 0.0005 | 0.0001 | -3.724 | 16258 |
| santander-customer-satisfaction | 2.3098 | 0.0563 | 75.1579 | 0.3048 | 0.0055 | 0.0002 | -3.1614 | 60816 |
| expedia-hotel-recommendations | 2.2792 | 0.086 | 43.2208 | 0.3422 | 0.0009 | 0.0001 | -2.564 | 15948 |
| avito-duplicate-ads-detection | 3.565 | 0.4807 | 109.0688 | 1.6756 | 0.0004 | 0.0002 | -3.6251 | 4237 |
| draper-satellite-image-chronology | 3.0409 | 0.4177 | 53.1817 | 1.4239 | 0.0019 | 0.0004 | -2.7768 | 1395 |

Table A.4: Maximum Likelihood Estimates of the Cost and Arrival Distributions. Note: The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.'
Contest | Type 1: θ1mean, θ1st.dev, κ1 | Type 2: θ2mean, θ2st.dev, κ2 | log L(δ̂)/N | N
unimelb
0.9394
0.0347
0.629
0.4085
0.8215
0.371
0.556
1391
RTA
0.7156
0.3742 0.8024 0.5208
0.092
0.1976
-0.3613
1502
ChessRatings2
0.6239
0.108
1.0804
0.059
0.2
0.4568
541
hhp
0.7286
0.0449
0.2079 0.6986
0.2455
0.7921
0.2902
14231
wikichallenge
0.7942
0.0684 0.3946
ClaimPredictionChallenge
0.4819
0.5208
dunnhumbychallenge
0.8162
0.1949
0.55
1.1991
0.14
0.45
-0.0137
845
0.52
0.0118
0.5788
0.5031
0.1241
0.4212
2.1073
4053
DontGetKicked
0.7161
0.0223
0.0433 0.7871 0.1468 0.9567
0.5451
3264
AlgorithmicTradingChallenge
0.7443
0.1452
0.8667 0.9965 0.0554
0.1333
0.3943
568
WhatDoYouKnow
0.714
0.1427 0.8079 0.9667 0.0483 0.1921
0.4415
700
PhotoQualityPrediction
0.6217
0.0329 0.4996 0.5347 0.0599 0.5004
benchmark-bond-trade-price-challenge
0.922
0.2442 0.9966
kddcup2012-track1
0.7519
kddcup2012-track2
0.8841
bioresponse online-sales
GiveMeSomeCredit
0.8
0.1654 0.6054
0.6672
556
0.4878 1.8202 0.3327 0.5122
0.639
-1.1033
586
1.4271
556
1.0822
0.0238
0.0034
-0.0021
1156
0.1488 0.6644
0.649
0.686
0.3356
-0.3581
7617
0.1323
0.6051 0.2358
0.393
0.187
2722
0.8702
0.1237 0.5218 0.5964 0.1821 0.4782
0.2207
3907
0.9105
0.096
0.5648 0.6489 0.1883 0.4352
0.3832
1577
MusicHackathon
0.955
0.1167 0.7317 0.5854 0.2242 0.2683
0.1575
563
belkin-energy-disaggregation-competition
0.342
0.2098
0.6154
-1.0782
723
MerckActivity
0.7758
0.1061 0.4942 0.5442 0.1847 0.5058
0.3299
1656
us-census-challenge
0.9404
0.4693 0.9663
0.8504
0.0927
0.0337
-0.6457
1229
amazon-employee-access-challenge
0.7641
0.0278 0.4149
0.7207
0.2298
0.5851
0.8953
9663
whale-detection-challenge
0.7301
0.0421
0.533
0.6181
0.068
0.467
1.1603
1529
the-seeclickfix-311-challenge
0.8266
0.1922
0.898
0.9736 0.0078
0.102
0.5234
457
kdd-cup-2013-author-disambiguation
1.5685
0.1976 0.3426
0.7627 0.4839 0.6574
-0.6579
1018
0.607
0.3846 1.0475
1.3003
predict-who-is-more-influential-in-a-social-network
0.7024
0.032
0.3158 0.6342 0.1171 0.6842
0.8856
935
expedia-personalized-sort
0.9952
0.2893 0.9124 0.7866 0.0437 0.0876
-0.1245
1486
stumbleupon
0.6631
0.0491 0.3801 0.6356 0.1601 0.6199
0.7829
4452
yandex-personalized-web-search-challenge
0.7436
0.1186
0.5998 0.9164
-0.8077
1705
see-click-predict-fix
0.7686
0.0342 0.5132 0.6697 0.3057 0.4868
0.9367
2751
allstate-purchase-prediction-challenge
0.5064
0.0054
0.566
2.7542
11519
higgs-boson
0.6729
0.0261 0.1342
0.6621 0.1811 0.8658
0.7565
22298
0.0836 0.434
0.963 0.5029
0.0608
acquire-valued-shoppers-challenge
0.937
0.0719 0.1384
0.8149
0.3858
0.8616
-0.2913
16500
avito-prohibited-content
0.4962
0.0079 0.3469
0.4646
0.0341
0.6531
2.4984
2669
liberty-mutual-fire-peril
0.832
0.1419 0.6025
0.5377
0.1638
0.3975
0.2251
8377
tradeshift-text-classification
0.8762
0.1204
0.4814 0.6032 0.1887 0.5186
0.2155
2601
0.5144
-0.093
20405
axa-driver-telematics-analysis
1.1396
0.1601
diabetic-retinopathy-detection
1.0707
0.3831 0.8415 1.7968 0.0898
0.7538
0.2585 0.4856
avazu-ctr-prediction
0.5801
0.121
0.81
otto-group-product-classification-challenge
0.8747
0.1304
crowdflower-search-relevance
0.7604
0.1151
0.1585
-0.5556
2795
0.19
0.2011
14735
0.8578
0.2151
0.5775
0.662
0.2334 0.4225
0.538
0.5133 0.2216
0.2187
19269
0.462
0.2767
15919 2616
avito-context-ad-clicks
0.7291
0.1534
0.1906 1.0586 0.5576 0.8094
-0.7497
icdm-2015-drawbridge-cross-device-connections
0.9903
0.2175
0.7308 1.4471
0.2692
0.0418
1054
caterpillar-tube-pricing
0.6515
0.0455
0.4406 0.5762 0.1804 0.5594
0.8269
15047
0.6398
0.063
liberty-mutual-group-property-inspection-prediction
0.8021
0.0346 0.3201
0.6799
0.4689
26573
coupon-purchase-prediction
0.7612
0.0026 0.0811 0.8071 0.6182 0.9189
0.2638
-0.538
12526 18513
springleaf-marketing-response
0.8698
0.041
0.1599 0.7947
0.218
0.8401
0.3295
dato-native
0.6013
0.074
0.6684
0.0118 0.3316
1.4778
1288
rossmann-store-sales
0.7237
0.0766
0.4658 0.5519
0.2002
0.5342
0.5379
40105
homesite-quote-conversion
0.5532
0.0269 0.2442
0.6472
0.2076
0.7558
0.5174
17638
prudential-life-insurance-assessment
0.7489
0.0409 0.4956
0.6576
0.2136
0.5044
0.8763
27512
bnp-paribas-cardif-claims-management
0.5618
0.0905
0.686
0.3672
0.671
-0.1163
32069
0.329
0.6838
home-depot-product-search-relevance
1.1326
0.4183 0.5038 0.5873 0.1628 0.4962
-0.5663
16258
santander-customer-satisfaction
0.4623
0.0083 0.4581 0.4469 0.1328
0.5419
2.4771
60816
expedia-hotel-recommendations
0.6841
0.0063
0.5284 0.5805
0.377
0.4716
1.7034
15948
avito-duplicate-ads-detection
0.975
0.063
0.4236
0.2305 0.5764
0.636
4237
-1.4304
1395
draper-satellite-image-chronology
-0.0352
0.1573 0.1485
0.8644 1.07
1.1763
0.8515
Table A.5: EM Algorithm Estimates for the Type-specific Distribution of Scores, qθ. Note: The model is estimated separately for each contest. $\theta_i^{mean}$ and $\theta_i^{st.dev}$ are the parameters of type i's distribution of scores, $Q_i(s) = \Phi\big((s - \theta_i^{mean})/\theta_i^{st.dev}\big)$. $\kappa_i$ is the fraction of players of type i. log L(δ̂)/N is the value of the log-likelihood function evaluated at the EM estimates. Standard errors are available.
| Contest | α | SE | β | SE | N |
|---|---|---|---|---|---|
| unimelb | -0.0115 | 0.0413 | 1.0233 | 0.0561 | 2800 |
| RTA | -0.0023 | 0.8605 | 1.0021 | 0.8608 | 3129 |
| ChessRatings2 | -0.0177 | 0.1823 | 1.0176 | 0.1908 | 1563 |
| hhp | 0.0022 | 0.0717 | 1.0025 | 0.0737 | 25316 |
| wikichallenge | 0.0001 | 0.3367 | 0.9998 | 0.3497 | 1020 |
| ClaimPredictionChallenge | 0.0244 | 0.0787 | 0.9437 | 0.1388 | 1278 |
| dunnhumbychallenge | 0.0047 | 0.073 | 1.0153 | 0.1041 | 1872 |
| GiveMeSomeCredit | 0.0016 | 0.0609 | 0.9989 | 0.066 | 7730 |
| DontGetKicked | 0.0019 | 0.0611 | 1.0013 | 0.0716 | 7261 |
| AlgorithmicTradingChallenge | 0.0013 | 0.3561 | 0.9987 | 0.3581 | 1406 |
| WhatDoYouKnow | 0.006 | 0.1801 | 0.994 | 0.1893 | 1747 |
| PhotoQualityPrediction | -0.0156 | 0.2402 | 1.0174 | 0.254 | 1356 |
| kddcup2012-track1 | -0.0149 | 0.0259 | 0.9655 | 0.0379 | 13076 |
| kddcup2012-track2 | -0.0057 | 0.0444 | 0.9968 | 0.0587 | 5276 |
| bioresponse | 0.0374 | 0.127 | 0.9635 | 0.1293 | 8837 |
| online-sales | -0.0321 | 0.4462 | 1.0323 | 0.4524 | 3755 |
| MusicHackathon | 0.0003 | 0.8621 | 0.9997 | 0.8634 | 1319 |
| belkin-energy-disaggregation-competition | 0.0128 | 0.2171 | 1.0157 | 0.2346 | 1526 |
| MerckActivity | -0.0074 | 0.0731 | 1.0083 | 0.0908 | 2979 |
| us-census-challenge | 0.0082 | 0.5591 | 0.9918 | 0.5601 | 2666 |
| amazon-employee-access-challenge | 0.0263 | 0.0306 | 0.9748 | 0.0368 | 16872 |
| whale-detection-challenge | 0.0039 | 0.0806 | 0.9961 | 0.0928 | 3293 |
| the-seeclickfix-311-challenge | 0.0121 | 0.3505 | 0.9877 | 0.3701 | 1051 |
| kdd-cup-2013-author-disambiguation | -0.0003 | 0.2136 | 0.9996 | 0.2228 | 2304 |
| predict-who-is-more-influential-in-a-social-network | 0.0479 | 0.1612 | 0.9494 | 0.1741 | 2105 |
| expedia-personalized-sort | 0.0496 | 0.0908 | 0.9407 | 0.1078 | 3502 |
| stumbleupon | 0.0297 | 0.0736 | 0.9875 | 0.0815 | 7495 |
| yandex-personalized-web-search-challenge | -0.0002 | 0.1021 | 1.0004 | 0.1074 | 3570 |
| see-click-predict-fix | -0.0023 | 0.9291 | 1.0022 | 0.9321 | 5570 |
| allstate-purchase-prediction-challenge | 0.0005 | 0.0197 | 1.0043 | 0.0221 | 24526 |
| higgs-boson | -0.0002 | 0.0183 | 1.0168 | 0.0224 | 35772 |
| acquire-valued-shoppers-challenge | -0.0139 | 0.033 | 1.0105 | 0.043 | 25195 |
| avito-prohibited-content | -0.0003 | 0.0521 | 0.9999 | 0.0574 | 4992 |
| liberty-mutual-fire-peril | 0.044 | 0.0449 | 0.9083 | 0.054 | 14812 |
| tradeshift-text-classification | 0.0004 | 0.4734 | 0.9996 | 0.4746 | 5648 |
| axa-driver-telematics-analysis | -0.0019 | 0.0179 | 1.0019 | 0.0253 | 36065 |
| diabetic-retinopathy-detection | -0.0082 | 0.0285 | 1.0106 | 0.0489 | 7002 |
| avazu-ctr-prediction | 0.0006 | 0.0683 | 0.9994 | 0.0688 | 31015 |
| otto-group-product-classification-challenge | 0.0002 | 0.0309 | 0.9997 | 0.0321 | 43525 |
| crowdflower-search-relevance | 0.0174 | 0.0362 | 0.986 | 0.0426 | 23244 |
| avito-context-ad-clicks | -0.0001 | 0.2656 | 1.0001 | 0.2665 | 5949 |
| icdm-2015-drawbridge-cross-device-connections | -0.0007 | 0.0326 | 1.0014 | 0.0579 | 2355 |
| caterpillar-tube-pricing | -0.014 | 0.3919 | 1.014 | 0.393 | 26360 |
| liberty-mutual-group-property-inspection-prediction | 0.0061 | 0.0383 | 0.9961 | 0.0407 | 45875 |
| coupon-purchase-prediction | 0.033 | 0.0134 | 0.9022 | 0.0279 | 18477 |
| springleaf-marketing-response | 0.0092 | 0.052 | 0.9894 | 0.0553 | 39444 |
| dato-native | 0.008 | 0.0721 | 0.9931 | 0.0823 | 3223 |
| homesite-quote-conversion | 0.0028 | 0.0401 | 0.997 | 0.0417 | 36368 |
| prudential-life-insurance-assessment | 0.0092 | 0.042 | 0.9933 | 0.0447 | 45490 |
| bnp-paribas-cardif-claims-management | 0.0036 | 0.073 | 0.9964 | 0.0737 | 54516 |
| home-depot-product-search-relevance | -0.0002 | 0.044 | 0.9999 | 0.047 | 35619 |
| santander-customer-satisfaction | 0.028 | 0.0267 | 0.972 | 0.0282 | 93559 |
| expedia-hotel-recommendations | 0.0006 | 0.019 | 0.9983 | 0.0269 | 22709 |
| avito-duplicate-ads-detection | 0.0016 | 0.0632 | 0.9985 | 0.0754 | 8153 |
| draper-satellite-image-chronology | 0.1325 | 0.0643 | 0.8211 | 0.1141 | 2734 |

Table A.6: Maximum Likelihood Estimates of the Distribution of Private Scores Conditional on Public Scores. Note: The conditional distribution is assumed to be given by $p^{private} = \alpha + \beta p^{public} + \varepsilon$, with $\varepsilon$ distributed according to a double exponential distribution. The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.'
Dependent variable: Number of submissions per time interval (with leaderboard)

| | (1) higgs-boson | (2) hhp | (3) avazu-ctr-prediction |
|---|---|---|---|
| max(Deviation of max score relative to expected max score, 0) | -12.3462*** (0.9047) | -11.6532*** (0.8924) | -7.4416*** (0.5777) |
| min(Deviation of max score relative to expected max score, 0) | -30.9281*** (1.2164) | -24.4146*** (1.1899) | -16.4589*** (0.7692) |
| Observations | 20,000 | 20,000 | 20,000 |
| R2 | 0.911 | 0.818 | 0.860 |
| p-value F-test | 0.0000 | 0.0000 | 0.0000 |

Table A.7: The effect of the leaderboard on participation over time (selected contests) Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. 'p-value F-test' reports the p-value for a test of equality of coefficients. Data were simulated using the model estimates. Each contest was simulated 1,000 times. An observation is a contest simulation–time interval combination, where each time interval captures 5 percent of the overall contest length. The expected max score was obtained by averaging the max score across all simulations for each time interval. The model estimates suggest that a leaderboard increases the number of submissions (relative to the case without a leaderboard) in the contests higgs-boson and hhp, while it would decrease the number of submissions in the contest avazu-ctr-prediction.
| Fraction of teams | hhp | higgs-boson | amazon-employee-access |
|---|---|---|---|
| 91 percent | 12400.04** (12354.45, 12445.63) | 18582.17** (18534.46, 18629.89) | 3612.16** (3599.28, 3625.05) |
| 93 percent | 12499.77** (12454.43, 12545.11) | 18754.15** (18706.82, 18801.49) | 3641.93** (3629.23, 3654.64) |
| 95 percent | 12605.75** (12560.56, 12650.93) | 18934.70** (18888.13, 18981.27) | 3670.92 (3658.01, 3683.83) |
| 97 percent | 12695.48** (12650.72, 12740.25) | 19102.17** (19056.00, 19148.35) | 3700.74 (3687.98, 3713.49) |
| 99 percent | 12790.69 (12746.16, 12835.22) | 19271.08 (19225.18, 19316.98) | 3720.23 (3707.60, 3732.86) |
| 100 percent | 12780.99 (12737.08, 12824.90) | 19294.11 (19248.08, 19340.14) | 3694.61 (3681.27, 3707.95) |

Table A.8: Number of submissions with limited participation, by fraction of teams Note: The contest outcomes are contest-level averages computed using 2,000 simulations for each contest using our model estimates. 95-percent confidence intervals in parentheses. ** indicates that the estimate with x percent of the teams is statistically different from the estimate with 100 percent of the teams (at a significance level of 5 percent).
Figure A.1: Correlation between µ (arrival rate of new teams) and λ (arrival rate of new submissions) Note: The estimates of λ and µ are reported in Table A.4. The coefficient of correlation between λ and µ is -0.18.
[Figure A.2, Panel A: No leaderboard; Panel B: No noise. Each panel shows histograms of the percentage-point change in total submissions, max score, average submissions per team, and average submissions per high-type team.]

Figure A.2: Equilibrium outcome comparison Note: An observation is a contest. The contest outcomes are contest-level averages computed using 2,000 simulations for each contest using our model estimates.
[Figure A.2 (continued), Panel C: Limited participation; Panel D: Single prize. Each panel shows histograms of the percentage-point change in total submissions, max score, average submissions per team, and average submissions per high-type team.]

Figure A.2 (continued): Equilibrium outcome comparison Note: An observation is a contest. The contest outcomes are contest-level averages computed using 2,000 simulations for each contest using our model estimates.
B    Estimation Details
In this section, we provide an overview of the estimation procedure used for a subset of the primitives of the model.
B.1    Distribution of entry times – ML
We assume that the time at which a player enters a competition follows an exponential distribution with a contest-specific parameter, µ. Given the vector of entry times for the set of players I in a given contest, {ti }i∈I , we estimate µ by using the maximum likelihood estimator:
$$\hat{\mu} = \arg\max_{\mu} \log L(\mu) = \arg\max_{\mu} \sum_{i \in I} \big( \log(\mu) - \mu t_i \big) = \frac{1}{\bar{t}},$$

where $\bar{t} = \sum_{i \in I} t_i / |I|$.
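Since the estimator has the closed form µ̂ = 1/t̄, it can be checked against simulated entry times. A minimal sketch (the true rate of 2 is an arbitrary illustrative choice, not a value from the paper):

```python
import random

def mle_entry_rate(entry_times):
    """Closed-form MLE of an exponential rate: the reciprocal of the sample mean."""
    return len(entry_times) / sum(entry_times)

# With a true rate of 2, entry times average 1/2, so the estimate should be near 2.
random.seed(0)
entry_times = [random.expovariate(2.0) for _ in range(100_000)]
mu_hat = mle_entry_rate(entry_times)
```

The estimator for λ in Section B.2 has the same form, applied to the pooled times between submissions.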
B.2    Distribution of time between submissions – ML
We assume that the time between submissions follows an exponential distribution with a contest-specific parameter, λ. Given the vector of times between submissions for the set of players I in a given contest, $\{t_{i,m}\}_{m \in M_i, i \in I}$, we estimate λ by using the maximum likelihood estimator:

$$\hat{\lambda} = \arg\max_{\lambda} \log L(\lambda) = \arg\max_{\lambda} \sum_{i \in I} \sum_{m \in M_i} \big( \log(\lambda) - \lambda t_{i,m} \big) = \frac{1}{\bar{t}},$$

where $\bar{t} = \sum_{i \in I} \sum_{m \in M_i} t_{i,m} / |\{t_{i,m}\}_{m \in M_i, i \in I}|$.
B.3    Type-specific distribution of scores – EM Algorithm
The probability that player i is of type $\theta_j \in \Theta$, given player i's observed scores $s_i = \{s_{i1}, \ldots, s_{iM_i}\}$, is (by Bayes' identity)

$$h(\theta_j \mid s_i) = \frac{\kappa_j \prod_{m=1}^{M_i} f(s_{im} \mid \boldsymbol{\theta}_j)}{\sum_{\theta_k \in \Theta} \kappa_k \prod_{m=1}^{M_i} f(s_{im} \mid \boldsymbol{\theta}_k)},$$

where $f(\cdot \mid \boldsymbol{\theta}_j)$ is the density of scores of players of type $\theta_j$—which depends on a vector of parameters, $\boldsymbol{\theta}_j$—and $\kappa_j$ is the fraction of players of type $\theta_j$. We assume $f(\cdot \mid \boldsymbol{\theta}_j)$ is the density function of a normal distribution with mean $\theta_j^{mean}$ and standard deviation $\theta_j^{st.dev}$, where $\boldsymbol{\theta} = \{(\theta_j^{mean}, \theta_j^{st.dev})\}_{j=1,\ldots,k}$ and $\boldsymbol{\kappa} = \{\kappa_j\}_{j=1,\ldots,k}$. The expectation for the EM algorithm is given by

$$E(\boldsymbol{\theta}, \boldsymbol{\kappa} \mid \boldsymbol{\theta}^t, \boldsymbol{\kappa}^t) = \sum_{i} \sum_{\theta_k \in \Theta} \sum_{m=1}^{M_i} h(\theta_k \mid s_i, \boldsymbol{\theta}^t, \boldsymbol{\kappa}^t) \log\big(\kappa_k f(s_{im} \mid \theta_k)\big).$$

Given $(\boldsymbol{\theta}^t, \boldsymbol{\kappa}^t)$, $E(\boldsymbol{\theta}, \boldsymbol{\kappa} \mid \boldsymbol{\theta}^t, \boldsymbol{\kappa}^t)$ has a unique maximum, $(\boldsymbol{\theta}^{t+1}, \boldsymbol{\kappa}^{t+1})$. Given our assumptions, one can obtain an analytic solution for $(\boldsymbol{\theta}^{t+1}, \boldsymbol{\kappa}^{t+1})$. The estimates of the model are obtained by iterating over the expectation and maximization steps until convergence of the estimates: $\rho((\boldsymbol{\theta}^t, \boldsymbol{\kappa}^t), (\boldsymbol{\theta}^{t+1}, \boldsymbol{\kappa}^{t+1})) < \varepsilon$, where $\rho(\cdot)$ is the Euclidean metric. We use a tolerance level of 1E-8.27
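The E- and M-steps can be sketched for the two-type case as follows. The simulated data, the initialization, and the fixed iteration count are illustrative assumptions (the paper instead iterates until the Euclidean distance between successive estimates falls below 1E-8); with normal type densities, the M-step maxima are the posterior-weighted means, standard deviations, and type shares computed below:

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_two_types(players, iters=200):
    """EM for a two-type normal mixture in which all of a player's scores
    share the player's latent type. players: list of per-player score lists."""
    pooled = sorted(s for p in players for s in p)
    half = len(pooled) // 2
    # Initialize the two type means from the lower/upper halves of the pooled scores.
    theta = [[sum(pooled[:half]) / half, 1.0],
             [sum(pooled[half:]) / (len(pooled) - half), 1.0]]
    kappa = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior type probabilities per player (Bayes' identity,
        # with the product over the player's scores taken in logs for stability).
        H = []
        for scores in players:
            logw = [math.log(kappa[j]) +
                    sum(math.log(normal_pdf(s, *theta[j])) for s in scores)
                    for j in (0, 1)]
            mx = max(logw)
            w = [math.exp(v - mx) for v in logw]
            H.append([v / sum(w) for v in w])
        # M-step: posterior-weighted means, standard deviations, and type shares.
        for j in (0, 1):
            wsum = sum(h[j] * len(p) for h, p in zip(H, players))
            mean = sum(h[j] * s for h, p in zip(H, players) for s in p) / wsum
            var = sum(h[j] * (s - mean) ** 2 for h, p in zip(H, players) for s in p) / wsum
            theta[j] = [mean, max(math.sqrt(var), 1e-6)]
            kappa[j] = sum(h[j] for h in H) / len(players)
    return theta, kappa

# Simulated contest: 120 low-type players (mean 0) and 80 high-type players (mean 3).
random.seed(1)
players = [[random.gauss(0.0 if i < 120 else 3.0, 0.5) for _ in range(5)]
           for i in range(200)]
theta_hat, kappa_hat = em_two_types(players)
```

On well-separated simulated types, the estimated means and shares should come out close to the truth (here, means near 0 and 3, and shares near 0.6 and 0.4).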
B.4    Conditional distribution of the private scores – ML
We assume that the relationship between private and public scores is given by $p^{private} = \alpha + \beta p^{public} + \varepsilon$, where $\varepsilon$ is distributed according to a standard double exponential distribution, and α and β are contest-specific parameters. Given the pairs of scores for all M submissions in a contest, $\{(p_m^{public}, p_m^{private})\}_{m \in M}$, we estimate (α, β) by using the maximum likelihood estimator:

$$(\hat{\alpha}, \hat{\beta}) = \arg\max_{\alpha, \beta} \log L(\alpha, \beta) = \arg\max_{\alpha, \beta} \sum_{m \in M} \big( -\varepsilon_m - \exp\{-\varepsilon_m\} \big),$$

where $\varepsilon_m = p_m^{private} - \alpha - \beta p_m^{public}$.

27 Alternatively, we can iterate until the log-likelihood converges.
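A minimal sketch of this estimator, using simulated score pairs and plain gradient ascent on the log-likelihood (which is globally concave in (α, β)); the true values (α, β) = (0.5, 0.9), the learning rate, and the iteration count are illustrative choices, not values from the paper:

```python
import math
import random

def fit_gumbel_regression(x, y, lr=0.1, iters=2000):
    """MLE of (alpha, beta) in y = alpha + beta*x + eps, eps ~ standard Gumbel
    (double exponential), via gradient ascent on sum(-eps - exp(-eps))."""
    a = b = 0.0
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            e = yi - a - b * xi
            s = 1.0 - math.exp(-e)  # per-observation score for the intercept
            ga += s
            gb += s * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Simulated public/private score pairs; -log(-log(U)) draws a standard Gumbel.
random.seed(2)
x = [random.random() for _ in range(1000)]
y = [0.5 + 0.9 * xi - math.log(-math.log(random.random())) for xi in x]
alpha_hat, beta_hat = fit_gumbel_regression(x, y)
```

With a standard Gumbel error, E[exp(-ε)] = 1 at the true parameters, so the intercept is identified without a location correction.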
C    Description of the Experiment
Description of the Competition

A large restaurant chain owns restaurants located along major highways. The average revenue of a restaurant located at distance x from the highway is R(x). For simplicity, the distance to the highway is normalized to lie in the interval [1, 2]. The function R(x) is unknown. The goal of this competition is to predict the value of R(x) for several distances to the highway.

Currently, the restaurant chain operates in 40 different locations. You will have access to $\{(x_i, R(x_i))\}_{i=1}^{30}$, i.e., the distance to the highway and the average revenue for 30 of these restaurants. Using these data, you must submit a prediction of the average revenue for the remaining 10 restaurants, using their distances to the highway. You will find the necessary datasets in the Data tab. You can send up to 10 different submissions each day until the end of the competition. The deadline of the competition is Sunday, April 15th, at 23:59:59.

Evaluation

We will compare the actual revenues and the revenue predictions for each value $(x_j)_{j=31}^{40}$. The score will be calculated according to the Root Mean Square Deviation,

$$\mathrm{RMSD} = \sqrt{\frac{\sum_{j=31}^{40} \big( R(x_j) - \hat{R}(x_j) \big)^2}{10}},$$

which is a measure of the distance between your predictions and the actual values R(x).

Note. Following the convention used throughout the paper, we multiplied the RMSD scores by minus one, so that the winning submission is the one with the highest private score in the competition.
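The scoring rule can be sketched as follows; the revenue values below are hypothetical, used only to illustrate the computation:

```python
import math

def rmsd(predicted, actual):
    """Root mean square deviation between predicted and actual revenues."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for p, a in zip(predicted, actual)) / n)

# Hypothetical revenues for the 10 held-out restaurants.
actual = [34.4, 36.8, 35.1, 33.9, 36.2, 34.7, 35.5, 36.0, 34.1, 35.8]
score_perfect = rmsd(actual, actual)                      # 0.0
score_off_by_one = rmsd([a + 1 for a in actual], actual)  # approximately 1.0
```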
Description of the Data

The goal of this competition is to predict the value of R(x) for a number of values of the distance to the highway. The csv file “train” contains data on the distance to the highway and the average revenue for 30 restaurants, $\{(x_i, R(x_i))\}_{i=1}^{30}$. You can use these data to create predictions of the average revenue for the remaining 10 restaurants. For these 10 restaurants, you only observe their distances to the highway, in the csv file “test.” You can find an example of what your submission must look like in the csv file “sample_submission.” File descriptions:
• train.csv - the training set
• test.csv - the test set
• sample_submission.csv - an example of a submission file in the correct format

Submission File: The submission file must be in csv format. For each of the 10 restaurants, your submission file should contain two columns: distance to the highway (x) and predicted average revenue (R). The file should contain a header and have the following format:

x          R
1.047579   34.43375
1.926801   36.83077
etc.

A correct submission must be a csv file with one row of headers and 10 rows of numerical data, as displayed above. To ensure that you are uploading your predictions in the correct format, we recommend that you create your submission by editing the sample submission file. There is a limit of 10 submissions per day. Figure A.3 shows a screenshot of the leaderboard in one of our student competitions hosted on Kaggle.
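A minimal sketch of producing a submission in the required format with Python's csv module; the test distances and revenue predictions below are hypothetical:

```python
import csv
import io

# Hypothetical test distances and revenue predictions for the 10 restaurants.
test_x = [1.047579, 1.926801, 1.21, 1.35, 1.48, 1.55, 1.63, 1.77, 1.84, 1.95]
pred_r = [34.43375, 36.83077, 34.9, 35.2, 35.5, 35.7, 36.0, 36.3, 36.5, 36.7]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["x", "R"])  # one row of headers, as required
writer.writerows(zip(test_x, pred_r))
submission = buffer.getvalue()  # write this string to a .csv file to upload
```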
Figure A.3: Snapshot of the leaderboard in one of our competitions with leaderboard. Names are hidden for privacy reasons.