Dynamic Tournament Design: An Application to Prediction Contests∗

Jorge Lemus†

Guillermo Marshall‡

July 14, 2017

Abstract

Online competitions allow government agencies and private companies to procure innovative solutions from talented individuals. How does contest design shape incentives throughout the contest? Does a real-time leaderboard encourage players during the competition? To answer these questions, we build a tractable dynamic model of competition and estimate it using 55 prediction contests hosted by Kaggle.com. We evaluate players' incentives under counterfactual competition designs, which modify information disclosure, the allocation of prizes, and participation restrictions. We find that contest outcomes are most sensitive to information design: without a public leaderboard, the total number of submissions increases but high-type players are discouraged, which worsens contest outcomes.

Keywords: Dynamic contest, contest design, prediction, Kaggle, big data



∗ We thank participants and discussants at the Conference on Internet Commerce and Innovation (Northwestern), IIOC 2017, Rob Porter Conference (Northwestern), Second Triangle Microeconomics Conference (UNC), and University of Georgia for helpful comments and suggestions.
† University of Illinois at Urbana-Champaign, Department of Economics; [email protected]
‡ University of Illinois at Urbana-Champaign, Department of Economics; [email protected]


1 Introduction

Online tournaments have become a valuable resource for government agencies and private companies to procure innovative solutions. For instance, U.S. government agencies have sponsored over 730 competitions that have awarded over $250 million in prizes to procure software, ideas, or designs through the website www.challenge.gov—e.g., DARPA sponsored a $500,000 competition to accurately predict cases of chikungunya virus.1 In the UK, the website www.datasciencechallenge.org was created to "drive innovation that will help to keep the UK safe and prosperous in the future." Multiple platforms that match private companies' problems with data scientists have also become popular.2

How are players' incentives shaped by the design of a competition? Does a real-time public leaderboard encourage or discourage participation? Is a winner-takes-all competition better than one that allocates multiple prizes? Our main contribution is to provide a tractable empirical framework to study players' incentives during the competition: we study a dynamic environment with heterogeneous players. Although the theory of contest design has advanced our knowledge of static settings, research on dynamic contest design with heterogeneous players is still limited. We shed light on dynamic contest design by estimating a tractable structural model using publicly available data on 55 prediction contests—contests to procure a model (algorithm) that delivers accurate out-of-sample predictions of a random variable.

Prediction contests have been used to tackle a variety of problems, including the diagnosis of diseases, the forecasting of epidemic outbreaks, and the management of inventory under fluctuating demand. Advances in computing power and storage technology have permitted the accumulation of large datasets. However, the "Big Data" revolution requires the analysis of these data to extract useful insights;3 companies can procure this analysis by using their in-house workers, by hiring new workers, or by sponsoring an online competition to attract participants with different skills and expertise. It has been documented that in some cases the best solution to a problem comes from industry 'outsiders' (Lakhani et al., 2013). Hence,

1 http://www.darpa.mil/news-events/2015-05-27
2 Examples include CrowdAnalytix, Tunedit, InnoCentive, Topcoder, HackerRank, and Kaggle.
3 http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal


part of the value of an online competition is in the procurement of a diverse set of solutions to a problem.

We use public information from Kaggle,4 a company primarily dedicated to hosting prediction contests for other companies. For instance, EMI sponsored a $10,000 contest to predict whether listeners would like a new song; IEEE sponsored a $60,000 contest to diagnose schizophrenia; and the National Data Science Bowl sponsored a $175,000 contest to identify plankton species from multiple images. Kaggle and the sponsoring companies have sponsored over 200 competitions that have awarded more than $5 million in prizes.

Each competition in Kaggle provides a training and a test dataset. An observation in the training dataset includes both an outcome variable and covariates. These data are used to develop a prediction algorithm. Unlike the training dataset, the test dataset only includes covariates. A valid submission must include a prediction of the outcome variable for each observation in the test dataset. To avoid overfitting, Kaggle partitions the test dataset into two subsets and does not inform participants which observations correspond to each subset. The first subset of the test dataset is used to generate a public score that is posted in real time on a public leaderboard on the website. The second one is used to generate a private score that is never made public during the contest and is revealed only at the end. The winner of a competition is the player with the maximum private score. Thus, the public score, which is highly correlated with the private score, provides a noisy signal about the final ranking of the players.5 Importantly, the evaluation criterion is objective and disclosed at the beginning of the contest.6 This is in contrast to other settings, including ideation contests (Huang et al., 2014; Kireyev, 2016), innovation contests (Boudreau et al., 2016), design contests (Gross, 2015), and labor promotions (Lazear and Rosen, 1979; Baker et al., 1988), where evaluation (or some part of it) has a subjective component.

4 https://www.kaggle.com/
5 In our data, the correlation between public and private scores is 0.99, but only 76 percent of the contest winners finish in the top 3 of the public leaderboard.
6 For example, in the ocean's health competition, the winning predictions (p_ij) minimized logloss = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} y_ij log(p_ij). For more details, visit: https://www.kaggle.com/c/datasciencebowl/details/evaluation.


Our paper contributes to the fairly recent empirical literature on contest design by presenting a tractable framework to study participation incentives in prediction contests. In the prediction contests that we analyze, players can submit multiple solutions—which are evaluated in real time—and players have access to a public leaderboard, which discloses the public score of each submission throughout the contest.7 This class of dynamic contests poses various economic questions and technical challenges. First, the partition of the test dataset makes participants uncertain of their actual position, because the public-score ranking only provides a noisy signal of their position. From a contest design perspective, we show that information design matters and that the decision to disclose a public ranking may create an encouragement or discouragement effect. Second, on the technical side, these contests feature a large number of heterogeneous participants sending thousands of submissions. An analytic solution for a dynamic model with heterogeneous and fully-rational players is cumbersome. Moreover, because participants are unsure of their position in the leaderboard, they need to keep track of the complete public history to compute the benefit of an extra submission: a state space that keeps track of the complete public history is computationally intractable.

Our descriptive evidence indicates that there is a constant rate of entry of new players during the competition, that each player sends multiple submissions, and that players are heterogeneous in their ability to produce high scores. To capture these features in our model, we assume that players enter the contest at a random time, that they work on at most one submission at a time, and that a player's type determines the distribution from which scores are drawn. After entering the contest, a player decides to make a new submission or to stop making them (i.e., to exit the contest). If a player decides to make a new submission, the player works on that submission (and only that submission) for a random amount of time. Immediately after the submission is completed, the submission is evaluated, and the public score of that submission is revealed.8 At this point, and after observing the public leaderboard, the player again decides to continue participating or to quit. To make this decision, the player compares the expected value of a new submission minus its cost versus the value of finishing the competition with her current set of submissions. In computing the benefit of a new submission, a player

7 Other online competition websites, including www.datasciencechallenge.org, share these features.
8 We do not model the choice of keeping a submission secret. As we explain in Section 2, the evidence does not indicate that players are strategic in the timing of their submissions.


considers the chances of winning a prize at the end of the contest given the current public leaderboard, her type, and her current scores, acknowledging that other players will make more submissions in the remaining time of the contest—more rival submissions will lower the player's chance of winning a prize. To deal with the problem of a computationally unmanageable state space, we assume that players are small—i.e., a player's belief about how many rival submissions will arrive in the future is unaffected by the action of sending a new submission—and we also limit the amount of information that players believe is relevant for computing their chances of winning the contest. Under these assumptions, we obtain a tractable model that can be estimated and used in a series of counterfactual exercises to study how contest design shapes participation incentives and contest outcomes.

Our results show that contest design matters for players' incentives and that there is no one-size-fits-all policy prescription. Our counterfactual simulations show that different contest designs produce heterogeneous responses in both the incentives to make submissions and contest outcomes. We present our results in terms of how contest design impacts the total number of submissions, the number of submissions by high-type players, and the upper tail of the score distribution. Given the heterogeneity in responses across contests, we summarize our results by averaging outcomes across the 55 contests.

We find that manipulating the amount of information disclosed to participants has an economically significant effect on both the number and the quality of submissions. If the contest designer hid the public leaderboard—that is, if the contest designer did not provide public information about contestants' performance—the number of submissions would increase on average by 23 percent. However, without a public leaderboard high-type players send 16 percent fewer submissions, which shifts the upper tail of the score distribution to the left and worsens contest outcomes. Increasing the correlation between the private and public scores (providing a more precise signal about the players' ranking) would decrease the number of submissions by all player types, with the total number of submissions decreasing on average by 3 percent. Because decreasing the correlation between private and public scores also promotes overfitting, our results suggest that the contest designer is better off using a noisy public leaderboard.

Allocating a single prize rather than several prizes has a small and insignificant effect


on contest outcomes. This is in part due to the large number of players in each contest. The incentives of a player who is not among the top performers are not heavily affected by whether the contest allocates one or three prizes (keeping the total reward constant).

Limiting the number of players reduces, on the one hand, the amount of competition, so players are more likely to win when they send a submission. On the other hand, limited participation also increases the "replacement effect" for the leader: faced with fewer competitors, the leader may find it optimal to send fewer submissions. We find that when the number of participants is reduced by 10 percent in each contest, the total number of submissions declines by 8.7 percent and the maximum score also declines. In summary, these results suggest that information design has a first-order effect on contest outcomes, whereas the allocation of prizes has only a small effect, and limiting participation only worsens contest outcomes.

Finally, participation in these online competitions may also be driven by non-pecuniary motives. Contestants can develop new skills by working with new types of problems and by sharing their ideas with other researchers. Also, as in open-source software (Lerner and Tirole, 2002), performing well in a data-science competition signals the agent's level of skill to potential future employers. Our estimates of the cost of making a submission also capture non-pecuniary incentives.

1.1 Related Literature

Contests are a widely used open innovation mechanism (Chesbrough et al., 2006) because they attract talented individuals with different backgrounds (Jeppesen and Lakhani, 2010; Lakhani et al., 2013). Diversity has been explicitly incorporated into the preferences of a contest designer by Terwiesch and Xu (2008). The extensive literature on static contests has focused on design features such as the number and allocation of prizes and the number of participants. The role of information disclosure and feedback has also been explored in dynamic settings. Work on the optimal allocation of prizes includes Lazear and Rosen (1979), Taylor (1995), Moldovanu and Sela (2001), Che and Gale (2003), Cohen et al. (2008), Sisak (2009), Olszewski and Siegel (2015), Kireyev (2016), Xiao (2016), Strack (2016), and


Balafoutas et al. (2017). This literature, surveyed by Sisak (2009), has found that the shape of the cost function plays an important role in determining the optimal prize allocation for the provision of effort. Regarding the number of participants, Taylor (1995) and Fullerton and McAfee (1999), among others, show that restricting the number of competitors in winner-takes-all tournaments increases the equilibrium level of effort. Intuitively, with many competitors, players have less incentive to exert costly effort because they have a smaller chance of winning. Regarding information design, Aoyagi (2010) explores a dynamic tournament and compares the provision of effort by agents under full disclosure of information (i.e., players observe their relative position) versus no information disclosure. Ederer (2010) adds private information to this setting, whereas Klein and Schmutzler (2016) add different forms of performance evaluation. Goltsman and Mukherjee (2011) study when to disclose workers' performance. Other recent articles studying dynamic contest design include Halac et al. (2014), Bimpikis et al. (2014), Benkert and Letina (2016), and Hinnosaar (2017).

There are other design tools in addition to prizes, the number of competitors, and feedback. Megidish and Sela (2013) consider contests in which participants must exert some (exogenously given) minimal effort and show that awarding a single prize is dominated by giving each participant an equal share of the prize when the minimal level of effort is high. Moldovanu and Sela (2006) show that for a large number of competitors it is optimal to split them into two divisions: in the first round participants compete within each division, and in the second round the winners of each division compete to determine the final winner. Chawla et al. (2015) study optimal contest design when the value to participants of winning a contest is heterogeneous and private information.

A growing empirical literature on contests includes Boudreau et al. (2011), Takahashi (2015), Boudreau et al. (2016), and Bhattacharya (2016). Gross (2015) studies how the number of participants changes the incentives for creating novel solutions versus marginally better ones. In a static environment, Kireyev (2016) uses an empirical model to study how elements of contest design affect participation and the quality of outcomes. Huang et al. (2014) estimate a dynamic structural model to study individual behavior

and outcomes on a platform where individuals can contribute ideas, some of which are implemented. Lastly, Gross (2017) studies how performance feedback impacts participation in design contests.

Our paper also relates to two other strands of the literature. First, to the literature studying why people spend time and effort participating in contests with a small or non-existent monetary reward. Lerner and Tirole (2002) argue that good-quality contributions are a signal of ability to potential employers. Alternatively, people may simply enjoy participating in a contest because it gives them social status (Moldovanu et al., 2007). Second, it is possible to establish a parallel between a contest and an auction. While there is a well-established empirical literature on bidding behavior in auctions (Hendricks and Porter, 1988; Li et al., 2002; Bajari and Hortacsu, 2003), there are only a few papers analyzing dynamic behavior in contests. Ours is one of the first papers to empirically study contest design in a dynamic setting with objective evaluations.

2 Background, Data, and Motivating Facts

2.1 Background and Data

We use publicly available information on contests hosted by Kaggle.9 The dataset contains several types of competitions, the majority of which are public competitions to solve commercial problems (featured competitions). The winners grant the sponsor a non-exclusive license to their submissions in exchange for a monetary award.10 These competitions represent about 75 percent of the competitions in the data. Research competitions (16 percent of the competitions in the data) are public competitions with the goal of providing a public good. Prizes for research competitions include monetary awards, conference invitations, and publications in peer-reviewed journals. Other contest categories include competitions for recruiting (0.32 percent of the competitions in

9 https://www.kaggle.com/kaggle/meta-kaggle
10 Licensing terms vary among competitions. In most of the competitions we analyze, a winning participant must grant the competition sponsor a royalty-free and perpetual license, for any purpose whatsoever, commercial or otherwise, without further approval by or payment to the participant.


our data), competitions for data visualization (2.25 percent of the competitions in the data), and competitions for fun (4.5 percent of the competitions in the data).

We work with a subset of 55 featured competitions that offered a monetary prize of at least $1,000, received at least 1,000 submissions, used between 10 and 90 percent of the test dataset to generate public scores, and evaluated submissions according to a well-defined function. In these competitions, there was an average of 1,755 teams per contest, competing for rewards that ranged between $1,000 and $500,000 and averaged $30,642. On average, 15,169 submissions were made per contest. The characteristics of a partial list of competitions are summarized in Table 1 (see Table A.1 in the Online Appendix for the full list). All of these competitions, with the exception of the Heritage Health Prize, granted prizes to the top three scores.11 For example, in the Coupon Purchase Prediction competition, the three submissions with the highest scores were awarded $30,000, $15,000, and $5,000, respectively.

Name of the Competition                              | Total Reward | Number of Submissions | Teams | Start Date | Deadline
Heritage Health Prize                                | 500,000      | 25,316                | 1,353 | 04/04/2011 | 04/04/2013
Allstate Purchase Prediction Challenge               | 50,000       | 24,526                | 1,568 | 02/18/2014 | 05/19/2014
Higgs Boson Machine Learning Challenge               | 13,000       | 35,772                | 1,785 | 05/12/2014 | 09/15/2014
Acquire Valued Shoppers Challenge                    | 30,000       | 25,195                | 952   | 04/10/2014 | 07/14/2014
Liberty Mutual Group - Fire Peril Loss Cost          | 25,000       | 14,812                | 634   | 07/08/2014 | 09/02/2014
Driver Telematics Analysis                           | 30,000       | 36,065                | 1,528 | 12/15/2014 | 03/16/2015
Crowdflower Search Results Relevance                 | 20,000       | 23,244                | 1,326 | 05/11/2015 | 07/06/2015
Caterpillar Tube Pricing                             | 30,000       | 26,360                | 1,323 | 06/29/2015 | 08/31/2015
Liberty Mutual Group: Property Inspection Prediction | 25,000       | 45,875                | 2,236 | 07/06/2015 | 08/28/2015
Coupon Purchase Prediction                           | 50,000       | 18,477                | 1,076 | 07/16/2015 | 09/30/2015
Springleaf Marketing Response                        | 100,000      | 39,444                | 2,226 | 08/14/2015 | 10/19/2015
Homesite Quote Conversion                            | 20,000       | 36,368                | 1,764 | 11/09/2015 | 02/08/2016
Prudential Life Insurance Assessment                 | 30,000       | 45,490                | 2,619 | 11/23/2015 | 02/15/2016
Santander Customer Satisfaction                      | 60,000       | 93,559                | 5,123 | 03/02/2016 | 05/02/2016
Expedia Hotel Recommendations                        | 25,000       | 22,709                | 1,974 | 04/15/2016 | 06/10/2016

Table 1: Summary of the Competitions in the Data (Partial List)
Note: The table only considers submissions that received a score. The total reward is measured in US dollars at the time of the competition. See Table A.1 in the Online Appendix for the complete list of competitions.

11 The following contests also granted a prize to the fourth position: Don't Get Kicked!, Springleaf Marketing Response, and KDD Cup 2013 - Author Disambiguation Challenge (Track 2).

As mentioned in the Introduction, the rule used to determine the winner of a competition is


an interesting feature of these prediction contests. There is a large dataset partitioned into three subsamples. The first subsample, the training dataset, provides both outcome variables and covariates and can be used by the contestants to develop their predictions. The second and third subsamples, the test dataset, are provided to the players as a single dataset and only include covariates (i.e., no outcome variables). Kaggle computes the public score and the private score by evaluating a player's submission on the second and third subsamples, respectively. For example, in the Heritage Health Prize, the test data was divided into a 30 percent subsample to compute the public scores and a 70 percent subsample to compute the private scores. Kaggle does not disclose which part of the test data is used to compute the public and private scores.

Kaggle displays, in real time, a public leaderboard which contains the public score of every submission made up to each point in time. Because these public scores are calculated using only part of the test dataset (e.g., 30 percent in the Heritage Health Prize competition), the final standings may differ from the ones displayed in the public leaderboard. Although the correlation between public and private scores is very high in our sample (the coefficient of correlation is 0.99), the rankings in the public leaderboard and the private leaderboard may diverge. Hence, the public leaderboard provides informative yet noisy signals on the performance of all players throughout the contest. To illustrate this noise, consider the winner of each of the 55 competitions that we analyze—i.e., the owner of the submission with the highest private score (see Table A.2 in the Online Appendix). In 27 out of 55 competitions (49 percent), the winner of the contest was ranked number one in the final public leaderboard, and in 42 out of 55 competitions (76 percent) the winner was within the top three of the final public leaderboard. That is, players face uncertainty about their true standing in the competition.
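To fix ideas, the following minimal sketch (our own; simulated data, an illustrative accuracy metric, and a 30 percent public share as in the Heritage Health Prize) mimics how a single submission receives both scores:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated test set: true labels and one submission's predictions (80 percent accurate).
n_test = 10_000
truth = rng.integers(0, 2, size=n_test)
prediction = np.where(rng.random(n_test) < 0.8, truth, 1 - truth)

# The host secretly partitions the test set; players never observe this mask.
is_public = rng.random(n_test) < 0.30  # e.g., 30 percent public, as in the Heritage Health Prize

accuracy = prediction == truth
public_score = accuracy[is_public].mean()    # posted on the leaderboard in real time
private_score = accuracy[~is_public].mean()  # revealed only when the contest ends

print(f"public: {public_score:.4f}  private: {private_score:.4f}")
```

Because the two scores are computed from random halves of the same test set, they are highly correlated but not identical—exactly the kind of noisy signal the paper documents.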

2.2 Motivating Facts

We present a series of empirical facts that guide our modeling choices. For each contest, we observe information on all submissions, including the time when they were made (time of submission), who made them (the identity of the team), and their score (both public and private scores). Using this information, we reconstruct both the public and private leaderboards at every instant of time.

Panel A: Overall summary statistics
                          N        Mean   St. Deviation   Min    Max
Public score              834,301  0.88   0.20            0.00   1.00
Private score             834,301  0.88   0.20            0.00   1.00
Time of submission        834,301  0.60   0.29            0.00   1.00
Time between submissions  783,362  0.02   0.05            0.00   1.00

Panel B: Team-level statistics
                          N        Mean   St. Deviation   Min    Max
Number of submissions     50,937   16.38  29.13           1      671
Number of members         50,937   1.13   0.61            1      40

Table 2: Summary Statistics
Note: An observation in Panel A is a submission; an observation in Panel B is a team–competition combination. Scores and time are rescaled to be contained in the unit interval. Time between submissions is the time between two consecutive submissions by the same team.

To make meaningful comparisons across contests, we henceforth normalize the contest length and the total prize to one, as well as the public and private scores.12 We start by examining some summary statistics. Table 2 (Panel A) shows that the (transformed) public and private scores take an average value of 0.88, with a standard deviation of 0.2. The average time of submission is when 60 percent of the contest time has elapsed, and two consecutive submissions by the same team are spaced in time by an average of 2 percent of the contest duration. Panel B shows that teams on average send 16.38 submissions per contest, with some teams sending as many as several hundred. Lastly, 93 percent of the teams are composed of a single member, leading to an average team size of 1.13 members.13

Observation 1. Most teams are composed of a single member.

12 A vector of scores x is normalized to x̂, where x̂_i = (x_i − min_j x_j)/(max_j x_j − min_j x_j).
13 Table A.3 in the Online Appendix shows that 72 percent of users participate in a single contest, suggesting that most players are one-off participants.


[Figure 1: Submissions and Entry of Teams Over Time Across all Competitions. Panel (a): histogram of submissions by fraction of contest time completed. Panel (b): local polynomial regression (Epanechnikov kernel, bandwidth 0.12) of the share of teams with at least one submission on the fraction of contest time completed. Note: An observation is a submission.]

Figure 1 shows the evolution of the number of submissions and teams over time. Panel A partitions all the submissions into time intervals based on their submission time. The figure shows that the number of submissions increases over time, with roughly 20 percent of them being submitted when 10 percent of the contest time remains, and only 6 percent of submissions occurring when 10 percent of the contest time has elapsed. Panel B shows the timing of entry of new teams into the competition. The figure shows that the rate of entry is roughly constant over time, with about 20 percent of teams making their first submission when 20 percent of the contest time remains.

Observation 2. New teams enter at a constant rate throughout the contest.

To understand whether teams become more or less productive as time elapses, we examine the time between submissions at the team level. Figure 2 (Panel A) illustrates the time between two consecutive submissions by the same team. On average, teams take 2 percent of the contest time to send two consecutive submissions. Panel B shows a local polynomial regression of the average time between submissions as a function of time. The figure shows that the average time between submissions increases over time, suggesting either that teams experiment when they enter the contest or that finding new ideas becomes increasingly difficult over time.

[Figure 2: Time Between Submissions. Panel (a): distribution of the time between two consecutive submissions by the same team. Panel (b): local polynomial regression (Epanechnikov kernel, bandwidth 0.08) of the time between submissions on the fraction of contest time completed. Note: An observation is a submission.]

Combined, Figures 1 and 2 suggest that the increase in submissions at the end of contests is not driven by teams making submissions at a faster pace, but simply by there being more active teams at the end of the contest and potentially more incentives to play.

Observation 3. The rate of arrival of submissions increases with time.

Figure 3 shows the joint distribution of public and private scores for all submissions. The coefficient of correlation between the two scores is 0.99.14 Table 3 decomposes the variance of public scores. In column 1, we find that 70 percent of the variation in public scores is between-team variation, suggesting that teams differ systematically in the scores that they achieve. In column 2, we allow for dummies that identify each team's submissions as early or late (with respect to each team's set of submissions). This distinction allows us to measure whether relatively late submissions achieve systematically greater scores than early ones. The table shows that there are within-team improvements over the course of the contest, although those improvements only explain an additional 1.9 percent of the overall public score variance. In the model, we will capture these cross-

14 Notice the cluster of points around (0.3, 0.9). These submissions have a low private score (around 0.3) but a high public score. This is an example of overfitting: submissions that deliver a large public score but are poor out-of-sample predictors (i.e., not robust submissions).


Dependent variable: Public Score
                                   (1)        (2)
Second 25 percent of submissions              0.0445***
                                              (0.0004)
Third 25 percent of submissions               0.0624***
                                              (0.0004)
Last 25 percent of submissions                0.0744***
                                              (0.0004)
Competition × Team FE              Yes        Yes
Observations                       826,310    826,310
R²                                 0.696      0.715

Table 3: Decomposing the Public Score Variance
Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. An observation is a submission. Second 25 percent of submissions is an indicator variable for whether a submission is within the second 25 percent of a team's submissions, where submissions are sorted by submission time. The other indicators are defined analogously.
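The between-team share in column (1) can be approximated with a simple decomposition of the public score around competition-team means; a minimal sketch, with the same placeholder names as above:

```python
import pandas as pd

subs = pd.read_csv("submissions.csv")  # hypothetical extract; one row per scored submission

# Demean public scores within each competition-team cell.
team_mean = subs.groupby(["competition_id", "team_id"])["public_score"].transform("mean")
within_var = (subs["public_score"] - team_mean).var()
total_var = subs["public_score"].var()

# R^2 of competition-by-team fixed effects = share of variance that is between teams.
print(1 - within_var / total_var)  # about 0.70 in the paper's sample
```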

team differences by allowing teams to systematically differ in their ability to produce high scores. We leave within-team dynamics and learning for future research.

Observation 4. Teams systematically differ in their ability to produce high scores.

With respect to how the public leaderboard shapes behavior, Table 4 suggests that teams drop out of the competition when they start falling behind in the public score leaderboard. In the table, we compare how the timing of a team's last submission varies with the score gap between the maximum public score and the team's best public score up to that moment. A one-standard-deviation increase in a team's deviation from the maximum public score is associated with the team making its final submission between 0.03 and 0.08 units of total contest time sooner. That is, teams that are lagging behind seem to suffer a discouragement effect and quit the competition. This exercise sheds light on how information disclosure may affect participation incentives throughout the competition.

[Figure 3: Correlation Between Public and Private Scores. Note: An observation is a submission. The private and public scores of each submission are normalized to range between 0 and 1.]

Dependent variable: Timing of last submission
                                                  (1)          (2)
Deviation from max public score (standardized)    -0.0327***   -0.0782***
                                                  (0.0012)     (0.0018)
Competition FE                                    Yes          Yes
Weights                                           No           Yes
Observations                                      50,937       50,937
R²                                                0.050        0.065

Table 4: Timing of Last Submission as a Function of a Team's Deviation from the Maximum Public Score
Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. Timing of last submission is measured relative to the total contest time (i.e., it ranges between 0 and 1). Deviation from max public score is defined as the competition-wide maximum public score at the time of the submission minus the submitting team's maximum public score at the time of the submission. We then standardize this variable using its competition-level standard deviation. Column 2 weighs observations by the total number of submissions made by each team.
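A hedged sketch of the two columns of Table 4 using statsmodels (placeholder file and variable names; we are not claiming this is the authors' exact code):

```python
import pandas as pd
import statsmodels.formula.api as smf

teams = pd.read_csv("teams.csv")  # hypothetical: one row per team-competition pair

# Standardize the score gap by its competition-level standard deviation.
teams["gap_std"] = teams.groupby("competition_id")["gap"].transform(lambda x: x / x.std())

# Column (1): competition fixed effects, robust standard errors.
ols = smf.ols("last_time ~ gap_std + C(competition_id)", data=teams).fit(cov_type="HC1")

# Column (2): weight each team by its total number of submissions.
wls = smf.wls("last_time ~ gap_std + C(competition_id)", data=teams,
              weights=teams["n_submissions"]).fit(cov_type="HC1")

print(ols.params["gap_std"], wls.params["gap_std"])  # about -0.03 and -0.08 in the paper
```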


                             (1)                     (2)
                             Number of submissions   log(Number of submissions)
After disruptive submission  -0.6070**               -0.0748***
                             (0.2741)                (0.0247)
Competition FE               Yes                     Yes
Observations                 2,531                   2,531
R²                           0.755                   0.764

Table 5: The Impact of Disruptive Submissions on Participation
Note: Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. Disruptive submissions are defined as submissions that increase the maximum public score by at least 1 percent. Number of submissions is the number of submissions in time intervals of length 0.001. The regressions restrict the sample to periods that are within 0.05 time units of the disruptive submission. Both specifications control for time and time squared.

In Table 5, we also analyze how the public leaderboard shapes incentives to participate, i.e., how the rate of arrival of submissions changes when the maximum public score jumps by a significant margin. Whenever a submission increases the maximum public score by a sufficient amount (e.g., 1 percent for our analysis in Table 5), we call the submission disruptive (see Figure A.1 in the Online Appendix for an example). Only 0.05 and 0.04 percent of submissions increased the maximum public score by 0.5 and 1 percent, respectively. To measure how the rate of arrival of submissions changes with a disruptive submission, we first partition time into intervals of length 0.001 and compute the number of submissions in each of these intervals. We then compare the number of submissions before and after the arrival of the disruptive submission, restricting attention to periods that are within 0.05 time units of the disruptive submission. Table 5 shows that the number of submissions decreases immediately after the disruptive submission by an average of 7.5 percent. We take this as further evidence of both the discouragement effect and the behavioral effect of the public leaderboard.

Observation 5. The public leaderboard shapes participation incentives.
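A sketch of the counting exercise behind Table 5, under the thresholds stated in the text (placeholder names again; the final regression step with time controls is omitted):

```python
import pandas as pd

subs = pd.read_csv("submissions.csv").sort_values(["competition_id", "time"])

# Flag submissions that raise the running maximum public score by at least 1 percent.
run_max = subs.groupby("competition_id")["public_score"].cummax()
subs["disruptive"] = (run_max - run_max.groupby(subs["competition_id"]).shift()) >= 0.01

# Count submissions in time bins of width 0.001.
subs["bin"] = (subs["time"] / 0.001).astype(int)
counts = subs.groupby(["competition_id", "bin"]).size().rename("n").reset_index()
counts["t"] = counts["bin"] * 0.001

# Before/after comparison within 0.05 time units of the first disruptive event.
event = subs.loc[subs["disruptive"]].iloc[0]
window = counts[(counts["competition_id"] == event["competition_id"])
                & ((counts["t"] - event["time"]).abs() <= 0.05)]
print(window.loc[window["t"] < event["time"], "n"].mean(),   # arrival rate before
      window.loc[window["t"] >= event["time"], "n"].mean())  # arrival rate after
```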

With respect to the timing of those submissions that disrupt the leaderboard, Figure 4 plots the timing of submissions that increased the maximum public score by at least 0.5 percent (Panel A) and 1 percent (Panel B). In the figure we restrict attention to submissions that were made when at least 25 percent of the contest time had elapsed, because score processes are noisier earlier in contests. The figure suggests that disruptive submissions arrive uniformly over time, a pattern indicating that teams are not strategic about the timing of submissions they believe will drastically change the public leaderboard. This may be driven by the fact that teams only learn about the out-of-sample performance of a submission after Kaggle has evaluated it. That is, before making the submission, teams can only evaluate a solution using the training data, which may not be informative about its out-of-sample performance.

Observation 6. Submissions that disrupt the public leaderboard are submitted uniformly over time.


[Figure 4: Timing of Drastic Changes in the Public Leaderboard's Maximum Score (i.e., Disruptive Submissions): Cumulative Probability Functions. Panel (a): increase greater than 0.5 percent. Panel (b): increase greater than 1 percent. Note: An observation is a submission that increases the maximum public score by at least x percent. The figure plots submissions that were made when at least 25 percent of the contest time had elapsed.]

Our empirical model attempts to capture most of these six observations. However, three interesting features go beyond the scope of this paper and are left for future research. First, it is plausible that teams experiment (Figure 2) and get a better understanding of the problem over time, so that they are able to improve their performance over time. Clark and Nilssen (2013), for example, present a theory of learning by doing in contests.

Although interesting, we do not incorporate learning by doing because Table 3 shows that between-team differences are more noteworthy than within-team improvements. Second, we study each contest in isolation. In reality, players have a choice of which contests to participate in. Azmat and Möller (2009) show that when players are choosing among multiple contests, the contest design (in particular, the allocation of prizes) interacts with this choice. Given that in our data most players participate in a single contest, we do not model the players' selection of which contest to participate in. Although we assume exogenous entry because of data limitations, we acknowledge that endogenous entry could affect equilibrium outcomes and the optimal contest design, e.g., Levin and Smith (1994), Bajari and Hortacsu (2003), and Krasnokutskaya and Seim (2011). Third, we assume that players do not discriminate among their submissions and automatically submit their solutions once they are ready. Ding and Wolfstetter (2011) show that players could withhold their best solutions and negotiate with the sponsor of the contest after the contest has ended. This selection would introduce a bias in the quality of submitted solutions. In our setting, players benefit from sending a submission for two reasons. On the one hand, they receive a noisy signal about the performance of the submission. On the other hand, Table 5 shows that disruptive submissions discourage participation, so if players could choose when to send them, they would send them as soon as possible. Although we cannot rule out strategic timing of submissions, the fact that the timing of disruptive submissions is roughly uniformly distributed over time (as shown in Figure 4), along with the fact that players benefit from sending submissions early, indicates that players do not save their best submissions to be disclosed strategically towards the end of the contest.

3 Empirical Model

We consider a contest of length T = 1. At time t = 0, there is a fixed supply of N players of heterogeneous ability (Observation 4). Player heterogeneity is captured by the set of types Θ = {θ_1, ..., θ_p}.15 The distribution of types, κ(θ_k) = Pr(θ = θ_k), is

15 We disregard team behavior and treat each participant as a single player (Observation 1).


known by all players. The random time of entry for each player, τ_entry, is drawn from an exponential distribution with parameter µ > 0 (Observation 3).16 The empirical evidence does not strongly suggest that players strategically choose their time of entry, but rather that they enter at a random time, possibly related to idiosyncratic shocks such as when they find out about the contest.17 In our model, although players can send multiple submissions throughout the contest, they can work on at most one submission at a time. Working on a submission takes a random time τ distributed according to an exponential distribution with constant parameter λ.18 The cost of building a new submission, c, is an independent draw from the distribution K(σ).19

The evaluation of a submission is based on the solution sent by a player and a test dataset d. Each pair (solution, d) maps uniquely into a score through a well-defined formula. Motivated by the evaluation system used in practice, we consider two test datasets, d_1 and d_2, which define two scores: the public score, computed using the solution submitted by the player and test dataset d_1; and the private score, computed using the solution submitted by the player and test dataset d_2. We model the score of a submission as a random variable. A player of type θ draws a public–private score pair (p_public,θ, p_private,θ) from a joint distribution H_θ over [0, 1]^2, as in Figure 3. Players know the joint distribution H_θ, but they do not observe the realization (p_public,θ, p_private,θ). This pair of scores is private information of the contest designer. In the baseline case, the contest designer discloses, in real time, only the public score p_public,θ but not the private score p_private,θ. The final ranking, however, is constructed with the private scores.20 At the end of the contest, players are ranked by their private scores, and the first j players in the ranking receive prizes of value V_P1 ≥ ... ≥ V_Pj, with Σ_{i=1}^{j} V_Pi = 1.

16 When players enter the competition they get a free submission (Diamond, 1971).
17 We assume exogenous entry because of data limitations. Endogenous entry could affect equilibrium outcomes and the optimal design, e.g., Levin and Smith (1994), Bajari and Hortacsu (2003), and Krasnokutskaya and Seim (2011). We leave this extension for future research.
18 Observation 3, Figure 2, and Table 3 show some evidence of learning and experimentation over time. We leave these elements out of the current model for tractability.
19 With type-dependent cost distributions we encountered convergence issues due to identification.
20 Players are allowed to send multiple submissions—each player sends about 20 submissions on average. However, the final ranking is computed with at most two submissions selected by each player. About 50 percent of the players do not make a choice, in which case Kaggle picks the two submissions with the largest public scores. Of the remaining 50 percent that do choose, 70 percent choose the two submissions with the highest public scores.


The contest designer releases, in real time, the public scores and the identities of the players that obtained those scores. The collection of pairs (identity, score) from the beginning of the contest until instant t constitutes the public leaderboard, denoted by L_t = {(identity, score)_j}_{j=1}^{J_t}, where J_t is the total number of submissions up to time t. Conditional on the terminal public history L_T, player i is able to compute p^final_{ℓ,i} = Pr(i's private ranking is ℓ | L_T), which is the probability of player i ranking in position ℓ in the private leaderboard at the end of the contest, conditional on the final public leaderboard L_T.

A model with fully-rational players is challenging for several reasons. First, it is possible that p^final_{1,i} > 0 even if player i is ranked last in the public leaderboard. That is, every player that has participated in the contest has a positive chance of winning, regardless of their position in the public leaderboard. Hence, players must use all of the available information in the public leaderboard every time they decide whether to play or not. Keeping track of the complete history of submissions, with over 15,000 submissions in each competition, is computationally intractable.21 In contrast to a dynamic environment in which players perfectly observe their relative position, the public leaderboard is just a noisy signal of the actual position of the players in the contest. Without noise, i.e., in a contest where the j players with the highest public score at the terminal history receive a prize, players would only need to keep track of the current highest j public scores to make their investment decision, which leads to a low-dimensional state space. In our setting, however, the state space is large because the relevant public history is not summarized by a single number. To overcome this computational difficulty, we assume that p^final_{ℓ,i} > 0 for ℓ = 1, 2, 3 if and only if player i is among the three highest scores in the final public leaderboard. In other words, we assume the final three highest private scores are a permutation of the final three highest public scores. Table A.2 in the Online Appendix shows that in 76 percent of the contests that we study the winner is among the three highest public scores,22 suggesting that this assumption is not too restrictive.

Small and Myopic Players

21 For example, if we partition the set of public scores into 100 values, with 15,000 submissions the number of possible terminal histories is of the order of 2^300.
22 This could be relaxed with more computational power.


There are at least 15,000 submissions and thousands of players on average in each contest. Fully-rational players would take into account the effect of their submissions on the strategies of their rivals. However, solving a dynamic model with fully-rational and heterogeneous players, analytically or computationally, turns out to be infeasible. As a simplification, we assume that players are small, i.e., they do not consider how their actions affect the incentives of other players. This price-taking-like assumption is not unreasonable for our application. It is not in contradiction with Observations 5 and 6, because the expected number of future submissions is derived as an equilibrium object. Hence, a player has correct beliefs in equilibrium about how many additional rival submissions will arrive.23 Thus, players in fact anticipate that a disruptive submission will reduce future participation.

In addition to assuming that players are small, we make another simplification for computational tractability. We assume that when players decide to play or to quit, they expect more submissions in the future by rival players but not by themselves. In other words, myopic players think the current opportunity to play is their last one. It is worth noting that under this assumption players may play multiple times; however, at each decision they think that they will never have a future opportunity to play (or that, if they do, they will choose not to play). A similar assumption is made in Gross (2017). This means that myopic players are not sequentially rational. This assumption can be completely relaxed with more computational power. In fact, a dynamic model with sequentially rational players is presented as an extension in Section 3.1.2. Estimating this version of the model is computationally demanding, and we estimated it only for a handful of contests to check robustness.

State Space and Incentives to Play

The relevant state space is defined by three sets. First, we define the set of (sorted) vectors of the three largest public scores, Y = {y = (y_1, y_2, y_3) ∈ [0, 1]^3 : y_1 ≥ y_2 ≥ y_3}. Second, we define RS = {∅, 1, 2, 3, (1, 2), (1, 3), (2, 3)} to be the set of score ownership. The final set is T = [0, 1], which represents the contest's time. Notice that y ∈ Y and t ∈ T are public information common to all players. Under the small-player assumption, the relevant state for each player is characterized by s = (t, r_i, y) ∈ S ≡ T × RS × Y.

23 Similar assumptions are made in Bhattacharya (2016).


To be precise, s = (t, r_i, y) ∈ S means that at time t player i owns the components of vector y indicated by r_i. For example, (t, (1, 3), (0.6, 0.25, 0.1)) means that at time t the player owns components one and three of vector y, i.e., two of the three highest public scores: 0.6 and 0.1. The small-player assumption reduces the dimensionality of the state space, because players care only about the three highest public scores and which of them they own. Also, although they do not observe the private scores, they are able to compute the conditional distribution of private scores given the set of public scores. Because prizes are allocated at the end of the contest, the payoff-relevant states are the final states s ∈ {T} × RS × Y. We denote by π(s) the payoff of a player at state s. In vector notation, we denote the vector of terminal payoffs by π.

We consider a finite grid of m values for the public scores, Y = {y^1, ..., y^m}. If a player of type θ decides to play and send a new submission, the public score of that submission is distributed according to q_θ(k) = Pr(y = y^k | θ), k = 1, ..., m. Although players are small, they have beliefs over the number of future submissions sent by their rivals. At time t, a player believes that, with probability p_t(n), n rival submissions will arrive before the end of the competition. Also, the scores of those submissions will be independently drawn from the mixture distribution G, where Pr_G(y = y^k) = Σ_{θ∈Θ} κ(θ) q_θ(k). Furthermore, similar to Bajari and Hortacsu (2003), we assume that the belief about the number of rival submissions that will arrive in the future follows a Poisson distribution with parameter γ(T − t):

    p_t(n) = [γ(T − t)]^n e^{−γ(T − t)} / n!.   (1)

Notice that under this functional form, players believe that the expected number of remaining rival submissions, γ(T − t), is proportional to the remaining time of the contest. The parameter γ is an equilibrium object and will be determined as a fixed point in the estimation.

To derive the expected payoff of sending an additional submission we proceed in two steps. First, we solve for the case in which a player thinks she is the last one to play, i.e., p_t(0) = 1, and then we solve for the belief p_t(n) given in Equation 1. Denote by B^θ_t(s) the expected benefit of building a new submission for a player of type θ at state s, when she thinks she is the last player sending a submission before the end of the contest. For clarification, consider the following example. A player of type θ is
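Evaluating the belief in Equation 1 is immediate; a minimal sketch with a placeholder value of γ:

```python
from scipy.stats import poisson

gamma, T, t = 15_000.0, 1.0, 0.6   # gamma: placeholder for the contest's equilibrium value
mean_remaining = gamma * (T - t)   # expected number of rival submissions still to come

p_t = lambda n: poisson.pmf(n, mean_remaining)
print(mean_remaining, p_t(6_000))
```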

currently at a state s = (t, r = (1, 2), y = (y1 , y2 , y3 )) and has an opportunity to play. If she plays and the new submission arrives before T (which happens with probability 1 − e−(T −t)λθ ), the transition of the state depends on the score of the new submission y˜. The state (r, y) can transition to (r0 , y 0 ) where: r0 = (1, 2) and y 0 = (y1 , y2 , y3 ) when y˜ < y2 ;24 or r0 = (1, 2) and y 0 = (y1 , y˜, y3 ) when y2 ≤ y˜ < y1 ; or r0 = (1, 2) and y 0 = (˜ y , y2 , y3 ) when y1 ≤ y˜. More generally, we can repeat this exercise for all states s ∈ S and put all these transition probabilities in a |RS × Y| × |RS × Y| matrix denoted by Ωθ . Each row of this matrix corresponds to the probability distribution over states (r0 , y 0 ) starting from state (r, y), conditional on the arrival of a new submission. If the new submission does not arrive, then there is no transition and the state remains (r, y). In matrix notation, where each row is a different state, the expected benefit of sending one extra submission is given by B θt = (1 − e−(T −t)λθ )Ωθ π + e−(T −t)λθ π. Consider a given state s. With probability (1 − e−(T −t)λθ ) the new submission is built before the end of the contest. The score of that submission (drawn from qθ ) determines the probability distribution over final payoffs. This is given by the s-row of the matrix Ωθ . The expected payoff is computed as (Ωθ )s• ·π which corresponds to the dot-product between the probability distribution over final states starting from state s and the payoff of each terminal state. With probability e−(T −t)λθ the new submission is not finished in time and therefore the final payoff for the player is given by πs (the transition matrix is the identity matrix). A player chooses to plays if and only if the expected benefit of playing net of the cost of building a submission is larger than the expected payoff of not playing, i.e., B θt − c ≥ π

⇐⇒

(1 − e−(T −t)λθ )[Ωθ − I]π ≥ c.

(2)

We can now easily incorporate into Equation 2 the belief pt (n) over the number of rival submissions made after t. The final state does not depend on the order of submissions, because payoffs are realized at the end of the competition,25 so each player cares only about their ownership at the final state. Because players myopically think that they will not make another submission after the current one, we can replace the final payoff 24 25

24 See footnote 20.
25 Except for ties, but we deal with this issue in the numerical implementation.
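To make the state transition concrete, the sketch below computes one row of a simplified Ω_θ—the distribution of the updated top-three vector after one new draw—abstracting from the ownership index r; the grid and q_θ are toy stand-ins:

```python
import numpy as np

grid = np.round(np.linspace(0.05, 1.0, 20), 2)   # finite grid of public scores y^1, ..., y^m
q_theta = np.full(len(grid), 1.0 / len(grid))    # type-theta score distribution (toy: uniform)

def transition_row(y):
    """Distribution over updated top-3 score vectors after one new draw from q_theta."""
    row = {}
    for score, prob in zip(grid, q_theta):
        y_new = tuple(sorted((*y, score), reverse=True)[:3])  # insert the draw, keep the 3 best
        row[y_new] = row.get(y_new, 0.0) + prob
    return row

for y_new, prob in sorted(transition_row((0.6, 0.25, 0.1)).items(), reverse=True):
    print(y_new, round(prob, 3))
```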

by the expected payoff after n rival submissions and then let the agent decide whether to make her last submission considering this new expected payoff. That is, from state s, there is a probability distribution over S after n rival submissions (of scores drawn ˆ n , where Ω ˆ is constructed from the distribution G) given by the s-th row of matrix Ω similarly to Ω but replacing qθ (·) by the mixture probability g(·). Instead of considering ˆ n π with the payoff π before the last play, the player considers the expected payoff Ω probability pt (n). Hence, the player plays if and only if: ∞ X

ˆ n πpt (n) ≥ c. (1 − e−(T −t)λθ )[Ωθ − I]Ω

(3)

n=0

Equation 3 is similar to Equation 2, except that now the final payoff depends on the player’s belief about the number of submissions made by rival players in the future. Using the definition of pt (n), the definition of the exponential of a matrix,26 and some manipulations, we obtain: ˆ

Γθ,t ≡ (1 − e−(T −t)λθ )[Ωθ − I]eγ(T −t)[Ω−I] π ≥ c.

(4)

Equation 4 provides a tractable equation that can be used for estimation, by making use of efficient algorithms to compute the exponential of a matrix.27 This equation reflects the effect of the beliefs over future rival submissions on the decision of a player to build an extra submission. Conditional on a state s = (t, r, y) there are two effects driving the comparative statics on t: As the competition approaches its end, on the one hand a player has less incentives to make an extra submission because she is less likely to finish building it before the end of the competition. On the other hand, it faces fewer rival submissions, which gives her more incentives to send an extra submission later on in the contest. The comparative statics on γ is intuitively clear: the larger γ, the less incentives to play. This is because the number of rival submissions that will arrive in the future are, on average, γ(T − t). Therefore, the larger γ, the larger the number of rival submissions, so the state becomes less favorable for the player. Finally, notice that for θ0 > θ we have that Ωθ0 F OSD Ωθ —i.e., drawing better scores are more likely for higher types—so high-type players have larger incentives to play conditional on a given state. P∞ n The exponential of a matrix A is defined by eA ≡ n=0 An! 27 The exponential of a matrix is a well-studied object given its applications to, for example, solving

26

a linear system of differential equations.

24

Equation 4 is the building block in our empirical estimation. Before going into the details of the estimation, we present two extensions of the baseline model.

3.1 3.1.1

Extensions Flow Cost instead of Fixed Cost

The type of cost—fixed or flow—that a player must pay to build a new submission has consequences on incentives.

28

In fact, conditional on a state, when players pay

a fixed cost per submission then the return is larger when a submission is sent at the beginning of the contest, because the submission is more likely to arrive before the end of the contest. Therefore, with a fixed cost per submission, submissions sent at the beginning of the contest are relatively cheaper than those sent towards the end of the contest. In contrast, by paying a flow cost, this effect is mitigated because players only pay a cost proportional to the time they spend building a submission. With a flow cost, Equation 4 changes to ˆ

(1 − e−(T −t)λ )[Ωθ − I]eγ(T −t)[Ω−I] π ≥ E[c], where Et [c] =

3.1.2

R T −t 0

cτ λe−λτ dτ =

c λ

h

(5)

i

1 − e−λ(T −t) (λ(T − t) + 1) is the expected cost.

Forward Looking Small Players

Consider small but sequentially rational players. Each player action is either to continue or quit participating in the contest. That is, players do not have the possibility of waiting and then making submissions. They are either developing a submission or not participating in the contest at all.29 Given that the length of the contest is finite, the game can be solved by backward induction. At time T , no player has enough time to build a new submission. So the value of reaching state s = (T, r, y) is simply V (s) = π(s). Let Vt be a S × 1 vector 28 29

This is discussed in Loury (1979) and Lee and Wilde (1980). This strong assumption is required for identification reasons, because we cannot distinguish

whether a player is working on a submission or waiting without working.

25

indicating the value at each state s = (t, r, y). If the optimal decision at time t is to quit participating in the contest, then the payoff is given by ˆ

VtQuit = eγ(T −t)[Ω−I] π. If the optimal decision is to continue playing, the expected payoff is either Z T −t

VtPlay = VtPlay

0

Z T −t

=

0

ˆ

h

i

(6)

λe−λτ Ωθ eγ(τ −t)[Ω−I] Vt+τ dτ − c.

(7)

λe−λτ Ωθ eγ(τ −t)[Ω−I] Vt+τ − cτ dτ , or ˆ

Equation 6 corresponds to the value of playing when the player pays a flow cost c, whereas Equation 7 is the value of playing when the player pays a lump-sum cost c. In ein h

i

h

io

ther case, we can solve by backward induction to obtain Vt = max E VtQuit , E VtPlay .
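A minimal backward-induction sketch for the lump-sum-cost case (Equation 7) in a toy two-state example follows. All parameters are hypothetical; the integral over the arrival time τ is approximated by a Riemann sum on the grid, and the rival-arrival factor over a wait of length τ is written as e^{γτ[Ω̂−I]}.

```python
import numpy as np
from scipy.linalg import expm

# Toy problem (hypothetical parameters): 2 states, lump-sum cost c per submission
Omega_theta = np.array([[0.6, 0.4], [0.0, 1.0]])  # own-submission transition
Omega_hat   = np.array([[0.8, 0.2], [0.0, 1.0]])  # rival-submission transition
pi = np.array([0.1, 1.0])                         # terminal payoffs pi(s)
lam, gamma, c, T, n = 80.0, 1000.0, 0.003, 1.0, 100
I, dt = np.eye(2), T / n

V = np.zeros((n + 1, 2))
V[n] = pi                                         # terminal condition: V(T, s) = pi(s)
for k in range(n - 1, -1, -1):
    t = k * dt
    V_quit = expm(gamma * (T - t) * (Omega_hat - I)) @ pi
    V_play = -c * np.ones(2)                      # Equation 7: pay c up front
    for m in range(1, n - k + 1):                 # Riemann sum over the arrival time tau
        tau = m * dt
        V_play += lam * np.exp(-lam * tau) * (
            Omega_theta @ expm(gamma * tau * (Omega_hat - I)) @ V[k + m]) * dt
    V[k] = np.maximum(V_quit, V_play)
print(V[0])                                       # value of each state at t = 0
```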

4 Estimation

We estimate the parameters of the model in two steps. First, we estimate a number of primitives directly from the data. Second, using the estimates of the first step, we estimate the remaining parameters using a likelihood function constructed based on the model. We repeat this procedure for each contest.

The full set of parameters for a given contest includes: i) the distribution of new-player arrival times, which we assume follows an exponential distribution with parameter µ, exp(µ); ii) the distribution of submission arrival times, which we assume follows an exponential distribution with parameter λ, exp(λ); iii) the distribution of the private score conditional on the public score, {H(·|p_public)}_{p_public ∈ [0,1]}, which we assume is given by p_private = α + β p_public + ε, with ε distributed according to a double exponential distribution; iv) the type-specific cumulative distribution of public scores, Qθ : (0, 1) → [0, 1], which we assume is given by the standard normal distribution, Qθ(x) = Φ((log(x/(1−x)) − µθ)/σθ); v) the distribution of types, κ, which we assume is a discrete distribution over the set of player types, Θ; vi) the time-specific distribution of the number of submissions that will be made in the remainder of the contest, pt(n), which we assume follows a Poisson distribution with parameter γ(T − t), pt(n) = [γ(T−t)]^n e^{−γ(T−t)}/n!; and, lastly, vii) the distribution of submission costs, which we assume has a support that is bounded above by 1 (i.e., the normalized value of the total prize money) and a cumulative distribution function given by K(c; σ) = c^σ (with σ > 0).

We estimate primitives i) through vi) in the first step, and vii) using the likelihood function implied by the model. i), ii), and iii) are estimated using the maximum likelihood estimators for µ, λ, and (α, β), respectively. We estimate iv) and v) using a Gaussian mixture model, fit with the EM algorithm. The EM algorithm estimates the k Gaussian distributions (and their weights, κ(θk)) that best predict the observed distribution of public scores. Throughout our empirical analysis we assume that there are k = 3 player types.30 Lastly, for vi), we impose that γ must equal the observed number of submissions in each contest (see Table 2), as a way of capturing γ as an equilibrium object. In each of the counterfactuals, we recompute γ as an equilibrium object.

The likelihood function implied by the model is based on the decision of a player to make a new submission. Recall that a player chooses whether to make a new submission immediately after the arrival of each of her submissions. A player facing state variables s chooses to make a new submission at time t if and only if

    Γθ,t(s) ≥ c,        (8)

where Γθ,t = (1 − e^{−(T−t)λθ}) [Ωθ − I] e^{γ(T−t)[Ω̂−I]} π is the vector of the net benefits of making a new submission at time t for all possible states s (before deducting the cost of making a submission), and c is the cost of a submission. Γθ,t depends only on primitives estimated in the first step of the estimation, which simplifies the rest of the estimation. When computing Γθ,t, we partition the contest time [0, 1] into 200 time intervals.

Based on Equation 8, a θ-type player facing state variables s plays at time t with probability Pr(play|s, t, θ) = K(Γθ,t(s)). Given that we do not observe the player's type, we take the expectation with respect to θ, which yields

    Pr(play|s, t) = Σ_θ κ(θ) K(Γθ,t(s)),

where κ(θ) is the probability of a player being of type θ.

The likelihood is constructed using tuples {(s_i, t_i, t′_i)}_{i∈N}, where i is a submission, s_i is the vector of state variables at the moment of making the submission, t_i is the submission time, and t′_i is the arrival time of the next submission, which may or may not be observed. If the next submission is observed, then t_i < t′_i ≤ T; if not, t′_i > T. If the new submission arrives at t′_i ≤ T, then the player must have chosen to make a new submission at t_i, and the likelihood of the observation (s_i, t_i, t′_i) is given by

    l(s_i, t_i, t′_i) = Pr(play|s_i, t_i) · λe^{−λ(t′_i − t_i)},

where λe^{−λ(t′_i − t_i)} is the density of the submission arrival time. If we do not observe a new submission after the player's decision at time t_i (i.e., t′_i > T), then the likelihood of (s_i, t_i, t′_i > T) is given by

    l(s_i, t_i, t′_i > T) = Pr(play|s_i, t_i) · e^{−λ(T − t_i)} + 1 − Pr(play|s_i, t_i),

which accounts for both i) the event of the player choosing to make a new submission at t_i and the submission arriving after the end of the contest, and ii) the event of the player choosing not to make a new submission. The log-likelihood function is then given by

    L(δ) = Σ_{i∈N} log l(s_i, t_i, t′_i),

where δ is the vector of structural parameters. We perform inference using the asymptotic distribution of the maximum likelihood estimator.

30. We experimented with different numbers of types; k = 3 is parsimonious and gave us a good fit.
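To fix ideas, here is a minimal end-to-end sketch of the two-step procedure on synthetic data. Everything below is illustrative: the synthetic arrays stand in for one contest, the exact MLE under the double-exponential error in iii) is replaced by a simple least-squares fit, and Gamma stands in for the precomputed net benefits Γθ,t_i(s_i) evaluated at each observed submission.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# --- Synthetic data standing in for one contest (hypothetical) ---
entry_times = rng.exponential(1 / 2.2, 500)                # player entry times
gaps = rng.exponential(1 / 80.0, 5000)                     # times between submissions
public = rng.beta(8, 2, 5000)                              # public scores in (0, 1)
private = 0.0 + 1.0 * public + rng.laplace(0, 0.02, 5000)  # private given public

# --- First step ---
mu_hat, lam_hat = 1 / entry_times.mean(), 1 / gaps.mean()  # exponential-rate MLEs
beta_hat, alpha_hat = np.polyfit(public, private, 1)       # stand-in for the exact MLE
logodds = np.log(public / (1 - public)).reshape(-1, 1)
gm = GaussianMixture(n_components=3, random_state=0).fit(logodds)  # iv)-v): EM, k = 3

# --- Second step: MLE of sigma in K(c) = c^sigma ---
# Gamma[i, q]: precomputed net benefit of submission i if the player is of type q
Gamma = rng.uniform(0.001, 0.05, (1000, 3))
kappa = gm.weights_                                        # type probabilities
t_i = rng.uniform(0, 0.9, 1000)                            # decision times
next_gap = np.where(rng.random(1000) < 0.8,                # t'_i - t_i; inf if unobserved
                    rng.exponential(1 / lam_hat, 1000), np.inf)
T = 1.0

def neg_loglik(sigma):
    p_play = np.clip(Gamma, 1e-12, 1.0) ** sigma @ kappa   # Pr(play | s, t)
    obs = np.isfinite(next_gap)
    ll_obs = np.log(p_play) + np.log(lam_hat) - lam_hat * np.where(obs, next_gap, 0.0)
    ll_cen = np.log(p_play * np.exp(-lam_hat * (T - t_i)) + 1 - p_play)
    return -np.where(obs, ll_obs, ll_cen).sum()

sigma_hat = minimize_scalar(neg_loglik, bounds=(1e-3, 10), method="bounded").x
print(mu_hat, lam_hat, alpha_hat, beta_hat, sigma_hat)
```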

4.1 Model Estimates

Contest | µ | SE | λ | SE | σ | SE | log L(δ̂)/N | N
hhp | 2.54 | 0.0691 | 139.3308 | 0.8757 | 0.003 | 0.0001 | -3.5129 | 25316
allstate-purchase-prediction-challenge | 1.9117 | 0.0483 | 86.6572 | 0.5533 | 0.0035 | 0.0001 | -3.0264 | 24526
higgs-boson | 2.2081 | 0.0523 | 99.8622 | 0.528 | 0.0017 | 0.0001 | -3.2407 | 35772
acquire-valued-shoppers-challenge | 2.0347 | 0.0659 | 123.2493 | 0.7765 | 0.0012 | 0.0001 | -3.5269 | 25195
liberty-mutual-fire-peril | 2.3163 | 0.092 | 84.3712 | 0.6932 | 0.0028 | 0.0001 | -3.145 | 14812
axa-driver-telematics-analysis | 2.0942 | 0.0536 | 98.6269 | 0.5193 | 0.0013 | 0.0001 | -3.2956 | 36065
crowdflower-search-relevance | 2.0708 | 0.0569 | 68.1422 | 0.447 | 0.0022 | 0.0001 | -2.8531 | 23244
caterpillar-tube-pricing | 3.2151 | 0.0884 | 61.5938 | 0.3794 | 0.0025 | 0.0001 | -2.791 | 26360
liberty-mutual-group-property-inspection-prediction | 2.8362 | 0.06 | 63.4536 | 0.2963 | 0.0023 | 0.0001 | -2.827 | 45875
coupon-purchase-prediction | 2.1102 | 0.0643 | 66.0059 | 0.4856 | 0.0027 | 0.0001 | -2.8099 | 18477
springleaf-marketing-response | 2.4308 | 0.0515 | 64.4029 | 0.3243 | 0.0029 | 0.0001 | -2.7964 | 39444
homesite-quote-conversion | 2.2237 | 0.0529 | 81.1871 | 0.4257 | 0.0031 | 0.0001 | -3.0634 | 36368
prudential-life-insurance-assessment | 2.1082 | 0.0412 | 72.0748 | 0.3379 | 0.0017 | 0.0001 | -2.9023 | 45490
santander-customer-satisfaction | 2.2486 | 0.0314 | 68.9879 | 0.2255 | 0.0033 | 0.0001 | -2.8727 | 93559
expedia-hotel-recommendations | 2.2155 | 0.0499 | 40.0034 | 0.2655 | 0.0091 | 0.0003 | -2.2065 | 22709

Table 6: Maximum Likelihood Estimates of the Cost and Arrival Distributions (partial list). Note: The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.' See Table A.4 in the Online Appendix for the full table.

Table 6 presents the maximum likelihood estimates for the submission-cost distribution as well as for the distributions of entry time and submission arrival time. The model was estimated separately for each contest. Column 1 shows the estimates of µ, the players' rate of entry in a given competition. The average entry time (i.e., 1/µ) ranges between 0.31 and 0.52 (where the contest time is normalized to 1). Column 3 presents the estimates for λ, the rate at which submissions are completed. In line with Table 2, the estimates suggest that the average time between submissions ranges between 0.007 and 0.025. Column 5 presents estimates for the coefficient governing the distribution of submission costs, σ. The assumed functional form implies that the expected cost of making a submission is given by σ/(1 + σ). The estimates for σ suggest that the expected submission cost ranges between 0.001 and 0.009, where these submission costs are measured relative to the total monetary rewards of each contest. Table A.5 in the Online Appendix presents the EM algorithm estimates for the type-specific distributions of scores; Table A.6 in the Online Appendix presents estimates for the distribution of private scores conditional on public scores.
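The expected-cost formula follows directly from the assumed CDF: if K(c; σ) = c^σ on [0, 1], then E[c] = ∫_0^1 c · σc^{σ−1} dc = σ/(σ + 1). A two-line simulation check (with an arbitrary σ for illustration):

```python
import numpy as np

sigma = 0.003
# Inverse-CDF sampling: if U ~ Uniform(0, 1), then U^(1/sigma) has CDF K(c) = c^sigma
draws = np.random.default_rng(0).random(1_000_000) ** (1 / sigma)
print(draws.mean(), sigma / (1 + sigma))  # both close to 0.003
```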

With respect to how well the model fits the data, Figure 5 plots the actual versus predicted number of submissions. The predicted number of submissions is computed based on 2,000 simulations for each contest. The simulations make use of the estimates of the model and take the number of teams that participate in each contest as given. The contest participation predicted by the model is the average number of submissions across all of the simulations. The correlation between the actual and the predicted number of submissions is 0.87. The figure shows that the model does not systematically over- or under-predict participation.

[Figure: scatter plot of predicted participation against actual participation, both axes in units of 10^4 submissions.]

Figure 5: Number of Submissions Predicted by the Model Versus Actual Number of Submissions. Note: An observation is a contest. The coefficient of correlation between the actual and predicted number of submissions is 0.87. The predicted number of submissions is based on 2,000 simulations of each contest using our model estimates.
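For intuition, a stylized version of the simulation exercise behind Figure 5 might look as follows. All parameters are hypothetical and the play/quit rule is a deliberately crude stand-in; the actual simulations use the estimated Γθ,t and the full state transitions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, lam, sigma, T, n_players = 2.2, 80.0, 0.003, 1.0, 1500

total_submissions = 0
for _ in range(n_players):
    t = rng.exponential(1 / mu)              # entry time of the player
    while t < T:
        cost = rng.random() ** (1 / sigma)   # cost draw from K(c) = c^sigma
        net_benefit = 0.01 * (1 - t)         # hypothetical stand-in for Gamma
        if net_benefit < cost:
            break                            # quit: another submission is not worth it
        t += rng.exponential(1 / lam)        # time to build the new submission
        if t < T:
            total_submissions += 1
print(total_submissions)                     # compare with the actual count, as in Figure 5
```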

5 Counterfactual Contest Designs

In this section, we evaluate a series of counterfactual contest designs to shed light on how contest design affects participation incentives and contest outcomes. In these counterfactual exercises we evaluate the total number of submissions and upper-tail statistics of the score distribution (e.g., the maximum and the 95th percentile). As we have already mentioned, the contest literature has acknowledged the importance of diversity in providing a rationale to sponsor an open contest (Terwiesch and Xu, 2008). In our dataset we cannot measure diversity directly, so we instead use the total number of submissions as a proxy for diversity. Some sponsors may not care about diversity per se and instead focus on obtaining a high-quality solution. In that case, the relevant measure of performance of a contest design is the upper tail of the score distribution; we use upper-tail statistics to understand how the upper tail of the distribution changes under different contest designs.

5.1 Information Design

We first study the role of information disclosure. As explained previously, the evaluation in the baseline case is based on two datasets. The contest designer chooses the size of these datasets—e.g., 60 percent of the test data to generate public scores and 40 percent to generate private scores—and also which scores to disclose. In the baseline contest design, the contest designer only discloses the public scores, but the final standings are computed using the private scores.

We consider two alternative designs. In the first counterfactual, the sponsor does not display a leaderboard: participants observe their own scores but not the scores of their rivals. In the second counterfactual, we explore the effect of eliminating the noise that generates an imperfect correlation between private and public scores. We simulate a contest where there is no noise between these scores (i.e., the public score equals the private score) and prizes are allocated according to the final standings of the public leaderboard.

No Public Leaderboard. Consider the counterfactual scenario where the contest designer does not disclose the leaderboard but players are privately informed about their public scores. Note that payoffs are realized only at the end of the contest and no information is disclosed other than the players' past submissions. Also, importantly, submission scores are independent conditional on arriving.

Suppose that a player of type θ makes a submission. Then, starting from state s,

the probability distribution over states is given by the s-th row of the matrix Ωθ. If, after a player sends a submission, a rival player makes a submission, the probability distribution over states is given by Ω̂Ωθ. Since submission scores are independent, the distribution is the same regardless of who plays first, because a state is completely defined by a history of submission scores. In other words, the matrices Ωθ and Ω̂ commute.

Consider an information set at time t for a player of type θ, denoted by I_t^θ. Because the leaderboard is not visible, I_t^θ only contains the history of scores of this particular player. In fact, we can restrict attention to the player's three largest scores. Let s = (t, r, y) be the state at time t, constructed using only the scores of the player up to time t (contained in the history I_t^θ) and ignoring the past submissions of the rivals. From the commutativity of the transition matrices (equivalently, the stationarity of the distributions), the decision problem is equivalent to the decision of making a submission at time t = 0, starting from state s, in a contest of length T − t, with beliefs over the total number of rival submissions given by a Poisson distribution with mean γT. That is, we look at the row corresponding to state s in the following vector inequality:

    (1 − e^{−(T−t)λθ}) [Ωθ − I] e^{γT[Ω̂−I]} π ≥ c.        (9)
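Computationally, the only change relative to Equation 4 is the horizon in the rival-arrival factor: γT instead of γ(T − t). Reusing the toy objects from the sketch after Equation 4 (all values hypothetical):

```python
import numpy as np
from scipy.linalg import expm

Omega_theta = np.array([[0.6, 0.4], [0.0, 1.0]])
Omega_hat   = np.array([[0.8, 0.2], [0.0, 1.0]])
pi = np.array([0.1, 1.0])
lam_theta, gamma, T, t = 80.0, 1000.0, 1.0, 0.95
I = np.eye(2)

own = (1 - np.exp(-(T - t) * lam_theta)) * (Omega_theta - I)
Gamma_leaderboard = own @ expm(gamma * (T - t) * (Omega_hat - I)) @ pi  # Equation 4
Gamma_hidden      = own @ expm(gamma * T * (Omega_hat - I)) @ pi        # Equation 9
print(Gamma_leaderboard, Gamma_hidden)
```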

Notice the difference between Equation 9 and Equation 4: incentives to make a submission change without disclosure. When the public leaderboard is observable, there are histories for which a player would choose to play and others for which a player would choose to quit. Players have stronger incentives to play after histories where rivals have realized low scores, relative to histories where rivals have realized very high scores. When the public history is hidden, however, players cannot condition their strategy on the current state. Instead, they choose whether to play or to quit based on their belief about the current history, which is an average across all feasible histories.

Whether there is more or less participation without a public leaderboard depends on the distribution of types. The benefit of playing in a favorable state "cross-subsidizes" the loss of playing in less favorable states. Depending on the strength of this cross-subsidization, which depends on the distribution of types, a player may end up playing more or less compared to the case where the public leaderboard is disclosed.

Figure 6 compares the number of submissions when the public leaderboard is shown

and when it is not. The effects are measured in percentage points, relative to the number of submissions in the baseline contest design (i.e., the contest with a public leaderboard). Given the heterogeneity in the response across contests, we consider a weighted average where the weights correspond to the size of the contest. Specifically, let Δ%_j be the percentage change in the total number of submissions relative to the baseline case and let V_j be the sum of prizes in contest j. Then, the weighted average effect is computed as

    Δ% = Σ_{j=1}^{55} (V_j / Σ_{ℓ=1}^{55} V_ℓ) Δ%_j.
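In code, the prize-weighted average is a one-liner (the arrays below are illustrative placeholders, not the estimated contest-level effects):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.uniform(1_000, 100_000, 55)      # hypothetical total prizes V_j
delta_pct = rng.normal(20, 25, 55)       # hypothetical contest-level effects, in percent
print(np.average(delta_pct, weights=V))  # prize-weighted average effect
```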

Hiding the public leaderboard can decrease the number of submissions by as much as 60 percent or increase it by almost 60 percent. The figure shows that the overall number of submissions would increase by about 23 percent if the contest designer chose to hide the public leaderboard in all contests. Example 1 illustrates that hiding the public leaderboard may increase or decrease participation, and that it may increase overall participation by increasing the number of submissions made by low types while decreasing the number of submissions made by high types.

[Figure: change in participation (percentage points, roughly −40 to 60) for each of the 55 contests and for the Overall weighted average, with the zero line and the overall effect marked.]

Figure 6: Change in the number of submissions when comparing the case without leaderboard versus the case with leaderboard (baseline). Note: An observation is a contest. The brackets indicate a 95 percent confidence interval. The predictions are based on 2,000 simulations of each contest using our model estimates. Contests are labeled according to the contest numbers in Table A.1 in the Online Appendix.

Table 7 summarizes our counterfactual results for weighted-average outcomes: the total number of submissions, the total number of submissions by high types, the 95th and 99th percentiles of the score distribution, and the maximum score. As in Figure 6, the weights correspond to the total reward in a contest.31

31. In the Online Appendix, we redo Table 7 using different weights (number of submissions or uniform weights).

Column 1 shows the average outcome over 2,000 simulations of the baseline case. The rest of the columns show the difference between the outcome variables under the alternative contest designs and the outcome variables under the baseline case. Column 2 shows the case of no leaderboard. As depicted in Figure 6, the average number of submissions increases by about 23 percent. When the leaderboard is hidden, however, the total number of submissions by high types goes down by 16 percent. Because high types send fewer submissions, the upper tail of the score distribution shifts to the left, decreasing the maximum score and the 95th percentile of the score distribution.

Average change relative to baseline:

Outcome | Baseline | No leaderboard | No Noise | 1 Prize | Limited Participation
Submissions | 15214.6471 | 3563.1752 | -469.9248 | -1.5074 | -1320.7285
 | (15201.0352, 15228.2591) | (3547.7692, 3578.5812) | (-481.3420, -458.5075) | (-4.2856, 1.2709) | (-1329.9250, -1311.5320)
High type submissions | 13.4812 | -2.1939 | -0.4175 | 0.0031 | -0.9320
 | (13.4686, 13.4939) | (-2.2068, -2.1809) | (-0.4293, -0.4058) | (-0.0004, 0.0066) | (-0.9415, -0.9225)
Score p95 | 0.9635 | -0.0060 | -0.0001 | -0.0000 | 0.0006
 | (0.9635, 0.9636) | (-0.0060, -0.0060) | (-0.0001, -0.0001) | (-0.0000, 0.0000) | (0.0006, 0.0007)
Score p99 | 0.9808 | -0.0025 | -0.0001 | -0.0000 | 0.0003
 | (0.9808, 0.9808) | (-0.0025, -0.0025) | (-0.0001, -0.0000) | (-0.0000, 0.0000) | (0.0003, 0.0003)
Score max | 0.9944 | -0.0004 | -0.0000 | 0.0000 | -0.0000
 | (0.9943, 0.9944) | (-0.0004, -0.0004) | (-0.0000, -0.0000) | (-0.0000, 0.0000) | (-0.0001, -0.0000)

Table 7: Contest Outcomes Under Counterfactual Contest Designs. Note: Average changes weighted by the size of the prize. The predictions are based on 2,000 simulations of each contest using our model estimates. Submissions is the total number of submissions; high type submissions is the average number of submissions by each high-type contestant; score p95 is the 95th percentile of the score distribution of the contest; score max is the maximum score of the contest. A high type is defined as the player type that maximizes µθ + 3σθ for each contest (i.e., the player better able to achieve extreme scores). Bootstrapped 95 percent confidence intervals in parentheses.

Example 1 provides some intuition for the heterogeneity shown in Figure 6 and Table 7. Intuitively, when the high-type player is ahead, the marginal return of an extra play is smaller than the cost; the low-type player is behind, so her marginal benefit of an extra play is large and playing is worthwhile. However, this is only the case when there is no leaderboard. With a leaderboard, both types of players submit an extra play whenever Nature's draw can be defeated. The example shows that we can obtain more participation overall, but less participation by high types, when the leaderboard is not displayed.

Example 1 (Disclosure of Information and Participation). An agent of type θ currently has a score of aθ. The agent can send a new submission at cost c < 1 to increase her score to aθ + bθ. The agent plays after Nature, who draws a score x ∼ G(·). The largest score wins a prize of 1 with probability α ∈ [0.5, 1].

Leaderboard is Displayed: The agent perfectly observes the realization of x (Nature's draw). If x < aθ, the chances of winning are unaffected by whether the agent plays or not; since playing is costly, the agent does not play. Similarly, if x > aθ + bθ, the agent does not play. When x ∈ Iθ ≡ [aθ, aθ + bθ], the agent plays if and only if α − c ≥ 1 − α ⇔ 2α − 1 ≥ c. Hence, an agent of type θ plays with probability 1(2α − 1 > c)[G(aθ + bθ) − G(aθ)].

Leaderboard is not Displayed: Suppose that x is unobservable. The expected payoff for an agent that does not play is α Pr(x < aθ) + (1 − α) Pr(x > aθ), whereas the expected payoff if the agent plays is α Pr(x < aθ + bθ) + (1 − α) Pr(x > aθ + bθ) − c. Therefore, the best response is to play if and only if (2α − 1)[G(aθ + bθ) − G(aθ)] > c.

Expected Participation with and without a Leaderboard: Suppose there are only two types: θH with probability λ and θL with probability 1 − λ. If 2α − 1 < c there is no participation regardless of the contest format, so we assume that 2α − 1 > c. The expected participation with a leaderboard is

    λ[G(aH + bH) − G(aH)] + (1 − λ)[G(aL + bL) − G(aL)],

and the expected participation without a leaderboard is

    λ·1((2α − 1)[G(aH + bH) − G(aH)] ≥ c) + (1 − λ)·1((2α − 1)[G(aL + bL) − G(aL)] ≥ c).

If (2α − 1)[G(aθ + bθ) − G(aθ)] ≥ c for all θ, then all types participate without a leaderboard; in this case, the expected participation with a leaderboard is strictly lower than the participation without a leaderboard. If (2α − 1)[G(aθ + bθ) − G(aθ)] < c for all θ, then no type participates without a leaderboard; in this case, the expected participation with a leaderboard is strictly higher than the participation without a leaderboard. If G(aH + bH) − G(aH) < c < G(aL + bL) − G(aL) and λ is sufficiently small, total participation increases and the participation of high types decreases without a leaderboard if and only if

    (1 − λ) > λ[G(aH + bH) − G(aH)] + (1 − λ)[G(aL + bL) − G(aL)].

When λ approaches zero this condition holds.
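A numerical instance of Example 1 with G uniform on [0, 1] (so G(x) = x) and hypothetical parameter values illustrates the cross-subsidization logic:

```python
import numpy as np

# Example 1 with G = Uniform[0, 1] and hypothetical parameters
alpha, c, lam_H = 0.9, 0.05, 0.1           # prize accuracy, cost, share of high types
a = {"H": 0.80, "L": 0.30}                 # current scores a_theta
b = {"H": 0.05, "L": 0.40}                 # score improvements b_theta

def win_prob_gain(theta):                  # G(a + b) - G(a) under the uniform
    return min(a[theta] + b[theta], 1.0) - a[theta]

# With a leaderboard: play on the event x in [a, a+b] iff 2*alpha - 1 > c
play_lb = {th: (2 * alpha - 1 > c) * win_prob_gain(th) for th in "HL"}
# Without a leaderboard: play iff (2*alpha - 1) * [G(a+b) - G(a)] > c
play_no_lb = {th: float((2 * alpha - 1) * win_prob_gain(th) > c) for th in "HL"}

exp_lb = lam_H * play_lb["H"] + (1 - lam_H) * play_lb["L"]
exp_no_lb = lam_H * play_no_lb["H"] + (1 - lam_H) * play_no_lb["L"]
print(exp_lb, exp_no_lb)  # more total play without the leaderboard, none by high types
```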

Perfect Correlation between Public and Private Scores. A second counterfactual design that manipulates the amount of information disclosed to participants is to send a perfect (rather than noisy) signal regarding the position of the players in the contest. This counterfactual exercise eliminates the noise caused by the positive but imperfect correlation between public and private scores.32 Online Appendix B extends Example 1 by generalizing the parameter α to be a function of the distance between the first and second scores: when they are far apart, the leader wins for sure; when they are close, the leader does not win for sure. When the size of the noise is small, all the results in Example 1 hold. However, as we show in Table 7 (Column 3), eliminating the noise reduces the incentives to play for all player types; it reduces overall participation and the participation of high-type players by about 3 percent.

Finally, it is worth mentioning that allocating prizes using the public score ranking has the potential drawback of participants attempting to engage in overfitting, i.e., finding solutions that maximize the public score ranking but are not robust outside of the test data. In our model we abstract away from this issue, but our results show that, if anything, allocating prizes according to the public leaderboard (i.e., removing the noise) decreases participation on average. This finding, in conjunction with the overfitting issue, suggests that to maximize participation and the quality of the top submissions, the contest designer should allocate prizes according to the private score ranking (as Kaggle already does).

32. This can be implemented if the contest designer uses all the test data to compute the public scores.

5.2 Number of Prizes

We also analyze the role of the allocation of prizes on contest outcomes. Instead of allocating prizes to the j highest-ranked players, we simulate a contest that allocates a single prize to the winner, keeping the total award money fixed. The literature on the optimal allocation of prizes has found that the shape of the cost function plays an important role in determining the optimal prize allocation when the contest designer's goal is to maximize aggregate effort.

Relative to a single prize, multiple prizes create stronger incentives to reach places other than the first, but weaker incentives to reach the first place. Hence, total participation in a contest may increase or decrease when there are multiple prizes rather than a single prize. Let us compare the incentives to send a submission for a player of type θ currently in position k in a contest with a single prize V versus a contest with j prizes V_{P_1} ≥ ... ≥ V_{P_j} such that Σ_{i=1}^{j} V_{P_i} = V. Denote by p^{final}_{ℓ,θ}(s|a) the probability that a player of type θ finishes in position ℓ at the end of the contest after action a ∈ {play, not play} when the current state is s, and denote by Δ_{ℓ,θ}(s) = p^{final}_{ℓ,θ}(s|play) − p^{final}_{ℓ,θ}(s|not play) the increase in the probability of finishing in position ℓ conditional on an extra submission. With a single prize, it is profitable to send an extra submission if and only if V Δ_{1,θ}(s) ≥ c. With multiple prizes, it is profitable to send an extra submission if and only if V Δ_{1,θ}(s) + Σ_{i=2}^{j} V_{P_i}[Δ_{i,θ}(s) − Δ_{1,θ}(s)] ≥ c. Therefore, we have the following result.

Corollary 1. A player of type θ in state s has stronger incentives to participate in a contest with j prizes versus a contest with a single prize if and only if

    η^{Prizes}_{θ,j}(s) = Σ_{i=2}^{j} V_{P_i}[Δ_{i,θ}(s) − Δ_{1,θ}(s)] ≥ 0.

With asymmetric players, the sign of η^{Prizes}_{θ,j}(s) can be positive or negative, so participation can go up or down. Notice, however, that when Δ_{i,θ}(s) ≈ Δ_{1,θ}(s), participation with multiple prizes or a single prize should be similar.

Table 7 (Column 4) shows the impact of awarding a single prize on different contest outcomes. Our results show that changing the allocation of prizes has a small and statistically insignificant overall effect on participation. While the results are heterogeneous across contests, the magnitude of the effect is less than 1 percent (in absolute value) in all but one contest. To explain this result, notice that the first-order effect on incentives is whether or not a player is ranked among the first three players at the end of the contest. Conditional on that event, the effect of allocating one or three prizes is small because of the uncertainty created by the imperfect correlation between public and private scores. In other words, Δ_{i,θ}(s) ≈ Δ_{1,θ}(s).

Finally, there is an interesting connection between a contest that allocates j prizes according to the private scores and a contest in which each player gets a prize according to her public score ranking. When prizes are allocated according to private scores, a player that finishes in the k-th position of the public ranking receives the prize V_{P_i} with probability Q_{i,k}, the probability that a player ranked k in the public scores ranks i in the private scores. With risk-neutral participants, a contest in which public scores are a noisy signal of the final standings and prizes V_{P_1} ≥ ... ≥ V_{P_j} are allocated is equivalent to a contest in which prizes V̂_1 ≥ ... ≥ V̂_N are allocated according to the public score ranking, with V̂_k = Σ_{i=1}^{j} V_{P_i} Q_{i,k}—the player that finishes in position k in the ranking of public scores receives the prize V̂_k for sure. Hence, this provides another explanation for the heterogeneous effects on participation described in the previous section.
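An illustrative computation of the condition in Corollary 1 and of the equivalent public-score prizes V̂_k = Σ_i V_{P_i} Q_{i,k} follows; all numbers are hypothetical.

```python
import numpy as np

V_P = np.array([0.5, 0.3, 0.2])     # three prizes summing to the total prize V = 1
# Hypothetical increases Delta_{i,theta}(s) in the probability of finishing i-th
Delta = np.array([0.010, 0.012, 0.015])

# Corollary 1: multiple prizes raise incentives iff eta >= 0
eta = np.sum(V_P[1:] * (Delta[1:] - Delta[0]))
print(eta)                          # > 0 here: this player prefers three prizes

# Equivalent prizes by public-score rank: Q[i, k] = Pr(private rank i | public rank k)
Q = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
V_hat = V_P @ Q                     # V_hat[k] = sum_i V_P[i] * Q[i, k]
print(V_hat)
```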

5.3 Limiting the Number of Participants

Lastly, we study the role of limiting participation on contest outcomes. The direct effect of limiting participation is the reduction in the number of participants: in our setting, fewer participants imply fewer players sending submissions. There are also two indirect effects due to the reduction in competition. First, with fewer players, it is more likely that an agent wins by participating, so the marginal benefit of sending a submission increases. Second, with fewer players and the opportunity to send multiple submissions, a player that is ahead faces a larger "replacement effect," which can lead to fewer submissions. Some articles have argued that limiting participation may be optimal (see Che and Gale 2003; Kireyev 2016; Taylor 1995), although these results are generally based on models of static contests or contests without a leaderboard. We contribute to the literature by shedding light on the impact of limiting participation in a dynamic environment with heterogeneous players.

We consider the case where we reduce the number of participants in each contest by 10 percent, keeping the distribution of player types unchanged. Table 7 (Column 5) shows that the increase in individual incentives caused by reduced competition does not fully compensate for the reduction in the number of players, leading to an average decrease in the total number of submissions of 8.7 percent and reducing the average number of high-type submissions by about 6.8 percent. As a consequence of these effects, the maximum score decreases.

5.4 Robustness

In this section we discuss two variations of the baseline model and explain how our results are affected by them. First, we consider how the results change under the specification where players pay a flow cost rather than a fixed cost when building a submission (see the discussion in Section 3.1).33 It has been argued in the literature that whether players pay a flow or a fixed cost matters for players' incentives (see, for instance, Loury 1979 and Lee and Wilde 1980). Table A.7 in the Online Appendix replicates Table 7 using the estimates for the flow cost specification, and shows that the results are qualitatively identical throughout.

Second, we estimate the dynamic model with sequentially rational players presented in Section 3.1.2. The estimation of the dynamic model is computationally demanding because each evaluation of the likelihood function requires solving for the players' equilibrium beliefs. For this reason, we only estimated a handful of contests using the dynamic framework as a robustness check. Table A.8 in the Online Appendix shows that the estimates of the model with forward-looking players are not statistically different from those in Table 6.

33. The fit of the model is slightly worse when using the flow cost specification: the coefficient of correlation between the actual and the predicted number of submissions drops to 0.77. The better fit of the fixed cost specification motivates us to choose it as our main specification.

6 Discussion

We investigate how competition design affects players' incentives. The competitions we study are dynamic rank-ordered contests, where players can send multiple submissions after observing a signal about their current position in the ranking.

Our first contribution is methodological. We introduce a novel and tractable empirical model—i.e., a model with a state space that is computationally manageable—to study

a setting where players can make multiple submissions throughout the contest. The contest designer displays, in real time, a public leaderboard that provides participants with a noisy signal of their position, allowing players to decide whether to continue participating or to quit. Our model relies on various simplifications motivated by empirical evidence, but it is general enough to be applied to other settings. Computational tractability is achieved by assuming that players are small, i.e., they do not consider the effect of their actions on rival players' strategies. However, players form beliefs—correct in equilibrium—about the future number of submissions in the contest. In our framework, the players' expected payoff of making a submission, conditional on their beliefs about the number of rivals' submissions, can be represented using the exponential of a matrix, which can be computed efficiently. These aspects of the model allow us to estimate the structural parameters and to compute outcomes for counterfactual contest designs.

Our second contribution sheds light on contest design in a dynamic setting with heterogeneous players. We simulate several counterfactual scenarios to explore alternative contest designs. In particular, we study the role of information disclosure, the allocation of prizes, and limited participation in shaping players' incentives. Although we find heterogeneity in the results, information design has the greatest impact on contest outcomes. Specifically, introducing a small amount of noise encourages participation by all players. This is especially relevant in prediction contests, where a small amount of noise can deal with overfitting issues. Also, we find that providing a real-time leaderboard encourages high-type players to participate and decreases the participation of low-type agents. In fact, without a real-time leaderboard, there are more low-type submissions but high-type players send on average 6.8 fewer submissions. These results stress the importance of information design in dynamic competitions.

7 References

Aoyagi, Masaki (2010) "Information feedback in a dynamic tournament," Games and Economic Behavior, Vol. 70, pp. 242–260.
Azmat, Ghazala and Marc Möller (2009) "Competition among contests," The RAND Journal of Economics, Vol. 40, pp. 743–768.
Bajari, Patrick and Ali Hortacsu (2003) "The winner's curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions," RAND Journal of Economics, pp. 329–355.
Baker, George P, Michael C Jensen, and Kevin J Murphy (1988) "Compensation and incentives: Practice vs. theory," The Journal of Finance, Vol. 43, pp. 593–616.
Balafoutas, Loukas, E Glenn Dutcher, Florian Lindner, and Dmitry Ryvkin (2017) "The Optimal Allocation of Prizes in Tournaments of Heterogeneous Agents," Economic Inquiry, Vol. 55, pp. 461–478.
Benkert, Jean-Michel and Igor Letina (2016) "Designing dynamic research contests," University of Zurich, Department of Economics, Working Paper.
Bhattacharya, Vivek (2016) "An Empirical Model of R&D Procurement Contests: An Analysis of the DOD SBIR Program," MIT, Department of Economics, Working Paper.
Bimpikis, Kostas, Shayan Ehsani, and Mohamed Mostagir (2014) "Designing dynamic contests," Working Paper, Stanford University.
Boudreau, Kevin J, Nicola Lacetera, and Karim R Lakhani (2011) "Incentives and problem uncertainty in innovation contests: An empirical analysis," Management Science, Vol. 57, pp. 843–863.
Boudreau, Kevin J, Karim R Lakhani, and Michael Menietti (2016) "Performance responses to competition across skill levels in rank-order tournaments: field evidence and implications for tournament design," The RAND Journal of Economics, Vol. 47, pp. 140–165.
Chawla, Shuchi, Jason D Hartline, and Balasubramanian Sivan (2015) "Optimal crowdsourcing contests," Games and Economic Behavior.
Che, Yeon-Koo and Ian Gale (2003) "Optimal design of research contests," The American Economic Review, Vol. 93, pp. 646–671.
Chesbrough, Henry, Wim Vanhaverbeke, and Joel West (2006) Open Innovation: Researching a New Paradigm: Oxford University Press on Demand.
Clark, Derek J and Tore Nilssen (2013) "Learning by doing in contests," Public Choice, Vol. 156, pp. 329–343.
Cohen, Chen, Todd R Kaplan, and Aner Sela (2008) "Optimal rewards in contests," The RAND Journal of Economics, Vol. 39, pp. 434–451.
Diamond, Peter A (1971) "A model of price adjustment," Journal of Economic Theory, Vol. 3, pp. 156–168.
Ding, Wei and Elmar G Wolfstetter (2011) "Prizes and lemons: procurement of innovation under imperfect commitment," The RAND Journal of Economics, Vol. 42, pp. 664–680.
Ederer, Florian (2010) "Feedback and motivation in dynamic tournaments," Journal of Economics & Management Strategy, Vol. 19, pp. 733–769.
Fullerton, Richard L and R Preston McAfee (1999) "Auctioning entry into tournaments," Journal of Political Economy, Vol. 107, pp. 573–605.
Goltsman, Maria and Arijit Mukherjee (2011) "Interim performance feedback in multistage tournaments: The optimality of partial disclosure," Journal of Labor Economics, Vol. 29, pp. 229–265.
Gross, Daniel P (2015) "Creativity Under Fire: The Effects of Competition on Creative Production," Available at SSRN 2520123.
Gross, Daniel P (2017) "Performance feedback in competitive product development," The RAND Journal of Economics, Vol. 48, pp. 438–466.
Halac, Marina, Navin Kartik, and Qingmin Liu (2014) "Contests for experimentation," Journal of Political Economy, Forthcoming.
Hendricks, Kenneth and Robert H Porter (1988) "An empirical study of an auction with asymmetric information," The American Economic Review, pp. 865–883.
Hinnosaar, Toomas (2017) "Dynamic common-value contests."
Huang, Yan, Param Vir Singh, and Kannan Srinivasan (2014) "Crowdsourcing new product ideas under consumer learning," Management Science, Vol. 60, pp. 2138–2159.
Jeppesen, Lars Bo and Karim R Lakhani (2010) "Marginality and problem-solving effectiveness in broadcast search," Organization Science, Vol. 21, pp. 1016–1033.
Kireyev, Pavel (2016) "Markets for Ideas: Prize Structure, Entry Limits, and the Design of Ideation Contests," HBS, Working Paper.
Klein, Arnd Heinrich and Armin Schmutzler (2016) "Optimal effort incentives in dynamic tournaments," Games and Economic Behavior.
Krasnokutskaya, Elena and Katja Seim (2011) "Bid preference programs and participation in highway procurement auctions," The American Economic Review, Vol. 101, pp. 2653–2686.
Lakhani, Karim R, Kevin J Boudreau, Po-Ru Loh, Lars Backstrom, Carliss Baldwin, Eric Lonstein, Mike Lydon, Alan MacCormack, Ramy A Arnaout, and Eva C Guinan (2013) "Prize-based contests can provide solutions to computational biology problems," Nature Biotechnology, Vol. 31, pp. 108–111.
Lazear, Edward P and Sherwin Rosen (1979) "Rank-order tournaments as optimum labor contracts."
Lee, Tom and Louis L Wilde (1980) "Market structure and innovation: a reformulation," The Quarterly Journal of Economics, Vol. 94, pp. 429–436.
Lerner, Josh and Jean Tirole (2002) "Some simple economics of open source," The Journal of Industrial Economics, Vol. 50, pp. 197–234.
Levin, Dan and James L Smith (1994) "Equilibrium in auctions with entry," The American Economic Review, pp. 585–599.
Li, Tong, Isabelle Perrigne, and Quang Vuong (2002) "Structural estimation of the affiliated private value auction model," RAND Journal of Economics, pp. 171–193.
Loury, Glenn C (1979) "Market structure and innovation," The Quarterly Journal of Economics, pp. 395–410.
Megidish, Reut and Aner Sela (2013) "Allocation of Prizes in Contests with Participation Constraints," Journal of Economics & Management Strategy, Vol. 22, pp. 713–727.
Moldovanu, Benny and Aner Sela (2001) "The optimal allocation of prizes in contests," American Economic Review, pp. 542–558.
Moldovanu, Benny and Aner Sela (2006) "Contest architecture," Journal of Economic Theory, Vol. 126, pp. 70–96.
Moldovanu, Benny, Aner Sela, and Xianwen Shi (2007) "Contests for status," Journal of Political Economy, Vol. 115, pp. 338–363.
Olszewski, Wojciech and Ron Siegel (2015) "Effort-Maximizing Contests."
Sisak, Dana (2009) "Multiple-prize contests—the optimal allocation of prizes," Journal of Economic Surveys, Vol. 23, pp. 82–114.
Strack, Philipp (2016) "Risk-Taking in Contests: The Impact of Fund-Manager Compensation on Investor Welfare."
Takahashi, Yuya (2015) "Estimating a war of attrition: The case of the U.S. movie theater industry," The American Economic Review, Vol. 105, pp. 2204–2241.
Taylor, Curtis R (1995) "Digging for golden carrots: an analysis of research tournaments," The American Economic Review, pp. 872–890.
Terwiesch, Christian and Yi Xu (2008) "Innovation contests, open innovation, and multiagent problem solving," Management Science, Vol. 54, pp. 1529–1543.
Xiao, Jun (2016) "Asymmetric all-pay contests with heterogeneous prizes," Journal of Economic Theory, Vol. 163, pp. 178–221.

Online Appendix: Not For Publication

Dynamic Tournament Design: An Application to Prediction Contests

Jorge Lemus and Guillermo Marshall

A Additional Tables and Figures

Contest Number | Name of the Competition | Total Reward | Number of Submissions | Number of Teams | Start Date | Deadline
1 | Predict Grant Applications | 5,000 | 2,800 | 204 | 12/13/2010 | 02/20/2011
2 | RTA Freeway Travel Time Prediction | 10,000 | 3,129 | 356 | 11/23/2010 | 02/13/2011
3 | Deloitte/FIDE Chess Rating Challenge | 10,000 | 1,563 | 181 | 02/07/2011 | 05/04/2011
4 | Heritage Health Prize | 500,000 | 25,316 | 1,353 | 04/04/2011 | 04/04/2013
5 | Wikipedia's Participation Challenge | 10,000 | 1,020 | 90 | 06/28/2011 | 09/20/2011
6 | Allstate Claim Prediction Challenge | 10,000 | 1,278 | 102 | 07/13/2011 | 10/12/2011
7 | dunnhumby's Shopper Challenge | 10,000 | 1,872 | 277 | 07/29/2011 | 09/30/2011
8 | Give Me Some Credit | 5,000 | 7,730 | 925 | 09/19/2011 | 12/15/2011
9 | Don't Get Kicked! | 10,000 | 7,261 | 570 | 09/30/2011 | 01/05/2012
10 | Algorithmic Trading Challenge | 10,000 | 1,406 | 111 | 11/11/2011 | 01/08/2012
11 | What Do You Know? | 5,000 | 1,747 | 239 | 11/18/2011 | 02/29/2012
12 | Photo Quality Prediction | 5,000 | 1,356 | 200 | 10/29/2011 | 11/20/2011
13 | KDD Cup 2012, Track 1 | 8,000 | 13,076 | 657 | 02/20/2012 | 06/01/2012
14 | KDD Cup 2012, Track 2 | 8,000 | 5,276 | 163 | 02/20/2012 | 06/01/2012
15 | Predicting a Biological Response | 20,000 | 8,837 | 699 | 03/16/2012 | 06/15/2012
16 | Online Product Sales | 22,500 | 3,755 | 363 | 05/04/2012 | 07/03/2012
17 | EMI Music Data Science Hackathon - July 21st - 24 hours | 10,000 | 1,319 | 133 | 07/21/2012 | 07/22/2012
18 | Belkin Energy Disaggregation Competition | 25,000 | 1,526 | 165 | 07/02/2013 | 10/30/2013
19 | Merck Molecular Activity Challenge | 40,000 | 2,979 | 236 | 08/16/2012 | 10/16/2012
20 | U.S. Census Return Rate Challenge | 25,000 | 2,666 | 243 | 08/31/2012 | 11/11/2012
21 | Amazon.com - Employee Access Challenge | 5,000 | 16,872 | 1,687 | 05/29/2013 | 07/31/2013
22 | The Marinexplore and Cornell University Whale Detection Challenge | 10,000 | 3,293 | 245 | 02/08/2013 | 04/08/2013
23 | See Click Predict Fix - Hackathon | 1,000 | 1,051 | 80 | 09/28/2013 | 09/29/2013
24 | KDD Cup 2013 - Author Disambiguation Challenge (Track 2) | 7,500 | 2,304 | 237 | 04/19/2013 | 06/12/2013
25 | Influencers in Social Networks | 2,350 | 2,105 | 132 | 04/13/2013 | 04/14/2013
26 | Personalize Expedia Hotel Searches - ICDM 2013 | 25,000 | 3,502 | 337 | 09/03/2013 | 11/04/2013
27 | StumbleUpon Evergreen Classification Challenge | 5,000 | 7,495 | 625 | 08/16/2013 | 10/31/2013
28 | Personalized Web Search Challenge | 9,000 | 3,570 | 194 | 10/11/2013 | 01/10/2014
29 | See Click Predict Fix | 4,000 | 5,570 | 532 | 09/29/2013 | 11/27/2013
30 | Allstate Purchase Prediction Challenge | 50,000 | 24,526 | 1,568 | 02/18/2014 | 05/19/2014
31 | Higgs Boson Machine Learning Challenge | 13,000 | 35,772 | 1,785 | 05/12/2014 | 09/15/2014
32 | Acquire Valued Shoppers Challenge | 30,000 | 25,195 | 952 | 04/10/2014 | 07/14/2014
33 | The Hunt for Prohibited Content | 25,000 | 4,992 | 285 | 06/24/2014 | 08/31/2014
34 | Liberty Mutual Group - Fire Peril Loss Cost | 25,000 | 14,812 | 634 | 07/08/2014 | 09/02/2014
35 | Tradeshift Text Classification | 5,000 | 5,648 | 375 | 10/02/2014 | 11/10/2014
36 | Driver Telematics Analysis | 30,000 | 36,065 | 1,528 | 12/15/2014 | 03/16/2015
37 | Diabetic Retinopathy Detection | 100,000 | 7,002 | 661 | 02/17/2015 | 07/27/2015
38 | Click-Through Rate Prediction | 15,000 | 31,015 | 1,604 | 11/18/2014 | 02/09/2015
39 | Otto Group Product Classification Challenge | 10,000 | 43,525 | 3,514 | 03/17/2015 | 05/18/2015
40 | Crowdflower Search Results Relevance | 20,000 | 23,244 | 1,326 | 05/11/2015 | 07/06/2015
41 | Avito Context Ad Clicks | 20,000 | 5,949 | 414 | 06/02/2015 | 07/28/2015
42 | ICDM 2015: Drawbridge Cross-Device Connections | 10,000 | 2,355 | 340 | 06/01/2015 | 08/24/2015
43 | Caterpillar Tube Pricing | 30,000 | 26,360 | 1,323 | 06/29/2015 | 08/31/2015
44 | Liberty Mutual Group: Property Inspection Prediction | 25,000 | 45,875 | 2,236 | 07/06/2015 | 08/28/2015
45 | Coupon Purchase Prediction | 50,000 | 18,477 | 1,076 | 07/16/2015 | 09/30/2015
46 | Springleaf Marketing Response | 100,000 | 39,444 | 2,226 | 08/14/2015 | 10/19/2015
47 | Truly Native? | 10,000 | 3,223 | 274 | 08/06/2015 | 10/14/2015
48 | Homesite Quote Conversion | 20,000 | 36,368 | 1,764 | 11/09/2015 | 02/08/2016
49 | Prudential Life Insurance Assessment | 30,000 | 45,490 | 2,619 | 11/23/2015 | 02/15/2016
50 | BNP Paribas Cardif Claims Management | 30,000 | 54,516 | 2,926 | 02/03/2016 | 04/18/2016
51 | Home Depot Product Search Relevance | 40,000 | 35,619 | 2,125 | 01/18/2016 | 04/25/2016
52 | Santander Customer Satisfaction | 60,000 | 93,559 | 5,123 | 03/02/2016 | 05/02/2016
53 | Expedia Hotel Recommendations | 25,000 | 22,709 | 1,974 | 04/15/2016 | 06/10/2016
54 | Avito Duplicate Ads Detection | 20,000 | 8,153 | 548 | 05/06/2016 | 07/11/2016
55 | Draper Satellite Image Chronology | 75,000 | 2,734 | 401 | 04/29/2016 | 06/27/2016

Table A.1: Summary of the Competitions in the Data (Full List). Note: The table only considers submissions that received a score. The total reward is measured in US dollars at the moment of the competition.

Public Ranking of Winner | Frequency | Probability | Cumulative Probability
1 | 27 | 49.09 | 49.09
2 | 12 | 21.82 | 70.91
3 | 3 | 5.45 | 76.36
4 | 6 | 10.91 | 87.27
5 | 1 | 1.82 | 89.09
6 | 2 | 3.64 | 92.73
11 | 3 | 5.45 | 98.18
54 | 1 | 1.82 | 100.00

Table A.2: Public Leaderboard Ranking of Competition Winners

Number of Competitions | Frequency | Probability | Cumulative Probability
1 | 23,443 | 71.75 | 71.75
2 | 4,577 | 14.01 | 85.76
3 | 1,861 | 5.70 | 91.46
4 | 903 | 2.76 | 94.22
5 or more | 1,887 | 5.78 | 100.00

Table A.3: Number of Competitions by User

Contest | µ | SE | λ | SE | σ | SE | log L(δ̂)/N | N
unimelb | 1.9729 | 0.1381 | 58.0749 | 1.0975 | 0.006 | 0.0005 | -2.6275 | 2800
RTA | 1.9144 | 0.1015 | 46.6825 | 0.8345 | 0.0068 | 0.0005 | -2.2438 | 3129
ChessRatings2 | 2.1394 | 0.159 | 45.2062 | 1.1435 | 0.0101 | 0.0009 | -2.1779 | 1563
hhp | 2.54 | 0.0691 | 139.3308 | 0.8757 | 0.003 | 0.0001 | -3.5129 | 25316
wikichallenge | 2.4773 | 0.2611 | 53.1795 | 1.6651 | 0.0077 | 0.001 | -2.4616 | 1020
ClaimPredictionChallenge | 1.881 | 0.1862 | 59.669 | 1.6691 | 0.0024 | 0.0003 | -2.6226 | 1278
dunnhumbychallenge | 1.9335 | 0.1162 | 35.5228 | 0.821 | 0.009 | 0.0007 | -1.8383 | 1872
GiveMeSomeCredit | 1.6946 | 0.0557 | 49.4207 | 0.5621 | 0.0079 | 0.0003 | -2.2323 | 7730
DontGetKicked | 1.6483 | 0.069 | 74.4886 | 0.8742 | 0.0022 | 0.0001 | -2.8461 | 7261
AlgorithmicTradingChallenge | 2.222 | 0.2109 | 49.9619 | 1.3324 | 0.0067 | 0.0008 | -2.4633 | 1406
WhatDoYouKnow | 1.918 | 0.1241 | 46.3084 | 1.1079 | 0.0117 | 0.0009 | -2.0821 | 1747
PhotoQualityPrediction | 2.1705 | 0.1535 | 22.075 | 0.5995 | 0.0077 | 0.0008 | -1.4708 | 1356
kddcup2012-track1 | 2.2132 | 0.0863 | 78.8734 | 0.6898 | 0.0036 | 0.0002 | -3.0284 | 13076
kddcup2012-track2 | 2.1044 | 0.1648 | 96.2316 | 1.3248 | 0.0018 | 0.0002 | -3.3634 | 5276
bioresponse | 2.0129 | 0.0761 | 60.6832 | 0.6455 | 0.0057 | 0.0003 | -2.6222 | 8837
online-sales | 1.9858 | 0.1042 | 43.0122 | 0.7019 | 0.0062 | 0.0004 | -2.2444 | 3755
MusicHackathon | 2.7146 | 0.2354 | 20.3445 | 0.5602 | 0.0027 | 0.0004 | -1.6083 | 1319
belkin-energy-disaggregation-competition | 1.951 | 0.1519 | 59.3161 | 1.5184 | 0.004 | 0.0004 | -2.4637 | 1526
MerckActivity | 1.9117 | 0.1244 | 43.4366 | 0.7958 | 0.006 | 0.0006 | -2.3581 | 2979
us-census-challenge | 2.1633 | 0.1388 | 50.6728 | 0.9814 | 0.0131 | 0.0008 | -2.3429 | 2666
amazon-employee-access-challenge | 2.5105 | 0.0611 | 44.1925 | 0.3402 | 0.0061 | 0.0002 | -2.2066 | 16872
whale-detection-challenge | 1.936 | 0.1237 | 54.1091 | 0.9429 | 0.0035 | 0.0003 | -2.57 | 3293
the-seeclickfix-311-challenge | 1.9617 | 0.2193 | 34.7874 | 1.0731 | 0.0023 | 0.0004 | -2.188 | 1051
kdd-cup-2013-author-disambiguation | 1.8902 | 0.1228 | 44.5504 | 0.9281 | 0.0057 | 0.0005 | -2.243 | 2304
predict-who-is-more-influential-in-a-social-network | 2.4997 | 0.2176 | 39.3508 | 0.8577 | 0.003 | 0.0004 | -2.3471 | 2105
expedia-personalized-sort | 2.0033 | 0.1091 | 37.0919 | 0.6268 | 0.0044 | 0.0003 | -2.1301 | 3502
stumbleupon | 2.1684 | 0.0867 | 46.13 | 0.5328 | 0.0034 | 0.0002 | -2.3662 | 7495
yandex-personalized-web-search-challenge | 1.6879 | 0.1212 | 99.9952 | 1.6736 | 0.0024 | 0.0002 | -3.2396 | 3570
see-click-predict-fix | 1.4089 | 0.0611 | 56.0822 | 0.7514 | 0.0031 | 0.0002 | -2.5511 | 5570
allstate-purchase-prediction-challenge | 1.9117 | 0.0483 | 86.6572 | 0.5533 | 0.0035 | 0.0001 | -3.0264 | 24526
higgs-boson | 2.2081 | 0.0523 | 99.8622 | 0.528 | 0.0017 | 0.0001 | -3.2407 | 35772
acquire-valued-shoppers-challenge | 2.0347 | 0.0659 | 123.2493 | 0.7765 | 0.0012 | 0.0001 | -3.5269 | 25195
avito-prohibited-content | 2.1635 | 0.1282 | 69.2081 | 0.9795 | 0.0049 | 0.0004 | -2.8636 | 4992
liberty-mutual-fire-peril | 2.3163 | 0.092 | 84.3712 | 0.6932 | 0.0028 | 0.0001 | -3.145 | 14812
tradeshift-text-classification | 2.3836 | 0.1231 | 43.0846 | 0.5733 | 0.0048 | 0.0003 | -2.396 | 5648
axa-driver-telematics-analysis | 2.0942 | 0.0536 | 98.6269 | 0.5193 | 0.0013 | 0.0001 | -3.2956 | 36065
diabetic-retinopathy-detection | 1.8801 | 0.0731 | 76.8586 | 0.9185 | 0.0036 | 0.0002 | -2.7529 | 7002
avazu-ctr-prediction | 2.6021 | 0.065 | 81.6021 | 0.4634 | 0.0022 | 0.0001 | -3.0362 | 31015
otto-group-product-classification-challenge | 2.713 | 0.0458 | 44.8662 | 0.2151 | 0.0043 | 0.0001 | -2.332 | 43525
crowdflower-search-relevance | 2.0708 | 0.0569 | 68.1422 | 0.447 | 0.0022 | 0.0001 | -2.8531 | 23244
avito-context-ad-clicks | 1.8135 | 0.0891 | 56.6256 | 0.7342 | 0.0018 | 0.0001 | -2.6281 | 5949
icdm-2015-drawbridge-cross-device-connections | 1.908 | 0.1035 | 32.6333 | 0.6725 | 0.0167 | 0.0012 | -1.7726 | 2355
caterpillar-tube-pricing | 3.2151 | 0.0884 | 61.5938 | 0.3794 | 0.0025 | 0.0001 | -2.791 | 26360
liberty-mutual-group-property-inspection-prediction | 2.8362 | 0.06 | 63.4536 | 0.2963 | 0.0023 | 0.0001 | -2.827 | 45875
coupon-purchase-prediction | 2.1102 | 0.0643 | 66.0059 | 0.4856 | 0.0027 | 0.0001 | -2.8099 | 18477
springleaf-marketing-response | 2.4308 | 0.0515 | 64.4029 | 0.3243 | 0.0029 | 0.0001 | -2.7964 | 39444
dato-native | 2.2652 | 0.1368 | 33.6794 | 0.5932 | 0.0043 | 0.0004 | -2.0923 | 3223
homesite-quote-conversion | 2.2237 | 0.0529 | 81.1871 | 0.4257 | 0.0031 | 0.0001 | -3.0634 | 36368
prudential-life-insurance-assessment | 2.1082 | 0.0412 | 72.0748 | 0.3379 | 0.0017 | 0.0001 | -2.9023 | 45490
bnp-paribas-cardif-claims-management | 2.5338 | 0.0468 | 61.5269 | 0.2635 | 0.0019 | 0.0001 | -2.7772 | 54516
home-depot-product-search-relevance | 2.3944 | 0.0519 | 66.5513 | 0.3526 | 0.0029 | 0.0001 | -2.808 | 35619
santander-customer-satisfaction | 2.2486 | 0.0314 | 68.9879 | 0.2255 | 0.0033 | 0.0001 | -2.8727 | 93559
expedia-hotel-recommendations | 2.2155 | 0.0499 | 40.0034 | 0.2655 | 0.0091 | 0.0003 | -2.2065 | 22709
avito-duplicate-ads-detection | 2.3401 | 0.1 | 44.5067 | 0.4929 | 0.002 | 0.0001 | -2.4243 | 8153
draper-satellite-image-chronology | 2.7764 | 0.1386 | 22.6104 | 0.4324 | 0.0037 | 0.0002 | -1.4435 | 2734

Table A.4: Maximum Likelihood Estimates of the Cost and Arrival Distributions. Note: The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.'

Type 1

Type 2

σ3

κ3

ˆ log L(δ)/N

N

1.4842 0.5608

0.6113

0.1085

0

-1.9818

2800

9.6423

0.4896

0.3241

10.212

0.1099

0.093

-0.8111

3129

3.1253

0.5515 0.0776

4.2695

0.7443 0.6963

-1.3112

1563

σ1

κ1

µ2

unimelb

2.8456

0.9588

0.4392

-0.1946

RTA

10.2009 0.5641 0.5829 0.2261

Type 3 µ3

µ1

σ2

κ2

ChessRatings2

2.624

0.6269

hhp

4.3531

0.5544 0.6499

3.2627

0.562

0.3136

4.5789

0.1858

0.0364

-0.9001

25316

wikichallenge

2.347

0.6357

0.224

2.6796

0.6565 0.0973

3.9466

0.7628 0.6786

-1.3093

1020

0.4024 0.6369

0.7763

0.8546 0.2612

-1.1957

1278

1.4778

0.7282

0.1273

-1.4971

1872 7730

ClaimPredictionChallenge

-0.0574

0.3426 0.1019

-0.4416

dunnhumbychallenge

1.0278

1.1346 0.4695

0.204

0.7564

GiveMeSomeCredit

5.1508

0.8246

0.1834

0.9064

0.5082 0.0857

3.4727

1.4605 0.7309

-1.8215

DontGetKicked

0.5705

0.4243 0.2847

0.9868

0.598

0.1423

2.4957

0.852

0.5729

-1.4494

7261

AlgorithmicTradingChallenge

6.4594

0.4204 0.0273

10.229

1.4068

0.8673 10.7348

0.5171 0.1054

-1.7009

1406

WhatDoYouKnow

2.4412

0.7158 0.2094

2.8922

0.5075

0.0459

4.0498

0.9616

0.7447

-1.4859

1747

PhotoQualityPrediction

4.1049

0.9689 0.5888

2.6271

0.9318 0.3054

1.0702

0.13

0.1058

-1.553

1356

kddcup2012-track1

0.7896

0.0757

0.9738

0.4139 0.2522

0.7355

1.1913 0.6721

-1.4368

13076

kddcup2012-track2

1.2462

1.2097 0.8524

0.9289

0.1845 0.0351

1.6984

0.5459 0.1125

-1.5371

5276

bioresponse

5.8904

0.3569 0.0563

3.7033

0.5922 0.0473

5.817

0.9204 0.8964

-1.3153

8837

online-sales

3.6076

0.4193 0.0911

3.201

0.316

0.0256

5.1005

0.8893 0.8834

-1.3482

3755

MusicHackathon

9.5678

0.2218

0.0001 10.2019 0.6746

0.7376

11.6405 0.9884 0.2623

-1.4502

1319

belkin-energy-disaggregation-competition

2.7039

0.0395 0.2056

2.6578

0.5265 0.5099

2.7014

0.0121 0.2845

0.2845

1526

MerckActivity

1.938

0.5368 0.4059

1.2421

1.3096

1.0949

0.0834

-1.3706

2979

0.4

0.4032

0.5907

0.0034

us-census-challenge

10.4306 0.5038 0.1446

10.6871 0.2188 0.7107

10.717

0.0741 0.1447

0.0949

2666

amazon-employee-access-challenge

1.6573

1.3475 0.8911

3.2464

0.7082 0.0759

-0.6308

0.0059

0.033

-1.6575

[Table A.5 appears here. Its cell layout was lost in extraction: for each of the 55 contests, the table reports, for each player type i, the EM estimates of (µi, σi, κi), the value of log L(δ̂)/N, and the number of submissions N.]

Table A.5: EM Algorithm Estimates for the Type-specific Distribution of Scores, qθ. Note: The model is estimated separately for each contest. µi and σi are the parameters in type i's distribution of scores, Qi(s) = Φ((log(s/(1 − s)) − µi)/σi). κi is the fraction of players of type i. log L(δ̂)/N is the value of the log-likelihood function evaluated at the EM estimates. Standard errors are available.
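For readers who want to see the mechanics behind Table A.5, the following is a minimal sketch of the EM iteration under our reading of the caption: a K-type mixture of normals fit to the logit-transformed score. This is not the authors' code; the function name em_logit_normal_mixture, the choice K = 3, and all defaults are our assumptions.

```python
# Minimal sketch (not the authors' code): EM for a K-type mixture of normals
# on the logit-transformed score z = log(s / (1 - s)), matching
# Q_i(s) = Phi((log(s/(1-s)) - mu_i) / sigma_i) with mixture weights kappa_i.
import numpy as np
from scipy.stats import norm

def em_logit_normal_mixture(s, K=3, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    z = np.log(s / (1.0 - s))                  # scores s must lie in (0, 1)
    n = len(z)
    mu = rng.choice(z, size=K, replace=False)  # initialize means at data points
    sigma = np.full(K, z.std())
    kappa = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior probability that each submission comes from type i
        dens = kappa * norm.pdf(z[:, None], loc=mu, scale=sigma)  # shape (n, K)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means, std. deviations, type shares
        nk = resp.sum(axis=0)
        mu = (resp * z[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk)
        kappa = nk / n
    # log L / N at the final estimates, as reported in Table A.5
    dens = kappa * norm.pdf(z[:, None], loc=mu, scale=sigma)
    loglik_per_obs = np.log(dens.sum(axis=1)).mean()
    return mu, sigma, kappa, loglik_per_obs
```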


Contest | α | SE | β | SE | N
unimelb | -0.0115 | 0.0413 | 1.0233 | 0.0561 | 2800
RTA | -0.0023 | 0.8605 | 1.0021 | 0.8608 | 3129
ChessRatings2 | -0.0177 | 0.1823 | 1.0176 | 0.1908 | 1563
hhp | 0.0022 | 0.0717 | 1.0025 | 0.0737 | 25316
wikichallenge | 0.0001 | 0.3367 | 0.9998 | 0.3497 | 1020
ClaimPredictionChallenge | 0.0244 | 0.0787 | 0.9437 | 0.1388 | 1278
dunnhumbychallenge | 0.0047 | 0.073 | 1.0153 | 0.1041 | 1872
GiveMeSomeCredit | 0.0016 | 0.0609 | 0.9989 | 0.066 | 7730
DontGetKicked | 0.0019 | 0.0611 | 1.0013 | 0.0716 | 7261
AlgorithmicTradingChallenge | 0.0013 | 0.3561 | 0.9987 | 0.3581 | 1406
WhatDoYouKnow | 0.006 | 0.1801 | 0.994 | 0.1893 | 1747
PhotoQualityPrediction | -0.0156 | 0.2402 | 1.0174 | 0.254 | 1356
kddcup2012-track1 | -0.0149 | 0.0259 | 0.9655 | 0.0379 | 13076
kddcup2012-track2 | -0.0057 | 0.0444 | 0.9968 | 0.0587 | 5276
bioresponse | 0.0374 | 0.127 | 0.9635 | 0.1293 | 8837
online-sales | -0.0321 | 0.4462 | 1.0323 | 0.4524 | 3755
MusicHackathon | 0.0003 | 0.8621 | 0.9997 | 0.8634 | 1319
belkin-energy-disaggregation-competition | 0.0128 | 0.2171 | 1.0157 | 0.2346 | 1526
MerckActivity | -0.0074 | 0.0731 | 1.0083 | 0.0908 | 2979
us-census-challenge | 0.0082 | 0.5591 | 0.9918 | 0.5601 | 2666
amazon-employee-access-challenge | 0.0263 | 0.0306 | 0.9748 | 0.0368 | 16872
whale-detection-challenge | 0.0039 | 0.0806 | 0.9961 | 0.0928 | 3293
the-seeclickfix-311-challenge | 0.0121 | 0.3505 | 0.9877 | 0.3701 | 1051
kdd-cup-2013-author-disambiguation | -0.0003 | 0.2136 | 0.9996 | 0.2228 | 2304
predict-who-is-more-influential-in-a-social-network | 0.0479 | 0.1612 | 0.9494 | 0.1741 | 2105
expedia-personalized-sort | 0.0496 | 0.0908 | 0.9407 | 0.1078 | 3502
stumbleupon | 0.0297 | 0.0736 | 0.9875 | 0.0815 | 7495
yandex-personalized-web-search-challenge | -0.0002 | 0.1021 | 1.0004 | 0.1074 | 3570
see-click-predict-fix | -0.0023 | 0.9291 | 1.0022 | 0.9321 | 5570
allstate-purchase-prediction-challenge | 0.0005 | 0.0197 | 1.0043 | 0.0221 | 24526
higgs-boson | -0.0002 | 0.0183 | 1.0168 | 0.0224 | 35772
acquire-valued-shoppers-challenge | -0.0139 | 0.033 | 1.0105 | 0.043 | 25195
avito-prohibited-content | -0.0003 | 0.0521 | 0.9999 | 0.0574 | 4992
liberty-mutual-fire-peril | 0.044 | 0.0449 | 0.9083 | 0.054 | 14812
tradeshift-text-classification | 0.0004 | 0.4734 | 0.9996 | 0.4746 | 5648
axa-driver-telematics-analysis | -0.0019 | 0.0179 | 1.0019 | 0.0253 | 36065
diabetic-retinopathy-detection | -0.0082 | 0.0285 | 1.0106 | 0.0489 | 7002
avazu-ctr-prediction | 0.0006 | 0.0683 | 0.9994 | 0.0688 | 31015
otto-group-product-classification-challenge | 0.0002 | 0.0309 | 0.9997 | 0.0321 | 43525
crowdflower-search-relevance | 0.0174 | 0.0362 | 0.986 | 0.0426 | 23244
avito-context-ad-clicks | -0.0001 | 0.2656 | 1.0001 | 0.2665 | 5949
icdm-2015-drawbridge-cross-device-connections | -0.0007 | 0.0326 | 1.0014 | 0.0579 | 2355
caterpillar-tube-pricing | -0.014 | 0.3919 | 1.014 | 0.393 | 26360
liberty-mutual-group-property-inspection-prediction | 0.0061 | 0.0383 | 0.9961 | 0.0407 | 45875
coupon-purchase-prediction | 0.033 | 0.0134 | 0.9022 | 0.0279 | 18477
springleaf-marketing-response | 0.0092 | 0.052 | 0.9894 | 0.0553 | 39444
dato-native | 0.008 | 0.0721 | 0.9931 | 0.0823 | 3223
homesite-quote-conversion | 0.0028 | 0.0401 | 0.997 | 0.0417 | 36368
prudential-life-insurance-assessment | 0.0092 | 0.042 | 0.9933 | 0.0447 | 45490
bnp-paribas-cardif-claims-management | 0.0036 | 0.073 | 0.9964 | 0.0737 | 54516
home-depot-product-search-relevance | -0.0002 | 0.044 | 0.9999 | 0.047 | 35619
santander-customer-satisfaction | 0.028 | 0.0267 | 0.972 | 0.0282 | 93559
expedia-hotel-recommendations | 0.0006 | 0.019 | 0.9983 | 0.0269 | 22709
avito-duplicate-ads-detection | 0.0016 | 0.0632 | 0.9985 | 0.0754 | 8153
draper-satellite-image-chronology | 0.1325 | 0.0643 | 0.8211 | 0.1141 | 2734

Table A.6: Maximum Likelihood Estimates of the Distribution of Private Scores Conditional on Public Scores. Note: The conditional distribution is assumed to be given by pprivate = α + β ppublic + ε, with ε distributed according to a double exponential distribution. The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.'
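The likelihood in Table A.6 has a convenient structure: with double exponential (Laplace) errors, maximizing the likelihood over (α, β) amounts to minimizing the sum of absolute residuals. The sketch below illustrates this; it is not the authors' code, and the arrays p_public and p_private are hypothetical paired score vectors.

```python
# Minimal sketch (not the authors' code): MLE of
# p_private = alpha + beta * p_public + eps, with eps ~ Laplace.
# Maximizing this likelihood in (alpha, beta) is equivalent to
# least-absolute-deviations regression.
import numpy as np
from scipy.optimize import minimize

def laplace_mle(p_public, p_private):
    def neg_loglik(theta):
        alpha, beta, log_b = theta
        b = np.exp(log_b)                    # Laplace scale, kept positive
        resid = p_private - alpha - beta * p_public
        # Laplace log-density: -log(2b) - |resid| / b, summed over observations
        return np.sum(np.log(2.0 * b) + np.abs(resid) / b)
    start = np.array([0.0, 1.0, np.log(p_private.std() + 1e-8)])
    fit = minimize(neg_loglik, start, method="Nelder-Mead")
    alpha, beta, log_b = fit.x
    return alpha, beta, np.exp(log_b)
```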


Columns after 'Baseline' report the average change relative to the baseline.

Outcome | Baseline | No leaderboard | No Noise | 1 Prize | Limited Participation
Submissions | 14151.7390 | 5308.3878 | -566.9941 | -4.9069 | -1176.9672
(95% CI) | (14136.1268, 14167.3513) | (5288.2487, 5328.5269) | (-580.8436, -553.1445) | (-8.3663, -1.4476) | (-1187.1811, -1166.7534)
High type submissions | 13.5304 | -3.0907 | -0.6851 | 0.0032 | -0.9026
(95% CI) | (13.5155, 13.5453) | (-3.1062, -3.0753) | (-0.6997, -0.6705) | (-0.0012, 0.0076) | (-0.9129, -0.8923)
Score p95 | 0.9641 | -0.0080 | -0.0002 | -0.0000 | 0.0006
(95% CI) | (0.9641, 0.9641) | (-0.0080, -0.0080) | (-0.0002, -0.0001) | (-0.0000, 0.0000) | (0.0006, 0.0006)
Score p99 | 0.9810 | -0.0033 | -0.0001 | -0.0000 | 0.0003
(95% CI) | (0.9810, 0.9810) | (-0.0033, -0.0033) | (-0.0001, -0.0001) | (-0.0000, 0.0000) | (0.0003, 0.0003)
Score max | 0.9944 | -0.0005 | -0.0000 | -0.0000 | -0.0000
(95% CI) | (0.9944, 0.9945) | (-0.0005, -0.0005) | (-0.0001, -0.0000) | (-0.0000, 0.0000) | (-0.0001, -0.0000)

Table A.7: Contest Outcomes Under Counterfactual Contest Designs Using Estimates for the Flow Cost Specification of the Model. Note: Average changes are weighted by the size of the prize. The predictions are based on 2,000 simulations of each contest using our model estimates for the flow cost specification of the model. Submissions is the total number of submissions; high type submissions is the average number of submissions by each high-type contestant; score p95 (p99) is the 95th (99th) percentile of the score distribution of the contest; score max is the maximum score of the contest. A high type is defined as the player type that maximizes µθ + 3σθ for each contest (i.e., the player type best able to achieve extreme scores). Bootstrapped 95 percent confidence intervals in parentheses.
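Once the simulated contests are in hand, the statistics and intervals in Table A.7 are mechanical to compute. The sketch below shows one way to do it; the function names and arguments are ours, and the paper's own simulation code may differ.

```python
# Minimal sketch (not the authors' code): per-simulation outcome statistics
# and a percentile bootstrap for the mean, as summarized in Table A.7.
import numpy as np

def outcome_stats(scores, high_type_counts):
    # scores: all submission scores in one simulated contest
    # high_type_counts: number of submissions made by each high-type player
    return {"submissions": len(scores),
            "high_type_submissions": float(np.mean(high_type_counts)),
            "score_p95": float(np.percentile(scores, 95)),
            "score_p99": float(np.percentile(scores, 99)),
            "score_max": float(np.max(scores))}

def bootstrap_mean_ci(values, n_boot=1000, level=0.95, seed=0):
    # values: one outcome statistic collected across the simulated contests
    rng = np.random.default_rng(seed)
    v = np.asarray(values, dtype=float)
    means = np.array([rng.choice(v, size=v.size, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(means, [100 * (1 - level) / 2,
                                   100 * (1 + level) / 2])
    return v.mean(), (lo, hi)
```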

Contest | σ (Myopic) | SE | log L(δ̂)/N | σ (Forward Looking) | SE | log L(δ̂)/N | N
unimelb | 0.0054 | 0.0005 | -2.6273 | 0.0054 | 0.0006 | -2.6199 | 2800
RTA | 0.0053 | 0.0004 | -2.2421 | 0.0053 | 0.0003 | -2.2288 | 3129
ChessRatings2 | 0.0083 | 0.0008 | -2.1763 | 0.0083 | 0.0007 | -2.1606 | 1563
hhp | 0.0033 | 0.0001 | -3.5127 | 0.0033 | 0.0001 | -3.5103 | 25316
wikichallenge | 0.0062 | 0.0009 | -2.4604 | 0.0062 | 0.0008 | -2.4477 | 1020

Table A.8: Estimates of the Distribution of Submission Costs: Myopic versus Forward-Looking Players. Note: The model is estimated separately for each contest. Asymptotic standard errors are reported in the columns labeled 'SE.' In both models, we consider the case where players pay a fixed submission cost. The myopic model is our benchmark model, while the model with forward-looking players is discussed in Section 3.1.2.


[Figure A.1 appears here: the maximum public score (vertical axis, roughly 0.8 to 1) plotted against the fraction of time completed (horizontal axis, 0 to 1) for the Expedia Hotel Recommendation contest.]

Figure A.1: Evolution of the Maximum Public Score in the Expedia Hotel Recommendation Contest. The Jump in the Maximum Public Score Captures a Drastic Submission.

B    Effect of the Noise Parameter α

Consider the case in which the noise is proportional to the difference between the scores of Nature and the agent. Suppose the agent's score is y and Nature's score is x, with y > x. Let ∆ = y − x be the difference between the scores. Then the agent wins with probability

    α(∆) = 1/2 + ∆/(2∆max)   if ∆ ∈ [0, ∆max],
    α(∆) = 1                 otherwise.

We assume that ∆max is small and we perform comparative statics on ∆max. Notice that ∆max → 0 is the baseline case with α = 1, and recall that in this case the agent plays if and only if x ∈ Iθ = [aθ, aθ + bθ]. For simplicity, we assume that 2∆max < bθ. We have several cases.

• x < aθ − ∆max. In this case, the agent wins with probability one even without playing. Therefore, the best response is to not play.
• x ∈ [aθ − ∆max, aθ]. In this case, by not playing the agent wins with probability α(aθ − x), and by playing the agent wins with probability 1. Therefore, the agent plays if and only if 1 − c ≥ α(aθ − x), i.e., x ≥ aθ + (2c − 1)∆max.
• x ∈ [aθ, aθ + ∆max]. In this case, by not playing the agent wins with probability 1 − α(x − aθ), and by playing the agent wins with probability 1. Therefore, the agent plays if and only if 1 − c ≥ 1 − α(x − aθ), i.e., x ≥ aθ + (2c − 1)∆max.
• x ∈ [aθ + ∆max, aθ + bθ − ∆max]. In this case, by not playing the agent never wins, and by playing the agent wins with probability 1. Therefore, the agent always plays.
• x ∈ [aθ + bθ − ∆max, aθ + bθ]. In this case, by not playing the agent never wins, and by playing the agent wins with probability α(aθ + bθ − x). Therefore, the agent plays if and only if α(aθ + bθ − x) − c ≥ 0, i.e., x ≤ aθ + bθ − (2c − 1)∆max.
• x ∈ [aθ + bθ, aθ + bθ + ∆max]. In this case, by not playing the agent never wins, and by playing the agent wins with probability 1 − α(x − aθ − bθ). Therefore, the agent plays if and only if 1 − α(x − aθ − bθ) − c ≥ 0, i.e., x ≤ aθ + bθ − (2c − 1)∆max.
• x > aθ + bθ + ∆max. In this case, the agent loses for sure, and therefore the best response is to not play.

Denote by Iθ^∆max the set of values of x for which the agent's best response is to play. It is easy to see that for c < 1/2, Iθ^∆max ⊃ Iθ^0. Therefore, increasing the amount of noise increases participation by every type.
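As a quick numerical check of this comparative static, the cases above imply a play region of [aθ + (2c − 1)∆max, aθ + bθ − (2c − 1)∆max], which for c < 1/2 strictly contains [aθ, aθ + bθ]. The sketch below evaluates it directly; the parameter values are hypothetical.

```python
# Minimal sketch (not the authors' code): the play region derived above is
# [a + (2c - 1) * d_max, a + b - (2c - 1) * d_max], with d_max = Delta_max.
# For c < 1/2 it strictly contains the no-noise region [a, a + b].
def play_region(a, b, c, d_max):
    lo = a + (2.0 * c - 1.0) * d_max
    hi = a + b - (2.0 * c - 1.0) * d_max
    return lo, hi

a, b, c = 0.0, 1.0, 0.3              # hypothetical parameters, cost c < 1/2
for d_max in (0.0, 0.05, 0.1):       # more noise -> wider play region
    print(d_max, play_region(a, b, c, d_max))
# approx. output:
# 0.0  -> (0.0, 1.0)       baseline interval I_theta
# 0.05 -> (-0.02, 1.02)
# 0.1  -> (-0.04, 1.04)
```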
