Imperial College London Department of Computing

Combining Artificial Intelligence Methods: Automating the Playing of DEFCON by Robin Baumgarten

MSc in Advanced Computing Individual Project Report

Simon Colton Duncan F. Gillies September 2007

Abstract

In the commercial video game industry, computer opponents that act intelligently are increasingly important, especially as better graphical effects decline to serve as a driving force for the commercial success of a game. The methods used by developers to create these bots are often obsolete and struggle to scale with the complexity of modern games. Nonetheless, the modern artificial intelligence techniques used by researchers are rarely seen in video games. In this project, we designed and implemented a computer opponent for the real-time strategy game DEFCON by combining artificial intelligence methods such as case-based reasoning, decision tree algorithms and hierarchical planning. High-level strategy plans for matches are automatically created by querying a case base of recorded matches and building a plan decision tree. The development of an automated opponent for a complex video game required the application of many different techniques to receive, store, process and predict game information. For this purpose, alongside a high-level reasoning system, we use secondary AI techniques like simulated annealing and influence mapping to create a reactive and learning bot. We applied these techniques in DEFCON and created a competitive bot that consistently beats the AI bot developed by Introversion. The importance of small-scale tactics in this game requires careful unit control, which we incorporated through various methods, such as a movement desire model, fleet formations and a synchronous attack algorithm. Extensive testing was conducted to fine-tune the efficiency of these optimisation algorithms. Comprehensive training of high-level plans enabled our bot to learn potent strategies that achieve a win ratio of over 75% against the official AI bot developed for DEFCON by Introversion.

Acknowledgements

I am very grateful to my project supervisor, Simon Colton, for his constant advice and helpful hints during the project. His ability to let me realise my ideas and at the same time give competent guidance has been essential to my work. Thanks are also extended to the Introversion team for providing their well-written source code to DEFCON, their great technical support and interest in my project. My fellow students and now friends at Imperial College deserve my thanks for providing an enjoyable work environment and much good advice throughout the entire course. My deepest thanks go to my parents and my sister for their encouragement. Finally, I owe very special thanks to Judith Vogelsang for her caring support and for having been more nervous about my exams and thesis than I was.

‘A strange game. The only winning move is not to play. How about a nice game of chess?’ Joshua — WarGames, 1983

Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Motivation
  1.2 Aims of the Project
  1.3 Contributions
  1.4 Report Structure
2 Artificial Intelligence Methods
  2.1 Machine Learning
    2.1.1 Decision Tree Learning
    2.1.2 Genetic Algorithms
    2.1.3 Artificial Neural Networks
    2.1.4 Reinforcement Learning
  2.2 Case-Based Reasoning
  2.3 Other Problem Solving Methods
    2.3.1 Planning
    2.3.2 Simulated Annealing
3 Artificial Intelligence in Games
  3.1 Use of AI in Games in the Past
  3.2 AI Techniques specific to Games
    3.2.1 Dynamic Scripting
    3.2.2 Minimax Decision Trees
    3.2.3 Plan Recognition
    3.2.4 Finite State Machines
    3.2.5 Influence Maps
  3.3 Board Games
  3.4 Video Games
    3.4.1 Academic Research
    3.4.2 Commercial Games
  3.5 DEFCON
    3.5.1 Description of DEFCON
    3.5.2 AI in DEFCON
4 Overview of a DEFCON Match
  4.1 Initialisation
  4.2 During the Match
  4.3 End of Match
5 System Design
  5.1 Planning
    5.1.1 Metafleets
    5.1.2 Attacks
  5.2 Learning
    5.2.1 Case Based Reasoning
    5.2.2 Decision Tree Generalisation
    5.2.3 Evolutionary Based Plan Generation
6 Learning a Case Base of Plans
  6.1 Building a Case
    6.1.1 Plan Representation
    6.1.2 Structure Information
    6.1.3 Recording Opponent Data
  6.2 Filling the Case Base
7 Automatically Generating Game Plans
  7.1 Case Retrieval
    7.1.1 Using General Game Information
    7.1.2 Using Opponent Behaviour Information
  7.2 Decision Tree Generalisation
    7.2.1 Plans as Training Examples
    7.2.2 Extending the Entropy Function
    7.2.3 ID3 for Plan Generalisation
  7.3 Evolutionary Based Plan Generation
  7.4 Worked Example of Generating a Plan
8 Carrying Out Game Plans
  8.1 Initial Structure Placement
  8.2 Handling High-Level Plans
    8.2.1 Controlling Attacks
    8.2.2 Controlling Metafleets
  8.3 Low Level Actions
  8.4 Influence Maps
  8.5 Movement Desire Model
  8.6 Ship Formation
  8.7 Resolving the Assignment Problem
9 Experimentation and Results
  9.1 Test Setup
  9.2 Results
  9.3 Analysis
10 Conclusion
  10.1 Applications
  10.2 Limitations & Future Work
  10.3 Final Summary
Appendices
A Selected Source Code
  A.1 Simulated Annealing Algorithm
  A.2 Selecting Optimal Structure Placement
  A.3 Implementation of the ID3 algorithm
B Proofs
Glossary
Bibliography

1 Introduction

1.1 Motivation

Computer games have become an increasingly popular test bed for applications of AI methods. While algorithms like the A* path-finding algorithm are now inherent parts of nearly every game, more advanced research methods, machine learning methods in particular, are still relatively rare. But now that the role of entertaining AI in commercial computer games has changed from an optional add-on to an integral component, applying these methods and using research knowledge has become more important than it was a few years ago. Single-player games have always needed an artificial opponent that acts at least vaguely human, but the simple measures that were once sufficient to simulate this behaviour are no longer able to scale to the immense complexity of modern computer games. This complexity can make it very hard for developers to take every aspect and possible action into account when designing the behaviour of the AI. Machine learning provides a flexible remedy to this problem. Several methods have been developed in a research context that are able to automatically explore the often extensive possibilities of a game world and extract rules to create competitive and entertaining control of non-player characters. DEFCON from Introversion Software is a game that exhibits these complex characteristics. In this real-time strategy game, up to six participating parties use offensive and defensive means to obtain an advantage over their opponents. The very large state space and the reactive nature of this game make it a very interesting test bed for machine learning methods.


1.2 Aims of the Project

The goal of this project is to design and implement a computer opponent (bot) for the real-time strategy game DEFCON that improves on the existing AI bot through the application of advanced research methods, and to demonstrate the soundness and superiority of the chosen methods. The inclusion of previous matches in strategic considerations and the creation of individual strategic profiles specific to the participating players are intended to enable responsive behaviour while also maintaining background knowledge, in order to avoid repeating mistakes and to create a diverse set of strategies. Research into the existing AI bot of DEFCON and into state-of-the-art AI methods is used to reveal possible improvements and to yield approaches that are applicable to this game. The use of hierarchical planning and influence maps will be examined to cover the broad range of actions required in a typical real-time strategy (RTS) game, namely a fair amount of foresight and long-term strategy as well as short-term tactics. To support the chosen approach, we evaluate the performance against the existing AI with respect to both the employed learning algorithms and the improved low-level actions and metrics used in location evaluation methods.

1.3 Contributions

We developed a new AI bot for the video game DEFCON, which is able to beat the bot built by the makers of this game, Introversion, with a probability of more than 75%. It employs hierarchical planning that combines improved low-level actions, which rely on a simulated annealing optimisation process and a desire-based movement model, with high-level actions that control the strategy of the bot. Our approach implements a combination of AI techniques — Case-Based Reasoning, Decision Trees, Planning and Evolutionary-Based Plan Generation — to include previous game experience in strategic considerations and create individual


high-level plans specific to the current environment of a game. We applied these techniques successfully to DEFCON, where we have created a superior computer opponent that can outperform the existing AI bot with a win-ratio of over 75%.

1.4 Report Structure

Chapter 2: Artificial Intelligence Methods provides an overview of current AI methods, including machine learning techniques and optimisation algorithms.

Chapter 3: Artificial Intelligence in Games shows how AI methods have been used in games and presents the computer game DEFCON, of which a formal description is given. The existing bot in DEFCON is also analysed.

Chapter 4: Overview of a DEFCON Match gives an outline of a match of DEFCON, putting the techniques explained in the next chapters into context.

Chapter 5: System Design explains the tasks of a computer opponent and justifies the choice of AI methods used. The structure of a strategy plan is developed and its ramifications for the system analysed.

Chapter 6: Learning a Case Base of Plans details the learning process and the use of a case base to store recorded matches.

Chapter 7: Automatically Generating Game Plans explains how the case base is used to create game plans with the help of a decision tree.

Chapter 8: Carrying out Game Plans shows how acquired plans are implemented into concrete actions in the game.

Chapter 9: Experimentation and Results presents experiments that were carried out to guide the development process and to evaluate the performance of the learning algorithm.

Chapter 10: Conclusion summarises the achievements of the project and outlines limitations and future work.

2 Artificial Intelligence Methods

The task of writing a computer program that plays a video game against a human opponent requires methods that are intelligent, or at least appear intelligent. The field concerned with creating and studying intelligent agents, which Russell and Norvig [1995] define as agents that perceive their environment and execute actions that maximise their chances of success, is known as Artificial Intelligence (AI). Like a human player, we want our playing agent (bot) to learn from the actions of the opponent and from its own mistakes. This learning process should take place automatically, and algorithms that have been developed for this task are known as Machine Learning (ML) algorithms, a subtopic of AI. Using machine learning we can approximate a model of the game world and the behaviour of the opponents. This is useful for predicting future states of the world, helping our bot to come to the right decisions. Another topic in AI is general problem-solving algorithms. These techniques include applying heuristics and elaborate data structures to break down the complexity of an otherwise potentially NP-hard problem, i.e. a problem that cannot, in general, be solved exactly by conventional computers in a reasonable amount of time. Many AI methods adopt ideas from nature, the brain and physics. For example, genetic algorithms incorporate the concepts of survival of the fittest and genetic mutation to generate a “population” of solutions that evolves towards an optimum. Another example is simulated annealing, which is inspired by the controlled heating and cool-down process of metals used to create crystals and remove impurities. We will apply some of these AI methods, using machine learning and other problem solving methods, to create a bot for a video game. As can be seen in the


later chapters, we employ decision tree algorithms, case-based reasoning, a form of evolutionary algorithms, planning and simulated annealing. In this chapter, we present all of these methods and also include methods that we initially considered and compare our solutions to.

2.1 Machine Learning

2.1.1 Decision Tree Learning

Figure 2.1: A sample decision tree (Image taken from Wikipedia [2005])

A decision tree is a model to classify instances of a problem into leaf nodes of a tree structure. These leaf nodes represent conclusions drawn by a predictive algorithm, while the intermediate nodes represent tests for specific attributes. Therefore, the decision function is represented by the tree. Due to the discrete nature of a tree, problems with discrete-valued target functions are especially suited for decision tree learning. In particular, Mitchell [1997] proposes its use for the following problem classes:

• Instances that can be represented by attribute-value pairs, optimally with a discrete value space.
• Problems that can be described by disjunctive expressions.
• Non-perfect training data. Decision tree learning methods can cope well with errors in training data.
• Incomplete information in training data.

Decision tree learning methods recursively split training data into subsets. The instances belonging to each subset are decided by an attribute value test which is then associated with the corresponding node in the learned tree.

ID3

ID3 (Iterative Dichotomiser 3) is an iterative algorithm that is used to construct a decision tree from a given set of examples. It uses the notions of entropy and information gain:

Definition 2.1. Let S be a collection of examples of a target concept that can take on c different values.

(i) The entropy of S relative to this c-wise classification is given as

    Entropy(S) \equiv \sum_{i=1}^{c} -p_i \log_2 p_i    (2.1)

where p_i is the proportion of examples classified into the i-th category.

(ii) The gain of an attribute A relative to S is defined as

    Gain(S, A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)    (2.2)

where Values(A) is the set of all possible values of A and S_v = \{ s \in S \mid A(s) = v \}.


Thus the entropy of a set measures the purity of its elements with respect to the target classification. The gain is a measure of how well an attribute classifies the given data, i.e. how much the entropy changes when we split the set according to that attribute. The ID3 algorithm builds a decision tree by iteratively choosing the attribute with the highest information gain to split the data. It is a greedy algorithm that grows the decision tree top-down until all attributes have been used or all examples are perfectly classified. A pseudo-code listing of ID3 for boolean classification is given in figure 2.2 below.
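To make these definitions concrete, the following minimal C++ sketch computes Entropy(S) and Gain(S, A) as given by equations 2.1 and 2.2. It is only an illustration: the Example structure, the attribute indexing and the toy data are assumptions made for this sketch and are not taken from the DEFCON bot described later.

```cpp
#include <cmath>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A training example: discrete attribute values plus a target label.
struct Example {
    std::vector<std::string> attributes;  // attribute values, indexed by attribute id
    std::string label;                    // target classification
};

// Entropy(S) = sum_i -p_i * log2(p_i), with p_i the proportion of the i-th label (eq. 2.1).
double entropy(const std::vector<Example>& s) {
    if (s.empty()) return 0.0;
    std::map<std::string, int> counts;
    for (const Example& e : s) counts[e.label]++;
    double h = 0.0;
    for (const auto& kv : counts) {
        double p = static_cast<double>(kv.second) / s.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v) (eq. 2.2),
// where S_v contains the examples whose attribute A has value v.
double gain(const std::vector<Example>& s, std::size_t attribute) {
    std::map<std::string, std::vector<Example>> partitions;
    for (const Example& e : s) partitions[e.attributes[attribute]].push_back(e);
    double remainder = 0.0;
    for (const auto& kv : partitions)
        remainder += (static_cast<double>(kv.second.size()) / s.size()) * entropy(kv.second);
    return entropy(s) - remainder;
}

int main() {
    // Toy data: one attribute ("windy") and a boolean target ("play").
    std::vector<Example> s = {
        {{"yes"}, "no"}, {{"yes"}, "no"}, {{"no"}, "yes"}, {{"no"}, "yes"}, {{"no"}, "no"}};
    std::cout << "Entropy(S) = " << entropy(s) << "\n";        // ~0.971
    std::cout << "Gain(S, windy) = " << gain(s, 0) << "\n";    // ~0.420
}
```

ID3 would evaluate gain once per remaining attribute and split on the attribute with the largest value, as in the pseudo-code of figure 2.2.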

input : Examples, TargetAttribute, Attributes
output: Decision tree that correctly classifies the Examples

Create a root node Root for the tree;
if all examples are positive then
    return the single-node tree Root, with label = +
end
if all examples are negative then
    return the single-node tree Root, with label = -
end
if the set of predicting attributes is empty then
    return the single-node tree Root, with label = most common value of the target attribute in the examples
else
    A = the attribute that best classifies the examples;
    decision tree attribute for Root = A;
    foreach possible value vi of A do
        add a new tree branch below Root, corresponding to the test A = vi;
        let Examples(vi) be the subset of examples that have the value vi for A;
        if Examples(vi) is empty then
            below this new branch add a leaf node with label = most common target value in the examples
        else
            below this new branch add the subtree ID3(Examples(vi), TargetAttribute, Attributes - {A})
        end
    end
end
return Root

Figure 2.2: Summary of the ID3 algorithm [Mitchell, 1997]

2.1.2 Genetic Algorithms

The concept of Genetic Algorithms (GAs) is loosely based on the theory of biological evolution. Beginning with an initial set (called a population) of individuals, the genetic algorithm applies random modifications (called mutations) and recombinations (called crossovers) to appropriate members of this set to form the next population. Fitness functions are used to probabilistically determine these appropriate members. This process is repeated until a hypothesis reaches a certain optimality condition. Hypotheses are usually bit-strings or computer programs; in the latter case the concept is called genetic programming. As promising hypotheses are more likely to be explored, the method of genetic algorithms forms a generate-and-test, hill-climbing beam search. GAs tend to be computationally expensive, as a big search space has to be covered to find optimal solutions. GAs also do not guarantee performance or success, due to the randomness involved. Furthermore, GAs often need extensive tweaking and testing: it is sometimes hard to see whether a bad solution was caused by faulty code or by unevolved behaviour, and it is nearly impossible to add functionality to a problem solved by GAs; for example, the addition of a new attribute (extending the gene) usually requires a complete restart of the learning process. However, Schwab [2004] states that they can be useful for problems that involve

• highly non-linear interacting parameters
• many local maxima
• discontinuous output solution functions
• costly computation of decisions. GAs can be used to replace this costly function with a learned algorithm.

In addition, genetic algorithm learning is easy to set up and produces results immediately.
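The generational loop described above can be sketched as follows: fitness-proportionate selection, single-point crossover and random mutation applied to bit-string hypotheses. The one-max fitness function, the parameter values and the random seed are placeholders chosen purely for illustration; they are not the evolutionary scheme used later in this project.

```cpp
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

using Genome = std::vector<int>;  // bit-string hypothesis
std::mt19937 rng(42);

// Placeholder fitness: number of set bits ("one-max" toy problem).
double fitness(const Genome& g) {
    return static_cast<double>(std::count(g.begin(), g.end(), 1));
}

// Fitness-proportionate (roulette-wheel) selection of one parent.
const Genome& select(const std::vector<Genome>& pop, const std::vector<double>& fit) {
    std::discrete_distribution<std::size_t> pick(fit.begin(), fit.end());
    return pop[pick(rng)];
}

// Single-point crossover of two parents.
Genome crossover(const Genome& a, const Genome& b) {
    std::uniform_int_distribution<std::size_t> cut(1, a.size() - 1);
    std::size_t c = cut(rng);
    Genome child(a.begin(), a.begin() + c);
    child.insert(child.end(), b.begin() + c, b.end());
    return child;
}

// Flip each bit with a small probability.
void mutate(Genome& g, double rate) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    for (int& bit : g)
        if (coin(rng) < rate) bit = 1 - bit;
}

Genome evolve(std::size_t popSize, std::size_t genomeLen, int generations) {
    std::uniform_int_distribution<int> bit(0, 1);
    std::vector<Genome> pop(popSize, Genome(genomeLen));
    for (Genome& g : pop)
        for (int& b : g) b = bit(rng);

    for (int gen = 0; gen < generations; ++gen) {
        std::vector<double> fit;
        for (const Genome& g : pop) fit.push_back(fitness(g) + 1e-9);  // avoid an all-zero wheel
        std::vector<Genome> next;
        while (next.size() < popSize) {
            Genome child = crossover(select(pop, fit), select(pop, fit));
            mutate(child, 0.01);
            next.push_back(child);
        }
        pop = std::move(next);
    }
    // Return the fittest individual of the final population.
    return *std::max_element(pop.begin(), pop.end(),
        [](const Genome& a, const Genome& b) { return fitness(a) < fitness(b); });
}

int main() {
    Genome best = evolve(/*popSize=*/50, /*genomeLen=*/32, /*generations=*/100);
    std::cout << "best fitness: " << fitness(best) << "\n";
}
```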


2.1.3 Artificial Neural Networks

Figure 2.3: An artificial neural network with a hidden layer

Like genetic algorithms, Artificial Neural Networks (ANNs) are inspired by biological processes, in this case by the process of learning and the neurological functions of the brain. Neurons are modelled by nodes of a layered, directed network (Figure 2.3). The so-called hidden layers between the input layer and the output layer (each of which has a corresponding node for each input/output) have a predefined number of nodes and store the information of the ANN. Each layer is usually fully connected to the next layer, and each arc is associated with a variable weight. Given an input¹, the activation function of a node determines its output, considering the sum of the weighted incoming signals². These weights can be adjusted to fit the given data through a gradient descent search with methods like backpropagation. There are several points to consider when using ANNs. They can extract abstract relationships and complex mathematical functions well out of training data, and are easy to train and use once set up. But setting up the ANN and determining how to train it is often as hard a problem as solving the original problem itself, and inconsistent data might lead to networks that learn the wrong thing or overfit.

¹ Often binary, but continuous data can also be used by applying proper activation functions like the hyperbolic tangent or logistic sigmoid functions [Schwab, 2004].
² Additionally, biases can be added to this sum to reflect the inhibitory or excitatory effect of the neuron on the network [Schwab, 2004].


Also, it is hard to scale an ANN, and to debug or understand the weights in a trained network. Accordingly, the use of ANNs has been proposed for problems where:

• instances are given through attribute-value pairs
• a real-valued or discrete-valued target function (or a vector thereof) is required
• the training data might contain errors
• training is allowed to take a lot of time
• fast evaluation of the target function is desired
• it is not important for humans to understand the resulting network.
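The forward pass through such a network, each node applying an activation function to the weighted sum of its inputs, can be sketched in a few lines. The layer sizes, the random (untrained) weights and the choice of the logistic sigmoid are illustrative assumptions only; no backpropagation is shown.

```cpp
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Logistic sigmoid activation function.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One fully connected layer: output_j = sigmoid(bias_j + sum_i w[j][i] * input_i).
std::vector<double> layerForward(const std::vector<double>& input,
                                 const std::vector<std::vector<double>>& weights,
                                 const std::vector<double>& biases) {
    std::vector<double> output;
    for (std::size_t j = 0; j < weights.size(); ++j) {
        double sum = biases[j];
        for (std::size_t i = 0; i < input.size(); ++i)
            sum += weights[j][i] * input[i];
        output.push_back(sigmoid(sum));
    }
    return output;
}

int main() {
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> w(-1.0, 1.0);
    // 3 inputs -> 4 hidden nodes -> 1 output, with random placeholder weights.
    std::vector<std::vector<double>> hiddenW(4, std::vector<double>(3));
    std::vector<double> hiddenB(4), outB(1);
    std::vector<std::vector<double>> outW(1, std::vector<double>(4));
    for (auto& row : hiddenW) for (double& x : row) x = w(rng);
    for (double& x : hiddenB) x = w(rng);
    for (auto& row : outW) for (double& x : row) x = w(rng);
    for (double& x : outB) x = w(rng);

    std::vector<double> input = {0.2, 0.7, 0.1};
    std::vector<double> hidden = layerForward(input, hiddenW, hiddenB);
    std::vector<double> output = layerForward(hidden, outW, outB);
    std::cout << "network output: " << output[0] << "\n";
}
```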

2.1.4 Reinforcement Learning

Reinforcement learning (RL) is a variant of machine learning that does not require predefined training examples but rather proposes examples itself and learns through feedback. RL is concerned with unsupervised learning of autonomous agents through long-term rewards. It is therefore fundamentally different from many other machine learning methods like neural networks and genetic algorithms, where knowledgeable supervisors are required. Agents use stochastic rules (policies), depending on the current state, to choose actions in the environment and receive rewards as an evaluation of the choices made. The aim of the agent is to maximise the rewards received for the actions taken, thereby optimising the actions needed to reach an (implicit) goal. To limit the often vast search space, algorithms are needed that find a balance between exploration and exploitation. Several approaches exist to tackle the problem of finding an appropriate policy:

Dynamic programming requires a complete model of the environment, but is mathematically well developed.


Monte Carlo methods are simpler and do not need a model, but are not suited to incremental computation.

Temporal-difference methods do not require a model and are incremental, but are more complex to analyse [Sutton et al., 1998].

The rate of convergence of an RL method is limited by the size of the state space (roughly determined by the number of inputs and the actions possible in each state), and therefore a compact but information-rich representation of the state space must be found to ensure reasonably fast convergence [Baekkelund, 2006].
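As a concrete instance of a temporal-difference method, the sketch below shows the standard tabular Q-learning update together with an epsilon-greedy policy for balancing exploration and exploitation. The state and action encoding, the reward and all parameter values are placeholders; they are not drawn from the thesis itself.

```cpp
#include <algorithm>
#include <array>
#include <random>

constexpr int kNumStates = 16;
constexpr int kNumActions = 4;

// Tabular action-value function Q(s, a), initialised to zero.
std::array<std::array<double, kNumActions>, kNumStates> Q{};
std::mt19937 rng(7);

// Epsilon-greedy selection: explore with probability epsilon, otherwise exploit.
int chooseAction(int state, double epsilon) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<int> any(0, kNumActions - 1);
        return any(rng);
    }
    int best = 0;
    for (int a = 1; a < kNumActions; ++a)
        if (Q[state][a] > Q[state][best]) best = a;
    return best;
}

// Temporal-difference (Q-learning) update after taking `action` in `state`,
// receiving `reward` and observing `nextState`:
//   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
void update(int state, int action, double reward, int nextState,
            double alpha, double gamma) {
    double bestNext = Q[nextState][0];
    for (int a = 1; a < kNumActions; ++a)
        bestNext = std::max(bestNext, Q[nextState][a]);
    Q[state][action] += alpha * (reward + gamma * bestNext - Q[state][action]);
}

int main() {
    // Toy usage: a single transition with a made-up reward.
    int s = 0;
    int a = chooseAction(s, /*epsilon=*/0.1);
    update(s, a, /*reward=*/1.0, /*nextState=*/1, /*alpha=*/0.1, /*gamma=*/0.9);
}
```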

2.2 Case-Based Reasoning

Case-Based Reasoning (CBR) is the method of solving problems using the solutions of similar past problems. These past problems are stored as cases in a repository called the case memory. Cases usually comprise a problem description, a problem solution and the outcome. CBR is an example of a lazy learning method: it postpones generalisation of the stored instances until a new instance is received. Upon reception of such an instance, a typical CBR algorithm goes through the following stages [Aamodt and Plaza, 1994]:

1. Case retrieval: The case base is searched to retrieve the cases which are closest to the new instance, according to some fitness measure. There exist several methods for selecting the best cases, among them k-nearest neighbour and locally weighted regression.

2. Case reuse: The retrieved cases are reused to generate an approximate solution to the new problem instance.

3. Case revision: If the proposed solution does not fit the instance closely, it is adapted (through the use of heuristics), creating a new case, which can be retained.

4. Case retaining: If the instance is new to the case base, it can be added to it together with its outcome.


The advantage of instance-based learning methods like CBR is that they can estimate the target function locally and differently for each observed instance and, in fact, do not have to construct a global approximation of the target function at all. On the other hand, a lazy learner has to do nearly all of its computation when a new instance has to be classified, rather than when the training examples are encountered. Typical applications include help-desk systems, text classification and e-commerce.
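The retrieval stage can be illustrated with a k-nearest-neighbour search over numeric case features, one of the selection methods mentioned above. The Case structure, the Euclidean distance measure and the example case base are assumptions made for this sketch and do not correspond to the plan representation used later for DEFCON.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// A stored case: problem description (numeric features), solution and outcome.
struct Case {
    std::vector<double> features;  // problem description
    std::string solution;          // e.g. the plan that was used
    double outcome;                // e.g. the final score that was achieved
};

// Euclidean distance between a new problem description and a stored case.
double distance(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(d);
}

// Retrieval: return the k cases closest to the query (k-nearest neighbour).
std::vector<Case> retrieve(const std::vector<Case>& caseBase,
                           const std::vector<double>& query, std::size_t k) {
    std::vector<Case> sorted = caseBase;
    std::sort(sorted.begin(), sorted.end(),
              [&query](const Case& x, const Case& y) {
                  return distance(x.features, query) < distance(y.features, query);
              });
    if (sorted.size() > k) sorted.resize(k);
    return sorted;
}

int main() {
    // Illustrative case base with two-dimensional problem descriptions.
    std::vector<Case> caseBase = {
        {{0.2, 0.5}, "defensive plan", 12.0},
        {{0.9, 0.1}, "aggressive plan", 30.0},
        {{0.8, 0.2}, "naval plan", 25.0},
    };
    std::vector<Case> nearest = retrieve(caseBase, {0.85, 0.15}, 2);
    return static_cast<int>(nearest.size());  // the two most similar cases
}
```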

2.3 Other Problem Solving Methods

2.3.1 Planning

Planning is the general notion of deciding on a series of actions before acting, using knowledge to work towards a (long-term) goal. According to Schwab [2004], many planning algorithms share a simple underlying strategy:

• Divide the actions possible in the environment into operators.
• Use a state-based design of the environment.
• Construct a tree, or apply operators to states, to search for actions leading towards the goal.

He states that by using planning, the perceived intelligence of an AI system can be improved. Generic planning algorithms can be implemented hierarchically to break down the complexity and cost of planning. However, attempting to plan too far in advance can still be very expensive. Goal-Oriented Action Planning (GOAP) has been proposed as a planning method by Orkin [2004], which incorporates the scheme outlined above. It is described as an improvement over finite state machines as it does not require encoding of all possible action sequences and allows for dynamic changes in these sequences. There are several languages proposed for plans:

Stanford Research Institute Problem Solver (STRIPS) is the name of the formal language used for the STRIPS planner (originally, STRIPS was the name of the planner itself, but was later adopted as the name of the language used for it) proposed by Fikes [1971]. A STRIPS instance consists of an initial state, a set of conditions, a set of operators and a goal state. An example is given in figure 2.4.

Initial state: BoxAt(A), At(B), Level(low), BananasAt(C)
Goal state:    Have(Bananas)

Actions:
    Move(X, Y)
        Preconditions:  At(X), Level(low)
        Postconditions: not At(X), At(Y)
    ClimbUp(Location)
        Preconditions:  BoxAt(Location), At(Location), Level(low)
        Postconditions: Level(high), not Level(low)
    ClimbDown(Location)
        Preconditions:  BoxAt(Location), At(Location), Level(high)
        Postconditions: Level(low), not Level(high)
    MoveBox(X, Y)
        Preconditions:  BoxAt(X), At(X), Level(low)
        Postconditions: not BoxAt(X), BoxAt(Y)
    TakeBananas(Location)
        Preconditions:  BananasAt(Location), Level(high), At(Location)
        Postconditions: Have(Bananas)

Figure 2.4: An example STRIPS plan.

Action Description Language (ADL) is a planning system for robots that assumes an “open world”, i.e., unlike STRIPS it assumes that everything not occurring in conditions is unknown (instead of false) [Gelfond and Lifschitz, 1998].

Planning Domain Description Language (PDDL) is a common language to express plans. PDDL tries to standardise planning domain languages and as such contains STRIPS and ADL. It is used by many planners and is the language in use for planning contests. The current version is PDDL3, which does not allow numerical expressions in actions and cannot easily express concurrency.

Hierarchical Task Network Language is a language to describe the dependencies among actions as a network; tasks are divided into primitive, compound and goal tasks. It is useful for describing the hierarchical structure of a plan, and it has the same expressivity as STRIPS [Erol et al., 1994].

Reactive Model-based Programming Language (RMPL) is a language designed by Williams [2001] for Robotic Space Explorers. It is powerful enough to represent concurrent actions, but requires a big framework to be used. Plans expressed in RMPL can be converted to Constraint Satisfaction Problems (see below) through the use of Temporal Plan Networks, which represent all possible executions of an RMPL program over a finite window.

Constraint Satisfaction Problems (CSPs): This is a simple and powerful formalism used to solve combinatorial problems. Although this is not directly a plan description language, many of the above-mentioned languages are reduced to CSPs to take advantage of the existing fast algorithms for solving CSPs, for example by Williams [2001].

Hierarchical Planning

Hierarchical Task-Network Planning (HTNP) extends the standard planning model by applying reasoning to high-level tasks rather than to individual actions [Muñoz-Avila and Hoang, 2006]. These tasks are represented by methods which encode subtasks that break down the original task into simpler ones. This simplification continues until tasks are fully decomposed into actions.
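The core decomposition idea can be sketched as a recursive expansion of compound tasks into subtasks until only primitive actions remain. The task names and the decomposition table below are invented purely for illustration and do not reflect the planner developed later in this report.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Decomposition methods: each compound task maps to an ordered list of subtasks.
// Tasks without an entry are treated as primitive actions. (Illustrative names only.)
const std::map<std::string, std::vector<std::string>> methods = {
    {"WinMatch",      {"PlaceStructures", "NavalPhase", "MissileStrike"}},
    {"NavalPhase",    {"FormFleets", "MoveFleets", "EngageEnemyFleets"}},
    {"MissileStrike", {"PositionSubmarines", "LaunchMissiles"}},
};

// Recursively decompose a task into primitive actions (depth-first, in order).
void decompose(const std::string& task, std::vector<std::string>& plan) {
    auto it = methods.find(task);
    if (it == methods.end()) {  // primitive task: becomes an action in the plan
        plan.push_back(task);
        return;
    }
    for (const std::string& sub : it->second)  // compound task: expand its subtasks
        decompose(sub, plan);
}

int main() {
    std::vector<std::string> plan;
    decompose("WinMatch", plan);
    for (const std::string& action : plan) std::cout << action << "\n";
}
```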


2.3.2 Simulated Annealing

Simulated annealing is an optimisation method inspired by metallurgy: in a metal annealing process, a molten substance is slowly cooled from a high temperature such that the system stays approximately in thermodynamic equilibrium, which means that the temperature is roughly the same at all locations within the substance. As it cools down, the system becomes more ordered and arrives in a “frozen” state at the end, which, in metallurgy, is a crystallised block of metal with very few impurities. Simulated annealing maps this analogy onto combinatorial problems, as shown by Kirkpatrick et al. [1982] and Cerny [1985]. The current solution to the given problem corresponds to the current state of the thermodynamic system. The temperature of the molten substance is mirrored by the objective function. “Cooling down” is then the process of optimising this objective function until we arrive in the “frozen” state, which is ideally the global minimum. The molecules move freely while the substance is molten. This is mirrored by allowing the current solution to move in the search space and to grow to some extent (in a minimisation problem) in order to avoid local minima. As the solution is “cooled”, this movement becomes more restricted until finally only moves are allowed that improve the solution, i.e. a gradient descent search behaviour emerges. Pseudo-code for a simulated annealing algorithm is given in figure 2.5. This strategy of allowing suboptimal moves has proven very effective in practice and leads to fast convergence [Kirkpatrick et al., 1982].


// initial state
s = s0;
// initial fitness
f = F(s);
// initial temperature
t = t0;
// while temperature is high enough and required fitness not reached
while (t > tmin and f < fmax) {
    // find some neighbour state
    sn = neighbour(s);
    fn = F(sn);
    // is the neighbour state within temperature range of the old fitness?
    if (fn < f * (1 + t)) {
        // if so, change to that state
        s = sn;
        f = fn;
    }
    // reduce temperature by a cooldown factor c, c < 1
    t = t * c;
}
return s

Figure 2.5: Simplified pseudo code algorithm for simulated annealing.

3 Artificial Intelligence in Games

In this chapter, the present use of AI methods in games is studied and an adequate candidate for their application, the commercial real-time strategy game DEFCON, is presented.

3.1 Use of AI in Games in the Past

Computer games and Artificial Intelligence (AI) research are relatively young; both emerged with the dawn of the information age around the early 1950s. Their combination is even younger, as early game machines did not have the computational power or the storage space required to implement AI techniques. While it was possible for most game developers to attract gamers through innovative and more realistic graphics in recent years, the advances in graphics are decelerating and the differences between new generations of games are becoming harder to spot. The attention of developers is therefore beginning to shift towards more intelligent behaviour of non-player characters and opponents [Mateas, 2003]. Game AI is developing from a rather small and unimportant feature into an increasingly demanded and critical part of current games [Schwab, 2004]. Similarly, the rise in complexity of these games makes it harder to implement proper AI engines and requires careful software design. On the other hand, academic researchers are beginning to discover interactive computer games as a ‘killer application’ for human-level AI [Laird and Lent, 2000]. These games – unlike classical board games such as chess – do not require brute-force calculations and optimisation algorithms, but rather a simulation of human behaviour and reactions, including moods, faults, anticipation and planning.


3.2 AI Techniques specific to Games

While most of the methods mentioned in the last chapter have been applied to games, there are some techniques which are used almost exclusively in games and thus are presented here.

3.2.1 Dynamic Scripting

Dynamic Scripting (DS) is an often-used, developer-driven control method for AI bots. It uses domain knowledge to restrict the search space of actions in a game, allowing real-time adaptation while at the same time ensuring plausibility. It can therefore be seen as a modification of reinforcement learning, which also learns by trial and error and through delayed rewards; reinforcement learning, however, requires a small search space to converge properly, whereas DS uses conventional scripts to limit the state-action space. These scripts (rules) have to be written manually by the programmer and are stored in a rulebase together with an associated, dynamic and automatically generated weight. Each time the algorithm is initialised, rules are selected randomly with a probability according to their weights. At the end of the algorithm, a fitness function evaluates the outcome and adjusts the weights of the rules. DS can be applied to problems that satisfy the following requirements, as argued by Spronck [2006]:

• The AI bot can be scripted as a series of rules.
• Effective AI rules can be written by a developer.
• It is possible to assess the performance of a script through a fitness function.
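The two core mechanisms just described, weighted random rule selection and fitness-driven weight adjustment, can be sketched as follows. The rule strings, the learning rate and the weight floor are simplified placeholders; full dynamic scripting implementations typically use a more elaborate weight update scheme.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

struct Rule {
    std::string script;  // hand-written rule (a scripted behaviour fragment)
    double weight;       // dynamically adapted weight
};

std::mt19937 rng(3);

// Select `count` rules for a script, each with probability proportional to its weight.
std::vector<std::size_t> selectRules(const std::vector<Rule>& rulebase, std::size_t count) {
    std::vector<double> weights;
    for (const Rule& r : rulebase) weights.push_back(r.weight);
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
    std::vector<std::size_t> chosen;
    for (std::size_t i = 0; i < count; ++i) chosen.push_back(pick(rng));
    return chosen;
}

// After the encounter, a fitness value in [-1, 1] rewards or punishes the rules used.
void adjustWeights(std::vector<Rule>& rulebase,
                   const std::vector<std::size_t>& used, double fitness) {
    const double learningRate = 0.2;
    const double minWeight = 0.1;  // keep every rule selectable
    for (std::size_t index : used)
        rulebase[index].weight =
            std::max(minWeight, rulebase[index].weight * (1.0 + learningRate * fitness));
}

int main() {
    // Placeholder rulebase with equal initial weights.
    std::vector<Rule> rulebase = {{"rush with bombers", 1.0},
                                  {"defend silos", 1.0},
                                  {"scout with fighters", 1.0}};
    std::vector<std::size_t> used = selectRules(rulebase, 2);  // build a script of two rules
    adjustWeights(rulebase, used, /*fitness=*/0.5);            // the encounter went fairly well
}
```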

3.2.2 Minimax Decision Trees

A minimax tree is a special form of decision tree that is especially useful for modelling game states and searching for optimal solutions. The minimax algorithm assigns a positive value to each final game state (represented by a leaf node) in which the considered player has won, and a negative value to lost game states respectively. The algorithm assumes a perfectly playing opponent that chooses the worst


possible outcome for its opponent, i.e. selects the node with the smallest value. The considered player in turn chooses the node with the highest value to maximise his chances of winning. Following these rules, the values of intermediate nodes can be calculated through a bottom-up approach, as leaf node values are known. Estimation algorithms are used to find the value of nodes if it is not possible to construct the entire tree, for example when the search space grows too large. Further speed-up can be achieved by pruning the tree and sorting nodes, for instance with alpha-beta pruning, analysed by Knuth [1975].
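The bottom-up value computation, including alpha-beta pruning, can be sketched over an abstract game-state interface. The GameState interface, the depth limit and the ownership of successor states are assumptions made for this sketch; it is not tied to DEFCON or to any concrete game.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Abstract two-player, perfect-information game state (assumed interface).
// Ownership of the successor pointers is left to the concrete implementation.
struct GameState {
    virtual ~GameState() = default;
    virtual bool isTerminal() const = 0;
    virtual double value() const = 0;                        // positive: good for the maximiser
    virtual std::vector<GameState*> successors() const = 0;  // reachable next states
};

// Minimax with alpha-beta pruning; `maximising` is true on the considered
// player's turn, who picks the child with the highest value.
double minimax(const GameState& state, int depth, double alpha, double beta,
               bool maximising) {
    if (depth == 0 || state.isTerminal())
        return state.value();  // leaf: known (or estimated) value

    double best = maximising ? -std::numeric_limits<double>::infinity()
                             :  std::numeric_limits<double>::infinity();
    for (GameState* child : state.successors()) {
        double v = minimax(*child, depth - 1, alpha, beta, !maximising);
        if (maximising) {
            best = std::max(best, v);
            alpha = std::max(alpha, v);
        } else {
            best = std::min(best, v);
            beta = std::min(beta, v);
        }
        if (beta <= alpha) break;  // prune: a perfect opponent never allows this branch
    }
    return best;
}
```

A concrete game would derive from GameState and supply a heuristic value() for non-terminal nodes whenever the full tree cannot be constructed.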

3.2.3 Plan Recognition

Plan Recognition (PR) is a method by which an agent observes the actions of another agent with the objective of inferring that agent's future actions, goals or intentions. Several methods for plan recognition have been explored:

• Deductive methods are used by Kautz [1987].
• Abductive methods have been presented by Allen and Ferguson [1994].
• A probabilistic approach is given by Eugene and Robert [1993].
• A case-based method is presented by Kerkez and Cox [2001].

In some PR systems, the plan library is manually created by the developers. These libraries often have to be complete [Kautz, 1987]; however, this is only a tractable task when the domain complexity is low, which is almost never the case in real-world applications, where creating complete plan libraries may be impossible. The automation of creating plan libraries using machine learning methods has been investigated by Kerkez and Cox [2001], Lesh et al. [1999] and Bauer [1995].

3.2.4 Finite State Machines

The data structure Finite State Machine (FSM) contains the states of the (virtual) world, the input and output events and a transition function that uses the current state and input events to produce the next state and output events [Conway, 1971].


Figure 3.1: A graph of a simple finite state machine process. (Image taken from Wikipedia [2007])
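A minimal sketch of such a transition function is given below, implemented as a table keyed on the current state and an input event. The states, events and transition table are illustrative only and do not correspond to any behaviour used in this project.

```cpp
#include <map>
#include <utility>

enum class State { Patrol, Attack, Flee };
enum class Event { EnemySighted, EnemyLost, LowHealth };

// Transition function: (current state, input event) -> next state.
// Pairs not listed leave the state unchanged.
const std::map<std::pair<State, Event>, State> transitions = {
    {{State::Patrol, Event::EnemySighted}, State::Attack},
    {{State::Attack, Event::EnemyLost},    State::Patrol},
    {{State::Attack, Event::LowHealth},    State::Flee},
    {{State::Flee,   Event::EnemyLost},    State::Patrol},
};

State step(State current, Event input) {
    auto it = transitions.find({current, input});
    return it != transitions.end() ? it->second : current;
}

int main() {
    State s = State::Patrol;
    s = step(s, Event::EnemySighted);  // Patrol -> Attack
    s = step(s, Event::LowHealth);     // Attack -> Flee
    return static_cast<int>(s);
}
```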

Schwab [2004] uses a slightly different definition of an FSM, where there is a standalone module for each state that also contains the update logic, transition logic and special events. He claims that this modularity improves the flexibility and generality of the system and allows for more complex and specialised transition functions. Furthermore, FSMs in general are easy to construct, implement and debug. However, they are relatively static, do not scale well with the iterative addition of features and can run into problems like state oscillation. In this problem, the margin of a transition function is only slightly exceeded. The following state lowers the margin again, which might cause the transition function of that state to trigger and activate the former state again, and so on. Schwab [2004] suggests some extensions that can circumvent these drawbacks:

Hierarchical FSMs: group similar states in a nested FSM within states of a higher-level FSM to further modularise the state machine and add complexity without impairing the understandability of the system.


Message- and Event-Based FSMs: instead of having polling transition functions, use messages or events to trigger transitions. This increases performance, especially in large FSMs.

Fuzzy Transition FSMs: more a special type of transition function than a wholly different method, fuzzy transitions allow for more flexible and realistic state changes.

Stack-Based FSMs: a stack stores recent states to give the agent a limited form of memory.

Inertial FSMs: inertia is introduced to reduce state oscillation. After a state is activated, new states have to exceed this inertia to be activated. Through a controlled decay over time, some sort of single-mindedness of the AI can be simulated.

3.2.5 Influence Maps

By copying the environment map into a layered grid of cells, where each layer contains different information about the game state, an agent can use these influence maps to support strategic and tactical decisions and thereby react sensibly to its environment [Sweetser, 2006, Tozour, 2001]. Influence maps are used in games to make high-level strategic decisions, as their structure allows inferences to be made about the game terrain, opponent actions and important strategic control areas. Different layers can be combined through a weighted sum to give more abstract data (such as the safety of a given location).
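The weighted combination of layers can be sketched as a cell-by-cell sum over grids of equal size. The grid dimensions, layer names and weights below are illustrative assumptions and are not the influence map configuration used later for DEFCON.

```cpp
#include <vector>

// One influence layer: a grid of values covering the game map.
using Layer = std::vector<std::vector<double>>;

// Combine several layers into a single map of abstract values (e.g. "safety")
// through a weighted sum, cell by cell.
Layer combine(const std::vector<Layer>& layers, const std::vector<double>& weights) {
    std::size_t rows = layers[0].size(), cols = layers[0][0].size();
    Layer result(rows, std::vector<double>(cols, 0.0));
    for (std::size_t l = 0; l < layers.size(); ++l)
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                result[r][c] += weights[l] * layers[l][r][c];
    return result;
}

int main() {
    // Two illustrative 2x2 layers: own unit strength and enemy threat.
    Layer ownStrength = {{1.0, 0.0}, {0.5, 0.0}};
    Layer enemyThreat = {{0.0, 0.8}, {0.2, 0.9}};
    // "Safety" favours own strength and penalises enemy threat.
    Layer safety = combine({ownStrength, enemyThreat}, {1.0, -1.0});
    return safety[0][0] > safety[0][1] ? 0 : 1;
}
```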

Occupancy Maps

Occupancy maps can be seen as a specialisation of influence maps. They are a knowledge-representation method which allows agents to keep track of the positions of objects when knowledge about the environment is only partially given, as described by Isla [2006]. The method is based on expectation theory, which involves making predictions about the future. To circumvent the problem of the highly irregular


and disjoint probability distributions that would appear in a continuous mathematical model, discrete occupancy maps can be used. An algorithm implementing such a map consists of the following steps:

• Inspect all visible grid nodes and adjust their probability (to zero if the object is not there).
• Renormalise the entire distribution.
• Perform a diffusion step, in which each node passes a part of its probability value to its neighbours to reflect the agent's increasing uncertainty about the node.

Occupancy maps are used in Thief: Deadly Shadows, where a search layer for guards prevents them from investigating the same areas over and over again [Tozour, 2004].
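The three update steps listed above can be sketched as a single update function over a probability grid. The grid layout, the visibility mask and the diffusion rate are placeholder assumptions made for this sketch.

```cpp
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<double>>;  // probability of the object being in each cell

void updateOccupancy(Grid& p, const std::vector<std::vector<bool>>& visible,
                     double diffusionRate) {
    const int rows = static_cast<int>(p.size());
    const int cols = static_cast<int>(p[0].size());

    // 1. Inspect all visible grid nodes: the object was not seen there,
    //    so their probability drops to zero.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (visible[r][c]) p[r][c] = 0.0;

    // 2. Renormalise the entire distribution.
    double total = 0.0;
    for (const auto& row : p) for (double v : row) total += v;
    if (total > 0.0)
        for (auto& row : p) for (double& v : row) v /= total;

    // 3. Diffusion step: each node passes part of its probability to its
    //    neighbours, reflecting the agent's increasing uncertainty.
    Grid next = p;
    const std::pair<int, int> offsets[4] = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            std::vector<std::pair<int, int>> neighbours;
            for (const auto& o : offsets) {
                int nr = r + o.first, nc = c + o.second;
                if (nr >= 0 && nr < rows && nc >= 0 && nc < cols)
                    neighbours.push_back({nr, nc});
            }
            double share = diffusionRate * p[r][c];
            next[r][c] -= share;
            for (const auto& n : neighbours)
                next[n.first][n.second] += share / neighbours.size();
        }
    p = next;
}

int main() {
    Grid p = {{0.25, 0.25}, {0.25, 0.25}};  // uniform prior over a 2x2 map
    std::vector<std::vector<bool>> visible = {{true, false}, {false, false}};
    updateOccupancy(p, visible, 0.1);       // the object was not seen in cell (0,0)
    return p[0][0] < p[1][1] ? 0 : 1;
}
```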

3.3 Board Games

The difference between most academic studies and entertainment versions of board game playing programs is the notion of limited time. Research about a board game is often done with the aim of “solving” it completely, which can take countless computing hours. Solved games are tic-tac-toe, Othello, Connect Four and checkers. The latter took about 10^14 calculations and 18 years to solve, making it the most complex game solved so far [Schaeffer et al., 2007]. Although this research differs from the goal of creating an immersive entertainment program, the resulting algorithms can often be used for programs that compete against humans. Almost all AI techniques have been applied to board games, because with their limited state space and clearly defined rules, board games provide an easy-to-use test bed for these techniques. Examples include:

• Genetic programming and genetic algorithms have been used for the board game Othello by Eskin and Siegel [1999].
• Artificial neural networks have been applied to Blackjack and Tic-Tac-Toe [Olson, 1993].


• Reinforcement learning has been used to find new opening move sets in the board game Backgammon that are now used by all the world’s leading players [Baekkelund, 2006], to explore ‘local shapes’ in the board game Go [Silver et al., 2007], and to find strategies for the board game Settlers of Catan [Pfeiffer, 2004].
• Minimax decision trees are a very useful tool for modelling board games and, in general, games with perfect information, i.e. games where all parties have full information on the game state. Minimax trees are used in Chess, Checkers and Reversi [Bramer, 1983].

Chess has been studied very thoroughly, and it was the game that heralded the era in which artificial intelligence can surpass human intelligence — at least in a game: Deep Blue defeated the then world chess champion, Garry Kasparov. Deep Blue achieved this through heavy parallelism, a complex evaluation function, the effective use of Grandmaster game libraries and tailored hardware [Campbell, 2002].

3.4 Video Games

Researchers are only just beginning to investigate more complex computer games as, in the past, limited resources restricted experimental research to toy experiments and board games. However, modern technology allows for more complex simulations, and the demand from the industry also plays a part in the increasing interest in using AI techniques for video game applications.

3.4.1 Academic Research

CBR has been used for case-based plan recognition in the game Space Invaders. In Fagan and Cunningham [2003], the authors were able to produce good prediction accuracy using plan libraries derived from recorded games of another player. They used a simplified planning language called COMETS, which, unlike STRIPS (cf. section 2.3.1), does not attach preconditions to actions, and applied plan recognition to abstract state-action pairs.


Dynamic scripting has been used in an academic context to implement an adaptive AI for the commercial role-playing game Neverwinter Nights (BioWare) [Spronck, 2005]. In the real-time strategy game Wargus, a dynamic scripting approach was able to outperform manually designed plans after a short learning period [Spronck, 2006]. Hierarchical planning was tested by Muñoz-Avila and Hoang [2006] in an Unreal Tournament environment and showed a clear advantage of bots using hierarchical planning over others that used classical finite state machines.

3.4.2 Commercial Games

The methods used in commercial games are usually far less experimental and, with some exceptions, do not contain learning of any sort. There are several reasons why learning is not common here, as discussed in Laird [2005]:

• The game designers have less control over the behaviour of the game, which can lead to worse game play. The AI can get stuck while learning, and learning might take a long time to show significant results. Additionally, it is difficult to validate and predict future behaviours.
• The cost of development is usually higher, as the learning algorithms need more time to develop, test and debug. Also, the company needs to have sufficiently skilled programmers to apply machine learning methods.
• Often it is cheaper to fake learning by scripting multiple levels of performance, for example the number of mistakes a system makes or the repertoire of moves it has. This gives the player the illusion of a learning system whilst avoiding the drawbacks outlined above.

Therefore, the most common techniques for creating “intelligent” opponents in these games are scripting and finite state machines, as they restrict the state space and provide a fully defined and controllable environment. There are many examples of games using these techniques: Final Fantasy, Neverwinter Nights and Baldur’s Gate are role-playing games relying on scripting; Quake and Unreal are first-person


shooters which incorporate scripting and intelligent path-finding routines; and Donkey Kong and Mario Bros. are platform games that use FSM-controlled, often single-state, enemies. There are many more examples from other genres. In recent years, certain games have implemented more advanced AI methods, for instance:

• Genetic algorithms are used in bSerene¹, Creatures and Black & White for online learning.
• Planning is also only now emerging in commercial computer games, as cheaper methods (computationally and in terms of complexity) of simulating plans, like scripting or even purely reactive behaviour, were (and often still are) sufficient to satisfy the needs of game designers. Dark Reign (Activision) uses a form of finite state machine that involves planning [Davis, 1999], and the first-person shooter F.E.A.R. uses GOAP [Orkin, 2005].

¹ A hide-and-seek game where non-player characters (NPCs) try to catch the human player. These NPCs learn using genetic algorithms and A-Life. The game is open source and can be found at http://alifegames.sourceforge.net/bSerene/game.html.


3.5 DEFCON

3.5.1 Description of DEFCON

DEFCON is a real-time strategy game in which the participating parties use offensive and defensive units to attack the units and cities of their opponents in order to score the most points. The game is divided into stages that sequentially unlock more powerful means of attack. The information in this chapter is derived from the manual² and the game itself³.

Figure 3.2: Two players shortly after the start of the match.

In this video game, each player plays a nuclear superpower, namely Europe, Africa, Russia, South Asia, North America or South America. At the beginning of a typical match of DEFCON, every player chooses or is assigned one of these territories. The player then continues by placing structures like air defence silos, airbases and radar stations at tactical positions to protect his cities. Fleets are also created; they consist of battleships, carriers and submarines. Thereafter, the player starts commanding his units across the world map in order to defend against attacks and to start attacks of his own against the opponent. At first, fights ensue between fleets and planes. Players try to command their units for optimal efficiency.

² Available online at http://www.everybody-dies.com/downloads/manual.html.
³ Demo available online at http://www.everybody-dies.com/downloads/.


Figure 3.3: A fleet battle ensues.

This includes selecting targets, forming up fleets and choosing between different unit states.

Figure 3.4: A successful missile attack against a city later in the match.

Finally, missiles come into play. They are devastating against structures and are the only way to score points, namely by striking opponent cities. Correct timing is important when a player plans to attack with missiles, as concurrent attacks are more effective than drawn-out ones. Also, opponent silos are defenceless while launching nukes themselves, which is a good opportunity to start a counter-attack.


The game finishes when most of the missiles have been used; the player who has inflicted the most damage on opponent cities wins. The remainder of this chapter contains a formal description of DEFCON and its units, followed by a discussion of the AI bot created by Introversion.

Parties & Territories

A game has at least two parties, where each party can be controlled by either a human or an AI player. Each party controls one territory, which is randomly assigned or chosen before the game starts. Each territory can be controlled by zero or one party. In each territory controlled by a party there is at least one city, each with a specified positive number of inhabitants. The sum of the inhabitants of all cities is equal for each territory. There are no cities in territories that are not controlled by a party. Also, a part of the sea (terrain in which naval units operate, disjoint from the territories defined above) is available for each party to place naval units into during the first stages of the game.

Alliances

Parties can form alliances to cooperate. This is done by sending a request to join the other party's alliance. All parties that are currently in this alliance can then vote on whether or not to accept this request. If the majority of the voters accept the new party, it is added. The benefits of being in an alliance depend on the options set by the game server⁴. A ceasefire prevents units from attacking allied units, and shared radar coverage enables parties to see all units of their allies, with the exception of submerged submarines. These settings can be changed while the game is running⁵. Allies can be expelled from an alliance through a majority vote that has to be initiated by an alliance member. Allies themselves can leave an alliance at any time.

⁴ Both automatic ceasefire and shared radar coverage are enabled by default.
⁵ Only if the server settings allow for this (default).


Units

Each party has the same predefined quantity of units it can use. There are ground installations, naval units and aerial units.

Ground Installations

Silo: Initially contains 10 missiles. States:
    Air Defence: Automatically targets an aerial unit in attack range. If there are several, it chooses nukes over bombers over fighters. If there are several of the same class, it chooses the closest. Targets can be manually assigned. After a shot is fired, it needs to recharge for 20 seconds.
    Launch Missiles: Targets for missiles have to be manually assigned. Every location on the map can be targeted. After a target is chosen, a missile will be launched. There is a recharge time of 120 seconds after the launch of a missile, during which no other missiles can be launched. The launch of a missile reveals the location of the silo to all parties.

Radar: All units (except submarines) within a certain distance are visible⁶ to the party. There are no actions for this ground installation.

Airbase: Initially contains 10 missiles, 5 fighters and 5 bombers. If there are fewer than the initial number of fighters, they are slowly regenerated, at a rate of 1 fighter every 1000 seconds. The sum of the number of fighters and bombers contained in the airbase cannot exceed the initial combined size of 10 planes. States:
    Launch Fighters: Targets for fighters have to be manually assigned. Every location on the map can be targeted. After a target is chosen, a fighter will be launched. The recharge time after the launch of a fighter, during which no other fighters can be launched, is 20 seconds.

⁶ In this report, to be visible means to be within radar range of any unit of the party.

    Launch Bombers: Same actions as in the Launch Fighters state, with the difference that bombers will be launched instead of fighters.

Naval Units

All naval units can move to every reachable position at sea. Groups of up to 6 naval units can be combined into a fleet when they are initially placed; thus a player usually has several fleets. A fleet cannot be separated and ships in fleets cannot be navigated separately; however, it is possible to command ships in a fleet to attack different targets and to change into different states.

Carrier: Initially contains 6 missiles, 5 fighters and 2 bombers. The maximum capacity equals the initial number of fighters and bombers. States:
    Launch Fighters: If there are hostile units that can be attacked by fighters within a certain range of the carrier, the carrier will launch fighters. Targets for fighters can also be manually assigned. Every location on the map can be targeted. After a target is chosen, a fighter will be launched. There is a recharge time of 120 seconds after the launch of a fighter, during which no other fighters can be launched.
    Launch Bombers: Targets for bombers can also be manually assigned. Every location on the map can be targeted. After a target is chosen, a bomber will be launched. There is a recharge time of 120 seconds after the launch of a bomber, during which no other bombers can be launched.
    Anti Submarine: A sonar scan reveals all submarines within a certain distance from the carrier. If hostile submarines are detected, a depth charge is released, which destroys hostile submarines within a certain range and with a certain probability (see table 3, page 10). Each sonar scan has a recharge time of 60 seconds.

Battleship: Automatically attacks hostile naval units, fighters and bombers within its attack range.

3.5. DEFCON

31

Submarine: Initially contains 5 missiles. States: Passive Sonar: If not submerged already, a submarine does so upon entering this state. It is then invisible to radar and can only be detected by carriers in anti submarine status and other submarines in active sonar mode. It can attack hostile naval units if they are visible to the party of the submarine. Active Sonar: The same as passive sonar, additionally the submarine creates a sonar scan, which reveals all naval units within a certain distance from the submarine. Each sonar scan has a recharge time of 20 seconds. Launch Missiles: Submarine surfaces upon entering this state, thus becoming visible to radar and attackable by battleships, submarines, fighters and bombers. Targets for missiles have to be manually assigned. Every location within a certain attack range can be targeted. After a target is chosen, a missile will be launched. There is a recharge time of 120 seconds after the launch of a missile, whilst no other missiles can be launched. A launch of a missile will reveal the location of the submarine to all parties. Once all missiles have been launched, the submarine enters passive sonar state. The launch missiles state can only be chosen if the submarine still contains missiles. Aerial Units Fighter: Attacks visible hostile surfaced naval units and fighters and bombers within its attack range. A fighter has limited fuel. Any unit with limited fuel will crash if it does not return to an airbase or carrier before it runs out of fuel. Bomber: Like the fighter, the bomber has limited fuel. States: Naval Combat: Attacks visible hostile surfaced naval units within its attack

32

Chapter 3. Artificial Intelligence in Games range. Launch Missiles: Targets for missiles have to be manually assigned. Every location can be targeted. When the target is within the bombers attack range, the missile will be launched. Otherwise the targets can be reassigned. Missile: A missile can be disarmed. This action destroys the missile after 100 seconds, if it has not impacted before. Upon impact, a missile inflicts damage to all units within a radius of 0.5 length units. (see table 3.1).

Figure 3.5: Screenshot of DEFCON: the green player attacks with missiles

Structure    Hits to destroy
Radar        1
Airbase      2
Silo         3

Table 3.1: Hits of missiles required to destroy structures

Temporal course of a DEFCON game
A game is divided into 4 stages, each called a defcon phase by the game. It starts in stage Defcon 5.


State                          Time [sec]
Silo: Air Defence              340
Silo: Launch Missiles          120
Airbase: Launch Fighters       120
Airbase: Launch Bombers        120
Carrier: Launch Fighters       120
Carrier: Launch Bombers        120
Carrier: Anti Submarine        240
Submarine: Passive Sonar       240
Submarine: Active Sonar        240
Submarine: Launch Missiles     120
Bomber: Naval Combat           60
Bomber: Launch Missiles        240

Table 3.2: Default time to change into the given state

Attacker (columns, left to right): Fighter, Bomber, Battleship, Carrier, surfaced Sub, submerged Sub, Silo

Target           Chance of destruction upon hit
Fighter          medium, very high, low
Bomber           very high, high, medium
Battleship       medium, high, medium, high, -
Carrier          medium, high, high, high, -
surfaced Sub     high, very high, very high, very high, high, -
submerged Sub    very high, high
Missile          very high

Table 3.3: Chances of destruction of target upon hit. The probabilities convert as follows: low = 10%, medium = 20%, high = 25%, very high = 30%


Defcon 5: A party can place7 its ground installations within its territories and naval units within its assigned sea parts (see 2.1.1). It cannot see or attack any unit of other parties, even if they would be in radar range. Naval units can move within waters that are not assigned to other parties. Other actions are not possible. This stage lasts 3 minutes.
Defcon 4: The same as Defcon 5, except that units of other parties can now be seen if they are in radar range. This stage lasts 3 minutes.
Defcon 3 & Defcon 2: In this stage8 units can no longer be placed. All actions not involving missiles are allowed, i.e. non-missile aerial units can be launched, naval units can move into foreign waters, and attacks are possible. This stage lasts 6 minutes.
Defcon 1: Units cannot be placed. All actions are allowed. This stage has no fixed time limit. After 80% of all missiles in the game have been launched, a victory counter of 45 minutes starts. When this counter finishes, the game ends.

Winning conditions
The score of a party in DEFCON is calculated as s = a·k − b·l, where k denotes the number of points scored against opponent cities and l denotes the number of points scored by opponents against the party's own cities. The factors a and b are defined by the score mode, which has to be chosen before the game starts (see Table 3.4). A missile that hits a city increases the score by 50% of the inhabitants of that city (measured in millions) and decreases the number of inhabitants by 50%. In the default score mode, a party therefore scores 2 points for each million inhabitants that opponent cities were reduced by through its own missiles, and loses 1 point for each million inhabitants that were lost in its own cities. At the end of the game the party with the highest score wins.

7 Placing a unit is the action of selecting it out of a pool of available units (which is outside the world) and placing it in the world.
8 Although this stage is treated as two stages in the game, they do not differ and are therefore treated as one stage here.


Score mode    a    b
Default       2    1
Genocide      1    0
Survivor      0    1

Table 3.4: Score modes in DEFCON
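As a brief worked example with illustrative numbers (not taken from a recorded match): suppose a party's missiles reduce opponent cities by a total of 40 million inhabitants while its own cities lose 25 million, so that k = 40 and l = 25. Under the default score mode (a = 2, b = 1) its score is s = 2 · 40 − 1 · 25 = 55, whereas under the Survivor mode (a = 0, b = 1) the same match yields s = 0 · 40 − 1 · 25 = −25.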

3.5.2 AI in DEFCON
The AI currently in use in DEFCON relies on scripting its actions and is entirely deterministic. This implies that, given all game events, it is possible to calculate the next event of the AI. Thus AI-related communication between the hosts on the network the game runs on is not required. Although this reduces the network load, the behaviour of the AI has to be entirely predefined and is predictable. While this might not be entirely desirable for play against humans (as it is very easy for humans to adapt and learn counter-strategies), it provides a good starting point for writing another AI which applies machine learning and can use the reproducible actions of the existing AI9 as a measure for its own fitness.

9 We will refer to the AI originally developed by Introversion as the existing AI throughout this report.

Finite State Machine in the Existing AI
The state diagram of the existing AI is shown in Figure 3.6. The AI has 5 states that it traverses in a linear fashion; once it reaches the final state it does not change states any more.

Figure 3.6: State diagram of the existing AI bot in DEFCON (Placement → Scouting → Assault → Strike → Final)

The states can be characterised as follows:
Placement: Fleets and structures are placed. The fleet is randomly placed at predefined starting positions. Structures are placed near cities. Once all units are placed, the AI proceeds to the next state.
Scouting: The AI tries to uncover structures of a random opponent by moving fleets towards occupied territories and launching fighters towards them. Once 5 structures have been uncovered, a predefined assault timer10 expires or the victory timer starts, the next state is invoked.
Assault: The AI starts to launch missile attacks with bombers and subs on the previously chosen opponent. Once 5 structures have been destroyed, the assault timer expires or the victory timer starts, the strike state is invoked.
Strike: Silos launch their missiles and other missile-carrying units continue to attack. After these attacks have been initiated, the system changes into the final state.
Final: In the final state no more strategic commands are issued. Fleets continue to approach random attack spots.

10 The assault timer is randomly initialised before the game starts to a value between approximately 3000 and 7000 seconds into the game.
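For illustration, the linear state progression described above could be sketched as follows. This is a simplified reconstruction based on the description in this section, not Introversion's actual code; the flags and thresholds are illustrative.

    // Sketch of the existing AI's linear finite state machine.
    enum AIState { Placement, Scouting, Assault, Strike, Final };

    struct ExistingAIBot {
        AIState state;
        ExistingAIBot() : state(Placement) {}

        void Update(bool allUnitsPlaced, int structuresUncovered, int structuresDestroyed,
                    bool assaultTimerExpired, bool victoryTimerStarted, bool strikeLaunched) {
            switch (state) {
            case Placement:
                if (allUnitsPlaced) state = Scouting;
                break;
            case Scouting:
                if (structuresUncovered >= 5 || assaultTimerExpired || victoryTimerStarted)
                    state = Assault;
                break;
            case Assault:
                if (structuresDestroyed >= 5 || assaultTimerExpired || victoryTimerStarted)
                    state = Strike;
                break;
            case Strike:
                if (strikeLaunched) state = Final;
                break;
            case Final:
                break;   // no further transitions
            }
        }
    };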

4 Overview of a DEFCON Match
In this chapter we present a short overview of a match of DEFCON with the AI bot we have developed playing against the existing AI bot (built by Introversion). We explain the different stages of the game and relate them to chapters in this report. Recall that DEFCON is a simulation of global nuclear war, and that the aim is to annihilate as many of the opponent's cities as possible.

4.1 Initialisation
At the start of the match, the starting territories are selected automatically by the DEFCON engine, and, for the purposes of this explanation, players have no control over this selection.

In this example, the existing AI will occupy North America (shown in green), while our bot starts with South America (shown in red). The list of starting territories makes up a starting configuration.


Once the territories are selected, we have a starting configuration, which is used to search a case base for similar cases. Note that a similar case in this context is extracted information from a recorded game in which the starting configuration was similar. The similarity measure is discussed in section 7.1, and details of how we created a case base of plans are shown in chapter 6. The plans contained in retrieved cases are then used in a decision tree generalisation algorithm. The plans are classified in a decision tree using plan attributes. With the help of this tree a new plan is created through a fitness-proportionate selection. This process is explained in section 7.2. The decision tree gives us a high-level plan that controls groups of fleets of ships, called metafleets, and attack values. The justification for this combination is given in 5.1. Once the plan is generated, fleets are placed as dictated by the high-level plan.


The placement of structures such as silos, airbases and radar stations is controlled by a placement algorithm which takes recent matches into account by querying the case base. Note that structure placement is not controlled by the high-level plan. It is explained in section 8.1, and fleet starting zones are discussed in section 8.2.2.

4.2 During the Match
After everything has been placed, an attack on the mainland can be prepared. We use influence maps to determine the best place to attack, which depends on the population density and positions of fleets. The uses of influence maps are explained in section 8.4.

Knowing where to attack, fleets can start sailing towards the opponents. To optimise the efficiency of the bot's units, we use a movement desire model for mobile units and organise fleets in formations. The two techniques are presented in sections 8.5 and 8.6, respectively.

Having established where and when to attack, we have to execute the attack. This is optimally done with a synchronous attack, where all participating units are controlled in such a way that all missiles hit in a small time frame. We show how this is achieved in section 8.3. The allocation of targets to bombers and of bombers to landing pads is enhanced by a simulated annealing optimisation process. This optimisation shortens the distance that bombers have to travel, ensuring that they do not exceed their range. It is explained in section 8.7.


4.3 End of Match
After most of the missiles have been launched, a victory timer starts. When it expires, the game is over, and the player that destroyed the most opponent cities, while keeping his own cities safe, wins. A more detailed description of DEFCON and its units can be found in chapter 3.

After the match has ended, we extract the results, the plan used, structure information and opponent fleet movement into a case, which is then retained in the case base. The composition of a case is detailed in chapter 6.

5 System Design
In this chapter we explain our design choices and justify the chosen approaches. We decided to use hierarchical planning to cover both high-level strategy and small-scale tactics. We explain why the focus for high-level plans is on groups of fleets, which we will call metafleets, and attacks. We also show how we approached the task of automatically learning good behaviour through combining case-based reasoning, decision tree generalisation and an evolutionary-based plan generation.

Figure 5.1: Overview of the chosen system design (case base, decision tree and evolutionary-based plan generation produce a plan for a DEFCON experiment whose result is fed back into the case base)


5.1 Planning
Typical real-time strategy (RTS) games often require much foresight and the use of long-term strategy as well as short-term tactics. In DEFCON the fleets in particular require careful positioning, as they are the slowest moving objects. In fact, we can group the units into three groups: stationary units that remain at their starting position within a player's territory, primary moving units and secondary moving units. The primary moving units are the ships (battleships, carriers and submarines). Their movement is mainly responsible for the ability to launch an attack and to prevent opponents from getting too close1. The secondary moving units (fighters and bombers) support fleets by launching attacks from them and helping them fight opponent fleets. Thus secondary units are important for tactical (local) decisions while primary units are crucial for long-term, strategic decisions. These levels of resolution promote the use of hierarchical planning. High-level plans can be used to control strategic decisions like when and where to attack, while lower-level plans control tactical choices like how to attack.

1 This is more important for configurations where players start far apart; for adjacent territories the opponent can strike directly regardless of fleet position.

5.1.1 Metafleets
Often there are separated oceans adjacent to a player's territory; an example is depicted in Figure 5.2. In this figure the player owns territory A and the two starting areas s for sea units divide the sea terrain into two regions 1 and 2. This separation requires the initial fleet placement to take the starting territories of opponents into account; in our example, player A's fleets can reach Europe much more quickly when they are placed in the starting zone in the Atlantic Ocean (zone 2). It might also be beneficial to have more than one group of fleets, for example to attack two different opponents or different regions of an opponent's territory, or to assign different tasks like defending and attacking to these groups. We call these logical groups of fleets metafleets. The logical distinction between two metafleets is their task, and tasks become manifest in plans. Therefore each metafleet has its own plan, and as metafleets contain all primary moving units, every high-level plan relates to a metafleet.

Figure 5.2: Separated sea terrain (1 + 2) for territory A

We come to the conclusion that the strategy of a player can be partially described through a high-level plan for each metafleet. Thus a DEFCON strategy comprises a list of metafleet actions, which are described below. The remaining element required to complete the description of a strategy is the attacks, which are discussed in the next section. We describe high-level actions for metafleets through changes of the state they are in and propose the following list of states:
Idle: No task given.
Await opponent: The metafleet moves to a position where contact with opponent fleets is likely.
Avoid opponent: The metafleet moves to a position where contact with opponent fleets is unlikely.
Move direct: The metafleet moves on a direct route to a good attack spot of the current opponent.
Move intercepting opponent: The metafleet moves towards a good attack spot on a path where contact with opponent fleets is likely.


5.1.2 Attacks
The decision to attack an opponent requires consideration of several factors. First of all, the bot has to decide which opponent to attack. For this project we have limited the number of opponents to one, as the additional complexity of allowing more players does not immediately add insights and slows the learning process of the AI (see also 10.2, Limitations & Future Work). Once an opponent has been selected, the bot has to determine where and when to attack. The target position is determined by an influence map algorithm (see chapter 8) and a lower bound for the attack time is given by how fast the attacking units can be brought within range of the target. However, it can be advantageous to delay an attack, for example to wait until the defences of the opponent are down (i.e., his silos are not in air defence mode), or to discover structure positions first. Another aspect is the element of surprise and unpredictability, although the existing AI does not consider opponent actions. Thus the time of the (first) attack can be seen as part of the strategy against a player and is therefore included in a high-level plan. How attacks are executed in detail and when subsequent attacks are started will be discussed in chapter 8.

5.2 Learning
We want to apply machine learning to teach our AI bot which strategies are promising in a given environment. To accomplish this task we combined several techniques: Case-Based Reasoning to store and retrieve similar recorded matches, Decision Tree Generalisation to abstract and generalise information about promising plans, and a modification of Genetic Algorithms to generate a new plan, expanding the case base and applying the retrieved information about the mentioned promising plans (Figure 5.1 shows an overview of the chosen design).


Figure 5.3: CBR interpretation of the system (retrieve, reuse, revision and retain; problem description: match configuration, problem solution: plan, outcome: match result)

5.2.1 Case Based Reasoning
To be able to learn, we need to remember previous matches. This requires some means of abstracting and storing them. We can use these matches to form cases in a case base. We interpret the composition of a case according to its components as described in Aamodt and Plaza [1994]:

Problem description: The purpose of a problem description is to contain as much data as is required for an efficient and accurate case retrieval. In our case, that means we have to record and store enough data while a game is running to be able to compare it later on to another game. Essentially that data is given by the plan of the opponent. As we usually cannot immediately find out the intentions of an opponent, we are bound to record their effects. In DEFCON this boils down to recording fleet manoeuvres and main attack time frames.

Problem solution: The problem solution describes actions that were taken to solve the given problem. In DEFCON, the actions the AI bot executes are given by a high-level plan, which is subsequently broken down hierarchically into basic steps, which ultimately also result in fleet manoeuvre and attack commands. Therefore the problem solution can be described through the high-level plan.

Outcome: The outcome of a case is given by the resulting state of the world after the solution was applied. In our case, the outcome can be fixed as the final score, which directly reflects the success of an applied plan against our "problem", i.e. our opponent. Thus we have a very straightforward fitness measure for our solutions. We can enrich this measure with various other statistics which can later help us find weaknesses and strengths of our behaviour and that of others (see Chapter 9).

An example of a case can be found in Figure 6.3 in Chapter 6. The similarity measure must be able to compare any case to the currently active match. This requires the comparison of fleet positions at any given time in a recorded match and correlating them with current fleet positions. As fleet composition is not predefined and (parts of) fleets might already have been destroyed, this is not a trivial task.

The system can be interpreted as a case-based reasoning system, as the typical steps of CBR algorithms (cf. 2.2) appear in the process:
Case retrieval: Cases are retrieved from the case base depending on their similarity measure — the match configuration.
Case reuse: The old solutions — plans — are then reused to build a decision tree, from which the new plan is created.
Case revision: The generated plan can be modified2.
Case retaining: After a match is finished, a case is created and retained in the case base.

2 However, the intent of the plan modification we will use differs from the original intent. We do not adapt our plan to the configuration but rather mutate it for exploration of the state space; see section 5.2.3.
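To make this composition concrete, a minimal C++ sketch of such a case record might look as follows. The field names are illustrative and simplified relative to the actual implementation; Vector2 and HighLevelPlan stand in for the engine's position type and the plan representation of section 6.1.1.

    #include <string>
    #include <vector>

    struct Vector2 { float x, y; };                   // placeholder for the engine's position type
    struct HighLevelPlan { /* plan attributes, see section 7.2.1 */ };

    struct RecordedAttackWave { float startTime, endTime; int missiles; };
    struct RecordedFleetPath  { int fleetId; std::vector<Vector2> waypoints; };

    // A case pairs a problem description (match configuration and observed opponent
    // behaviour), a problem solution (the high-level plan used) and an outcome.
    struct Case {
        // problem description
        int ownTerritory;
        int opponentTerritory;
        std::string opponentName;
        float recordedTime;                           // when the match was recorded
        std::vector<RecordedAttackWave> opponentAttacks;
        std::vector<RecordedFleetPath>  opponentFleetMovement;
        // problem solution
        HighLevelPlan plan;
        // outcome
        float ownScore;
        float opponentScore;
        float Outcome() const { return ownScore - opponentScore; }
    };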


5.2.2 Decision Tree Generalisation
We want to find a way to generalise given plans and recombine them to form new ones. Decision tree algorithms provide an easy-to-use way to achieve this and to classify plan elements. As we are primarily interested in the ability to generalise plans, we use the term Decision Tree Generalisation (DTG). The benefit of the use of DTG is twofold: on the one hand, we can use this technique to generalise globally, i.e. over all cases of one configuration3 and even over many configurations, to get a notion of (parts of) plans that are generally sound or always lead to poor behaviour. This knowledge can guide the plan generation process when only a little history is available, i.e. when only a few cases fit the current state of the world or the distance (given by the similarity measure of the case base) of the given cases is very high. On the other hand, DTG can be applied locally to similar cases in order to extract the most decisive factors in the case problem solutions. These factors are parts of plans that can then be used to construct a plan as a solution for the current situation.

3 By configuration we mean the initial set-up concerning player starting territories.

5.2.3 Evolutionary Based Plan Generation
The selection, combination and mutation of plans, and learning from the consequences of these plans (the outcomes of a match) rather than from explicit teaching, can be interpreted as parts of an evolutionary approach. Our intention to use CBR and DTG allows us to formulate the whole learning process as parts of a slightly modified genetic algorithm:

Figure 5.4: GA interpretation of the system (the case base holds the population, the decision tree performs selection, plan generation performs recombination and mutation, and the match result provides the fitness function for the new individual)

Populations
The initial population in a GA consists of randomly generated individuals. We model matches as individuals with their outcome as the fitness value used for selection. The first set of matches uses randomly generated plans to fill the first population. Populations are handled by the case base. It retains all matches and can retrieve the most similar cases as a new population. Therefore we can define the population size k through a k-nearest neighbour selection in the case base. A resulting difference to conventional GAs is that a population is not solely defined through previous populations, but also through the case base similarity measure.

Individual Selection & Recombination
Given a population, the fittest individuals have to be selected and recombined to form new individuals. These two steps are conjointly achieved through the decision tree algorithm. A decision tree is generated out of the whole population. A new individual is then generated by recursively moving down the tree, choosing the branch to follow through a fitness-proportionate selection of all the possible child nodes. The fitness is given through the cumulative fitness ranking of each of the


cases classified into that branch of the tree. Once a leaf node is reached, we either have a complete plan (no free variables left to further classify plans) or we have a partial plan and all cases in that node have the same classification. The partial plan is then completed with random values, which can be seen as a form of mutation.

Mutation
Another way of controlling mutation, and with it the exploration-to-exploitation ratio, is to introduce a factor e ∈ [0, 1] that controls the fitness-proportionate selection of a child node n_child at a node n_parent. Let w(n) denote the ratio of won matches over all matches that are classified in4 the node n. Let C(n) denote the set of child nodes of a decision tree node n, such that there is exactly one child node for every possible value of a decision, i.e. of a plan attribute. We can then define the probability of choosing a child node n_child as

    Pr(X = n_child) = e · 1/|C(n_parent)| + (1 − e) · w(n_child) / Σ_{n ∈ C(n_parent)} w(n)        (5.1)

Proposition 5.2.1. The distribution induced by equation (5.1) is indeed a discrete probability distribution.

Proof. The proof is given in Appendix B.

The induced probability distribution is a weighted average of the uniform distribution and a fitness-proportionate distribution. The weight is given by e: for e = 0 we have a purely fitness-proportionate distribution, and for e = 1 we have a uniform distribution. For values between 0 and 1 we have a weighted average of both distributions and can thus control the amount of exploration (use random nodes) and exploitation (use well-performing nodes) directly.

4 A match is classified in a node n if it is classified in the parent of node n and the attribute–value pair associated with the branch between the parent node and n matches the plan in the match.

Fitness Function
Once the new plan has been created, we can run the match using it, receiving the resulting score. This score is the fitness value which is then associated with the match, and a case can be created and retained in the case base. This completes one generation of the genetic algorithm. Therefore we have only one new individual per generation and we do not explicitly delete or replace old individuals. This is another difference to conventional GAs, where new individuals replace individuals from older generations5.

5 There are several methods in GAs of combining a new generation with an old one, commonly referred to as the +-model, where the two generations are unified and then pruned to the population size through fitness-proportionate selection, and the ,-model, where the new generation replaces the old one [Rechenberg, 1973, 1994].

Similarities and Distinctions to other AI Learning Approaches
The described learning method appears to be similar to Reinforcement Learning (RL), in that it proposes plans and learns from the consequences of these plans rather than from explicit teaching. However, the main distinction between RL and GAs is that RL applies learning during an individual's life through the use of the mathematical framework of Markov decision processes, while GAs use a fitness value calculated when the individual's task is finished. When some metric of the efficiency of actions is available during the runtime of an individual, reinforcement learning methods can outperform genetic algorithms. These benefits were considered during the design phase of this project; however, the amount of additional work required to create such a runtime metric was not directly clear to us. We decided instead to use the already available metric given through the final score of a match. This resulted in our decision for genetic algorithms as the underlying model. In future versions the used metric could be enhanced to support intermediate evaluation and thus introduce reinforcement learning to DEFCON.

6 Learning a Case Base of Plans
The implementation is carried out in C++. The main reason for this is that DEFCON is also written in C++. Various high-level programming languages like Lisp and Prolog could provide easier access to logical reasoning and planning, but the additional overhead introduced by the need for an interface and external libraries, together with their less predictable real-time behaviour, outweighs these advantages. The excellent quality of the source code of DEFCON provided by Introversion facilitated the implementation and reduced the time required to develop data structures and helper functions. Most of the algorithms for the communication between the existing AI bot and the game itself have been reused with minimal modifications, but algorithms for fleet management in particular, like path finding and event handling, required more attention. The development was executed as an incremental process. We started by implementing a minimal working bot that included a planning engine, but no means of automatic plan generation. The plan used was designed by hand in a way that we thought would give acceptable results to start with. After the actions required for the planning engine, like coordinated attacks and basic fleet movement, were implemented, we spent some time debugging and tweaking these actions to improve their behaviour. During that process, the performance of our bot could be easily measured by running it against the AI bot developed by Introversion, which we refer to as the existing AI bot. To be able to evaluate the learning process of the used learning algorithms we had to stop improving the basic actions at some point. Of course there are still things which can be improved or elaborated upon concerning the bot mechanics; however, this has been left for future work (see 10.2).


In this chapter, we describe the first step towards a learning bot, namely how we created the case base and what information a case contains. We explain how a plan is represented and what structural and opponent data was recorded during the training phase.

6.1 Building a Case
6.1.1 Plan Representation
In chapter 2 we discussed some powerful languages employed for plan representation. These languages assume that the reasoning is carried out by the planner and thus include preconditions and postconditions to actions and convey information about the world state before and after these actions. However, in our system, the reasoning is carried out by the combination of a case base, decision tree generalisation and evolutionary algorithms (see 5.2). Therefore we do not need the mentioned conditional formalism. Rather than using one of these languages just for the sake of using an existing planning formalism, we created a minimal language that describes just the actions and their numerical attributes, which will be explained in detail in section 7.2. When retaining a plan in the case base, it is converted into the Extensible Markup Language (XML), see Figure 6.1 for an example. XML is human readable and a plan in this format could easily be converted to one of the other mentioned planning languages, should the need arise.

6.1.2 Structure Information
We record information about the structures that are placed at the beginning of the match to be able to optimise their position later on (see 8.1). The locations of structures such as airbases, radar stations and silos play an important role in the defence of a player's territory, and so we measure their success with the following metrics:


Figure 6.1: Example plan in XML format

Missiles destroyed: How many missiles the structure was able to destroy; this is a relevant metric for air defence silos.
Units destroyed: This is a count of all the opponent units destroyed. Both the air defence silo and the airbase (indirectly, through planes) are measured through this metric.
Missiles attacked: In addition to missiles destroyed, this metric is relevant for air defence silos and records their involvement in the defence against incoming missiles.
Missiles spotted: Radar stations extend the area in which missiles and other units are visible to the player, and therefore radar stations help to defend the player's structures and cities. The number of missiles spotted is an indicator of the quality of the position the radar station was placed at.
Planes launched: The number of planes launched from an airbase can be seen as a rating of how active the area around that airbase is.


Planes quickly lost: When the airbase is placed within the attack range of an opponent silo, launched planes can be attacked instantly, which usually leads to the loss of most of the planes on that airbase. To penalise this behaviour, this metric records the number of planes shot down shortly after launch.
Time survived: This metric records the time of the destruction of a structure. Early destruction can be an indicator of a position that is unsuitable for placement.

6.1.3 Recording Opponent Data
Recording opponent behaviour is important for the case retrieval, as we will use the recorded opponent information to estimate the similarity to the current match. To get a concise representation of the opponent actions, we have to distill the information about the opponent units and events. It is not helpful to record every small ship movement and attack action, as this will only complicate the similarity measure and slow down the retrieval process. Additionally, the high storage space required is undesirable. Therefore we abstract the actions as follows:

Abstracting Opponent Attacks
During a match, we record every missile launched by the opponent. After the match is finished, we try to cluster these attacks into waves, as described below. A histogram of a typical record is given in Figure 6.2. In this figure, we can see three attack "waves", the first one starting around 2000 seconds into the match, the second one at about 8000 seconds and the third one shortly after 12000 seconds into the match. There are no attacks in the first 2000 seconds, which is due to the fact that missile attacks are only allowed in the "Defcon 1" state, which starts after 1800 seconds. To calculate these attack times, we cumulate the single attack events into time-frame bins, i.e., all attacks within a period of time are counted. When this sum gets bigger than a threshold value, we consider this time as the start of an attack.


Figure 6.2: Launched opponent missiles during a match (histogram of detected missiles over time)

The attack ends when the subsequent bin sums drop below the threshold again. We use time-frames of 500 seconds length and a threshold of 5 missiles, as we found that this gave a good clustering over test samples we took from the general performance test, given in 9.1.
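As an illustration, the binning and thresholding described above could be sketched as follows. The bin width and threshold follow the values given in the text; everything else, including the function and type names, is illustrative.

    #include <vector>

    struct AttackWave { float start, end; };

    // Cluster opponent missile launch times (in seconds) into attack waves.
    // A wave starts when a bin of 'binSize' seconds contains at least 'threshold'
    // launches and ends when the bin count drops back below the threshold.
    std::vector<AttackWave> ClusterAttackWaves(const std::vector<float>& launchTimes,
                                               float matchLength,
                                               float binSize = 500.0f,
                                               int threshold = 5) {
        const int numBins = static_cast<int>(matchLength / binSize) + 1;
        std::vector<int> bins(numBins, 0);
        for (float t : launchTimes)
            ++bins[static_cast<int>(t / binSize)];

        std::vector<AttackWave> waves;
        bool inWave = false;
        for (int i = 0; i < numBins; ++i) {
            if (!inWave && bins[i] >= threshold) {
                AttackWave w = { i * binSize, 0.0f };
                waves.push_back(w);
                inWave = true;
            } else if (inWave && bins[i] < threshold) {
                waves.back().end = i * binSize;
                inWave = false;
            }
        }
        if (inWave) waves.back().end = matchLength;   // wave still running at match end
        return waves;
    }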

Abstracting Opponent Fleet Movement
Every command that is issued to an opponent fleet is recorded. This includes opponent commands that happen outside of the view of the player; the commands are recorded through the event handling mechanism of DEFCON. Although this is knowledge not immediately available to a player while playing, our bot does not use this information during the active game, but only afterwards. As it is possible to record games and play them back later with full information, and spectators can join a game and also see everything, we do not consider this as cheating. To simplify the process of retrieving a fleet position, intermediate path information is also stored. After the match is finished, the recorded data is abstracted such that


it only contains relevant waypoints, which are waypoints that, if omitted, would lead to an invalid route partly passing over non-sailable terrain. This is done by iteratively removing intermediate waypoints until the simplified route is invalid, which is checked by an algorithm provided by the system environment.

6.2 Filling the Case Base
The information gathered in section 6.1 is then merged into a case. We use XML to represent and store cases for the same reasons we represent plans in XML: it is easy to read and to convert. An example of an XML case representation is given in Figure 6.3. The lists of structures and fleet movements have been pruned in the figure, as the entries of these lists differ only by their numerical values. This completes the case creation, retrieval and retaining. In chapter 7, we describe a similarity measure which enables comparison of plans and retrieval of plans from the case base. To fill the case base in preparation for the reasoning process, we run matches and retain their cases. More details on the number of matches and configurations used can be found in chapter 9.


Figure 6.3: Example case in XML format. It has been reduced due to size restrictions: the lists of structures (around 15 entries each) and fleets (around 4, each with approximately 10 waypoints) are omitted. A typical case has around 140 lines.

7 Automatically Generating Game Plans
Once we have obtained a case base of previous matches, we can begin to reason with it and use it to generate new plans. We create plans at the beginning of a match in order to place our fleets and lay out the overall behaviour throughout the match. Basing this plan on previous matches can help the bot avoid repeating strategic mistakes.

7.1 Case Retrieval
To retrieve cases from the case base, we have to define a similarity measure. This measure compares the recorded game information, most importantly the territory configuration and opponent information. We will refer to the match we compare the cases to as the current match. We apply a k-nearest neighbour search, meaning that we select the k nearest cases with respect to a similarity measure.

7.1.1 Using General Game Information
First of all, we restrict the selection to cases in which the players start in the same territories as in the current match. This is reasonable, as all structural information and opponent fleet information is specific to the chosen territories. Using games with other configurations would require us to introduce conversion algorithms that map the events and positions of that case to the current match, with no obvious advantages. Furthermore, the similarity measure takes the name of the opponent and the time of the recorded match into account. As many players have their individual strategies,


we want to select matches that these players participated in before. Also, newer matches are preferred over old ones to allow for opponents who develop their game over time. When we play against such an opponent, old matches might be useless, as the opponent might have learned from his mistakes.
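A small sketch of this retrieval step is given below, reusing the illustrative Case record from section 5.2.1. The weighting of opponent identity and match age is not specified in the text, so the constants here are placeholders; the fleet-position similarity of section 7.1.2 is not included.

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    // Return the k most similar cases to the current match configuration.
    // Cases with a different territory configuration are filtered out first;
    // the remaining cases are ranked by opponent identity and recency.
    std::vector<const Case*> RetrieveSimilarCases(const std::vector<Case>& caseBase,
                                                  int ownTerritory, int opponentTerritory,
                                                  const std::string& opponentName,
                                                  float currentTime, std::size_t k) {
        std::vector<std::pair<float, const Case*> > ranked;
        for (std::size_t i = 0; i < caseBase.size(); ++i) {
            const Case& c = caseBase[i];
            if (c.ownTerritory != ownTerritory || c.opponentTerritory != opponentTerritory)
                continue;                                      // same configuration only
            float score = 0.0f;
            if (c.opponentName == opponentName) score += 1.0f; // prefer the same opponent
            score -= 0.0001f * (currentTime - c.recordedTime); // prefer newer matches
            ranked.push_back(std::make_pair(score, &c));
        }
        std::sort(ranked.begin(), ranked.end(),
                  [](const std::pair<float, const Case*>& a,
                     const std::pair<float, const Case*>& b) { return a.first > b.first; });
        if (ranked.size() > k) ranked.resize(k);

        std::vector<const Case*> result;
        for (std::size_t i = 0; i < ranked.size(); ++i) result.push_back(ranked[i].second);
        return result;
    }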

7.1.2 Using Opponent Behaviour Information
When we want to retrieve similar cases during a match, we can compare the known fleet positions to the recorded positions. This can give us information about the strategy the opponent may be using, and thus we can use similar plans to predict future actions of the opponent. Matching fleet positions is not trivial: we have to consider which fleets associate with other fleets with respect to the location and composition of fleets. Also, we have to deal with incomplete information: fleets may not yet have been spotted and thus cannot be included in the comparison1. This problem can be seen as an instance of the assignment problem. Using a simulated annealing algorithm, we can efficiently approximate the optimal solution to this problem (see also section 8.7, Resolving the Assignment Problem). The fitness function used in the simulated annealing optimisation process can then also be used as a similarity measure for the fleet similarity to compare cases. Having calculated the similarity to other cases, we can select the k nearest cases. At the beginning of a match, we can then use the plans contained in these cases to create one for the current situation. During a match we can use these cases to predict future fleet movement or attack plans of the opponent.

1 Of course it would be possible to read out the desired fleet positions from the game process itself, but that would clearly be considered cheating.

7.2 Decision Tree Generalisation
Generating a plan out of a set of existing plans can be done in several ways. It would be possible to just select the values of an attribute of the plans that appeared

most often. However that method does not make use of any structural information, and it is not easy to incorporate negative examples in the generation process. Negative examples are useful for a learner to constrain the space of valid strategies, otherwise there would be no obvious way to learn from errors. Furthermore we would have no notion of how important an attribute is with respect to the outcome of the match. Another important factor conflicting with that method is the possible interdependence of attributes. For example, the value of the time of the first attack might be correlated with the initial state of a metafleet. A late attack could enable the metafleet to start in a defensive state and only later on to switch to attacking, while an early attack would require it to move into attack range to strengthen a combined attack. Independently selecting values of attributes can lead to suboptimal decisions in that case. Decision Trees naturally separate examples into groups (leaf nodes), and each path in the tree correctly reflects the dependency of attributes, because combinations of attribute values only appear when there are examples that back up these combinations2 . The ID3 algorithm takes negative examples into account when creating the tree. Additionally, the splitting method employed prefers smaller trees over larger ones3 by using the information gain measure to select attributes (c.f. section 2.1.1). The information gain gives us a method of comparing the influence of an attribute over the result of a match by evaluating how well values of that attribute can separate positive and negative training examples.

2 These examples are classified into leaf nodes that are on a path to the root node that includes branches associated with the respective attribute values.
3 However, the ID3 algorithm does not always produce minimal trees. It could therefore be seen as a heuristic for minimal trees [Mitchell, 1997].

7.2.1 Plans as Training Examples
In order to apply the ID3 algorithm, we have to represent a plan as a set of attribute–value pairs. This is not a difficult task, as the structure of a high-level plan can be expressed in a straightforward manner in the required form. As explained in section 5.1, we have to encode metafleet information, metafleet state changes and attack data into a plan. All this data can be expressed as variables (i.e., instances of them are attribute–value pairs); see Table 7.1 for a complete list.

Attribute                       Min     Max     Step
first attack time               2000    7000    1000
number of metafleets            1       2       1
per Metafleet:
  initial state                 0       5       1
  start zone                    0       2       1
  # of battleships              0       4       1
  # of carriers                 0       4       1
  # of subs                     0       4       1
  time before state change      2000    6000    2000
  new state                     0       5       1

Table 7.1: Range of plan attributes. Key for metafleet states: 0 = await opponent, 1 = avoid opponent, 2 = move direct, 3 = move avoiding opponent, 4 = move intercepting enemy. Key for start zones: 0 = zone with low opponent presence, 1 = zone with average opponent presence, 2 = zone with high opponent presence.

To simplify the learning process, we restricted attribute ranges in a high-level plan: we allowed a maximum of two metafleets and at most one state change after the initial state. Although these restrictions are quite narrow, they should provide enough room for learning and make clear the approach of the used algorithms.
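For concreteness, this restricted plan representation could be captured in a simple structure. This is an illustrative sketch, not the actual implementation; the attribute ranges follow Table 7.1.

    #include <vector>

    enum MetafleetState { AwaitOpponent = 0, AvoidOpponent = 1, MoveDirect = 2,
                          MoveAvoidingOpponent = 3, MoveInterceptingEnemy = 4 };
    enum StartZone { LowOpponentPresence = 0, AverageOpponentPresence = 1,
                     HighOpponentPresence = 2 };

    // One metafleet entry of a high-level plan.
    struct MetafleetPlan {
        MetafleetState initialState;
        StartZone startZone;
        int battleships;            // 0..4
        int carriers;               // 0..4
        int subs;                   // 0..4
        int timeBeforeStateChange;  // 2000..6000 s, step 2000
        MetafleetState newState;    // state after the single allowed change
    };

    // A restricted high-level plan: at most two metafleets, one state change each.
    struct HighLevelPlan {
        int firstAttackTime;                    // 2000..7000 s, step 1000
        std::vector<MetafleetPlan> metafleets;  // size 1 or 2
    };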

7.2.2 Extending the Entropy Function
The entropy measure of a set of positive and negative examples is an indicator of its disorder. It is easy to divide a set of DEFCON matches into positive and negative examples by using the outcome of the matches: an example is positive if the bot got a higher score in the match than its opponent. However, this leads to very small trees when the numbers of positive and negative examples are unbalanced; for example, if the bot always wins against an opponent, all examples are positive and the tree consists only of a leaf node. Even though the bot is winning all the time in this example, we are still interested in further optimising its behaviour and selecting plans that achieve a higher score than other plans. This can be accomplished by redefining what a "positive" and a "negative" example is.


By maximising the information entropy, we cause the decision tree algorithm to create a more detailed tree, allowing for a more precise classification. We use the following proposition to achieve an approximated maximisation:

Proposition 7.2.1. The information entropy of a set S relative to a binary classification, defined in equation (2.1) on page 6, is maximal for p+ = p− = 1/2.

Proof. The proof is given in Appendix B.

According to Proposition 7.2.1, we adjust the boundary on the outcome of a match that decides whether it is a positive or a negative example. The outcome of a case c is the difference of the own score and the opponent score:

    outcome(c) := score_own(c) − score_opp(c),

which is then used to separate the set of cases C into positive examples C+ and negative examples C−:

    C+ := {c ∈ C | outcome(c) ≥ b},    C− := C \ C+.

Setting b to the average outcome of cases in C,

    b := (1/|C|) · Σ_{c ∈ C} outcome(c),

gives a separation into two sets with approximately the same cardinality, i.e.

    p+ = |C+| / |C| ≈ 1/2.

We can assume that the two sets have approximately the same cardinality because the distribution of case outcomes is roughly symmetric, as indicated in Figure 7.1. The matches used to create this figure are matches from the plan library and as such contain matches with suboptimal plans, which explains that the mean is close to zero.


Figure 7.1: Distribution of outcomes (number of matches by score differential)
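A small sketch of the relabelling step described in this section, assuming a case record with an Outcome() accessor as sketched in section 5.2.1:

    #include <vector>

    // Label each case as a positive or negative training example relative to the
    // average outcome b of the retrieved cases, so that roughly half are positive.
    std::vector<bool> LabelExamples(const std::vector<Case>& cases) {
        float b = 0.0f;
        for (std::size_t i = 0; i < cases.size(); ++i) b += cases[i].Outcome();
        if (!cases.empty()) b /= static_cast<float>(cases.size());

        std::vector<bool> positive;
        positive.reserve(cases.size());
        for (std::size_t i = 0; i < cases.size(); ++i)
            positive.push_back(cases[i].Outcome() >= b);
        return positive;
    }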

7.2.3 ID3 for Plan Generalisation
Having each training example given as a set of attribute–value pairs, the implementation of the ID3 algorithm (pseudo-code shown in figure 1 on page 8) is straightforward. The main function for this is given in Appendix A.3. The implemented ID3 algorithm creates trees with plan attributes in the nodes and branches for every attribute value. The more separating an attribute is in terms of positive and negative examples, the higher in the tree (i.e., closer to the root) it appears. Figure 7.2 is an example of the output of this algorithm. The tree in this figure has been pruned to a depth of 2 to maintain readability of the nodes. The chosen branch for the plan generation is shown in bold.
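For reference, the entropy and information gain computations that drive the attribute selection could be sketched as follows. This is an illustrative reconstruction, not the code from Appendix A.3.

    #include <cmath>
    #include <cstddef>
    #include <map>
    #include <vector>

    // Entropy of a binary-labelled set of examples.
    double Entropy(const std::vector<bool>& labels) {
        if (labels.empty()) return 0.0;
        double p = 0.0;
        for (std::size_t i = 0; i < labels.size(); ++i) if (labels[i]) p += 1.0;
        p /= labels.size();
        if (p == 0.0 || p == 1.0) return 0.0;
        return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
    }

    // Information gain of splitting the examples by the values of one attribute.
    // values[i] is the attribute value of example i, labels[i] its class.
    double InformationGain(const std::vector<int>& values, const std::vector<bool>& labels) {
        std::map<int, std::vector<bool> > partitions;
        for (std::size_t i = 0; i < values.size(); ++i)
            partitions[values[i]].push_back(labels[i]);

        double remainder = 0.0;
        for (std::map<int, std::vector<bool> >::const_iterator it = partitions.begin();
             it != partitions.end(); ++it)
            remainder += (static_cast<double>(it->second.size()) / labels.size())
                         * Entropy(it->second);
        return Entropy(labels) - remainder;
    }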

Figure 7.2: A generated decision tree from a case base of 35 plans, with the chosen branch shown in bold (root attribute: MetaFleet 1: New State).


7.3 Evolutionary Based Plan Generation
After we have generated the decision tree, we can use it to create a plan. The tree structure, where each branch classifies a certain number of positive and negative examples and thus carries a probability of success, calls for an iterative plan generation process that starts at the root and chooses a branch based on its probability. This approach can be seen as a fitness-proportionate selection, which is an element of an evolutionary algorithm. As discussed in section 5.2.3, we combine this approach with a uniform selection to get a balance between exploration and exploitation of the state space of plans. This is done by using a weighted average between the fitness-proportionate distribution and a uniform distribution. This selection method is applied to each node, following the path along the selected branches. After reaching a leaf node, either all attributes are assigned or all training cases are classified the same. In the latter case, we can assign the missing attributes of the plan randomly, as we do not have any further information. That completes the plan generation; the new plan is then activated in DEFCON and can be carried out.
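A compact sketch of the blended branch selection of equation (5.1), as used in this plan generation step; the tree node structure and function names are illustrative.

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    struct TreeNode {
        std::vector<TreeNode*> children;   // one child per attribute value
        double winRatio;                   // w(n): won matches / all matches in this node
    };

    // Choose a child of 'parent' using a weighted average of a uniform and a
    // fitness-proportionate distribution, as in equation (5.1). The factor e in
    // [0,1] controls exploration (e = 1: uniform) versus exploitation (e = 0:
    // purely fitness proportionate).
    TreeNode* SelectChild(const TreeNode& parent, double e) {
        const std::size_t n = parent.children.size();
        if (n == 0) return 0;              // leaf node: nothing to select

        double totalWins = 0.0;
        for (std::size_t i = 0; i < n; ++i) totalWins += parent.children[i]->winRatio;

        double r = static_cast<double>(std::rand()) / RAND_MAX;   // uniform in [0,1]
        double cumulative = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double fitnessTerm = (totalWins > 0.0)
                                     ? parent.children[i]->winRatio / totalWins
                                     : 1.0 / n;                   // fall back to uniform
            cumulative += e * (1.0 / n) + (1.0 - e) * fitnessTerm;
            if (r <= cumulative) return parent.children[i];
        }
        return parent.children.back();     // guard against rounding error
    }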

7.4 Worked Example of Generating a Plan
To demonstrate the plan generation through a decision tree, we use a set of cases with simplified attributes and values. In the example we use plans with the attributes A, B, C and D, all with values 1 or 2. A decision tree is generated out of plans from cases returned by a case base. In the process of generating the tree, the ID3 algorithm considers each attribute for the root node. It calculates the information gain for each of them and chooses the attribute with the highest gain for the node. For each possible value of the chosen attribute, a subtree is generated recursively.


An example decision tree for plans with attributes A, B, C and D is shown on the left, classifying 19 examples. The number of examples classified in each leaf node is given below the leaf nodes.

For a fitness-proportionate selection, we calculate the fitness in terms of the proportion of positive examples in a subtree. The unnormalised fitness values for the selection are denoted on each branch. When using a weighted average of the fitness-proportionate and the uniform selection, each branch has an additional basic fitness that allows its selection even when the fitness would otherwise be zero.

The selection of branches starts at the root node and recurses into the tree until a leaf node is reached. In the example, we assume the fitness-proportionate selection led to the shown selection. This means that, in the generated plan, we have A = 1, B = 2 and D = 2. The value for attribute C is not specified by the decision tree and is therefore randomly chosen.

8 Carrying Out Game Plans
After the plan generation is finished, the game itself can commence. During the starting phase, structures are optimally placed with the help of the case base and fleets are placed according to the high-level plan. Thereafter, the metafleets act according to their states and attacks on the opponent are initiated. We explain these high- and low-level actions and supporting algorithms like influence mapping and simulated annealing. Also, the movement desire model for local, small-scale control of units is presented, and fleet formation methods are explained and compared.

8.1 Initial Structure Placement
At the beginning of the match, structures have to be placed. Silos, airbases and radars can be placed anywhere on the bot's own territory, but not every position is equally well suited for these structures. For example, silos, which are responsible for the air defence against missiles, should be placed near cities to prevent the opponent from destroying them. Radars have to support silos by increasing their range and airbases have to launch planes such that their limited range is sufficient to reach their targets. At the same time, these structures should not be too exposed to hostile units, as they can be destroyed easily if unprotected. An exaggerated example of bad placement is given in Figure 8.1. In this screenshot, the silos are all on the east of the territory, while all the big cities are on the west side. The radars seem to be placed sensibly, but they are useless on their own. For these reasons, we want to optimise the structure positions. This optimisation is not part of the high-level plan, as these structures do not change their position during


Figure 8.1: An (exaggerated) example of bad structure placement.

the game and are only placed at the beginning (unlike metafleets, where a strategic movement plan is required). We use the case base of previous matches to estimate an optimal structure placement. Through the analysis of recorded matches, a representation of advantageous and disadvantageous positions is formed and used to locate a valid configuration that increases chances of a good result. To be able to do so, we have to create a good metric to find similar cases and rank cases. The quality of the position of a structure can be estimated by its effectiveness, as described below. The effectiveness is given by its purpose: An air defence silo can shoot down enemy missiles and planes. These units pose a threat to the player, as they can lower his score directly through missile hits on his cities or indirectly through reducing his ability to score points himself. Therefore we evaluate the total number of missiles and planes shot down by silos in recorded games with respect to their positions (see also section 6.1.2). We also take the life-span into account, as a destroyed silo cannot defend any more, and if the silo is destroyed before launching its missiles, the loss is even bigger. The effectiveness ratings of two silos are not independent from each other: While there might be a single position where a silo has shot down most units, optimal


placement cannot be achieved by placing all silos at this position — the limited range of a silo will prevent it from being able to cover all cities. Therefore the position of all silos in a case is evaluated into a single fitness value:

    fitness_silos = Σ_{s ∈ silos} ( a_s1 · nukesShot_s + a_s2 · objectsShot_s + a_s3 · timeSurvived_s )

A similar equation is derived for airbases, where we take into account how successful launched planes were (in terms of survival rate and destroyed units), and for radar stations, where the number of detected units is considered. The correlation tests carried out during the development, presented in section 9.1, indicate good weights for these fitness functions. We found a high correlation between surviving silos and score, implying that the weight a_s3 should be significant. The same holds for radar stations, but not for airbases, where we found a negative correlation. However, this negative correlation is relatively small, thus indicating that the corresponding weight is not very relevant. The source code for the fitness selection algorithm can be found in Appendix A.2.
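As an illustration, the silo part of this fitness evaluation could be written as follows. This is a sketch, not the code from Appendix A.2; the weights a1, a2 and a3 correspond to a_s1, a_s2 and a_s3, and the record fields mirror the metrics of section 6.1.2.

    #include <vector>

    struct SiloRecord {
        int   nukesShot;       // missiles destroyed by this silo
        int   objectsShot;     // other opponent units destroyed
        float timeSurvived;    // seconds until the silo was destroyed (or match length)
    };

    // Fitness of a recorded silo placement, summed over all silos of a case.
    float SiloPlacementFitness(const std::vector<SiloRecord>& silos,
                               float a1, float a2, float a3) {
        float fitness = 0.0f;
        for (std::size_t i = 0; i < silos.size(); ++i)
            fitness += a1 * silos[i].nukesShot
                     + a2 * silos[i].objectsShot
                     + a3 * silos[i].timeSurvived;
        return fitness;
    }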

8.2 Handling High-Level Plans
As we discussed earlier, a high-level plan consists of attack data and metafleet specifications. This information has to be processed such that it relates to low-level actions that can be sent by the bot to the game engine.

8.2.1 Controlling Attacks
The most important variable for attacks given by high-level plans is the scheduled attack time. Another factor contributing to an attack was discussed in section 5.1.2, namely the attack location. Thus an attack can take place once the time for it has


come and enough attacking units are available to attack the selected location1 . We combine all this information into a fitness function that, when rising above a threshold, triggers the attack to start (see Attacks with Missiles, section 8.3).

Subsequent Attacks
After an attack is executed, the next attack cannot immediately follow, because the bombers involved need to recover, i.e., they need to land at a nearby airbase or carrier and re-equip a missile. After this "cool-down" period, the next attack can follow. However, after a successful attack on a region of a player's territory, the population centre might have moved, so that a reorganisation of metafleet positions would be beneficial. This reorganisation is controlled by the metafleet and thus incorporated into the fitness function automatically, as it takes the movement of metafleets into account.

8.2.2 Controlling Metafleets Most of the metafleet attributes defined in the high-level plan concern the initial set-up. The number of battleship-fleets, carrier-fleets and sub-fleets is straightforward to implement. We explain how the starting zones are acquired and what the metafleet states are.

Starting Zones
As outlined in chapter 5, most territories are separated into two distinct "oceans" in which metafleets can be placed. Figure 5.2 visualises this for territory A; the figure is reproduced below as Figure 8.2. To account for this fact, we have introduced the starting zone attribute for metafleets. It has three states:

1 The units used for an attack may require a certain amount of time to arrive at the chosen location; therefore we treat the attack time as the maximum of the required movement time and the scheduled high-level attack time.


Figure 8.2: Separated sea terrain (1 + 2) for territory A

Strong: A zone is called a strong zone if most of the opponent fleet started in the ocean associated with this zone. In the example, this would be sea terrain 2 if the opponent occupies Europe, as all of his fleets are within this zone.

Weak: The weak zone is always the zone that is not strong.

Medium: If the size of the fleet in the weak zone is at least half as large as that in the strong zone, then the medium zone is the zone which was also found to be the weak zone. Otherwise the medium zone is set to the strong zone.

The reason for introducing this classification is that we want an additional characterisation of the opponent fleet distribution. A separation into only strong and weak zones would not allow for a set-up where the opponent has roughly equal-sized fleets in both zones. We use the selected similar cases to retrieve the zone classification by analysing the starting positions of opponent fleets in these recorded matches. The averaged starting positions then define the classification for the current match.
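
The classification itself reduces to a simple comparison of the averaged fleet counts in the two oceans. The following sketch shows that logic; all names are illustrative assumptions and not the bot's actual types.

enum class Zone { Zone1, Zone2 };

struct ZoneClassification {
    Zone strong;
    Zone weak;
    Zone medium;
};

// Classify the two oceans of a territory from the (averaged) number of opponent
// fleets observed starting in each of them across the retrieved similar cases.
ZoneClassification ClassifyZones(double fleetsInZone1, double fleetsInZone2)
{
    ZoneClassification c;
    c.strong = fleetsInZone1 >= fleetsInZone2 ? Zone::Zone1 : Zone::Zone2;
    c.weak   = c.strong == Zone::Zone1 ? Zone::Zone2 : Zone::Zone1;

    double strongCount = c.strong == Zone::Zone1 ? fleetsInZone1 : fleetsInZone2;
    double weakCount   = c.strong == Zone::Zone1 ? fleetsInZone2 : fleetsInZone1;

    // If the weak zone holds at least half as many fleets as the strong one,
    // the medium zone coincides with the weak zone, otherwise with the strong zone.
    c.medium = (weakCount >= 0.5 * strongCount) ? c.weak : c.strong;
    return c;
}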


Metafleet States
The position within a starting zone is also defined by the initial metafleet state. We expand the definitions of the states given in section 5.1.1 here:

Idle: The metafleet has no direct orders. When it is not moving, a random attack position close to the opponent is selected and approached.

Await opponent: The metafleet moves to a nearby position where contact with the opponent fleets is likely. This position is given through observation of fleet movement and prediction of fleet positions from recorded games. When the opposing fleet has been defeated, or no fleet was found at the time it should have arrived, another spot is calculated and approached. The starting position is chosen such that the likelihood of opponent contact is maximised.

Avoid opponent: The desire in the movement model (see section 8.5 below) to avoid opponents is increased. In contrast to the await opponent state, the starting position within the predefined zone is chosen such that the likelihood of opponent contact is minimised.

Move direct: The metafleet moves on a direct route to a good attack spot of the current opponent without paying attention to predicted or seen fleet positions.

Move intercepting opponent: The initial placement method matches that of the await opponent state. While moving towards the target, opponents are actively approached when they are within a certain event radius. This is achieved by suspending the movement towards the target and moving to the predicted or seen position of the opposing fleet.

The high-level plan also contains a follow-up state that is triggered after a certain length of time or after the previous state goal is reached. For example, in the await opponent state this is the case when all opposing fleets that are predicted to come close to the metafleet are destroyed. When a state change is triggered, the actions mentioned in the descriptions above are triggered, except for the starting position actions.


8.3 Low Level Actions
Low level actions are actions which directly interact with the game system. In addition to the basic actions that require (almost) no work to realise, we present the synchronised attack action, which controls the interaction between several attacking units for concurrent attacks on opponents.

Basic Actions
Most of the required actions are already supplied by the game and do not require much work to realise:

• Move to: Move a mobile unit to a specified location.
• Change state: Set the state of a unit to a new value.
• Attack: Attack a given location or unit.

Other actions are also fairly straightforward and are not explained in full detail here, as this would not give new insights:

• Wait: Used in combination with another action, which is activated after the specified time.
• Schedule plane returns: Evaluates the landing pads that planes approach and optimises the mapping of planes to landing pads (cf. section 8.7).

Attacks with Missiles
The only direct defence against incoming missiles is a silo in air defence mode. We want to determine the most efficient attack sequence to maximise the fraction of the m launched missiles that hit their targets, denoted m_{hit}:

\max\left(\frac{m_{hit}}{m}\right)


Each silo has a fixed reload time t_{reload} between every two shots and a given action range d_{silo} in which it can shoot at units. Let t_{inrange} denote the time missiles are in range of a silo during an attack, given by the time span beginning when the first missile enters the attack range of the silo and ending when the last missile is shot down or reaches its destination. Assuming there is always a missile within the range d_{silo} of the silo, the silo can shoot t_{inrange}/t_{reload} times at missiles. This gives us the expected number of shot-down missiles, m_{down}, as

m_{down} = \frac{t_{inrange}}{t_{reload}} \cdot P_{down},

where P_{down} denotes the probability of a shot destroying the missile. With m = m_{hit} + m_{down} we get:

\max\left(\frac{m_{hit}}{m}\right) = \max\left(\frac{m - m_{down}}{m}\right) = \max\left(\frac{m - t_{inrange}/t_{reload} \cdot P_{down}}{m}\right)
 = 1 - \min\left(\frac{t_{inrange}/t_{reload} \cdot P_{down}}{m}\right) = 1 - \min\left(\frac{t_{inrange}}{m}\right) \cdot \frac{P_{down}}{t_{reload}}

Thus by minimising t_{inrange} and maximising m we can optimise the efficiency of an attack. As the speed of missiles is fixed, we have to minimise t_{inrange} by synchronising the launch of missiles such that they hit their targets at the same time. Therefore an attack is most efficient if it involves as many missiles as possible hitting their target in a minimal time frame. While a fully concurrent planner would eventually be able to solve the task of synchronising an attack, this can be achieved more easily by wrapping explicit code for this task into a meta action. We chose the latter option and implemented a conventional algorithm for this task, as depicted in Figure 8.3. As the missile-launching unit types (bombers, silos and subs) have several restrictions concerning range, speed and unavailability for defence while launching missiles, it is important to carefully choose the attack time and attack position and to select the right units for the attack:


• Submarines have to be within firing range of the target2. Once they surface to launch, they are very vulnerable and should therefore ideally be protected by other ships. The speed of submarines is very slow, therefore early estimation of the attack position is crucial (see section 8.4).

• Silos reveal their position upon launch. This can be a tactical advantage for opponents, as they would otherwise have to actively explore the territory of the player in order to acquire the positions of structures like silos. Therefore some consideration has to be given to the optimal time for the attack.

• Bombers have limited fuel and thus can only operate within a certain range3. As they can be launched from mobile carriers, this range can be extended towards the opponent. Similar to submarines, this requires early estimation of possible attack positions. Furthermore, carriers and airbases have spare missiles, so that bombers can execute several attacks. This requires them to return to a landing pad after their attack, thus favouring attacks that do not force the bombers involved to run out of fuel.

By introducing a fitness function that weights and combines these criteria, we can compare and, through a cut-off threshold, select the best units for an attack. The weights and the threshold have to be determined by experimentation or learning.
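
The following sketch shows one way such a per-unit fitness with a cut-off threshold could look, as used when assembling the unit set for a synchronised attack (cf. Figure 8.3). The candidate fields, the weighting and the exposure term are illustrative assumptions, not the tuned values or actual data structures of the bot.

#include <vector>

enum class UnitType { Bomber, Sub, Silo };

// Hypothetical per-unit information relevant for selecting attackers.
struct AttackCandidate {
    UnitType type;
    double distanceToTarget; // distance to the planned launch/attack position
    double range;            // firing or travel range of the unit
    double exposure;         // estimated danger at its position (e.g. from the influence map)
};

// Weighted fitness: prefer units that are in range, close to the target and not
// overly exposed. Units whose fitness falls below the threshold are dropped.
std::vector<AttackCandidate> SelectAttackers(const std::vector<AttackCandidate> &units,
                                             double wDistance, double wExposure,
                                             double threshold)
{
    std::vector<AttackCandidate> selected;
    for (const AttackCandidate &u : units) {
        if (u.distanceToTarget > u.range)
            continue; // cannot take part in this attack at all
        double fitness = 1.0 - wDistance * (u.distanceToTarget / u.range)
                             - wExposure * u.exposure;
        if (fitness >= threshold)
            selected.push_back(u);
    }
    return selected;
}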

Timing Attacks with Planes
As discussed in the last section, efficient attacks should be synchronised such that missiles arrive at the same time. In particular, this requires bombers to arrive at the missile launch spot at a specific time. A lower bound for the attack time, or more precisely the impact time of the missiles, is given by the distance of the launch spot from the bomber, because bombers cannot change their speed or stop in mid-air. Furthermore, this requires bombers to fly detours to avoid reaching the target too early. As the path of a plane in DEFCON is defined through waypoints, we calculate one detour waypoint position p_{waypoint} such that the total distance

2 The firing range of a sub is 45 length units in the current version of DEFCON.
3 The travel range of a bomber is 140 length units in the current version of DEFCON, which is about half of the map.


find best attack position p;
find all units U of type bomber, sub and silo within range of p;
foreach u ∈ U do
    evaluate fitness function f_u;
    if f_u < threshold then
        remove u from U
    end
end
find target structures T close to p;
find optimised mapping between T and U;
send attack commands;

Figure 8.3: Synchronised Attack

Figure 8.4: Detour of a plane (triangle with sides a from p_bomber to p_target, b from p_bomber to p_waypoint, and c from p_waypoint to p_target)

travelled results in the correct arrival time. With the current bomber position p_{bomber}, the target position p_{target}, the required time span for the flight t and the speed of a bomber s_{bomber} we require the following to hold:

s_{bomber} \cdot t = \underbrace{\|p_{bomber} - p_{waypoint}\|}_{=:b} + \underbrace{\|p_{waypoint} - p_{target}\|}_{=:c}


Additionally we define a := \|p_{bomber} - p_{target}\| and assume a, b ≠ 0, i.e., the bomber is not yet at the target and we have excess flight time, thus a < b + c. Using the law of cosines, with γ denoting the angle between a and b, we get:

c^2 = a^2 + b^2 - 2ab\cos(\gamma)    (8.1)

Equation (8.1) is underdefined. To solve it, we define⁴ b := 2c and can then solve (8.1) for γ:

c^2 = a^2 + b^2 - 2ab\cos(\gamma)
\Leftrightarrow \cos(\gamma) = \frac{a^2 + b^2 - c^2}{2ab}
\Rightarrow \gamma = \arccos\left(\frac{a^2 + b^2 - c^2}{2ab}\right)
\Leftrightarrow \gamma = \arccos\left(\frac{a^2 + (2c)^2 - c^2}{4ac}\right)
\Leftrightarrow \gamma = \arccos\left(\frac{a^2 + 3c^2}{4ac}\right)    (8.2)

As cos(x) is an even function (i.e., symmetric with respect to the y-axis), the position of the waypoint is only unique modulo a reflection in the line defined by p_{bomber} and p_{target}. This is because the sign of γ cannot be inferred from the value of cos(γ). Therefore we have two possible positions to use as a waypoint. For each of these positions we check how "dangerous" it is for the plane by consulting the influence map, which is described below.
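
A self-contained sketch of this calculation under the fixed ratio b = 2c is given below. The Vec2 type, the function name and the assumption of planar coordinates are illustrative; they are not the bot's actual geometry code.

#include <cmath>
#include <utility>

struct Vec2 { double x, y; };

// Place a detour waypoint so that the total path length equals speed * flightTime,
// using the angle gamma from equation (8.2) with b = 2c. The two candidates are
// reflections of each other in the line from the bomber to the target; a > 0 and
// excess flight time (a < b + c) are assumed, as in the text.
std::pair<Vec2, Vec2> DetourWaypoints(Vec2 bomber, Vec2 target,
                                      double speed, double flightTime)
{
    double dx = target.x - bomber.x;
    double dy = target.y - bomber.y;
    double a  = std::sqrt(dx * dx + dy * dy);   // direct distance
    double total = speed * flightTime;          // required path length b + c
    double c = total / 3.0;                     // from b = 2c and b + c = total
    double b = 2.0 * c;

    double cosGamma = (a * a + 3.0 * c * c) / (4.0 * a * c);  // equation (8.2)
    double gamma = std::acos(std::fmin(1.0, std::fmax(-1.0, cosGamma)));

    // Rotate the unit vector bomber->target by +/- gamma and scale by b.
    double ux = dx / a, uy = dy / a;
    Vec2 w1 { bomber.x + b * (ux * std::cos(gamma) - uy * std::sin(gamma)),
              bomber.y + b * (ux * std::sin(gamma) + uy * std::cos(gamma)) };
    Vec2 w2 { bomber.x + b * (ux * std::cos(gamma) + uy * std::sin(gamma)),
              bomber.y + b * (ux * std::sin(gamma) - uy * std::cos(gamma)) };
    return { w1, w2 };
}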

8.4 Influence Maps
In real-time strategy games like DEFCON, obtaining as much knowledge as possible about the opponent is a crucial part of a successful strategy. Humans can easily handle such knowledge, predict behaviour and estimate unit movement and positions. A completely reactive bot like the existing AI is by definition oblivious

4 As the restrictions imposed on b and c affect the path that planes follow, we have to consider the effects of this restriction. Instead of using the chosen length ratio of 2 : 1 it is possible to use more elaborate constraints (see also 10.2, Limitations & Future Work).


of such indirect information, or uses very simple heuristics to mimic the possession of such knowledge. A big part of this knowledge concerns positional information like promising attack positions, possible opponent unit positions and areas free of threats. Influence Maps (see section 3.2.5) can easily provide this information and facilitate the retrieval of more abstract data by combining several layers to reason about them. In our implementation, we use two-dimensional arrays to store the map information. The array can be pictured as a grid laid over the world, with the value of each array element representing the value of the influence map at the corresponding world position. We used an array size of 360 × 200, so each cell of the grid has the size of one degree longitude times one degree latitude on the world map. We have found that this size is suitable for most of the uses of influence mapping.
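
A minimal sketch of such a layer is shown below. The class name, the accumulation interface and the assumed coordinate ranges of roughly [-180, 180] × [-100, 100] are illustrative assumptions, not the actual implementation.

#include <vector>

// One influence-map layer: a 360 x 200 grid laid over the world, one cell per
// degree of longitude/latitude.
class InfluenceLayer {
public:
    static constexpr int Width  = 360;
    static constexpr int Height = 200;

    InfluenceLayer() : m_cells(Width * Height, 0.0f) {}

    // Map a world position to a grid cell and accumulate a value there.
    void Add(float longitude, float latitude, float value)
    {
        int x = static_cast<int>(longitude + 180.0f);
        int y = static_cast<int>(latitude + 100.0f);
        if (x >= 0 && x < Width && y >= 0 && y < Height)
            m_cells[y * Width + x] += value;
    }

    float Get(int x, int y) const { return m_cells[y * Width + x]; }

private:
    std::vector<float> m_cells;
};

Several such layers (population density, danger, detected units) can then be combined cell by cell when reasoning about positions.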

Finding Attack Positions
We distinguish between fleet attack positions, which are positions on water that are close to where the bot is going to attack, and opponent attack positions, which denote the area that will be the centre of the attack. We use the population centre for the latter.

The Fleet Attack Position
The fleet attack positions are determined by optimising a fitness function over all possible sea attack positions5. This fitness function is given by a weighted sum of the sailing distance of the metafleet and the distance to the opponent attack position, such that a short sailing distance and a small distance to the opponent attack position are preferred.

5 Possible sea attack positions are a subset of all sea positions. They are predefined and equally distributed over the whole water body of the map, with a minimum distance of 15 length units between nodes.


The Population Centre
When planning an attack, we need an estimate of where it might be most effective. Areas with a high population density are good targets, as the winning condition indicates. Thus we need a measure of the population density of a position. Cities in DEFCON are dimensionless points with a certain population value, and as all population resides in cities, every position away from a city has zero population. However, we can interpolate the average population of any position as a function of the population of and distance to nearby cities. This average is applied to every grid cell of the influence map layer representing the population density. A simple maximum search then gives the area with the highest population density.
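
As a sketch of this layer, the following fills a density grid from a list of cities and returns the densest cell. The 1/(1 + d) distance weighting is an illustrative choice, since the text only requires some distance-based interpolation; the types and names are likewise assumptions.

#include <cmath>
#include <utility>
#include <vector>

struct City { float longitude, latitude; float population; };

// Spread each city's population over the grid, weighted by distance, and return
// the coordinates of the cell with the highest interpolated density.
std::pair<int, int> PopulationCentre(const std::vector<City> &cities)
{
    const int W = 360, H = 200;
    std::vector<float> density(W * H, 0.0f);

    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            float lon = x - 180.0f, lat = y - 100.0f;
            for (const City &c : cities) {
                float d = std::hypot(c.longitude - lon, c.latitude - lat);
                density[y * W + x] += c.population / (1.0f + d);
            }
        }
    }

    int best = 0;
    for (int i = 1; i < W * H; i++)
        if (density[i] > density[best]) best = i;
    return { best % W, best / W };  // cell with the highest population density
}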

Defining the Danger of a Position
When planning movement, there are better and worse areas to move to. For example, a plane should stay out of the range of hostile air defence silos, as those can easily destroy planes, and a carrier is threatened by opposing battleships and bombers. We can get a notion of the "danger" a unit A is in by finding all the hostile units that can destroy A and are within attack range of A. Thus by defining a fitness function that takes the distance and attack success probabilities (see Table 3.3) into account, we are able to compare positions with respect to their danger value.

8.5 Movement Desire Model
Context-sensitive unit movement is an important task for the AI system. Moving units in such a way that they reach their target but do not enter areas where they are likely to be destroyed is common sense for human players, but has to be carefully designed in a bot. When moving, a unit has to consider several factors that influence the desire of moving to a certain position:

• Proximity to the target position: If the unit has a given target position, for example an opponent unit or a landing pad, positions closer to it should be


preferred.

• Distance to threats: The unit should stay out of the attack range of dangerous6 opponent units whose position is known or can be estimated.

• Distance to targets: Conversely to threats, a position closer to opponent units to whom our units are dangerous is preferred.

There are two approaches we applied for the movement desire model, namely direction-based movement and position-based movement. At first we implemented a position-based model, which inspected surrounding positions and chose the position with the best combined fitness value. However, we experienced some performance problems when calculating several positions for all objects several times a second, as each position required us in turn to look at almost all opponent units. Therefore we also implemented a direction-based model, which requires only one such lookup.

Position Based Movement
In this model, we look at each position within a certain radius of the unit and define a fitness function for this position. This fitness function comprises weights for each of the characteristics described in 8.5: the distance to the target position, to threats and to target units. These weights were initially hand-selected, but can be learned, too. However, the selected weights seem to work fine and other variables are deemed more important for learning.

Direction Based Movement
To reduce the required computation, the direction-based movement model only iterates over every opponent unit once and creates vectors that carry the required information. The vector towards the target position is just the difference vector between the position of the unit and the target position. The vector that points towards threats and towards target units is calculated using the same

6 We define unit A as dangerous to unit B when the chance of A destroying B is higher than that of B destroying A.


method, differing only in the selection of opponent units. We iterated over all opponent units and created an average direction vector of those units that pose a threat or are a target.
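
The sketch below illustrates this single-pass combination of desire vectors. The structures, the danger/target flags and the weights are illustrative assumptions; the actual model uses the tuned weights discussed above.

#include <cmath>
#include <vector>

struct Vec2d { double x = 0.0, y = 0.0; };

struct OpponentUnit {
    Vec2d position;
    bool threatensUs;  // this unit can destroy ours (cf. the footnote on "dangerous")
    bool isOurTarget;  // our unit can destroy it
};

static Vec2d Normalised(Vec2d v)
{
    double len = std::sqrt(v.x * v.x + v.y * v.y);
    if (len > 1e-9) { v.x /= len; v.y /= len; }
    return v;
}

// One pass over the opponent units yields an average "away from threats" vector
// and an average "towards targets" vector; these are blended with the vector to
// the movement target.
Vec2d MovementDesire(Vec2d unitPos, Vec2d targetPos,
                     const std::vector<OpponentUnit> &opponents,
                     double wTarget, double wThreat, double wAttack)
{
    Vec2d toTarget = Normalised({ targetPos.x - unitPos.x, targetPos.y - unitPos.y });
    Vec2d awayFromThreats, towardsTargets;
    int threats = 0, targets = 0;

    for (const OpponentUnit &o : opponents) {
        Vec2d dir = Normalised({ o.position.x - unitPos.x, o.position.y - unitPos.y });
        if (o.threatensUs) { awayFromThreats.x -= dir.x; awayFromThreats.y -= dir.y; threats++; }
        if (o.isOurTarget) { towardsTargets.x += dir.x; towardsTargets.y += dir.y; targets++; }
    }
    if (threats > 0) { awayFromThreats.x /= threats; awayFromThreats.y /= threats; }
    if (targets > 0) { towardsTargets.x /= targets; towardsTargets.y /= targets; }

    return { wTarget * toTarget.x + wThreat * awayFromThreats.x + wAttack * towardsTargets.x,
             wTarget * toTarget.y + wThreat * awayFromThreats.y + wAttack * towardsTargets.y };
}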

8.6 Ship Formation
The differing purposes and hit points of naval units motivate relative ship formation. For example, an exposed carrier is nearly defenceless and an easy target for bombers and battleships. But if it is behind some battleships that keep the enemy at a reasonable distance, it can launch bombers and fighters which pose an immediate threat to opposing fleets. Additionally, each lost carrier means 5 fewer missiles, which has a potentially negative effect on the strength of attacks and might therefore lower the chances of winning. On the other hand, carriers are able to detect submerged submarines, which are then an easy target for them. Undetected, submarines can devastate battleships easily or sneak to the coastline and launch their missiles.

Figure 8.5: Fleet formation in DEFCON with front direction shown

To be able to respond to threats and appearing targets, the position of a ship


within a fleet, or of small fleets of the same kind within a larger group of fleets (a metafleet), should be dynamically evaluated and changed if required. There are several approaches to this that sound reasonable:

Metafleet controlled: The metafleet has control over the relative placement. All fleets are placed relative to a position that indicates the "position" of the metafleet. This relative placement requires a direction for alignment, e.g., a front where the ships are facing. This can be the direction in which the fleet is moving or where opponents are located.

Fleet controlled: The relative placement is not "globally" controlled by the metafleet but is the task of every fleet. Each fleet evaluates its position against the other fleets (and where the front is) and moves accordingly. Thus this decentralised control resembles flocking behaviour.

Ship controlled: Each ship tries to bring itself into a position relative to the surrounding ships that it deems advantageous. This is in fact the same as fleet-controlled formation with fleets of size one.

Fleet Controlled Formation
Controlling the formation as a vector towards the optimal position can easily be integrated with the movement desire model explained in section 8.5. To realise formation through flocking behaviour, we have to consider three aspects introduced by Craig [1987]:

Separation: Avoid getting too close to other units. When units overlap, they might be destroyed when any other overlapping unit gets destroyed. Iterating over all other fleets and checking the distance towards them takes care of this aspect.

Alignment: Steer in the general direction of the other flock members. This leads to all members approaching the same target area. This is already realised in the movement model for DEFCON, as fleets want to steer in the direction of the target.


Cohesion: Move towards the average position of other flock members. In DEFCON, this can easily be realised by evaluating an average distance vector to all other flock members and including this vector as a desire in the movement desire model.

This behaviour model was implemented in DEFCON during our development. However, the resulting behaviour was not very pleasing. Although the fleets started to aggregate as expected, the emerging behaviour proved to be unstable. Once a fleet's distance from the average flock position grew too big, it moved towards it, at the same time pushing other fleets away as it approached the separation threshold. This motion induced a chain reaction as the pushed fleets started pushing other fleets. The overall impression of the flock resembled a fish shoal more than an ordered fleet. Introducing movement thresholds that stopped movement if the above-mentioned desires were too weak did not improve the situation much; when a new target was set, the same chaotic, disordered behaviour appeared when the fleet arrived at the target or had to pass through sea straits. Therefore, we abandoned this approach. Although flocking is a promising AI technique for controlling the movement of groups, it proved inadequate here because the emerging flocking behaviour was too "alive" for a group of fleets that belongs to a simulated "global superpower"7.

Metafleet Controlled Formation
The top-down approach of controlling the behaviour of all ships through the superordinate metafleet stands in contrast to the flocking model. Although the flocking model allows more flexible and responsive behaviour, a central command of the fleets can produce more ordered and stable results. Given a position and a direction, we define a formation by subdividing the surrounding area into columns and rows and populating these "cells" with fleets. An example is shown in Figure 8.5, where the grey arrow indicates the facing direction and the fleets are subdivided into three lines of four, five and three fleets.

7 We are aware that these criteria are not hard scientific facts; however, creating a bot for a computer game requires behaviour decisions to be made based on how sensible and "human" actions appear to be.


The fleets are commanded to form up when moving to a target without interruption. Once an opposing force is encountered, the movement is suspended and the fleet formation loosened in order to hand over the fleet control to the movement desire model, which is better suited to handle tactical movements and small scale positioning.
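
A sketch of how the formation "cells" can be derived from a centre position and a facing direction is given below. The row layout, spacing and names are illustrative; Figure 8.5 shows rows of four, five and three fleets, which would correspond to rowSizes = {4, 5, 3}.

#include <cmath>
#include <vector>

struct Pos { double x, y; };

// Compute world positions for a metafleet formation: fleets are arranged in rows
// behind the front, centred on the formation position and aligned with the
// facing direction.
std::vector<Pos> FormationSlots(Pos centre, double facingAngle,
                                const std::vector<int> &rowSizes, double spacing)
{
    std::vector<Pos> slots;
    double fx = std::cos(facingAngle), fy = std::sin(facingAngle);  // facing direction
    double rx = -fy, ry = fx;                                       // lateral direction

    for (int row = 0; row < static_cast<int>(rowSizes.size()); row++) {
        int n = rowSizes[row];
        for (int col = 0; col < n; col++) {
            double lateral = (col - (n - 1) / 2.0) * spacing;  // centre each row
            double back    = row * spacing;                    // rows behind the front
            slots.push_back({ centre.x + lateral * rx - back * fx,
                              centre.y + lateral * ry - back * fy });
        }
    }
    return slots;
}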

8.7 Resolving the Assignment Problem
One of the problems arising when building an AI bot is the need to optimise associations between units and targets. We encountered the following cases where such optimisation is useful:

Bomber – target allocation: When initiating an attack on several targets, the problem of a good allocation of bombers to targets arises. The meaning of a good allocation is given through a fitness function that takes distance and remaining bomber range into account. A random allocation is used as a starting point for the optimisation process, which has to find a mapping of bombers that is optimised with respect to the used fitness function.

Bomber – landing pad allocation: After releasing a missile, the existing unit AI causes a bomber to select the nearest landing pad that has spare capacity. This behaviour can be problematic when several bombers release their missiles closest to the same landing pad and its capacity is too small to harbour them all. The consequence is that all the excess bombers select the next closest landing pad to land on once the first landing pad is filled up, and so forth. Often the limited range causes bombers to run out of fuel and crash. Here, a mapping of bombers to landing pads optimised w.r.t. a fitness function that takes distances, remaining fuel and capacities into account yields a significant improvement.

The underlying problem of these two cases is called the assignment problem, a fundamental combinatorial optimisation problem known to be NP-complete for non-linear fitness functions [Wilhelm, 1987, Osman, 1995]. Therefore, the optimal solution is not efficiently computable. However, an approximation is sufficient for the


mentioned cases. The simulated annealing algorithm (see 2.3.2) is a fast and easy-to-use algorithm for that purpose. The parameters required for an application of simulated annealing are the selection of a state space, the neighbour selection method and the annealing schedule. The state space is clearly defined by the two sets of positions: every mapping that is bijective is a valid state. The neighbour selection is carried out by initially selecting two random elements of the second set, the current element and the partner element, and then subsequently choosing a new swap partner for the current element until no more partners can be found. Then the process is repeated. The initial annealing factor is set to 50% (i.e., we allow two associations to have a 50% worse rating after swapping the associated elements), and the annealing schedule is chosen by reducing the annealing factor by 10% every time all elements have been the current element once. The source code of the annealing algorithm used is given in Appendix A.1.

9 Experimentation and Results During the development of the planning actions, we performed tests to evaluate the improvement of our algorithms and to find out the main indicators for the outcome of a game. These indicators were studied to reveal flaws in the algorithms and to guide the next development cycle. After the implementation was finished, running as many games as possible was required to fill the case base with initially random plans. These examples could then be used to train a decision tree algorithm to generalise and predict successful plans as well as to propose new plans for a guided optimisation process as described in section 5.2.3. The first section of this chapter describes the setup for each of the tests performed and which statistics were recorded. In section 9.2, we present the results of these tests and a summary of the measured variables. Section 9.3 presents the analysis of the retrieved results and the ramifications for the project.


9.1 Test Setup

General Performance Test
To measure the performance of our implementation during the development process, we set up two-player games with our bot playing against the existing AI. As there are 6 territories (see section 3.5.1), there are 30 possible starting configurations for two players. We performed at least 10 tests for each of these configurations. This was mainly limited by the time required for each game, which was on average roughly 90 seconds on our workstation1. This meant one test run took about 5 hours. Later in the development process, we were able to use the lab workstations provided by the Computing Department of Imperial College, which enabled us to increase the test volume by distributing the tasks. As dynamic planning and individual plans for each configuration were implemented last, we used a single, hand-selected static plan for the tests we conducted during the early development process:

• Test 1 was run after the basic planning engine was implemented and synchronous attacks were possible.
• Test 2 was run after further improvements to the basic actions and ship formation was implemented.
• Test 3 was run after the case base was implemented and the starting position of ships was influenced by previous games.
• Test 4 used the data collected while running random plan games described in chapter 6.
• Test 5 was run after the decision tree algorithm for generalising and selecting plans was implemented.
• Test 6 was run after the learning analysis was finished and the optimal case base size was determined.

1 We used a personal computer with a Pentium-class CPU clocked at 1.6 GHz and 1.25 GB of memory.


Variable Correlation Test
To find out which implementation variables and game units are most influential on the outcome of a game, we recorded additional information along with the tests described in 9.1, in particular:

• Own unit count at the end of the game. This gives the units destroyed during the game, because the number of units each player starts with is fixed and known. Units include stationary structures, ships, planes and remaining nukes.
• Time the game took in seconds.
• Distance of the territories in the configuration. The distance is measured between the centres of the two territories.
• Opponent unit count at the end of the game.

These values can then be understood as discrete random variables and their correlation can be estimated using the Pearson product-moment correlation coefficient r_{xy}, defined in 9.3 below.

Learning Test To test the effectiveness and efficiency of the implemented learning algorithms we performed gradual tests with increasing case-base size out of which the decision trees for the plan generation were created. To achieve strict separation of testing and learning, we disabled case storing, so that each test match had the same number of training cases. We started with 5 cases per configuration, and increased the amount by 5 for every subsequent test. For each test we ran 150 matches and recorded the percentage of winning configurations.

Simulated Annealing Test To estimate valid parameters for the simulated annealing algorithm described in section 2.3.2 and used in section 8.7, we ran 150 matches against the existing AI


with 35 cases in the case base (using the result of the learning test described above, analysed in section 9.3) and with differing settings of the annealing parameters, namely the starting temperature S and the cool-down rate c, given in Table 9.1. The first test was run entirely without a simulated annealing algorithm, i.e., the set relations to optimise were returned unchanged by the algorithm.

Test    S      c
1       0      0
2       0.3    0.75
3       0.5    0.9
4       1.0    0.95
5       1.0    0.99

Table 9.1: Test values for simulated annealing algorithm parameters.


9.2 Results General Performance Test Results The results of the tests are shown in Figure 9.1. Recall that we ran 10 matches for each of the 30 configurations per test.

[Bar chart showing the percentage of configurations won for each of the six tests described in section 9.1; y-axis: Configurations Won, 0%–80%.]

Figure 9.1: Percentage of configurations with more games won than lost.


Variable Correlation Test Results Each variable is correlated against the difference in our bot’s score and the existing AI bot’s score of the respective game. The figures below show the correlation of the count of lost or remaining units (corresponding to a negative or positive correlation) to the game score. Figure 9.2 shows the Pearson product-moment correlation coefficient for the recorded game variables related to the bot, and figure 9.3 gives the correlation of the time the corresponding match took to the score, and the correlation of the distance between the two terrains to the score2 . Figure 9.4 shows the correlation for variables related to the opponent.

[Bar chart of correlation magnitudes (0.0–1.0) for: lost carriers, remaining radar, lost airbases, remaining subs, lost bombers, lost fighters, lost battleships, lost missiles, remaining silos.]

Figure 9.2: Correlation of own unit count to score.

2 A high correlation of the distance to the score indicates that there is a relationship between how far the two players are apart and the outcome of the game.


[Bar chart of correlation magnitudes (0.0–1.0) for: Time, Distance.]

Figure 9.3: Correlation of time and distance to score.

[Bar chart of correlation magnitudes (0.0–1.0) for: lost carriers, lost battleships, lost missiles, lost fighters, lost bombers, lost silos, lost radar, lost airbases, lost subs.]

Figure 9.4: Correlation of opponent remaining unit count to score.

Type          Own      Opponent
Time          0.01     0.01
Silos         0.33     -0.24
Airbases      -0.03    -0.29
Radar         0.02     -0.28
Battleships   -0.2     -0.16
Carriers      -0.01    -0.16
Subs          0.05     -0.32
Nukes         -0.25    -0.17
Fighters      -0.11    -0.21
Bombers       -0.06    -0.24
Distance      -0.47    -0.47

Table 9.2: Correlation between measured variables and game score.


Learning Test Results
The test results are shown in Figure 9.5. For each case base size the graph shows the number of configurations with more matches won than lost.

[Chart: Configurations Won (0%–80%) versus Training Cases in Case Base (5 to 70 in steps of 5).]

Figure 9.5: Number of winning configurations versus size of training data


Simulated Annealing Test Results
The number of winning configurations for each test is shown in Figure 9.6, and the mean score differential (i.e., the difference between bot and opponent scores) of the outcome of the matches for each test is shown in Figure 9.7.

[Bar chart: Configurations Won (0%–80%) for No Annealing, Low Annealing (S=0.3, c=0.75), Medium Annealing (S=0.5, c=0.9), High Annealing (S=1.0, c=0.95) and Very High Annealing (S=1.0, c=0.99).]

Figure 9.6: Won configurations per test. The test values for starting temperature S and cool-down rate c are displayed.

[Bar chart: Mean Score Differential (0–35) for No Annealing, Low Annealing (S=0.3, c=0.75), Medium Annealing (S=0.5, c=0.9), High Annealing (S=1.0, c=0.95) and Very High Annealing (S=1.0, c=0.99).]

Figure 9.7: Mean score differential per test. The test values for starting temperature S and cool-down rate c are displayed.


9.3 Analysis

General Performance Test Analysis
As expected, the results in Figure 9.1 show that the efficiency improved throughout the development as we built more sophisticated bots. However, there is still some interesting information in the graph. The first three tests were carried out with hand-tailored plans, showing that we could achieve a maximum of 63% winning configurations against the existing AI. There is a big improvement as we implemented the case-based fleet positioning for the third test, showing that knowledge from past matches is very valuable. The random plans test shows a drop in the percentage of won configurations: only half the games were won. This is reasonable, considering that we did not optimise the used plan at all, but rather used a random plan that often led to suboptimal behaviour. This suboptimal behaviour is not a negative side effect but rather the intention of this test run, as we created the case base to learn from, and negative examples are required to successfully build a decision tree learner. The last two tests not only show the superiority of our bot against the existing AI, but also that automatically generated plans can achieve better performance than hand-crafted ones. This is an important result that supports the argument for the use of machine learning in the context of DEFCON in particular and strategy games in general.

Variable Correlation Test Analysis
Definition 9.1. Let X and Y be discrete random variables and let x_i and y_i, i = 1, \ldots, n, be measurements of X and Y. The Pearson product-moment correlation coefficient r_{xy} is defined as

r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1) s_x s_y},

where \bar{x} and \bar{y} are the sample means and s_x and s_y are the sample standard deviations of X and Y. The Pearson coefficient gives the best estimate of the correlation of X and Y. The equation can be rewritten as

r_{xy} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}},

where each sum is from i = 1, . . . , n. Using this equation, we calculated the estimated correlation of measured test statistics to the game outcome, which is best described as the difference between the scores of both players.
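
The rewritten form translates directly into a single-pass computation over the recorded values. The following self-contained sketch shows that calculation; it is a straightforward transcription of the formula, not an excerpt of the analysis scripts actually used.

#include <cmath>
#include <vector>

// Pearson product-moment correlation coefficient in the rewritten form above.
// Returns 0 if either variable has no variance.
double PearsonCorrelation(const std::vector<double> &x, const std::vector<double> &y)
{
    const double n = static_cast<double>(x.size());
    double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
    for (size_t i = 0; i < x.size(); i++) {
        sumX  += x[i];
        sumY  += y[i];
        sumXY += x[i] * y[i];
        sumX2 += x[i] * x[i];
        sumY2 += y[i] * y[i];
    }
    double denom = std::sqrt(n * sumX2 - sumX * sumX) * std::sqrt(n * sumY2 - sumY * sumY);
    if (denom == 0.0)
        return 0.0;
    return (n * sumXY - sumX * sumY) / denom;
}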

Correlation of Own Units
The results show a strong correlation to the game score for these variables:

Lost planes: The correlation indicates that it is beneficial to lose fighters and bombers during the game. Obviously, that does not mean we should send them on suicide missions, but rather shows that they have been used to fight against the opponent and launch missiles in hostile terrain. These actions often lead to the destruction of the plane. This also shows that games are not lost by losing all the units, but rather by inactivity, i.e., when planes are not used enough during the game, which allows the opponent to score undisturbed.

Lost missiles: The fewer missiles that remain at the end of the game, the more have been launched3. It is obvious that a high score requires many missiles to be launched at the opponent, as only missiles can hit cities.

Lost battleships: The high correlation of lost battleships to the score suggests that it is beneficial for a player to sacrifice battleships in order to destroy other, more valuable units such as carriers and submarines. This also shows that aggressive behaviour may be preferable over defensive behaviour.

3 Some missiles might have also been destroyed, for example when a bomber was shot down before it could launch its missile.


Lost carriers: There is a small correlation of lost carriers to the score, which can be explained by the fact that carriers are often grouped together with battleships in a metafleet. As we have seen, it is beneficial for the outcome of a game when battleships sacrifice themselves, which means that carriers might also get lost.

Lost airbases: A possible explanation for the small correlation of lost airbases is the significant correlation of the distance to the score, which is explained below, and can be summarised as: the closer the opponent is, the higher the score will be. However, a close opponent is more likely to scout the positions of an airbase, and the airbase is therefore more likely to be destroyed4. Another explanation for this correlation is that airbases close to the opponent are more likely to inflict damage on the opponent, because planes can reach critical targets quickly. However, the opponent can conversely attack the airbase more easily if it is close to him, which explains why it can be beneficial to risk losing airbases by placing them in unsafe areas.

Remaining silos: The high correlation indicates that the longer a player's silos survive, the better his chances of winning are. This is due to the fact that silos contain 60 missiles with an unlimited range and are the only means of intercepting incoming missiles. These two factors contribute significantly to the score, as it is calculated from own losses and destroyed opponent cities, and emphasise the importance of initial structure placement.

Remaining radar stations: Similar to the situation with silos, it is good to keep radar stations alive as long as possible to allow an earlier detection of threats.

Correlation of Opponent Units
There is a fairly high negative correlation for all the surviving enemy units and structures, which means that the better the score, the more of the opponent's units and structures have been destroyed. Although there is a significant correlation, care

4 As we explained in 3.5, only structures that have been scouted, i.e., been within radar range of an opponent unit once, can be attacked by missiles.


has to be taken concerning causality. It is not immediately clear if the destroyed units are a cause or rather an effect of a good score. We suspect the latter, as good play usually involves effective use of nukes and ships, thus being able to destroy many opposing forces. We hope to test this suspicion with more experimentation, in future work.

Correlation to other Data
There is almost no correlation to the time a game took; both very long and very short games can lead to victory. The correlation of the distance between the terrains of the two parties and the score is significant. The correlation is negative (as shown in Table 9.2), meaning that the smaller the distance gets, the higher the chances of a high score for the bot. This observation can be explained by the plan the existing AI bot has. The existing AI bot is programmed to attack early in the game with all its silos. At close range, this behaviour can be used by our bot to initiate a counter-attack on the then exposed silos, but at a greater range, the fleet needs a longer time to get into attack range, which is often enough to allow the opponent's silos to switch back to air defence mode. After this, more missiles will be intercepted, leading to fewer missiles hitting their targets and eventually resulting in a lower score.

Learning Test Analysis
The first test has a winning ratio of 46.6%. With only 5 plans in the case base, the resulting decision tree is too general to provide much help for the plan generation. The fact that the winning ratio is 3.4% lower than the winning ratio with purely random plans can be explained by random variation of the opponent behaviour and plan generation. Another explanation for this difference is the small amount of knowledge the 5 plans provide: it is likely that misleading features are deemed important, which leads to scores that are worse than the scores from matches with random plans.


For the subsequent tests up to a case base size of 35 cases, the winning ratio grows to 76.6%. The growth is roughly linear and clearly shows the effect of the decision tree generalisation on the outcomes of the matches. The generated plans perform better than manually generated plans, as discussed in section 9.3. For tests with a case base of more than 35 cases, the performance does not improve further; in fact, there is a decrease of up to 8% for the tests with a size of 45 and 65. This decrease can be explained by the decision tree generation procedure used. To understand this, we analyse the decision trees generated with 35 and 40 training examples for one configuration where the test with 35 training examples succeeded and the test with 40 failed. The trees are shown in Figure 9.8 and Figure 9.9, respectively5. Both trees split on the starting zone first, clearly marking zone two (the weak zone) as inferior. The tree built from 35 cases splits on the attack time for both other zone values, with the fittest branches being 5000 and 6000 seconds, respectively. The tree has leaf nodes in both cases, meaning that the rest of the plan is generated randomly. The tree generated from 40 examples, however, has one branch of the zone attribute node splitting on the follow-up state ("new state") of the first metafleet and the other branch splitting on the first attack time, like the tree of 35 examples. Herein lies the difference in the success of the two trees. The branch with the best fitness from the node with the "new state" has the value "avoid enemy". This means that the fleets do not try to attack the opponent fleets but rather dodge them. This behaviour is potentially inferior to offensive behaviour, as the correlation tests in chapter 9 have shown. Due to the two plans that succeeded with this behaviour and the lack of counter-examples, this plan attribute value is more likely to be picked than with the random assignment happening in the 35-example decision tree. This evidence supports the argument that overfitting occurs here. The generated decision trees start to learn idiosyncrasies of the training set, which leads to suboptimal plan generation performance. An additional factor is the noise in the data, due to the randomness of the initialisation of the existing AI. Subsequent tests with more than 40 cases all show similar effects, with none surpassing the winning ratio of the test with 35 cases.

5 The trees have been pruned to a depth of two for readability and the fitness values of leaf nodes have been highlighted.

[Decision tree diagram. Root: MetaFleet 1: Start Zone (gain 0.20, 35 plans); all three zone branches split on First Attack Time.]

Figure 9.8: Generated decision tree from case base with 35 training examples

[Decision tree diagram. Root: MetaFleet 1: Start Zone (gain 0.18, 40 plans); one zone branch splits on MetaFleet 1: New State (gain 0.45, 12 plans), another on First Attack Time (gain 0.40, 17 plans) and the third on MetaFleet 1: Battleships (gain 0.29, 11 plans).]

Figure 9.9: Generated decision tree from case base with 40 training examples


Possible Remedies
The overfitting in the discussed example arises partly from the fact that the decision tree algorithm groups all cases together once they are all classified the same. In a scenario with 30–40 cases, this can happen after only a fraction of the attributes have been assigned. That leaves many attributes undefined. Using random initialisation is in theory a valid approach here, as we do not have any more information. However, this randomness can lead to suboptimal plans, as the attributes set by the tree might be insufficient to guarantee satisfactory results. Instead of choosing a random value for undefined attributes, an approach that chooses the remaining attributes from a random case in the same leaf node might be more appropriate. However, this reduces the number of possible plans drastically, as there would be almost no exploration. A hybrid approach similar to the exploration-exploitation issue discussed in 5.2.3 might be required. Another approach may be to change the decision tree algorithm. The ID3 algorithm favours short trees over deeper ones and tries to place the most distinctive attributes at the top. While this is a reasonable approach, an alteration of the attribute selection method (like introducing backtracking) might improve its accuracy.

Simulated Annealing Test Analysis
The first test, in which we disabled simulated annealing, resulted in 53% winning configurations. This is a very low value compared to the other tests, where simulated annealing was enabled with differing parameters. All the other tests have a mean win-ratio of 73% with a deviation of at most 4%, which indicates an almost constant winning ratio. The win-ratio decreases from 76.7% for the medium annealing test to 69% for the high annealing test. This behaviour cannot be explained by the annealing algorithm itself, as the optimisation process cannot justify a lower score with a better optimisation result. Rather, the random behaviour of the existing AI is suspected to have caused this decline. An additional factor is the plan generation: plan attributes not classified by the decision tree algorithm are instantiated randomly and


thus can lead to a different outcome, as we also remarked in section 9.3. We also recorded the mean score differential of all matches per test. The first test shows a lower result than the other tests, which again indicates the positive effect of a plane-target and plane-landing pad optimisation on the outcome of a game. The mean score differential grows with a longer optimisation process; the results indicate a diminishing improvement that converges to a mean score differential between 35 and 40. This indicates that a good trade-off between speed and accuracy of the optimisation algorithm can be achieved by choosing the parameters as they are in the medium annealing test, namely a starting temperature of 0.5 and a cool-down rate of 0.9.

10 Conclusion
In this project, we studied the application of AI methods to the real-time strategy game DEFCON. The task was to write an AI bot that could beat the existing AI developed by Introversion. We studied the game system and the existing bot design of DEFCON to expose weaknesses and propose improvements, and researched AI techniques from machine learning and other AI problem solving methods to evaluate their applicability. We developed a two-tiered bot design: on the bottom layer there are enhanced low-level actions that make use of in-match history and information from recorded games to estimate and predict opponent behaviour and manoeuvre units accordingly. The information is gathered in influence maps and used by actions such as synchronous attacks, a movement desire model, fleet formation and target allocation. On top of these tactical means, we built a learning system that is entrusted with the fundamental long-term strategy of a match. We used a case base to record and retrieve matches and a decision tree algorithm to create a new high-level plan. The use of variable correlation tests during the development helped us to focus on important aspects and adjust game parameters. We implemented this design in a bot for DEFCON and evaluated its performance against the existing bot from Introversion. The results indicate that our bot is superior and can beat the existing bot consistently, with a success rate of over 75%.


10.1 Applications
Although the developed bot is in itself already an application of the used techniques, the underlying concept of combining artificial intelligence methods to benefit from synergy effects is applicable to many problems, including, but not restricted to, other computer games that have similar requirements of optimising and planning actions in order to compete with skilled humans. In particular, the combination of case bases and decision trees to retrieve, generalise and generate plans is a promising approach that is applicable to a wide range of problems that exhibit the following properties:

Discrete attributes: The problem state space must be discrete or discretisable. This is required for decision tree algorithms to build trees. Attributes with a low cardinality are preferable, as a high number of possible values can cause problems with the used decision tree algorithm.

Recognisable opponent states: Problem instances must be comparable through a similarity measure, which is required for retrieving cases. In a game domain, it should be based on opponent attributes or behaviour to allow an adaptation to take place.

Static problem domain: The interpretation of a plan has to be constant, or else the similarity measure might retrieve irrelevant cases that show similarity to an obsolete interpretation. This also means that, for a hierarchical planner, lower level plans should not change much when reasoning on high-level plans, as the case base is biased towards previously successful plans.

Availability of training sets: The problem has to be repeatable, or past instances of problem-solution pairs have to be available to train the case base.

10.2. Limitations & Future Work

107

10.2 Limitations & Future Work Multiplayer Scenarios DEFCON supports up to 6 player matches. Although our current implementation of the AI Bot would be able to run in such a game with very little modification, this was not made part of this project. The main scope of this project was to develop a bot that could beat the existing AI bot through the use of machine learning and other advanced AI methods. The addition of more players to a match increases the degree of freedom (thus slowing down the learning process) and does not immediately add more value with respect to the use of AI methods. An important aspect emerging with games that have more than two players is the ability to form alliances. Although this is not a negligible aspect of DEFCON, it is not easy to create smart learning algorithms for it, especially as the original AI bot of DEFCON ignores alliances completely. Therefore human interaction will be required to gather training data. There is a project1 that introduces saving and replaying DEFCON games, which can be used to gather game data and we could use offline learning to enable an AI bot to handle alliances properly.

Unit Movement Patterns Not all ideas for good bot behaviour could be realised within the given time, and we used simplifying heuristics for some algorithms that we deemed less important. The chosen movement pattern for planes, discussed in 8.3, can be further enhanced to better take threats to the plane into account. The current implementation creates one waypoint at a certain distance from the origin of the plane and checks two possible routes for threats. In future versions, this behaviour could be elaborated to, for example, reflect a fitness value given by the influence map such that planes actively dodge threats and still arrive at the specified time.

1 Dedicated Server for DEFCON, http://moosnet.homelinux.net/~manuel/defcon/dedcon/ (last checked and working: 26.08.2007)


Reinforcement Learning
A metric for evaluating the bot's behaviour while a match is running is required to apply reinforcement learning to DEFCON. The use of feedback during the lifetime of an "individual" is a main factor that distinguishes RL from genetic algorithms, where the feedback is given after the experiment. This post-match feedback has been implemented in the project. Further work could investigate the feasibility and gains of using reinforcement learning.

Avoiding Overfitting The learning algorithms start to overfit at about 40 training examples, where the generated decision trees degenerate due to random features becoming prevalent in the attribute selection process. In 9.3 we discussed possible remedies for this problem, namely modifying the ID3 algorithm or changing the attribute selection function after the decision tree classification.

10.3 Final Summary
We have successfully reached our goal of designing and implementing a novel combination of reasoning systems for playing complex video games. The joint use of case-based reasoning, decision tree algorithms and hierarchical planning was applied for the first time to the field of real-time strategy games. A two-tiered approach helped to tackle the complexity presented by the video game DEFCON. We combined high-level strategic planning and learning with sophisticated tactical low-level actions such as synchronous attacks, fleet formations, a movement desire model and case-based structure placement. The presented application of these techniques resulted in the first learning bot for DEFCON, which was shown to be able to beat the AI bot built by Introversion consistently. The extensive training and parameter testing helped to optimise the behaviour and increase the overall winning ratio of the bot to over 75%.

A Selected Source Code

The implementation carried out in this project was realised within the source code of the commercial video game DEFCON; that source code is therefore not public. To give some insight into the work done, we present excerpts of the algorithms created here.

A.1 Simulated Annealing Algorithm

    void Tools::OptimiseAssociations( LList<...> *a, LList<...> *b,
                                      Fixed (*cost)( LList<...> *, LList<...> * ) )
    {
        if( a->Size() < 2 )
        {
            // OptimiseAssociations found too few elements to sort, aborting
            return;
        }
        if( a->Size() != b->Size() )
        {
            // Arrays should have same size
            return;
        }

        // create look-through array
        DArray<int> shuffledIndex;
        shuffledIndex.SetSize( a->Size() );
        for( int i = 0; i < a->Size(); i++ )
            shuffledIndex.PutData( i, i );

        // shuffle the indices
        for( int i = 0; i < a->Size(); i++ )
        {
            int shuffleWith = syncrand() % a->Size();
            int tmp = shuffledIndex[i];
            shuffledIndex.PutData( shuffledIndex[shuffleWith], i );
            shuffledIndex.PutData( tmp, shuffleWith );
        }

        Fixed annealFactor = Fixed::Hundredths( 50 );
        Fixed currentCost  = GetCurrentCost( a, b, cost );
        AppDebugOut( "Fitness before: %f\n", currentCost );

        bool finished = false;
        int n = 0;
        int currentIndex = shuffledIndex[n];
        int counter = 0;
        int couldnt = 0;
        int iter = 0;
        int partnerIndex = 0;

        while( !finished )
        {
            Fixed thisCost = cost( a->GetData( currentIndex ), b->GetData( currentIndex ) );

            // find a random swap partner
            do
            {
                partnerIndex = syncrand() % a->Size();
            }
            while( partnerIndex == currentIndex );

            // get the cost of the swap partner
            Fixed partnerCost = cost( a->GetData( partnerIndex ), b->GetData( partnerIndex ) );

            // check if the cost is allowed with the current annealing factor
            if( cost( a->GetData( currentIndex ), b->GetData( partnerIndex ) ) +
                cost( a->GetData( partnerIndex ), b->GetData( currentIndex ) ) <
                ( thisCost + partnerCost ) * ( 1 + annealFactor ) )
            {
                // swap
                DoSwap( b, currentIndex, partnerIndex );
                currentIndex = partnerIndex;
                counter++;
            }
            else
            {
                n++;
                if( n == a->Size() )
                {
                    // cool down
                    annealFactor *= Fixed::Hundredths( 90 );
                    n = 0;
                }
                currentIndex = shuffledIndex[n];
                counter = 0;
            }

            if( counter > a->Size() )
            {
                annealFactor *= Fixed::Hundredths( 90 );
                counter = 0;
            }

            if( annealFactor <= Fixed::Hundredths( 4 ) )
                finished = true;

            iter++;
        }
    }
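To make the acceptance rule in the listing above easier to follow in isolation, the following self-contained sketch (our own; it uses plain float and std::vector instead of DEFCON's Fixed and LList types) applies the same criterion to a toy pairing problem: a swap is accepted whenever the swapped assignment costs less than (1 + annealFactor) times the current one, and the annealing factor is cooled geometrically until it drops below the cut-off.

    #include <cmath>
    #include <cstddef>
    #include <cstdlib>
    #include <utility>
    #include <vector>

    struct P { float x, y; };
    static float Cost(const P &a, const P &b) { return std::hypot(a.x - b.x, a.y - b.y); }

    // Reassign elements of b to elements of a (by index) so that the summed
    // pairwise cost shrinks, using the same annealed acceptance rule as above.
    void OptimisePairing(const std::vector<P> &a, std::vector<P> &b) {
        if (a.size() < 2 || b.size() != a.size()) return;
        float anneal = 0.50f;                      // initial annealing factor
        while (anneal > 0.04f) {                   // same cut-off as in the excerpt
            for (std::size_t i = 0; i < a.size(); ++i) {
                std::size_t j = std::rand() % a.size();
                if (j == i) continue;
                float current = Cost(a[i], b[i]) + Cost(a[j], b[j]);
                float swapped = Cost(a[i], b[j]) + Cost(a[j], b[i]);
                // Accept the swap if it costs less than (1 + anneal) times the
                // current assignment: early on this admits mildly worse swaps.
                if (swapped < current * (1.0f + anneal))
                    std::swap(b[i], b[j]);
            }
            anneal *= 0.90f;                       // geometric cooling after each sweep
        }
    }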


A.2 Selecting Optimal Structure Placement

    void Planner::SelectBestStructureLayout()
    {
        Fixed bestFitness = 0;
        int   bestFitnessIndex = -1;

        Fixed weightNukesDestroyed    = 5;
        Fixed weightObjectsDestroyed  = 3;
        Fixed weightNukesShotAt       = Fixed::Hundredths( 10 );
        Fixed weightNukesSpotted      = Fixed::Hundredths( 10 );
        Fixed weightPlanesLaunched    = Fixed::Hundredths( 50 );
        Fixed weightPlanesQuicklyLost = -5;
        Fixed weightKilledEarly       = -10;
        Fixed weightKilledLate        = -3;

        for( int i = 0; i < m_enemyPlanLibrary.Size(); i++ )
        {
            Plan *currentPlan = m_enemyPlanLibrary.GetData( i );
            Fixed currentFitness = 0;

            for( int j = 0; j < currentPlan->m_recordedStructures.Size(); j++ )
            {
                RecordedStructure *rs = currentPlan->m_recordedStructures.GetData( j );

                Fixed killWeight = 0;
                if( rs->m_timeSurvived > 0 )
                {
                    if( rs->m_timeSurvived < 4000 )
                        killWeight = weightKilledEarly;
                    else
                        killWeight = weightKilledLate;
                }

                currentFitness += weightNukesDestroyed    * rs->m_nukesDestroyed +
                                  weightObjectsDestroyed  * rs->m_objectsDestroyed +
                                  weightNukesShotAt       * rs->m_nukesShotAt +
                                  weightNukesSpotted      * rs->m_nukesSpotted +
                                  weightPlanesLaunched    * rs->m_planesLaunched +
                                  weightPlanesQuicklyLost * rs->m_planesQuicklyLost +
                                  killWeight;
            }

            AppDebugOut( "Plan %d Structure Fitness: %f\n", i, currentFitness );

            if( currentFitness > bestFitness )
            {
                bestFitness = currentFitness;
                bestFitnessIndex = i;
            }
        }

        if( bestFitnessIndex > -1 )
            m_bestStructurePlan = m_enemyPlanLibrary.GetData( bestFitnessIndex );
    }


A.3 Implementation of the ID3 algorithm

    decisionTreeNode *BuildDecisionTreeID3( LList<Plan *> *p, decisionTreeNode *parent )
    {
        decisionTreeNode *newNode = new decisionTreeNode();
        newNode->m_parent = parent;
        int currentNodeId = m_decisionTreeNodeCount;
        m_decisionTreeNodeCount++;

        // list of the variable numbers (0 = first variable, 1 = second variable, etc.)
        LList<int> variableNumbersUsed;
        GetUsedVariableNumbers( parent, &variableNumbersUsed );

        // calculate average fitness of all examples
        Fixed labelValue = 0;
        for( int i = 0; i < p->Size(); i++ )
            labelValue += p->GetData( i )->GetFitness();
        labelValue /= p->Size();

        // no variables used = root node; set bias so that half of the examples is positive
        if( variableNumbersUsed.Size() == 0 )
            m_decisionTreeBias = labelValue;

        // zero entropy means all examples are classified the same
        if( CalculateInformationEntropy( p ) == 0 )
        {
            newNode->m_variableNumber = -1;
            newNode->m_leafNodeLabel  = labelValue;
            return newNode;
        }

        bool finished = false;
        int skip = 0;
        int bestVarNumber = -1;
        Fixed bestGain = 0;

        // calculate the possible information gain for each free variable
        while( !finished )
        {
            int nextVarNumber = GetNextFreeVarNumber( &variableNumbersUsed, skip );
            skip++;
            if( nextVarNumber == -1 )
                finished = true;            // no free variable left
            else
            {
                Fixed thisGain = CalculateInformationGain( nextVarNumber, p );
                if( thisGain > bestGain )
                {
                    bestGain = thisGain;
                    bestVarNumber = nextVarNumber;
                }
            }
        }

        // was there any gain > 0?
        if( bestVarNumber == -1 )
        {
            newNode->m_variableNumber = -1;
            newNode->m_leafNodeLabel  = labelValue;
            return newNode;
        }
        else
        {
            newNode->m_variableNumber = bestVarNumber;

            // split examples into subsets according to values of best variable
            LList< LList<Plan *> * > subSets;
            SplitIntoSubsets( bestVarNumber, p, &newNode->m_variableValues, &subSets );

            LList<decisionTreeNode *> *children = new LList<decisionTreeNode *>();

            // for each possible value, build a decision tree
            for( int i = 0; i < subSets.Size(); i++ )
                children->PutData( BuildDecisionTreeID3( subSets[i], newNode ) );
        }
        return newNode;
    }
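For completeness, a classification pass over the finished tree could look like the following sketch; the node structure mirrors the fields used in the listing above, but the code itself is our own illustration and not an excerpt from the bot.

    #include <cstddef>
    #include <vector>

    // Simplified mirror of the node fields used in the excerpt above.
    struct Node {
        int   variableNumber = -1;          // -1 marks a leaf
        float leafNodeLabel  = 0.0f;        // average fitness stored at the leaf
        std::vector<int>    variableValues; // value of variableNumber per child
        std::vector<Node *> children;
    };

    // Walk the tree with a concrete plan (one value per variable) and return
    // the fitness label of the leaf that the plan falls into.
    float Classify(const Node *node, const std::vector<int> &planValues) {
        while (node->variableNumber != -1) {
            int value = planValues[node->variableNumber];
            const Node *next = nullptr;
            for (std::size_t i = 0; i < node->variableValues.size(); ++i) {
                if (node->variableValues[i] == value) { next = node->children[i]; break; }
            }
            if (next == nullptr) break;     // unseen value: stop at the current node
            node = next;
        }
        return node->leafNodeLabel;
    }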

B Proofs

Proposition B.0.1. The distribution
\[
\Pr(X = n_{\text{child}}) \;=\; e \cdot \frac{1}{|C(n_{\text{parent}})|} \;+\; (1 - e) \cdot \frac{w(n_{\text{child}})}{\sum_{n \in C(n_{\text{parent}})} w(n)}
\]
is indeed a discrete probability distribution.

Proof. We have to show
\[
\sum_{u} \Pr(X = u) = 1
\]
where $u$ runs through the set of all possible values of $X$. This set is exactly $C(n_{\text{parent}})$ and thus we have
\[
\begin{aligned}
\sum_{u \in C(n_{\text{parent}})} \left( e \cdot \frac{1}{|C(n_{\text{parent}})|} + (1 - e) \cdot \frac{w(u)}{\sum_{n \in C(n_{\text{parent}})} w(n)} \right)
&= e \cdot \sum_{u \in C(n_{\text{parent}})} \frac{1}{|C(n_{\text{parent}})|}
 \;+\; \frac{1 - e}{\sum_{n \in C(n_{\text{parent}})} w(n)} \cdot \sum_{u \in C(n_{\text{parent}})} w(u) \\
&= e + (1 - e) \\
&= 1.
\end{aligned}
\]
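As a quick numerical check (our worked example): with $e = 0.3$ and two children with weights $w(n_1) = 1$ and $w(n_2) = 3$, the distribution gives
\[
\Pr(X = n_1) = 0.3 \cdot \tfrac{1}{2} + 0.7 \cdot \tfrac{1}{4} = 0.325,
\qquad
\Pr(X = n_2) = 0.3 \cdot \tfrac{1}{2} + 0.7 \cdot \tfrac{3}{4} = 0.675,
\]
which indeed sum to $1$.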


Proposition B.0.2. The information entropy of a set $S$ relative to a binary classification is maximal for $p_+ = p_- = 1/2$.

Proof. The entropy is defined as
\[
\mathrm{Entropy}(S) \equiv \sum_{i=1}^{c} -p_i \log_2 p_i
\]
and for $c = 2$ we have
\[
\mathrm{Entropy}(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-.
\]
As $p_+ + p_- = 1$, we get
\[
\mathrm{Entropy}(S) = -(1 - p_-) \log_2 (1 - p_-) - p_- \log_2 p_-. \tag{B.1}
\]
The derivative of $\mathrm{Entropy}(S)$ as a function of $p_-$ is
\[
\frac{d}{dp_-} \mathrm{Entropy}(S) = \log_2 (1 - p_-) - \log_2 p_-.
\]
To find the maximum, we set
\[
\log_2 (1 - p_-) - \log_2 p_- = 0
\;\Leftrightarrow\;
\log_2 (1 - p_-) = \log_2 p_-
\;\Leftrightarrow\;
1 - p_- = p_-
\;\Leftrightarrow\;
p_- = \frac{1}{2}.
\]
As the second derivative of (B.1) is negative, we have the global maximum at $p_+ = p_- = 1/2$.
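As a concrete check (our worked example): at $p_- = \tfrac{1}{2}$ the entropy attains its maximum of $1$ bit, whereas an unbalanced split such as $p_- = \tfrac{1}{4}$ gives
\[
-\tfrac{1}{4}\log_2 \tfrac{1}{4} - \tfrac{3}{4}\log_2 \tfrac{3}{4} \approx 0.81 \text{ bits}.
\]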

Glossary

Notation        Description
ANN             Artificial Neural Network
CBR             Case-Based Reasoning
configuration   initial set-up concerning player starting territories
DTG             Decision Tree Generalisation
existing AI     the AI bot originally developed by Introversion
fleet           fixed group of up to six ships
FSM             Finite State Machine
GA              Genetic Algorithm
GOAP            Goal-Oriented Action Planning
IM              Influence Map
match           one run of DEFCON
metafleet       group of fleets
player          one party in a match of DEFCON
RL              Reinforcement Learning
territory       one of six disjoint areas that belong to a player during a match of DEFCON
unit            moving or stationary object in DEFCON

Bibliography

A. Aamodt and E. Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. Artificial Intelligence Communications, 7(1):39–59, 1994.

James F. Allen and George Ferguson. Actions and events in interval temporal logic. University of Rochester, Dept. of Computer Science, Rochester, N.Y., 1994.

C. Baekkelund. A brief comparison of machine learning methods. In Steve Rabin, editor, AI Game Programming Wisdom, volume 3. Charles River Media, 2006.

Mathias Bauer. Acquisition of user preferences for plan recognition. In Fifth International Conference on User Modeling, pages 105–112, 1995.

M. A. Bramer. Computer Game-Playing: Theory and Practice. Prentice Hall PTR, 1983.

M. Campbell. Deep Blue. Artificial Intelligence, 134(1):57, 2002.

V. Cerny. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45(1):41–51, 1985. doi:10.1007/BF00940812.

John Horton Conway. Regular Algebra and Finite Machines. Chapman and Hall mathematics series. Chapman and Hall, London, 1971.

Craig W. Reynolds. Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Comput. Graph., 21(4):25–34, 1987.

I. Davis. Strategies for strategy game AI. Papers from the AAAI 1999 Spring Symposium on Artificial Intelligence and Computer Games, Technical Report SS-99-02:24–27, 1999.


Kutluhan Erol, James A. Hendler, and Dana S. Nau. Semantics for hierarchical task-network planning. University of Maryland, College Park, Md., 1994.

Eleazar Eskin and Eric Siegel. Genetic programming applied to Othello: introducing students to machine learning research. SIGCSE Bull., 31(1):242–246, 1999.

Eugene Charniak and Robert P. Goldman. A Bayesian model of plan recognition. Artif. Intell., 64(1):53–79, 1993.

Michael Fagan and Pádraig Cunningham. Case-based plan recognition in computer games. In Lecture Notes in Computer Science. 2003.

R. Fikes. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3/4):189, 1971.

Michael Gelfond and Vladimir Lifschitz. Action languages. Electronic Transactions on AI, 3, 1998.

Damián Isla. Probabilistic target tracking and search using occupancy maps. In Steve Rabin, editor, AI Game Programming Wisdom, volume 3. Charles River Media, 2006.

Henry Alexander Kautz. A formal theory of plan recognition. University of Rochester, Computer Science Dept., Rochester, NY, 1987.

Boris Kerkez and Michael T. Cox. Incremental case-based plan recognition using state indices. Lecture Notes in Computer Science, 2080:291–305, 2001.

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. IBM Thomas J. Watson Research Center, Yorktown Heights, N.Y., 1982.

D. E. Knuth. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293, 1975.

J. E. Laird and M. van Lent. Human-level AI's killer application: Interactive computer games. AAAI Fall Symposium Technical Report, North Falmouth, Massachusetts, pages 80–97, 2000.


J. E. Laird. Machine learning for computer games. Game Developers Conference, 2005.

Neal Lesh, Charles Rich, and Candace L. Sidner. Using plan recognition in human-computer collaboration, 1999.

Michael Mateas. Expressive AI: Games and artificial intelligence, 2003.

T. Mitchell. Machine Learning. McGraw-Hill Higher Education, 1997.

Héctor Muñoz-Avila and Hai Hoang. Coordinating teams of bots with hierarchical task network planning. In Steve Rabin, editor, AI Game Programming Wisdom, volume 3. Charles River Media, 2006.

Daniel Kenneth Olson. Learning to play games from experience: an application of artificial neural networks and temporal difference learning. PhD thesis, 1993.

J. Orkin. Applying goal-oriented action planning to games. In Steve Rabin, editor, AI Game Programming Wisdom, volume 2. Charles River Media, 2004.

J. Orkin. Agent architecture considerations for real-time planning in games. In Proceedings of Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE-05). AAAI Press, 2005.

I. H. Osman. Heuristics for the generalised assignment problem: simulated annealing and tabu search approaches. OR Spektrum, 17(4):211, 1995.

Michael Pfeiffer. Reinforcement learning of strategies for Settlers of Catan. In Proceedings of the International Conference on Computer Games: Artificial Intelligence, Design and Education, Reading, UK, 2004.

Ingo Rechenberg. Evolutionsstrategie; Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Problemata, 15. Frommann-Holzboog, Stuttgart-Bad Cannstatt, 1973.

Ingo Rechenberg. Evolutionsstrategie '94. Frommann-Holzboog, 1994.

Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall series in artificial intelligence. Prentice Hall, Englewood Cliffs, N.J., 1995.


Jonathan Schaeffer, Neil Burch, Yngvi Björnsson, Akihiro Kishimoto, Martin Müller, Robert Lake, Paul Lu, and Steve Sutphen. Checkers is solved. Science, 2007.

B. Schwab. AI Game Engine Programming. Charles River Media, 2004.

David Silver, Richard S. Sutton, and Martin Müller. Reinforcement learning of local shape in the game of Go. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-07), Hyderabad, 2007.

Pieter Hubert Marie Spronck. Adaptive Game AI. SIKS dissertation series, no. 2005-06. Universitaire Pers Maastricht, Maastricht, 2005.

Pieter Hubert Marie Spronck. Dynamic scripting. In Steve Rabin, editor, AI Game Programming Wisdom, volume 3. Charles River Media, 2006.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

Penny Sweetser. Environmental awareness in game agents. In Steve Rabin, editor, AI Game Programming Wisdom, volume 3. Charles River Media, 2006.

Paul Tozour. Influence mapping. In Game Programming Gems, volume 2. Charles River Media, 2001.

Paul Tozour. Using a spatial database for runtime spatial analysis. In Steve Rabin, editor, AI Game Programming Wisdom, volume 2. Charles River Media, 2004.

Wikipedia. Decision tree model, http://en.wikipedia.org/wiki/Image:Decision_tree_model.png, 2005.

Wikipedia. Finite state machine, http://en.wikipedia.org/wiki/Image:Finite_state_machine_example_with_comments.svg, 2007.

M. R. Wilhelm. Solving quadratic assignment problems by simulated annealing. IIE Transactions, 19(1):107, 1987.

B. Williams. Model-based reactive programming of cooperative vehicles for Mars exploration. Int. Symp. Artif. Intell., Robotics, Automation in Space (ISAIRAS-01), Montreal, QB, Canada, 2001.
