Investigative Data Mining in Fraud Detection
Transforming Minority Report from Science Fiction to Science Fact
Clifton Phua, Honours Student
[email protected], 2003
Overview (1)
• Investigative Data Mining and Problems in Fraud Detection
  • Definitions
  • Technical and Practical Problems
• Existing Fraud Detection Methods
  • Widely used methods
• The Crime Detection Method
  • Comparisons with Minority Report
  • Classifiers as Precogs
  • Combining Output as Integration Mechanisms
  • Cluster Detection as Analytical Machinery
  • Visualisation Techniques as Visual Symbols
Overview (2)
• Implementing the Crime Detection System: Preparation Component
  • Investigation objectives
  • Collected data
  • Preparation of collected data to achieve objectives
• Implementing the Crime Detection System: Action Component
  • Which experiments generate the best predictions?
  • Which is the best insight?
  • How can the new models and insights be deployed within an organisation?
• Contributions and Recommendations
  • Significant research contributions
  • Proposed solutions
Literature and Acknowledgements
Dick P K (1956) Minority Report, Orion Publishing Group, London, Great Britain.
Abagnale F (2001) The Art of the Steal: How to Protect Yourself and Your Business from Fraud, Transworld Publishers, NSW, Australia.
Mena J (2003) Investigative Data Mining for Security and Criminal Detection, Butterworth Heinemann, MA, USA.
Elkan C (2001) Magical Thinking in Data Mining: Lessons From CoIL Challenge 2000, Department of Computer Science and Engineering, University of California, San Diego, USA.
Prodromidis A (1999) Management of Intelligent Learning Agents in Distributed Data Mining Systems, Unpublished PhD thesis, Columbia University, USA.
Berry M and Linoff G (2000) Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley and Sons, New York, USA.
Han J and Kamber M (2001) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, CA, USA.
Witten I and Frank E (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java, Morgan Kaufmann Publishers, CA, USA.
Investigative Data Mining and Problems in Fraud Detection
Investigative Data Mining - Definitions
• Investigative
  • Official attempt to extract some truth, or insights, about criminal activity from data
• Data Mining
  • Process of discovering, extracting, and analysing meaningful patterns, structure, models, and rules from large quantities of data
  • Spans several research areas such as databases, machine learning, neural networks, data visualisation, statistics, and distributed data mining
• Investigative Data Mining
  • Applied to law enforcement, industry, and private databases
Fraud Detection - Definitions
• Fraud
  • Criminal deception: the use of false representations to obtain an unjust advantage, or to injure the rights and interests of another
• Diversity of Fraud
  • Against organisations, governments, and individuals
  • Committed by external parties, internal management, and non-management employees
  • Caused by customers, service providers, and suppliers
  • Prevalent in insurance, credit card, and telecommunications
  • Most common in automobile, travel, and household contents insurance
• Cost of Fraud
  • Automobile insurance fraud alone costs AUD$32 million for nine Australian companies
Fraud Detection Problems - Technical
• Imperfect data
  • Usually not collected for data mining
  • Inaccurate, incomplete, and irrelevant data attributes
• Highly skewed data
  • Many more legitimate than fraudulent examples
  • Higher chances of overfitting
• Black-box predictions
  • Numerical outputs incomprehensible to people
Fraud Detection Problems - Practical
• Lack of domain knowledge
  • Important attributes, likely relationships, and known patterns
  • Three types of fraud offenders and their modus operandi
• Great variety of fraud scenarios over time
  • Soft fraud - cost of investigation > cost of fraud
  • Hard fraud - circumvents anti-fraud measures
• Assessing data mining potential
  • Predictive accuracy is useless for skewed data sets
Existing Fraud Detection Methods
Widely Used Methods in Fraud Detection
• Insurance Fraud
  • Cluster detection -> decision tree induction -> domain knowledge, statistical summaries, and visualisations
  • Special case: neural network classification -> cluster detection
• Credit Card Fraud
  • Decision tree and naive Bayesian classification -> stacking
• Telecommunications Fraud
  • Cluster detection -> scores and rules
The Crime Detection Method
Comparisons with Minority Report
• Precogs
  • Foresee and prevent crime
  • Each precog contains multiple classifiers
• Integration Mechanisms
  • Combine predictions
• Analytical Machinery
  • Single "computer"
  • Record, study, compare, and represent predictions in simple terms
• Visual Symbols
  • Explain the final predictions
  • Graphical visualisations, numerical scores, and descriptive rules
The Crime Detection Method
[Diagram: examples and instances D feed three precogs, P1 = L1(D), P2 = L2(D), and P3 = L3(D); their main predictions are combined (P1 = L1(P1, P2, P3)) into final predictions, while the analytical machinery CL = L4(D) performs attribute selection and the visual symbols present graphs, scores, and rules.]
Classifiers as Precogs
• Precog One: Naive Bayesian Classifiers
  • Statistical paradigm
  • Simple and fast
  • Redundant and not normally distributed attributes*
• Precog Two: C4.5 Classifiers
  • Computer metaphor
  • Explains patterns and quite fast
  • Scalability and efficiency issues*
• Precog Three: Backpropagation Classifiers
  • Brain metaphor
  • Long training times and extensive parameter tuning*

*For details on how the problems were tackled, please refer to the thesis
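Precog One's behaviour can be sketched in a few lines. The following is an illustrative categorical naive Bayes with Laplace smoothing, not the thesis's implementation; the claim attributes and labels below are made up for the example.

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """Tiny categorical naive Bayes: score(class) = P(class) * prod P(attr=value | class)."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)
    for attrs, label in examples:
        for i, v in enumerate(attrs):
            value_counts[(label, i)][v] += 1
    def predict(attrs):
        def score(label):
            p = class_counts[label] / len(examples)
            for i, v in enumerate(attrs):
                # Laplace smoothing so an unseen value does not zero the product
                p *= (value_counts[(label, i)][v] + 1) / (class_counts[label] + 2)
            return p
        return max(class_counts, key=score)
    return predict

# hypothetical claim records: (vehicle_category, accident_area) -> class
claims = [(("sport", "urban"), "fraud"), (("sedan", "rural"), "legal"),
          (("sport", "urban"), "fraud"), (("sedan", "urban"), "legal")]
predict = train_nb(claims)
print(predict(("sport", "urban")))  # fraud
```

The same interface (train on labelled examples, predict new instances) is what the C4.5 and backpropagation precogs provide; only the underlying model differs.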
Combining Output as Integration Mechanisms
• Cross Validation
  • Divides training data into eleven data partitions
  • Each data partition used for training, testing, and evaluation once*
  • Slightly better success rate
• Bagging
  • Unweighted majority voting on each example or instance
  • Combines predictions from the same algorithm or different algorithms*
  • Increases success rate

Classifier:   1      2      3      4      5      6      7      8      9      10     11     Main Prediction
Instance 1:   fraud  fraud  legal  fraud  legal  fraud  legal  fraud  fraud  legal  fraud  fraud
Instance 2:   fraud  fraud  fraud  legal  legal  fraud  legal  legal  legal  fraud  legal  legal

*For details on how the technique works, please refer to the thesis
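The unweighted majority vote in the table above can be sketched directly; the eleven votes below reproduce the first instance in the table.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class labels from several classifiers by unweighted majority vote."""
    # most_common(1) returns the single most frequent label with its count
    return Counter(predictions).most_common(1)[0][0]

# eleven classifiers vote on one claim instance
votes = ["fraud", "fraud", "legal", "fraud", "legal",
         "fraud", "legal", "fraud", "fraud", "legal", "fraud"]
print(majority_vote(votes))  # fraud wins, 7 votes to 4
```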
Combining Output as Integration Mechanisms
• Stacking
  • Meta-classifier
  • Base classifiers present predictions to meta-classifier*
  • Determines the most reliable classifiers

[Diagram: (1) Partitions 1-3 are fed to the naive Bayesian, C4.5, and backpropagation algorithms; (2) these yield 3 NB, 3 C4.5, and 3 BP classifiers; (3) their predictions form the combined training data; (4) a naive Bayesian algorithm trains the meta-classifier on it.]

*For details on how the technique works, please refer to the thesis
Combining Output as Integration Mechanisms
• Stacking (2)

[Diagram: (1) the score data set is fed to (2) the 3 NB, 3 C4.5, and 3 BP classifiers; (3) their predictions form the combined training data; (4) the meta-classifier produces the final prediction.]
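The stacking arrangement in the diagrams can be sketched with scikit-learn as a modern stand-in for the thesis toolchain: three base learners (naive Bayes, a decision tree as a C4.5 analogue, a multilayer perceptron as a backpropagation analogue) feed a naive Bayesian meta-classifier. The synthetic skewed data set here is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# skewed synthetic data: roughly 10% "fraud" minority class
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(max_depth=5)),  # C4.5 analogue
                ("bp", MLPClassifier(max_iter=500))],           # backpropagation analogue
    final_estimator=GaussianNB(),  # the meta-classifier, as in the diagram
    cv=3)  # base predictions for the meta-level come from cross-validation folds
stack.fit(X_train, y_train)
print(round(stack.score(X_test, y_test), 2))
```

The `cv` argument mirrors step (1)-(3) of the diagram: the combined training data for the meta-classifier is built from held-out base-classifier predictions, not from predictions on the data the bases were fitted on.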
Cluster Detection as Analytical Machinery, Visualisation Techniques as Visual Symbols
• Analytical Machinery: Self Organising Maps
  • Cluster high-dimensional elements into simpler, low-dimensional maps
  • Automatically group similar instances together
  • Do not specify an easy-to-understand model*
• Visual Symbols: Classification and Clustering Visualisations
  • Classification visualisation - confusion matrix, naive Bayesian visualisation
  • Clustering visualisation - column graph

*For details on how the problems were tackled, please refer to the thesis
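The core of a self-organising map fits in a few lines. This is a deliberately minimal 1-D sketch (a real SOM, like the one in the thesis, uses a 2-D grid and a shrinking neighbourhood); the two-attribute claim vectors are invented for the example.

```python
import random

def train_som(data, units=3, epochs=300, seed=1):
    """Minimal 1-D self-organising map: each input pulls its best-matching
    unit (and, more weakly, that unit's neighbours) toward itself, so
    similar instances end up mapped to the same unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[rng.random() for _ in range(dim)] for _ in range(units)]
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)  # learning rate decays to zero
        for x in data:
            bmu = min(range(units),
                      key=lambda u: sum((w[u][d] - x[d]) ** 2 for d in range(dim)))
            for u in range(units):
                # neighbourhood: full pull for the BMU, half for its neighbours
                h = 1.0 if u == bmu else 0.5 if abs(u - bmu) == 1 else 0.0
                for d in range(dim):
                    w[u][d] += lr * h * (x[d] - w[u][d])
    return w

def best_unit(w, x):
    return min(range(len(w)), key=lambda u: sum((w[u][d] - x[d]) ** 2 for d in range(len(x))))

claims = [(0.1, 0.1), (0.12, 0.08), (0.9, 0.9), (0.88, 0.92), (0.5, 0.5)]
som = train_som(claims)
# near-identical instances should fall into the same cluster
print(best_unit(som, (0.1, 0.1)) == best_unit(som, (0.12, 0.08)))
```

As the slide notes, the trained map is only a set of weight vectors, which is why the column-graph visualisations are needed to turn clusters into descriptive profiles.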
Steps in the Crime Detection Method
[Diagram repeated from "The Crime Detection Method": examples and instances D feed precogs P1, P2, and P3; main predictions combine into final predictions; the analytical machinery CL = L4(D) performs attribute selection; the visual symbols present graphs, scores, and rules.]
Implementing the Crime Detection System: Preparation Component
The Crime Detection System
[Diagram: Problem Understanding, Data Understanding, and Data Preparation form the Preparation Component; Modelling (driven by the Crime Detection Method), Evaluation, and Deployment form the Action Component; all stages operate on the Data.]
The Crime Detection System: Preparation Component
• Problem Understanding
  • Determine investigation objectives - choose, explain
  • Assess situation - available tools, available data set, cost model*
  • Determine data mining objectives - maximise hits, minimise false alarms
  • Produce project plan - time, tools

*For details, refer to the thesis
The Crime Detection System: Preparation Component
• Data Understanding
  • Describe data - 11550 examples (1994 and 1995), 3870 instances (1996), 33 attributes, 6% fraudulent
  • Explore data - claim trends by month, age of vehicles, age of policy holder
  • Verify data - good data quality; one duplicate attribute, some highly skewed attributes
The Crime Detection System: Preparation Component
• Data Preparation
  • Select data - all attributes, except one, are retained for analysis
  • Clean data - missing values replaced, spelling mistakes corrected
  • Format data - all characters converted to lowercase, underscore symbol
The Crime Detection System: Preparation Component
• Data Preparation
  • Construct data
    - Derived attributes: weeks_past*, is_holidayweek_claim*, age_price_wsum*
    - Numerical input: 14 attributes scaled between 0 and 1; 19 attributes represented by one-of-N or binary encoding*

*For details, refer to the thesis
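The two numerical-input encodings can be sketched as follows; the attribute values and the 16-80 age range here are illustrative, not taken from the data set.

```python
def one_of_n(value, categories):
    """One-of-N (one-hot) binary encoding for a categorical attribute."""
    return [1 if value == c else 0 for c in categories]

def scale_01(x, lo, hi):
    """Min-max scaling of a numeric attribute into [0, 1]."""
    return (x - lo) / (hi - lo)

# a hypothetical vehicle_category attribute with three values
print(one_of_n("sport", ["sedan", "sport", "utility"]))  # [0, 1, 0]
# a hypothetical age attribute on an assumed 16..80 range
print(scale_01(30, 16, 80))  # 0.21875
```

Both encodings matter most for the backpropagation precog, which requires bounded numeric inputs.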
The Crime Detection System: Preparation Component
• Data Preparation
  • Partition data - data multiplication or oversampling - for example, a 50/50 distribution

[Diagram: the 1994 and 1995 training data of 11550 examples is split into its 923 fraud examples and 10840 legal examples; the legal examples are divided into partitions of 923 each, and each partition is paired with the full fraud set; the 1996 score data holds 4083 examples.]
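The 50/50 partitioning in the diagram can be sketched as pairing the full fraud set with successive fraud-sized chunks of the legal set (placeholder strings stand in for the real claim records):

```python
def oversample_partitions(fraud, legal):
    """Pair the full fraud set with successive fraud-sized chunks of the
    legal set, yielding several 50/50 training partitions."""
    n = len(fraud)
    return [fraud + legal[i:i + n] for i in range(0, len(legal) - n + 1, n)]

fraud = ["fraud"] * 923
legal = ["legal"] * 10840
parts = oversample_partitions(fraud, legal)
print(len(parts), len(parts[0]))  # 11 1846
```

Each of the eleven partitions holds 1846 examples at an exact 50/50 class distribution; the fraud examples are reused (multiplied) across partitions while each legal example appears once.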
Implementing the Crime Detection System: Action Component
The Crime Detection System: Action Component
• Modelling
  • Generate experiment design (1)

Experiment Number   Technique or Algorithm   Data Distribution
I                   Naive Bayes              50/50
II                  Naive Bayes              40/60
III                 Naive Bayes              30/70
IV                  Backpropagation          Determined by Experiments I, II, III
V                   C4.5                     Determined by Experiments I, II, III
VI                  Bagging                  -
VII                 Stacking                 -
VIII                Stacking and Bagging     -
IX                  Backpropagation          5/95
X                   Self Organising Map      5/95
The Crime Detection System: Action Component
• Modelling
  • Generate experiment design (2)

Test:             A   B   C   D   E   F   G   H   I   J   K
Training Set:     1   2   3   4   5   6   7   8   9   10  11
Testing Set:      2   3   4   5   6   7   8   9   10  11  1
Evaluation Set:   3   4   5   6   7   8   9   10  11  1   2

Evaluating: the success rates of Tests A-K are averaged into Average W, and bagging their predictions yields Bagged X.
Producing: classifiers 1-11 score the scoring set; their success rates average into Average Y, and bagging the main score predictions yields Bagged Z, the overall success rate.
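The eleven-fold rotation in the table above can be sketched as follows; each test takes one partition for training, the next for testing, and the one after for evaluation, wrapping around at the end.

```python
def rotate_partitions(k=11):
    """Tests A..K over k partitions: training starts at partition i,
    testing uses the next partition, evaluation the one after (wrapping)."""
    return [((i % k) + 1, ((i + 1) % k) + 1, ((i + 2) % k) + 1) for i in range(k)]

for name, (train, test, evaluate) in zip("ABCDEFGHIJK", rotate_partitions()):
    print(name, train, test, evaluate)
# Test A uses partitions 1/2/3; Test K wraps around to 11/1/2
```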
The Crime Detection System: Action Component
• Modelling
  • Build models (1)
    - Bagged X outperformed Averaged W
    - Bagged Z performed marginally better than Averaged Y
    - Experiment II achieved higher cost savings than I and III - the 40/60 distribution is most appropriate under the cost model
    - Experiment V achieved higher cost savings than II and IV - the C4.5 algorithm is the best algorithm for the data set
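The thesis's cost model is only referenced here, so the sketch below uses invented parameter values purely to show the shape of such a model: savings from fraud caught, minus the cost of investigating every flagged claim. The `avg_claim_cost` and `invest_cost` figures are assumptions, not values from the thesis.

```python
def cost_savings(hits, false_alarms, avg_claim_cost=2500.0, invest_cost=200.0):
    """Illustrative claim-level cost model (parameter values are invented):
    each hit saves a claim payout; every flagged claim costs an investigation."""
    flagged = hits + false_alarms
    return hits * avg_claim_cost - flagged * invest_cost

# flagging 100 claims, 40 of them truly fraudulent
print(cost_savings(hits=40, false_alarms=60))  # 40*2500 - 100*200 = 80000.0
```

A model like this explains why the 40/60 distribution can beat 50/50 despite a lower raw accuracy: the trade-off between hits and false alarms is priced, not just counted.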
The Crime Detection System: Action Component
• Modelling
  • Build models (2)
    - Experiment VIII achieved slightly better cost savings than V - combining models from different algorithms is better than using a single algorithm
    - The top 15 classifiers from stacking consisted of 9 C4.5, 4 backpropagation, and 2 naive Bayesian classifiers*

*For details, refer to the thesis
The Crime Detection System: Action Component
• Modelling
  • Build models (3)
    - No scores from the D2K software
    - Experiment IX demonstrates that sorted scores and predefined thresholds result in focused investigations* - satisfies Pareto's Law
    - Rules did not provide insights - already present in domain knowledge and data attribute exploration*
    - Experiment X requires 5 clusters for visualisation*, using age_of_policyholder, weeks_past, is_holidayweek_claim, make, accident_area, vehicle_category, age_price_wsum, number_of_cars, and base_policy

*For details, refer to the thesis
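The sorted-scores-and-threshold idea can be sketched directly; the claim names and score values below are hypothetical.

```python
def focus_investigations(scores, budget):
    """Rank claims by fraud score and keep only the top slice the
    investigation budget allows - the Pareto-style focus of Experiment IX."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]

# hypothetical claim scores output by a classifier
scores = {"claim_1": 0.91, "claim_2": 0.12, "claim_3": 0.77,
          "claim_4": 0.05, "claim_5": 0.64}
print(focus_investigations(scores, budget=2))  # ['claim_1', 'claim_3']
```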
The Crime Detection System: Action Component
• Modelling
  • Assess models (1)
    - Training and score data sets too small*
    - Student's t-test with k-1 degrees of freedom*
    - McNemar's hypothesis test*

Rank   Experiment Number   Technique or Algorithm    Cost Savings   Overall Success Rate   Percentage Saved
1      VIII                Stacking and Bagging      $167,069       60%                    29.71%
2      V                   C4.5 40/60                $165,242       60%                    29.38%
3      VI                  Bagging                   $127,454       64%                    22.66%
4      VII                 Stacking                  $104,887       70%                    18.65%
5      II                  Naive Bayes 40/60         $94,734        70%                    16.85%
6      IX                  Backpropagation 5/95      $89,232        75%                    15.87%
7      IV                  Backpropagation 40/60     -$6,488        92%                    -1.15%

*For details, refer to the thesis
The Crime Detection System: Action Component
• Modelling
  • Assess models (2)
    - Clusters 1, 2, and 3 have higher occurrences of fraud in 1996
    - Clusters 1, 3, and 5 consist of several makes of inexpensive cars - utility vehicles, rural areas, and liability policies
    - Clusters 2 and 4 contain claims submitted many weeks after the "accidents" - Toyotas, sport cars, and multiple policies

Cluster   Number of Instances   Descriptive Cluster Profile
1         215                   Contains a large number of 21 to 25 year olds. The insured vehicles are relatively new.
2         166                   Also contains a large number of 21 to 25 year olds. Claims are usually reported 10 weeks past the accident. The insured vehicles are usually sport cars.
3         268                   Almost all fraudsters are 16 to 17 years old. The insured vehicles are mainly Acuras, Chevrolets, and Hondas, and are usually utility cars.
4         103                   Claims are usually reported 20 weeks past the accident. Almost all insured cars are Toyotas, and the fraudster has a high probability of having 3 to 4 cars insured. Claims are unlikely to be submitted during holiday periods.
5         171                   Consists mainly of Fords, Mazdas, and Pontiacs. Higher chances of rural accidents, and the base policy type is likely to be liability.
The Crime Detection System: Action Component
• Modelling
  • Assess models (3)
    - Statistical evaluation of descriptive cluster profiles
    - Cluster 4: 3121 Toyota car claims, 6% or 187 fraudulent; of 2148 Toyota sedan car claims, expect 6% or 129 to be fraudulent with a ±10 standard deviation; the actual 171 fraudulent Toyota sedan car claims give a z-score of 3.8 standard deviations
    - This is an insight because it is statistically reliable, not known previously, and actionable

Cluster   Group                Claims   No. and % of Fraud   Sub-Group                         Claims   Expected No. of Fraud   Actual No. of Fraud   z-Score
1         All claims           15420    923 (6%)             21 to 25 year olds                108      2                       16                    5
2         Sport cars           5358     84 (1.6%)            21 to 25 year olds + Sport cars   32       1                       10                    9.5
3         16 to 17 year olds   320      31 (9.7%)            Honda + 16 to 17 year olds        31       3                       31                    9.3
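The z-score calculation behind these profiles can be reproduced against a binomial expectation; the Cluster 4 figures below come from the slide above.

```python
import math

def fraud_z_score(claims, base_rate, actual_fraud):
    """z-score of an observed fraud count against the binomial expectation
    at the overall base rate: how surprising is this sub-group?"""
    expected = claims * base_rate
    sd = math.sqrt(claims * base_rate * (1 - base_rate))
    return (actual_fraud - expected) / sd

# Cluster 4: 2148 Toyota sedan claims at the 6% base rate, 171 fraudulent
print(round(fraud_z_score(2148, 0.06, 171), 1))  # 3.8
```

The expected count is 2148 × 0.06 ≈ 129 with a standard deviation of about 11, so 171 observed frauds sit roughly 3.8 standard deviations above expectation, matching the slide.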
The Crime Detection System: Action Component
• Modelling
  • Assess models (4)
    - Append main predictions from the 3 algorithms and final predictions from bagging to the 615 fraudulent instances
    - 25 cannot be detected by any algorithm; highest lift in Clusters 1 and 2
    - All can be detected by at least 1 algorithm in Cluster 3
    - Not all fraudulent instances can be detected - domain knowledge, cluster detection, and statistics offer explanations
    - 101 cannot be detected by 2 algorithms - a weakness of bagging; other alternatives exist
The Crime Detection System: Action Component
• Evaluation
  • Evaluate results
    - Experiment VIII generates the best predictions, with cost savings of about $168,000 - almost 30% of the total cost savings possible
    - The most statistically reliable insight is the knowledge of 21 to 25 year olds who drive sport cars
  • Review process
    - Unsupervised learning to derive clusters first
    - More training data partitions
    - More skewed distributions
    - Cost model too simplistic
    - Probabilistic Neural Networks
The Crime Detection System: Action Component
• Deployment
  • Plan deployment
    - Manage geographically distributed databases using distributed data mining
    - Take time into account
  • Plan monitoring and maintenance
    - Determined by the rate of change in the external environment and by organisational requirements
    - Rebuild models when cost savings fall below a certain percentage of the maximum cost savings possible
Contributions and Recommendations
Contributions
• New Crime Detection Method
• Crime Detection System
• Cost Model
• Visualisations
• Statistics
• Score-based Feature
• Extensive Literature Review
• In-depth Analysis of Algorithms
Recommendations – Technical Problems
• Imperfect data
  • Statistical evaluation and confidence intervals
  • Preparation component of the crime detection system
  • Derived attributes
  • Cross validation
• Highly skewed data
  • Partitioned data with the most appropriate distribution
  • Cost model
• Black-box predictions
  • Classification and clustering visualisation
  • Sorted scores and predefined thresholds, rules
Recommendations – Practical Problems
• Lack of domain knowledge
  • Action component of the crime detection system
  • Extensive literature review
• Great variety of fraud scenarios over time
  • SOM
  • Crime detection method
  • Choice of algorithms
• Assessing data mining potential
  • Quality and quantity of data
  • Cost model
  • z-scores
Transforming Minority Report from Science Fiction to Science Fact: INVESTIGATIVE DATA MINING IN FRAUD DETECTION

1 INTRODUCTION
• The world is overwhelmed with terabytes of data, but there are only a few effective and efficient ways to analyse and interpret it.
• The purpose of the research is to simulate the Precrime System from the science fiction novel Minority Report, using data mining methods and techniques, to extract insights from enormous amounts of data to detect white-collar crime.
• The application is in uncovering fraudulent claims in automobile insurance.
• The objectives are to overcome the technical and practical problems of data mining in fraud detection.

2 THE CRIME DETECTION METHOD
• Precogs, or precognitive elements, are entities which have the knowledge to predict that something will happen. Figure 1 uses three precogs to foresee and prevent crime by stopping potentially guilty criminals.
• Each precog contains multiple classification models, or classifiers, trained with one data mining technique to extrapolate the future.
• The three precogs are different from each other because they are trained by different data mining algorithms: the first, second, and third precogs are trained using the naive Bayesian, C4.5, and backpropagation algorithms.
• The precogs require numerical inputs of past examples to output corresponding predictions for new instances. Precogs can be shared between organisations to increase the accuracy of the predictions, without violating competitive and legal requirements.
• Integration Mechanisms are needed. As each precog outputs its many predictions for each instance, all are counted and the class with the highest tally is chosen as the main prediction.
• Figure 1 shows that the main predictions can be combined either by majority count (bagging) or fed back into one of the precogs (stacking), to derive a final prediction.
• Analytical Machinery, or cluster detection, records, studies, compares, and represents the precogs' predictions in easily understood terms.
• The analytical machinery is represented by the Self Organising Map (SOM), which clusters similar data into groups. It transforms multidimensional data into two-dimensional clusters of similar data, enabling the data analyst to easily differentiate the groups of fraud. It also allows the data analyst to assess the algorithms' ability to cope with evolving fraud.
• Figure 1 demonstrates that main predictions and final predictions are appended to the clustered data to determine the fraud characteristics which cannot be detected, and the most important attributes are selected for visualisation.
• Visual Symbols, or visualisations, integrate human perceptual abilities into the data analysis process by presenting the data in some visual and interactive form. The black-box approach of the precogs is transformed into a semi-transparent approach by using analytical machinery and visual symbols to analyse and interpret the predictions.
• The naive Bayesian and C4.5 visualisations facilitate analysis of classifier predictions and performance, and column graphs aid the interpretation of clustering results.
• Scores are numbers within a specified range, indicating the relative risk that a particular data instance may be fraudulent, used to rank instances.
• Rules are expressions of the form Body → Head, where Body describes the conditions under which the rule is generated and Head is the class label.
• The crime detection method provides a flexible step-by-step approach to generating predictions from any three algorithms, and uses some form of integration mechanism to increase the likelihood of correct final predictions.

Figure 1: Predictions using Precogs, Analytical Machinery, and Visual Symbols [diagram: examples and instances D feed precogs P1 = L1(D), P2 = L2(D), P3 = L3(D); main predictions combine into final predictions via P1 = L1(P1, P2, P3); the analytical machinery CL = L4(D) performs attribute selection; the visual symbols present graphs, scores, and rules]

3 RESULTS ON AUTOMOBILE INSURANCE DATA

4 DISCUSSION
• Through the use of integration mechanisms, the highest cost savings is achieved.
• The analytical machinery facilitated the interesting discovery of 21 to 25 year old fraudsters who used sport cars as their crime tool.

5 CONCLUSION
• Other possible applications of this crime detection method are: anti-terrorism, burglary, customs declaration fraud, drug-related homicides, drug smuggling, government financial transactions, and sexual offences.

REFERENCES
Dick P K (1956) Minority Report, Orion Publishing Group, London, Great Britain.

Done by Clifton Phua for Honours 2003
Supervised by Dr. Damminda Alahakoon

Questions?