Using R to Predict Financial Scores with Big Data Technology in Emerging Markets
Executive Summary
• Cignifi develops risk and marketing scores for emerging consumers with mobile phone data.
• Cignifi heavily uses R to develop its models and connect to AWS databases.
• Cignifi partnered with International Finance Corporation (IFC) to provide big data analytics for customer profiling using Call Detail Records (CDR).
• By using R, Cignifi was able to unearth mobile money insights through modeling, social network analysis, and geo-locational mapping.
Agenda
1. Introduction to Cignifi
2. The Cignifi Technical Environment
3. Use Case: Mobile Money in Uganda
Cignifi Overview
What We Do:
• Yield rich and accurate behavioral insights from Call Detail Records (CDR) – the world's most ubiquitous digital footprint
  – Credit score
  – Marketing propensity score
How We Do It:
• The first proprietary data platform to track thin-file consumer behavior and dynamically interpret billions of granular records
• Deep credit analytics expertise behind customizable behavioral modeling
Who's Buying:
• Mobile operators, financial institutions, insurers, and retailers
[Diagram: CDR, CRM, Mobile Phone Payments → Robust Analytics Engine → Marketing Propensity Score, Best Time/Channel to Contact Customer, Credit Risk Score]
IFC Overview
• International Finance Corporation (IFC) is one of the five member organizations of the World Bank Group (WBG). Through investment and advisory services, IFC contributes to private sector development globally and to achieving WBG's twin goals: ending extreme poverty and boosting shared economic prosperity.
• Access to Finance and formal financial services are powerful components to positively impact people's lives: creating opportunities for small businesses to grow and for individuals to transact, save, invest, and make productive economic choices and plans.
• At the intersection of Big Data and Access to Finance, this case study illustrates the use of data science to advance key development strategies that can help to improve people's lives and create value for IFC's clients by increasing usage of Digital Financial Services and Mobile Money.
• Data Science for Development is a burgeoning field with rich opportunities to apply cutting-edge skills and technology to problems that matter and to find solutions that bring meaningful, positive changes to poor and underserved segments of society in developing countries.
Learn More
✓ Big Data in Action for Development - http://data.worldbank.org
✓ UN Sustainable Development Goals #8: better livelihoods and employment through access to financial services - http://www.un.org/sustainabledevelopment/
Agenda
1. Introduction to Cignifi
2. The Cignifi Technical Environment
3. Use Case: Mobile Money in Uganda
Cignifi Architecture
Cignifi Platform – Big Data Analytical Farm
[Architecture diagram. Labeled components:]
• API Services (Attached/Detached): Manage Analytic Requests
• Financial Data, Campaign Responses, Activations/Defaults
• Mobile Operator A, Mobile Operator B, Mobile Operator C: On-Demand Servers, File Server, and Database shown for each
• Cignifi Platform Portal: Web Server, Dashboard
• Data Processing and Modeling
Data Processing Environment
1. Uploaded Files
2. Normalization
3. Aggregation
4. Generated Scores
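The middle two steps can be illustrated with a minimal sketch. It is written in Python for illustration only (the deck's own work was done in R), and the field names, sample rows, and feature names are all hypothetical rather than Cignifi's actual schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw rows as they might arrive in an uploaded file.
# All identifiers and values are made up for illustration.
RAW_ROWS = [
    {"caller": " 256-700-000001 ", "type": "VOICE", "start": "2015-03-01 19:30:00", "seconds": "120"},
    {"caller": "256700000001", "type": "sms", "start": "2015-03-01 08:05:00", "seconds": "0"},
    {"caller": "256700000002", "type": "voice", "start": "2015-03-02 23:10:00", "seconds": "45"},
]


def normalize(row):
    """Step 2: coerce one raw row into consistent types and formats."""
    return {
        "caller": "".join(ch for ch in row["caller"] if ch.isdigit()),
        "type": row["type"].strip().lower(),
        "start": datetime.strptime(row["start"], "%Y-%m-%d %H:%M:%S"),
        "seconds": int(row["seconds"]),
    }


def aggregate(rows):
    """Step 3: roll normalized events up to one feature record per subscriber."""
    features = defaultdict(lambda: {"voice_count": 0, "voice_seconds": 0, "sms_count": 0})
    for r in rows:
        f = features[r["caller"]]
        if r["type"] == "voice":
            f["voice_count"] += 1
            f["voice_seconds"] += r["seconds"]
        elif r["type"] == "sms":
            f["sms_count"] += 1
    return dict(features)


subscriber_features = aggregate(normalize(r) for r in RAW_ROWS)
```

Normalizing before aggregating means the two differently formatted rows for the first subscriber end up in the same feature record.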
Technology Stack
Storage
• S3: Durable web service for scalable object storage
• Glacier: Storage service for data archiving & long-term backup
Processing/Analysis
• EC2: Resizable compute capability in the cloud
• EMR: Elastic MapReduce for Hadoop-based processing
• Redshift: Petabyte-scale data warehouse solution for large-scale data analysis
Other layers shown: Models, Deployment, Web Framework
R and Big Data
AWS Cloud:
• EC2: Resizable compute capability in the cloud
• Redshift: Petabyte-scale warehouse solution
✓ Modeling
✓ Social Network Analysis
R Libraries
• Redshift: Connect to Redshift database
• glmnet: Logistic regression algorithm
• caret: Machine learning library
• ggplot2: Plotting library
• igraph and ggmap: Social network analysis & maps
Modeling (glmnet)
Penalized Logistic Regression
λ: Regularization parameter
α: Elastic-net mixing parameter, α=0 (Ridge), α=1 (Lasso)
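For reference, the elastic-net objective that glmnet documents for its binomial (logistic) family is written out below. This is the standard form from the glmnet documentation, not text recovered from the slide; here N is the number of observations, y_i the 0/1 target, and x_i the feature vector.

```latex
\[
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}
\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\;
\lambda\Big[\, \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
             + \alpha\,\lVert\beta\rVert_1 \Big]
\]
```

With α=0 only the squared (ridge) penalty remains, and with α=1 only the absolute-value (lasso) penalty remains, matching the two cases named above.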
Social Network Analysis
✓ Generating network (igraph)
✓ Plotting networks and maps (ggmap and ggplot2)
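As a rough illustration of the "generating network" step, the sketch below builds a weighted, directed network from a list of transfers. It is plain Python, not the igraph code used in the deck, and the transfer records and node names are made up.

```python
from collections import defaultdict

# Hypothetical (sender, receiver, amount) transfers; identifiers are made up.
TRANSFERS = [
    ("u1", "u2", 50.0),
    ("u1", "u3", 20.0),
    ("u2", "u3", 10.0),
    ("u1", "u2", 30.0),
]


def build_network(transfers):
    """Collapse repeated transfers into one weighted, directed edge per pair."""
    edges = defaultdict(float)
    for sender, receiver, amount in transfers:
        edges[(sender, receiver)] += amount
    return dict(edges)


def degrees(edges):
    """Count distinct counterparts each node sends to (out) and receives from (in)."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for sender, receiver in edges:
        out_deg[sender] += 1
        in_deg[receiver] += 1
    return dict(out_deg), dict(in_deg)


network = build_network(TRANSFERS)
out_degree, in_degree = degrees(network)
```

An edge list of this shape is the usual input to a graph library, which would then handle layout and plotting.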
Agenda
1. Introduction to Cignifi
2. The Cignifi Technical Environment
3. Use Case: Mobile Money in Uganda
Business Background
• Airtel Uganda, a leading mobile network operator, wants to drive the adoption of their mobile money product (Airtel Money).
• Cignifi partnered with International Finance Corporation (IFC) to provide big data analytics for customer profiling using Call Detail Records (CDR).
• The Bill & Melinda Gates Foundation provided funding.
Goals
1. Identify active mobile money users & understand associated characteristics that are tied to GSM profiles.
2. Understand mobile money flow dynamics through social network analysis and geo-locational mapping.
The Technical Approach
Understand Characteristics of Active Mobile Money Users
1. Target Variable Definition
2. CDR and Mobile Money Data Cleaning & Processing
3. Location & Operational System Data Processing
4. Predictive Modeling with GLM & Data Mining Methodology
5. Lead Generation & Results Profiling
+
Social Network Analysis & Geo-Locational Mapping (GLM)
1. Study Scope Definition
2. Social Network Data Processing
3. Geo-location Mapping & Clustering
= A thorough solution with an understanding of individual user behavior and the operator's operation mechanics.
NOTE: ALL DATA HAS BEEN MASKED.
Data Summary
Call Data Records (CDR)
• Voice Calls, SMS, Internet
• Attributes listed: Counts, Duration, Consistency, Number of access, Time of day, Geo-location
Account Information
• Account age
• Billing location
• Account vintage
• Payment delays
• Payment method
Recharge Data
• Timestamp
• Recharge amount
• Source
• Balance
Other
• Counterparts (social network)
• International
• In/off net
Target Variable
• Payment default
• Offer acceptance
• Contact rate
• Product activation
…and more.
Processing steps shown: Validation, Cleaning, Normalization, Aggregation
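One of the variables reported later in the deck is "voice duration entropy". The deck does not define it, so the sketch below is an assumption: Shannon entropy of a subscriber's voice seconds spread across hours of the day, written in Python with made-up inputs.

```python
import math
from collections import Counter


def duration_entropy(calls):
    """Shannon entropy (in bits) of voice seconds across hours of the day.

    `calls` is a list of (hour_of_day, seconds) pairs. Binning by hour is an
    assumption made for this sketch; the deck does not state its definition.
    """
    seconds_by_hour = Counter()
    for hour, seconds in calls:
        seconds_by_hour[hour] += seconds
    total = sum(seconds_by_hour.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for s in seconds_by_hour.values():
        if s > 0:
            p = s / total
            entropy -= p * math.log2(p)
    return entropy


# All calling in a single hour versus calling spread evenly over four hours.
concentrated = duration_entropy([(20, 300), (20, 120)])
spread = duration_entropy([(8, 100), (12, 100), (18, 100), (22, 100)])
```

Under this definition a subscriber who calls at only one time of day scores zero, and the score rises as usage is spread more evenly.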
Mobile Money Model Results
30 Day Activation for Cash-In Model
[Figure: Density Distribution for Predicted Probability of Activation (y-axis: Density)]
[Figure: Variable Importance for Cash-In Model]
NOTE: ALL DATA HAS BEEN MASKED.
Segmentation By Main Variables
Cash-In Model
[Figure: four panels, each plotting the cash-in 30-day active rate (y-axis: Activity Index) against percentile bins from 10% to 100% of one variable:]
• Total duration of outgoing voice calls between 7pm and 8am
• Total call revenue
• Sum of recharge during 6pm to midnight
• Voice duration entropy
The more that customers use their mobile phones, the more likely they are to cash in money.
NOTE: ALL DATA HAS BEEN MASKED.
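A segmentation of this kind can be sketched as follows: sort subscribers by one variable, cut them into equal-sized bins, and compute the active rate in each bin. The Python below is an illustrative sketch with made-up data, not the procedure or the figures from the deck.

```python
def rate_by_bin(values, activated, n_bins=10):
    """Sort subscribers by `values`, cut them into n_bins equal-sized groups,
    and return the share of activated subscribers in each group."""
    if len(values) != len(activated):
        raise ValueError("values and activated must have the same length")
    if len(values) < n_bins:
        raise ValueError("need at least one subscriber per bin")
    order = sorted(range(len(values)), key=lambda i: values[i])
    rates = []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        members = order[lo:hi]
        rates.append(sum(activated[i] for i in members) / len(members))
    return rates


# Made-up example: 20 subscribers with usage 1..20 and a 0/1 activation flag.
usage = list(range(1, 21))
active = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
rates = rate_by_bin(usage, active, n_bins=5)
```

In this toy data the rate rises from the lowest-usage bin to the highest, which is the pattern the slide describes.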
Mobile Money Model Results
30 Day Activation for Cash-Out Model
[Figure: Density Distribution for Predicted Probability of Activation (y-axis: Density)]
[Figure: Variable Importance for Cash-Out Model]
NOTE: ALL DATA HAS BEEN MASKED.
Segmentation By Main Variables
Cash-Out Model
[Figure: four panels, each plotting the cash-out 30-day active rate (y-axis: Activity Index) against percentile bins from 10% to 100% of one variable:]
• Total duration of outgoing voice calls between 7pm and 8am
• Total call revenue
• Sum of recharge during 6pm to midnight
• Voice duration entropy
The more that customers use their mobile phones, the more likely they are to cash out money.
NOTE: ALL DATA HAS BEEN MASKED.
Cluster Analysis
[Figures: Money Sent Out By Source Number; Money Received]
Customers in the Kampala, Masindi, and Gulu clusters sent more money than they received. This is consistent with the fact that a majority of customers live in these areas.
NOTE: ALL DATA HAS BEEN MASKED.
Cluster Analysis
[Figure: Cluster Centers & Transaction Amounts, with clusters labeled A through I]
[Figure: City Clusters for P2P Transactions]
In order to measure money transfer clearly, Cignifi created nine geo-location clusters and analyzed the flow between them.
NOTE: ALL DATA HAS BEEN MASKED.
Cluster Analysis
Aggregated Counts for P2P Money Received Transactions

Transaction Matrix Between Clusters
Cluster  A       B       C       D       E       F       G       H       I
A        0.0881  0.0194  0.0067  0.1270  0.0037  0.0063  0.0097  0.0017  0.0024
B        0.0133  0.0569  0.0016  0.0371  0.0006  0.0009  0.0064  0.0009  0.0010
C        0.0063  0.0020  0.0630  0.0379  0.0014  0.0019  0.0009  0.0001  0.0006
D        0.1491  0.0601  0.0613  0.9603  0.0283  0.0310  0.0401  0.0116  0.0186
E        0.0041  0.0013  0.0013  0.0203  0.0169  0.0004  0.0004  0.0003  0.0003
F        0.0044  0.0014  0.0009  0.0197  0.0006  0.0217  0.0006  0.0013  0.0004
G        0.0100  0.0080  0.0011  0.0374  0.0007  0.0006  0.0297  0.0004  0.0010
H        0.0031  0.0013  0.0004  0.0146  0.0006  0.0006  0.0009  0.0059  0.0006
I        0.0093  0.0034  0.0013  0.0337  0.0011  0.0009  0.0029  0.0013  0.0137

Cluster  Within Cluster  Outgoing  Incoming  Net (Out-In)
A        0.0881          0.1770    0.1999    -0.0230
B        0.0569          0.0619    0.0971    -0.0353
C        0.0630          0.0511    0.0746    -0.0234
D        0.9603          0.4001    0.3279     0.0723
E        0.0169          0.0283    0.0370    -0.0087
F        0.0217          0.0294    0.0424    -0.0130
G        0.0297          0.0594    0.0620    -0.0024
H        0.0059          0.0220    0.0177     0.0043
I        0.0137          0.0539    0.0246     0.0293

In order to measure money transfer clearly, Cignifi created nine geo-location clusters and analyzed the flow between them. The cluster centroids are listed in the tables.
NOTE: ALL DATA HAS BEEN MASKED.
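The Within Cluster, Outgoing, Incoming, and Net (Out-In) columns can be recomputed from the transaction matrix, as the Python sketch below shows using the masked values from the slide. It assumes rows are sending clusters and columns are receiving clusters; the slide does not state this, but it is the orientation under which the row sums match the Outgoing column to within rounding.

```python
CLUSTERS = "ABCDEFGHI"

# Masked transaction matrix from the slide (assumed: row = sender, column = receiver).
MATRIX = [
    [0.0881, 0.0194, 0.0067, 0.1270, 0.0037, 0.0063, 0.0097, 0.0017, 0.0024],
    [0.0133, 0.0569, 0.0016, 0.0371, 0.0006, 0.0009, 0.0064, 0.0009, 0.0010],
    [0.0063, 0.0020, 0.0630, 0.0379, 0.0014, 0.0019, 0.0009, 0.0001, 0.0006],
    [0.1491, 0.0601, 0.0613, 0.9603, 0.0283, 0.0310, 0.0401, 0.0116, 0.0186],
    [0.0041, 0.0013, 0.0013, 0.0203, 0.0169, 0.0004, 0.0004, 0.0003, 0.0003],
    [0.0044, 0.0014, 0.0009, 0.0197, 0.0006, 0.0217, 0.0006, 0.0013, 0.0004],
    [0.0100, 0.0080, 0.0011, 0.0374, 0.0007, 0.0006, 0.0297, 0.0004, 0.0010],
    [0.0031, 0.0013, 0.0004, 0.0146, 0.0006, 0.0006, 0.0009, 0.0059, 0.0006],
    [0.0093, 0.0034, 0.0013, 0.0337, 0.0011, 0.0009, 0.0029, 0.0013, 0.0137],
]


def flow_summary(matrix, labels):
    """Per cluster: diagonal (within), off-diagonal row sum (outgoing),
    off-diagonal column sum (incoming), and their difference (net)."""
    summary = {}
    for i, label in enumerate(labels):
        within = matrix[i][i]
        outgoing = sum(matrix[i]) - within
        incoming = sum(row[i] for row in matrix) - within
        summary[label] = {
            "within": within,
            "outgoing": outgoing,
            "incoming": incoming,
            "net": outgoing - incoming,
        }
    return summary


summary = flow_summary(MATRIX, CLUSTERS)
net_senders = [c for c in CLUSTERS if summary[c]["net"] > 0]
```

Recomputed values differ from the slide's in the fourth decimal place because the matrix entries are themselves rounded. Clusters D, H, and I come out as net senders, as in the slide's Net column.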
Conclusion
• The more often mobile phone subscribers use their phones, the more likely they are to adopt the Airtel Money program.
  – This applies to both revenue-generating and non-revenue-generating activities.
• Social network and geo-locational analysis provides insights about target markets and spatial trends in money transfer.
Nicolais Guevara, Senior Data Scientist
[email protected]
Cambridge, USA | São Paulo | Mexico City | Manila