Using R to Predict Financial Scores with Big Data Technology in Emerging Markets

Executive Summary

•  Cignifi develops risk and marketing scores for emerging consumers with mobile phone data.
•  Cignifi relies heavily on R to develop its models and connect to AWS databases.
•  Cignifi partnered with the International Finance Corporation (IFC) to provide big data analytics for customer profiling using Call Detail Records (CDR).
•  By using R, Cignifi was able to unearth mobile money insights through modeling, social network analysis, and geo-locational mapping.

Agenda

1.  Introduction to Cignifi
2.  The Cignifi Technical Environment
3.  Use Case: Mobile Money in Uganda

Cignifi Overview

What We Do:
•  Yield rich and accurate behavioral insights from Call Detail Records (CDR), the world's most ubiquitous digital footprint
   –  Credit score
   –  Marketing propensity score

How We Do It:
•  The first proprietary data platform to track thin-file consumer behavior and dynamically interpret billions of granular records
•  Deep credit analytics expertise behind customizable behavioral modeling

Who's Buying:
•  Mobile operators, financial institutions, insurers, and retailers

[Diagram labels: CDR, CRM, and Mobile Phone Payments; Robust Analytics Engine; Marketing Propensity Score; Best Time/Channel to Contact Customer; Credit Risk Score.]

IFC Overview

•  The International Finance Corporation (IFC) is one of the five member organizations of the World Bank Group (WBG). Through investment and advisory services, IFC contributes to private sector development globally and to achieving WBG's twin goals: ending extreme poverty and boosting shared economic prosperity.
•  Access to finance and formal financial services are powerful ways to positively impact people's lives: creating opportunities for small businesses to grow and for individuals to transact, save, invest, and make productive economic choices and plans.
•  At the intersection of Big Data and Access to Finance, this case study illustrates the use of data science to advance key development strategies that can help to improve people's lives and create value for IFC's clients by increasing usage of Digital Financial Services and Mobile Money.
•  Data Science for Development is a burgeoning field with rich opportunities to apply cutting-edge skills and technology to problems that matter and find solutions that bring meaningful, positive changes to poor and underserved segments of society in developing countries.

Learn More
✓  Big Data in Action for Development - http://data.worldbank.org
✓  UN Sustainable Development Goals #8: better livelihoods and employment through access to financial services - http://www.un.org/sustainabledevelopment/


Cignifi Architecture

Cignifi Platform: Big Data Analytical Farm

[Architecture diagram. Recoverable labels:]
•  API Services (Attached/Detached): Manage Analytic Requests
•  Inputs: Financial Data, Campaign Responses, Activations/Defaults
•  Mobile Operators A, B, and C, each shown with On-Demand Servers, a File Server, and a Database
•  Cignifi Platform Portal: Web Server, Dashboard
•  Data Processing and Modeling

Data Processing Environment

[Diagram with four numbered stages:]
1.  Uploaded Files
2.  Normalization
3.  Aggregation
4.  Generated Scores
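
The four stages above can be illustrated with a small sketch. It is written in Python for brevity rather than in R, and it is not Cignifi's pipeline: the field names (`sub_id`, `type`, `duration`), the cleaning rules, and the toy "score" are all assumptions made for illustration.

```python
from collections import defaultdict

# Stage 1: "uploaded" raw CDR rows (hypothetical layout and values).
raw_rows = [
    {"sub_id": " s1 ", "type": "VOICE", "duration": "120"},
    {"sub_id": "s1", "type": "sms", "duration": ""},
    {"sub_id": "s2", "type": "voice", "duration": "45"},
    {"sub_id": "s1", "type": "Voice", "duration": "60"},
]

def normalize(row):
    """Stage 2: trim identifiers, lowercase event types, coerce durations."""
    return {
        "sub_id": row["sub_id"].strip(),
        "type": row["type"].strip().lower(),
        "duration": int(row["duration"] or 0),
    }

def aggregate(rows):
    """Stage 3: roll events up to one feature record per subscriber."""
    features = defaultdict(lambda: {"voice_calls": 0, "voice_seconds": 0, "sms": 0})
    for r in rows:
        f = features[r["sub_id"]]
        if r["type"] == "voice":
            f["voice_calls"] += 1
            f["voice_seconds"] += r["duration"]
        elif r["type"] == "sms":
            f["sms"] += 1
    return dict(features)

def toy_score(f):
    """Stage 4: placeholder score (total event count), NOT a real model."""
    return f["voice_calls"] + f["sms"]

features = aggregate(normalize(r) for r in raw_rows)
scores = {sub_id: toy_score(f) for sub_id, f in features.items()}
print(features)
print(scores)
```

Keeping each stage as its own function mirrors the diagram: any stage can be rerun or replaced without touching the others.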

Technology Stack

Storage
•  S3: Durable web service for scalable object storage
•  Glacier: Storage service for data archiving & long-term backup

Processing/Analysis
•  EC2: Resizable compute capability in the cloud
•  EMR: Elastic MapReduce for Hadoop-based processing
•  Redshift: Petabyte-scale data warehouse solution for large-scale data analysis

[The slide also shows Models, Deployment, and Web Framework categories; their entries were images and are not recoverable.]

R and Big Data

AWS Cloud:
•  EC2: Resizable compute capability in the cloud
•  Redshift: Petabyte-scale warehouse solution

✓  Modeling
✓  Social Network Analysis

R Libraries

•  Redshift: Connect to Redshift database
•  glmnet: Logistic regression algorithm
•  caret: Machine learning library
•  ggplot2: Plotting library
•  igraph and ggmap: Social network analysis & maps

Modeling (glmnet)
Penalized Logistic Regression
•  λ: Regularization parameter
•  α: Elastic-net mixing parameter; α=0 (Ridge), α=1 (Lasso)
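
For reference, the objective that glmnet minimizes for the binomial family, as given in the package documentation, can be written as below. The slide's own equation did not survive extraction, so the notation here (N observations, feature vectors x_i, labels y_i in {0, 1}, intercept β_0) is taken from that documentation and is not necessarily what the slide showed.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
\;+\; \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\Big]
```

Setting α=0 leaves only the squared (ridge) term of the penalty, and α=1 leaves only the absolute-value (lasso) term; λ scales the whole penalty.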

Social Network Analysis
✓  Generating network (igraph)
✓  Plotting networks and maps (ggmap and ggplot2)
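
As a rough illustration of the "generating network" step, the sketch below builds a directed, weighted graph from sender/receiver pairs and counts in- and out-degree. It uses plain Python dictionaries rather than igraph, and the transfer records are invented for the example.

```python
from collections import defaultdict

# Hypothetical P2P transfers: (sender, receiver, amount).
transfers = [
    ("u1", "u2", 50.0),
    ("u1", "u3", 20.0),
    ("u2", "u3", 10.0),
    ("u1", "u2", 30.0),
]

# Directed weighted edges: the weight is the total amount sent from a to b.
edges = defaultdict(float)
for sender, receiver, amount in transfers:
    edges[(sender, receiver)] += amount

# Out-degree counts distinct receivers per sender;
# in-degree counts distinct senders per receiver.
out_degree = defaultdict(int)
in_degree = defaultdict(int)
for sender, receiver in edges:
    out_degree[sender] += 1
    in_degree[receiver] += 1

for (sender, receiver), total in sorted(edges.items()):
    print(f"{sender} -> {receiver}: {total:.1f}")
print(dict(out_degree), dict(in_degree))
```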


Business Background

•  Airtel Uganda, a leading mobile network operator, wants to drive the adoption of its mobile money product (Airtel Money).
•  Cignifi partnered with the International Finance Corporation (IFC) to provide big data analytics for customer profiling using Call Detail Records (CDR).
•  The Bill & Melinda Gates Foundation provided funding.

Goals

1.  Identify active mobile money users & understand associated characteristics that are tied to GSM profiles.
2.  Understand mobile money flow dynamics through social network analysis and geo-locational mapping.

The Technical Approach

Understand Characteristics of Active Mobile Money Users
1.  Target Variable Definition
2.  CDR and Mobile Money Data Cleaning & Processing
3.  Location & Operational System Data Processing
4.  Predictive Modeling with GLM & Data Mining Methodology
5.  Lead Generation & Results Profiling

✚

Social Network Analysis & Geo-Locational Mapping (GLM)
1.  Study Scope Definition
2.  Social Network Data Processing
3.  Geo-location Mapping & Clustering

=  A thorough solution with an understanding of individual user behavior and the operator's operational mechanics.

NOTE: ALL DATA HAS BEEN MASKED.

Data Summary

Call Detail Records (CDR)
•  Voice Calls, SMS, Internet: Counts, Duration, Consistency, Number of access, Time of day, Geo-location
•  Account Information: Account age, Billing location, Account vintage, Payment delays, Payment method
•  Recharge Data: Timestamp, Recharge amount, Source, Balance
•  Other: Counterparts (social network), International, In/off net

Target Variable
•  Payment default
•  Offer acceptance
•  Contact rate
•  Product activation
•  …and more.

Processing steps shown: Validation, Cleaning, Normalization, Aggregation

Mobile Money Model Results

30 Day Activation for Cash-In Model
[Figure: Density Distribution for Predicted Probability of Activation (y-axis: Density).]
[Figure: Variable Importance for Cash-In Model.]

Segmentation By Main Variables

Cash-In Model
[Figure: four panels plotting the cash-in 30-day active rate (y-axis: Activity Index) against x-axis bands from 10% to 100% for:]
•  Total duration of outgoing voice calls between 7pm and 8am
•  Total call revenue
•  Sum of recharge during 6pm to midnight
•  Voice duration entropy

The more that customers use their mobile phones, the more likely they are to cash in money.
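
The panels above report an active rate per percentage band of a usage variable. A minimal sketch of that kind of table follows, assuming ten equal-sized bands ranked by the variable (the deck does not state how its bands were formed). The data are synthetic, and the function name `rate_by_decile` is made up for this example.

```python
import random

def rate_by_decile(values, outcomes, bands=10):
    """Rank customers by `values`, split them into equal-sized bands,
    and return the mean of the 0/1 `outcomes` in each band.
    Assumes there are at least `bands` observations."""
    ranked = sorted(zip(values, outcomes), key=lambda pair: pair[0])
    n = len(ranked)
    rates = []
    for b in range(bands):
        chunk = ranked[b * n // bands:(b + 1) * n // bands]
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return rates

# Synthetic customers: activation probability rises with usage (an assumption).
random.seed(0)
usage = [random.uniform(0, 100) for _ in range(5000)]
active = [1 if random.random() < 0.5 + 0.003 * u else 0 for u in usage]

rates = rate_by_decile(usage, active)
for band, rate in enumerate(rates, start=1):
    print(f"{band * 10:>3}%  {rate:.3f}")
```

Plotting `rates` against the band labels would give a panel of the same shape as those on the slide; an upward slope corresponds to the takeaway that heavier usage goes with a higher active rate.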


Mobile Money Model Results

30 Day Activation for Cash-Out Model
[Figure: Density Distribution for Predicted Probability of Activation (y-axis: Density).]
[Figure: Variable Importance for Cash-Out Model.]

Segmentation By Main Variables

Cash-Out Model
[Figure: four panels plotting the cash-out 30-day active rate (y-axis: Activity Index) against x-axis bands from 10% to 100% for:]
•  Total duration of outgoing voice calls between 7pm and 8am
•  Total call revenue
•  Sum of recharge during 6pm to midnight
•  Voice duration entropy

The more that customers use their mobile phones, the more likely they are to cash out money.
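
"Voice duration entropy" is not defined in the deck. One plausible reading, assumed here, is the Shannon entropy of how a subscriber's total call time is spread across buckets such as time slots or counterparts: low when calling is concentrated, high when it is spread evenly. The sketch below computes that quantity; the bucketing and the example figures are illustrative only.

```python
import math

def shannon_entropy(durations):
    """Shannon entropy (in bits) of the share of total duration in each bucket."""
    total = sum(durations)
    if total <= 0:
        return 0.0
    shares = [d / total for d in durations if d > 0]
    return -sum(p * math.log2(p) for p in shares)

# Hypothetical seconds of voice calls in four time-of-day buckets.
concentrated = [600, 0, 0, 0]      # all calling falls in one bucket
spread_out = [150, 150, 150, 150]  # calling is spread evenly

print(shannon_entropy(concentrated))
print(shannon_entropy(spread_out))
```

With four buckets the value ranges from 0 bits (all activity in one bucket) to 2 bits (perfectly even).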


Cluster Analysis

[Map figures: Money Sent Out By Source Number; Money Received.]

Customers in the Kampala, Masindi, and Gulu clusters sent more money than they received. This is corroborated by the fact that a majority of customers live in these areas.

Cluster Analysis

Cluster Centers & Transaction Amounts
[Map figure: City Clusters for P2P Transactions, with clusters labeled A through I.]

In order to measure money transfer clearly, Cignifi created nine geo-location clusters and analyzed the flow between them.

Cluster Analysis

Aggregated Counts for P2P Money Received Transactions

Cluster   Within Cluster   Outgoing   Incoming   Net (Out-In)
A         0.0881           0.1770     0.1999     -0.0230
B         0.0569           0.0619     0.0971     -0.0353
C         0.0630           0.0511     0.0746     -0.0234
D         0.9603           0.4001     0.3279      0.0723
E         0.0169           0.0283     0.0370     -0.0087
F         0.0217           0.0294     0.0424     -0.0130
G         0.0297           0.0594     0.0620     -0.0024
H         0.0059           0.0220     0.0177      0.0043
I         0.0137           0.0539     0.0246      0.0293

Transaction Matrix Between Clusters

     A       B       C       D       E       F       G       H       I
A    0.0881  0.0194  0.0067  0.1270  0.0037  0.0063  0.0097  0.0017  0.0024
B    0.0133  0.0569  0.0016  0.0371  0.0006  0.0009  0.0064  0.0009  0.0010
C    0.0063  0.0020  0.0630  0.0379  0.0014  0.0019  0.0009  0.0001  0.0006
D    0.1491  0.0601  0.0613  0.9603  0.0283  0.0310  0.0401  0.0116  0.0186
E    0.0041  0.0013  0.0013  0.0203  0.0169  0.0004  0.0004  0.0003  0.0003
F    0.0044  0.0014  0.0009  0.0197  0.0006  0.0217  0.0006  0.0013  0.0004
G    0.0100  0.0080  0.0011  0.0374  0.0007  0.0006  0.0297  0.0004  0.0010
H    0.0031  0.0013  0.0004  0.0146  0.0006  0.0006  0.0009  0.0059  0.0006
I    0.0093  0.0034  0.0013  0.0337  0.0011  0.0009  0.0029  0.0013  0.0137

In order to measure money transfer clearly, Cignifi created nine geo-location clusters and analyzed the flow between them. The cluster centroids are listed in the tables.
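
The summary columns can be recomputed from the transaction matrix. In the masked figures, each row's off-diagonal sum matches the Outgoing column and each column's off-diagonal sum matches the Incoming column (to within rounding), so the sketch below treats rows as senders and columns as receivers. That orientation is inferred from the numbers, not stated in the deck, and the function name `cluster_flows` is made up for this example.

```python
LABELS = "ABCDEFGHI"

# Transaction matrix between clusters, transcribed from the slide (masked values).
MATRIX = [
    [0.0881, 0.0194, 0.0067, 0.1270, 0.0037, 0.0063, 0.0097, 0.0017, 0.0024],
    [0.0133, 0.0569, 0.0016, 0.0371, 0.0006, 0.0009, 0.0064, 0.0009, 0.0010],
    [0.0063, 0.0020, 0.0630, 0.0379, 0.0014, 0.0019, 0.0009, 0.0001, 0.0006],
    [0.1491, 0.0601, 0.0613, 0.9603, 0.0283, 0.0310, 0.0401, 0.0116, 0.0186],
    [0.0041, 0.0013, 0.0013, 0.0203, 0.0169, 0.0004, 0.0004, 0.0003, 0.0003],
    [0.0044, 0.0014, 0.0009, 0.0197, 0.0006, 0.0217, 0.0006, 0.0013, 0.0004],
    [0.0100, 0.0080, 0.0011, 0.0374, 0.0007, 0.0006, 0.0297, 0.0004, 0.0010],
    [0.0031, 0.0013, 0.0004, 0.0146, 0.0006, 0.0006, 0.0009, 0.0059, 0.0006],
    [0.0093, 0.0034, 0.0013, 0.0337, 0.0011, 0.0009, 0.0029, 0.0013, 0.0137],
]

def cluster_flows(matrix, labels):
    """Return within/outgoing/incoming/net per cluster.
    Assumes matrix[i][j] is the flow from cluster i to cluster j."""
    n = len(labels)
    flows = {}
    for i, label in enumerate(labels):
        outgoing = sum(matrix[i][j] for j in range(n) if j != i)
        incoming = sum(matrix[j][i] for j in range(n) if j != i)
        flows[label] = {
            "within": matrix[i][i],
            "outgoing": outgoing,
            "incoming": incoming,
            "net": outgoing - incoming,
        }
    return flows

flows = cluster_flows(MATRIX, LABELS)
net_senders = [c for c in LABELS if flows[c]["net"] > 0]

for c in LABELS:
    f = flows[c]
    print(f"{c}  {f['within']:.4f}  {f['outgoing']:.4f}  "
          f"{f['incoming']:.4f}  {f['net']:+.4f}")
print("Net senders:", net_senders)
```

The recomputed values agree with the summary table to within rounding, and the clusters with positive net outflow are the same three in both.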


Conclusion

•  The more often mobile phone subscribers use their phones, the more likely they are to adopt the Airtel Money program.
   –  This applies to both revenue-generating and non-revenue-generating activities.
•  Social network and geo-locational analysis provide insights about target markets and spatial trends in money transfer.

Nicolais Guevara, Senior Data Scientist

Cambridge, USA | São Paulo | Mexico City | Manila

EARL Conference Deck 10292015.pdf
