IMPROVE DATA QUALITY: A COMMERCIAL BANK CASE STUDY
Presented by Duc Nguyen, 15 June 2017

Contents
1. Data is the new oil
2. Data Governance framework
3. Data Stewardship Management
4. Data Quality Management
5. A case study in a commercial bank
6. Data quality issues to consider
Data is the new oil
• 500 million Tweets sent
• More than 4 million hours of content uploaded to YouTube
• 4.3 billion Facebook messages posted
• 3.6 billion Instagram Likes
• 6 billion Google searches
• 205 billion emails sent
• About 2.5 billion gigabytes of data produced every day
Source: Team Gwava, IBM

What is Data Governance?
Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.*
• Who owns data in your organization?
• Who is responsible for solving day-to-day data issues in the organization?
• Is your data secure? Who will be responsible in case of a security breach?
• How good is the quality of your data? Who has the responsibility for improving it?
• Does everyone in your organization understand the data to mean the same thing?
• Who decides how to capture and structure the data, where to store it, and how long to retain it?
* Datagovernance.com
Data Governance Framework
Data lifecycle: Creation, Conversion, Conservation, Consumption

DG Streams
• Data Ownership & Data Stewardship Management
• Data Quality Management
• Data Security & Privacy Management

Policies and rules by lifecycle stage
• Creation: Data Capture Policy, Reference Data Management, Master Data Management
• Conversion: Data Extraction Rules, Data Transformation Rules, Data Loading Rules
• Conservation: Data Architecture, Metadata Management, Data Retention Policy
• Consumption: Data Distribution Policy, Business Glossary Management, DG Compliance Reporting

People
• Data Council or Data Committee
• Chief Data Officer, Head of Data Governance
• DGO: Data Owners, Data Stewards, Data Ownership & Policies Manager, Data Quality Manager, Data Security Manager, Data Administrators

Data Stewardship Management
Business Units: Finance, Operations, Risk, IT
Roles: Data Owners, Data Stewards, Data Inputters
Data-related issues from each division are raised to the Data Governance Management Team.
• Data stewards from each division to be identified and trained
• Detailed DG workflows for data-related issues to be developed and applied
• KPIs for data stewards to be finalized
• Data-related issues to be tracked in the log file
Data Quality Management (1)
Framework
1. Profile and Validate
2. Prioritize
3. Cleanse and Persist
4. Manage and Govern

Data Quality Management (2)
• Data Quality Policy & Guidance drafted
• Critical Data Elements (CDEs) selected for DQ application
• 6 DQ metrics identified to score data*
• 6-Sigma principle applied**

* Six DQ metrics: Completeness, Validity, Accuracy, Uniqueness, Consistency, Timeliness
** Six Sigma Principle
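
As an illustration of how a DQ metric can be turned into a score, here is a minimal Python sketch covering three of the six metrics (completeness, validity, uniqueness). The sample records, field names, email rule and the band edges in `to_score` are all assumptions made for this example; the deck does not state the bank's actual rules or its Six Sigma thresholds. Accuracy, consistency and timeliness are left out because they need reference data or a second system to compare against.

```python
import re
from datetime import date

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "an@example.com", "birthdate": date(1990, 5, 1)},
    {"id": 2, "email": "", "birthdate": date(1985, 1, 20)},
    {"id": 3, "email": "not-an-email", "birthdate": None},
    {"id": 3, "email": "binh@example.com", "birthdate": date(2001, 7, 9)},
]

# Deliberately simple format rule, assumed for the example.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_populated(value):
    return value not in (None, "")


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(1 for r in rows if is_populated(r.get(field))) / len(rows)


def validity(rows, field, is_valid):
    """Share of populated values that pass a format rule."""
    values = [r[field] for r in rows if is_populated(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)


def to_score(rate, bands=(0.60, 0.80, 0.95, 0.99)):
    """Map a pass rate to a 1-5 score; the band edges are assumed."""
    return 1 + sum(rate >= edge for edge in bands)


email_completeness = completeness(records, "email")
email_validity = validity(records, "email", lambda v: bool(EMAIL_RE.match(v)))
id_uniqueness = uniqueness(records, "id")
print(to_score(email_completeness), to_score(email_validity), to_score(id_uniqueness))
```

Each metric is a pass rate between 0 and 1, so one banding function can convert any of them to the 1-5 scale used for the DQ scores later in the deck.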
Bank’s Vision
To become one of the 5 leading banks and one of the 3 leading retail banks in Vietnam by 2017

Bank’s Data Governance Journey
• Sep 2013: Data Committee established
• Jul 2015: Extended Data Stewardship model to all divisions
• Mar 2016: Measured Critical Data Elements (CDE) quality
Other milestones on the timeline: Data Governance Unit established; Implemented DG project; Applied Data Stewardship model for 4 divisions; Issued DG policies, frameworks; Measured quality of customer information; Improved customer information quality
Other dates on the timeline: Mar 2014, May 2014, Apr 2015, May 2015, Jan 2016, Jun 2016

Overview of data quality
Issue categories: Business issues, System issues, Government issues
• Data issues from Dept. in BU/SU
• Unclear process and guidance to collect and input data
• No schedule for training staff on the importance of data and how to input correct data
• No validation rule in system
• No synchronization between systems
• No system to manage personal information
• No system to manage address
• No policy or guidance on data governance
• No data quality KPI for inputters
• No mechanism to cross-check or audit the data

Apply Data stewardship management (1)
Participants: Data Stewards, DG Team, IT, other related parties, Data Committee
1. Raise data issues or data quality issues.
2. Data Steward: collects, consolidates and analyzes data issues.
3. DG Team: analyzes data issues and breaks them down to data field, table and system level.
4. Meeting with related parties: discuss and agree on the solutions to data issues, including:
   • Solutions
   • Main party in charge
   • Timeline
   • Supporting parties
5. DG Team: escalates to the Data Committee if related parties cannot agree on the solutions.
6. Data Committee: discusses and agrees on the solutions and instructions.
7. Execute solutions to fix data issues.
Follow-up:
• DG to update Data Stewards
• DG to work with related parties
• DG to update the data issue tracking log
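
The tracking log in this workflow could be modeled along the following lines. This is a sketch only: the `DataIssue` fields and the `status` function are hypothetical names, not the bank's actual log format. The status labels follow the definitions in the deck's remediation overview (Under Discussion, Pending, On-Track, Finished on time, Missed Deadline). Treating a resolved issue that never had a deadline as "Finished on time" is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataIssue:
    """One row of a hypothetical data issue tracking log."""
    title: str
    meetings_held: int
    deadline: Optional[date] = None      # agreed deadline; None until a solution is agreed
    resolved_on: Optional[date] = None   # None while the issue is still open


def status(issue: DataIssue, today: date) -> str:
    """Classify an issue using the deck's five status labels."""
    if issue.resolved_on is not None:
        # Resolved: compare the resolution date with the agreed deadline.
        # Assumption: no recorded deadline counts as on time.
        if issue.deadline is None or issue.resolved_on <= issue.deadline:
            return "Finished on time"
        return "Missed Deadline"
    if issue.deadline is None:
        # No agreed solution yet: 1 or 2 meetings is still under discussion,
        # 3 or more meetings without a solution is pending.
        return "Under Discussion" if issue.meetings_held <= 2 else "Pending"
    # Solution agreed and still open: on track until the deadline passes.
    return "On-Track" if today <= issue.deadline else "Pending"


today = date(2016, 10, 6)
log = [
    DataIssue("Customer segment mismatch", meetings_held=1),
    DataIssue("Loan term rule", meetings_held=2, deadline=date(2016, 10, 20)),
    DataIssue("Revenue field capture", meetings_held=2,
              deadline=date(2016, 9, 30), resolved_on=date(2016, 9, 28)),
]
for issue in log:
    print(issue.title, "->", status(issue, today))
```

Deriving the status from meeting counts and dates, rather than storing it by hand, keeps the log consistent when the DG team updates it after each meeting.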
Apply Data stewardship management (2)
Overview of data issue remediation
[Chart: Finished issues vs. ongoing issues (No of issues, %)]
[Chart: Data issues by division (Retail, Finance, Risk, SME, Ops)]
[Chart: Data issues by status (Under Discussion, On-Track, Finished on time, Pending and Missed Deadline; Pending and Missed Deadline are bucketed as <= 1W, 1W-2Ws, 2Ws-3Ws, > 3Ws)]
Status definitions:
- Under Discussion: no solution after 1 or 2 meetings
- Pending: processing but missed deadline, or no solution after 3 meetings
- On-Track: being resolved following agreed deadlines
- Finished on time: resolved on time
- Missed Deadline: resolved but missed deadline

Data Governance – Data quality
Apply Data quality management (1)
DQ scores. No of CDEs: 147. Sample size (No of customers / accounts / contracts …) as of 2016-10-06.
Overall DQ scores shown on the slide: 3.61, 3.72, 3.51

CDE | Table | Sample size | DQ score as of 2015-12-31 (123 CDEs) | DQ score 2016-01-01 to 2016-10-06 (147 CDEs) | DQ score as of 2016-10-06 (147 CDEs) | CDE category
CUS_NAME | VPB_CUSTOMER | 2,290,306 | 5 | 5 | 5 | Customer
VPB_GENDER | VPB_CUSTOMER | 2,204,104 | 3 | 3 | 5 | Customer
MARITAL_STAT | VPB_CUSTOMER | 2,204,104 | 1 | 1 | 1 | Customer
COUNTRY | VPB_CUSTOMER | 2,290,306 | 1 | 1 | 1 | Customer
EDUCATION | VPB_CUSTOMER | 2,204,104 | 1 | 1 | 1 | Customer
VPB_JOB_TITLE | VPB_CUSTOMER | 2,204,104 | 1 | 1 | 1 | Customer
NAME_OF_OFFICE | VPB_CUSTOMER | 2,204,104 | 1 | 1 | 1 | Customer
ANNUAL_INCOME | VPB_CUSTOMER | 2,204,104 | 1 | 1 | 1 | Customer
COMPANY_BOOK | VPB_CUSTOMER | 2,290,306 | 5 | 5 | 5 | Customer
PB_DAO | VPB_CUSTOMER | 2,204,104 | 1 | 1 | 1 | Customer
EMAIL_ADDR | VPB_CUSTOMER | 2,290,306 | 1 | 1 | 1 | Customer
ADDRESS | VPB_CUSTOMER | 2,290,306 | 5 | 5 | 5 | Customer
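
To show how per-CDE scores can be rolled up, the sketch below averages the twelve VPB_CUSTOMER sample rows and flags the CDEs whose score rose. It rests on two assumptions: the three score values on each row are read, in order, as the score as of 2015-12-31, the score for data from 2016-01-01 to 2016-10-06, and the score as of 2016-10-06; and the roll-up is a plain unweighted mean. The deck does not say how the bank aggregates CDE scores, so the means computed here are not expected to reproduce the overall figures on the slides.

```python
from statistics import mean

# (CDE, score as of 2015-12-31, score for 2016-01-01..2016-10-06, score as of 2016-10-06)
# Column meaning is an assumption based on the order of the slide's headers.
rows = [
    ("CUS_NAME", 5, 5, 5),
    ("VPB_GENDER", 3, 3, 5),
    ("MARITAL_STAT", 1, 1, 1),
    ("COUNTRY", 1, 1, 1),
    ("EDUCATION", 1, 1, 1),
    ("VPB_JOB_TITLE", 1, 1, 1),
    ("NAME_OF_OFFICE", 1, 1, 1),
    ("ANNUAL_INCOME", 1, 1, 1),
    ("COMPANY_BOOK", 5, 5, 5),
    ("PB_DAO", 1, 1, 1),
    ("EMAIL_ADDR", 1, 1, 1),
    ("ADDRESS", 5, 5, 5),
]


def column_mean(data, idx):
    """Unweighted mean of one score column (assumed roll-up method)."""
    return mean(r[idx] for r in data)


def improved(data):
    """CDEs whose latest score is higher than the first score."""
    return [r[0] for r in data if r[3] > r[1]]


print(f"mean of first score column: {column_mean(rows, 1):.2f}")
print(f"mean of last score column:  {column_mean(rows, 3):.2f}")
print("improved CDEs:", improved(rows))
```

A mean weighted by sample size would be another reasonable choice, since the rows cover either 2,204,104 or 2,290,306 records.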
Data Governance – Data quality
Apply Data quality management (2)
[Chart: DQ score as of Oct 06, 2016 (147 Prioritized CDEs)]
[Chart: DQ score of new data (Jan 1 - Oct 06, 2016) and all data (123 Prioritized CDEs)]
[Chart: DQ score as of Dec 31, 2015 vs. as of Oct 06, 2016 (123 CDEs)]
[Chart: DQ score as of Dec 31, 2015 vs. as of Oct 06, 2016 by CDE set (Customer, Limit, Collateral, Card, Transactions, Loans & Deposits, Other)]
[Chart: Top improved CDEs, DQ score as of Dec 31, 2015 vs. as of Oct 06, 2016 (Customer Segment, Birthdate/Incorp Date, Revenue, DAO code, Loan Term, Overdue day, Loan Restructure)]

Actions taken to improve DQ
- Created BRD to make key customer data fields mandatory in the T24 system (e.g. Legal ID, phone number, birthdate/incorp date, Tax code of SME customers, …)
- Worked with Finance to clean customer sector and segment data
- Updated the data capture rule for the SME customer revenue data field in T24 and cleansed historical data
- Created BRD to fix the T24 rule for calculating loan term
- Created BRD to fix the T24 rule for calculating overdue days

Data quality issues to consider
• Some data quality issues cannot be fixed by the bank alone
• The bank needs to focus more on training people
• Guidance from SBV is needed
THANK YOU!