The Inside Report “Choosing a Data Catalog: Guidance for a Critical Decision” April 4, 2018
Eric Kavanagh
CEO, Inside Analysis
Andy Sheldon
VP Marketing, Unifi
Dave Wells
Educator, infocentric
Dave Wells
[email protected]
What Is a Data Catalog?
A Collection of Metadata
• data about datasets
• data about processes
• data for search & discovery
• data for governance
• data about people & usage
+
A Set of Capabilities
• Discovery
• Curation
• Governance
• Preparation
• Collaboration
Why a Critical Decision?
• An essential component of modern data management
• The new standard for metadata management
• Fundamental enabler of self-service data analysis
• The “go to” technology for data curation
• An imperative for modern data governance
The Stakeholders
• Line-of-Business and Self-Service Analysts
• Data Scientists
• Data Engineers
• Data Curators and Data Stewards
• Data Governance Organizations
Data Analysts and the Data Catalog
The Data Catalog Marketplace
Evaluating Data Catalog Tools
EASE OF USE
• DISCOVERY … cataloging, searching, evaluation, recommendations
• INSIGHTS … access, usage, socialization, interoperability
• GOVERNANCE … security, lineage, compliance, quality, and more
• ENVIRONMENT … deployment, services, pricing, roadmap
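One way to compare candidate tools across the four criteria groups above is a weighted scoring matrix. The sketch below is illustrative only: the weights, the 1-5 scale, the tool names, and the scores are hypothetical placeholders, not values from this presentation.

```python
# Hypothetical weights (percent) for the four criteria groups named on the slide.
WEIGHTS = {"discovery": 30, "insights": 25, "governance": 30, "environment": 15}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted average of per-category scores (e.g. on a 1-5 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[cat] * w for cat, w in weights.items()) / total_weight


def rank_tools(tool_scores, weights=WEIGHTS):
    """Return (tool, score) pairs sorted from highest to lowest weighted score."""
    scored = [(name, round(weighted_score(s, weights), 2))
              for name, s in tool_scores.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical evaluation results for two made-up tools.
candidates = {
    "Tool A": {"discovery": 4, "insights": 3, "governance": 5, "environment": 2},
    "Tool B": {"discovery": 3, "insights": 4, "governance": 3, "environment": 5},
}
ranking = rank_tools(candidates)
```

The weights should reflect the priorities of the stakeholders doing the evaluation; changing them can change the ranking.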
Discovery Criteria
DATASET DISCOVERY
Cataloging Datasets
• automated discovery, machine learning, semantic inference, automated tagging
• all types of datasets? reports? dashboards? scorecards?
Cataloging Data Operations
• individual data transformations and workflow sequences
• mandatory operations such as masking of PII
Searching
• natural language search, facets, keywords, business terms
• sort by relevance, hide datasets the user is not authorized to access
Recommendations
• based on usage history and machine learning
• recommend related datasets and data operations
Dataset Evaluation
• previews and profiles
• ratings, reviews, and annotations
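The searching criteria above (keywords, facets, relevance sorting, hiding unauthorized datasets) can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular catalog product works: `CatalogEntry`, `search`, the group-based authorization model, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: name, description, tags, and allowed groups."""
    name: str
    description: str
    tags: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)


def search(entries, query, user_groups, required_tags=frozenset()):
    """Keyword search with tag facets; unauthorized datasets are never returned."""
    terms = query.lower().split()
    hits = []
    for entry in entries:
        if not (entry.allowed_groups & set(user_groups)):
            continue  # hide datasets the user is not authorized to access
        if not set(required_tags) <= entry.tags:
            continue  # facet filter: every required tag must be present
        text = f"{entry.name} {entry.description} {' '.join(sorted(entry.tags))}".lower()
        score = sum(text.count(term) for term in terms)  # naive relevance score
        if score > 0:
            hits.append((score, entry.name))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # most relevant first
    return [name for _, name in hits]


entries = [
    CatalogEntry("customer_orders", "Daily customer orders by region",
                 {"sales", "curated"}, {"analysts", "finance"}),
    CatalogEntry("customer_pii", "Customer contact details",
                 {"pii"}, {"stewards"}),
    CatalogEntry("web_clicks", "Raw clickstream events per customer session",
                 {"raw"}, {"analysts"}),
]
results = search(entries, "customer", {"analysts"})
```

A real catalog would typically use a search index and the organization's existing access controls rather than in-memory filtering.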
Insights Criteria
DATA DRIVEN INSIGHTS
Data Access
• access directly from catalog, seamless discovery and acquisition
• security and privacy sensitive data protections
Data Usage
• who uses? what use cases? frequency of use? user experience?
• combined with what datasets? which data operations?
Socialization
• crowdsourcing, collaboration, community
• ratings, reviews, feedback
Integration and Interoperability
• seamless experience across the analytic lifecycle
• working together with data preparation and analysis tools
• working together with security and governance controls
Governance Criteria
DATA GOVERNANCE SUPPORT
Metadata Catalog
• richness and completeness of metadata
• datasets, processes, people, and usage metadata
Security
• work with existing security infrastructure and processes?
• what levels of access controls: record, row, field, values?
Lineage
• full traceability from original source to analysis & reporting
Compliance
• PII and privacy sensitive data, obfuscation and masking
Quality
• expose data conflicts, identify data deficiencies
• annotate and show curator & user comments on quality
Curation
• features to add and remove datasets, annotate, tag, create and change metadata
Valuation
• help to quantify value of data assets, known use cases, frequency of use, etc.
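The lineage criterion, traceability from original source to analysis and reporting, amounts to walking a dependency graph upstream. A minimal sketch, assuming lineage metadata is available as a mapping from each dataset to its direct upstream datasets; the function name and the dataset names are hypothetical.

```python
def trace_to_sources(lineage, dataset):
    """Return the original sources (datasets with no upstream) that feed `dataset`."""
    sources, seen, stack = set(), set(), [dataset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles and repeated visits
        seen.add(node)
        parents = lineage.get(node, [])
        if parents:
            stack.extend(parents)  # keep walking upstream
        else:
            sources.add(node)  # nothing upstream: an original source
    return sources


# Hypothetical lineage: each key is derived from the datasets in its list.
lineage = {
    "sales_dashboard": ["sales_summary"],
    "sales_summary": ["orders_clean", "customers_masked"],
    "orders_clean": ["orders_raw"],
    "customers_masked": ["crm_export"],
}
origins = trace_to_sources(lineage, "sales_dashboard")
```

When evaluating a tool, the question is whether it captures this graph automatically and at what granularity (dataset, column, or operation).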
Environment Criteria
ENVIRONMENT COMPATIBILITY
Deployment
• on premises? cloud? multi-cloud? hybrid?
• mobile and geographically dispersed users
Services
• training and consulting
• forums and user groups
Pricing
• pricing model: number of users? number of datasets? volume of data?
• initial costs, ongoing costs, estimating TCO
Vendor Roadmap
• future plans and timelines
• more integration and interoperability? new partnerships?
• new data connectors?
• advanced data governance features?
• increase collaboration and socialization?
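For the pricing criterion, a rough TCO estimate adds initial costs to ongoing costs over the planning horizon. The sketch below assumes a simple model (one-time cost plus an annual cost that may grow at a fixed rate); the function, its parameters, and the figures are hypothetical placeholders, not vendor pricing.

```python
def estimate_tco(initial_cost, annual_cost, years, annual_growth=0.0):
    """Estimate total cost of ownership over `years`.

    initial_cost:  one-time costs (licenses, setup, training)
    annual_cost:   ongoing cost in the first year (subscription, support)
    annual_growth: assumed yearly growth rate of the ongoing cost, e.g. 0.1
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    total = float(initial_cost)
    for year in range(years):
        total += annual_cost * (1 + annual_growth) ** year
    return total


# Hypothetical figures for a three-year horizon.
flat = estimate_tco(50_000, 20_000, 3)
growing = estimate_tco(50_000, 20_000, 3, annual_growth=0.1)
```

If the vendor prices by number of users, datasets, or data volume, the annual cost itself should be projected from expected growth in that unit.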
Choose the Right Data Catalog
Managing data without a data catalog is ill-advised and impractical!
• An essential component of modern data management
• The new standard for metadata management
• Fundamental enabler of self-service data analysis
• The “go to” technology for data curation
• An imperative for modern data governance
Choosing a Data Catalog: Guidance for a Critical Decision Andy Sheldon, Vice President of Marketing Russell Christopher, Director of Customer Solutions
April 4, 2018
unifisoftware.com