Discovering Blind Spots of Predictive Models: Representations and Policies for Guided Exploration Himabindu Lakkaraju, Stanford University [email protected]

Exciting Times

ML Applied to Critical Domains

Biases in ML

[Lakkaraju, Caruana, Horvitz; AAAI 2017]

Outline    

Blind Spots: Overview
Problem Formulation
Our Approach
Experimental Results

Focus: Detection of unknown unknowns 





- Unknown unknowns: instances with highly confident but incorrect predictions
- Blind spots: feature subspaces with a high concentration of unknown unknowns
- Unknown unknowns and blind spots arise for a variety of reasons, e.g., a mismatch between training and execution data.

Common Assumption in ML

[Figure: model M learns cats vs. dogs; the training data reflects the real-world concepts]

Biases in Training Data

[Figure: training data misses part of the real-world concept space, so model M outputs a wrong label with high confidence, e.g., "cat" with conf = 0.96]

Discovery of Unknown Unknowns in the Wild

Goal: Discover unknown unknowns when:
- The predictive model is a black box
- There is no access to the training data

Exploration space: execution data

Assumptions:
- Unknown unknowns do not occur at random (Attenberg et al., 2015)
- There exist features in the data that can characterize unknown unknowns (no free lunch theorem)

Inputs

- A set of N instances x_1, x_2, …, x_N which were confidently assigned to a class of interest c by the black-box predictive model M, and the corresponding confidence scores s_1, s_2, …, s_N
- An oracle o which takes as input a data point and returns its true label as well as the cost incurred to determine the true label
- A budget B on the number of times the oracle can be queried
- The threshold for 'confidence' and the class of interest are chosen by the user

Problem Definition

- D = {x_1, …, x_N}: the set of high-confidence instances from model M
- Utility function u(x) = 1{o(x) ≠ c} − γ · cost(x): reward for finding an unknown unknown, net of the cost of labeling it
- Problem statement: find S ⊆ D with |S| ≤ B s.t. Σ_{x ∈ S} u(x) is maximized.

Problem Definition

Key questions:
- How to search the data space?
- How to guide future discoveries using oracle feedback?
- How to trade off exploration with exploitation?
- How to interpret regions of unknown unknowns?
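As a concrete illustration, the objective above can be sketched in a few lines of Python. The utility form u(x) = 1{o(x) ≠ c} − γ·cost(x), the oracle interface, and the toy instances below are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch of the problem objective (names are illustrative).
# The "oracle" returns an instance's true label and its labeling cost.

def utility(x, oracle, predicted_class, gamma=0.2):
    """u(x) = 1{true label != predicted class} - gamma * cost."""
    true_label, cost = oracle(x)
    return (1.0 if true_label != predicted_class else 0.0) - gamma * cost

def total_utility(queried, oracle, predicted_class, gamma=0.2):
    """Total utility of a set of oracle queries (to maximize under budget B)."""
    return sum(utility(x, oracle, predicted_class, gamma) for x in queried)

# Toy example: instances 0..4 all predicted "cat"; 2 and 4 are really dogs,
# i.e., they are unknown unknowns.
truth = {0: "cat", 1: "cat", 2: "dog", 3: "cat", 4: "dog"}
oracle = lambda x: (truth[x], 1.0)   # unit labeling cost

print(total_utility([2, 4], oracle, "cat"))   # 2 * (1 - 0.2), up to rounding
```

Querying the two unknown unknowns yields the highest utility any budget-2 query set can achieve here; querying correctly classified instances only incurs cost.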

Our Framework

Input: execution data points with high confidence

- Step 1: Descriptive Space Partitioning
- Step 2: Multi-armed bandits for unknown unknowns

[Figure: example partitions — white dogs, white cats, brown cats, brown dogs]

Descriptive Space Partitioning 



- Partition the instances such that those with similar feature values and confidence scores are grouped together.
- Each group must be associated with a descriptive pattern highlighting the characteristics of the instances in the group.

Descriptive Space Partitioning 



- Obtain candidate patterns using frequent itemset mining algorithms (e.g., Apriori)
- Choose a set of patterns to 'group' the instances in the set (objective on the next slide)

Descriptive Space Partitioning

Input: candidate pattern set P
Objective: find a subset of patterns that covers all instances with minimum total weight

- Reduction to weighted set cover → NP-hard
- Approximation with a greedy algorithm which at each step picks the pattern with the maximum coverage-to-weight ratio
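The greedy approximation can be sketched as follows; the pattern names and weights are illustrative stand-ins, not the paper's actual cost function.

```python
# Greedy weighted set cover: repeatedly pick the pattern whose
# coverage-to-weight ratio over the still-uncovered instances is largest.

def greedy_set_cover(universe, patterns, weights):
    """patterns: dict name -> set of covered instances;
    weights: dict name -> positive cost. Returns chosen pattern names."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(
            (p for p in patterns if patterns[p] & uncovered),
            key=lambda p: len(patterns[p] & uncovered) / weights[p],
            default=None,
        )
        if best is None:          # remaining instances are uncoverable
            break
        chosen.append(best)
        uncovered -= patterns[best]
    return chosen

universe = range(6)
patterns = {"white-dog": {0, 1, 2}, "white-cat": {2, 3}, "brown": {3, 4, 5}}
weights = {"white-dog": 1.0, "white-cat": 1.0, "brown": 1.5}
print(greedy_set_cover(universe, patterns, weights))  # -> ['white-dog', 'brown']
```

The greedy choice gives the classic logarithmic approximation guarantee for weighted set cover, which is why it is a reasonable surrogate for the NP-hard objective.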

Bandits for Unknown Unknowns

- Each partition from Step 1 → an arm
- Pulling an arm → sampling a point from that partition without replacement
- Various stationary and non-stationary bandit algorithms:
  - UCB1
  - Discounted UCB, Sliding-window UCB
  - UUB (ours)

Multi-Armed Bandit Algorithms

UCB1:
- Mean reward x̄_i(t): average reward obtained by pulling arm i till time t
- Upper confidence bound: x̄_i(t) + √(2 log t / n_i(t)), where n_i(t) is the number of times arm i has been pulled
- Regret = O(log T)
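A minimal UCB1 sketch in this setting, where pulling an arm means sampling a point from a partition and the reward is 1 if it turns out to be an unknown unknown. The simulated reward rates are toy assumptions.

```python
import math
import random

# Minimal UCB1 (Auer et al., 2002). An "arm" is a partition; a "reward"
# is 1 if the sampled point turns out to be an unknown unknown.

def ucb1(pull, n_arms, horizon, seed=0):
    random.seed(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:                      # pull each arm once first
            arm = t - 1
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

# Toy blind-spot simulation: arm 1 yields unknown unknowns 60% of the time,
# the others almost never; UCB1 concentrates its pulls on arm 1.
rates = [0.05, 0.6, 0.05]
counts = ucb1(lambda i: 1.0 if random.random() < rates[i] else 0.0, 3, 500)
print(counts)   # arm 1 receives the large majority of pulls
```

With stationary reward rates this is sufficient; the non-stationary variants on the next slides matter here because sampling without replacement depletes a partition's unknown unknowns over time.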

Sliding-Window UCB:
- Mean reward x̄_i(t, τ): average reward obtained by pulling arm i over the past τ plays
- Upper confidence bound: same as UCB1, except x̄_i and n_i are computed over the past τ plays
- Regret = O(√(T log T))

Multi-Armed Bandit Algorithms

Discounted UCB:
- Mean reward x̄_i(t, γ): average discounted reward obtained by pulling arm i till time t; the reward of a pull at time t′ is weighted by γ^(t − t′)
- Upper confidence bound: similar to UCB1, except that when computing x̄_i and n_i, the pull at time t′ is weighted by γ^(t − t′)
- Regret = O(√(T log T))

Multi-Armed Bandit Algorithms

Our algorithm – UUB:
- No need to set a discounting factor
- Mean reward: average discounted reward obtained by pulling arm i till time t; the reward of a pull at time t′ is weighted by an adaptively computed ratio rather than a fixed γ^(t − t′)
- Upper confidence bound: similar to UCB1, except that when computing x̄_i and n_i, the pull at time t′ is weighted by the same ratio
- Regret = O(√(T log T))
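UUB's adaptive weighting is not reproduced here, so the sketch below shows the fixed-discount variant (Discounted UCB) that it builds on; the discount γ, the toy reward rates, and the mid-run rate swap are all assumptions for illustration.

```python
import math
import random

# Sketch of Discounted UCB: rewards from time t' are weighted by
# gamma**(t - t'), so old observations fade. UUB replaces the fixed
# discount gamma with an adaptively computed weight (not shown here).

def discounted_ucb(pull, n_arms, horizon, gamma=0.95, seed=0):
    random.seed(seed)
    disc_sums = [0.0] * n_arms    # discounted reward sums
    disc_counts = [0.0] * n_arms  # discounted pull counts
    pulls = [0] * n_arms
    for t in range(1, horizon + 1):
        # multiplying by gamma each step yields weights gamma**(t - t')
        for i in range(n_arms):
            disc_sums[i] *= gamma
            disc_counts[i] *= gamma
        if t <= n_arms:                       # pull each arm once first
            arm = t - 1
        else:
            total = sum(disc_counts)
            arm = max(
                range(n_arms),
                key=lambda i: disc_sums[i] / disc_counts[i]
                + math.sqrt(2 * math.log(total) / disc_counts[i]),
            )
        disc_sums[arm] += pull(arm)
        disc_counts[arm] += 1.0
        pulls[arm] += 1
    return pulls

# Non-stationary toy: arm 0's unknown-unknown rate collapses halfway
# through (e.g., its partition gets exhausted) while arm 1's rises;
# the discounted policy adapts to the change.
t_now = [0]
def pull(i):
    t_now[0] += 1
    first_half = t_now[0] < 300
    rate = (0.6 if first_half else 0.05) if i == 0 else (0.05 if first_half else 0.6)
    return 1.0 if random.random() < rate else 0.0

print(discounted_ucb(pull, 2, 600, gamma=0.95))
```

Because past observations decay, the policy shifts its pulls toward arm 1 after the change point, which plain UCB1 would do only very slowly.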

Experiments 

Sentiment Snippets 



Subjectivity dataset from Rotten Tomatoes 



Bias: Missing subspaces of data

Amazon Reviews

- Bias: missing subspaces of data
- Bias: domain adaptation; train on electronics reviews and deploy on book reviews

Image Data 

Bias: missing subspaces of data; training data comprises black dogs and non-black cats

Evaluation: Image Data

[Figure: exploration over cats vs. dogs; overall blind spots are non-black dogs and black cats, with panels highlighting black cats and white dogs / white cats]


Exploration resources spent heavily on blind spots

Evaluating DSP

Lower entropy → better separation of unknown unknowns

Evaluating Bandits

Lower regret → more effective discovery of unknown unknowns

Comparison with Alternative Methods

From unknown unknowns to blind spots 

- Interactively discovering blind spots: the system designer can interactively decrease (or increase) the reward for an arm
- Incentivizing diversity: the reward for discovering similar unknown unknowns decreases with each additional discovery

Our framework is generic enough to adapt to either of these extensions
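The diversity extension can be sketched as a reward that decays with the number of similar discoveries already made; the similarity "bucket", base reward, and decay rate below are illustrative assumptions.

```python
# Sketch of the "incentivizing diversity" extension: the reward for an
# unknown unknown decays with how many similar ones were already found.

def diversity_reward(discoveries_so_far, new_discovery, base=1.0, decay=0.5):
    """Reward base * decay**k, where k = prior discoveries in same bucket."""
    bucket = new_discovery["bucket"]
    k = sum(1 for d in discoveries_so_far if d["bucket"] == bucket)
    return base * decay ** k

found = []
for item in [{"bucket": "white-dog"}, {"bucket": "white-dog"},
             {"bucket": "black-cat"}, {"bucket": "white-dog"}]:
    r = diversity_reward(found, item)
    found.append(item)
    print(item["bucket"], r)
# white-dog 1.0, white-dog 0.5, black-cat 1.0, white-dog 0.25
```

Plugging such a reward into the bandit loop pushes exploration toward arms whose blind spots have not yet been characterized, rather than repeatedly confirming one known blind spot.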

Questions

[email protected]

Example

Discovering Blind Spots of Predictive Models

Problem Definition. M. Set of high confidence instances . Utility function: Problem statement: Find. s.t. is maximized. How to search the data space? How to guide ...

6MB Sizes 0 Downloads 146 Views

Recommend Documents

Discovering Unknown Unknowns of Predictive ... - Stanford University
Unknown unknowns primarily occur when the data used for training a ... To the best of our knowledge, this is the first work providing an ..... Journal of computer.

Predictive Models of Cultural Transmission
Thus, only those ideas that are best fit for a mind are remembered and ... The key idea behind agent-based social simulation (ABSS) is to design simple bottom- ... goal directed agents that plan sequences of actions to achieve their goals. ... caused

FREE [P.D.F] Big Money Thinks Small: Biases, Blind Spots, and ...
Online PDF Big Money Thinks Small: Biases, Blind Spots, and Smarter Investing (Columbia Business School Publishing), Read PDF Big Money Thinks Small: ...

Competitive blind spots in an institutional field
Published online 20 November 2008 in Wiley InterScience ..... veterinarians, and the focal swine genetics firm) to have heterogeneous ... the TMT, the greater the degree of heterogeneity ..... key competitive attributes consisting of size, tech-.

Building patient-level predictive models - GitHub
Jul 19, 2017 - 3. 3.2 Preparing the cohort and outcome of interest . .... We will call this the cohort of interest or cohort for short. .... in a way that ensures R does not run out of memory, even when the data are large. We can get some .... plotDe

Points of view and blind spots: ELF and SLA
L1 plays in the acquisition of an L21,2 (see e.g. Odlin 1989). On the other hand ... where such L1 transfer deviates from NS use, the result should be regarded .... have their own linguistic characteristics at each stage of development, from beginner