Explainable Image Understanding using Vision and Reasoning
Somak Aditya
Department of Computer Science, Arizona State University, Tempe, USA
AAAI-17 Doctoral Consortium
Image Understanding Through Text
● Gained huge popularity recently.
● Primarily two tasks are designed:
  ○ Caption Generation
  ○ Visual Question Answering
General Architecture
What is Understanding? (do you understand)
● Ask a student questions about a subject; if the student can answer, then he/she “understands” it.
● UNDERSTANDING here is equivalent to Question-Answering.
Quality of understanding (how much do you understand)
● Increase the difficulty of the questions.
● According to Bloom’s Taxonomy [4], the levels are:
  ○ Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation
Architecture should:
● Explicitly model the connections between Vision, Reasoning and Knowledge. (Modular or not, reasoning and knowledge have to be modeled or learnt by any Image Understanding system. VQA and Captioning models learn some such knowledge implicitly, but not explicitly.)
Example questions at each level of Bloom’s Taxonomy:
i. (Knowledge) List the objects in the image.
ii. (Comprehension) What will the man do next?
iii. (Application) How to cut tofu?
iv. (Analysis) Why is the man holding the bowl with his other hand?
v. (Synthesis) Can you propose how else to cut tofu?
vi. (Evaluation) Is there a better way to cut tofu?
Explainability
Got the results:
● Why did you do that?
● Why not something else?
● When do you succeed?
● When do you fail?
● When can I trust you?
● How do I correct an error?
How do you explain:
● (Customer) Natural language, simple.
● (Manager) Structured, detailed.
Difficulty in Current Architectures
Difficulties of Explainability in End-to-End Learning:
● What to fix?
  ○ module/parameter/function
● Some work on understanding learnt models: [3]
● Can we explain in natural language space or symbolic space?
● Are structured explanations possible?
● Largely unexplored.
What can we do?
● [Impose a Structure] Use Knowledge and Probabilistic Reasoning to replace final layers. (Current work)
● [Explanation Interface] Use Knowledge and Probabilistic Reasoning to explain. (Planned future work)
DeepIU: An Architecture
What Do We Try?
● [Examples] Visual Commonsense for Scene Understanding using Perception, Semantic Parsing and Reasoning
● [Image Captioning] Image Understanding Through Scene Description Graphs
● [Architecture] DeepIU: An Architecture for Image Understanding
● [Puzzles/New Challenge] Image Riddles using Vision and Reasoning
[Figure panels: An Example; The Implementation Used in SDG; Another Example]
Type of knowledge used:
• A Knowledge Graph: combined semantic parses of sentences.
• A Bayesian Network: dependencies among objects and scene constituents.
How we stored the knowledge:
• Graph on the file system with a self-built query engine.
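To make the idea of a graph with its own query engine concrete, here is a minimal sketch of a triple store with wildcard pattern matching. It is an illustration only, not the query engine used in SDG: the class name `TripleGraph`, the tab-separated line format, and the sample triples are all hypothetical.

```python
class TripleGraph:
    """A tiny in-memory knowledge graph of (subject, relation, object) triples.

    Hypothetical illustration; the poster does not describe the real engine.
    """

    def __init__(self):
        self.triples = set()
        self.by_subject = {}

    def add(self, subj, rel, obj):
        triple = (subj, rel, obj)
        self.triples.add(triple)
        self.by_subject.setdefault(subj, set()).add(triple)

    def query(self, subj=None, rel=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        if subj is not None:
            candidates = self.by_subject.get(subj, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (rel is None or t[1] == rel) and (obj is None or t[2] == obj)
        )


def load_graph(lines):
    """Build a graph from 'subject<TAB>relation<TAB>object' lines.

    The line format is an assumption made for this sketch.
    """
    graph = TripleGraph()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        subj, rel, obj = line.split("\t")
        graph.add(subj, rel, obj)
    return graph


# Hypothetical triples, loosely modeled on the tofu example.
SAMPLE_LINES = [
    "man\tholds\tknife",
    "man\tcuts\ttofu",
    "knife\tused_for\tcutting",
]

if __name__ == "__main__":
    g = load_graph(SAMPLE_LINES)
    print(g.query(subj="man"))
    print(g.query(rel="used_for"))
```

Indexing by subject keeps the common "what do we know about X?" lookup cheap, while other patterns fall back to a scan of all triples.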
Reasoning module used:
• Probabilistic reasoning using the Bayesian Network, and IF-THEN reasoning using the constructed knowledge base.
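The combination of probabilistic and IF-THEN reasoning can be sketched as follows. This is a toy illustration under stated assumptions, not the SDG implementation: the Bayesian network is reduced to one scene variable with conditionally independent object children, and every probability, rule, and name (`PRIOR`, `LIKELIHOOD`, `RULES`, `explain_scene`) is invented for the example.

```python
# Hypothetical numbers: P(scene) and P(object detected | scene).
PRIOR = {"kitchen": 0.5, "office": 0.5}
LIKELIHOOD = {
    "kitchen": {"knife": 0.8, "laptop": 0.1},
    "office": {"knife": 0.2, "laptop": 0.9},
}

# Hypothetical IF-THEN rules: (set of conditions, conclusion).
RULES = [
    ({"scene:kitchen", "object:knife"}, "activity:cutting"),
    ({"activity:cutting"}, "needs:cutting_board"),
]


def scene_posterior(observed):
    """Compute P(scene | observed objects) by Bayes' rule.

    Assumes objects are conditionally independent given the scene.
    """
    scores = {}
    for scene, prior in PRIOR.items():
        score = prior
        for obj in observed:
            score *= LIKELIHOOD[scene][obj]
        scores[scene] = score
    total = sum(scores.values())
    return {scene: s / total for scene, s in scores.items()}


def forward_chain(facts, rules):
    """Apply IF-THEN rules repeatedly until no new fact is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


def explain_scene(observed):
    """Pick the most probable scene, then derive symbolic facts from it."""
    posterior = scene_posterior(observed)
    best = max(posterior, key=posterior.get)
    facts = {f"scene:{best}"} | {f"object:{o}" for o in observed}
    return best, posterior[best], forward_chain(facts, RULES)


if __name__ == "__main__":
    scene, prob, facts = explain_scene(["knife"])
    print(scene, round(prob, 3), sorted(facts))
```

Because the derived facts are symbolic, each conclusion can be traced back to the rule and the probabilistic evidence that produced it, which is the kind of structured explanation the poster argues for.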
Components Used
Application: Image Riddles
Toy Example
What is the connecting word?
Answer: Fall
Why?
i) The first image depicts the season Fall,
ii) the second one has a waterfall,
iii) the third one has rainfall, and
iv) in the fourth, a statue is “fall”ing.
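One simple way to picture the riddle task is as a search for a word that is plausible for every image. The sketch below assumes each image has already been mapped to a set of candidate words with confidence scores, and picks the shared word with the highest total score. The function name `connecting_word` and all words and scores are hypothetical; this is not the method used in the Image Riddles work, only an illustration of the problem setup.

```python
def connecting_word(candidates_per_image):
    """Return the word shared by all images with the highest summed score.

    candidates_per_image: list of dicts mapping word -> confidence score.
    Returns None when no word is shared by every image.
    """
    if not candidates_per_image:
        return None
    common = set(candidates_per_image[0])
    for candidates in candidates_per_image[1:]:
        common &= set(candidates)
    if not common:
        return None
    # Sort first so ties are broken alphabetically and deterministically.
    return max(
        sorted(common),
        key=lambda word: sum(c[word] for c in candidates_per_image),
    )


# Hypothetical candidate words for the four images of the toy example.
TOY_RIDDLE = [
    {"fall": 0.6, "autumn": 0.7, "leaves": 0.5},
    {"fall": 0.5, "water": 0.8, "river": 0.4},
    {"fall": 0.4, "rain": 0.9, "water": 0.3},
    {"fall": 0.7, "statue": 0.6},
]

if __name__ == "__main__":
    print(connecting_word(TOY_RIDDLE))
```

The per-image score tables also double as an explanation: for each image one can report which candidate words supported the final answer.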
The Tofu example:
● A snapshot of a cooking video.
● We detect triplets: