International Journal of Pattern Recognition and Artificial Intelligence
Vol. 29, No. 3 (2015) 1553003 (26 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218001415530031


Camera-Based Whiteboard Reading for Understanding Mind Maps

Szilard Vajda*
Lister Hill National Center for Biomedical Communications
National Library of Medicine, National Institutes of Health
Bethesda, Maryland, USA
[email protected]

Thomas Plötz
School of Computing Science
Newcastle University, NE7 1NP Newcastle upon Tyne, UK
[email protected]

Gernot A. Fink
Department of Computer Science
TU Dortmund University, 44221 Dortmund, Germany
[email protected]

Received 8 July 2013
Accepted 2 January 2015
Published 30 March 2015

Mind maps, i.e. the spatial organization of ideas and concepts around a central topic and the visualization of their relations, represent a very powerful and thus popular means to support creative thinking and problem solving processes. Typically created on traditional whiteboards, they represent an important technique for collaborative brainstorming sessions. We describe a camera-based system to analyze hand-drawn mind maps written on a whiteboard. The goal of the presented system is to produce digital representations of such mind maps, which would enable digital asset management, i.e. storage and retrieval of manually created documents. Our system is based on image acquisition by means of a camera, followed by the segmentation of the particular whiteboard image focusing on the extraction of written content, i.e. the ideas captured by the mind map. The spatial arrangement of these ideas is recovered using layout analysis based on unsupervised clustering, which results in graph representations of mind maps. Finally, handwriting recognition derives textual transcripts of the ideas captured by the mind map. We demonstrate the capabilities of our mind map reading system by means of an experimental evaluation, where we analyze images of mind maps that have been drawn on whiteboards without any constraints other than the underlying topic. In addition to the promising recognition results, we also discuss training strategies which effectively allow for system bootstrapping using out-of-domain sample data. The latter is important when addressing creative thinking processes, where domain-related training data are difficult to obtain as they focus on novelty by definition.

Keywords: Camera-based document recognition; whiteboard reading; mind map recognition; handwriting recognition; document layout analysis.

*Corresponding author.


1. Introduction

The visualization of mental associations has a long history in a variety of challenging processes related to creative thinking and idea finding,2 most prominently in brainstorming sessions; examples include all kinds of learning processes and problem solving. Concept maps in general, and mind maps2 in particular, are effective tools for transcribing, organizing and representing ideas and their relations. Mind maps are diagrams of words, ideas, and tasks which are linked to a general topic. The latter is typically the central root point of a radial graph, where nodes represent the conceptual entities — usually short texts with a single or just a small number of words each — and connecting edges visualize their relations. Arguably, brainstorming is most effective if the participants of creative thinking meetings can focus exclusively on the idea finding process. Therefore, collaboratively creating mind maps using the traditional means of pens and a whiteboard still represents the standard technique in brainstorming sessions (cf. Fig. 1). However, for archiving and retrieval, a digital representation of the document is usually desirable.

We present a camera-based whiteboard reading system, which processes hand-drawn mind maps as they are typically created in brainstorming sessions.

Fig. 1. Example of an automatic mind map image: The segmented layers of the mind map image are shown in distinguished grids.



The system stays completely in the background, i.e. it does not interfere with the brainstorming process itself, but provides a digital representation of the mind maps. Based on our previous work in the field,35,46,47,49,52 this paper is the first of its kind to present a complete camera-based mind map reading system for real-world scenarios, starting with the image acquisition and ending with the complete recognition of the mind map.

Our mind map reading system consists of three modules, namely (i) image segmentation, (ii) document layout analysis, and (iii) word recognition. Following basic image pre-processing, i.e. normalization and de-noising, the mind map image is segmented into basic document elements, i.e. the textual and graphical items constituting the mind map, combining connected component (CC) extraction and a statistical classifier. Subsequently, the textual components identified are grouped into text patches using an unsupervised clustering approach, which effectively analyzes the mind map structure by recovering a graph representation of its nodes and edges. All node labels — i.e. the text patches associated with them — are then forwarded to an HMM-based, writer-independent handwriting recognition (HWR) module, which provides the machine-interpretable transcription of the ideas conveyed by the analyzed mind map. All modules are integrated into a software framework, which represents the interface for both the automatic mind map analysis and the digital asset management (archiving and retrieval).

In addition to the presentation of the technical contributions, we discuss strategies for robust system bootstrapping in scenarios where — by definition — the amount of sample data is typically very limited. This is due to the fact that brainstorming addresses the creation of novel ideas; thus, annotated training data is typically hard to obtain.

This paper summarizes the results of a long-term research endeavor. Consequently, parts of the results presented here have already been published individually in workshop and conference contributions35,46,47 as well as in invited extended versions thereof.48,49 This paper puts together the complete set of methods developed, proposes an improved approach to clustering handwritten document elements, and presents new text recognition results using a robust writing model trained on out-of-domain data.

2. Related Work

Automatic mind map reading from whiteboard images represents a new application domain, which touches three research fields: (i) basic image (pre-)processing, (ii) automatic analysis of graphical structures, and (iii) HWR. Figure 2 illustrates mind map analysis as it is addressed in this paper by means of the input data which needs to be processed (Fig. 2(a)) and the desired output (Fig. 2(b)).

Camera-based whiteboard reading needs to deal with the classical image acquisition and pre-processing procedure as it is standard for computer vision applications.



Fig. 2. Example of automatic mind map analysis: (a) camera-captured image of a mind map document hand-drawn on a whiteboard and desired output, i.e. (b) digital representation of the mind map.

In particular, this includes image de-noising, color and illumination normalization, etc.15 More specific to the final recognition step is the pre-processing of the actual handwriting, where operations are applied that attempt to normalize the appearance of the writing with respect to baseline orientation (frequently also referred to as skew), slant angle, and size (cf., e.g. Ref. 11).

Handwritten documents can contain a variety of items such as text blocks, lines, words, figures, tables, etc. The primary goal of document structure and layout analysis is to detect these different regions and to identify their functional roles and relationships.31 Somewhat related to the analysis of mind maps, the recognition of line drawings aims to recover the high-level design from engineering drawings, e.g. to recognize pipes, lines, roads, or rivers in maps.44 Similar to the use-case addressed in this paper, a limited and well-defined set of graphical items needs to be recognized, which includes segmentation and classification. For engineering drawings, text/graphics separation is not straightforward, since text portions overlap with other objects in the images.44,45 Another important application domain for layout analysis is automated document analysis, where the functional elements of documents need to be detected and identified. For example, in postal automation, letter envelopes are typically analyzed aiming at the separation of the address block and the stamp.38,50 In Ref. 30, the structure of business cards is unveiled, whereas Ref. 38 addresses the segmentation of legal documents.

The predominant approach for virtually all layout detection and segmentation applications is based on the analysis of CCs.9 The detection of CCs is typically based on blob analysis in raw image data, employing some form of image segmentation.15 The actual classification of CCs is usually performed by means of straightforward threshold comparison based on certain geometric features like height, width, aspect ratio, pixel density, and the number of horizontal and vertical segments.9,17,30,33,45



For printed text detection, a larger variety of methods is available.19 Some of them use texture, some use color, while others are region-based. To detect regions, Gao et al.13 employ visual attention models: even though individual characters may not be salient, regions containing text are; therefore, a second pixel-based filtering step is applied after the extraction of global salient regions. Maximally stable extremal regions (MSER)29 were considered in Refs. 17 and 54 to detect candidate regions, pruning the MSER tree by filtering out unlikely regions based on color, size, aspect ratio, and number of holes. A method based on transfer learning involving Adaboost14 was successfully evaluated on the ICDAR 2001 scene text detection competition data set, involving window classification based on histograms of oriented gradients (HOG), the number of extended edges in the image, the average and variance of the stroke width, local binary patterns (LBP), etc. Color distance and color variance are also considered54 besides the spatial distance among possible textual components originally characterized by HOG. A similar attempt to use color is made by Phan et al.33 to group together possible candidates. A color-based k-means (k = 3) was applied by Zeng et al.55 to decide for each pixel whether it belongs to the text foreground, the background, or the noise class. Edge-based methods using different color bands were considered in Ref. 33. An uncommon but rather interesting method based on skeleton structure classification is proposed in Ref. 43, which achieves up to 70% accuracy for text/nontext separation.

The automatic recognition of handwritten script has been the subject of both industrial and academic research for more than 50 years,1,12 resulting in a variety of approaches and systems for both online and offline processing. The latter is the technological basis for mind map reading as it is addressed in this paper: images of handwriting — captured after the text has been written — are analyzed with the objective of unveiling a textual transcription.34 Offline HWR techniques either follow a holistic approach, where isolated words are analyzed as a whole (cf. Ref. 1 and the references therein), or, more commonly, perform the recognition at the character level, either based on explicit segmentation (employing all imaginable varieties of pattern recognition techniques, e.g. neural networks16) or in a segmentation-free manner. In the latter case, temporal models — most prominently Markov models37 — are applied.

Automatic whiteboard reading was first proposed in Ref. 53. Using camera-captured whiteboard images, the task has been approached as a special kind of offline HWR problem, combining robust pre-processing and feature extraction methods with Markovian models for representing the appearance and the linguistic structure of the texts to be recognized.52 The problem of analyzing whiteboard documents has also been tackled in constrained settings40,42 and by using specialized sensing equipment.20,22 The Brightboard system42 continuously observes the whiteboard and grabs a suitable image when the movement of the writer has completed. The image is then analyzed to detect and recognize special marks which can control a computer.



A similar system is proposed by the ZombieBoard system,40 which scans the whiteboard for special marks and their corresponding commands using an active camera and a mosaicing algorithm. Using a pen-tracking device and analyzing pen-trajectory data instead of images of handwritten script, the problem can be significantly simplified, at the price of requiring a specialized hardware setup. The approaches presented in Refs. 20 and 22 both make use of hardware solutions for pen-tracking. In Ref. 20, an online recognition approach can thus be applied for the recognition of text written on a whiteboard. In Ref. 22, a multi-touch table is coupled with a pen providing self-tracking capabilities in order to manipulate objects and annotate content. The main drawback of such approaches is the strict requirement of special and costly hardware, which cannot easily be found in a usual meeting room. Therefore, the usability of these systems is limited to quite special settings.

3. Camera-Based Mind Map Reading

Camera-based whiteboard reading is an extremely challenging task and can still be considered an open research problem. Our previous work in the field focused on developing fundamental techniques for offline recognition of whiteboard documents.35,46,47,49,52 This previous research also showed that camera-based whiteboard reading can realistically be considered an offline document recognition problem only. The primary reason for this is that whenever whiteboard documents are created by naive users in realistic scenarios, the pen used for writing is hardly ever visible to an observer. Therefore, camera-based pen-tracking as applied in Ref. 6 is not feasible, and the images of the whiteboard content — or patches thereof — have to be processed as offline documents.

In this paper, we focus on the recognition of images of mind maps handwritten and hand-drawn on whiteboards, addressing the following three central challenges. First, though the task of mind map recognition is well defined on a semantic level, there exist hardly any constraints with respect to the layout of the considered documents that could be robustly used for detecting and identifying elementary document units such as textual items or arrows connecting nodes in the mind map. Consequently, we use machine learning techniques for the detection and unsupervised clustering methods for the grouping of elementary units in the documents. Secondly, as is common for special recognition tasks which have not yet become mainstream, we are faced with the fact that only a quite limited amount of domain-specific document samples is available. Therefore, for model training we employ large data sets of handwritten material that are related to the task, but are neither directly from the same domain nor of the same rather low document image quality as the camera-captured documents considered here. Thirdly, we investigate methods for increasing the generalization capability of a handwriting recognizer. This can be seen as a supporting measure to deal with the fact that training is performed on out-of-domain data.



3.1. Proposed approach

We chose to base our method for mind map image segmentation on CC analysis, as this representation is rather well suited for handwritten documents and is widely and successfully used in the document analysis field (cf. e.g. Ref. 9). However, the main drawback of such methods is usually the presumption of a certain amount of well-organized, well-structured text/graphics material, which can serve to build rule-based strategies for distinguishing text from nontext and for identifying different document items (cf. Refs. 3, 9 and 45). Color could have been considered quite a strong cue for segmenting text/nontext regions, but in our mind map scenario usually only one marker was used during the creation process, for text and nontext alike; therefore, we saw no benefit in using the different color channels. Doing so could have led us to heuristics such as those applied in Refs. 17, 33 and 55, or even to confusion. In order to avoid such heuristics as much as possible, our method is based on the use of a statistical classifier, namely a neural network, for distinguishing between relevant textual and graphical items. The main advantage of such a machine learning approach is that the model can be estimated on annotated sample data automatically.a

As mind map documents do not follow a well-defined layout structure and may show large variations in format and style, simple layout analysis techniques, as, e.g. profile-based methods, will fail completely. Therefore, it is necessary to use a modeling approach that is able to flexibly adapt to the actual document layout observed. Consequently, we propose to use methods based on unsupervised learning, as such techniques — to some extent — are able to automatically discover the structure inherently given by some set of patterns, without requiring prior knowledge about the data (cf. Ref. 26). In the case of mind map images, the main goal of document layout analysis is the discovery of groups of textual elements forming larger text patches. In order to overcome the limitations of the text patch grouping of our previous work,47 which used hierarchical clustering based on the Euclidean distance, we propose to apply an adaptive clustering to the textual elements identified in the segmentation step. The adaptive clustering is realized by combining a growing neural gas (GNG)10 for the extraction of dense regions of textual items with density-based spatial clustering of applications with noise (DBSCAN)39 for extracting text element clusters, which are considered as text patches in the subsequent processing.

The text patches identified by the automatic layout analysis are assumed to correspond to the node labels of the mind map considered. In order to recover a digital mind map representation, these text-patch images have to be transcribed using a suitable handwriting recognizer. Based on experience acquired in our previous research, and following a general trend towards such methods in the document analysis field (cf. Ref. 37), we apply a segmentation-free approach based on hidden Markov models (HMMs).

a A preliminary version of this work has been published in Ref. 49.


As all machine learning methods, this approach offers the advantage that model parameters can be learned automatically from sample data but, consequently, it also requires a sufficiently large set of training data to be available. Unfortunately, in our scenario, domain-specific training data is not available in the quantity necessary for training a general handwriting recognizer. Therefore, we use out-of-domain data for that purpose.


3.2. Overview

All the proposed strategies are implemented in an integrated software framework and allow for automatic reading of handwritten mind maps. Based on a camera image of a hand-drawn mind map, a digital representation of the mind map document can be created, which can serve as the starting point for further automatic document processing and analysis on a symbolic level. An overview of the automatic mind map reading process is shown in Fig. 3.

In the remainder of this paper, we first describe the image acquisition and segmentation approach in detail in Sec. 4. In Sec. 5, our unsupervised approach to the layout analysis of mind map documents is presented. Afterwards, the development of an offline handwriting recognizer for camera-captured mind map documents is explained in Sec. 6. The evaluation of the proposed approaches on a challenging data set of mind map images is presented in Sec. 7, while the results obtained and the implications for future research are discussed in the final section.

Fig. 3. Overview of the proposed whiteboard-reading system showing exemplary results of the different processing steps.


4. Image Segmentation

The image segmentation is meant to separate the elements written on the whiteboard with a marker from the whiteboard background and noisy image parts, followed by categorizing the written content into different mind map elements (e.g. text, line, circle, arrow).


4.1. Segmentation of the camera image

After image acquisition by the camera, the objective is to extract only the written content from the whiteboard. This relevant information is then added to a binary region memory (cf. Fig. 1). The region memory represents the current state of the written content on the whiteboard and is robust to changes in the camera image, such as illumination changes or users standing in front of the camera. The general assumption is that the camera image does not contain anything but the interior of the whiteboard; the camera and the whiteboard are fixed. The system handles images that can consist of three different regions, namely: (i) text (indicated by bright blocks in Fig. 1), (ii) background (indicated by dark blocks in Fig. 1), and (iii) noise (indicated by blocks with a grid pattern in Fig. 1).

The applied segmentation is an implementation of the original method proposed by Wienecke et al.53 The segmentation is performed not on the pixel but on the block level, to provide a certain robustness w.r.t. illumination changes and other minor changes in the whiteboard scene. The image is therefore divided into two layers of overlapping blocks. Each block is assigned to one of the aforementioned categories based on three features: gradients, gray level, and changes between two consecutive images. Based on these measures and the corresponding thresholds, estimated on some trial runs, the whiteboard content is successfully separated from the rest of the scene. For further details please see Refs. 47 and 53.
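To make the block-level decision concrete, the sketch below classifies overlapping blocks of a grayscale frame using the three cues named above. It is a minimal illustration under stated assumptions, not the authors' implementation: block size, step and all threshold values are hypothetical placeholders (the paper estimates its thresholds on trial runs).

```python
import numpy as np

def classify_blocks(img, prev_img, block=32, step=16,
                    grad_th=12.0, gray_th=180.0, change_th=8.0):
    """Label overlapping blocks as text/background/noise from three cues:
    gradients, gray level, and change w.r.t. the previous camera frame."""
    img = img.astype(np.float32)
    gy, gx = np.gradient(img)
    grad = np.hypot(gx, gy)                      # gradient magnitude
    diff = np.abs(img - prev_img.astype(np.float32))
    labels = {}
    for y in range(0, img.shape[0] - block + 1, step):
        for x in range(0, img.shape[1] - block + 1, step):
            win = (slice(y, y + block), slice(x, x + block))
            if diff[win].mean() > change_th:
                labels[(y, x)] = 'noise'         # e.g. a writer moving past
            elif grad[win].mean() > grad_th and img[win].mean() < gray_th:
                labels[(y, x)] = 'text'          # marker strokes present
            else:
                labels[(y, x)] = 'background'    # clean whiteboard area
    return labels
```

Blocks labeled as text would then be merged into the binary region memory, while noise blocks are left untouched until the scene becomes static again.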

4.2. Extraction of connected components

The next step is the extraction of the CCs. Disregarding possible flaws in the image (e.g. inhomogeneous lighting or non-opaque marker color), separating the mind map by CC analysis is reasonable, and no prior knowledge about the whiteboard content is necessary. This choice is motivated by the fact that, instead of using complex skeletonization and curve-tracing procedures, the system manipulates CCs, which can be extracted reliably without any heuristics. For the CC extraction, the image is first binarized using Niblack's local method,32 considering a variant which applies threshold optimization,41 and local thresholding in a 51 × 51 pixel window using integral images51 for efficiency. The height, width, aspect ratio and pixel density of the CCs9 serve as selection criteria for filtering: small CCs (up to 5 × 5 pixels) containing only a few pixels, as well as rather large components (above 70% of the original image height or width) like whiteboard borders, are discarded.
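The two steps can be sketched as follows, with two stated simplifications: a plain Niblack threshold T = m + k·σ computed via integral images (the threshold-optimized variant of Ref. 41 is not reproduced) and SciPy's component labeling; k = -0.2 is a common textbook choice, not a value from the paper.

```python
import numpy as np
from scipy import ndimage

def _integral(a):
    """Integral image with an extra zero row/column for the corner trick."""
    s = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return s

def niblack_binarize(img, w=51, k=-0.2):
    """Local threshold T = mean + k*std over a w x w window (w odd),
    computed in constant time per pixel with integral images."""
    img = img.astype(np.float64)
    pad = w // 2
    p = np.pad(img, pad, mode='edge')
    s1, s2 = _integral(p), _integral(p * p)
    sum1 = s1[w:, w:] - s1[:-w, w:] - s1[w:, :-w] + s1[:-w, :-w]
    sum2 = s2[w:, w:] - s2[:-w, w:] - s2[w:, :-w] + s2[:-w, :-w]
    n = float(w * w)
    mean = sum1 / n
    std = np.sqrt(np.maximum(sum2 / n - mean ** 2, 0.0))
    return img < mean + k * std                  # True where ink is present

def filter_components(binary):
    """Keep CCs, discarding speckles and oversized parts such as borders."""
    h_img, w_img = binary.shape
    labeled, _ = ndimage.label(binary)
    kept = []
    for sl in ndimage.find_objects(labeled):
        h, w_ = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if h <= 5 and w_ <= 5:                   # only a few pixels
            continue
        if h > 0.7 * h_img or w_ > 0.7 * w_img:  # e.g. whiteboard border
            continue
        kept.append(sl)
    return kept
```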



4.3. CC classification

After filtering, the remaining CCs are classified into: (i) text, (ii) line, (iii) arrow and (iv) circle. In order to classify the CCs, a feature vector (shape set) composed of 12 measures (i.e. contrast, edge density, homogeneity, number of foreground gray levels, foreground mean gray level, relative amount of gradient orientations, Sobel gradient orientation and magnitude, etc.)b is derived to characterize each component. Alternatively, gradient intensity histograms (values ranging from 0 to 255, equally divided into 16 bins) of the CCs serve as another type of feature (gradient set). Given the limited dimensionality of the feature vectors (12 and 16, respectively), we saw no need for a feature selection strategy; the nature of each feature is different, so no high correlation is to be expected.

A multi-layer perceptron (MLP) with one hidden layer is used to perform the classification. No prior knowledge about possible correlations between the extracted feature components has been assumed; consequently, a fully connected network topology was used and trained with classical error backpropagation. The number of neurons in the input layer is given by the dimensionality of the feature vectors, i.e. 12 and 16, respectively. Since four classes were to be identified (i.e. text, line, arrow, circle), four neurons were used in the output layer, while 15 and 20 neurons, respectively, were used in the hidden layer. The number of hidden neurons was established by several trial runs49 involving 5, 10, 15, 20 and 25 neurons. The learning rate was set to 0.0001 and the momentum to 0.3.

b A complete formal description of the feature set can be found in Ref. 49.
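For illustration, this classifier configuration can be reproduced with scikit-learn (the paper does not name a library, so this is our choice); the instance below is set up for the 12-dimensional shape set, while the gradient set would use 16 inputs and 20 hidden units instead.

```python
from sklearn.neural_network import MLPClassifier

# One fully connected hidden layer with 15 units (shape set), four output
# classes (text, line, arrow, circle), trained by plain backpropagation
# (SGD) with the learning rate and momentum quoted above.
mlp = MLPClassifier(hidden_layer_sizes=(15,), solver='sgd',
                    learning_rate_init=0.0001, momentum=0.3,
                    max_iter=5000)

# X_train: (n_samples, 12) shape features per CC
# y_train: one of {'text', 'line', 'arrow', 'circle'} per CC
# mlp.fit(X_train, y_train); predictions = mlp.predict(X_test)
```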

5. Document Layout Analysis

Since characters usually appear closer to each other than to other elements, clustering should group them with their kind rather than with nontext elements. For this reason, we discard all nontext elements from further processing and focus the attention only on text snippets.

5.1. Unsupervised layout modeling

Instead of using dendrogram analysis built by single-linkage clustering to group the nearest components46,49 — our first attempt to tackle this problem — in this paper a totally new idea is proposed, namely layout modeling by a self-organizing neural network.10,18 The aim is to adapt the modeling on the fly to varying layouts, sizes and orientations of scripts. The gravity centers of the different text CCs can be represented in a two-dimensional Euclidean space. The goal is to model these coordinates with a self-organizing network.



The main advantage is that there is no need for any labeled training data: the network adapts its neurons and their spatial arrangement to the topology (the gravity centers of the CCs) of the current document based on competitive learning. Another advantage over other methods (e.g. k-means) is that there is no need to specify the number of clusters beforehand, as the number of words (clusters) differs from one document to another.

The GNG has N units (neurons), each one characterized by its coordinates (w), the accumulated error (error), and the edges between itself and other units, with the age of each edge controlling the topology of the GNG network. The parameter age_max = 80 controls the maximal time an edge between nodes is kept, ε_b = 0.25 and ε_n = 0.0002 are learning constants, while a decay constant of 0.0005 serves to diminish the error in the units across the whole network. The parameter k = 100 controls the speed of growth of the network. The algorithm stops when all the available units have been consumed in building the network and the error is below a certain threshold (0.1). A short description of the GNG algorithm applied to the gravity centers of the extracted text components is given in Algorithm 1. For more details w.r.t. the algorithm, please refer to the seminal work published by Fritzke.10

The modeling performance of GNG and SOM can be seen in Fig. 4. The gravity centers of the textual items (see Fig. 4(b)) are approximated. The GNG's modeling capability (see Fig. 4(d)) surpasses the noisier representation provided by the SOM (see Fig. 4(c)). The noise introduced by the SOM is due to the fact that in this network the neurons always remain connected, while in the GNG connections can be annulled by removing the edges between different nodes (see Algorithm 1). The number of units considered for the modeling is directly proportional to the number of CCs to be modeled; it should be high (in our case N = 5 × (number of CCs)) in order to exploit the capabilities of the density clustering which follows. In Fig. 4(d), a clear distinction can be observed around different words: near the words, a large agglomeration of nodes can be spotted, already creating a visual distinction between different words in the document.

5.2. Unsupervised word grouping

After the modeling process, the different nodes (neurons) are merged into clusters to establish words. For this purpose, DBSCAN was considered, a density-based clustering method proposed originally by Ester et al.4 Instead of classical clustering based only on distance measures (e.g. hierarchical clustering49), this approach considers not only the distance between elements, but also the density of the elements in a certain neighborhood. This makes it possible to recognize the word clusters, because within clusters there is a high density of point agglomerations (see Fig. 4(d)) — considerably higher than outside the clusters. In DBSCAN, a point p belongs to a cluster if and only if its neighborhood of a given radius ε contains at least a minimum number of points, i.e. the density of the given neighborhood exceeds some threshold. The shape of the neighborhood is defined by the distance measure considered. In our scenario, the Manhattan distance



would provide a rectangular neighborhood, while the Euclidean distance defines a circle. The latter metric was chosen for our clustering task due to the unconstrained layout of the documents. The increased number of points (GNG units) considered for the modeling, as observed in Fig. 4(d), is motivated by the density: instead of clustering the original points (see Fig. 4(b)), DBSCAN clusters the GNG units (see Fig. 4(d)), which form dense regions around the words.

Fig. 4. The modeling process: (a) original document, (b) gravity centers of the text patches, (c) SOM-based modeling, (d) GNG-based modeling, (e) the DBSCAN on the original points and (f) the mapping of the original points into the DBSCAN cluster space.

The algorithm, described briefly in Algorithm 2, starts with an arbitrary point p and retrieves all the points density-reachable from p w.r.t. ε, measuring the

neighborhood, and MinPoints, counting the minimum number of points necessary to form a cluster. If p is a core point, the procedure develops a cluster. If p is a border point, no other points are density-reachable from the reference point p, and the algorithm proceeds to analyze other points in the data set. The parameters ε and MinPoints are estimated automatically, based on a "sorted k-distance plot".39 Finally, each point in the data set is labeled with a specific cluster identifier. The description of ExpandCluster is beyond the scope of the current paper; for further details please see Ref. 4.
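The interplay of the two unsupervised steps can be sketched as follows. The GNG loop condenses Fritzke's algorithm (housekeeping such as rewiring edges around a freshly inserted unit, deleting isolated units, and the error-threshold stopping criterion is omitted for brevity); the constants follow Sec. 5.1, while alpha, the error reduction applied at insertion, is a standard GNG choice the paper does not quote. scikit-learn's DBSCAN stands in for a from-scratch implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def growing_neural_gas(points, max_units, k=100, age_max=80,
                       eps_b=0.25, eps_n=0.0002, alpha=0.5, decay=0.0005):
    """Condensed GNG (Ref. 10) run on the gravity centers of text CCs."""
    rng = np.random.default_rng(0)
    first = rng.choice(len(points), size=2, replace=False)
    units = [points[i].astype(float).copy() for i in first]
    error = [0.0, 0.0]
    edges = {}                                    # {(i, j) with i < j: age}
    for step in range(1, 100 * max_units):
        if len(units) >= max_units:
            break
        x = points[rng.integers(len(points))]
        d = np.array([np.sum((u - x) ** 2) for u in units])
        s1, s2 = (int(i) for i in np.argsort(d)[:2])
        error[s1] += d[s1]
        units[s1] += eps_b * (x - units[s1])      # drag winner towards x
        for e in list(edges):                     # adapt topological nbrs
            if s1 in e:
                other = e[0] if e[1] == s1 else e[1]
                units[other] += eps_n * (x - units[other])
                edges[e] += 1                     # age incident edges
        edges[tuple(sorted((s1, s2)))] = 0        # (re)connect s1 and s2
        edges = {e: a for e, a in edges.items() if a <= age_max}
        if step % k == 0:                         # grow every k-th step
            q = int(np.argmax(error))
            nbrs = [e[0] if e[1] == q else e[1] for e in edges if q in e]
            f = max(nbrs, key=lambda n: error[n], default=q)
            units.append(0.5 * (units[q] + units[f]))
            error[q] *= alpha
            error[f] *= alpha
            error.append(error[q])
        error = [err * (1.0 - decay) for err in error]
    return np.vstack(units)

# centers: (n, 2) array of gravity centers of the text components
# units = growing_neural_gas(centers, max_units=5 * len(centers))
# labels = DBSCAN(eps=radius, min_samples=min_points).fit_predict(units)
# each center then inherits the cluster label of its nearest GNG unit
```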




Finally, we map the original data (see Fig. 4(b)) into this newly created cluster space generated by DBSCAN: each original data point is labeled according to the nearest unit in the GNG modeling (see Fig. 4(d)). To reproduce the topological structure of the mind map document, a graph representation of the mind map is generated (see Fig. 3). Text and circle elements are treated as nodes, while lines and arrows are treated as edges. Since there is no prior knowledge of which nodes are to be connected by a line, an estimation is necessary: line components are assigned through the intersection of the parametric line with neighboring components.

6. Word Recognition

The goal of the word recognition stage36 is the transcription of the text patches extracted during layout analysis, and thus the recovery of the handwritten labels of the nodes within the mind map documents. As quite commonly and successfully used in the field of HWR, we apply a segmentation-free recognizer based on HMMs, developed using the methods and tools provided by the ESMERALDA framework.8

In order to train a general-purpose handwriting recognizer, large amounts of handwriting data need to be available. Therefore, it was clear from the beginning that for our task the HWR model would need to be trained on out-of-domain data. Though this seems technically quite straightforward, an important prerequisite of this approach is that compatible feature representations are obtained from images of handwritten script of potentially different sizes and pixel resolutions. This is true for the geometrical feature representation that we use in our work on HWR,7,52 which was inspired by the feature set proposed by Marti and Bunke.27 The most important aspect enabling this transferability of feature representations between handwriting data of different resolutions is a quite robust normalization of the apparent size of the writing, based on an estimation of the average distance between local minima of the script's contour,25 which is closely related to an estimate of the average character width.

However, using a compatible feature representation alone does not ensure that recognition models can be transferred successfully between domains. In our first experiments on the recognition of handwriting in camera-captured mind map images,47 the recognition model was trained on scanned document images, i.e. quite clear and high-quality data.c This model performed rather poorly on the actual task data, a fact that motivated the following modifications to the training of the recognition model.

6.1. Out-of-domain data

Instead of using handwriting material written on paper and scanned later to train the recognizer, we decided to investigate the use of data which was also written on a whiteboard, and thus might show similar variations in writing style to the mind map documents. Such material is available from the IAM Online Handwriting Database.21 Unfortunately, it consists of online data, i.e. pen trajectories recorded by a pen-tracking device. However, offline versions of the data can be rendered with characteristics quite similar to clean offline representations of data from the same source (cf. Refs. 23 and 35).

c The training data was taken from the IAM database of handwritten English sentences.28


6.2. Context size reduction

As the script images obtained from mind map documents frequently show artifacts caused by errors in the text extraction and patch grouping stages (see Fig. 6), we decided to reduce the size of the analysis windows extracted when serializing the text-patch images by the sliding-window method (cf. Ref. 37). In our previous work we obtained the best performance, especially on scanned handwriting data, with a width of 8 pixels. For less sensitivity to image noise, we reduced this size to 4 pixels, which is the minimum required by our feature extraction method.d

6.3. Recognition models with reduced complexity

Unfortunately, there is strong evidence that in the case of camera-based recognition of handwritten mind maps with a model trained on data rendered from pen trajectories — even though the handwriting material was also written on a whiteboard — there will be a significant mismatch between the characteristics of training and testing material. Especially in the speech recognition area, such mismatch situations are usually tackled by model adaptation strategies. Unfortunately, such an approach again requires manually or automatically annotated adaptation material and, therefore, would reduce our test material to an insignificant size. Therefore, we decided to explore the use of lower-complexity models instead of model adaptation.

In our standard handwriting recognizer for Roman script we use semi-continuous HMMs with a codebook of 2k diagonal-covariance Gaussian densities and a set of 80 basic character models with Bakis topology.e In conjunction with the heuristic of initializing model lengths proportionally to the minimum length of the associated basic unit in the annotation of the training data, this configuration leads to a quite complex writing model with approximately 2400 model states and a similar number of mixture densities for modeling the respective outputs. In order to reduce the model complexity, we investigated both the use of a reduced number of states per basic unit in conjunction with a restriction of the model topology to a linear one,f and the reduction of the codebook size to 1k Gaussians only. Additionally, we investigated a kind of "early-stopping" technique, i.e. we used models after a limited number of re-estimation steps for recognition, instead of the ones showing maximum performance on the validation set belonging to the database used for training.

d Among others, after image binarization, orientations of contours within the analysis windows are estimated, which does not make sense with window widths below 4 pixels.
e Bakis models allow self-transitions between states, transitions to the successor state, and the skipping of a single state within a linear sequence.
f In linear models, only self-transitions and transitions to successor states are allowed.
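To illustrate what the window-width parameter controls, the sketch below serializes a binarized, size-normalized text patch into 4-pixel windows and computes three representative geometric features per window. The actual system uses the nine Marti/Bunke-inspired features of Refs. 7, 27 and 52 plus their approximated derivatives; the three shown here are illustrative stand-ins.

```python
import numpy as np

def sliding_windows(patch, width=4):
    """Cut a binarized text-patch image into analysis windows of fixed
    width, left to right (4 px here; 8 px in our earlier work)."""
    return [patch[:, x:x + width]
            for x in range(0, patch.shape[1] - width + 1, width)]

def window_features(win):
    """Three illustrative geometric features: mean vertical ink position
    plus upper and lower contour, normalized by the window height."""
    ys, _ = np.nonzero(win)
    if ys.size == 0:                      # empty window, e.g. between words
        return np.zeros(3)
    return np.array([ys.mean(), ys.min(), ys.max()]) / win.shape[0]

# feature sequence for one text patch (derivatives would be appended here)
# seq = np.array([window_features(w) for w in sliding_windows(patch, 4)])
```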


7. Evaluation

This section is dedicated to the evaluation of the different system modules described in the previous sections. After a description of the data sets, the results of the different modules are presented.

7.1. Data description

The mind map collection consists of 30 photos taken of mind map drawings. 11 different writers were asked to freely draw one mind map for each of the topics (i) "holiday", (ii) "party" and (iii) "study" (one writer sketched only two and one writer sketched only one mind map). The writers were provided with a standard whiteboard marker set containing four different colors (black, blue, green, red) and a whiteboard eraser. Except for a basic set of words for each topic, which had to be used, and the obligation to add at least three other words to the mind map, there were no restrictions on the creativity in producing these documents. After a writer had finished his mind map, a photo of the whiteboard was taken with a digital camera set to a resolution of 2048 × 1536 pixels (see Fig. 2(a)). All images were annotated with respect to text, lines, circles/ellipses and arrows. A single annotation corresponds to a rectangle stating the bounding box of the particular graphical element; in the case of text patches, the annotations were made on a word basis. 19 documents were considered for training purposes, while the remaining 11 documents served as test documents. The reduced number of documents is due to the confidential nature of such mind maps.

Due to the reduced number of mind maps available for test purposes, some well-known data sets were also considered. These data sets do not contribute to the evaluation framework; they serve only the purpose of training our handwriting recognizer. The IAM-OnDB21 is a large online sentence database; this online data was rendered in order to produce data of quality similar to that encountered in the mind maps. The IAM-DB28 is a large English offline database. A short summary of the data sets can be found in Table 1.

7.2. Separation of text and nontext items

The overall classification accuracies for the shape set and the gradient set are 95.7% and 93.0%, respectively.46 It can be observed that both types of features are suitable for separating text and nontext items. However, the shape feature set produces better results; for the sake of clarity, the following results are limited to this feature representation. The classification scores for each of the 11 test documents can be seen in Table 2.

Table 1. Overview of the data sets used in the experiments, detailing the content, the size of the collection, the number of writers, and the training/test split.

Dataset    | Content       | Size      | # Writers | Training/Test
-----------|---------------|-----------|-----------|----------------------
MindMap    | Mind maps     | 30 doc.   | 11        | 19/11 doc.
IAM-OnDB   | Online notes  | 1700 doc. | 221       | 5034 lines/div. sets
IAM-DB     | Scanned pages | 1539 doc. | 657       | 6161 lines/div. sets


Table 2. Document-specific results of CC classification into text/nontext (shape set).

Doc. Id. | Accuracy [%] | Doc. Id. | Accuracy [%]
---------|--------------|----------|-------------
1        | 97.6         | 2        | 92.2
3        | 97.7         | 4        | 97.3
5        | 95.6         | 6        | 94.8
7        | 82.3         | 8        | 96.0
9        | 96.8         | 10       | 96.4
11       | 97.0         |          |

The worst result, reported for document 7, can be explained by the fact that the quality of the document is not satisfactory and many items (lines, circles) were interconnected during the drawing process; hence, huge CCs were extracted and analyzed, which led to significant errors. This particular document also contains some line structures not present in the other documents. While the text items are recognized with high precision (98.5%), the arrows are often confused with lines. This confusion can be explained by the fact that just a few arrows are represented in our data set, and there is not much difference between lines and arrows. Similar problems can be encountered for circles, which can be erroneously confused with text items such as "o" or "D". Vertical lines are often considered as text snippets, and vice versa; small text components or letters like "i" or "l" are identified as lines. Another type of error can be observed when lines touch circles or circles touch letters. In that situation, due to the nature of the method (CC-based recognition), such components are usually misrecognized (see Fig. 5).



Fig. 5. Text patch grouping obtained for an exemplary mind map image.

7.3. Clustering of text patches

For the evaluation of the proposed method, we use the scheme introduced in the context of the ICDAR 2005 Text Locating Competition.24 The evaluation scheme is based on precision and recall,5 deriving these measures from the bounding-box coverage between the ground-truth document and the analyzed one. The bounding boxes of the annotated ground truth T and the agglomerated text components E are compared — the larger the overlap of the bounding boxes, the higher the level of match. A match m_p between two rectangles r and r' is defined as the quotient of their intersection area and their union area:

    m_p(r, r') = A(r ∩ r') / A(r ∪ r').    (1)

A binary answer to whether an estimated rectangle has a fitting ground-truth rectangle or not would not cope with partial matches; this is why the quality of a single match m_p lies in the range [0, 1]. In order to calculate these adapted versions of precision and recall, the best match between a rectangle within the agglomerations and all rectangles within the set of annotations is taken into consideration, and vice versa.

Fig. 6. Typical outcomes of the unsupervised clustering of elementary document items into text patches: (a) erroneous agglomeration with other words, (b) erroneous agglomeration with graphics, (c) missing "na" in between, (d) missing "T", (e) successful agglomeration and (f) successful agglomeration.



Table 3. Precision, recall and F-measure for clustering each document from the test set.

Doc. Id. | Precision | Recall | F-measure
---------|-----------|--------|----------
1        | 0.49      | 0.61   | 0.54
2        | 0.52      | 0.42   | 0.47
3        | 0.70      | 0.68   | 0.69
4        | 0.55      | 0.62   | 0.59
5        | 0.55      | 0.62   | 0.58
6        | 0.68      | 0.51   | 0.58
7        | 0.43      | 0.36   | 0.39
8        | 0.82      | 0.78   | 0.78
9        | 0.56      | 0.38   | 0.46
10       | 0.72      | 0.83   | 0.77
11       | 0.76      | 0.59   | 0.66
Average  | 0.61      | 0.58   | 0.59

The best match m(r, R) of a rectangle r within a set of other rectangles R is defined as:

    m(r, R) = max{ m_p(r, r') | r' ∈ R }.    (2)

The recall is then the quotient of the sum of the best matches of the ground-truth rectangles among the agglomerated areas and the number of all annotated bounding boxes within the ground truth:

    recall = ( Σ_{r_t ∈ T} m(r_t, E) ) / |T|.    (3)

The precision is the quotient of the sum of the best matches of the agglomerated areas among the annotated regions and the number of all agglomerated areas:

    precision = ( Σ_{r_e ∈ E} m(r_e, T) ) / |E|.    (4)

We evaluated the output of the agglomeration (clustering) using the scheme described above. Table 3 lists the detailed results for each document, extended with the F-measure directly inferred from precision and recall.5 The worst score is produced by document 7, which was also identified as a failure case for the text separation. The low results can be explained by the fact that the clustering ends up with some huge components, which no longer match the items available in the ground-truth document. The main error source for clustering in general is nontext patches labeled as text, or minor text components filtered out based on their size or confused with graphics; hence, gaps (split words) or partial words are detected. While in some cases the agglomeration is successful, in other cases it fails because some CCs are recognized as nontext (e.g. R in "Relaxation" or g in


"Booking" in Fig. 5), or due to distances which lead to the erroneous agglomeration or separation (see "Guests" in Fig. 5) of different text items.
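The scheme of Eqs. (1)-(4) is compact enough to state directly in code. The sketch below assumes axis-aligned boxes given as (x0, y0, x1, y1) tuples and non-empty sets T and E; the box format and function names are our own.

```python
def area(r):
    x0, y0, x1, y1 = r
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def match(r, rp):
    """m_p from Eq. (1): intersection over union of two rectangles."""
    inter = area((max(r[0], rp[0]), max(r[1], rp[1]),
                  min(r[2], rp[2]), min(r[3], rp[3])))
    union = area(r) + area(rp) - inter
    return inter / union if union > 0 else 0.0

def best_match(r, rects):
    """m(r, R) from Eq. (2): best match of r within the set rects."""
    return max((match(r, rp) for rp in rects), default=0.0)

def precision_recall(T, E):
    """Eqs. (3) and (4): T ground-truth boxes, E agglomerated boxes."""
    recall = sum(best_match(t, E) for t in T) / len(T)
    precision = sum(best_match(e, T) for e in E) / len(E)
    return precision, recall
```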


7.4. Word recognition

In order to properly evaluate the word recognition performance for the whiteboard-reading task given only a quite limited amount of domain-specific documents, we decided to consider the text patches extracted from the whole set of whiteboard documents, i.e. the complete MindMap data set.g We built different general handwriting recognizers on the IAM-DB and on rendered images of the online data provided by the IAM-OnDB. The configuration of these HMM-based recognition models is similar to the ones used in our previous research (cf. Ref. 35). However, as explained in the previous section, we systematically varied several meta-parameters of the models in order to investigate their effect on the generalization capabilities of the obtained recognition systems.

Prior to applying the statistical writing model, all text-line or text-patch images are subject to the usual pre-processing operations, namely skew and slant correction. Additionally, the apparent size of the writing is normalized such that the estimated distance between local contour minima is 25 pixels on average. Subsequently, a sliding-window analysis framework is applied with window widths of 4 and 8 pixels. For each window, an 18-dimensional feature representation consisting of nine geometrical features and their approximated temporal derivatives is computed. For modeling the appearance of the writing, semi-continuous HMMs with either a linear or a Bakis topology, covering a set of 80 basic character, punctuation and whitespace units with codebook sizes of 1024 or 2048 densities, were estimated using the Baum-Welch algorithm. Training was performed for 10 iterations only, as informal experiments showed that models trained to maximize performance on the respective validation sets — i.e. for 15 to 20 iterations — showed significantly reduced generalization capabilities. Decoding was performed using Viterbi beam search.

In order to account to some extent for potential text-extraction artifacts and noise in the text-patch images, the recognition model consists of a basic lexicon (which includes a whitespace model) and an additional "garbage" model defined as an arbitrary sequence of elementary character models. We used two different lexica: a small task-specific one consisting of only the words found in the ground-truth annotation of the MindMap data set (totaling 183 words), and an extended one which also contains all putative content words heuristically chosenh from the training data of the IAM-OnDB (1804 words).

g As the training of the recognition model is performed on out-of-domain data, the test is guaranteed to be writer-independent in this configuration as well.
h Putative content words were required to be at least five characters long and to occur at least twice in the training data of the IAM-OnDB.

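The lexicon-extension heuristic of footnote h translates directly into a few lines; representing the transcripts as an iterable of line strings is our assumption.

```python
from collections import Counter

def putative_content_words(transcripts, min_len=5, min_count=2):
    """Footnote h: keep words at least five characters long that occur
    at least twice in the IAM-OnDB training transcripts."""
    counts = Counter(w for line in transcripts for w in line.split())
    return sorted(w for w, c in counts.items()
                  if len(w) >= min_len and c >= min_count)
```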

Table 4. Overview of the evaluation results obtained: model configuration (training data used, topology of basic models, number of model states, codebook and feature window size) and the resulting word error rate in percent for the two task lexica (right columns).

           Model configuration                             Task lexicon
Training   | Topology | # States | # Densities | Window | 184  | 1.8k
-----------|----------|----------|-------------|--------|------|------
IAM-DB     | Bakis    | 2406     | 2k          | 8 px   | 65.9 | 65.9
IAM-DB     | Linear   | 1374     | 2k          | 8 px   | 62.9 | 65.9
IAM-OnDB   | Linear   | 1296     | 2k          | 8 px   | 52.7 | 58.5
IAM-OnDB   | Bakis    | 2402     | 2k          | 4 px   | 40.0 | 48.7
IAM-OnDB   | Bakis    | 2402     | 1k          | 4 px   | 49.7 | 49.7
IAM-OnDB   | Linear   | 1296     | 2k          | 4 px   | 34.9 | 42.9
IAM-OnDB   | Linear   | 1296     | 1k          | 4 px   | 35.4 | 42.0

On the complete MindMap data set, 758 text patches are hypothesized. Evaluation results are, however, only reported for those 353 patches which were labeled as readable in the ground-truth annotation, i.e. which did not solely contain erroneously detected graphical items or partial mind map images. The recognition hypotheses obtained were filtered such that occurrences of "garbage" hypotheses were discarded. The recognition results obtained for different configurations of the writing model and the two sizes of the task lexicon used are summarized in Table 4. It can clearly be seen that training on high-quality data, as provided by the IAM-DB, produces only quite poor results on the MindMap data. The performance of the writing model can be improved significantlyi when rendered online whiteboard documents are used for training instead. Further significant improvements are possible by reducing the size of the analysis window to only 4 pixels, making the feature representation less vulnerable to noise. The best results for the task are obtained with a model based on linear HMMs, which uses only approximately half the number of model states.

i An absolute reduction of the error rate of approximately 5% is significant at a level of 95%.

8. Conclusion

We developed a camera-based whiteboard reading system, which particularly addresses the analysis of hand-drawn mind maps. Mind maps' spatial arrangements of handwritten ideas in graph-like formations are an important means for, e.g. structuring the results of brainstorming sessions as they are typically held in creative thinking and problem solving processes. Recognizing mind maps from whiteboard images is relevant since it generates digital representations of such hand-drawn documents, which allows for storage and retrieval, i.e. digital asset management. The technical contributions of this paper consist of: (i) a text detection procedure, which automatically extracts handwriting from whiteboard images using a statistical classifier that has been trained on shape features extracted from CCs, thereby avoiding excessive use of thresholds; (ii) a novel approach for unsupervised layout



analysis that recovers the graph-like spatial arrangements of ideas captured by mind maps using clustering techniques; and (iii) unconstrained HWR using HMM-based recognizers, in particular focusing on parameter estimation procedures that use out-of-domain sample data for effective system bootstrapping. We evaluated the developed system on unconstrained mind map data. The results achieved are very promising for the envisioned application of camera-based mind map reading, for example, to automate corporate document workflows with respect to meeting summarization.


Acknowledgments

This work has been supported by the German Research Foundation (DFG) within project Fi799/3. The authors would also like to thank Leonard Rothacker for his valuable support in the implementation of the whiteboard reading system.

References

1. H. Bunke, Recognition of cursive Roman handwriting — Past, present and future, Int. Conf. Document Analysis and Recognition, Edinburgh (2003), pp. 1–12.
2. T. Buzan, Business Mind Mapping (Ueberreuter, 1999).
3. S. Chowdhury, S. Mandal, A. Das and B. Chanda, Segmentation of text and graphics from document images, Int. Conf. Document Analysis and Recognition, Curitiba, Brazil (2007), pp. 619–623.
4. M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Int. Conf. Knowledge Discovery and Data Mining, Portland, Oregon (1996), pp. 226–231.
5. T. Fawcett, An introduction to ROC analysis, Pattern Recogn. Lett. 27 (2006) 861–874.
6. G. A. Fink, M. Wienecke and G. Sagerer, Video-based on-line handwriting recognition, Int. Conf. Document Analysis and Recognition, Seattle (2001), pp. 226–230.
7. G. A. Fink and T. Plötz, On appearance-based feature extraction methods for writer-independent handwritten text recognition, Eighth Int. Conf. Document Analysis and Recognition (ICDAR 2005), Seoul, Korea, 29 August–1 September 2005, pp. 1070–1074.
8. G. A. Fink and T. Plötz, Developing pattern recognition systems based on Markov models: The ESMERALDA framework, Pattern Recogn. Image Anal. 18(2) (2008) 207–215.
9. L. A. Fletcher and R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images, IEEE Trans. Pattern Anal. Mach. Intell. 10(6) (1988) 910–918.
10. B. Fritzke, A growing neural gas network learns topologies, in Advances in Neural Information Processing Systems 7 (NIPS 1994), eds. G. Tesauro, D. S. Touretzky and T. K. Leen (MIT Press, 1995), pp. 625–632.
11. H. Fujisawa, Robustness design of industrial strength recognition systems, in Digital Document Processing: Major Directions and Recent Advances, ed. B. B. Chaudhuri (Springer, 2007), pp. 185–212.
12. H. Fujisawa, Forty years of research in character and document recognition — An industrial perspective, Pattern Recognition 41(8) (2008) 2435–2446.
13. R. Gao, F. Shafait, S. Uchida and Y. Feng, A hierarchical visual saliency model for character detection in natural scenes, Camera-Based Document Analysis and Recognition — 5th Int. Workshop, CBDAR 2013, Washington, DC, USA, 23 August 2013, pp. 18–29.

14. S. Gao, C. Wang, B. Xiao, C. Shi, Y. Zhang, Z. Lv and Y. Shi, Adaptive scene text detection based on transferring AdaBoost, 12th Int. Conf. Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013, pp. 388–392.
15. R. C. Gonzalez and R. E. Woods, Digital Image Processing (Pearson/Prentice Hall, 2008).
16. A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke and J. Schmidhuber, A novel connectionist system for improved unconstrained handwriting recognition, IEEE Trans. Pattern Anal. Mach. Intell. 31(5) (2009) 855–868.
17. L. Gomez i Bigorda and D. Karatzas, Multi-script text extraction from natural scenes, 12th Int. Conf. Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013, pp. 467–471.
18. T. Kohonen, J. Hynninen, J. Kangas and J. Laaksonen, SOM_PAK: The self-organizing map program package, Technical Report (1996).
19. J. Liang, D. S. Doermann and H. Li, Camera-based analysis of text and documents: A survey, Int. J. Doc. Anal. Recogn. 7(2–3) (2005) 84–104.
20. M. Liwicki and H. Bunke, Handwriting recognition of whiteboard notes, in Proc. 12th Conf. Int. Graphonomics Society (2005), pp. 118–122.
21. M. Liwicki and H. Bunke, IAM-OnDB — An on-line English sentence database acquired from handwritten text on a whiteboard, Int. Conf. Document Analysis and Recognition, Seoul, Korea (2005), pp. 956–961.
22. M. Liwicki, O. Rostanin, S. Mohamed El-Neklawy and A. Dengel, Touch & Write: A multi-touch table with pen-input, in Document Analysis Systems (2010), pp. 479–484.
23. M. Liwicki and H. Bunke, Combining on-line and off-line systems for handwriting recognition, Int. Conf. Document Analysis and Recognition, Curitiba, Brazil (2007), pp. 372–376.
24. S. M. Lucas, Text locating competition results, Int. Conf. Document Analysis and Recognition, Seoul, Korea (2005), pp. 80–85.
25. S. Madhvanath, G. Kim and V. Govindaraju, Chaincode contour processing for handwritten word recognition, IEEE Trans. Pattern Anal. Mach. Intell. 21(9) (1999) 928–932.
26. S. Marinai, E. Marino and G. Soda, Self-organizing maps for clustering in document image analysis, in Machine Learning in Document Analysis and Recognition, eds. S. Marinai and H. Fujisawa, Studies in Computational Intelligence, Vol. 90 (Springer, 2008), pp. 193–219.
27. U.-V. Marti and H. Bunke, Handwritten sentence recognition, Int. Conf. Pattern Recognition, Barcelona, Vol. 3 (2000), pp. 467–470.
28. U.-V. Marti and H. Bunke, The IAM-database: An English sentence database for offline handwriting recognition, Int. J. Doc. Anal. Recogn. 5(1) (2002) 39–46.
29. J. Matas, O. Chum, M. Urban and T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions, Image Vis. Comput. 22(10) (2004) 761–767.
30. A. F. Mollah, S. Basu, M. Nasipuri and D. K. Basu, Text/graphics separation for business card images for mobile devices, Int. Workshop on Graphics Recognition, La Rochelle, France (2009), pp. 263–270.
31. A. M. Namboodiri and A. K. Jain, Document structure and layout analysis, in Digital Document Processing: Major Directions and Recent Advances, ed. B. B. Chaudhuri (Springer, 2007), pp. 29–48.
32. W. Niblack, An Introduction to Digital Image Processing (Prentice Hall, 1986).
33. T. Q. Phan, P. Shivakumara and C. L. Tan, Text detection in natural scenes using gradient vector flow-guided symmetry, in Proc. 21st Int. Conf. Pattern Recognition, ICPR 2012, Tsukuba, Japan, 11–15 November 2012, pp. 3296–3299.

34. R. Plamondon and S. N. Srihari, On-line and off-line handwriting recognition: A comprehensive survey, IEEE Trans. Pattern Anal. Mach. Intell. 22(1) (2000) 63–84.
35. T. Plötz, C. Thurau and G. A. Fink, Camera-based whiteboard reading: New approaches to a challenging task, in Proc. Int. Conf. Frontiers in Handwriting Recognition, Montreal (2008), pp. 385–390.
36. T. Plötz and G. A. Fink, Markov models for offline handwriting recognition: A survey, Int. J. Doc. Anal. Recogn. 12(4) (2009) 269–298.
37. T. Plötz and G. A. Fink, Markov Models for Handwriting Recognition, Springer Briefs in Computer Science (Springer, 2011).
38. P. P. Roy, U. Pal and J. Lladós, Seal detection and recognition: An approach for document indexing, Int. Conf. Document Analysis and Recognition, Barcelona (2009), pp. 101–105.
39. J. Sander, M. Ester, H.-P. Kriegel and X. Xu, Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications, Data Min. Knowl. Discov. 2 (1998) 169–194.
40. E. Saund, Bringing the marks on a whiteboard to electronic life, in Proc. Second Int. Workshop on Cooperative Buildings, Integrating Information, Organization, and Architecture, CoBuild'99, London, UK (Springer-Verlag, 1999), pp. 69–78.
41. J. Sauvola, T. Seppanen, S. Haapakoski and M. Pietikainen, Adaptive document binarization, Int. Conf. Document Analysis and Recognition, Ulm, Germany, Vol. 1, 18–20 August 1997, pp. 147–152.
42. Q. Stafford-Fraser and P. Robinson, BrightBoard: A video-augmented environment, CHI (1996), pp. 134–141.
43. Y. Terada, R. Huang, Y. Feng and S. Uchida, On the possibility of structure learning-based scene character detector, 12th Int. Conf. Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013, pp. 472–476.
44. K. Tombre and B. Lamiroy, Pattern recognition methods for querying and browsing technical documentation, in Iberoamerican Congress on Pattern Recognition (2008), pp. 504–518.
45. K. Tombre, S. Tabbone, L. Pelissier, B. Lamiroy and P. Dosch, Text/graphics separation revisited, in Document Analysis Systems, eds. D. P. Lopresti, J. Hu and R. S. Kashi, Lecture Notes in Computer Science, Vol. 2423 (Springer, 2002), pp. 200–211.
46. S. Vajda, T. Ramforth, T. Plötz and G. A. Fink, Camera-based analysis of whiteboard notes, 3rd Int. Workshop on Camera-Based Document Analysis and Recognition, Barcelona, Spain (2009), pp. 42–49.
47. S. Vajda, L. Rothacker and G. A. Fink, A camera-based interactive whiteboard reading system, Int. Workshop on Camera-Based Document Analysis and Recognition, Beijing, China (2011), pp. 91–96.
48. S. Vajda, L. Rothacker and G. A. Fink, A method for camera-based interactive whiteboard reading, in Camera-Based Document Analysis and Recognition — 4th Int. Workshop, CBDAR 2011, Beijing, China, 22 September 2011, Revised Selected Papers, Lecture Notes in Computer Science, Vol. 7139 (Springer, 2012), pp. 112–125.
49. S. Vajda, T. Plötz and G. A. Fink, Layout analysis for camera-based whiteboard notes, J. Univ. Comput. Sci. 15(18) (2009) 3307–3324.
50. S. Vajda, K. Roy, U. Pal, B. B. Chaudhuri and A. Belaïd, Automation of Indian postal documents written in Bangla and English, Int. J. Pattern Recogn. Artif. Intell. 23(8) (2009) 1599–1632.
51. J. van Beusekom, D. Keysers, F. Shafait and T. M. Breuel, Example-based logical labeling of document title page images, Int. Conf. Document Analysis and Recognition, Curitiba, Brazil, Vol. 2 (2007), pp. 919–923.

52. M. Wienecke, G. A. Fink and G. Sagerer, Toward automatic video-based whiteboard reading, Int. J. Doc. Anal. Recogn. 7(2–3) (2005) 188–200.
53. M. Wienecke, G. A. Fink and G. Sagerer, Towards automatic video-based whiteboard reading, Int. Conf. Document Analysis and Recognition, Edinburgh (2003), pp. 87–91.
54. Q. Ye and D. S. Doermann, Scene text detection via integrated discrimination of component appearance and consensus, in Camera-Based Document Analysis and Recognition — 5th Int. Workshop, CBDAR 2013, Washington, DC, USA, 23 August 2013, pp. 47–59.
55. C. Zeng, W. Jia and X. He, An algorithm for colour-based natural scene text segmentation, in Camera-Based Document Analysis and Recognition — 4th Int. Workshop, CBDAR 2011, Beijing, China, 22 September 2011, pp. 58–68.

Szilard Vajda received his B.Sc. degree in Computer Science from Babes-Bolyai University, Kolozsvar, Romania, in 1999. In the same year, he joined the Faculty of Computer Science, Eötvös Loránd University, Budapest, Hungary, as a Research Assistant. After some years of research and teaching duties, in 2002 he joined the READ research group led by Professor A. Belaïd at the Loria Research Center in Nancy, France. In 2008, he was awarded a Ph.D. degree in Computer Science from Henri Poincaré University, Nancy, France. Between 2006 and 2009, he was with the Furukawa Electric Institute of Technology in Budapest, Hungary, working on different pattern recognition topics for the automotive industry. From 2009 to 2012, he was a research fellow (postdoc) in the Intelligent Systems group at TU Dortmund University, Dortmund, Germany. Since 2012 he has been with the National Library of Medicine, Bethesda, MD, USA, where he conducts research in medical imaging, face detection and skin detection. His main research interests are handwriting recognition, document layout analysis, recognition of historical documents, medical imaging and the application of stochastic and neural strategies to different pattern recognition tasks.

Thomas Plötz received his Diploma in Technical Computer Science (M.Eng. equivalent) from the University of Cooperative Education, Mosbach, Germany, in 1998. He received his Diploma (M.Sc. equivalent) and his Ph.D. (Dr.-Ing.) in Computer Science from the University of Bielefeld, Germany, in 2001 and 2005, respectively. Dr. Plötz held post-doc positions at TU Dortmund University, Germany, and Newcastle University, UK. In 2011 he was a Visiting Research Fellow at the Georgia Institute of Technology, Atlanta, USA. Since 2012 he has been on the faculty of Newcastle University's School of Computing Science in Newcastle upon Tyne, UK (currently Senior Lecturer, i.e. Associate Professor). Dr. Plötz is interested in general aspects of machine learning and pattern recognition techniques, with specific focus on real-world applications in various domains such as ubiquitous and wearable computing for behaviour analysis, speech processing, automatic recognition of handwritten script, and image processing.

Gernot A. Fink received his Master's degree (Diplom) in Computer Science from the University of Erlangen-Nuremberg, Erlangen, Germany, in 1991 and his Ph.D. (Dr.-Ing.), also in Computer Science, from Bielefeld University, Bielefeld, Germany, in 1995. In 2002, he received the venia legendi (Habilitation) in Applied Computer Science from the Faculty of Technology of Bielefeld University. From 1991 to 2005, he was with the Applied Computer Science Group at the Faculty of Technology of Bielefeld University. Since 2005, he has been a Professor for Pattern Recognition in Embedded Systems in the Department of Computer Science of TU Dortmund University, Dortmund, Germany. His research interests lie in the development and application of statistical pattern recognition methods in the fields of man-machine interaction, multimodal machine perception, computer vision, handwriting recognition, and document image analysis. He has published various papers in these fields, and is the author of a book on the application of Markov models for pattern recognition purposes. Dr. Fink is a member of the German Association for Pattern Recognition (DAGM), of the International Association for Pattern Recognition (IAPR), and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE). Since 2009, he has been serving on the Leadership Team of the IAPR Technical Committee 11 "Reading Systems".
