CBIR for Image-Based Language Learning within ...


CBIR for Image-Based Language Learning within Mobile Environment

O. Starostenko1, R. Contreras Gómez1, V. Alarcon-Aquino1, O. Sergiyenko2

1 Communications and Signal Processing Research Group, Computer Science Department, Universidad de las Américas Puebla, México {oleg.starostenko, renan.contrerasgz, vicente.alarcon}@udlap.mx
2 Applied Physics Department, Engineering Institute of Baja California Autonomous University, Mexicali, Mexico, E-mail: [email protected]

Abstract
This paper analyzes emerging data exchange technologies used to integrate mobile devices into the image-based learning of a second language. In particular, we design portable personal spaces that provide mobile access to multimedia documents based on XML technologies, and we create generic interfaces for learning environments. For image-based language learning, which involves image processing, recognition, and retrieval, a Two-Segment Turning Function is proposed and analyzed for possible adoption in applications assisted by mobile devices. The paper discusses the evaluation of a prototype image-based language learning system designed to interpret Japanese kanji and Mayan glyphs on mobile devices. Additionally, the performance of the proposed content-based image retrieval approaches is evaluated for their possible integration into mobile learning environments.

1. Introduction
From a computational point of view, a language learning environment may be defined as a virtual space consisting of repositories of digitized information and a wide range of services for collaborative work, digital document search, management of distributed databases, and multimedia data processing and retrieval [1]. In particular, the expansion of wireless networks, the appearance of novel telecommunication technologies, and new communication facilities have led to the acceptance and integration of mobile devices into contemporary life as part of our social practice [2], [3]. Unfortunately, a simple migration of well-known applications to new mobile platforms is not possible because of the limited resources and capabilities of mobile devices. Therefore, the first objective of this paper is to evaluate and propose appropriate wireless infrastructures for data exchange in distributed environments.

Online language learning deals with the search, access, retrieval, and visualization of digital documents containing text, images, audio, and video. In particular, learning image-based languages (Japanese, Mayan, etc.) implies the indexing, recognition, and interpretation of symbols. Thus, the second objective of this paper is to design algorithms for multimedia document processing, recognition, and retrieval.

2. Infrastructure for mobile applications
Modern mobile devices have Web browsers and provide interactive content via WAP or HTTP. Therefore, a simple way to develop language learning systems is a client-server architecture that adapts user interfaces to the resources of a particular mobile device. Three solutions are known for providing users with Web pages suited to a limited screen: methods that summarize Web pages, tools that convert HTML into WML (Wireless Markup Language), and languages for creating Web interfaces that can be accessed from any device [4]. The first two methods are oriented more toward textual than visual documents. The third approach exploits XML-based markup languages to create generic interfaces that separate the interface description from the application and support scalable interface design. The most representative language of this approach is UIML (User Interface Markup Language), which describes user interfaces in a device-independent manner; these descriptions are automatically mapped to the language used by the target device (Java, HTML, VoiceXML, or WML) [5]. This approach is just one of many possible implementations of the wider Service-Oriented Architecture (SOA) model. Another well-known approach is based on a port-to-port architecture (PPA).

According to Warren and Bishop [6], on the server side of a Java-based system, JRMP (Java Remote Method Protocol) is in certain situations more useful and effective than simple HTTP-based systems. Technologies based on the J2ME platform are characterized by code execution on the mobile terminal. The advantage of PPA is that many types of connectors exist, for JavaBeans, JMX, RMI, or Jini, making it possible to connect to running services without writing any server-side code and to deliver information to mobile stubs via a single port open to the Internet. After evaluating SOA- and PPA-based networking technologies, we conclude that both are appropriate for designing language learning environments assisted by mobile devices.

3. Portable Personal Spaces on SOA
In language learning applications, the concept of a personal space usually refers to a virtual knowledge area through which users can manage resources and services, select relevant documents, and organize information according to their individual needs and preferences. The proposed Portable Personal Spaces (PoPS) have a highly scalable architecture that provides mobile access to resources and services by exploiting XML for the user interface [5]. PoPS consists of three components: a generic personal space, a converter, and an interface generator (Fig. 1). The generic personal space contains high-level descriptions of the user interface, written in a device-independent manner using XML syntax. The converter transforms these interface descriptions into code written in the language used by the target device (XHTML for PDAs or WML for WAP phones). It has two components: XSL stylesheets and an XSLT engine. The XSLT engine transforms an XML file into WML or XHTML according to the instructions provided by the XSL stylesheet.

The interface generator (IG) contains JSP files to build interfaces. The JSP files use JavaBeans to perform searches and to update the personal space configuration. To build the interface of a personal space, the IG receives user parameters and the device specification and retrieves the user's personal space configuration from the database, for example, the latest chapters in a language learning support application. When a user performs a search, the IG receives the parameters and forwards them to the respective information retrieval service. The service sends back the search results, and the IG composes the appropriate interface to present them to the user by wrapping the results into code of the device language. Finally, the interfaces containing the search results are sent to the user. If users modify their personal space configuration, the IG stores these changes in the database. Mobile users can access PoPS from the designed WAP emulator or from a PDA using the corresponding generic interfaces.
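In PoPS the wrapping of results into device markup is performed by the XSL stylesheets, the XSLT engine, and the JSPs. As a hedged illustration only, the Python sketch below imitates that final step, wrapping plain search results into a WML deck for a WAP phone or an XHTML page for a PDA, using the standard-library ElementTree instead of XSLT. The function `wrap_results`, the device labels `"wap"` and `"pda"`, and the chosen element layout are hypothetical and are not taken from the prototype.

```python
import xml.etree.ElementTree as ET
from typing import List


def wrap_results(results: List[str], device: str) -> str:
    """Wrap plain search results in the markup of the target device.

    Hypothetical device labels: "wap" -> WML deck, "pda" -> XHTML page.
    """
    if device == "wap":
        root = ET.Element("wml")
        card = ET.SubElement(root, "card", id="results", title="Results")
        for r in results:
            ET.SubElement(card, "p").text = r  # one WML paragraph per hit
    elif device == "pda":
        root = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
        body = ET.SubElement(root, "body")
        ul = ET.SubElement(body, "ul")
        for r in results:
            ET.SubElement(ul, "li").text = r  # one XHTML list item per hit
    else:
        raise ValueError(f"unsupported device: {device}")
    # encoding="unicode" returns a str and keeps kanji characters unescaped
    return ET.tostring(root, encoding="unicode")
```

For example, `wrap_results(["木: tree"], "wap")` yields a single-card WML deck whose only paragraph is `木: tree`. Keeping the result list device-neutral and choosing the markup only at the end mirrors the separation of interface description from application described above.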

4. Image-based script recognition
Learning languages based on complex symbols requires image processing. There are many systems for automatic image recognition based on the analysis of low-level image characteristics such as color, texture, or shape [7], [8]. Recent methods for global image description, indexing, and pattern recognition, such as elasticity correspondence, the curvature scale space approach, B-splines, chain codes, Fourier and wavelet spectral descriptors, neural networks and fuzzy classifiers, and statistical and predictive algorithms, are sometimes too complex for real-time applications [7], [9].

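Figure 1. Context diagram of the designed Portable Personal Space (components shown: generic personal space with XML documents; converter with XSL stylesheets and XSLT engine producing WML for WAP phones and XHTML for PDAs; interface generator with JSPs and JavaBeans on the Web server; information retrieval services; database accessed via JDBC; WAP emulator)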

That is why methods from the content-based image retrieval (CBIR) area may be applied to image-based script recognition in a language learning environment. The novelty of this approach lies in manipulating image characteristics instead of the textual descriptions used in well-known translators. These approaches use multiple feature vectors that describe low-level image characteristics as the image content. Well-known systems that may serve as prototypes for novel approaches (QBIC, AMORE, Virage, SQUID) usually retrieve images, graphics, and video data from online collections using low-level image features such as color, texture, or shape [8], [10].

The proposed method may be introduced as a combination of specific descriptors of symbol shape that are invariant to scale, rotation, and illumination. Shape is traditionally a highly informative feature, described by a set of segments that may be extracted by any well-known method. However, this representation of a symbol is not convenient for calculating how similar the symbol is to another one: any change in scale or rotation makes the comparison too complex. That is why the Two-Segment Turning Function (2STF) is proposed for computing the similarity between two shapes. The shape of a symbol is represented by a step function: the steps on the x-axis represent the normalized length of each segment, and the y-axis represents the turning angle between two selected segments, as shown in Fig. 2. Segment 1 is rotated so that it coincides with the x-axis, and segment 2 is translated so that their intersection point lies at the origin of the coordinate system. Finally, the angle θ between the x-axis and segment 2 is computed, which provides invariance to rotation. The 2STFs of a Mayan glyph and a Japanese kanji are presented in Fig. 3. The advantage of this approach is invariance to translation, scaling, reflection, and rotation.

The 2STF is built from the relative position between segments (the turning angle) and the relative length of each segment (relative to the accumulated length of all segments in the shape). This yields the same representation for a set of symbols even when they are placed in different positions, reflected, or rotated.
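The construction described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the symbol outline has already been reduced to a closed polygon of vertices (segment extraction is out of scope), it takes the turning angle between each segment and the next one, and it uses the unsigned angle so that reflected shapes produce the same function. The function name `two_segment_turning_function` and the closed-contour convention are our assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Step = Tuple[float, float, float]  # (x_start, x_end, turning angle in radians)


def two_segment_turning_function(contour: List[Point]) -> List[Step]:
    """Build a 2STF-style step function for a closed polygonal contour.

    Step i spans segment i's share of the total contour length on the x-axis;
    its height is the unsigned turning angle from segment i to segment i+1.
    """
    n = len(contour)
    if n < 3:
        raise ValueError("a closed contour needs at least three vertices")
    # Direction vector of every segment, wrapping from the last vertex to the first.
    dirs = [(contour[(i + 1) % n][0] - contour[i][0],
             contour[(i + 1) % n][1] - contour[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in dirs]
    total = sum(lengths)
    steps: List[Step] = []
    x = 0.0
    for i in range(n):
        (ax, ay), (bx, by) = dirs[i], dirs[(i + 1) % n]
        # Rotating segment 1 onto the x-axis and measuring segment 2 from the
        # common origin is the same as taking the angle between the directions.
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        theta = abs(math.atan2(cross, dot))  # unsigned, so mirror images agree
        width = lengths[i] / total           # normalized segment length
        steps.append((x, x + width, theta))
        x += width
    return steps
```

For a unit square the function returns four steps of width 0.25, each with height π/2. Because only length ratios and angles between directions enter the result, translating, uniformly scaling, rotating, or mirroring the contour leaves the steps unchanged up to floating-point rounding.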

Figure 2. Arbitrary angles between segments.

Figure 3. 2STFs of Mayan and Japanese symbols

The similarity between two shapes is evaluated by computing the difference between their 2STFs. The functions are scaled to the same length, and then one curve is shifted to obtain the minimum difference (the shaded area, which represents how similar the two shapes are), as shown in Fig. 4.
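A minimal sketch of this matching strategy follows, assuming both 2STFs are step functions on [0, 1] given as `(x_start, x_end, θ)` triples. Both functions are resampled at the same number of points (our way of scaling them to the same length), one is cyclically shifted, and the smallest mean absolute difference approximates the shaded area. Uniform resampling, the cyclic shift (justified only if the contour's starting segment is arbitrary), and the names `sample_steps` and `tstf_distance` are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import List, Tuple

Step = Tuple[float, float, float]  # (x_start, x_end, theta)


def sample_steps(steps: List[Step], n: int) -> List[float]:
    """Sample a step function defined on [0, 1] at n uniformly spaced midpoints."""
    values = []
    j = 0
    for k in range(n):
        x = (k + 0.5) / n  # midpoint of the k-th cell
        while j < len(steps) - 1 and x >= steps[j][1]:
            j += 1  # advance to the step that covers x
        values.append(steps[j][2])
    return values


def tstf_distance(f: List[Step], g: List[Step], n: int = 256) -> float:
    """Approximate minimum area between two 2STFs over all cyclic shifts of g."""
    a = sample_steps(f, n)
    b = sample_steps(g, n)
    best = math.inf
    for shift in range(n):
        # Mean absolute difference approximates the area between the curves.
        area = sum(abs(a[k] - b[(k + shift) % n]) for k in range(n)) / n
        best = min(best, area)
    return best
```

Two functions that differ only by where the contour starts, such as `[(0, 0.5, 1.0), (0.5, 1.0, 2.0)]` and `[(0, 0.5, 2.0), (0.5, 1.0, 1.0)]`, have distance 0. The exhaustive search over all n shifts costs O(n²) operations per pair of shapes, which illustrates why this kind of matching becomes slow for symbols with many segments.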

Figure 4. Matching strategy for two 2STFs

The disadvantage of the 2STF approach is the significant time required to find the best correspondence between two shapes. Experimentally, we determined that this time reaches several seconds for symbols with more than fifty segments. This time is more critical for Mayan glyphs than for kanji, and it may be reduced by applying another matching strategy, for example, Star Field shape matching [11]. Our experiments with the proposed approach show that it can recognize patterns with a high level of variation in symbol representation, can work with noisy images without reducing recognition efficiency, and may reduce the recognition error by varying the membership degree.

5. Image-based language learning system
After analyzing and testing several networking architectures used in language learning systems, we adopted the SOA-based approach because of its wide acceptance among developers. To operate on visual data, the following services based on the proposed 2STF approach have been designed: image processing, indexing, retrieval, and interpretation. The block diagram of the designed prototype for image-based language learning assisted by mobile devices is shown in Fig. 5. The input query on the mobile device (client) may be either an image of a symbol to translate or a textual description of the symbol's meaning used for retrieval. The client may connect to the Web server using "wrappers"; however, a simpler approach has been applied. It requires no specific support for wrappers: the server receives queries as binary strings, calls the services corresponding to the query type (visual or textual), collects the search results, and returns an XML document with the retrieved information to the client.

A visual query is processed by the feature extraction module, which defines the shape of the symbol by a set of segments. Indexing and feature vector generation are provided by the 2STF approach. The similarity between the feature vectors of the analyzed symbol and the reference symbols is computed using a pre-processed image database (DB). If symbols with the highest degree of similarity are found in the image DB, the descriptions corresponding to these images are selected from the descriptive name space (vocabulary) DB. The textual descriptions in the name space DB consist of keywords linking the pre-processed images to the symbol content. Separating textual and visual data has advantages for data access, management, registration, and organization: it speeds up the matching process and reduces the number of iterations in the search. Finally, the retrieved images with their interpretation are sent to the response generator, which returns the XML document to the mobile device. If a query is a textual description, the visual representation of the symbol corresponding to that description is retrieved from the pre-processed image DB.

In the general case of CBIR, the image and name space DBs have a specific organization. For the analysis of Mayan symbols or kanji there is no clustering into particular domains; the semantics is established by a direct relationship between the set of segments defining a symbol's shape and the meaning of that symbol. This facilitates the implementation of a user-oriented vocabulary in the name space DB and of the corresponding images in the pre-processed image DB.

The performance of the system has been evaluated using the precision and recall metrics traditionally used for CBIR systems:

$$\mathrm{PRECISION} = \frac{|A|}{|B|}, \qquad \mathrm{RECALL} = \frac{|A|}{|C|},$$

where A is the set of relevant images retrieved by the system, B is the set of all (relevant and irrelevant) images retrieved by the system for a particular query, and C is the set of all relevant images in the collection. Recall is the proportion of all relevant images in the database that are retrieved for a query; precision is the proportion of retrieved images that are relevant to the query. Experiments have been carried out using an online image collection with Mayan glyph descriptors and Japanese kanji translators. Fig. 6 shows the average recall and precision for image retrieval. The x-axis represents the test number, each test consisting of 20 queries; the y-axis shows the average recall and precision computed according to the previous equation.
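Both metrics can be computed directly from the sets A, B, and C defined above. In this minimal sketch (the function names are hypothetical), B is the set of images returned for one query, A is its intersection with the relevant set C, and a query that returns nothing is assigned a precision of 0, which is our own convention since the equation is undefined in that case. `average_pr` averages the per-query values over one test, the way each point in Fig. 6 averages 20 queries.

```python
from typing import Iterable, List, Set, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Set[str]) -> Tuple[float, float]:
    """PRECISION = |A|/|B| and RECALL = |A|/|C| for a single query."""
    b = set(retrieved)   # B: every image the system returned
    a = b & relevant     # A: the relevant images among those returned
    precision = len(a) / len(b) if b else 0.0  # convention for empty results
    recall = len(a) / len(relevant) if relevant else 0.0
    return precision, recall


def average_pr(queries: List[Tuple[List[str], Set[str]]]) -> Tuple[float, float]:
    """Average precision and recall over the queries of one test."""
    pairs = [precision_recall(ret, rel) for ret, rel in queries]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Four glyphs returned, three relevant in the collection, two of them found:
print(precision_recall(["g1", "g2", "g3", "g4"], {"g1", "g2", "g5"}))  # (0.5, 0.666...)
```

Note how the two metrics pull in different directions: returning more candidates can only keep or raise recall for a query, while every irrelevant candidate lowers precision.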

Figure 5. Block diagram of the image-based language learning system for mobile devices (client: query generation, connection to server, document visualization; wireless router, 802.11b; host/server, 802.3: connection with client, visual and textual queries, connection to visual data processing services, feature vector extraction (segments and shape), 2STF shape indexing, similarity of visual and textual feature vectors, pre-processed images DB, descriptive name space (vocabulary) DB, retrieved symbol interpretation and description, generator of response returning image and text)
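The query path in the block diagram of Fig. 5 can be illustrated with a small server-side dispatcher. This is a sketch under stated assumptions, not the prototype's code: the paper only states that queries arrive as binary strings, that visual and textual queries go to different services, and that an XML document is returned. The one-byte type prefix (`V` for visual, `T` for textual), the `handle_query` function, the service callables, and the XML element names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Match = Dict[str, str]  # hypothetical keys: "id", "image", "meaning"


def handle_query(
    raw: bytes,
    visual_service: Callable[[bytes], List[Match]],
    textual_service: Callable[[str], List[Match]],
) -> str:
    """Dispatch a binary query to the matching service and wrap results in XML.

    Hypothetical wire format: one type byte (b"V" or b"T") followed by the payload.
    """
    kind, payload = raw[:1], raw[1:]
    if kind == b"V":
        # Image bytes -> 2STF matching against the pre-processed image DB
        results, label = visual_service(payload), "visual"
    elif kind == b"T":
        # Meaning keywords -> lookup in the name space DB
        results, label = textual_service(payload.decode("utf-8")), "textual"
    else:
        raise ValueError(f"unknown query type: {kind!r}")
    # Response generator: one XML document regardless of the query type
    root = ET.Element("response", type=label)
    for item in results:
        symbol = ET.SubElement(root, "symbol", id=item["id"])
        ET.SubElement(symbol, "image").text = item["image"]
        ET.SubElement(symbol, "meaning").text = item["meaning"]
    return ET.tostring(root, encoding="unicode")
```

With stub services, `handle_query(b"Tsun", visual_stub, textual_stub)` returns a `<response type="textual">` document containing whatever symbols the textual stub reports for "sun". Passing the services in as callables keeps the dispatcher independent of how the image and name space databases are actually queried.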

Figure 6. Average recall and precision: a) Average Recall; b) Average Precision (x-axis: test number, 1 to 19, each test with 20 queries; y-axis: average recall in panel a, average precision in panel b)

When the number of images increases, recall and precision also increase, because there are more possibilities for applying the similarity metrics over a larger number of shapes with similar segment orientation. The average recall and precision values for this experiment lie in the intervals 0.65 to 0.85 and 0.38 to 0.52, respectively. The experiments show that the proposed approach is rigorous enough and does not accept a wide range of images as retrieval candidates.

6. Conclusions
Considering that portable devices have limited resources and restricted processing and networking capabilities, interfaces for data access have been optimized using portable personal spaces, a highly scalable architecture that provides mobile access to digital collections using XML technologies. Both client/server and port-to-port networking technologies are appropriate for integrating mobile devices into distributed environments. The image-based language learning system assisted by mobile devices has been designed using the proposed 2STF approach, which permits simple and fast image processing, recognition, and retrieval based on symbol segment/shape indexing. The disadvantages of the proposed 2STF are the errors introduced during spatial sampling and generation of image feature vectors, as well as the amount of system memory required on the server side. Images with significant occlusions between symbols, weak borders, or complex backgrounds are not recommended for this application. Finally, we conclude that the presented results could be considered an alternative way of developing visual information retrieval facilities in wireless environments, particularly for language learning applications that use image-based scripts assisted by mobile devices.

7. References
[1] L. Lockyer, S. Bennett, Handbook of Research on Learning Design and Learning Objects: Issues, Applications and Technologies, Information Science Reference, 2008.
[2] G. M. Chinnery, Mobile Assisted Language Learning, Language Learning & Technology, Vol. 10, No. 1, 2006, pp. 9-16.
[3] M. Sharples, J. Taylor, G. Vavoula, Towards a Theory of Mobile Learning, m-Learn, 2005, URL: http://www.mlearn.org.za/CD/papers/Sharples- Theory of Mobile.pdf
[4] P. Smith, Networks Guide to WAP & WML, Jaico Book House, 2002.
[5] N. Castellanos, G. Rivera, Exploration of Network Technologies for Mobile Data Access in Digital Libraries, WSEAS Transactions on Communications, Vol. 3, Issue 1, 2004, pp. 104-109.
[6] N. Warren, P. Bishop, Taking Service-Oriented Architectures Mobile, Part 1: Thinking Mobile, URL: http://today.java.net/pub/a/today/2005/06/21/mobile1.html
[7] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Prentice Hall, 2007.
[8] L. Flores, O. Starostenko, Wavelets vs Shape-Based Approaches for Image Indexing and Retrieval, in T. Sobh (Ed.), Novel Algorithms and Techniques in Telecommunications, Automation and Industrial Electronics, Springer, USA, 2008.
[9] O. Starostenko, J. Rodríguez, Shape Indexing and Retrieval: a Hybrid Approach Using Ontological Descriptions, in K. Elleithy (Ed.), Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering, Springer, 2008.
[10] Q. Iqbal, Content Based Image REtrieval System, Computer and Vision Research Center, University of Texas at Austin, 2007, URL: http://amazon.ece.utexas.edu/~qasim/research.htm
[11] A. Chávez-Aragon, O. Starostenko, Star Fields: Improvements in Shape-Based Image Retrieval, Research in Computing Science, IPN, Mexico, Vol. 27, 2007, pp. 79-90.
