Leibniz University of Hannover
Faculty of Computer Science
Institute of Distributed Systems, Knowledge Based Systems

Master of Science Thesis

Building a Gateway from Text Editing in LaTeX to RDF

by

Peyman Nasirifard

Main advisor: Prof. Dr. Nicola Henze
Second advisor: Prof. Dr. Wolfgang Nejdl

Hannover, October 2006

Abstract

The Semantic Web aims to help machines understand concepts, their properties, and the relations between them, so that machines can derive and present new information from existing, well-defined information. This is not merely a vision: it has been realized with the aid of several technologies, languages and standards, such as the Resource Description Framework (RDF) and ontologies. LaTeX, a free, open-source typesetting system based on TeX, was originally designed for mathematicians and is now used by many students, professors and scientists to prepare their reports, papers and documents. Its package-based architecture makes it easily extensible. Gathering metadata from a LaTeX source file and generating an RDF document that reflects the file's structure and elements allows end users to access different parts of the source file with the query languages designed for RDF. In this Master of Science thesis, several practical algorithms for transforming a LaTeX document into RDF are proposed. These algorithms use dynamic XSL templates to translate an XML document into RDF, and they have been implemented in an application named “latex2rdf”, which acts as a general LaTeX-to-RDF converter. The thesis also addresses the visualization of different parts of a LaTeX source file.

I hereby declare that this Master of Science thesis, “Building a Gateway from Text Editing in LaTeX to RDF”, is my own work and that I have used no sources or aids other than the references explicitly cited in this thesis.

Peyman Nasirifard
Hannover, October 20, 2006

Contents

1 Introduction
  1.1 Motivation
  1.2 Structure of this Thesis
2 Semantic Web, XSL and Related Technologies
  2.1 Semantic Web
    2.1.1 Metadata
      2.1.1.1 Dublin Core Metadata
    2.1.2 Ontology
    2.1.3 Semantic Web Tower
      2.1.3.1 Unicode and URI Layer
      2.1.3.2 XML and XML Schema Layer
      2.1.3.3 RDF and RDF Schema Layer
      2.1.3.4 Ontology Vocabulary Layer
      2.1.3.5 Logic Layer
      2.1.3.6 Proof Layer
      2.1.3.7 Trust Layer
      2.1.3.8 Digital Signature and Encryption
  2.2 A Deeper Look at RDF
    2.2.1 RDF Model
    2.2.2 RDF Graph
    2.2.3 RDF Triples
    2.2.4 Validating RDF
    2.2.5 Query Languages for RDF
      2.2.5.1 RDQL
      2.2.5.2 SPARQL
  2.3 XSL Family
    2.3.1 XSLT
    2.3.2 XSL-FO
    2.3.3 XPath
  2.4 Discussion and Conclusion
3 LaTeX and its Family
  3.1 LaTeX
    3.1.1 MiKTeX
    3.1.2 LaTeX Documents in Different Formats
      3.1.2.1 LaTeX to PDF
      3.1.2.2 LaTeX to HTML
  3.2 BibTeX
  3.3 An Overview of LaTeX Tools
4 Extracting Metadata, Generating RDF and XSL-FO
  4.1 Problems
    4.1.1 Extracting Metadata and Generating RDF from LaTeX
    4.1.2 Querying Generated RDF
    4.1.3 Generating Human Understandable Format from LaTeX
  4.2 Solutions
    4.2.1 Extracting Metadata and Generating RDF from LaTeX
      4.2.1.1 Generating XML from LaTeX
      4.2.1.2 Transforming XML into RDF
      4.2.1.3 Algorithm for Generating Dynamic XSL Templates
      4.2.1.4 Algorithm for Generating ID
      4.2.1.5 LaTeX Document Ontology
    4.2.2 Generating Dynamic Queries from RDF
    4.2.3 Two Ways for Querying Generated RDF
    4.2.4 Generating XSL-FO from LaTeX
    4.2.5 Generating Human Understandable Format from XSL-FO
      4.2.5.1 LaTeX Off
      4.2.5.2 LaTeX On
  4.3 Discussion and Conclusion
5 Implementation
  5.1 Methodology
    5.1.1 Iterations
    5.1.2 Timetable
  5.2 Queries
    5.2.1 Dynamic Queries
      5.2.1.1 Element Queries
      5.2.1.2 Numeric Queries
    5.2.2 Static Queries
  5.3 Graphical User Interface
  5.4 Output
  5.5 Configuration
  5.6 Sequences
    5.6.1 Sequence of Generate RDF Use Case
    5.6.2 Sequence of Generate Query Use Case
    5.6.3 Sequence of Execute Query Use Case
    5.6.4 Sequence of Generate XSL-FO Use Case
    5.6.5 Sequence of Generate PDF Use Case
  5.7 Source Code
    5.7.1 Packages and Classes
    5.7.2 License
    5.7.3 Installation
  5.8 Lessons Learned
  5.9 Main Tools
    5.9.1 Eclipse
    5.9.2 Protégé
    5.9.3 Exchanger XML Editor
    5.9.4 TeXnicCenter
  5.10 Main Third Party Packages
    5.10.1 Jena
    5.10.2 ARQ
    5.10.3 JDOM
    5.10.4 Apache FOP
    5.10.5 Xalan-Java
  5.11 Testing Solutions with an Example
    5.11.1 Input File
    5.11.2 Results
      5.11.2.1 A Deeper Look at One Element
      5.11.2.2 Visualization of an Element
  5.12 Discussion and Conclusion
6 Summary
Bibliography
A General Example
B Description of the Attached CD-ROM
List of Symbols and Abbreviations
List of Figures
List of Tables

Acknowledgements

I would like to thank Prof. Dr. Nicola Henze for her support and guidance during this work. Besides being a nice professor and advisor at Leibniz University of Hannover, she is also a very kind friend to her students. I would also like to thank all the people who supported this work by providing free/open-source tools and information.


Chapter 1

Introduction

1.1

Motivation

Since Johannes Gutenberg invented the printing press, people have continually tried to improve the quality of what they print, and many devices and technologies have been developed for this purpose. With the advent of computers and electronic typesetting systems, this effort turned to improving the quality of electronic typesetting. LaTeX is an open-source, extensible typesetting system with many useful and highly flexible features. Nowadays, many students, professors and scientists use it to prepare documents, papers and even books. Its extensibility through external packages, which can be seen as plug-ins, makes LaTeX a powerful typesetting system, and LaTeX compilers exist for different operating systems.

The Semantic Web aims to help machines understand and process concepts, their properties and the relations between them, so that machines can derive and present new information from existing information. The Semantic Web is not Artificial Intelligence (AI), but it can be seen as adding a kind of intelligence to the classical Web. Its basic building blocks are metadata and ontologies: metadata is simply data about data, and ontologies define the vocabularies that are used in different domains. Metadata can be defined for every resource (entity or concept), and LaTeX documents are no exception. One way to represent concepts, their properties and their relationships is to describe them in a document using the Resource Description Framework (RDF), another building block of the Semantic Web.

The goal of this Master of Science thesis is to embed LaTeX into the Semantic Web. In other words, the main task is to build a gateway that transforms a plain LaTeX document into RDF, which is fully machine-processable. The generated RDF should follow a LaTeX document ontology. After the RDF has been generated, end users are able to query it and browse the results. The other part of the work focuses on the human side: from a plain LaTeX document, XSL-FO is generated and finally transformed into Portable Document Format (PDF) by means of third-party packages.

1.2

Structure of this Thesis

Chapter 2 gives an introduction to the Semantic Web, XSL and related technologies. Chapter 3 covers the LaTeX typesetting system, BibTeX, and LaTeX tools and packages. Chapter 4 addresses the main problems of this thesis together with my solutions, algorithms and approaches for solving them. Chapter 5 discusses implementation issues such as methodology, timetable and source code, and introduces the tools and packages that I used in my work. Finally, Chapter 6 summarizes the thesis.

Chapter 2

Semantic Web, XSL and Related Technologies

In this chapter, the basic technologies used in this thesis are introduced. I will define the Semantic Web, describe its effect on the current Web, and show how it can make more of the current Web machine-understandable. I will also explain the Semantic Web tower proposed by members of the World Wide Web Consortium (W3C). I will then take a deeper look at the RDF model, graph and triples, and finally introduce the XSL family, covering XSLT, XSL-FO and XPath. A conclusion ends the chapter.

2.1

Semantic Web

Since the birth of the Internet and the World Wide Web (WWW), many efforts have been made and many technologies have been developed to make the Web better, faster and more intelligent. One technology appeared after another, and proposals became standards in a short time. One of these efforts is the Semantic Web, which can be seen as an extension of the current Web. The Semantic Web is not Artificial Intelligence. Tim Berners-Lee, the creator of the World Wide Web and of the first hypertext-enabled browser, says [50]: “The concept of machine-understandable documents does not imply some magical artificial intelligence which allows machines to comprehend human mumblings. It only indicates a machine’s ability to solve a well-defined problem by performing well-defined operations on existing well-defined data. Instead of asking machines to understand people’s language, it involves asking people to make the extra effort.”

In the Semantic Web, with the help of other technologies, we try to help machines understand concepts and the relations between them, process them, and answer queries quickly and logically. The Semantic Web can assist the evolution of human knowledge as a whole [51]. Achieving this goal requires several prerequisites, which I outline in the following sections.

2.1.1

Metadata

Metadata are data that describe data. Every entity or concept can have one or more metadata items. For example, my thesis has an author, a title, a supervisor, etc.; these are metadata about my thesis. Metadata simplify classifying and querying data. One of the most important metadata standards used in Semantic Web projects is the Dublin Core metadata standard.

2.1.1.1

Dublin Core Metadata

One of the best-known metadata standards on the Semantic Web is Dublin Core [14]. The name comes from Dublin, Ohio, in the USA, where the 1995 metadata workshop that started the initiative was held. The Dublin Core standard is a small, effective set of elements for describing a wide range of resources. It has two levels: Simple and Qualified. The Simple Dublin Core Metadata Element Set (DCMES) consists of the 15 metadata elements listed below:

• Title: The name given to the resource [55].
• Creator: An entity primarily responsible for making the content of the resource [55].
• Subject: The topic of the content of the resource [55].
• Description: An account of the content of the resource [55].
• Publisher: The entity responsible for making the resource available [55].
• Contributor: An entity responsible for making contributions to the content of the resource [55].
• Date: A date associated with an event in the life cycle of the resource [55].
• Type: The nature or genre of the content of the resource [55].
• Format: The physical or digital manifestation of the resource [55].
• Identifier: An unambiguous reference to the resource within a given context [55].
• Source: A reference to a resource from which the present resource is derived [55].
• Language: A language of the intellectual content of the resource [55].
• Relation: A reference to a related resource [55].
• Coverage: The extent or scope of the content of the resource [55].
• Rights: Information about rights held in and over the resource [55].

I will use some Dublin Core metadata elements in my work. For more information on the Dublin Core metadata standard, refer to [14].
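To show what a Dublin Core description looks like once it is written as RDF, here is a minimal sketch in Java that builds an RDF/XML fragment with three Simple Dublin Core elements. It uses only the JDK's DOM and serializer rather than Jena, so it is an illustration, not the latex2rdf implementation; the resource URI http://example.org/thesis is a hypothetical placeholder, while http://purl.org/dc/elements/1.1/ is the namespace of the Simple Dublin Core element set.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DublinCoreRdf {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Describes one resource with three Dublin Core elements and returns it as RDF/XML. */
    static String describe(String resourceUri, String title, String creator, String date) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(RDF_NS, "rdf:RDF");
            root.setAttributeNS(XMLNS_NS, "xmlns:rdf", RDF_NS);
            root.setAttributeNS(XMLNS_NS, "xmlns:dc", DC_NS);
            doc.appendChild(root);

            // One rdf:Description per resource; rdf:about holds the resource's URI.
            Element desc = doc.createElementNS(RDF_NS, "rdf:Description");
            desc.setAttributeNS(RDF_NS, "rdf:about", resourceUri);
            root.appendChild(desc);

            addLiteral(doc, desc, "dc:title", title);
            addLiteral(doc, desc, "dc:creator", creator);
            addLiteral(doc, desc, "dc:date", date);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter buffer = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(buffer));
            return buffer.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build RDF/XML", e);
        }
    }

    /** Adds a Dublin Core property whose value is a plain literal; the serializer escapes the text. */
    private static void addLiteral(Document doc, Element parent, String qname, String value) {
        Element property = doc.createElementNS(DC_NS, qname);
        property.setTextContent(value);
        parent.appendChild(property);
    }

    public static void main(String[] args) {
        System.out.println(describe("http://example.org/thesis",
                "Building a Gateway from Text Editing in LaTeX to RDF",
                "Peyman Nasirifard", "2006-10"));
    }
}
```

Each Dublin Core element becomes a property of the described resource, so the same element names can be reused for any other resource simply by changing the rdf:about value.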

2.1.2

Ontology

Ontologies are among the most important factors in the success of the Semantic Web. An ontology is a collection of vocabularies for describing a specific domain. A domain is a very general notion: every entity or concept can be regarded as a domain. An ontology includes the classes (or concepts) and properties related to a domain, so ontologies can be seen as machine-understandable classification schemes. As an example, take the domain of books and suppose you want to describe all the vocabulary related to it. You may say: a book has one or more authors; a book has one or more chapters; a book has one ISBN; a book has one or more publishers; a book has a number of pages; a book has one or more editors; and so on. Based on human knowledge, many such sentences can be built. Expressed in an XML-based ontology language, these sentences form an ontology. So far, many people have developed many ontologies in different domains. In the Semantic Web world, the Wine ontology is very well known; it is used for introducing and teaching ontologies and contains a relatively complete classification of wine types. The Wine ontology can be browsed at [58]. A small selection of OWL ontologies can be found at [1]. Some semantic search engines, such as Swoogle [54], can help us search for ontologies and other Semantic Web resources. An important property of ontologies is that they can be imported into other ontologies; for example, the Wine ontology imports a Food ontology, in which several common vocabularies about food are defined. A step-by-step guide to developing ontologies is [37]. For more information on ontologies, refer to [13]. For my thesis, I developed an ontology for LaTeX documents, which I will describe in the following chapters.
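The book sentences above already carry cardinality information (“one or more”, “one”). Purely as an illustration, the following Java sketch (records, Java 16+) stores a few of them as property definitions with minimum and maximum counts and checks a book instance against them. The property names hasAuthor, hasISBN, hasPublisher and hasPages are hypothetical, and a real ontology would state such constraints in an ontology language such as OWL rather than in Java code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BookOntologySketch {
    /** A property of the hypothetical Book class; max < 0 means "no upper bound". */
    record PropertyDef(String name, int min, int max) {}

    // "A book has one or more authors / one ISBN / one or more publishers / a number of pages."
    static final List<PropertyDef> BOOK = List.of(
            new PropertyDef("hasAuthor", 1, -1),
            new PropertyDef("hasISBN", 1, 1),
            new PropertyDef("hasPublisher", 1, -1),
            new PropertyDef("hasPages", 1, 1));

    /** Returns one message per violated cardinality constraint; an empty list means the book conforms. */
    static List<String> check(Map<String, List<String>> book) {
        List<String> problems = new ArrayList<>();
        for (PropertyDef p : BOOK) {
            int n = book.getOrDefault(p.name(), List.of()).size();
            if (n < p.min()) {
                problems.add(p.name() + ": at least " + p.min() + " expected, " + n + " found");
            }
            if (p.max() >= 0 && n > p.max()) {
                problems.add(p.name() + ": at most " + p.max() + " expected, " + n + " found");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<String>> book = Map.of(
                "hasAuthor", List.of("Author A", "Author B"),
                "hasISBN", List.of("isbn-1", "isbn-2"),    // two ISBNs violate "one ISBN"
                "hasPublisher", List.of("Publisher P"));   // hasPages is missing
        check(book).forEach(System.out::println);
    }
}
```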

2.1.3

Semantic Web Tower

In a classical view, the technologies and concepts used in the Semantic Web can be arranged as a tower or stack. The best-known Semantic Web tower (stack), proposed by members of the World Wide Web Consortium (W3C) [2], has seven levels or layers, together with a digital signature component that runs alongside several of them. Figure 2.1 [51] shows this tower.

Figure 2.1: Semantic Web Tower

In the following sections, I will explain each layer of this stack.

2.1.3.1

Unicode and URI Layer

The bottom layer consists of Unicode and Uniform Resource Identifiers (URIs). Unicode provides a universal character encoding for text, and URIs identify each entity or concept by assigning it a unique ID. This ID can be meaningful or meaningless to humans: for example, an ID like AD43F53SDERF34JK is meaningless to a human, whereas an ID like math course is meaningful. The only requirement is that it identifies the resource uniquely. To understand the importance of this layer, imagine a town in which all people have the same first name and surname. What a mess! Bob says to Bob: “How is Bob?”, and Bob replies: “Which Bob?” Without this layer, our tower would collapse; more precisely, the absence of any layer would bring the tower down.

2.1.3.2

XML and XML Schema Layer

Extensible Markup Language (XML) is one of the main technologies and standards of the current Web. With the help of XML, applications are able to exchange data and interoperate with each other. To validate an XML document, we define its structure in a schema, and a document that conforms to this schema is called valid. XML is a very wide topic; for more information on XML, refer to [16].
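The difference between a merely well-formed document and a valid one can be demonstrated with the XML Schema validator that ships with the Java standard library (javax.xml.validation). In this sketch, the schema, which requires a thesis element containing one title followed by one or more author elements, is a hypothetical example and not a schema used in this thesis.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical schema: <thesis> holds exactly one <title> followed by one or more <author>.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='thesis'><xs:complexType><xs:sequence>"
            + "<xs:element name='title' type='xs:string'/>"
            + "<xs:element name='author' type='xs:string' maxOccurs='unbounded'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    /** True if the document is well-formed and valid against XSD; false otherwise. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // Without a custom ErrorHandler, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or well-formed but not valid
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<thesis><title>T</title><author>A</author></thesis>")); // valid
        System.out.println(isValid("<thesis><author>A</author></thesis>"));                 // well-formed, invalid
        System.out.println(isValid("<thesis><title>T</title>"));                            // not well-formed
    }
}
```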


2.1.3.3


RDF and RDF Schema Layer

The Resource Description Framework (RDF) is a language for describing resources, their metadata and their relationships with other resources. RDF Schema, which is itself expressed in RDF, describes the vocabularies (classes and properties) that we use in RDF to describe resources. I will take a deeper look at RDF in the next section.

2.1.3.4

Ontology Vocabulary Layer

I briefly explained in the previous section what ontologies are, but not how they should be represented. Several languages help us describe ontologies in an XML-based format, such as the Web Ontology Language (OWL), which has three sublanguages: OWL Lite, OWL DL and OWL Full. Each sublanguage offers different features. OWL Lite is a subset of OWL DL, which is in turn a subset of OWL Full, and reasoning becomes harder as expressiveness grows: it is easier to reason about OWL Lite than about OWL DL, and about OWL DL than about OWL Full. According to [43], RDF documents will generally be in OWL Full, unless they are specifically constructed to be in OWL DL or Lite. For more information regarding OWL, refer to [43]. RDF itself can also be regarded as an ontology vocabulary language.

2.1.3.5

Logic Layer

The logic layer is one of the most important layers of the Semantic Web stack. In this layer, logical statements are defined using logical connectives (such as NOT and AND) and first-order logic. These rules model the system.

2.1.3.6

Proof Layer

This layer is where reasoning happens: the rules stated in the logic layer are applied to the available facts, and the resulting conclusions are produced.

2.1.3.7

Trust Layer

The trust layer plays an important role in this architecture. It sits on top of the other layers and is meant to guarantee that the parties involved are trusted.

2.1.3.8

Digital Signature and Encryption

Digital signatures and encryption can be used to make the architecture more secure; without a robust security architecture, the tower would fall down. This was only a brief overview of the Semantic Web and the so-called Semantic Web tower. For more information regarding the Semantic Web tower, refer to [19] or [32].
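The interplay between the logic layer (rules) and the proof layer (derived conclusions) can be illustrated with a very small rule engine. This Java sketch (records, Java 16+) applies two RDFS-style rules, transitivity of rdfs:subClassOf and inheritance of rdf:type along rdfs:subClassOf, to a set of triples until no new triple can be derived. It is a sketch of rule-based inference only, not of the proof and trust machinery of the upper layers, and the ex: class names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class ForwardChaining {
    record Triple(String s, String p, String o) {}

    static final String TYPE = "rdf:type";
    static final String SUB_CLASS_OF = "rdfs:subClassOf";

    /** Applies the two rules repeatedly until a fixpoint is reached and returns all known triples. */
    static Set<Triple> infer(Set<Triple> facts) {
        Set<Triple> known = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            Set<Triple> derived = new HashSet<>();
            for (Triple a : known) {
                for (Triple b : known) {
                    // Both rules need a second triple (B rdfs:subClassOf C) whose subject is a's object.
                    if (!b.p().equals(SUB_CLASS_OF) || !a.o().equals(b.s())) {
                        continue;
                    }
                    if (a.p().equals(SUB_CLASS_OF)) {
                        // (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                        derived.add(new Triple(a.s(), SUB_CLASS_OF, b.o()));
                    } else if (a.p().equals(TYPE)) {
                        // (x type B), (B subClassOf C) => (x type C)
                        derived.add(new Triple(a.s(), TYPE, b.o()));
                    }
                }
            }
            // Stop when nothing new was derived; this terminates because the rules
            // only recombine terms that already occur in the facts.
            changed = known.addAll(derived);
        }
        return known;
    }

    public static void main(String[] args) {
        Set<Triple> facts = Set.of(
                new Triple("ex:myThesis", TYPE, "ex:MasterThesis"),
                new Triple("ex:MasterThesis", SUB_CLASS_OF, "ex:Thesis"),
                new Triple("ex:Thesis", SUB_CLASS_OF, "ex:Document"));
        infer(facts).forEach(System.out::println);
    }
}
```

From three stated facts, the rules derive that ex:myThesis is also an ex:Thesis and an ex:Document, which is the kind of new information the text says machines should be able to conclude.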


2.2


A Deeper Look at RDF

In this section, I describe RDF in more detail, because most of my thesis relies on it. I will explain the RDF graph and the RDF model, and I will also look at the RDF query languages that I use in my work. The first question is why we need RDF at all. Why not XML? Tim Berners-Lee gives a good answer to this question. According to his comments [49], the mapping from XML documents to semantic graphs is many-to-one, and a schema is also needed to know what the mapping is. Therefore, deriving a unique semantic model directly from XML is hard or even impossible. The solution is to build another layer over XML that uniquely defines this semantic model, and this layer is the RDF layer. Its place can be seen in the Semantic Web tower in the previous section.

2.2.1

RDF Model

The term RDF model is heard very often. What does the RDF model give us, and why is it useful? The RDF model is simply the semantic, or conceptual, view of RDF. To make the RDF model concrete, an RDF graph can be drawn. In my view, the terms RDF model and RDF graph can be used for the same purpose.

2.2.2

RDF Graph

An RDF graph can be built from an RDF document. Figure 2.2 shows a simple RDF graph that describes an excerpt of my thesis. From this graph, we can build several simple sentences, such as “The title of the thesis is Building a Gateway from Text Editing in LaTeX to RDF” or “The author of the thesis is Peyman Nasirifard”. We can also extract so-called triples from this graph; the next section explains triples in more detail.

2.2.3

RDF Triples

An RDF graph is a set of triples. Each triple is composed of a subject, a predicate and an object. Subjects are URIs or blank nodes and predicates are URIs, whereas an object can be a URI, a blank node or a literal value. From the graph in Figure 2.2, the triples in Listing 2.1 can be extracted.

Listing 2.1: Triples of Previous RDF Graph
(Thesis, hasTitle, Building a Gateway from Text Editing in LaTeX to RDF)
(Thesis, hasAuthor, Peyman Nasirifard)
(Thesis, hasAdvisor, Prof. Dr. Nicola Henze)
(Thesis, hasDate, October 2006)


Figure 2.2: A Simple RDF Graph
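For a machine to process the triples of Listing 2.1, the bare names have to become URIs. A minimal Java sketch (records, Java 16+) that writes them in the line-based N-Triples syntax is shown below. The namespace http://example.org/thesis# is a hypothetical placeholder, because the text does not give URIs for the thesis vocabulary, and only quotes and backslashes are escaped in literals.

```java
import java.util.List;

class NTriplesSketch {
    record Triple(String subject, String predicate, String object) {}

    // Hypothetical namespace: the text does not define URIs for Thesis, hasTitle, etc.
    static final String NS = "http://example.org/thesis#";

    static final List<Triple> LISTING_2_1 = List.of(
            new Triple("Thesis", "hasTitle", "Building a Gateway from Text Editing in LaTeX to RDF"),
            new Triple("Thesis", "hasAuthor", "Peyman Nasirifard"),
            new Triple("Thesis", "hasAdvisor", "Prof. Dr. Nicola Henze"),
            new Triple("Thesis", "hasDate", "October 2006"));

    /** One N-Triples line: subject and predicate as <URI>, object as a quoted literal, then " .". */
    static String toNTriple(Triple t) {
        // Escape backslashes first, then double quotes; other control characters are not handled here.
        String literal = t.object().replace("\\", "\\\\").replace("\"", "\\\"");
        return "<" + NS + t.subject() + "> <" + NS + t.predicate() + "> \"" + literal + "\" .";
    }

    public static void main(String[] args) {
        LISTING_2_1.forEach(t -> System.out.println(toNTriple(t)));
    }
}
```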

2.2.4

Validating RDF

An RDF document can be validated against several criteria. Besides validity and well-formedness, which RDF inherits from XML, an RDF document can be checked for whether all of its triples are valid; in other words, the RDF model itself is validated. It can also be checked whether all referenced resources exist in the RDF document or some of them have simply been omitted. There exist many online and offline RDF validators, and many XML editors that support RDF can also validate it. For my work, I used the W3C online RDF validator [60]. After an RDF document is uploaded, this validator builds the triples server-side and shows them in the browser. It is not a very powerful validator, but it builds triples successfully and also checks the validity of namespaces.
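One of the checks mentioned above, whether every triple fits the RDF model, can be sketched in a few lines of Java (records, Java 16+). The sketch assumes triples are given as strings and that blank nodes are written with the N-Triples-style prefix _:; it only checks which positions may hold URIs and blank nodes, and is far from a complete validator such as the W3C service.

```java
import java.net.URI;
import java.net.URISyntaxException;

class TripleCheck {
    record Triple(String s, String p, String o) {}

    /** True if the string parses as a URI and has a scheme, i.e. it is not a relative reference. */
    static boolean isAbsoluteUri(String s) {
        try {
            return new URI(s).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /**
     * Checks only the RDF abstract-syntax positions: the subject must be an absolute URI or a blank
     * node (assumed to be written "_:label"), and the predicate must be an absolute URI.
     * Any object (URI, blank node or literal) is accepted.
     */
    static boolean isValid(Triple t) {
        boolean subjectOk = t.s().startsWith("_:") || isAbsoluteUri(t.s());
        return subjectOk && isAbsoluteUri(t.p());
    }

    public static void main(String[] args) {
        String dcTitle = "http://purl.org/dc/elements/1.1/title";
        System.out.println(isValid(new Triple("http://example.org/thesis", dcTitle, "A title")));
        System.out.println(isValid(new Triple("Thesis", dcTitle, "A title"))); // relative subject
    }
}
```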

2.2.5

Query Languages for RDF

The power of a new data model lies in the ability to access its data easily, and RDF, as a new data model, should follow this rule. Ideally, techniques used for one data model can be adapted for use with other models. One of the best-known query languages for relational data is the Structured Query Language (SQL), and its style of querying can be adapted to the RDF model as well. The results of this adaptation are RDF query languages such as RDQL and SPARQL, which borrow SQL's SELECT and WHERE keywords. In the next sections, I take a look at these query languages, which I use in my work.

2.2.5.1

RDQL

RDQL is a simple query language for the RDF data model. Its syntax is very similar to SQL, so people who know SQL can learn RDQL easily. RDQL has several implementations, such as the one in Jena. For more information on RDQL, refer to [44]. Listing 2.2 shows a simple RDQL query.

Listing 2.2: A Simple RDQL Query
SELECT ?x
WHERE (?x, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,
       <http://example.com/someType>)

This query finds all statements in the graph that have the predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type and the object http://example.com/someType. The variable ?x is bound to the label of the subject resource of each such statement, and all such bindings of x are returned. Note that ? introduces a variable but is not part of the variable name.
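Conceptually, evaluating the single triple pattern of Listing 2.2 amounts to scanning the graph for matching statements. The Java sketch below (records, Java 16+) shows this for an in-memory list of triples; it does not describe how Jena actually executes RDQL, and the resources under http://example.com/ are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SingleTriplePattern {
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Evaluates (?x, predicate, object): every matching statement contributes its subject as a binding of x. */
    static List<String> bindX(List<Triple> graph, String predicate, String object) {
        List<String> xs = new ArrayList<>();
        for (Triple t : graph) {
            if (t.p().equals(predicate) && t.o().equals(object)) {
                xs.add(t.s());
            }
        }
        return xs;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
                new Triple("http://example.com/a", RDF_TYPE, "http://example.com/someType"),
                new Triple("http://example.com/b", RDF_TYPE, "http://example.com/otherType"),
                new Triple("http://example.com/c", RDF_TYPE, "http://example.com/someType"));
        System.out.println(bindX(graph, RDF_TYPE, "http://example.com/someType"));
    }
}
```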

2.2.5.2

SPARQL

SPARQL is another query language for RDF, designed by the W3C [2]. It is very similar to RDQL, but there are several differences between them. For example, RDQL writes each triple pattern in parentheses, whereas SPARQL encloses the graph pattern of a WHERE clause in braces. For more information on SPARQL, refer to [48]. Listing 2.3 shows a simple SPARQL query.

Listing 2.3: A Simple SPARQL Query
PREFIX rdfsyntax: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?x
WHERE { ?x rdfsyntax:type <http://example.com/someType> }

This query does exactly what the previous RDQL query does: it binds ?x to every subject whose rdf:type is http://example.com/someType. Only the syntax differs. Unfortunately, several well-known SQL features, such as GROUP BY and aggregate functions like SUM() and COUNT(), are not available in the current RDQL and SPARQL specifications; in my view, this is one of the main shortcomings of RDQL and SPARQL. RDF is a very wide topic. For more information on RDF, its data model, query languages, etc., refer to [44].
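A SPARQL WHERE clause may contain several triple patterns that share variables, and every solution must satisfy all of them at once. The Java sketch below (records, Java 16+) evaluates such a basic graph pattern naively, extending the variable bindings pattern by pattern (a nested-loop join); real query engines use more efficient strategies. The second pattern with dc:title is a hypothetical extension of Listing 2.3, and compact names such as ex:a stand in for full URIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BasicGraphPattern {
    record Triple(String s, String p, String o) {}

    /**
     * Matches one term of a pattern against a value. Terms starting with '?' are variables;
     * the '?' is not part of the variable name, so the binding is stored under the name alone.
     */
    static boolean unify(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String name = term.substring(1);
        String bound = binding.get(name);
        if (bound == null) {
            binding.put(name, value);
            return true;
        }
        return bound.equals(value);
    }

    /** Returns every binding of the variables that satisfies all patterns simultaneously. */
    static List<Map<String, String>> evaluate(List<Triple> graph, List<Triple> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>()); // start with one empty solution
        for (Triple pattern : patterns) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> solution : solutions) {
                for (Triple t : graph) {
                    Map<String, String> candidate = new HashMap<>(solution); // copy: failed matches are discarded
                    if (unify(pattern.s(), t.s(), candidate)
                            && unify(pattern.p(), t.p(), candidate)
                            && unify(pattern.o(), t.o(), candidate)) {
                        extended.add(candidate);
                    }
                }
            }
            solutions = extended;
        }
        return solutions;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
                new Triple("ex:a", "rdfsyntax:type", "ex:someType"),
                new Triple("ex:a", "dc:title", "Thesis A"),
                new Triple("ex:b", "rdfsyntax:type", "ex:someType"),
                new Triple("ex:c", "dc:title", "Thesis C"));
        // { ?x rdfsyntax:type ex:someType . ?x dc:title ?title }
        List<Triple> where = List.of(
                new Triple("?x", "rdfsyntax:type", "ex:someType"),
                new Triple("?x", "dc:title", "?title"));
        System.out.println(evaluate(graph, where));
    }
}
```

Only ex:a satisfies both patterns: ex:b has the right type but no title, and ex:c has a title but the wrong (missing) type.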


2.3

XSL Family

XSL stands for eXtensible Stylesheet Language; it is a family of recommendations for defining the transformation and presentation of XML documents. With the help of XSL, we can access specific elements within an XML document and translate one XML document into another. We can also control the visual presentation of XML documents. In this section, I take a deeper look at the XSL family and its related technologies, introducing three members of the family: XSLT, XSL-FO and XPath.

2.3.1

XSLT

Extensible Stylesheet Language Transformations (XSLT) is an XML language for transforming XML documents into other XML documents (or other text formats). The way it works is straightforward: we define one or more XSL templates that translate a particular XML structure into another format, and we then run a so-called XSLT processor (or engine). There exist many free, open-source XSLT processors for Java, such as Xalan-Java and SAXON. Viewed as a black box, an XSLT processor has two inputs: the input XML file and a stylesheet containing the XSL templates. The processor applies the templates to the input XML file and generates a new file (often, but not necessarily, XML). The input file itself is not changed. Figure 2.3 gives a general overview of this process. For more information on XSLT, refer to [9].

Figure 2.3: Applying XSL templates to input XML file

Listing 2.4 shows a sample XSL template. If we apply this XSL template to Listing 2.5, the output is Listing 2.6.

Listing 2.4: A Simple XSL Template



Listing 2.5: Input XML
<title>Building a Gateway from Text Editing in LaTeX to RDF</title>
Peyman Nasirifard

Listing 2.6: Output XML
Building a Gateway from Text Editing in LaTeX to RDF
Peyman Nasirifard
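The black-box view of an XSLT processor described above can be reproduced with the XSLT processor bundled with the Java standard library (javax.xml.transform), which is derived from Xalan. In this sketch, the stylesheet and the element names thesis, title, author and summary are hypothetical; it shows the two inputs (stylesheet and XML document) and the newly generated output, while the input string itself remains unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: turns <thesis><title/><author/></thesis> into one <summary> element.
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/thesis'>"
            + "<summary author='{author}'><xsl:value-of select='title'/></summary>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies XSL to the input document and returns the generated output. */
    static String transform(String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter output = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(output));
            return output.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<thesis>"
                + "<title>Building a Gateway from Text Editing in LaTeX to RDF</title>"
                + "<author>Peyman Nasirifard</author>"
                + "</thesis>"));
    }
}
```

The {author} attribute value template and xsl:value-of both use XPath expressions to pick parts of the input, which is why every XSLT processor needs an XPath engine.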

2.3.2

XSL-FO

Extensible Stylesheet Language Formatting Objects (XSL-FO), another member of the XSL family, is an XML language for document formatting. An XSL-FO document contains both the data and its formatting information, such as font family, font size and color, in a single document. For a better understanding, we can imagine XSL-FO as a combination of HTML and Cascading Style Sheets (CSS) in one document. Like XSLT processors, there exist so-called XSL-FO processors, whose task is to apply the formatting information to the document and present a readable document to end users. For more information on XSL-FO, refer to [12]. Listing 2.7 shows a simple XSL-FO file. The <fo:root> element is the root element of XSL-FO documents. The <fo:layout-master-set> element contains one or more page templates; in Listing 2.7, it contains only one page template, named my-page. One or more <fo:page-sequence> elements describe the page contents, and their master-reference attribute refers to a page template defined earlier in the <fo:layout-master-set>. Within a page sequence, an <fo:flow> element holds the actual content of the document in <fo:block> elements. XSL-FO files can easily be transformed into other formats such as HTML or PDF.

Listing 2.7: A Simple XSL-FO File
Hello, world!
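The structure just described (root, page templates, page sequence, flow and block) can be made concrete by generating a minimal XSL-FO document in Java with the standard DOM API. This is a sketch, not the thesis's XSL-FO generator: it only writes the FO markup, and turning it into PDF would still require an XSL-FO processor such as Apache FOP. The page template name my-page follows the text; the helper names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MinimalFo {
    static final String FO = "http://www.w3.org/1999/XSL/Format";
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    /** Appends an fo:* element with the given local name to parent and returns it. */
    static Element child(Document doc, Element parent, String localName) {
        Element e = doc.createElementNS(FO, "fo:" + localName);
        parent.appendChild(e);
        return e;
    }

    /** Builds a one-page XSL-FO document whose body is a single block containing text. */
    static String build(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(FO, "fo:root");
            root.setAttributeNS(XMLNS, "xmlns:fo", FO);
            doc.appendChild(root);

            // Page templates: a single simple-page-master named "my-page" with a body region.
            Element masters = child(doc, root, "layout-master-set");
            Element page = child(doc, masters, "simple-page-master");
            page.setAttributeNS(null, "master-name", "my-page");
            child(doc, page, "region-body");

            // Page contents: refer to "my-page" and pour the text into its body region.
            Element sequence = child(doc, root, "page-sequence");
            sequence.setAttributeNS(null, "master-reference", "my-page");
            Element flow = child(doc, sequence, "flow");
            flow.setAttributeNS(null, "flow-name", "xsl-region-body");
            child(doc, flow, "block").setTextContent(text);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Hello, world!"));
    }
}
```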

2.3.3

XPath

XML Path Language (XPath) is a powerful non-XML language for addressing parts of an XML document. It is used together with XSLT to address different parts of an XML document, and it also offers many useful functions for testing nodes and working with strings, numbers and so on. Every XSLT processor therefore needs an XPath engine to select the desired parts of an XML document. Table 2.1 shows several XPath expressions and their results. For more information on XPath, refer to [9].

XPath Expression    Result
*                   Matches any element node
/parent/child[1]    Selects the first child element (named child) of the root element parent
//@att              Selects all attributes named att
//element[@att]     Selects all elements named element that have an attribute named att

Table 2.1: Several XPath Expressions and Their Results
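The expressions of Table 2.1 can be tried out with the XPath engine in the Java standard library (javax.xml.xpath). The small document in this sketch is hypothetical and simply reuses the names from the table. Note one subtlety: as an XSLT match pattern, * matches any element node, but as a path evaluated from the document root it selects only the root element, so the sketch uses //* to select every element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTable {
    // Hypothetical document that reuses the element and attribute names from Table 2.1.
    static final String XML = "<parent>"
            + "<child att='1'>first</child>"
            + "<child>second</child>"
            + "<element att='x'/>"
            + "<element/>"
            + "</parent>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** Number of nodes selected by the expression, evaluated from the document root. */
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(expression,
                    new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    /** String value of the first node selected by the expression. */
    static String text(String expression) {
        try {
            return XPATH.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//*"));               // 5: parent, two child, two element
        System.out.println(text("/parent/child[1]"));   // first
        System.out.println(count("//@att"));            // 2: both att attributes
        System.out.println(count("//element[@att]"));   // 1: only the first element
    }
}
```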

2.4

Discussion and Conclusion

In this chapter, I explained the Semantic Web and XSL technologies and standards that I use in my thesis. I explained what the Semantic Web is, why we use it, and its effect on the current Web. I explained the role of metadata in the Semantic Web, especially the Dublin Core metadata standards used in my thesis. I described ontologies as one of the success factors of the Semantic Web. I covered the Semantic Web tower (stack) and the technologies and standards that play critical roles in it. I explained RDF, one of the main building blocks of the Semantic Web, with its model, graph and triples. I then looked at XSL and its family: I explained XSLT and described how each element of an XML document is accessed using a non-XML language (XPath). I also looked at XSL-FO and its structure, and described the role of XSLT and XSL-FO processors and their requirements. I tried to support most parts with simple examples. In the next chapter, I explain LATEX and its family.

Chapter 3

LATEX and its Family

In this chapter, I will take a look at LATEX and its family members. I will explain what LATEX is and the advantages it offers. The output of compiling a LATEX document can be presented in different formats; I will cover these formats and the tools and packages that exist for producing them. I will also give an overview of the LATEX tools (mostly LATEX converters) that I found during my thesis. Finally, I will take a look at BibTeX, its structure, and how it is used for bibliographies.

3.1

LATEX

In general, typesetting systems fall into two main groups: the first group is so-called "What You See Is What You Get" (WYSIWYG) systems, and the second group separates view from content, i.e., it is not WYSIWYG. [42] calls the second group markup systems. Microsoft Word [35] and Open Office [39] are two examples of the first group, whereas TEX is an example of the second group. TEX is a typesetting system created by Donald Knuth at Stanford University. It is an extensible and portable typesetting system. For more information on TEX and its structure, refer to [42] or [56]. LATEX is a document preparation system for TEX. LATEX is pronounced like "Latesh" or "Latech" and was originally developed for mathematicians. It is implemented as a TEX macro package, i.e., a set of predefined commands. Nowadays, many scientists, researchers, professors and students use LATEX for preparing their documents and papers. I prepared this thesis with LATEX myself and appreciate its advantages: it is comfortable and powerful and supports all the features that I need. [28] offers a nice, user-friendly tutorial on LATEX, from getting started to handling graphics and errors. The main disadvantage of markup systems is that they take more time to learn than WYSIWYG systems: end users must first spend some time getting familiar with the environment and its commands. As the saying goes, "every start is hard." However, as documents grow in size and complexity, LATEX in my experience handles them much better than other typesetting systems like Microsoft Word. LATEX supports typesetting in many areas; [8] is a comprehensive reference on LATEX typesetting in mathematics, graphics and multilingual documents. Listing 3.1 shows a very simple LATEX document.

Listing 3.1: A Simple LATEX Document

    % This is a sample LaTeX file.
    %
    % A '%' character causes TeX to ignore all remaining text on the line,
    % and is used for comments like this one.

    \documentclass{article}        % Specifies the document class.

                                   % The preamble begins here.
    \title{An Example Document}    % Declares the document's title.
    \author{Peyman Nasirifard}     % Declares the author's name.
    \date{September 15, 2006}      % Deleting this command produces today's date.

    \begin{document}               % End of preamble and beginning of text.

    \maketitle                     % Produces the title.

    This is an example document.

    \section{First Section}        % Produces section heading. Lower-level
                                   % sections are begun with similar
                                   % \subsection and \subsubsection commands.
    It is the first section of my document. I can write everything that I want.

    \subsection{First Subsection}  % Produces subsection heading.
    It is a sample subsection. You can even have subsubsection in your document.

    \section{Second Section}
    It is another section of my document. I can put image, draw table, write
    math formulas and much much more in \LaTeX.
    % The \LaTeX command generates the LaTeX logo.

    \section{conclusion}
    I would like to say \LaTeX{} is a very nice typesetting system. I enjoy using it.

    \end{document}                 % End of document.

Listing 3.1 uses several simple LATEX commands. The document has only three sections and one subsection, and its comments explain the individual commands. Many free and commercial LATEX editors exist, but since LATEX documents are plain text, source files can be written in any text editor and then compiled with a LATEX compiler.
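Because LATEX sources are plain text, even a few regular expressions can pull simple metadata and the sectioning structure out of a document such as an abridged Listing 3.1. This Python sketch works under simplifying assumptions (no nested braces, no optional [short title] arguments) and is only an illustration; it is not the approach taken by latex2rdf, which relies on a full LATEX to XML converter as described in Chapter 4.

```python
import re

# \section{...}, \subsection{...} and \subsubsection{...}; a starred form is
# accepted, but optional [short titles] and nested braces are not handled.
SECTIONING = re.compile(r"\\((?:sub){0,2}section)\*?\{([^}]*)\}")
METADATA = re.compile(r"\\(title|author|date)\{([^}]*)\}")


def strip_comments(source):
    """Remove everything from an unescaped % to the end of each line."""
    return "\n".join(re.sub(r"(?<!\\)%.*", "", line) for line in source.splitlines())


def outline(source):
    """Return (metadata dict, list of (command, heading)) in document order."""
    text = strip_comments(source)
    return dict(METADATA.findall(text)), SECTIONING.findall(text)


sample = r"""
\documentclass{article}        % Specifies the document class.
\title{An Example Document}    % Declares the document's title.
\author{Peyman Nasirifard}     % Declares the author's name.
\begin{document}
\section{First Section}        % \subsection and \subsubsection commands.
\subsection{First Subsection}
\section{Second Section}
\section{conclusion}
\end{document}
"""

meta, headings = outline(sample)
print(meta["title"])                                         # An Example Document
print(sum(1 for cmd, _ in headings if cmd == "section"))     # 3
print(sum(1 for cmd, _ in headings if cmd == "subsection"))  # 1
```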

3.1.1

MiKTeX

MiKTeX is a TEX implementation for the Windows platform; in other words, it is a compiler for LATEX and TEX documents under Windows. It supports almost all versions of Windows, from Windows 98 to Windows XP, and contains all packages needed for compiling and viewing LATEX documents, including those for generating PDF from the source file. It can be installed online or offline: end users can either download all packages and install them offline, or download an installer that fetches the desired packages from the Web site. A complete MiKTeX installation may take several hours, depending on the system and on the number of packages to be installed. For more information on MiKTeX, refer to [36].

3.1.2

LATEX Documents in Different Formats

As mentioned, the advantage of LATEX is that it separates content from view, so there are many possibilities for presenting a document. LATEX documents can be transformed into many other formats, like Portable Document Format (PDF) and HTML, and many useful tools exist for this purpose.

3.1.2.1

LATEX to PDF

PDF is currently one of the most widely used formats, so generating PDF from LATEX is one of the most common goals. Several tools, like pdflatex, generate PDF from a plain LATEX document. On the Windows platform, pdflatex is part of the MiKTeX distribution.

3.1.2.2

LATEX to HTML

Generating HTML from LATEX is another goal of LATEX users, especially those who want to publish their work on the Web. The latex2html project [33] and the tex4ht project [33] are the two most common tools for generating HTML from LATEX. Both are highly configurable. tex4ht can even be configured to generate XML documents from LATEX, but this process is somewhat complex.

3.2

BibTeX

BibTeX is both a file format and a program developed for the LATEX environment. It is used for preparing the bibliography and reference parts of a document. According to [28], BibTeX supports fourteen kinds of document: article, book, booklet, conference, inbook (part of a book), incollection (part of a book with its own title), inproceedings, manual, master thesis, misc, Ph.D. thesis, proceedings, technical report, and unpublished. Personally, I miss an entry type for Web sites. With the help of Persistent Uniform Resource Locators (PURLs), an intermediate resolution service [38], Web pages can be given persistent URLs, and I hope that this or a similar entry type will be supported in BibTeX in the future. After all, there were once no official laws for e-business either, but now there are. For more information on BibTeX, refer to [28]. Listing 3.2 shows a simple bibliographic document containing an article and a book in BibTeX format.

Listing 3.2: A Simple Bibliographic Document

    % ********** Bibliography **********
    % The next item is an article.
    @ARTICLE{Sem2001,
      AUTHOR  = "Tim Berners-Lee and James Hendler and Ora Lassila",
      TITLE   = "The Semantic Web, A new form of Web content that is
                 meaningful to computers will unleash a revolution of
                 new possibilities",
      JOURNAL = "Scientific American",
      YEAR    = "2001",
      VOLUME  = "",
      NUMBER  = "",
      PAGES   = "",
      MONTH   = "May",
      NOTE    = ""
    }
    % The next item is a book.
    @BOOK{Ltx1994,
      AUTHOR    = "Leslie Lamport",
      TITLE     = "LaTeX: A Document Preparation System, User's Guide and
                   Reference Manual",
      PUBLISHER = "Addison-Wesley",
      YEAR      = "1994",
      VOLUME    = "",
      SERIES    = "",
      ADDRESS   = "",
      EDITION   = "Second",
      MONTH     = "",
      NOTE      = ""
    }

Because of the simple structure of BibTeX, there have been several efforts to transform BibTeX into XML, and some DTDs for BibTeX exist, like [33]. There are even efforts to transform BibTeX files into RDF, like [59] and [34]; the former is implemented in Java, and the latter is a Perl script with an online interface for demonstration purposes.
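The simple structure of BibTeX is what makes such conversions easy. As an illustration only (it is not one of the cited converters), the following Python sketch turns entries like those of Listing 3.2 into XML. It assumes field values are written in double quotes, as in Listing 3.2; brace-delimited values, @STRING macros and # concatenation are not handled, and the XML element names are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

HEADER = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")   # @TYPE{key,
FIELD = re.compile(r'(\w+)\s*=\s*"([^"]*)"')          # NAME = "value"


def entries(source):
    """Yield (type, key, body) for each entry, matching braces to find its end."""
    for m in HEADER.finditer(source):
        depth, i = 1, m.end()
        while depth and i < len(source):
            depth += {"{": 1, "}": -1}.get(source[i], 0)
            i += 1
        yield m.group(1), m.group(2), source[m.end():i - 1]


def bibtex_to_xml(source):
    """Convert quoted-value BibTeX entries into a simple XML tree."""
    root = ET.Element("bibliography")
    for entry_type, key, body in entries(source):
        entry = ET.SubElement(root, entry_type.lower(), {"key": key})
        for name, value in FIELD.findall(body):
            value = " ".join(value.split())  # collapse line breaks and indentation
            if value:  # skip empty fields such as VOLUME = ""
                ET.SubElement(entry, name.lower()).text = value
    return root


bib = '''
@BOOK{Ltx1994,
  AUTHOR  = "Leslie Lamport",
  TITLE   = "LaTeX: A Document Preparation System",
  EDITION = "Second",
  NOTE    = ""
}
'''
print(ET.tostring(bibtex_to_xml(bib), encoding="unicode"))
```

Matching braces instead of looking for a closing brace at the start of a line keeps the sketch independent of how an entry is laid out.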

3.3

An Overview of LATEX Tools

During my thesis, I found many useful tools for different aspects of LATEX, mostly for converting LATEX documents to other formats. In this section, I present an overview of these tools.

• JLatex: An editor for LATEX. For more information, refer to [25].
• JDVI: A Java applet that displays DVI files in a browser. For more information, refer to [52].
• BibTeX2HTML: A BibTeX to HTML converter. For more information, refer to [24].
• GELLMU: Generalized Extensible LATEX-Like Markup (GELLMU) is a LATEX-like markup for creating documents in an easy plain-text format that can be faithfully converted into high-powered documents marked up in SGML. For more information, refer to [57].
• HEVEA: A fairly complete and fast LATEX to HTML translator, written in Objective Caml. For more information, refer to [20].
• Hyperlatex: A set of macro definitions that lets users write one document for two media, with output that looks good both in print and on the Web. For more information, refer to [40].
• LaTeX2HTML: A converter written in Perl for generating HTML documents from LATEX source files. For more information, refer to [29].
• TtH: A translator from TEX documents to HTML documents. A commercial version with additional features, called TtHgold, also exists. For more information, refer to [53].
• vulcanize: A very simple Perl script for converting LATEX documents to HTML. According to its documentation, it does not work very well with nested LATEX commands. For more information, refer to [30].

Chapter 4

Extracting Metadata, Generating RDF and XSL-FO

In this chapter, I will focus on the theoretical aspects of my thesis. I will explain its motivation and what exactly it addresses, describing the problems and the relations between the different parts of the thesis. After that, I will focus on the solutions, proposals and algorithms that I present for solving these problems, describe their advantages and, to some extent, their disadvantages, and justify why I used a specific tool, technology or standard. Implementation issues are discussed in the next chapter. I start with the problem description.

4.1

Problems

The task of building a gateway from text editing in LATEX to RDF and XSL-FO, querying the generated RDF, and visualizing its content can be divided into several subproblems, which I explain in the following sections.

4.1.1

Extracting Metadata and Generating RDF from LATEX

The main subproblem of my thesis is generating RDF from a plain LATEX document: the source document must be translated into a machine-understandable format, i.e., RDF.

4.1.2

Querying Generated RDF

The main purpose of generating RDF is to be able to query it and get meaningful responses. One of the problems to solve is therefore how to query RDF in a user-friendly manner.

4.1.3

Generating Human Understandable Format from LATEX

After generating a machine-understandable format (RDF), I will also generate a human-readable format from the plain LATEX document. This stage mainly serves human end users, giving them an overview of the different parts of the source document.

4.2

Solutions

Having described the problems, which are the topics of my thesis, I now explain the solutions. Just as the problem divides into several subproblems, the solution divides into several subsolutions. One of the first things I did during my thesis was to draw a use case diagram to visualize its requirements; Figure 4.1 shows this diagram. Generating the RDF document, generating queries, generating the XSL-FO document, generating a user-friendly format from XSL-FO (generating PDF), and executing queries against RDF are the five main use cases of the project.

Figure 4.1: Use Case Diagram

Table 4.1 shows the preconditions of each use case. Note that Generated XML is not a use case, but it is a precondition for the Generate RDF and Generate XSL-FO use cases.

Use case          Precondition
Generate RDF      Generated XML
Generate Query    Generate RDF
Generate XSL-FO   Generated XML
Generate PDF      Generate XSL-FO
Execute Query     Generate RDF (sometimes Generate Query)

Table 4.1: Preconditions of Use Cases
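The preconditions in Table 4.1 form a small dependency graph, so the order in which the steps must run can be derived mechanically. The Python sketch below is illustrative only (it is not part of latex2rdf); it uses graphlib.TopologicalSorter from the standard library (Python 3.9 or later) and leaves out the optional "sometimes Generate Query" precondition. The function name steps_for is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Preconditions from Table 4.1. "Generated XML" is a precondition, not a
# use case; the optional "sometimes Generate Query" edge is left out.
PRECONDITIONS = {
    "Generate RDF": {"Generated XML"},
    "Generate Query": {"Generate RDF"},
    "Generate XSL-FO": {"Generated XML"},
    "Generate PDF": {"Generate XSL-FO"},
    "Execute Query": {"Generate RDF"},
}


def steps_for(goal):
    """Return the steps needed to reach `goal`, preconditions first."""
    needed, stack = {}, [goal]
    while stack:  # collect the goal and everything it transitively depends on
        step = stack.pop()
        if step not in needed:
            needed[step] = PRECONDITIONS.get(step, set())
            stack.extend(needed[step])
    # TopologicalSorter expects node -> predecessors, which is what we built.
    return list(TopologicalSorter(needed).static_order())


print(steps_for("Generate PDF"))
# ['Generated XML', 'Generate XSL-FO', 'Generate PDF']
```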

4.2.1

Extracting Metadata and Generating RDF from LATEX

I explained in previous chapters what RDF is. In my view, the main problem of my thesis was generating RDF from a plain LATEX document. There are several approaches to this, and all of them require access to all elements of the LATEX source document. One approach is to develop a compiler or API for LATEX documents that gives access to the document elements. Unfortunately, no Java API for LATEX documents is available. Such efforts exist for BibTeX, like javabib [26], a BibTeX parser written in Java. After considering all limitations and possibilities, and given the time constraints of the thesis, I dropped this approach; building a compiler or Java API for LATEX could itself be a nice thesis topic. Another approach is to translate a LATEX document into a general, more accessible format like XML and then translate the XML into the desired RDF. I chose this approach because LATEX to XML converters are readily available. The approach has two branches, which I explain below. Figure 4.2 shows its general overview.

Figure 4.2: Overall View of Generating RDF from LATEX Document

As explained, the main task of my thesis was generating RDF; therefore, I named my application latex2rdf, as it acts as a LATEX to RDF converter.

4.2.1.1

Generating XML from LATEX

To generate XML from a LATEX document, I looked for the best free tool available on the Web. This phase was very important, because I wanted to minimize the loss of metadata from the LATEX document, so I spent several weeks selecting a good tool. I found several LATEX to XML converters, which I introduce below:

Converter   Docs   Platform                              Configuration                        Third-party Dependency
latex2xml   Good   Independent (implemented in Java)     Good structure of configuration      No
Tralics     Good   Only Linux (possible under Windows)   Complex structure of configuration   Yes

Table 4.2: Comparison between latex2xml and Tralics

• latex2xml: This was the first tool I found. It is the result of the diploma work of three students at the Berne University of Applied Sciences. latex2xml transforms a LATEX document into a definable XML structure and is highly configurable through XML configuration files. It has a built-in compiler for parsing LATEX documents and is implemented in Java, so it is platform independent. For more information on latex2xml and its documentation, refer to [4].
• Tralics: Another LATEX to XML converter, developed at "The French national institute for research in computer science and control" in C++ and Perl. It runs only under Linux, but with Cygwin [11] it can also be used under Windows. It has comprehensive documentation. For more information on Tralics, refer to [31].

After comparing these two packages, I decided to use latex2xml for my thesis; Table 4.2 summarizes the comparison. Note that latex2rdf does not depend on latex2xml: any LATEX to XML converter can be used with it. Figure 4.3 shows the main approach for producing XML using latex2xml and the latex2rdf configuration. latex2xml depends on JDK 1.4.2 and is not compatible with JDK 1.5, so the JDK 1.4.2 home folder must be specified in the latex2rdf configuration file. To convert a LATEX document into XML, latex2rdf first reads its configuration to find the JDK home and then invokes latex2xml.

Figure 4.3: General Overview of Converting LATEX Document to XML

If a transformation fails, a warnings file is generated that tells end users the reasons. The two most common problems are "unknown command" and "no output label". The first means that the source file contains one or more LATEX commands that are not defined in the configuration; the solution is to open the configuration file LaTeXCommands.xml and add the missing command. Adding a command to LaTeXCommands.xml involves several other parameters, which are described in the latex2xml documentation. The second problem (no output label) means that latex2xml does not know which tag to use in the XML file; the solution is to open the configuration file LabelAssociations.xml and add the label tag. More information is available in the latex2xml documentation [5].

4.2.1.2

Transforming XML into RDF

After generating the XML, there must be a way to transform it into RDF, and the most practical way is XSLT. The problem is that the structure of the generated XML is dynamic: it depends on the commands used in each LATEX document. Writing static XSL templates to transform it into RDF is therefore neither extensible nor comprehensive, and it is very time-consuming. Instead, I decided to generate dynamic XSL templates, derived from the structure of the XML document, for transforming XML into RDF. In the next section, I present the algorithm that I designed for producing these templates.

4.2.1.3

Algorithm for Generating Dynamic XSL Templates

This algorithm is straightforward. First, I determine all possible children of each element in the XML document using a recursive method. Based on the children of an element and the absolute paths of the elements, and using XSL and XPath node tests, I then generate dynamic XSL templates for all possible children of an element. The generation of dynamic XSL templates is configurable through a configuration file; for example, end users can choose to summarize the RDF by omitting some elements. An example of this is presented at the end of the next chapter. Figure 4.4 shows the flowchart of this algorithm. The generated RDF is based on an ontology that I developed for LATEX documents, which I explain in the following sections.

4.2.1.4

Algorithm for Generating ID

There are two general approaches to generating IDs in RDF: relying on the XSLT processor, or generating the ID with an XSL template. Most XSLT processors have a built-in ID generator for XML elements (exposed through the XSLT generate-id() function), but the IDs it produces carry no meaning. For example, nothing can be learned from an ID like A02B21C, whereas an ID like document1_bodymatter1_chapter1_section2_tabular1 clearly denotes the first tabular of the second section of the first chapter of the document. The second approach is therefore much better. Such an ID is generated in a simple way: take the element and all its ancestors in the XML source (parent, grandparent, and so on) together with their indices (positions), and join them with an underscore (_) character. Listing 4.1 shows an XSL template for generating unique IDs in the RDF document.

Listing 4.1: XSL Template for Generating Unique ID in RDF
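The same naming scheme can also be expressed outside XSLT. The following Python sketch is an illustration, not the XSL template of Listing 4.1: it assumes that an element's index is its 1-based position among siblings with the same name (so the second section becomes section2), and the function name path_ids is hypothetical.

```python
import xml.etree.ElementTree as ET


def path_ids(root):
    """Map every element to an ID such as document1_chapter1_section2.

    Each step of the ID is an element name followed by its 1-based position
    among siblings of the same name; steps are joined with underscores.
    """
    ids = {root: f"{root.tag}1"}  # the root element is always the first of its kind

    def visit(parent):
        seen = {}
        for child in parent:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            ids[child] = f"{ids[parent]}_{child.tag}{seen[child.tag]}"
            visit(child)

    visit(root)
    return ids


xml = ("<document><bodymatter><chapter>"
       "<section/><section><tabular/></section>"
       "</chapter></bodymatter></document>")
root = ET.fromstring(xml)
print(path_ids(root)[root.find(".//tabular")])
# document1_bodymatter1_chapter1_section2_tabular1
```

Counting positions per element name, rather than over all siblings, is what keeps the IDs readable (section2 means the second section, regardless of how many paragraphs precede it) while still making them unique.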



4.2.1.5

LATEX Document Ontology

As explained in previous sections, ontologies are among the main building blocks of the Semantic Web. For my thesis, I needed a reference vocabulary for LATEX document elements, so I developed a LATEX document ontology, whose classes and properties I explain below. The LATEX document ontology has two main classes: LatexDocument, which represents a LATEX document, and DocumentElement, which represents document elements. Figure 4.5 shows these two top classes. DocumentElement is divided into several categories, and LATEX commands are sorted into their relevant categories. Figure 4.6 shows the subclasses of DocumentElement.


Figure 4.4: Flowchart of Proposed Algorithm for Generating RDF from XML by Means of Dynamic XSLT
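The flowchart in Figure 4.4 can be approximated by a short Python sketch. It is a simplification under stated assumptions, not the latex2rdf implementation: templates are keyed by element name rather than by absolute path, only the element structure is reproduced (text is dropped), the ltx: prefix is bound to a placeholder namespace (http://example.org/latex#), and the configuration option is modelled as a hypothetical omit set, where omitting an element drops its whole subtree.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
LTX_NS = "http://example.org/latex#"  # placeholder namespace for the ltx: prefix


def collect_children(elem, children=None):
    """Recursively record, for every element name, all child names it ever has."""
    if children is None:
        children = {}
    names = children.setdefault(elem.tag, set())
    for child in elem:
        names.add(child.tag)
        collect_children(child, children)
    return children


def generate_stylesheet(xml_text, omit=()):
    """Generate one xsl:template per element name found in the document.

    Each template emits an ltx: element of the same name and applies
    templates only to the children seen for that name (an XPath union).
    Names in `omit` get no template and are never selected.
    """
    omit = set(omit)
    children = collect_children(ET.fromstring(xml_text))
    out = [f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}" xmlns:ltx="{LTX_NS}">']
    for name in sorted(children):
        if name in omit:
            continue
        kept = sorted(children[name] - omit)
        out.append(f'  <xsl:template match="{name}">')
        out.append(f'    <ltx:{name}>')
        if kept:
            select = " | ".join(kept)
            out.append(f'      <xsl:apply-templates select="{select}"/>')
        out.append(f'    </ltx:{name}>')
        out.append('  </xsl:template>')
    out.append('</xsl:stylesheet>')
    return "\n".join(out)


print(generate_stylesheet("<document><chapter><section/></chapter></document>"))
```

Because the select lists are computed from the children actually present, a new LATEX command that latex2xml maps to a new element automatically gets its own template, which is the point of generating templates instead of writing them by hand.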


Figure 4.5: Top Classes of LATEX Document Ontology

Figure 4.6: Subclasses of DocumentElement

Each class has its own subclasses. As examples, Figure 4.7 shows the subclasses of the BeginEndCommand class, and Figure 4.8 shows the subclasses of the Sectioning class. The categories are:

• BeginEndCommand: commands enclosed by \begin and \end.
• Footnote: footnote command(s).
• Formatting: formatting commands, like center or bold.
• Links: link command(s).
• lstinputlisting: listing command(s).
• MathFormula: math formula(s).
• Misc: miscellaneous commands that do not fit into the other classes.
• Sectioning: sectioning commands, like section or subsection.
• Tabling: table commands, like row and cell in a table.
• TheBibliography: bibliographic command(s).


Figure 4.7: Subclasses of BeginEndCommand

Figure 4.8: Subclasses of Sectioning
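To show how this class hierarchy maps onto RDF, here is a Python sketch that writes it out as N-Triples using rdfs:subClassOf. The namespace is a placeholder, and declaring the classes as rdfs:Class is an assumption (an ontology built in Protégé may declare them as owl:Class instead); only the classes named in this section are included, not the deeper subclasses of Figures 4.7 and 4.8.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
LTX = "http://example.org/latex#"  # placeholder; the thesis's namespace is not shown

# Top classes (Figure 4.5) and the DocumentElement categories (Figure 4.6).
CATEGORIES = ["BeginEndCommand", "Footnote", "Formatting", "Links",
              "lstinputlisting", "MathFormula", "Misc", "Sectioning",
              "Tabling", "TheBibliography"]
HIERARCHY = {"LatexDocument": None, "DocumentElement": None}
HIERARCHY.update({name: "DocumentElement" for name in CATEGORIES})


def class_triples(hierarchy):
    """Emit N-Triples declaring each class and its rdfs:subClassOf parent."""
    lines = []
    for cls, parent in hierarchy.items():
        lines.append(f"<{LTX}{cls}> <{RDF}type> <{RDFS}Class> .")
        if parent is not None:
            lines.append(f"<{LTX}{cls}> <{RDFS}subClassOf> <{LTX}{parent}> .")
    return lines


for line in class_triples(HIERARCHY):
    print(line)
```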


Below, I give an overview of the ontology's properties and their structure. For each frequently used class, there is a property whose name is the word has followed by the class name, and that class is the range of the property. For example, hastable is a property whose range is the table class, and hasquote is a property whose range is the quote class. A property definition must also specify the domain of the property. In the LATEX document ontology, the domain of a property consists of all classes in which its range can appear. For example, a quote can appear inside a paragraph; therefore, par is one of the domains of the hasquote property, whose range is quote. A property may have one or more domains. Table 4.3 shows the domains and range of several properties.

Property    Domain                                                     Range
hasrow      tabularx, tabular                                          row
hasstitle   chapter, section, subsection, subsubsection                stitle
hastable    chapter, par, section, subpar, subsection, subsubsection   table

Table 4.3: Some Properties of LATEX Document Ontology

Listing 4.2 shows an excerpt of the LATEX document ontology. The ontology was developed using Protégé, which I introduce in the implementation chapter.

Listing 4.2: An Excerpt of LATEX Document Ontology
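The naming convention and the domain/range pairs of Table 4.3 can be turned into ontology declarations mechanically. The Python sketch below emits Turtle and is an illustration, not the excerpt of Listing 4.2; the namespace URI is a placeholder, and typing the properties as owl:ObjectProperty is an assumption. One design point is worth noting: in RDFS and OWL, several separate rdfs:domain statements on one property mean that the subject belongs to all of those classes at once (an intersection), so the "any of these classes" reading used in this ontology is expressed here as a single owl:unionOf domain.

```python
PREFIXES = "\n".join([
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix owl:  <http://www.w3.org/2002/07/owl#> .",
    "@prefix ltx:  <http://example.org/latex#> .",  # placeholder namespace
])

# Domains of the properties in Table 4.3, keyed by the range class.
TABLE_4_3 = {
    "row": ["tabularx", "tabular"],
    "stitle": ["chapter", "section", "subsection", "subsubsection"],
    "table": ["chapter", "par", "section", "subpar", "subsection", "subsubsection"],
}


def property_turtle(range_class, domain_classes):
    """Declare has<range> with its range and a union of its domain classes."""
    members = " ".join(f"ltx:{name}" for name in domain_classes)
    return (f"ltx:has{range_class} a owl:ObjectProperty ;\n"
            f"    rdfs:range ltx:{range_class} ;\n"
            f"    rdfs:domain [ a owl:Class ; owl:unionOf ( {members} ) ] .")


def properties_allowed_in(cls, table=TABLE_4_3):
    """Names of the properties whose domain includes `cls`."""
    return sorted(f"has{rng}" for rng, doms in table.items() if cls in doms)


print(PREFIXES)
for rng, doms in TABLE_4_3.items():
    print(property_turtle(rng, doms))
print(properties_allowed_in("section"))  # ['hasstitle', 'hastable']
```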

4.2.2

Generating Dynamic Queries from RDF

Given the dynamic nature of the XML and XSL templates, I thought there could also be a way to generate queries dynamically. Two kinds of queries can be generated from the RDF, which I call numeric queries and element queries. Numeric queries return the number of occurrences of a specified element; in other words, they count how often an element appears in the RDF, for example how many chapters my document has. Unfortunately, well-known SQL features like GROUP BY and aggregate functions such as SUM() and COUNT() are not available in RDQL and SPARQL (aggregates were only added later, in SPARQL 1.1). Therefore, I defined a simple protocol for counting queries: if a query starts with a special character (#), the number of results is returned; otherwise, the results themselves are returned. Element queries return a specific element of the RDF; in other words, they let end users access different parts of the LATEX document, for example the third item of the first itemize of the second section of the fifth chapter. Such queries can be generated automatically for all possible elements once the RDF has been generated.
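The counting protocol is simple enough to sketch in a few lines of Python. This is an illustration under assumptions, not latex2rdf's code: the query engine is abstracted as a hypothetical execute callback that returns a list of result rows. Stripping the marker before execution matters, because # starts a comment in SPARQL, so an engine given the unstripped text would ignore the first line of the query.

```python
from typing import Callable, List, Union

Row = dict  # one result row, e.g. {"c": "chapter1"}


def run_query(query: str, execute: Callable[[str], List[Row]]) -> Union[int, List[Row]]:
    """Apply the '#' counting convention, then delegate to a query engine.

    `execute` is a stand-in for whatever runs the RDQL/SPARQL text
    (hypothetical here); it must return a list of result rows.
    """
    text = query.lstrip()
    if text.startswith("#"):
        # Numeric query: strip the marker, run the query, return only the count.
        return len(execute(text[1:].lstrip()))
    # Element query: return the rows themselves.
    return execute(text)


def fake_engine(text):
    # Pretends the document has three chapters, whatever the query says.
    return [{"c": "chapter1"}, {"c": "chapter2"}, {"c": "chapter3"}]


print(run_query("#SELECT ?c WHERE { ?c a <http://example.org/latex#chapter> }",
                fake_engine))  # 3
```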

Besides these queries, I also generate several static RDQL queries, for example for retrieving all definitions in a document. More information about the structure and source code of these queries is available in the implementation chapter.

4.2.3

Two Ways for Querying Generated RDF

After generating the RDF (and optionally the queries), end users should be able to query the RDF and get the desired results. There are two general approaches to executing queries against RDF. The first is to write a query in RDQL/SPARQL and execute it. Such a query can be generated automatically, selected from the predefined queries explained in the previous section, or written by end users themselves. The second approach is a query language closer to human language. In this approach, end users must be familiar with the structure of the elements and with how IDs are generated in the RDF: they give only the ID of the desired element, and an RDQL query is generated from it automatically and executed against the RDF. For example, end users may ask for the title of the first subsection of the second section of the third chapter of the document.
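The ID-based approach can be sketched as follows. Because IDs encode the path from the document root (as in document1_bodymatter1_chapter1_section2_tabular1), a request like the one above can be turned into an ID and then into a query. This Python sketch is illustrative only: it emits SPARQL rather than the RDQL generated by latex2rdf, the base URI and namespace are placeholders, and using hasstitle (from Table 4.3) for the title is an assumption about how titles are modelled.

```python
BASE = "http://example.org/mydocument#"  # hypothetical base URI of the generated RDF
LTX = "http://example.org/latex#"        # hypothetical ontology namespace


def element_id(path):
    """Join (element, index) steps into an ID like document1_chapter3."""
    return "_".join(f"{name}{index}" for name, index in path)


def query_for(path, prop=None):
    """Build a SPARQL query for one element, or for one of its properties."""
    subject = f"<{BASE}{element_id(path)}>"
    if prop is None:  # everything stated about the element
        return f"SELECT ?p ?o WHERE {{ {subject} ?p ?o }}"
    return f"SELECT ?o WHERE {{ {subject} <{LTX}{prop}> ?o }}"


# "The title of the first subsection of the second section of the third chapter"
path = [("document", 1), ("bodymatter", 1), ("chapter", 3),
        ("section", 2), ("subsection", 1)]
print(element_id(path))  # document1_bodymatter1_chapter3_section2_subsection1
print(query_for(path, "hasstitle"))
```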

4.2.4

Generating XSL-FO from LATEX

I generate XSL-FO from a LATEX document in much the same way as RDF: first I generate dynamic XSL templates, and then I apply them to the XML version of the LATEX document. Figure 4.9 shows this approach, and Figure 4.10 shows its flowchart.

Figure 4.9: Overall View of Generating XSL-FO from LATEX Document

4.2.5

Generating Human Understandable Format from XSL-FO

After generating XSL-FO, end users should be able to see the results in a user-friendly manner. This can be achieved by rendering the content of the XSL-FO elements in a human-understandable format, like HTML or PDF. I decided to use PDF for visualization. There are two general approaches for generating the PDF: LATEX Off and LATEX On. I use the phrase LATEX On to emphasize that the output is generated with consideration of the LATEX commands. Note that there are technologies like Adobe XMP [3] for adding metadata to PDF files, but they focus on general metadata, like the author and creation date of a document, and cannot be used for more specific metadata.

Figure 4.10: Flowchart of Proposed Algorithm for Generating XSL-FO from XML Using Dynamic XSLT

4.2.5.1

LATEX Off

In this approach, the output is generated without any consideration of LATEX commands; in other words, the LATEX commands and the plain text of the document end up together in the PDF file. The advantage of this approach is that third-party packages like Apache FOP can transform XSL-FO into PDF, so this stage is straightforward; it is also faster than the second approach. The disadvantage is that when the text contains many LATEX commands, the generated PDF can be hard to read. Figure 4.11 shows the general flowchart for transforming XSL-FO into PDF using Apache FOP. The algorithm starts with the input XSL-FO file, which is split into several smaller XSL-FO files according to the configuration. Each file is stored in an output folder whose name also comes from the configuration. Apache FOP is then used to generate PDF files from the smaller XSL-FO files. As mentioned, this method is fast, but it is not a good visualization method, because Apache FOP does not understand LATEX commands.

4.2.5.2

LATEX On

In this approach, the output will be generated with considerations of LATEX commands. In other words, a compiler that understands LATEX commands will be employed and after compiling source code, PDF files are generated. This stage is a bit hard, needs text processing algorithms and many configuration items and much more. The main problem in this approach is fetching desired part of source from main source file. After getting an excerpt from LATEX source file, pdflatex will be invoked for generating PDF from source file. Figure 4.12 demonstrates the overview of this general approach for getting the desired part of LATEX source code. I try to describe this algorithm in detail. I would say this algorithm was one of the most complex algorithms in my thesis. I can break this algorithm into several small sub-algorithms: Purifying XML, purifying LATEX source code, text processing algorithm, post-processing tasks, generating source files and finally invoking pdflatex. In followings, I will discuss each sub-algorithm. • Purifying XML: In this sub-algorithm, several changes will be applied on XML file that has been generated by latex2xml. Actually, these tasks are not configurable by means of latex2xml configuration files; therefore, some purification tasks are needed on generated XML. These purification tasks

4.2. SOLUTIONS


Figure 4.11: Flowchart of an Algorithm for Transforming XSL-FO into PDF Using Apache FOP

38 CHAPTER 4. EXTRACTING METADATA, GENERATING RDF AND XSL-FO

Figure 4.12: General Overview of LATEX On Method


can be done via XSL templates or text processing algorithms; I used whichever was suitable for each task. I developed a static XSLT for purifying math formulas. Listing 4.3 shows a static XSL template that adds a $ sign at the beginning and the end of every math formula in the document. Besides this purification task, I also have to handle special LATEX characters. The XML is the best place for this, rather than the LATEX source file or the XSL-FO document, because the characters that must not change do not appear in the XML, whereas they do appear in the LATEX source code and the XSL-FO. In this stage, text processing algorithms are needed to handle these special characters. The question is which special characters should be handled. The list of characters to be changed comes from the configuration file. Some default characters are defined in the default configuration file, but the list is extensible, and end users can define new characters. The structure of this configuration file is explained in the configuration section. Another important point during purification is the content that must not change. For example, verbatim environments, math formulas and CDATA sections must remain unchanged. To achieve this, I temporarily remove these environments from the XML file and restore them after the LATEX characters have been substituted. As mentioned, such tasks can be done reliably only on the XML file, not on the source file or the XSL-FO. The aim of purifying the XML is to produce a better XSL-FO: I use this new XML for generating the dynamic XSL templates and producing the XSL-FO. Purifying XML is a pre-processing task. Listing 4.3: An XSL template for adding $ sign in math formulas



select="@* | node()"/>
$<xsl:value-of select="."/>$



• Purifying LATEX source file: In this sub-algorithm, the LATEX source file is purified for use in the subsequent text processing algorithms. These tasks aim to normalize the structure of the source file. First, all verbatim environments are temporarily removed from the source file and stored in a temporary place so that they can be restored later. The reason is that verbatim environments may contain commands and characters that would make the text processing fail. Next, all comments will


be deleted from the source file. Comments play no role in generating the PDF, but they may contain characters that would make the later text processing fail. In the next stage, the source file is examined to check whether the keyword chapter appears between the beginning of the document and its first section. If it does not, an empty chapter command (\chapter{}) is added to the source code, because the text processing algorithms depend heavily on this keyword. After these steps, the source code is purified and ready for the text processing algorithms. Purifying the LATEX source file is also a pre-processing task. • Text processing: In this step, the purified XSL-FO and LATEX source code are the inputs of the text processing algorithm. The other input is the desired part of the source file, identified by an ID that exists in the XSL-FO file. Given the desired ID, I try to find the corresponding part of the source code. First, the chapter index and the section index are extracted from the ID. After finding the corresponding chapter and section, I search inside it for the last component of the ID, according to its index. For example, suppose the ID is document1_bodymatter1_chapter1_section2_definition3. In this case, I look for the second section of the first chapter and search inside it for the third definition. This is done simply by finding the third \begin{definition} and the third \end{definition} in that section and returning the text enclosed by these two commands. Note that comments and verbatim environments were removed in the pre-processing steps, so they have no effect on the search result. In some cases, such as section titles or paragraphs, there is no explicit LATEX command that delimits these parts, so the search in the LATEX source code fails. In this case, the desired text is extracted from the XSL-FO and returned. • Post-processing tasks: After the desired part of the source file has been extracted, some post-processing tasks undo the effects of the pre-processing tasks. For example, the verbatim environments that were temporarily removed from the source code during pre-processing are inserted back into the LATEX source code. • Generating source file: In this step, all preamble commands of the main LATEX source file are copied to a new file, the content extracted and purified from the main source file in the previous step is added to it, and the file is stored in the output folder. The path to the output folder is determined by the configuration file.


Figure 4.13: Flowchart of Proposed Algorithm for Generating PDF Using an Excerpt of LATEX Source File by Means of pdflatex


Figure 4.14: Flowchart of Proposed Algorithm for Finding an Excerpt of LATEX Source File

4.3. DISCUSSION AND CONCLUSION


• Invoking pdflatex: In this stage, pdflatex is invoked according to the configuration. The path to pdflatex, the path to the required source files and other information are taken from the configuration file. For more information about the configuration items, refer to the implementation chapter. Figure 4.13 shows the general flowchart of the proposed algorithm for transforming XSL-FO into PDF by means of pdflatex. Figure 4.14 shows the text processing algorithm for finding an excerpt of the LATEX source file; this algorithm is used within the algorithm shown in Figure 4.13.
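The pre-processing and text-processing steps above can be sketched in Java as follows. This is a minimal sketch, not the latex2rdf code: the class LatexExcerptSketch and its methods are hypothetical, environments of the same name are assumed not to be nested, IDs are assumed to use underscores between steps, and verbatim masking is omitted for brevity. It strips comments (keeping an escaped \%), adds an empty \chapter{} when needed and then resolves an ID step by step; a null result stands for the case where no explicit LATEX delimiter exists and the text has to be taken from the XSL-FO instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the LATEX On pre-processing and excerpt lookup.
// Assumes environments of the same name are not nested and IDs look like
// document1_bodymatter1_chapter1_section2_definition3.
final class LatexExcerptSketch {

    private static final Pattern STEP = Pattern.compile("([A-Za-z]+)(\\d+)");

    /** Removes % comments up to the end of the line; an escaped \% is kept. */
    static String stripComments(String src) {
        return src.replaceAll("(?m)(?<!\\\\)%.*$", "");
    }

    /** Adds \chapter{} after \begin{document} if no \chapter precedes the first \section. */
    static String ensureChapter(String src) {
        String begin = "\\begin{document}";
        int start = src.indexOf(begin);
        if (start < 0) return src;
        int bodyStart = start + begin.length();
        int firstSection = src.indexOf("\\section", bodyStart);
        int end = firstSection < 0 ? src.length() : firstSection;
        if (src.substring(bodyStart, end).contains("\\chapter")) return src;
        return src.substring(0, bodyStart) + "\n\\chapter{}" + src.substring(bodyStart);
    }

    /** Index of the n-th (1-based) occurrence of token, or -1. */
    private static int nthIndex(String text, String token, int n) {
        int idx = -1;
        for (int i = 0; i < n; i++) {
            idx = text.indexOf(token, idx + 1);
            if (idx < 0) return -1;
        }
        return idx;
    }

    /** Text from the n-th occurrence of a command (e.g. \section) up to the next one. */
    static String nthPart(String text, String command, int n) {
        int from = nthIndex(text, command, n);
        if (from < 0) return null;
        int next = text.indexOf(command, from + command.length());
        return text.substring(from, next < 0 ? text.length() : next);
    }

    /** Body between the n-th \begin{env} and the n-th \end{env}. */
    static String nthEnvironment(String text, String env, int n) {
        String open = "\\begin{" + env + "}";
        String close = "\\end{" + env + "}";
        int b = nthIndex(text, open, n);
        int e = nthIndex(text, close, n);
        if (b < 0 || e < b) return null;
        return text.substring(b + open.length(), e).trim();
    }

    /** Resolves an element ID in purified source; null means "fall back to the XSL-FO text". */
    static String findExcerpt(String purifiedSource, String id) {
        String scope = purifiedSource;
        for (String step : id.split("_")) {
            Matcher m = STEP.matcher(step);
            if (!m.matches()) return null;
            String name = m.group(1);
            int index = Integer.parseInt(m.group(2));
            switch (name) {
                case "document":
                case "bodymatter":
                    break; // the whole purified source is the scope in this sketch
                case "chapter":
                case "section":
                    scope = nthPart(scope, "\\" + name, index);
                    break;
                default:
                    scope = nthEnvironment(scope, name, index); // dig into an environment
            }
            if (scope == null) return null;
        }
        return scope;
    }
}
```

For example, resolving document1_bodymatter1_chapter1_section2_definition2 on purified source returns the body of the second definition in the second section. Pairing the n-th \begin with the n-th \end, as the text describes, is only correct while environments of the same name are not nested; stripping comments first is what keeps a commented-out \begin{definition} from shifting the count.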

4.3

Discussion and Conclusion

In this chapter, I presented the problems to be solved in my thesis. These problems focused mainly on generating RDF, querying it, and visualizing different parts of the source file. After introducing the problems, I presented my solutions. I presented several algorithms for generating RDF and XSL-FO from a LATEX source file. These algorithms produce dynamic XSL templates for transforming an XML document into RDF or XSL-FO. I also explained an algorithm for generating unique IDs for RDF elements. The generated RDF is based on the ontology for LATEX documents that I described. I explained two general ways of generating dynamic queries from the RDF model. For visualization, two methods (LATEX on and LATEX off) were introduced. The LATEX on method uses the LATEX compiler and produces a better view, but it is more complex and needs more configuration. The LATEX off method is faster, but the generated view ignores LATEX commands. The next chapter covers implementation issues.

Chapter 5

Implementation

As I explained before, I named the application latex2rdf, as it is a general LATEX to RDF converter. Although the name latex2rdf does not cover other functions of the application, such as generating XSL-FO and querying RDF, I think it is simple and clean. In this chapter, I focus on the implementation of my thesis. I describe the software engineering methodology I used and the development timetable. I explain the features and functions of latex2rdf, as well as its configuration file and how latex2rdf can be configured by means of an XML file. The graphical user interface (GUI) of latex2rdf and its elements are described. The structure of the output folders, the different types of queries and several UML diagrams are presented. I also give an overview of the tools and third-party packages I used, and cover tips for generating XML and several hints regarding latex2rdf. Finally, I present an example that demonstrates latex2rdf and its outputs.

5.1

Methodology

For implementing latex2rdf, I used the IBM Rational Unified Process (RUP) [41] methodology. One of the software development principles of IBM RUP is developing software iteratively. latex2rdf was built in several iterations; in each iteration, features were added and bugs were fixed. Figure 5.1 [41] shows the RUP software development lifecycle. In the next sections, I look at each iteration and at the development timetable of latex2rdf.


CHAPTER 5. IMPLEMENTATION

Figure 5.1: IBM Rational Unified Process Software Development Lifecycle

5.1.1

Iterations

latex2rdf was developed in several main iterations. The main changes in each iteration are listed below.
Main features of the first iteration:
• Generating a simple RDF from a LATEX document
• A simple graphical user interface
• Ability to execute very simple RDQL queries
Main features of the second iteration:
• Generating a more detailed RDF from a LATEX document
• A better graphical user interface
• Ability to execute different RDQL queries
• User interaction by means of suitable messages and alerts
• Generating XSL-FO and PDF from LATEX documents
• Generating dynamic RDQL queries from RDF
• Generating several general static RDQL queries
• Configuration issues


5.2. QUERIES

Main features of the third iteration:
• Generating a more detailed RDF from a LATEX document
• A better graphical user interface
• Ability to execute SPARQL queries
• Ability to configure latex2rdf in detail (generating RDF and XSL-FO)
• A better visualization of PDF files
In each iteration, I fixed bugs and tried to keep the code clean. I added helpful comments and JavaDoc to make the different parts and methods of the source code easier to understand.

5.1.2

Timetable

Roughly speaking, I started my thesis in April 2006. The first month was mainly devoted to understanding concepts, analysis and planning. During that month, I became more familiar with the LATEX environment, LATEX tools, the available tools for generating XML from LATEX and so on. From May 2006 to August 2006, I designed, implemented and tested latex2rdf in three main iterations. After that, I focused on writing the thesis, fixing several bugs and adding several features to latex2rdf. A Gantt chart is a useful tool for planning and scheduling projects; Figure 5.2 shows the Gantt chart of my thesis.

Figure 5.2: Gantt Chart of My Thesis

5.2

Queries

In the previous chapter, I explained different types of RDQL/SPARQL queries. In this section, I explain the static and dynamic queries produced by latex2rdf in more detail.


5.2.1


Dynamic Queries

latex2rdf produces two groups of queries dynamically: element queries and numeric queries.

5.2.1.1

Element Queries

Element queries return a specified element of the document. For example, a query for the title of the second section of the third chapter is an element query. The naming convention that latex2rdf uses for storing these queries is simple and user-friendly: the file name starts with the phrase GiveMe, followed by an underscore (_) and the ID of the desired element. For example, GiveMe_document1_bodymatter1_chapter1_section1_par5_m4.rdql is an element query that returns the fourth math formula of the fifth paragraph of the first section of the first chapter of the document. Listing 5.1 shows an element query. Listing 5.1: Content of GiveMe_document1_bodymatter1_chapter1_section1_par12_m4.rdql File

SELECT ?x WHERE ( , , ?x )
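To make the naming convention concrete, the following hypothetical Java sketch builds one element query per ID, following the (subject, predicate, ?x) shape of Listing 5.1. The namespace and property URIs are placeholders, because the actual URIs did not survive in the listing, and ElementQuerySketch is not a latex2rdf class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of element-query generation: one RDQL file per element ID.
final class ElementQuerySketch {
    static final String DOC_NS = "http://example.org/doc#";            // placeholder resource namespace
    static final String CONTENT = "http://example.org/latex#content";  // placeholder property URI

    /** File name convention from the text: GiveMe_ followed by the element ID. */
    static String fileName(String id) {
        return "GiveMe_" + id + ".rdql";
    }

    /** RDQL triple pattern (element, content property, ?x). */
    static String query(String id) {
        return "SELECT ?x WHERE (<" + DOC_NS + id + ">, <" + CONTENT + ">, ?x)";
    }

    /** Maps each generated file name to its query text, in input order. */
    static Map<String, String> generate(List<String> ids) {
        Map<String, String> files = new LinkedHashMap<>();
        for (String id : ids) {
            files.put(fileName(id), query(id));
        }
        return files;
    }
}
```

Writing each map entry to the Queries/Elements folder would then give one file per element.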

5.2.1.2

Numeric Queries

Numeric queries return the number of occurrences of a desired element in the document, for example the number of math formulas, sections or subsections. The naming convention that latex2rdf uses for storing these queries is also simple and user-friendly: the file name starts with the phrase GiveMeNumberOf, followed by an underscore (_) and the name of the element. For example, GiveMeNumberOf_Section.rdql and GiveMeNumberOf_Tabular.rdql are two numeric queries. The former returns the number of sections and the latter the number of tabular environments in the document. Listing 5.2 shows a numeric query. Listing 5.2: Content of GiveMeNumberOf_Section.rdql File

#SELECT ?x WHERE ( ?x, , )
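A numeric query can be produced in the same way. In this sketch, the pattern (?x, predicate, object) of Listing 5.2 is assumed to be an rdf:type test against a class of the LATEX ontology; the class namespace is a placeholder and NumericQuerySketch is hypothetical. The leading # is what marks the query as numeric.

```java
// Hypothetical sketch of numeric-query generation (GiveMeNumberOf_<Element>.rdql).
final class NumericQuerySketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String CLASS_NS = "http://example.org/latex#"; // placeholder ontology namespace

    static String fileName(String element) {
        return "GiveMeNumberOf_" + element + ".rdql";
    }

    /** '#' marks the query as numeric: the number of results is reported, not the results. */
    static String query(String element) {
        return "#SELECT ?x WHERE (?x, <" + RDF_TYPE + ">, <" + CLASS_NS + element + ">)";
    }

    static boolean isNumeric(String query) {
        return query.trim().startsWith("#");
    }
}
```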

5.2.2

Static Queries

Some general-purpose static queries were also created during my thesis. Because of the structure of these queries, generating them dynamically is not simple; therefore,

5.3. GRAPHICAL USER INTERFACE


these queries were developed statically. Listing 5.3 shows a simple static query. This RDQL query returns all footnotes in the document. Listing 5.3: Content of GiveMeAllNote.rdql File

SELECT ?z WHERE ( ?x, , ), ( ?y, , ), ( ?y, , ?x ), ( ?y, , ?z )

5.3

Graphical User Interface

For developing the graphical user interface (GUI) of latex2rdf, Java Swing [27] was used. Swing was released after Java AWT and is a newer library from Sun Microsystems for developing GUI controls. It supports many features needed for developing an advanced user interface. Figure 5.3 is a snapshot of the latex2rdf GUI.

Figure 5.3: Main Graphical User Interface of latex2rdf

In the following, I describe each element of the GUI:


• Element 1: This is the main window for the LATEX or XML document. After a LATEX or XML document is loaded, its content is shown in this element.
• Element 2: Pushing this button opens a window for selecting the desired LATEX or XML file. After a file is selected, its content is copied to Element 1. Figure 5.4 shows this window.

Figure 5.4: Load LATEX /XML Panel

• Element 3: Pushing this button opens a window for selecting a file. After a file is selected, the content of Element 1 is stored in it. Figure 5.5 shows this window.

Figure 5.5: Save LATEX /XML Panel

• Element 4: Pushing this button generates RDF. The content of Element 1 is examined first. If it is an XML file, it is transformed directly into RDF; otherwise, the


content of Element 1 is first translated to XML and then transformed into RDF.
• Element 5: This element is the RDF box. The generated RDF is copied to this box. If RDF generation fails, the exception message and/or a guideline is shown in Element 21. After the RDF is generated, the RDF model is built automatically. Note that the generated RDF is also presented in a separate frame or window. Figure 5.6 shows this window.

Figure 5.6: RDF/XSL-FO Window

• Element 6: Pushing this button opens a window for selecting an RDF file. After a file is selected, the RDF content is copied into Element 5 and the RDF model is built automatically. This means that latex2rdf can also be used as a stand-alone application for generating queries and executing RDQL/SPARQL queries against RDF. Figure 5.4 shows this window.
• Element 7: Pushing this button generates dynamic queries from the RDF model. All queries are stored in the output folder. The path


to the output folder is read from the configuration file. For the structure of these queries, refer to the previous sections.
• Element 8: Pushing this button generates XSL-FO. The content of Element 1 is examined first. If it is an XML file, it is transformed directly into XSL-FO; otherwise, the content of Element 1 is first translated to XML and then transformed into XSL-FO.
• Element 9: This element is the XSL-FO window. The generated XSL-FO is copied to this window. If XSL-FO generation fails, the exception message and/or a guideline is shown in Element 21. Note that the generated XSL-FO is also presented in a separate frame or window. Figure 5.6 shows this window.
• Element 10: Pushing this button generates PDF files. The content of Element 9 is split into several small XSL-FO or LATEX files, and each XSL-FO or LATEX file produces one PDF document.
• Element 11: This element is the RDQL/SPARQL query window. End users can load a query into it or simply type one.
• Element 12: This element selects the query type. latex2rdf can execute two kinds of queries, RDQL and SPARQL; end users select the matching type with the radio buttons.
• Element 13: This element is a clear button. Pushing it clears the content of Element 11.
• Element 14: Pushing this button opens a window for selecting the desired query file. Supported query types are RDQL and SPARQL. After a file is selected, its content is copied to Element 11. Figure 5.4 shows this window.
• Element 15: Pushing this button takes the query from Element 11 and executes it against the RDF in Element 5.
• Element 16: This element is the result box. After a query is executed using Element 15, the results are shown in this box. The phrases – Begin – and – End – separate the results of different queries. A better view, mainly for RDQL queries with one variable and SPARQL queries with two variables, is also available after using Element 15. A snapshot of this auxiliary window is shown in Figure 5.7.


Figure 5.7: Snapshot of a Sample Result Window

• Element 17: This element is a simple query box for users who know the structure of the elements and IDs in the RDF. End users type the hierarchy of the desired element, which is actually the ID of that element, into this box. For example, the hierarchy document1_bodymatter1_chapter1_section1_stitle1 denotes the first title of the first section of the first chapter of the first bodymatter of the first document, and its query means “give me all information regarding this ID from the RDF”. As another example, document1_bodymatter1_chapter2_section3_itemize2_item3_par5 denotes the fifth paragraph of the third item of the second itemize of the third section of the second chapter of the first bodymatter of the first document. If such a hierarchy exists among the element IDs, all information (triples) regarding this ID is extracted from the RDF (Element 5) and shown in the result box (Element 16) after Element 18 is pushed. If no such ID exists, an empty string surrounded by – Begin – and – End – is shown.
• Element 18: Pushing this button builds an RDQL query from the information in Element 17, executes it against Element 5 and shows the results in Element 16.
• Element 19: Pushing this button opens a window for selecting a file. After a file is selected, the content of Element 16 is stored in it. Figure 5.5 shows this window.
• Element 20: This element is a clear button. Pushing it clears the content of Element 16.
• Element 21: This element is the status box, a one-way communication channel from the application to end users. All exceptions, messages, guidelines and other kinds of alerts are presented in this box. After the application is launched, a welcome message is shown, and the application tries to


load the path to the JDK 1.4 home folder from the configuration file and shows a message indicating this path. If this value is not set in the configuration file, the current folder is assumed to be the JDK 1.4 home folder, and a dot is shown as the JDK 1.4 home folder.
• Element 22: This element is a clear button. Pushing it clears the content of Element 21.
• Element 23: Pushing this button opens a window presenting several guidelines and hints about the application.
• Element 24: Pushing this button opens a window presenting several messages about the application.
• Element 25: Pushing this button exits the application.
• Element 26: This element is the application logo. The latex2rdf logo was generated with a free online logo generator; for more information about it, refer to [10]. Figure 5.8 shows this logo.

Figure 5.8: latex2rdf Logo

• Element 27: This element contains two radio buttons for generating PDF documents: pdflatex and Apache FOP. When the button in Element 10 is pushed, the value of this radio button is read, and the selected tool (pdflatex or Apache FOP) is used to generate the PDF files.
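Element 18 essentially validates the hierarchy typed into Element 17 and wraps it in a query that asks for every triple whose subject is that element. The following Java sketch assumes a placeholder namespace for element IDs and an ID syntax of name+index steps joined by underscores; the class HierarchyQuerySketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the query built by Element 18 from the text in Element 17.
final class HierarchyQuerySketch {
    static final String DOC_NS = "http://example.org/doc#"; // placeholder namespace for IDs

    // One or more name+index steps separated by underscores, e.g. chapter2_section3_par5.
    private static final Pattern ID = Pattern.compile("[A-Za-z]+\\d+(_[A-Za-z]+\\d+)*");

    /** RDQL query returning every (predicate, object) pair of the given element. */
    static String buildQuery(String hierarchy) {
        String id = hierarchy.trim();
        if (!ID.matcher(id).matches()) {
            throw new IllegalArgumentException("Not an element ID: " + hierarchy);
        }
        return "SELECT ?p, ?o WHERE (<" + DOC_NS + id + ">, ?p, ?o)";
    }
}
```

Rejecting malformed input before querying keeps a typo from silently producing the empty – Begin – / – End – result.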

5.4

Output

The application generates several kinds of output and stores them in different output folders. Figure 5.9 shows the hierarchy of output folders. The folder names are self-explanatory; their structure is explained below.
• Output root folder: This is the root folder of the output. Its path is stored in the configuration file, and end users can change it to point to their desired output root folder.
• RDF folder: The RDF file generated by the application is stored in this folder.


5.4. OUTPUT

• PDF folder: The PDF documents generated by the application are stored in this folder.
• XSL folder: The XSL files generated dynamically by the application for transforming XML into RDF and XSL-FO are stored in this folder.
• XSL-FO folder: The XSL-FO documents generated by the application are stored in this folder, namely the main XSL-FO document and the small XSL-FO files split from it.
• TEX folder: The LATEX documents generated by the application are stored in this folder; they represent the separated elements of the main source file.
• Queries/Elements folder: The element queries generated by the application are stored in this folder.
• Queries/Numeric folder: The numeric queries generated by the application are stored in this folder.
• Queries/Predefined folder: This is a static folder; the application does not write anything into it. It contains the general static predefined RDQL queries. Note that this folder exists only in the default output root folder of the application (the out folder) and is not copied to other output root folders, so end users can always access these queries from this folder.

Figure 5.9: Hierarchy of Output Folders

Note that old files with the same name in these folders are overwritten without any notification. To avoid this, end users can change the output root folder in the configuration file or move old files to another place.
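The folder tree can be created with java.nio.file, as in the following sketch. The folder names are taken from the list above, but their exact spelling on disk is an assumption, and OutputFoldersSketch is a hypothetical class. Queries/Predefined is deliberately not created, since it only exists in the default out folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: create the output tree under the configured output root.
final class OutputFoldersSketch {
    // Names follow Section 5.4; the exact on-disk spelling is an assumption.
    static final List<String> SUBFOLDERS = List.of(
            "RDF", "PDF", "XSL", "XSL-FO", "TEX",
            "Queries/Elements", "Queries/Numeric");

    /** Creates missing folders; existing folders (and the files in them) are kept. */
    static void create(Path root) throws IOException {
        for (String sub : SUBFOLDERS) {
            Files.createDirectories(root.resolve(sub));
        }
    }
}
```

Files.createDirectories does not fail when a folder already exists, so the method can run before every generation step.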


5.5


Configuration

latex2rdf is configured by means of an XML file named config.xml, which must be located in the config/latex2rdf folder; latex2rdf loads its configuration items from this file. Listing 5.4 shows the DTD of config.xml, which is stored in the same folder as config.xml. Listing 5.4: config.dtd: DTD of config.xml



Listing 5.5 shows an example of config.xml; it is the default configuration file in the latex2rdf package. Listing 5.5: A Sample Configuration File

C:/java/j2sdk1.4.2_12
C:/Programme/MiKTeX 2.5/miktex/bin/pdflatex.exe
E:/workspace/Testcases/Matthias_files
1

5.5. CONFIGURATION


data/purify/afterXSLFO/purify1.xsl
out
0
0
definition example proof theorem lemma corollary conjecture quote tabular itemize

I now describe the role of each element in the configuration.
• jdk14: This element must point to the JDK 1.4 folder; latex2xml needs JDK 1.4 for producing XML.
• pdflatex: This element configures pdflatex. All of its children relate to pdflatex:
– path: This element must point to the pdflatex executable. latex2rdf reads the value of this element and invokes pdflatex using exactly this path. If this element is not set, its default value is pdflatex.exe.
– sourceNeededFiles: This element must point to the folder containing the files needed for compiling the LATEX source with pdflatex. For example, if the LATEX source file uses images or external files, this element should point to the folder where they are stored.
– laTeXSubstitutionRules: This element is a list of substitution rules; the reason these rules are needed was described in previous sections. It contains zero or more insteadOf elements.


∗ insteadOf: Each insteadOf element defines a substitution rule. It has two attributes, this and use: this gives the character or sequence of characters to be replaced, and use gives the character or sequence of characters to put in its place. For example, a rule with this="{" and use="\{" replaces every { character in the document with \{.
• outputPurify: This element configures the purification of the output.
– beforeRDF: This element contains zero or more xsl elements.
∗ xsl: The path of an XSL file containing XSL templates to apply before generating RDF.
– afterRDF: This element contains zero or more xsl elements.
∗ xsl: The path of an XSL file containing XSL templates to apply after generating RDF.
– beforeXSLFO: This element contains zero or more xsl elements.
∗ xsl: The path of an XSL file containing XSL templates to apply before generating XSL-FO.
– afterXSLFO: This element contains zero or more xsl elements.
∗ xsl: The path of an XSL file containing XSL templates to apply after generating XSL-FO.
• outputFolder: This element points to the root output folder of latex2rdf; the structure of the output folders was described in the previous section.
• fullContentItems: This element is a collection of item elements.
– item: Each item tells latex2rdf to generate a content tag in the RDF for that element; the content tag contains the full content of the element.
• fullRDF: This element limits the size of the RDF. If it is set to 0, latex2rdf does not generate a full RDF: when it reaches an element listed in an item, it does not descend any deeper into that element. If it is set to any other value, latex2rdf generates a full RDF.

5.6. SEQUENCES


• fullXSLFO: This element limits the size of the XSL-FO. If it is set to 0, latex2rdf does not generate a full XSL-FO: when it reaches an element listed in an item, it does not descend any deeper into that element. If it is set to any other value, latex2rdf generates a full XSL-FO.
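The insteadOf rules and the protection of verbatim, math and CDATA content described for the LATEX On method can be combined as in the following hypothetical sketch. It assumes a root element named config, reads every insteadOf element with the standard DOM API, masks protected regions with placeholders, applies the rules in document order and then restores the regions. The class name and the protected-region pattern passed in by the caller are assumptions, not latex2rdf's actual code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: read insteadOf rules from config XML and apply them outside protected regions.
final class SubstitutionSketch {

    /** Reads insteadOf rules (attributes "this" and "use") in document order. */
    static Map<String, String> readRules(String configXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("insteadOf");
        Map<String, String> rules = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            rules.put(e.getAttribute("this"), e.getAttribute("use"));
        }
        return rules;
    }

    /** Masks protected regions, applies the rules, then puts the regions back unchanged. */
    static String substitute(String text, Map<String, String> rules, Pattern protectedRegions) {
        List<String> saved = new ArrayList<>();
        Matcher m = protectedRegions.matcher(text);
        StringBuffer masked = new StringBuffer();
        while (m.find()) {
            saved.add(m.group());
            // Placeholder: NUL, region number, NUL (contains no '$' or '\').
            m.appendReplacement(masked, "\u0000" + (saved.size() - 1) + "\u0000");
        }
        m.appendTail(masked);
        String result = masked.toString();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = result.replace(rule.getKey(), rule.getValue());
        }
        for (int i = 0; i < saved.size(); i++) {
            result = result.replace("\u0000" + i + "\u0000", saved.get(i));
        }
        return result;
    }
}
```

Because the rules are applied one after another, a later rule can rewrite the output of an earlier one, so their order in config.xml matters. A rule whose this value is a digit would also touch the placeholders, which this sketch does not guard against.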

5.6

Sequences

As explained in previous sections, there are five main use cases in my thesis. In this section, I present a UML sequence diagram for each use case to clarify how it works.

5.6.1

Sequence of Generate RDF Use Case

Figure 5.10 shows the sequence diagram of the Generate RDF use case. Briefly, the process is as follows. First, latex2rdfUI checks whether the input content is XML or a LATEX document. If it is a LATEX document, latex2rdf invokes latex2xml. After a successful invocation, it calls the makeRDF() method of the RdfGenerator class. This method takes the XML file, generates dynamic XSL templates and stores them in an XSL file; the createInit() method generates the header of this XSL file. The path of the XSL file is set according to the configuration file. After storing the XSL file, it creates a new instance of the XSLTApplier class and invokes its makeTransform() method, which applies the XSL templates to the XML file and finally generates the RDF.
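The transformation step performed by XSLTApplier can be illustrated with the standard javax.xml.transform API. The sketch below is hypothetical (XsltApplierSketch is not the thesis class) and uses a stylesheet in the spirit of Listing 4.3: an identity transform plus a template that wraps math elements in $ signs. The math element name m is an assumption based on the m4 step in element IDs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of applying an XSL stylesheet to an XML string.
final class XsltApplierSketch {

    // Identity transform plus a template wrapping math elements (assumed name "m") in $...$.
    static final String MATH_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"m\">"
      + "<xsl:copy>$<xsl:value-of select=\".\"/>$</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the serialized result. */
    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The template matching m has a higher default priority than the node() identity template, so math elements take the wrapping path while everything else is copied unchanged. A dynamically generated stylesheet would be passed to transform() in the same way.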

5.6.2

Sequence of Generate Query Use Case

Figure 5.11 shows the sequence diagram of the Generate Query use case. Briefly: as explained before, two main types of query are generated automatically, numeric queries and element queries. To create numeric queries, latex2rdfUI creates a new instance of the NumericQueryGenerator class and invokes its generateQuery() method, which takes the generated RDF model as input and generates the numeric queries from it. As mentioned before, numeric queries return the number of results. Note that a special sign (#) at the beginning of a query marks it as a numeric query. After the numeric queries have been generated, the element queries are generated: latex2rdfUI creates a new instance of the ElementQueryGenerator class and invokes its generateQuery() method, which takes the RDF model as input and generates the element queries from it.


Figure 5.10: Sequence of Generate RDF Use Case

Figure 5.11: Sequence of Generate Query Use Case


5.6.3


Sequence of Execute Query Use Case

Figures 5.12 and 5.13 show the Execute Query use case: the former is the sequence for executing RDQL queries and the latter the sequence for executing SPARQL queries. For an RDQL query, latex2rdfUI creates a new instance of the RDQLRunner class and invokes its ExecuteRDQLQuery() method. This method takes the RDF model and the query as input and executes the query against the model. First, removeSpecialSign() is invoked; it takes the query as input and checks whether it is a numeric or an element query. If it is a numeric query, the special sign (#) is removed from the query. After the query has been executed using Jena and the ARQ engine, the addToHashmap() method is called. This method takes a key and a value, which together represent a query result, and stores them in a HashMap. This HashMap is used by the result window in the visualization step. The process of executing SPARQL queries is very similar to that of RDQL queries, so I do not explain it again.
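Because Jena is an external library, the following sketch abstracts the query engine as a function from query text to result rows; this interface, the class QueryRunnerSketch and the count key are hypothetical. It mirrors the dispatch described above: the # sign is stripped, a numeric query reports the number of rows, and other results are stored as key/value pairs. A LinkedHashMap is used here instead of a plain HashMap so that the result window can show results in query order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of numeric/element query dispatch; "engine" stands in for Jena.
final class QueryRunnerSketch {

    /** Strips the leading '#' that marks a numeric query. */
    static String removeSpecialSign(String query) {
        String q = query.trim();
        return q.startsWith("#") ? q.substring(1) : q;
    }

    /** Runs the query; engine returns rows as {key, value} arrays. */
    static Map<String, String> execute(String query, Function<String, List<String[]>> engine) {
        boolean numeric = query.trim().startsWith("#");
        List<String[]> rows = engine.apply(removeSpecialSign(query));
        Map<String, String> results = new LinkedHashMap<>();
        if (numeric) {
            results.put("count", String.valueOf(rows.size()));
        } else {
            for (String[] row : rows) {
                addToHashmap(results, row[0], row.length > 1 ? row[1] : "");
            }
        }
        return results;
    }

    static void addToHashmap(Map<String, String> map, String key, String value) {
        map.put(key, value);
    }
}
```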

Figure 5.12: Sequence of Execute Query Use Case (RDQL Query)

5.6.4

Sequence of Generate XSL-FO Use Case

Figure 5.14 shows the sequence diagram of the Generate XSL-FO use case. Its process runs as follows:

1. latex2rdfUI checks whether the input content is an XML or a LATEX document. If it is a LATEX document, latex2rdf invokes latex2xml.
2. After a successful invocation, it calls the makeXSLFO() method of the XSLFOGenerator class. This method takes the XML file, generates dynamic XSL templates from it and stores them in an XSL file. The path of the XSL file is set in the configuration file.
3. After storing the XSL file, it invokes the makeTransform() method of the XSLTApplier class, which applies the XSL templates to the XML file.
4. Finally, the main XSL-FO file is generated.

Figure 5.13: Sequence of Execute Query Use Case (SPARQL Query)
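The core idea of makeXSLFO() and makeTransform() can be illustrated with the standard JAXP classes of the JDK. This sketch is an assumption and does not reproduce the XSLFOGenerator logic:

- One template is generated per distinct element name found in the XML.
- Each template wraps the element's content in an fo:block. This mapping is hypothetical.
- The output is only a fragment. A complete XSL-FO document also needs fo:root, fo:layout-master-set and fo:page-sequence.

The generated stylesheet is then applied with the default TransformerFactory. This keeps the code independent of any particular XSLT processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: generate XSL templates from the XML itself, then apply them. */
final class DynamicXslSketch {

    private DynamicXslSketch() {
    }

    /** Distinct element names of the XML, in document order. */
    static Set<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Set<String> names = new LinkedHashSet<>();
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** One template per element name; each wraps the element's content in an fo:block (assumed mapping). */
    static String generateStylesheet(Set<String> names) {
        StringBuilder xsl = new StringBuilder();
        xsl.append("<xsl:stylesheet version=\"1.0\"")
           .append(" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"")
           .append(" xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n");
        for (String name : names) {
            xsl.append("  <xsl:template match=\"").append(name).append("\">")
               .append("<fo:block role=\"").append(name).append("\">")
               .append("<xsl:apply-templates/></fo:block></xsl:template>\n");
        }
        xsl.append("</xsl:stylesheet>\n");
        return xsl.toString();
    }

    /** Applies a stylesheet with whatever XSLT processor JAXP is configured to use. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Calling generateStylesheet(elementNames(xml)) and then transform(xml, stylesheet) turns &lt;section&gt;Intro&lt;/section&gt; into an fo:block with role="section" around the text Intro.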

Figure 5.14: Sequence of Generate XSL-FO Use Case


5.6.5 Sequence of Generate PDF Use Case

Figures 5.15 and 5.16 show the sequence diagrams of the Generate PDF use case. As mentioned before, there are two general ways to generate PDF from the source code: the LATEX on method and the LATEX off method. Figure 5.15 shows the first method and Figure 5.16 the second.

The LATEX on method proceeds as follows:

1. An object of the FOSeparator2Tex class is created. Its purpose is to split the main XSL-FO file into several small LATEX files, using both the LATEX source code and the XSL-FO file.
2. The separateFOAndGeneratePDF() method is invoked with the path to the main XSL-FO file as input.
3. Inside this method, a recursive method called generateSimpleTex() is invoked. It traverses all elements and calls getContent() for each of them. The getContent() method finds suitable content for pdflatex with the help of the LATEX source code and the XSL-FO file.
4. Once the desired content has been obtained, a new instance of the SimpleTexCreator class is created and its createTex() method is invoked. This method takes the content and a file name and stores the content in that file.
5. Finally, a new instance of the Tex2PDF class is created and its convertTex2PDF() method is invoked. This method calls pdflatex according to the configuration and the input files. If no problem occurs, the PDF files are generated.

The process in the LATEX off method is very similar, so I do not repeat it here.
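The last two steps of the LATEX on method, createTex() and convertTex2PDF(), essentially amount to writing a standalone document and calling pdflatex on it. The Java sketch below shows one way to do this. The method names only mirror the text. The idea that the preamble is copied from the main source file comes from Listing 5.9, not from the actual SimpleTexCreator code. The -interaction=nonstopmode option keeps pdflatex from stopping to wait for user input when it meets an error.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of writing a small .tex file and running pdflatex on it. */
final class TexToPdfSketch {

    private TexToPdfSketch() {
    }

    /** Wraps a content fragment in a standalone document using the given (assumed) preamble. */
    static String wrap(String preamble, String content) {
        return preamble + "\n\\begin{document}\n" + content + "\n\\end{document}\n";
    }

    /** Role of createTex(): store the wrapped content under the given file name. */
    static Path createTex(Path dir, String fileName, String preamble, String content) {
        try {
            Path file = dir.resolve(fileName);
            Files.write(file, wrap(preamble, content).getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** pdflatex command line; pdflatexPath would come from the configuration file. */
    static List<String> pdflatexCommand(String pdflatexPath, Path texFile) {
        return List.of(pdflatexPath, "-interaction=nonstopmode",
                texFile.getFileName().toString());
    }

    /** Role of convertTex2PDF(): run pdflatex next to the .tex file and return its exit code. */
    static int convertTex2Pdf(String pdflatexPath, Path texFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(pdflatexCommand(pdflatexPath, texFile))
                .directory(texFile.toAbsolutePath().getParent().toFile())
                .redirectErrorStream(true)
                // pdflatex writes its own .log file, so console output is discarded.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        return process.waitFor();
    }
}
```

An exit code of 0 from convertTex2Pdf() indicates that pdflatex finished without a fatal error. The PDF then lies next to the .tex file.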

Figure 5.15: Sequence of Generate PDF Use Case Using pdflatex


Figure 5.16: Sequence of Generate PDF Use Case Using Apache FOP

5.7 Source Code

In this section, I explain the source code and its structure. latex2rdf has been developed in the Java programming language, an object-oriented programming (OOP) language developed by Sun Microsystems. I used version 1.5 of the Java Development Kit (JDK) [46], but the code should also work with JDK 1.4.

The application has been developed and tested mainly under Windows XP Service Pack 2. Since Java is platform-independent, it should also run without problems under UNIX-based operating systems and other operating systems with Java support. I have tried to add suitable JavaDoc comments to the source code and its methods throughout.

5.7.1 Packages and Classes

The latex2rdf application is composed of several packages and classes, which I explain briefly below.
• Package exceptions: This package contains the exceptions used in the source code. It contains the following classes:
  – Class Latex2XmlFailureException: This exception is thrown whenever generating XML from a LATEX document fails. In that case, end users should check the latex2xml log files to find the cause of the problem.

5.7. SOURCE CODE


  – Class LatexWindowIsEmptyException: This exception is thrown whenever the user wants to generate RDF or XSL-FO but the LATEX/XML box is empty.
  – Class QueryExecutionException: This exception is thrown whenever the execution of a query fails, for example when the query contains an error or the RDF model is not available.
  – Class Xml2RdfFailureException: This exception is thrown whenever generating RDF from XML fails.
• Package queryengine: This package contains classes for generating dynamic queries and for executing different types of queries. It contains the following classes:
  – Class ElementQueryGenerator: This class contains methods for generating dynamic element queries from the RDF model and storing them in the output folder.
  – Class NumericQueryGenerator: This class contains methods for generating dynamic numeric queries from the RDF model and storing them in the output folder.
  – Class RDQLRunner: This class executes an RDQL query against an RDF model. It takes one query and one RDF model as input, executes the query against the model and stores the results in a HashMap for the result window.
  – Class SPARQLRunner: This class executes a SPARQL query against an RDF model. Like RDQLRunner, it takes one query and one RDF model as input, executes the query against the model and stores the results in a HashMap for the result window.
• Package rdfengine: This package contains the class(es) for generating RDF from LATEX source code. It contains the following class:
  – Class RdfGenerator: This is the main class for generating RDF from a LATEX document. It contains all methods for generating XSL templates, applying them to the source, generating RDF and storing it in the output folder.
• Package test: This package contains several tests, especially for the utils package. It contains the following classes:
  – Class TestConfigReader: This class contains several test cases for the ConfigReader class.


  – Class TestFileUtils: This class contains several test cases for the FileUtils class.
  – Class TestFO2PDF: This class contains several test cases for the FO2PDF class.
  – Class TestFOSeparator2Tex: This class contains several test cases for the FOSeparator2Tex class.
  – Class TestLatex2Xml: This class contains several test cases for the LATEX to XML converter.
  – Class TestLatexUtils: This class contains several test cases for the LatexUtils class.
  – Class TestOutputPath: This class contains several test cases for the ConfigReader class, especially for the output folders.
  – Class TestQueryGenerator: This class contains several test cases for the numeric and element query generator classes.
  – Class TestRDQLRunner: This class contains several test cases for the RDQLRunner class.
  – Class TestRunApp: This class contains several test cases for executing an external executable file.
  – Class TestString: This class contains several test cases for string handling.
  – Class TestXalan: This class contains several test cases for working with Xalan-Java.
• Package ui: This package contains the Graphical User Interface (GUI) of latex2rdf. It contains the following classes:
  – Class AboutFrame: This class is the main frame of the About window, which appears after the About button is pushed.
  – Class HelpFrame: This class is the main frame of the Help window, which appears after the Help button is pushed.
  – Class Latex2rdfUI: This class is the main user interface of latex2rdf. It contains the main() method of the application.
  – Class OutputFrame: This class is the output frame for presenting RDF or XSL-FO content in a separate window.
  – Class ResultFrame: This class is the result window, in which the results appear after a query has been executed.
• Package utils: This package contains the utility classes of the application. It contains the following classes:


  – Class ConfigReader: This class contains methods for reading and handling the configuration.
  – Class ExecuteFile: This class is used for executing an external executable file.
  – Class FileUtils: This class contains several utility methods for working with files, such as saving, deleting and loading files.
  – Class LatexUtils: This class contains several utility methods for working with LATEX documents, for example removing verbatim environments and deleting comments from the source file.
  – Class StringUtils: This class contains several utility methods for working with strings.
  – Class XSLTApplier: This class contains methods for applying XSL templates to an XML file.
• Package xslfoengine: This package contains classes for generating XSL-FO and converting it to PDF using Apache FOP or pdflatex. It contains the following classes:
  – Class FO2PDF: This class contains methods for generating a PDF document from an XSL-FO file using Apache FOP.
  – Class FOSeparator2FO: This class contains methods for separating an XSL-FO file into several small XSL-FO files.
  – Class FOSeparator2Tex: This class contains methods for separating an XSL-FO file into several small LATEX files, using the XSL-FO file and the LATEX source file.
  – Class SimpleFOCreator: This class generates a simple XSL-FO file and stores it in the output folder.
  – Class SimpleTexCreator: This class generates a simple LATEX document and stores it in the output folder.
  – Class Tex2PDF: This class contains several methods for generating a PDF document from a LATEX document using pdflatex.
  – Class XSLFOGenerator: This is the main class for generating XSL-FO from a LATEX document. It contains all methods for generating XSL templates, applying them to the source, generating XSL-FO and storing it in the output folder.
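The following Java sketch illustrates the kind of helper that LatexUtils contains by removing comments from a LATEX source. It is a hypothetical version, not the thesis code. A backslash always consumes the next character, so \% stays literal text, while the % in \\% starts a comment. The sketch does not know about verbatim environments or \verb, where % is not a comment character. This is one reason to remove verbatim environments before stripping comments.

```java
/** Hypothetical sketch of LaTeX comment removal (not the LatexUtils implementation). */
final class LatexCommentSketch {

    private LatexCommentSketch() {
    }

    /**
     * Removes LaTeX comments: an unescaped '%' starts a comment that runs to the
     * end of the line. A backslash always consumes the following character, so
     * "\%" stays literal text while in "\\%" the '%' starts a comment.
     * Newlines are kept so that the line structure is preserved.
     */
    static String removeComments(String source) {
        StringBuilder out = new StringBuilder(source.length());
        boolean inComment = false;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (inComment) {
                if (c == '\n') {
                    inComment = false;
                    out.append(c);
                }
                continue;
            }
            if (c == '\\' && i + 1 < source.length()) {
                // Control sequence or escaped character: copy both characters unchanged.
                out.append(c).append(source.charAt(i + 1));
                i++;
            } else if (c == '%') {
                inComment = true;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Unlike TeX itself, this sketch keeps the newline that ends a comment. It therefore does not join lines, which is usually the safer choice when the text is fed into further processing.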

5.7.2 License

Different parts of the latex2rdf application are covered by various licenses, such as the Hewlett-Packard Development Company license for the Jena package. For more information on each license, refer to the Web site of the respective project or package. The latex2rdf application itself was developed under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License or any later version [18]. Listing 5.6 shows the latex2rdf license agreement.

Listing 5.6: latex2rdf License Agreement

/*****************************************************************
 * Copyright (c) 2006 by Peyman Nasirifard
 * All rights reserved
 *
 * This file is part of the latex2rdf project. The latex2rdf project
 * is free software; you can redistribute it and/or modify it under
 * the terms of the GNU General Public License as published by the
 * Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * The GNU General Public License can be found at
 * http://www.gnu.org/copyleft/gpl.html.
 *
 * This file is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * This copyright notice MUST APPEAR in all copies of the file!
 *****************************************************************/

5.7.3 Installation

In this section, I explain how to install latex2rdf. latex2rdf has a main JAR file called latex2rdf.jar, which is invoked to run the application. The following steps should be done before invoking latex2rdf:
• Check the configuration file of latex2rdf (config.xml) and set the JDK 1.4.2 home folder, the path to pdflatex, etc.
• Check the configuration files of latex2xml to see whether all options are available.
• Make sure that the following libraries are in the same folder as latex2rdf.jar:
  – jena.jar (version 2.4) and its related libraries;
  – fop.jar (version 0.92 beta) and its related libraries;
  – xercesImpl.jar (version 2.8.0);
  – xalan.jar (version 2.7.0);
  – jdom.jar (version 1.0);
  – latex2xml.jar (version 1.2).
  Otherwise, end users should open latex2rdf.jar and correct the class path (the Class-Path entry) in its manifest file.

After these steps, latex2rdf can be started with a command such as java -jar latex2rdf.jar. The main graphical user interface then appears and end users can work with it. Note that, for backup purposes, latex2rdf.jar also contains all configuration files, the folder structure and so on.
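Values such as the JDK 1.4.2 home folder or the pdflatex path can be read from config.xml with the DOM parser that ships with the JDK. The following Java sketch shows the idea. The element names used with it (for example jdkHome and pdflatexPath) are hypothetical, and the real element names of config.xml may differ. The sketch also does not reproduce the ConfigReader class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of reading simple settings from a config.xml-like document. */
final class ConfigSketch {

    private final Document document;

    ConfigSketch(String configXml) {
        Document parsed;
        try {
            parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration", e);
        }
        this.document = parsed;
    }

    /** Trimmed text of the first element with this (caller-chosen) name, or the default if absent or empty. */
    String get(String elementName, String defaultValue) {
        NodeList nodes = document.getElementsByTagName(elementName);
        if (nodes.getLength() == 0) {
            return defaultValue;
        }
        String value = nodes.item(0).getTextContent().trim();
        return value.isEmpty() ? defaultValue : value;
    }
}
```

In the application itself, the configuration file would be passed to DocumentBuilder.parse(java.io.File) rather than parsed from a string.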

5.8 Lessons Learned

In this section, I describe several lessons that I learned during my thesis and give several tips for running the application and getting results.
• latex2xml: latex2rdf does not depend on latex2xml. As explained in previous sections, the XSL templates that transform XML into RDF are generated dynamically, so this process does not depend on a particular XML structure. If end users want to employ another LATEX to XML converter, such as Tralics [31], the RDF is still generated dynamically from the new XML structure.
• Purification: Sometimes end users need to purify the XML using static XSL templates. For example, latex2xml once produced some undesired par/note tags. After generating the XML, I removed these tags with some static XSL templates. I then used the purified XML as input to latex2rdf, and it generated a correct RDF file. As explained before, this process can be automated by means of the configuration file.
• JDK version: latex2xml depends heavily on JDK 1.4.2. End users should therefore set the path to the JDK 1.4.2 home folder in the configuration file.
• Operating system: latex2rdf is implemented in Java and is therefore platform-independent. I developed and tested the application on Windows XP Service Pack 2, but it should also run in other environments, such as Linux or Mac OS.
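A purification step like the one described above is typically an identity transform with a few overriding templates. The Java sketch below embeds such a stylesheet and applies it with JAXP. The two rules are hypothetical examples, because the exact criteria for the undesired par/note tags are not given here:

- par elements without any text are dropped;
- note elements are unwrapped, so their content is kept but the tag is removed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch of purifying latex2xml output with a static XSL stylesheet. */
final class PurifySketch {

    /** Identity transform plus two illustrative (assumed) cleanup rules. */
    static final String PURIFY_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        // Identity template: copy every attribute and node unchanged by default.
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        // Hypothetical rule 1: drop par elements that contain no text.
        + "<xsl:template match=\"par[not(normalize-space())]\"/>"
        // Hypothetical rule 2: unwrap note elements (keep content, remove the tag).
        + "<xsl:template match=\"note\"><xsl:apply-templates/></xsl:template>"
        + "</xsl:stylesheet>";

    private PurifySketch() {
    }

    static String purify(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(PURIFY_XSL)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Purification failed", e);
        }
    }
}
```

The specific rules take precedence over the identity template because XSLT gives patterns such as par[...] and note a higher default priority than node().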

5.9 Main Tools

In this section, I introduce the tools and main third-party packages that I used during my thesis. I do not intend to advertise any tool or package, only to introduce it and describe its useful features, advantages and disadvantages.

5.9.1 Eclipse

Most of the development was done in Eclipse, a free and open source Integrated Development Environment (IDE) for Java and J2EE applications. Its plugin-based architecture enables developers around the world to extend it easily. One of the most useful plugins for me was the Visual Editor for Eclipse, which I used to build the Graphical User Interface (GUI). It is a powerful plugin for designing user interfaces and, to some extent, generating their source code, and it supports both Swing and AWT. IBM is the main sponsor of the Eclipse project. I used Eclipse version 3.1.2 during development. For more information on the Eclipse platform and project, refer to [15].

5.9.2 Protégé

Protégé is a free, open source tool for developing and working with ontologies, developed at Stanford University. Like Eclipse, it has a plugin-based architecture that allows it to be extended easily. I used Protégé version 3.1.1 to develop the LATEX document ontology. For more information on Protégé, refer to [45].

5.9.3 Exchanger XML Editor

Exchanger XML Editor is a powerful XML editor that I used during my work. It is free for academic purposes and has a commercial license for other uses. Its useful features include checking the validity and well-formedness of XML files and applying XSL templates with three different processors (built-in, SAXON and Xalan). I also used it to check XML, RDF and XSL-FO structures. For more information on Exchanger XML Editor, refer to [17].

5.9.4 TeXnicCenter

TeXnicCenter is a free tool for working with TEX and LATEX documents. It has a powerful editor with spell checking in different languages, and it can download required LATEX packages and libraries automatically from a server. I used it to write sample LATEX documents for testing purposes and also to prepare this thesis. For more information on TeXnicCenter, refer to [47].

5.10 Main Third Party Packages

In this section, I describe the main open source third-party packages that I used during my thesis. Their licenses are available on their Web sites.

5.10.1 Jena

Jena is an open source Java framework for building Semantic Web applications. It provides a rich API for RDF and RDFS and makes it simple to build an RDF model and query it. Jena was developed at Hewlett-Packard (HP) Labs as part of their Semantic Web research. For more information on the Jena project, refer to [22].

5.11. TESTING SOLUTIONS WITH AN EXAMPLE

5.10.2 ARQ

ARQ is a SPARQL query engine for Jena. It supports multiple query languages, such as SPARQL, RDQL and an extended form of SPARQL. I used it to execute SPARQL queries. For more information on ARQ, refer to [21].

5.10.3 JDOM

Java Document Object Model (JDOM) is a free package for reading, writing and manipulating XML documents. It is an open source, Java-based object model for XML, similar in purpose to the W3C Document Object Model (DOM). For more information on JDOM, refer to [23].

5.10.4 Apache FOP

Apache FOP (Formatting Objects Processor) is a free, open source Java package that reads XSL-FO documents and renders them into an output format such as PDF. I used it to transform simple XSL-FO documents into PDF. For more information on Apache FOP, refer to [6].

5.10.5 Xalan-Java

Xalan-Java is an open source XSLT processor for transforming XML documents into other XML documents. It uses Xerces to parse XML files. The generated XSL templates contain no processor-dependent elements, so other XSLT processors can easily be used instead. For more information on Xalan-Java, refer to [7].

5.11 Testing Solutions with an Example

To test the methods and algorithms presented in this thesis, my main supervisor, Prof. Dr. Nicola Henze, gave me a test document. The test proceeds as follows:

1. An RDF document is generated from this LATEX source file, and dynamic queries are then generated from the RDF.
2. Different queries are executed against the RDF document, and the results are presented.
3. Finally, for the visualization part, suitable PDF files are generated by means of the XSL-FO file and the LATEX source file.

5.11.1 Input File

In this part, I present some statistics about the input file. The input is a large LATEX source file called Codes and Designs, with a size of almost 100 KB. It contains many mathematical formulas and is written in German. The PDF file generated from it with pdflatex has 48 pages, and the document contains four chapters and twenty-two sections.

5.11.2 Results

In this part, the results are presented using several screenshots of the application. To generate RDF, the input file must first be translated into XML, which is then transformed into RDF using XSL templates. I generated the XML file from the source file using latex2xml outside the application, because the generated XML had to be purified using several static XSL templates. I then loaded the purified XML into the main window. Figure 5.17 shows the main window of the application after the XML input file has been loaded.

Figure 5.17: After Loading Source XML File


After the RDF button is pushed, the RDF file is generated. First, dynamic XSL templates are generated; then these templates are applied to the XML and the RDF is produced. The generated XSL templates are shown in the application log messages and are also stored in the output folder. Figure 5.18 shows the result of generating RDF from the source file.

Figure 5.18: After Generating RDF

Note that after the Generate Query button is pushed, the dynamic queries (element and numeric) are generated and saved in the output folder.


After the XSL-FO button is pushed, the XSL-FO file is generated. First, dynamic XSL templates are generated; then these templates are applied to the XML and the XSL-FO is produced. The generated XSL templates are shown in the application log messages and are also stored in the output folder. Figure 5.19 shows the result of generating XSL-FO from the source file.

Figure 5.19: After Generating XSL-FO


The generated RDF can be queried, and the results are presented in the main window and the result window. Figure 5.20 shows the results of the Give Me All Definitions query after its execution. The results of the query are presented in the result frame.

Figure 5.20: After Executing a Pre-defined Query


Figure 5.21 shows the result of the simple query document1 bodymatter1 chapter1 section1 stitle1 after its execution. All RDF triples related to this ID are shown in the result box. The main result, which is the title of the first section of the first chapter of the document, is marked with an arrow in the figure.
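Such a simple query amounts to asking for every triple whose subject is the resource with the given ID. The following Java sketch builds the corresponding RDQL query. It makes two assumptions: the base URI http://example.org/codes-and-designs.rdf# is hypothetical, and so is the use of underscores as separators between the ID components. Neither is taken from latex2rdf.

```java
/** Hypothetical sketch: turn an element ID into an RDQL query for all of its triples. */
final class SimpleQuerySketch {

    /** Hypothetical base URI of the generated RDF document (assumption). */
    static final String BASE_URI = "http://example.org/codes-and-designs.rdf#";

    private SimpleQuerySketch() {
    }

    /** RDQL query returning every (predicate, object) pair whose subject is the element. */
    static String describeById(String id) {
        if (id.isEmpty()) {
            throw new IllegalArgumentException("Empty element ID");
        }
        for (char c : id.toCharArray()) {
            // Characters that would break the <...> URI syntax of the query.
            if (Character.isWhitespace(c) || c == '<' || c == '>' || c == '"') {
                throw new IllegalArgumentException("Not a valid element ID: " + id);
            }
        }
        return "SELECT ?p, ?o WHERE (<" + BASE_URI + id + ">, ?p, ?o)";
    }
}
```

Rejecting whitespace and angle brackets keeps a mistyped ID from producing a malformed query. The query engine would otherwise report it only as a syntax error.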

Figure 5.21: After Executing a Simple Query


Figure 5.22 shows a general view of the application after working with it. All parts of this user interface were introduced in previous sections.

Figure 5.22: General View


Finally, end users can save the results. Figure 5.23 shows the process of saving results.

Figure 5.23: Saving Results to a File

5.11.2.1 A Deeper Look at One Element

In this section, I take a deeper look at the RDF and the visualization of one element in the results. I consider the first example of the second paragraph of the sixth section of the first chapter of the document. Its ID in the RDF is document1 bodymatter1 chapter1 section6 par2 example1. As explained before, RDF generation is configurable, so either a concise or a detailed RDF can be generated. Listing 5.7 shows the concise RDF for this element. Here, the RDF is generated down to the example element and does not go any deeper.
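The ID scheme can be sketched in Java with the standard DOM API. Each ID is built from the element name plus its position among same-named siblings, repeated for every ancestor from the root down. The thesis generates these IDs with static XSL templates, in which the position of an element corresponds to count(preceding-sibling::*[name() = name(current())]) + 1. The Java version below only illustrates the same idea, and its underscore separator is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch of building a meaningful, unique ID from an element's ancestors. */
final class ElementIdSketch {

    private ElementIdSketch() {
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** 1-based position among preceding siblings that have the same element name. */
    static int position(Element element) {
        int position = 1;
        for (Node n = element.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(element.getNodeName())) {
                position++;
            }
        }
        return position;
    }

    /** name + position of the element and each ancestor, joined from the root down ("_" is assumed). */
    static String idOf(Element element) {
        Deque<String> parts = new ArrayDeque<>();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            parts.addFirst(n.getNodeName() + position((Element) n));
        }
        return String.join("_", parts);
    }
}
```

Under these assumptions, the example element discussed here would receive the ID document1_bodymatter1_chapter1_section6_par2_example1. The ID stays stable as long as the order of same-named siblings in the source does not change.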


Listing 5.7: Small RDF Description of an Example in Document

(Hadamard--Matrizen der Ordnung 2^m)
\left(\begin{array}{c} +1 \end{array}\right) und S_1:=\left(\begin{array}{cc} +1 +1 \\ +1 -1 \end{array}\right) sind Hadamard--Matrizen.
Für m \geq 1 betrachte man den Vektorraum V:=\F_2^m Wir denken uns die n:=2^m vielen Elemente von V in einer festen Weise angeordnet und setzen für a,b \in V
H(a,b) &amp; := &amp; (-1)^{ab^\top}
Hierdurch wird eine n \times n-Matrix H mit Einträgen +1 oder -1 bestimmt. Für a \in V ist \sum\limits_{c \in V}H(a,c)H(c,a)=\sum\limits_{c \in V}(-1)^{ac^\top+ca^\top}=n. Ist a \not= b, so ist (a+b)(i)=1 für wenigstens ein i \in \N_m. Bezeichnen wir den i-ten Einheitsvektor von V mit e_i, so erhalten wir \sum\limits_{c \in V}H(a,c)H(c,b)=\sum\limits_{c \in V}(-1)^{(a+b)c^\top}=\hspace*{-1em}\sum\limits_{c \in V \wedge c(i)=0}\hspace*{-1em}((-1)^{(a+b)c^\top}+(-1)^{(a+b)(c+e_i)^\top})=\hspace*{-1em}\sum\limits_{c \in V \wedge c(i)=0}\hspace*{-1em}0=0. Also ist H H^\top = n E_n und folglich H eine Hadamard--Matrix.



Listing 5.8 shows an excerpt of the detailed RDF file, which was generated taking all elements into account.

Listing 5.8: Excerpt of Full RDF Description of an Example in Document

(Hadamard--Matrizen der Ordnung 2^m)
...
document1 bodymatter1 chapter1 section6 par2 example1 par1
Hierdurch wird eine n \times n-Matrix H mit Einträgen ... oder -1 bestimmt. ... H H^\top = n E_n ...

5.11.2.2 Visualization of an Element

In this part, the visualization of document1 bodymatter1 chapter1 section6 par2 example1 is demonstrated. For this purpose, a small LATEX file (Listing 5.9) is generated from the main LATEX source file.

Listing 5.9: LATEX Source of Example

\documentclass{report}
\usepackage{german}
\usepackage{amssymb}
\include{epsf}
\setlength{\parindent}{0em}
\setlength{\parskip}{1.5ex}
\newcounter{blkcounter}[section]
\newcommand{\newblk}[2]%
{\newenvironment{#1}[1]{%
\renewcommand{\theblkcounter}{\arabic{chapter}.\arabic{section}.\arabic{blkcounter}}%
\refstepcounter{blkcounter}%
{\bf \theblkcounter\vspace{0.5em} #2.}%
{\hspace*{0.5em}\em ##1}\\*[\parskip]}%
{\vspace*{4.5ex}}}
\newcounter{blklistcounter}[blkcounter]
\newenvironment{blklist}%
{\begin{list}{{\bf (\alph{blklistcounter})}}%
{\renewcommand{\theblklistcounter}{\theblkcounter(\alph{blklistcounter})}%
\usecounter{blklistcounter}\parsep 1.5ex \itemsep 0ex \topsep 0ex \partopsep 0ex}}%
{\end{list}}
\newenvironment{proof}%
{{\bf Beweis.}}%
{\hspace*{\fill}$\Box$\vspace*{3ex}}
\newblk{corollary}{Korollar}
\newblk{conjecture}{Vermutung}
\newblk{definition}{Definition}
\newblk{example}{Beispiel}
\newblk{lemma}{Lemma}
\newblk{theorem}{Satz}
\newcommand{\defem}{\em}
\newcommand{\C}{\mathbb{C}}
\newcommand{\F}{\mathbb{F}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\ggt}{\mbox{ggt}}
\newcommand{\Kern}{\mbox{Kern}}
\newcommand{\mod}{\mbox{mod}}
\newcommand{\supp}{\mbox{supp}}
\newcommand{\wt}{\mbox{wt}}
\newcommand{\mlderr}{\hspace*{0.3em}{\scriptstyle ?}\hspace*{-0.70em}\bigcirc}
\newcommand{\DONOTTEX}[1]{}
\begin{document}
\begin{example}{(Hadamard--Matrizen der Ordnung $2^m$)}
\vspace*{-4.5ex}
\label{hadaexa}
\begin{blklist}
\item $\left(\begin{array}{c} +1 \end{array}\right)$ und $S_1:=\left(\begin{array}{cc} +1 +1 \\ +1 -1 \end{array}\right)$ sind Hadamard--Matrizen.
\item F\"ur $m \geq 1$ betrachte man den Vektorraum $V:=\F_2^m$ Wir denken uns die $n:=2^m$ vielen Elemente von $V$ in einer festen Weise angeordnet und setzen f\"ur $a,b \in V$
\begin{eqnarray*}
H(a,b) & := & (-1)^{ab^\top}
\end{eqnarray*}
Hierdurch wird eine $n \times n$-Matrix $H$ mit Eintr\"agen $+1$ oder $-1$ bestimmt.
F\"ur $a \in V$ ist $\sum\limits_{c \in V}H(a,c)H(c,a)$ $=$ $\sum\limits_{c \in V}(-1)^{ac^\top+ca^\top}$ $=$ $n$.
Ist $a \not= b$, so ist $(a+b)(i)=1$ f\"ur wenigstens ein $i \in \N_m$.
Bezeichnen wir den $i$-ten Einheitsvektor von $V$ mit $e_i$, so erhalten wir
$\sum\limits_{c \in V}H(a,c)H(c,b)$ $=$ $\sum\limits_{c \in V}(-1)^{(a+b)c^\top}$
$=$ $\hspace*{-1em}\sum\limits_{c \in V \wedge c(i)=0}\hspace*{-1em}((-1)^{(a+b)c^\top}+(-1)^{(a+b)(c+e_i)^\top})$
$=$ $\hspace*{-1em}\sum\limits_{c \in V \wedge c(i)=0}\hspace*{-1em}0$ $=$ $0$.
Also ist $H H^\top = n E_n$ und folglich $H$ eine Hadamard--Matrix.
\end{blklist}
\end{example}
\end{document}

Figure 5.24 shows the PDF output for this example element, generated with pdflatex from the source code in Listing 5.9.

Figure 5.24: A Sample Example from Codes and Design Document

5.12 Discussion and Conclusion

In this chapter, I discussed the implementation of my thesis and described how I developed the latex2rdf application.

I used the IBM Rational Unified Process as the software development process and built the application in several iterations. I explained the features added in each iteration and the time required for each. Two general approaches for generating RDQL queries (dynamic and static) were described.

The application has a Graphical User Interface (GUI) developed with Java Swing. I described the different parts of the GUI, its structure and their relations. I explained the structure of the output folders in which the results are stored. I also covered the configuration of latex2rdf, its configuration structure and the role of each element in the configuration file.

I presented sequence diagrams for a better understanding of the processes behind the use cases, and I described the different parts of the source code, its packages and its classes. I introduced the main tools and packages that I employed during my work. Finally, I presented an example and used it to demonstrate the different parts of the application.

Chapter 6

Summary

In my M.Sc. thesis, Building a Gateway from Text Editing in LATEX to RDF, I developed several algorithms and an application for generating RDF from LATEX documents. This application can be imagined as a black box with several kinds of input and output. The input can be a LATEX source file, an XML file, or a query; the output can be an RDF document, an XSL-FO file, several small LATEX documents, PDF files, dynamically generated queries, and the results of executing queries.

At the beginning of my thesis, I had never used the LATEX typesetting system for my reports and documents, and my knowledge of it was limited. I first studied its architecture and commands and became familiar with its environments and tools; during this period I learned a lot about LATEX. I also checked whether existing tools in this domain could help me, and I found latex2xml, a free LATEX to XML converter, which I used in my work.

One of the main tasks of my thesis was producing an RDF document from a LATEX source file. The way I developed for generating RDF from LATEX is not the only possible one, but I think it is an efficient one. In my proposed approach, XML is first generated from the LATEX source file and then transformed into RDF via XSL templates. One of the main problems arose from the dynamic structure of LATEX, and therefore of the generated XML: the grammar of LATEX documents is very complex and hard to manage. The XSL templates therefore cannot be written statically, because doing so would be very inefficient and would require a huge number of templates. In the approach presented in this thesis, XSL templates are generated dynamically, according to the XML and by means of a configuration file, which makes the approach easily extensible.

Generating a meaningful URI for each element was another problem. It has been addressed by static XSL templates that generate a unique, meaningful ID for each element: an XSL template collects the parent, grandparent, and further ancestors of an element and builds the ID from the names of these ancestors and their order in the document. The generated RDF is based on an ontology that I developed during my thesis, called the LATEX document ontology. This ontology does not contain every possible LATEX command, but it includes a reasonable subset of the commands commonly used in LATEX documents. It defines LATEX commands as concepts, and whenever one command can appear inside another, a property expresses this relationship.

After generating RDF, I provided a dynamic approach for generating RDQL queries from it. These queries fall into two main classes: element queries and numeric queries. Element queries retrieve a specific or desired element from the LATEX document. Numeric queries are based on the number of results; in other words, they count the occurrences of an element in the document. Because the RDQL specification lacks COUNT() or any similar function, I proposed a protocol for COUNT(): a special character (#) is added at the front of a query. I also generated some general-purpose RDQL queries; these static queries can be applied to many documents. After the queries are generated, end users can load them into the main Graphical User Interface (GUI) and execute them against the RDF. End users can also write and execute their own RDQL/SPARQL queries. There is also another way of making a query, suitable for users who are not familiar with RDQL but know the structure of the generated RDF document.
In other words, queries of this kind are specified by element IDs: end users type only the desired ID, and latex2rdf transforms it into an RDQL query and executes it. After a query is executed, the results are presented in the GUI and also in a separate frame.

For visualization, I presented a dynamic approach for generating XSL-FO from LATEX documents, very similar to the method already described for generating RDF from LATEX source documents. XSL-FO files can be converted into other user-friendly formats such as HTML or PDF using different tools and packages. I chose PDF for the visualization part because it is portable and easy to handle; in addition, a vast number of tools exist for generating PDF files. Two approaches have been proposed for this purpose: the first uses a third-party package, Apache FOP, and the second uses pdflatex. The first approach is fast and secure, but the visualization is not clear: Apache FOP does not understand LATEX commands, so they are copied into the output PDF file without regard to their meaning. The second approach, using pdflatex, is not as fast as the first, but its visualization is much better. For this approach, some text processing algorithms were developed; they aim to find a desired part of the source file with the aid of the LATEX source and the XSL-FO document. Important issues were handling verbatim environments and comments in the source file, and finding a specified chapter or section in it.

latex2rdf is the name I chose for my application, as it acts like a converter from LATEX to RDF, although the name does not reflect all of its functionality. latex2rdf has been developed in the Java programming language and is therefore platform-independent; it has been developed and tested under Windows XP Service Pack 2. latex2rdf is configurable through an XML configuration file, in which settings such as the outputs (RDF, XSL-FO), the path to the pdflatex application, the path to the required source files, and the path of JDK 1.4 can be configured. latex2rdf offers a Graphical User Interface (GUI), based on Java Swing technologies, that provides all of its functionality. The GUI also contains a status box (window) that acts as a one-way message window: whenever something happens or a problem occurs, this box is updated.

As extensions to latex2rdf, I plan to add several features, such as automatically updating the LATEX document ontology when a new command appears, and auto-configuration of latex2xml. These features would give end users a more comfortable experience with latex2rdf. Finally, I would like to say that during my thesis I learned a great deal in different domains.
I gained a more detailed understanding of RDF and XSL-FO, and I became familiar with the LATEX typesetting system and many tools and packages that I had never used before.
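In the thesis, the ancestor-based IDs are produced by XSL templates; the fullNameGenerator and parentNameGenerator calls in Listing A.3 appear to serve this purpose. As an illustration only, the same idea can be sketched in Java with the standard DOM API. Each ancestor contributes its element name plus its 1-based position among same-named siblings, mirroring the count(preceding-sibling::*[name()='stitle']) + 1 expression in Listing A.3. The underscore separator, the resulting ID format, and all class and method names below are assumptions, not latex2rdf's actual scheme.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical sketch of ancestor-based element IDs (the thesis does this in XSL). */
class ElementIds {

    /** 1-based position among preceding siblings with the same element name. */
    static int order(Element e) {
        int n = 1;
        for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(e.getNodeName())) {
                n++;
            }
        }
        return n;
    }

    /** Builds an ID such as "latex1_section2_emph1" from the root element down to e. */
    static String id(Element e) {
        StringBuilder sb = new StringBuilder();
        // Walk upwards until the Document node (which is not an element) is reached.
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            String part = n.getNodeName() + order((Element) n);
            sb.insert(0, sb.length() == 0 ? part : part + "_");
        }
        return sb.toString();
    }

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Not well-formed XML", ex);
        }
    }
}
```

Because the ID encodes the whole ancestor path, two emph elements in different sections receive different IDs even if each is the first emph inside its own parent, and re-running the generator on the same XML yields the same IDs.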


Appendices


Appendix A

General Example

In this appendix, I present an example of the whole process, from a LATEX source file to RDF and XSL-FO. Listing A.1 shows a sample LATEX document, which will be translated to XML.

Listing A.1: A sample TeX file

% This is a sample LaTeX input file.  (Version of 11 April 1994.)
%
% A '%' character causes TeX to ignore all remaining text on the line,
% and is used for comments like this one.

\documentclass{article}      % Specifies the document class

                             % The preamble begins here.
\title{An Example Document}  % Declares the document's title.
\author{Leslie Lamport}      % Declares the author's name.
\date{January 21, 1994}      % Deleting this command produces today's date.

\newcommand{\ip}[2]{(#1, #2)}
                             % Defines \ip{arg1}{arg2} to mean
                             % (arg1, arg2).

%\newcommand{\ip}[2]{\langle #1 | #2\rangle}
                             % This is an alternative definition of
                             % \ip that is commented out.

\begin{document}             % End of preamble and beginning of text.

\maketitle                   % Produces the title.

This is an example input file.  Comparing it with
the output it generates can show you how to
produce a simple document of your own.

\section{Ordinary Text}  % Produces section heading.  Lower-level
                         % sections are begun with similar
                         % \subsection and \subsubsection commands.

The ends of words and sentences are marked
by spaces.  It doesn't matter how many
spaces you type; one is as good as 100.  The
end of a line counts as a space.

One or more blank lines denote the end
of a paragraph.

Since any number of consecutive spaces are treated like a single
one, the formatting of the input file makes no difference to
      \LaTeX,                % The \LaTeX command generates the LaTeX logo.
but it makes a difference to you.  When you use
\LaTeX, making your input file as easy to read
as possible will be a great help as you write
your document and when you change it.  This sample
file shows how you can add comments to your own input
file.

Because printing is different from typewriting,
there are a number of things that you have to do
differently when preparing an input file than if
you were just typing the document directly.
Quotation marks like
       ``this''
have to be handled specially, as do quotes within
quotes:
       ``\,`this'                  % \, separates the double and single quote.
        is what I just
        wrote, not `that'\,''.

Dashes come in three sizes: an
       intra-word
dash, a medium dash for number ranges like
       1--2,
and a punctuation
       dash---like
this.

A sentence-ending space should be larger than the
space between words within a sentence.  You
sometimes have to type special commands in
conjunction with punctuation characters to get
this right, as in the following sentence.
       Gnats, gnus, etc.\    % `\ ' makes an inter-word space.
       all begin with G\@.   % \@ marks end-of-sentence punctuation.
You should check the spaces after periods when
reading your output to make sure you haven't
forgotten any special cases.
Generating an ellipsis
       \ldots\    % `\ ' is needed after `\ldots' because TeX
                  % ignores spaces after command names like \ldots
                  % made from \ + letters.
                  %
                  % Note how a `%' character causes TeX to ignore
                  % the end of the input line, so these blank lines
                  % do not start a new paragraph.
                  %
with the right spacing around the periods requires
a special command.

\LaTeX\ interprets some common characters as
commands, so you must type special commands to
generate them.  These characters include the
following:
       \$ \& \% \# \{ and \}.

In printing, text is usually emphasized with an
       \emph{italic}
type style.

\begin{em}
   A long segment of text can also be emphasized
   in this way.  Text within such a segment can be
   given \emph{additional} emphasis.
\end{em}

It is sometimes necessary to prevent \LaTeX\ from
breaking a line where it might otherwise do so.
This may be at a space, as between the ``Mr.'' and
``Jones'' in
       ``Mr.~Jones'',        % ~ produces an unbreakable interword space.
or within a word---especially when the word is a
symbol like
       \mbox{\emph{itemnum}}
that makes little sense when hyphenated across
lines.

Footnotes\footnote{This is an example of a footnote.}
pose no problem.

\LaTeX\ is good at typesetting mathematical formulas
like
       \( x-3y + z = 7 \)
or
       \( a_{1} > x^{2n} + y^{2n} > x' \)
or
       \( \ip{A}{B} = \sum_{i} a_{i} b_{i} \).
The spaces you type in a formula are
ignored.  Remember that a letter like
       $x$                   % $ ... $  and  \( ... \)  are equivalent
is a formula when it denotes a mathematical
symbol, and it should be typed as one.

\section{Displayed Text}

Text is displayed by indenting it from the left
margin.  Quotations are commonly displayed.  There
are short quotations
\begin{quote}
   This is a short a quotation.  It consists of a
   single paragraph of text.  See how it is formatted.
\end{quote}
and longer ones.
\begin{quotation}
   This is a longer quotation.  It consists of two
   paragraphs of text, neither of which are
   particularly interesting.

   This is the second paragraph of the quotation.  It
   is just as dull as the first paragraph.
\end{quotation}
Another frequently-displayed structure is a list.
The following is an example of an \emph{itemized}
list.
\begin{itemize}
   \item This is the first item of an itemized list.
         Each item in the list is marked with a ``tick''.
         You don't have to worry about what kind of tick
         mark is used.

   \item This is the second item of the list.  It
         contains another list nested inside it.  The inner
         list is an \emph{enumerated} list.
         \begin{enumerate}
            \item This is the first item of an enumerated
                  list that is nested within the itemized list.

            \item This is the second item of the inner list.
                  \LaTeX\ allows you to nest lists deeper than
                  you really should.
         \end{enumerate}
         This is the rest of the second item of the outer
         list.  It is no more interesting than any other
         part of the item.
   \item This is the third item of the list.
\end{itemize}

You can even display poetry.
\begin{verse}
   There is an environment
    for verse \\             % The \\ command separates lines
   Whose features some poets % within a stanza.
   will curse.

                             % One or more blank lines separate stanzas.

   For instead of making\\
   Them do \emph{all} line breaking, \\
   It allows them to put too many words on a line when they'd rather be
   forced to be terse.
\end{verse}

Mathematical formulas may also be displayed.  A
displayed formula
is one-line long; multiline
formulas require special formatting instructions.
   \[  \ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}\]
Don't start a paragraph with a displayed equation,
nor make one a paragraph by itself.

Here is a sample table added by me:
\begin{tabular}{|l||l|}
TableHead1&TableHead2 \\ \hline
TestCell1&TestCell2 \\
TestCell3&TestCell4 \\
\end{tabular}

\end{document}               % End of document.
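A source such as Listing A.1 also shows why the pdflatex-based visualization needs text processing before a part of the file can be cut out: real comments must be told apart from the escaped \%, and section commands must be located. The summary in Chapter 6 names comments, verbatim environments, and finding a specified chapter or section as the main issues. The following is a hypothetical, line-based Java sketch of that step, not the thesis's algorithm. It ignores files pulled in with \input or \include, and it recognizes only the verbatim and verbatim* environments; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, line-based sketch: strip comments, skip verbatim environments,
 * and locate the n-th section command in a LaTeX source.
 */
class SectionFinder {
    private static final Pattern BEGIN_VERB = Pattern.compile("\\\\begin\\{verbatim\\*?\\}");
    private static final Pattern END_VERB = Pattern.compile("\\\\end\\{verbatim\\*?\\}");
    private static final Pattern SECTION = Pattern.compile("\\\\section\\*?\\s*[\\[{]");
    // A section ends at the next section, the next chapter, or the end of the document.
    private static final Pattern STOP =
            Pattern.compile("\\\\(section|chapter)\\*?\\s*[\\[{]|\\\\end\\{document\\}");

    /** Cuts the line at the first '%' that is not escaped by an odd number of backslashes. */
    static String stripComment(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != '%') continue;
            int backslashes = 0;
            for (int j = i - 1; j >= 0 && line.charAt(j) == '\\'; j--) backslashes++;
            if (backslashes % 2 == 0) return line.substring(0, i);
        }
        return line;
    }

    /** Comments removed; lines inside (and delimiting) verbatim environments blanked. */
    static List<String> effectiveLines(List<String> source) {
        List<String> out = new ArrayList<>();
        boolean inVerbatim = false;
        for (String raw : source) {
            if (inVerbatim) {
                // Inside verbatim, '%' and section commands are literal text.
                if (END_VERB.matcher(raw).find()) inVerbatim = false;
                out.add("");
                continue;
            }
            String line = stripComment(raw);
            if (BEGIN_VERB.matcher(line).find()) {
                // A verbatim environment that also closes on this line does not carry over.
                inVerbatim = !END_VERB.matcher(raw).find();
                out.add("");
                continue;
            }
            out.add(line);
        }
        return out;
    }

    /** {start, endExclusive} line indices of the n-th (1-based) section, or null if absent. */
    static int[] sectionRange(List<String> source, int n) {
        List<String> lines = effectiveLines(source);
        int count = 0;
        for (int i = 0; i < lines.size(); i++) {
            if (SECTION.matcher(lines.get(i)).find() && ++count == n) {
                int end = i + 1;
                while (end < lines.size() && !STOP.matcher(lines.get(end)).find()) end++;
                return new int[] {i, end};
            }
        }
        return null;
    }
}
```

Applied to the lines of Listing A.1, sectionRange(lines, 2) would start at the \section{Displayed Text} line and stop just before \end{document}. The returned line range is what would then be wrapped in a small stand-alone document for pdflatex.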

Listing A.2 shows the XML generated from the source file by latex2xml.

Listing A.2: XML File

This is an example input file. Comparing it with the output it generates can show you how to produce a simple document of your own.
Ordinary Text
The ends of words and sentences are marked by spaces. It doesn't matter how many spaces you type; one is as good as 100. The end of a line counts as a space. One or more blank lines denote the end of a paragraph. Since any number of consecutive spaces are treated like a single one, the formatting of the input file makes no difference to , but it makes a difference to you. When you use , making your input file as easy to read as possible will be a great help as you write your document and when you change it. This sample file shows how you can add comments to your own input file. Because printing is different from typewriting, there are a number of things that you have to do differently when preparing an input file than if you were just typing the document directly. Quotation marks like ``this'' have to be handled specially, as do quotes within quotes: ``\,`this' is what I just wrote, not `that'\,''. Dashes come in three sizes: an intra-word dash, a medium dash for number ranges like 1--2, and a punctuation dash---like this. A sentence-ending space should be larger than the space between words within a sentence. You sometimes have to type special commands in conjunction with punctuation characters to get this right, as in the following sentence. Gnats, gnus, etc. all begin with G. You should check the spaces after periods when reading your output to make sure you haven't forgotten any special cases. Generating an ellipsis ... with the right spacing around the periods requires a special command. interprets some common characters as commands, so you must type special commands to generate them. These characters include the following: \$ \&amp; \% \# \{ and \}. In printing, text is usually emphasized with an italic type style. A long segment of text can also be emphasized in this way. Text within such a segment can be given additional emphasis. It is sometimes necessary to prevent from breaking a line where it might otherwise do so. This may be at a space, as between the ``Mr.'' and ``Jones'' in ``Mr. Jones'', or within a word---especially when the word is a symbol like itemnum that makes little sense when hyphenated across lines. Footnotes This is an example of a footnote. pose no problem.
is good at typesetting mathematical formulas like $ x-3y + z = 7 $ or $ a_{1} &gt; x^{2n} + y^{2n} &gt; x' $ or $ \ip{A}{B} = \sum_{i} a_{i} b_{i} $. The spaces you type in a formula are ignored. Remember that a letter like $x$ is a formula when it denotes a mathematical symbol, and it should be typed as one.
Displayed Text
Text is displayed by indenting it from the left margin. Quotations are commonly displayed. There are short quotations This is a short a quotation. It consists of a single paragraph of text. See how it is formatted. and longer ones. This is a longer quotation. It consists of two paragraphs of text, neither of which are particularly interesting. This is the second paragraph of the quotation. It is just as dull as the first paragraph. Another frequently-displayed structure is a list. The following is an example of an itemized list. This is the first item of an itemized list. Each item in the list is marked with a ``tick''. You don't have to worry about what kind of tick mark is used.
This is the second item of the list. It contains another list nested inside it. The inner list is an enumerated list. This is the first item of an enumerated list that is nested within the itemized list. <item>This is the second item of the inner list. allows you to nest lists deeper than you really should. This is the rest of the second item of the outer list. It is no more interesting than any other part of the item. This is the third item of the list.
You can even display poetry. There is an environment for verse
Whose features some poets will curse.
For instead of making
Them do all line breaking,
It allows them to put too many words on a line when they'd rather be forced to be terse.
Mathematical formulas may also be displayed. A displayed formula is one-line long; multiline formulas require special formatting instructions. $ \ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$ Don't start a paragraph with a displayed equation, nor make one a paragraph by itself. Here is a sample table added by me:
TableHead1 TableHead2
TestCell1 TestCell2
TestCell3 TestCell4
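The step that turns XML such as Listing A.2 into templates such as those in Listing A.3 can be sketched in Java with the standard JAXP APIs (javax.xml.parsers and javax.xml.transform). This is a simplified illustration under explicit assumptions, not latex2rdf's generator. The configuration file is reduced to a set of element names that are concepts of the LATEX document ontology, and a template is generated only for concepts that actually occur in the XML. IDs come from XSLT's generate-id() instead of the ancestor-based names, and text content is dropped. The has&lt;name&gt; properties, modelled on hasstitle in Listing A.3, link only direct parent and child concepts. Class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: generate XSLT templates from the elements found in an XML file, then apply them. */
class DynamicXsl {
    static final String LATEX_NS = "http://latexontology.org/latex#";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Element names that occur in the XML and are configured as ontology concepts (sorted). */
    static Set<String> usedConcepts(String xml, Set<String> concepts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList all = doc.getElementsByTagName("*");
            Set<String> used = new TreeSet<>();
            for (int i = 0; i < all.getLength(); i++) {
                String name = all.item(i).getNodeName();
                if (concepts.contains(name)) {
                    used.add(name);
                }
            }
            return used;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Builds a stylesheet with one generated template per used concept. */
    static String buildXsl(Set<String> used) {
        StringBuilder x = new StringBuilder();
        x.append("<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"")
         .append(" xmlns:rdf=\"").append(RDF_NS).append("\" xmlns:latex=\"").append(LATEX_NS).append("\">")
         .append("<xsl:output method=\"xml\" indent=\"yes\"/>")
         .append("<xsl:template match=\"/\"><rdf:RDF><xsl:apply-templates/></rdf:RDF></xsl:template>")
         // Text is dropped; elements without a generated template fall back to the
         // built-in rule, which simply processes their children.
         .append("<xsl:template match=\"text()\"/>");
        for (String name : used) {
            x.append("<xsl:template match=\"").append(name).append("\">")
             .append("<latex:").append(name).append(" rdf:about=\"#{generate-id()}\">");
            for (String child : used) {
                // One has<child> property per direct child element that is itself a concept.
                x.append("<xsl:for-each select=\"").append(child).append("\">")
                 .append("<latex:has").append(child).append(" rdf:resource=\"#{generate-id()}\"/>")
                 .append("</xsl:for-each>");
            }
            x.append("</latex:").append(name).append(">")
             .append("<xsl:apply-templates/></xsl:template>");
        }
        return x.append("</xsl:stylesheet>").toString();
    }

    /** Applies the generated stylesheet to the XML and returns RDF/XML. */
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Since generate-id() returns the same value for the same node within one transformation, the rdf:resource written by a parent matches the rdf:about written later by the child's own template. A new LaTeX command only requires adding its name to the concept set, which reflects the extensibility argument made for dynamic templates. In this simplified version, an emph nested inside a non-concept element such as p becomes a resource of its own but is not linked from the enclosing section.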


Listing A.3 shows the XSL templates that were generated dynamically for producing RDF.

Listing A.3: Dynamic XSL File for Generating RDF

< x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> 1”>

101

25 26 27 28

29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59

60 61 62 63 64 65 66

67 68 69 70 71 72 73 74 75

76 77 78 79 80

< x s l : with−param name=” elementName ” s e l e c t =” ’m’ ” />
< x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> 1”> 1”>

APPENDIX A. GENERAL EXAMPLE

<xsl:with-param name="elementName" select="'tabular'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<xsl:with-param name="elementName" select="'latex'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<stitle xmlns="http://latexontology.org/latex#" rdf:about="#{substring($fName, 1, string-length($fName)-1)}">

<hasstitle rdf:resource="#{$fName}stitle{count(preceding-sibling::*[name()='stitle']) + 1}">
<xsl:with-param name="elementName" select="'stitle'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<xsl:with-param name="elementName" select="'emph'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />

<xsl:with-param name="elementName" select="'emph'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />

<xsl:with-param name="elementName" select="'bodymatter'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<xsl:with-param name="elementName" select="'document'" />

<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<xsl:with-param name="elementName" select="'quote'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<hasstitle rdf:resource="#{$fName}stitle{count(preceding-sibling::*[name()='stitle']) + 1}">

<xsl:with-param name="elementName" select="'section'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<xsl:with-param name="elementName" select="'par'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<title xmlns="http://latexontology.org/latex#" rdf:about="#{substring($fName, 1, string-length($fName)-1)}">

<hastitle rdf:resource="#{$fName}title{count(preceding-sibling::*[name()='title']) + 1}">
<xsl:with-param name="elementName" select="'title'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />
<xsl:with-param name="elementName" select="'itemize'" />
<xsl:call-template name="fullNameGenerator" />

<xsl:call-template name="parentNameGenerator" />
<hastitle rdf:resource="#{$fName}title{count(preceding-sibling::*[name()='title']) + 1}">
<xsl:with-param name="elementName" select="'frontmatter'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />

<xsl:with-param name="elementName" select="'note'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />

<xsl:with-param name="elementName" select="'par'" />
<xsl:call-template name="fullNameGenerator" />
<xsl:call-template name="parentNameGenerator" />

<xsl:with-param name="elementName" select="'chapter'" />
&lt; <xsl:text disable-output-escaping="yes">&gt; <xsl:text disable-output-escaping="yes">&lt;/ <xsl:text disable-output-escaping="yes">&gt;
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">

string($testCounter),
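The resource names in the generated RDF (Listing A.4 combines document1, frontmatter1 and title1 into one identifier) appear to be built from the root downwards, giving each element its name plus its position among same-named siblings, which is what the XPath expression count(preceding-sibling::*[name()='stitle']) + 1 in Listing A.3 computes. The Java sketch below reproduces that naming scheme over a DOM tree. It is an illustration under stated assumptions, not the code of latex2rdf: the class name HierarchicalNames is hypothetical, and the underscore separator between segments is assumed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper: hierarchical resource names such as document1_frontmatter1_title1. */
class HierarchicalNames {
    // Assumed separator between path segments.
    static final String SEPARATOR = "_";

    /** Element name plus its 1-based position among preceding siblings with the same name. */
    static String segment(Element e) {
        int position = 1; // corresponds to count(preceding-sibling::*[name()=...]) + 1
        for (Node n = e.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(e.getNodeName())) {
                position++;
            }
        }
        return e.getNodeName() + position;
    }

    /** Joins the segments of all element ancestors, from the root element down to e. */
    static String fullName(Element e) {
        List<String> parts = new ArrayList<>();
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            parts.add(0, segment((Element) n));
        }
        return String.join(SEPARATOR, parts);
    }

    /** Parses the XML and returns the full name of every element in document order. */
    static List<String> allNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse XML", ex);
        }
    }

    // Depth-first, pre-order walk over element children only.
    private static void collect(Element e, List<String> names) {
        names.add(fullName(e));
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) c, names);
            }
        }
    }
}
```

For a document with a front-matter title and two sections, the second section's title is named document1_bodymatter1_chapter1_section2_stitle1, because the position counter restarts under each parent.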





Listing A.4 shows the RDF generated by applying the generated XSL templates to the XML document.
Listing A.4: RDF File

<hastitle rdf:resource="#document1_frontmatter1_title1"/>
<title rdf:about="#document1_frontmatter1_title1"> This is an example input file. Comparing it with the output it generates can show you how to produce a simple document of your own.

<hasstitle rdf:resource="#document1_bodymatter1_chapter1_section1_stitle1"/>
<stitle rdf:about="#document1_bodymatter1_chapter1_section1_stitle1"> Ordinary Text

The ends of words and sentences are marked by spaces. It doesn't matter how many spaces you type; one is as good as 100. The end of a line counts as a space.



One or more blank lines denote the end of a paragraph.

Since any number of consecutive spaces are treated like a single one, the formatting of the input file makes no difference to, but it makes a difference to you. When you use, making your input file as easy to read as possible will be a great help as you write your document and when you change it. This sample file shows how you can add comments to your own input file.

Because printing is different from typewriting, there are a number of things that you have to do differently when preparing an input file than if you were just typing the document directly. Quotation marks like ``this'' have to be handled specially, as do quotes within quotes: ``\,`this' is what I just wrote, not `that'\,''.

Dashes come in three sizes: an intra-word dash, a medium dash for number ranges like 1--2, and a punctuation dash---like this.

A sentence-ending space should be larger than the space between words within a sentence. You sometimes have to type special commands in conjunction with punctuation characters to get this right, as in the following sentence. Gnats, gnus, etc. all begin with G. You should check the spaces after periods when reading your output to make sure you haven't forgotten any special cases. Generating an ellipsis ... with the right spacing around the periods requires a special command.

interprets some common characters as commands, so you must type special commands to generate them. These characters include the following: \$ \& ; \% \# \{ and \}.

In printing, text is usually emphasized with an italic type style. italic

A long segment of text can also be emphasized in this way. Text within such a segment can be given additional emphasis. additional

It is sometimes necessary to prevent from breaking a line where it might otherwise do so. This may be at a space, as between the ``Mr.'' and ``Jones'' in ``Mr. Jones'', or within a word---especially when the word is a symbol like itemnum that makes little sense when hyphenated across lines. itemnum



Footnotes This is an example of a footnote. pose no problem.

This is an example of a footnote.

is good at typesetting mathematical formulas like $x-3y + z = 7$ or $a_{1} &gt; x^{2n} + y^{2n} &gt; x'$ or $\ip{A}{B} = \sum_{i} a_{i} b_{i}$. The spaces you type in a formula are ignored. Remember that a letter like $x$ is a formula when it denotes a mathematical symbol, and it should be typed as one.

$x-3y + z = 7$
$a_{1} &gt; x^{2n} + y^{2n} &gt; x'$ $\ip{A}{B} = \sum_{i} a_{i} b_{i}$ $x$
<hasstitle rdf:resource="#document1_bodymatter1_chapter1_section2_stitle1"/>

<stitle rdf:about="#document1_bodymatter1_chapter1_section2_stitle1"> Displayed Text

Text is displayed by indenting it from the left margin. Quotations are commonly displayed. There are short quotations This is a short a quotation. It consists of a single paragraph of text. See how it is formatted.

and longer ones. This is a longer quotation. It consists of two paragraphs of text, neither of which are particularly interesting. This is the second paragraph of the quotation. It is just as dull as the first paragraph.



Another frequently-displayed structure is a list. The following is an example of an itemized list. itemized This is the first item of an itemized list. Each item in the list is marked with a ``tick''. You don't have to worry about what kind of tick mark is used. This is the second item of the list. It contains another list nested inside it. The inner list is an enumerated list. This is the first item of an enumerated list that is nested within the itemized list. This is the second item of the inner list. allows you to nest lists deeper than you really should. This is the rest of the second item of the outer list. It is no more interesting than any other part of the item. This is the third item of the list.

You can even display poetry.

There is an environment for verse Whose features some poets will curse. For instead of makingThem do all line breaking, It allows them to put too many words on a line when they'd rather be forced to be terse.

Mathematical formulas may also be displayed. A displayed formula is one-line long; multiline formulas require special formatting instructions. $\ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$ Don't start a paragraph with a displayed equation, nor make one a paragraph by itself. $\ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$

Here is a sample table added by me: |l||l|

TableHead1 TableHead2 TestCell1 TestCell2 TestCell3 TestCell4
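Listing A.4 is the result of applying the dynamically generated stylesheet of Listing A.3 to the XML form of the LATEX document, and the XSL-FO of Listing A.6 is obtained in the same way from Listing A.5. In Java, such a transformation can be run with the standard JAXP API (javax.xml.transform), as in the minimal sketch below. The class name XslApplier is hypothetical, and whether latex2rdf uses exactly these calls is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies an XSL stylesheet to an XML document, both given as text. */
class XslApplier {
    static String apply(String xslSource, String xmlSource) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Compile the stylesheet (here: the dynamically generated XSL templates).
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xslSource)));
            StringWriter out = new StringWriter();
            // Run it over the XML representation of the LaTeX document.
            transformer.transform(new StreamSource(new StringReader(xmlSource)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("XSL transformation failed", ex);
        }
    }
}
```

For example, a stylesheet that emits one t element per stitle, applied to a document whose sections are titled Ordinary Text and Displayed Text, produces output containing <t>Ordinary Text</t> followed by <t>Displayed Text</t>. The same method serves both the RDF and the XSL-FO stylesheets, since only the stylesheet text differs.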

Listing A.5 shows the XSL templates that were generated dynamically to produce XSL-FO.
Listing A.5: Dynamic XSL File for Generating XSL-FO

<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">

<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />

<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:value-of select=".">

<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:value-of select=".">

<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator" />
<xsl:value-of select=".">

select="*"/>



Listing A.6 shows the XSL-FO generated by applying the generated XSL templates to the XML document.
Listing A.6: Main XSL-FO File

This is an example input file. Comparing it with the output it generates can show you how to produce a simple document of your own. Ordinary Text The ends of words and sentences are marked by spaces. It doesn't matter how many spaces you type; one is as good as 100. The end of a line counts as a space. One or more blank lines denote the end of a paragraph. Since any number of consecutive spaces are treated like a single one, the formatting of the input file makes no difference to, but it makes a difference to you. When you use, making your input file as easy to read as possible will be a great help as you write your document and when you change it. This sample file shows how you can add comments to your own input file. Because printing is different from typewriting, there are a number of things that you have to do differently when preparing an input file than if you were just typing the document directly. Quotation marks like ``this'' have to be handled specially, as do quotes within quotes: ``\,`this' is what I just wrote, not `that'\,''. Dashes come in three sizes: an intra-word dash, a medium dash for number ranges like 1--2, and a punctuation dash---like this. A sentence-ending space should be larger than the space between words within a sentence. You sometimes have to type special commands in conjunction with punctuation characters to get this right, as in the following sentence. Gnats, gnus, etc. all begin with G. You should check the spaces after periods when reading your output to make sure you haven't forgotten any special cases. Generating an ellipsis ... with the right spacing around the periods requires a special command. interprets some common characters as commands, so you must type special commands to generate them. These characters include the following: \$ \& ; \% \# \{ and \}. In printing, text is usually emphasized with an italic type style. italic A long segment of text can also be emphasized in this way. Text within such a segment can be given additional emphasis. additional It is sometimes necessary to prevent from breaking a line where it might otherwise do so. This may be at a space, as between the ``Mr.'' and ``Jones'' in ``Mr. Jones'', or within a word---especially when the word is a symbol like itemnum that makes little sense when hyphenated across lines. itemnum
Footnotes This is an example of a footnote. pose no problem. This is an example of a footnote. is good at typesetting mathematical formulas like $x-3y + z = 7$ or $a_{1} &gt; x^{2n} + y^{2n} &gt; x'$ or $\ip{A}{B} = \sum_{i} a_{i} b_{i}$. The spaces you type in a formula are ignored. Remember that a letter like $x$ is a formula when it denotes a mathematical symbol, and it should be typed as one. $x-3y + z = 7$ $a_{1} &gt; x^{2n} + y^{2n} &gt; x'$ $\ip{A}{B} = \sum_{i} a_{i} b_{i}$ $x$ Displayed Text Text is displayed by indenting it from the left margin. Quotations are commonly displayed. There are short quotations This is a short a quotation. It consists of a single paragraph of text. See how it is formatted. and longer ones. This is a longer quotation. It consists of two paragraphs of text, neither of which are particularly interesting. This is the second paragraph of the quotation. It is just as dull as the first paragraph. Another frequently-displayed structure is a list. The following is an example of an itemized list. itemized This is the first item of an itemized list. Each item in the list is marked with a ``tick''. You don't have to worry about what kind of tick mark is used. This is the second item of the list. It contains another list nested inside it. The inner list is an enumerated list. This is the first item of an enumerated list that is nested within the itemized list. This is the second item of the inner list. allows you to nest lists deeper than you really should. This is the rest of the second item of the outer list. It is no more interesting than any other part of the item. This is the third item of the list. You can even display poetry. There is an environment for verse Whose features some poets will curse. For instead of makingThem do all line breaking, It allows them to put too many words on a line when they'd rather be forced to be terse. Mathematical formulas may also be displayed. A displayed formula is one-line long; multiline formulas require special formatting instructions. $\ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$ Don't start a paragraph with a displayed equation, nor make one a paragraph by itself. $\ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$ Here is a sample table added by me: TableHead1 TableHead2 TestCell1 TestCell2 TestCell3 TestCell4

Dynamic element queries can also be generated from the RDF document. Figure A.1 shows the list of files that contain element queries; the file names are self-explanatory, so end users can easily tell what each query does. Dynamic numeric queries can likewise be generated from the RDF document. Figure A.2 shows the list of files that contain numeric queries, again with self-explanatory names. As an example, I present the content of one query of each kind. Listing A.7 shows the content of the GiveMeNumberOf_Quote.rdql file and Listing A.8 shows the content of the GiveMe_document1_bodymatter1_chapter1_section1_par12_m1.rdql file.

Figure A.1: Dynamic Element Queries

Figure A.2: Dynamic Numeric Queries

Listing A.7: Content of GiveMeNumberOf_Quote.rdql File

#SELECT ?x WHERE (?x, , )

Listing A.8: Content of GiveMe_document1_bodymatter1_chapter1_section1_par12_m1.rdql File

SELECT ?x WHERE (, , ?x)
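The URIs inside the triple patterns of Listings A.7 and A.8 did not survive extraction, so the exact queries cannot be reproduced here. The Java sketch below only illustrates how such files could be generated in RDQL syntax, one numeric query per element type and one element query per resource. Everything specific in it is an assumption: the class and method names are hypothetical, the numeric query is assumed to select resources by rdf:type, the property of the element query is a parameter, the file names follow the style of Figures A.1 and A.2 with an assumed underscore separator, and only the namespace http://latexontology.org/latex# is taken from Listing A.3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical generator for RDQL query files in the style of Listings A.7 and A.8. */
class RdqlQueryGenerator {
    // Namespace of the LaTeX ontology, as used in the generated XSL of Listing A.3.
    static final String LATEX_NS = "http://latexontology.org/latex#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Numeric query (assumed form): every resource typed with the given element class. */
    static String numericQuery(String elementType) {
        return "SELECT ?x WHERE (?x, <" + RDF_TYPE + ">, <" + LATEX_NS + elementType + ">)";
    }

    /** Element query (assumed form): the value of one property of one resource. */
    static String elementQuery(String documentUri, String resourceId, String property) {
        return "SELECT ?x WHERE (<" + documentUri + "#" + resourceId + ">, <"
                + LATEX_NS + property + ">, ?x)";
    }

    /** File name in the style of GiveMeNumberOf_Quote.rdql. */
    static String numericFileName(String elementType) {
        return "GiveMeNumberOf_" + Character.toUpperCase(elementType.charAt(0))
                + elementType.substring(1) + ".rdql";
    }

    /** File name in the style of GiveMe_document1_bodymatter1_chapter1_section1_par12_m1.rdql. */
    static String elementFileName(String resourceId) {
        return "GiveMe_" + resourceId + ".rdql";
    }

    /** Maps file name to query text for each element type, keeping the input order. */
    static Map<String, String> numericQueries(Iterable<String> elementTypes) {
        Map<String, String> files = new LinkedHashMap<>();
        for (String type : elementTypes) {
            files.put(numericFileName(type), numericQuery(type));
        }
        return files;
    }
}
```

RDQL has no counting operator, so under this assumption the application would count the rows returned by a numeric query itself.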

PDF files can also be generated from the XSL-FO file. Figure A.3 shows the PDF files generated from the XSL-FO file; they are produced using pdflatex or Apache FOP. Figure A.4 shows the small LATEX documents generated for producing these PDF files, and Figure A.5 shows the small XSL-FO files generated for the same purpose. As an example, I present the content of one small LATEX file and one small XSL-FO file. Listing A.9 shows the content of the document1_bodymatter1_chapter1_section2_itemize1.tex file and Listing A.10 shows the content of the document1_bodymatter1_chapter1_section1_par6.fo file.
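Since each small file is turned into a PDF by an external tool, the tool can be chosen from the file extension. The Java sketch below builds the command line for pdflatex (for .tex files) or for the fop launcher script of Apache FOP (for .fo files) and runs it with ProcessBuilder. The class name PdfCommand is hypothetical; it assumes both tools are installed and on the PATH, and how latex2rdf actually invokes them is not stated here.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that picks pdflatex or Apache FOP according to the file extension. */
class PdfCommand {
    /** Builds the command line; assumes "pdflatex" and "fop" are on the PATH. */
    static List<String> forFile(File input) {
        String name = input.getName();
        String dir = input.getAbsoluteFile().getParent();
        if (name.endsWith(".tex")) {
            // nonstopmode: do not stop and wait for keyboard input on LaTeX errors.
            return Arrays.asList("pdflatex", "-interaction=nonstopmode",
                    "-output-directory=" + dir, input.getPath());
        }
        if (name.endsWith(".fo")) {
            // FOP command-line usage: fop -fo <input.fo> -pdf <output.pdf>
            String base = name.substring(0, name.length() - ".fo".length());
            String pdf = new File(dir, base + ".pdf").getPath();
            return Arrays.asList("fop", "-fo", input.getPath(), "-pdf", pdf);
        }
        throw new IllegalArgumentException("Unsupported input file: " + name);
    }

    /** Runs the external tool and returns its exit code (0 normally means success). */
    static int run(File input) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(forFile(input)).inheritIO().start();
        return process.waitFor();
    }
}
```

Building the command and running it are kept separate so that the command line can be inspected or logged without executing anything.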

Figure A.3: Generated PDF Files According to XSL-FO File

Figure A.4: Small LATEX Files for Generating PDF Files

Figure A.5: Small XSL-FO Files for Generating PDF Files

Listing A.9: Content of document1_bodymatter1_chapter1_section2_itemize1.tex File

% This is a sample LaTeX input file.  (Version of 11 April 1994.)
%
% A '%' character causes TeX to ignore all remaining text on the line,
% and is used for comments like this one.
\documentclass{article}
\title{An Example Document}
\author{Leslie Lamport}
\date{January 21, 1994}
\newcommand{\ip}[2]{(#1, #2)}
%\newcommand{\ip}[2]{\langle #1 | #2\rangle}
\begin{document}
\begin{itemize}
\item This is the first item of an itemized list. Each item in the list is marked with a ``tick''. You don't have to worry about what kind of tick mark is used.
\item This is the second item of the list. It contains another list nested inside it. The inner list is an \emph{enumerated} list.
\begin{enumerate}
\item This is the first item of an enumerated list that is nested within the itemized list.
\item This is the second item of the inner list. \LaTeX\ allows you to nest lists deeper than you really should.
\end{enumerate}
This is the rest of the second item of the outer list. It is no more interesting than any other part of the item.
\item This is the third item of the list.
\end{itemize}
\end{document}
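Listing A.9 suggests that a small LATEX file consists of the original preamble (everything before \begin{document}) followed by the source of a single element inside a fresh document environment. A minimal Java sketch of that wrapping step follows; the class StandaloneTex is hypothetical, and locating \begin{document} by a plain string search is a simplifying assumption.

```java
/** Hypothetical helper: builds a standalone LaTeX file for one element, as in Listing A.9. */
class StandaloneTex {
    private static final String BEGIN = "\\begin{document}";
    private static final String END = "\\end{document}";

    /**
     * @param source   complete LaTeX source of the original document
     * @param fragment LaTeX source of a single element, e.g. an itemize environment
     */
    static String wrap(String source, String fragment) {
        int start = source.indexOf(BEGIN);
        if (start < 0) {
            throw new IllegalArgumentException("No \\begin{document} in source");
        }
        // Reuse the whole preamble: comments, \documentclass, \title, \newcommand, ...
        String preamble = source.substring(0, start);
        return preamble + BEGIN + "\n" + fragment.trim() + "\n" + END + "\n";
    }
}
```

A commented-out \begin{document} placed before the real one would fool this search; a more robust version would skip comment lines first.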

Listing A.10: Content of document1 bodymatter1 chapter1 section1 par6.fo File

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
A sentence-ending space should be larger than the space between words
within a sentence. You sometimes have to type special commands in
conjunction with punctuation characters to get this right, as in the
following sentence. Gnats, gnus, etc. all begin with G. You should check
the spaces after periods when reading your output to make sure you
haven't forgotten any special cases. Generating an ellipsis ... with the
right spacing around the periods requires a special command.
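Listing A.10 begins with an fo:root element in the XSL-FO namespace followed by the paragraph text. For a processor such as Apache FOP to render it, an XSL-FO document also needs a page layout (an fo:layout-master-set with at least one fo:simple-page-master) and an fo:page-sequence whose fo:flow holds the content in fo:block elements. The Java sketch below builds such a minimal document for one paragraph using only the JDK's DOM and JAXP serializer. The page master name and the A4 dimensions are assumptions, not values taken from latex2rdf.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch: a minimal XSL-FO document holding one paragraph of text. */
public class MinimalFo {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Creates an element in the XSL-FO namespace with the conventional "fo" prefix. */
    static Element fo(Document d, String localName) {
        return d.createElementNS(FO_NS, "fo:" + localName);
    }

    static String paragraphToFo(String text) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document d = f.newDocumentBuilder().newDocument();

        Element root = fo(d, "root");
        root.setAttributeNS(XMLNS_NS, "xmlns:fo", FO_NS); // same declaration as Listing A.10
        d.appendChild(root);

        // Page layout: one A4 page master (name and sizes are assumptions).
        Element masters = fo(d, "layout-master-set");
        Element page = fo(d, "simple-page-master");
        page.setAttribute("master-name", "A4");
        page.setAttribute("page-width", "21cm");
        page.setAttribute("page-height", "29.7cm");
        page.setAttribute("margin", "2cm");
        page.appendChild(fo(d, "region-body"));
        masters.appendChild(page);
        root.appendChild(masters);

        // Content: page-sequence > flow > block containing the paragraph text.
        Element seq = fo(d, "page-sequence");
        seq.setAttribute("master-reference", "A4");
        Element flow = fo(d, "flow");
        flow.setAttribute("flow-name", "xsl-region-body");
        Element block = fo(d, "block");
        block.setTextContent(text); // the serializer escapes & and < in text
        flow.appendChild(block);
        seq.appendChild(flow);
        root.appendChild(seq);

        // Identity transform serializes the DOM tree to an XML string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(d), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(paragraphToFo("Gnats, gnus, etc. all begin with G."));
    }
}
```

Building the file through DOM rather than string concatenation guarantees that characters such as & in the LaTeX text are escaped, so the result stays well-formed XML for the FO processor.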

Appendix B

Description of the Attached CD-ROM

The CD-ROM attached to this thesis contains:
• Source code of latex2rdf
• JAR file of latex2rdf (latex2rdf.jar)
• An installation document for latex2rdf
• Required libraries for latex2rdf
• Test cases for the application
• My Eclipse environment
• Several snapshots of latex2rdf
• LaTeX document ontology
• LaTeX source code of this thesis
• PDF version of this thesis
• My M.Sc. thesis presentation

I have also provided a Web page that contains some of these materials. The latest versions of some of the above items are available at

http://www.kbs.uni-hannover.de/~peyman/latex2rdf.htm


List of Symbols and Abbreviations

Abbreviation  Description
URI           Uniform Resource Identifier
XML           eXtensible Markup Language
RDF           Resource Description Framework
RDFS          Resource Description Framework Schema
OWL           Web Ontology Language
XSL           eXtensible Stylesheet Language
XSLT          eXtensible Stylesheet Language Transformations
XSL-FO        eXtensible Stylesheet Language Formatting Objects
CSS           Cascading Style Sheets
XPath         XML Path Language
PDF           Portable Document Format
SPARQL        SPARQL Protocol and RDF Query Language
RAD           Rapid Application Development
GUI           Graphical User Interface
API           Application Programming Interface
DOM           Document Object Model
FOP           Formatting Objects Processor
RDQL          RDF Data Query Language
WYSIWYG       What You See Is What You Get
IDE           Integrated Development Environment
AWT           Abstract Window Toolkit
OOP           Object-Oriented Programming
JDK           Java Development Kit
XMP           Extensible Metadata Platform
WWW           World Wide Web
W3C           World Wide Web Consortium
DCMES         Dublin Core Metadata Element Set
SQL           Structured Query Language


List of Figures

2.1 Semantic Web Tower
2.2 A Simple RDF Graph
2.3 Applying XSL templates to input XML file

4.1 Use Case Diagram
4.2 Overall View of Generating RDF from LaTeX Document
4.3 General Overview of Converting LaTeX Document to XML
4.4 Flowchart of Proposed Algorithm for Generating RDF from XML by Means of Dynamic XSLT
4.5 Top Classes of LaTeX Document Ontology
4.6 Subclasses of DocumentElement
4.7 Subclasses of BeginEndCommand
4.8 Subclasses of Sectioning
4.9 Overall View of Generating XSL-FO from LaTeX Document
4.10 Flowchart of Proposed Algorithm for Generating XSL-FO from XML Using Dynamic XSLT
4.11 Flowchart of an Algorithm for Transforming XSL-FO into PDF Using Apache FOP
4.12 General Overview of LaTeX On Method
4.13 Flowchart of Proposed Algorithm for Generating PDF Using an Excerpt of LaTeX Source File by Means of pdflatex
4.14 Flowchart of Proposed Algorithm for Finding an Excerpt of LaTeX Source File

5.1 IBM Rational Unified Process Software Development Lifecycle
5.2 Gantt Chart of My Thesis
5.3 Main Graphical User Interface of latex2rdf
5.4 Load LaTeX/XML Panel
5.5 Save LaTeX/XML Panel
5.6 RDF/XSL-FO Window
5.7 Snapshot of a Sample Result Window
5.8 latex2rdf Logo
5.9 Hierarchy of Output Folders
5.10 Sequence of Generate RDF Use Case
5.11 Sequence of Generate Query Use Case
5.12 Sequence of Execute Query Use Case (RDQL Query)
5.13 Sequence of Execute Query Use Case (SPARQL Query)
5.14 Sequence of Generate XSL-FO Use Case
5.15 Sequence of Generate PDF Use Case Using pdflatex
5.16 Sequence of Generate PDF Use Case Using Apache FOP
5.17 After Loading Source XML File
5.18 After Generating RDF
5.19 After Generating XSL-FO
5.20 After Executing a Pre-defined Query
5.21 After Executing a Simple Query
5.22 General View
5.23 Saving Results to a File
5.24 A Sample Example from Codes and Design Document

A.1 Dynamic Element Queries
A.2 Dynamic Numeric Queries
A.3 Generated PDF Files According to XSL-FO File
A.4 Small LaTeX Files for Generating PDF Files
A.5 Small XSL-FO Files for Generating PDF Files

List of Tables

2.1 Several XPath Expressions and Their Results

4.1 Preconditions of Use Cases
4.2 Comparison between latex2xml and Tralics
4.3 Some Properties of LaTeX Document Ontology
