Leibniz University of Hannover
Faculty of Computer Science
Institute of Distributed Systems, Knowledge Based Systems
Master of Science Thesis
Building a Gateway from Text Editing in LaTeX to RDF
by
Peyman Nasirifard
Main advisor: Prof. Dr. Nicola Henze
Second advisor: Prof. Dr. Wolfgang Nejdl
Hannover, October 2006
Abstract
The Semantic Web aims to help machines understand concepts, their properties, and the relations between them, so that machines can infer and present new information from existing, well-defined information. This is not just a vision: it has been realized with the help of several technologies, languages and standards, such as the Resource Description Framework (RDF) and ontologies. LaTeX, a free, open-source typesetting system based on TeX, was originally designed for mathematicians and is now used by many students, professors and scientists to prepare their reports, papers and other documents. Its package-based architecture makes it easily extensible. Gathering metadata from a LaTeX source file and generating an RDF document from it, according to the document's structure and elements, lets end users access different parts of the source file with the query languages designed for RDF documents. In this Master of Science thesis, several practical algorithms for transforming a LaTeX document into RDF are proposed. These algorithms use dynamic XSL templates to translate an XML document into RDF, and they have been implemented in an application named "latex2rdf", which acts as a general LaTeX-to-RDF converter. Solutions for the visualization of different parts of a LaTeX source file are also addressed in this thesis.
I hereby declare that this Master of Science thesis, "Building a Gateway from Text Editing in LaTeX to RDF", has been written by me and that no person or thing helped me during this work except the references that I have explicitly mentioned in this thesis.
____________________
Peyman Nasirifard
Hannover, October 20, 2006
Contents
1 Introduction
  1.1 Motivation
  1.2 Structure of this Thesis
2 Semantic Web, XSL and Related Technologies
  2.1 Semantic Web
    2.1.1 Metadata
      2.1.1.1 Dublin Core Metadata
    2.1.2 Ontology
    2.1.3 Semantic Web Tower
      2.1.3.1 Unicode and URI Layer
      2.1.3.2 XML and XML Schema Layer
      2.1.3.3 RDF and RDF Schema Layer
      2.1.3.4 Ontology Vocabulary Layer
      2.1.3.5 Logic Layer
      2.1.3.6 Proof Layer
      2.1.3.7 Trust Layer
      2.1.3.8 Digital Signature and Encryption
  2.2 A Deeper Look at RDF
    2.2.1 RDF Model
    2.2.2 RDF Graph
    2.2.3 RDF Triples
    2.2.4 Validating RDF
    2.2.5 Query Languages for RDF
      2.2.5.1 RDQL
      2.2.5.2 SPARQL
  2.3 XSL Family
    2.3.1 XSLT
    2.3.2 XSL-FO
    2.3.3 XPath
  2.4 Discussion and Conclusion
3 LaTeX and its Family
  3.1 LaTeX
    3.1.1 MiKTeX
    3.1.2 LaTeX Documents in Different Formats
      3.1.2.1 LaTeX to PDF
      3.1.2.2 LaTeX to HTML
  3.2 BibTeX
  3.3 An Overview of LaTeX Tools
4 Extracting Metadata, Generating RDF and XSL-FO
  4.1 Problems
    4.1.1 Extracting Metadata and Generating RDF from LaTeX
    4.1.2 Querying Generated RDF
    4.1.3 Generating Human Understandable Format from LaTeX
  4.2 Solutions
    4.2.1 Extracting Metadata and Generating RDF from LaTeX
      4.2.1.1 Generating XML from LaTeX
      4.2.1.2 Transforming XML into RDF
      4.2.1.3 Algorithm for Generating Dynamic XSL Templates
      4.2.1.4 Algorithm for Generating ID
      4.2.1.5 LaTeX Document Ontology
    4.2.2 Generating Dynamic Queries from RDF
    4.2.3 Two Ways for Querying Generated RDF
    4.2.4 Generating XSL-FO from LaTeX
    4.2.5 Generating Human Understandable Format from XSL-FO
      4.2.5.1 LaTeX Off
      4.2.5.2 LaTeX On
  4.3 Discussion and Conclusion
5 Implementation
  5.1 Methodology
    5.1.1 Iterations
    5.1.2 Timetable
  5.2 Queries
    5.2.1 Dynamic Queries
      5.2.1.1 Element Queries
      5.2.1.2 Numeric Queries
    5.2.2 Static Queries
  5.3 Graphical User Interface
  5.4 Output
  5.5 Configuration
  5.6 Sequences
    5.6.1 Sequence of Generate RDF Use Case
    5.6.2 Sequence of Generate Query Use Case
    5.6.3 Sequence of Execute Query Use Case
    5.6.4 Sequence of Generate XSL-FO Use Case
    5.6.5 Sequence of Generate PDF Use Case
  5.7 Source Code
    5.7.1 Packages and Classes
    5.7.2 License
    5.7.3 Installation
  5.8 Lessons Learned
  5.9 Main Tools
    5.9.1 Eclipse
    5.9.2 Protégé
    5.9.3 Exchanger XML Editor
    5.9.4 TeXnicCenter
  5.10 Main Third Party Packages
    5.10.1 Jena
    5.10.2 ARQ
    5.10.3 JDOM
    5.10.4 Apache FOP
    5.10.5 Xalan-Java
  5.11 Testing Solutions with an Example
    5.11.1 Input File
    5.11.2 Results
      5.11.2.1 A Deeper Look at One Element
      5.11.2.2 Visualization of an Element
  5.12 Discussion and Conclusion
6 Summary
Bibliography
A General Example
B Description of the Attached CD-ROM
List of Symbols and Abbreviations
List of Figures
List of Tables
Acknowledgements
I would like to thank Prof. Dr. Nicola Henze for her support and guidance during this work. Besides being a nice professor and advisor at Leibniz University of Hannover, she is also a very kind friend to her students. I would also like to thank all the people who supported this work by providing free/open-source tools and information.
Chapter 1
Introduction
1.1 Motivation
After the invention of the printing press by Johannes Gutenberg, people have always tried, and still try, to improve the quality of what they print, and many new devices and technologies have been developed for this purpose. With the birth of computers and electronic typesetting systems, a new effort began to improve the quality of electronic typesetting. LaTeX is an open-source, extensible typesetting system with many useful and highly flexible features. Nowadays, many students, professors and scientists use it to prepare documents, papers and even books. Its extensibility through external packages, which can be seen as plug-ins, makes LaTeX a powerful typesetting system, and LaTeX compilers exist for many operating systems.
The Semantic Web aims to help machines understand and process concepts, the relations between them, and their properties, so that machines can infer and present new information from existing information. The Semantic Web is not Artificial Intelligence (AI), but it can be seen as adding a kind of intelligence to the classical Web. Its basic building blocks are metadata and ontologies: metadata is simply data about data, and ontologies define the vocabularies used in different domains. Metadata can be defined for every resource (entity or concept), and LaTeX documents are no exception. One way to present concepts, their properties and their relationships with each other is to describe them in a document using the Resource Description Framework (RDF), which is another building block of the Semantic Web.
In this Master of Science thesis, my task is to embed LaTeX into the Semantic Web. In other words, the main task is to build a gateway that transforms a plain LaTeX document into RDF, which is fully machine-processable. The generated RDF should follow a LaTeX document ontology. After the RDF has been generated, end users can query it and browse the results. The other part of the work focuses on the human side: from a plain LaTeX document, XSL-FO is generated and finally transformed into the Portable Document Format (PDF) by means of third-party packages.
1.2 Structure of this Thesis
Chapter 2 introduces the Semantic Web, XSL and related technologies. Chapter 3 covers the LaTeX typesetting system, BibTeX, and LaTeX tools and packages. Chapter 4 addresses the main problems of this thesis together with my solutions, algorithms and approaches for solving them. Chapter 5 discusses implementation issues such as methodology, timetable and source code, and introduces the tools and packages that I used. Finally, Chapter 6 summarizes the thesis.
Chapter 2
Semantic Web, XSL and Related Technologies
In this chapter, the basic technologies used in this thesis are introduced. I define the Semantic Web, describe its effect on the current Web, and show how it can improve the machine-understandable part of the current Web. I also explain the Semantic Web tower proposed by members of the World Wide Web Consortium (W3C). I then take a deeper look at the RDF model, graph and triples, and finally introduce the XSL family, covering XSLT, XSL-FO and XPath. The chapter ends with a conclusion.
2.1 Semantic Web
After the birth of the Internet and the World Wide Web (WWW), many efforts have been made and many technologies developed to make the Web better, faster and more intelligent. One technology appeared after another, and proposals became standards in a short time. One of these efforts is the Semantic Web, which can be seen as an extension of the current Web. The Semantic Web is not Artificial Intelligence. Tim Berners-Lee, the creator of the World Wide Web and of the first hypertext-enabled browser, says [50]: "The concept of machine-understandable documents does not imply some magical artificial intelligence which allows machines to comprehend human mumblings. It only indicates a machine's ability to solve a well-defined problem by performing well-defined operations on existing well-defined data. Instead of asking machines to understand people's language, it involves asking people to make the extra effort."
In the Semantic Web, with the help of other technologies, we try to help machines understand concepts and the relations between them, process them, and give fast, logically sound responses to queries. The Semantic Web can assist the evolution of human knowledge as a whole [51]. Achieving this goal requires several prerequisites; the following sections give an overview of them.
2.1.1 Metadata
Metadata are data that describe data. Every entity or concept can have one or more metadata elements. For example, this thesis has an author, a title, supervisors, etc.; these are metadata about the thesis. Metadata make it easier to classify and query data. One of the most important metadata standards used in Semantic Web projects is the Dublin Core metadata standard.
2.1.1.1 Dublin Core Metadata
One of the best-known metadata standards in the Semantic Web is Dublin Core [14]. The name comes from Dublin, Ohio (USA), where the 1995 metadata workshop that started the initiative was held. The Dublin Core standard is a small, effective set of elements for describing a wide range of resources, and it has two levels: Simple and Qualified. The Simple Dublin Core Metadata Element Set (DCMES) consists of the 15 elements listed below:
• Title: The name given to the resource [55].
• Creator: An entity primarily responsible for making the content of the resource [55].
• Subject: The topic of the content of the resource [55].
• Description: An account of the content of the resource [55].
• Publisher: The entity responsible for making the resource available [55].
• Contributor: An entity responsible for making contributions to the content of the resource [55].
• Date: A date associated with an event in the life cycle of the resource [55].
• Type: The nature or genre of the content of the resource [55].
• Format: The physical or digital manifestation of the resource [55].
• Identifier: An unambiguous reference to the resource within a given context [55].
• Source: A reference to a resource from which the present resource is derived [55].
• Language: A language of the intellectual content of the resource [55].
• Relation: A reference to a related resource [55].
• Coverage: The extent or scope of the content of the resource [55].
• Rights: Information about rights held in and over the resource [55].
I use some Dublin Core metadata elements in my work. For more information on the Dublin Core metadata standard, refer to [14].
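To make the Simple Dublin Core elements concrete, the following Java sketch describes this thesis with three of them (title, creator, date) in RDF/XML, using only the JDK's StAX writer (javax.xml.stream). The RDF and Dublin Core namespace URIs are the standard ones; the resource URI http://example.org/thesis, the class name and the date format are assumptions made for illustration, and this is not the output format of latex2rdf.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class DublinCoreDemo {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC = "http://purl.org/dc/elements/1.1/";

    // Writes one rdf:Description carrying three Simple Dublin Core elements as RDF/XML.
    static String describe(String about, String title, String creator, String date) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("rdf", "RDF", RDF);
            w.writeNamespace("rdf", RDF);
            w.writeNamespace("dc", DC);
            w.writeStartElement("rdf", "Description", RDF);
            w.writeAttribute("rdf", RDF, "about", about); // URI of the described resource
            String[][] elements = {{"title", title}, {"creator", creator}, {"date", date}};
            for (String[] element : elements) {
                w.writeStartElement("dc", element[0], DC);
                w.writeCharacters(element[1]); // the writer escapes special characters
                w.writeEndElement();
            }
            w.writeEndElement(); // rdf:Description
            w.writeEndElement(); // rdf:RDF
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("http://example.org/thesis",
                "Building a Gateway from Text Editing in LaTeX to RDF",
                "Peyman Nasirifard", "2006-10"));
    }
}
```

Writing the namespaces explicitly on the root element keeps every dc: and rdf: prefix in the output bound, so the result can be fed to an RDF validator as-is.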
2.1.2 Ontology
Ontologies are one of the most important factors in the success of the Semantic Web. An ontology is a collection of vocabularies for describing a specific domain. The term domain is very general: any entity or concept can be regarded as a domain. An ontology includes the classes (or concepts) and properties related to a domain, so ontologies can be seen as machine-understandable classification schemes. As an example, take the domain of books and suppose you want to describe all vocabularies related to it. You may say: a book has one or more authors; a book has one or more chapters; a book has one ISBN; a book has one or more publishers; a book has a number of pages; a book has one or more editors; and so on. Based on human knowledge, many such sentences can be written, and when they are expressed in a formal, XML-based syntax they form an ontology. So far, many people have developed ontologies in many different domains. In the Semantic Web world, the Wine ontology is very well known; it is used for introducing and teaching ontologies and contains a relatively complete classification of wine types. The Wine ontology can be browsed at [58], and a small selection of OWL ontologies can be found at [1]. Some semantic search engines, such as Swoogle [54], can help us search for ontologies and other Semantic Web resources. An important property of ontologies is that they can be imported into other ontologies; for example, the Wine ontology imports the Food ontology, which defines several common vocabularies about food. A step-by-step guide to developing ontologies is [37]; for more information on ontologies, refer to [13]. For this thesis, I developed an ontology for LaTeX documents, which I describe in the following chapters.
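The book example can be turned into a small, runnable sketch: each sentence of the form "a book has one or more X" becomes a property with a minimum and maximum number of values, and a concrete book is checked against these rules. The property names (hasAuthor, hasISBN and so on) and the checker itself are hypothetical; a real ontology would be written in a language such as OWL and processed by a reasoner, not hand-coded like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BookOntology {
    // A property allowed on Book, with a minimum and maximum number of values (-1 = unbounded).
    record PropertyRule(String name, int min, int max) {}

    // Four of the "a book has ..." sentences from the text, written as rules.
    static final List<PropertyRule> BOOK_RULES = List.of(
            new PropertyRule("hasAuthor", 1, -1),     // one or more authors
            new PropertyRule("hasChapter", 1, -1),    // one or more chapters
            new PropertyRule("hasISBN", 1, 1),        // exactly one ISBN
            new PropertyRule("hasPublisher", 1, -1)); // one or more publishers

    // Checks one book (property name -> values) against the vocabulary and its cardinalities.
    static List<String> violations(Map<String, List<String>> book) {
        List<String> problems = new ArrayList<>();
        for (PropertyRule rule : BOOK_RULES) {
            int n = book.getOrDefault(rule.name(), List.of()).size();
            if (n < rule.min() || (rule.max() >= 0 && n > rule.max())) {
                problems.add(rule.name() + " has " + n + " value(s)");
            }
        }
        // Properties that the vocabulary does not define are reported as well.
        for (String property : book.keySet()) {
            if (BOOK_RULES.stream().noneMatch(r -> r.name().equals(property))) {
                problems.add(property + " is not in the vocabulary");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<String>> book = Map.of(
                "hasAuthor", List.of("A. Author"),
                "hasChapter", List.of("Introduction"),
                "hasISBN", List.of("111", "222"));
        System.out.println(violations(book));
    }
}
```

For the sample book in main, the checker reports both the second ISBN and the missing publisher.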
2.1.3 Semantic Web Tower
Conventionally, the technologies and concepts used in the Semantic Web are arranged as a tower, or stack. The best-known version of this stack, proposed by members of the World Wide Web Consortium (W3C) [2], has seven levels, or layers. Figure 2.1 [51] shows this tower.
Figure 2.1: Semantic Web Tower
The following sections explain each layer of this stack.
2.1.3.1 Unicode and URI Layer
The bottom layer consists of Unicode and Uniform Resource Identifiers (URIs). The aim of this layer is to identify each entity or concept by assigning it a unique ID. This ID may be meaningful or meaningless to humans: an ID like AD43F53SDERF34JK is meaningless to a human reader, whereas an ID like math course is meaningful. The only requirement is that the ID identifies the resource uniquely. To see why this layer matters, imagine a town in which everybody has the same first name and surname. What a mess! Bob says to Bob: "How is Bob?" and Bob replies: "Which Bob?" Without this layer, the whole tower would fall down; indeed, the lack of any single layer would destroy the tower.
2.1.3.2 XML and XML Schema Layer
Extensible Markup Language (XML) is one of the main technologies and standards of the current Web. With the help of XML, applications can integrate, interact and exchange data with each other. To validate an XML file, we need to define a structure, or schema; an XML document that conforms to this schema is called valid. XML is a very wide topic; for more information on XML, refer to [16].
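The notion of a valid XML document can be tried out with the validation API that ships with the JDK (javax.xml.validation). The schema below is a hypothetical example: a thesis element must contain exactly one title followed by one or more author elements. Documents that are not even well-formed are rejected too, since they cannot be checked against any schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical schema: <thesis> holds exactly one <title> followed by one or more <author>.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='thesis'><xs:complexType><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='author' type='xs:string' maxOccurs='unbounded'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    static final Schema SCHEMA = compile(XSD);

    // Compiles the schema once; a broken schema is a programming error, not an invalid document.
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // True if the document is well-formed and conforms to SCHEMA.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness or schema error was reported
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<thesis><title>T</title><author>A</author></thesis>"));
        System.out.println(isValid("<thesis><author>A</author></thesis>"));
    }
}
```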
2.1.3.3 RDF and RDF Schema Layer
The Resource Description Framework (RDF) is a language for describing resources, their metadata and their relationships with other resources. An RDF Schema is an RDF document that describes the vocabularies we use in RDF to describe resources. The next section takes a deeper look at RDF.
2.1.3.4 Ontology Vocabulary Layer
Section 2.1.2 briefly explained what ontologies are, but not how they should be represented. Several languages help us describe ontologies in an XML-based format, such as the Web Ontology Language (OWL), which has three sublanguages: OWL Lite, OWL DL and OWL Full. Each offers different features. In brief, reasoning is easier in OWL Lite than in OWL DL, and easier in OWL DL than in OWL Full; OWL Lite is a subset of OWL DL, and OWL DL is a subset of OWL Full. According to [43], RDF documents will generally be in OWL Full, unless they are specifically constructed to be in OWL DL or Lite. For more information on OWL, refer to [43]. RDF itself can also be regarded as an ontology vocabulary language.
2.1.3.5 Logic Layer
The logic layer is one of the most important layers of the Semantic Web stack. In this layer, logical statements are defined using logical connectives (such as NOT and AND) and first-order logic. These rules actually model the system.
2.1.3.6 Proof Layer
In this layer, reasoning takes place: the logic layer defines the rules, and the proof layer produces the results of applying them.
2.1.3.7 Trust Layer
The trust layer plays an important role in this architecture. It spans the other layers and guarantees that the parties involved can be trusted.
2.1.3.8 Digital Signature and Encryption
Digital signatures and encryption make the architecture more secure; without a robust security architecture, the tower would fall down. This was only a brief overview of the Semantic Web and the so-called Semantic Web tower. For more information on the Semantic Web tower, refer to [19] or [32].
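The division of work between the logic layer (rules) and the proof layer (deriving what the rules imply) can be illustrated with a naive forward-chaining loop over triples. This Java sketch is a toy under stated assumptions: the two rules, the predicate names hasAuthor and type, and the class names Person and Agent are hypothetical, and real Semantic Web reasoners are far more capable.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ForwardChaining {
    record Triple(String s, String p, String o) {}

    // A rule inspects one known triple and returns the triples it allows us to infer.
    @FunctionalInterface
    interface Rule {
        List<Triple> apply(Triple t);
    }

    // Logic layer (hypothetical rules):
    //   (x hasAuthor y)  =>  (y type Person)
    //   (y type Person)  =>  (y type Agent)
    static final List<Rule> RULES = List.of(
            t -> t.p().equals("hasAuthor") ? List.of(new Triple(t.o(), "type", "Person")) : List.of(),
            t -> t.p().equals("type") && t.o().equals("Person")
                    ? List.of(new Triple(t.s(), "type", "Agent")) : List.of());

    // Proof layer: apply every rule to every known triple until nothing new appears (a fixpoint).
    static Set<Triple> closure(Set<Triple> facts) {
        Set<Triple> known = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Triple t : List.copyOf(known)) { // iterate over a snapshot while adding
                for (Rule rule : RULES) {
                    for (Triple inferred : rule.apply(t)) {
                        changed |= known.add(inferred);
                    }
                }
            }
        }
        return known;
    }

    public static void main(String[] args) {
        Set<Triple> facts = Set.of(new Triple("Thesis", "hasAuthor", "Peyman Nasirifard"));
        closure(facts).forEach(System.out::println);
    }
}
```

The second rule only fires after the first has produced its triple, which is why the loop repeats until a full pass adds nothing new.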
2.2 A Deeper Look at RDF
In this section, I describe RDF in more detail, because most of my thesis relies on it. I explain the RDF graph and the RDF model, and I look at the RDF query languages that I use in my work. The first question is why we need RDF at all, rather than plain XML. Tim Berners-Lee gives a good answer to this question. According to his comments [49], the mapping from XML documents to semantic graphs is many-to-one, and we also need a schema to know what the mapping is. Therefore, generating a unique model from XML alone is hard or even impossible. The solution is to build another layer on top of XML that generates this semantic model uniquely: the RDF layer. The Semantic Web tower in the previous section shows where the RDF layer sits.
2.2.1 RDF Model
The term RDF model is heard often. What is an RDF model, and why is it useful? The RDF model is simply the semantic view of RDF; in other words, it is an abstract, conceptual view of the statements an RDF document makes. To realize the RDF model concretely, an RDF graph can be generated. In my view, the terms RDF model and RDF graph can be used for the same purpose.
2.2.2 RDF Graph
An RDF graph can be built from an RDF document. Figure 2.2 shows a simple RDF graph, an excerpt describing this thesis. From this graph, we can build several simple sentences, such as "The title of the thesis is Building a Gateway from Text Editing in LaTeX to RDF" or "The author of the thesis is Peyman Nasirifard". We can also extract so-called triples from this graph; the next section explains triples in more detail.
2.2.3 RDF Triples
An RDF graph is a set of triples, each composed of a subject, a predicate and an object. Subjects and predicates are identified by URIs (subjects may also be blank nodes), whereas an object can be another URI or a literal value. From the graph in Figure 2.2, the triples in Listing 2.1 can be extracted.
Listing 2.1: Triples of Previous RDF Graph
(Thesis, hasTitle, Building a Gateway from Text Editing in LaTeX to RDF)
(Thesis, hasAuthor, Peyman Nasirifard)
(Thesis, hasAdvisor, Prof. Dr. Nicola Henze)
(Thesis, hasDate, October 2006)
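The path from graph to triples to the simple sentences quoted in the previous section can be sketched in Java with one record per triple. The four triples are those of Listing 2.1, with plain names instead of full URIs; the mapping from predicate names to English nouns is a hypothetical helper for this illustration only.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TripleSentences {
    // One RDF statement: subject, predicate, object.
    record Triple(String subject, String predicate, String object) {}

    // The triples of Listing 2.1 (plain names instead of full URIs, for readability).
    static final List<Triple> THESIS = List.of(
            new Triple("Thesis", "hasTitle", "Building a Gateway from Text Editing in LaTeX to RDF"),
            new Triple("Thesis", "hasAuthor", "Peyman Nasirifard"),
            new Triple("Thesis", "hasAdvisor", "Prof. Dr. Nicola Henze"),
            new Triple("Thesis", "hasDate", "October 2006"));

    // Hypothetical mapping from predicate names to English nouns.
    static final Map<String, String> NOUNS = Map.of(
            "hasTitle", "title", "hasAuthor", "author",
            "hasAdvisor", "advisor", "hasDate", "date");

    // Verbalizes a triple as "The <noun> of the <subject> is <object>".
    static String sentence(Triple t) {
        String noun = NOUNS.getOrDefault(t.predicate(), t.predicate());
        return "The " + noun + " of the " + t.subject().toLowerCase(Locale.ROOT) + " is " + t.object();
    }

    public static void main(String[] args) {
        THESIS.stream().map(TripleSentences::sentence).forEach(System.out::println);
    }
}
```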
Figure 2.2: A Simple RDF Graph
2.2.4 Validating RDF
An RDF document can be validated against several criteria. Besides the well-formedness and validity it inherits from its parent format, XML, an RDF document can be checked for whether all of its triples are valid; in other words, this validates the RDF model. It can also be checked whether all referenced resources exist in the document or some of them have simply been omitted. Many online and offline RDF validators exist, and many XML editors that support RDF can also validate it. For my work, I used the W3C online validator [60]. After an RDF document is uploaded, this validator builds the triples on the server side and shows them in the browser. It is not a very powerful validator, but it builds triples reliably and also checks the validity of namespaces.
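One of the model-level checks mentioned above, namely whether every triple is well-formed, can be sketched in a few lines of Java. In RDF, a predicate must be a URI reference, a subject must be a URI reference or a blank node, and an object may be a URI reference, a blank node or a literal. The sketch ignores blank nodes and treats "parses as an absolute java.net.URI" as the test for a URI reference; it is a simplification, not how the W3C validator works.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

class TripleValidator {
    // isLiteral marks objects that are literal values rather than URI references.
    record Triple(String subject, String predicate, String object, boolean isLiteral) {}

    static boolean isAbsoluteUri(String s) {
        try {
            return new URI(s).isAbsolute(); // has a scheme such as http:
        } catch (URISyntaxException e) {
            return false; // e.g. contains spaces
        }
    }

    // Returns one message per broken position; an empty list means every triple passed.
    static List<String> check(List<Triple> triples) {
        List<String> errors = new ArrayList<>();
        for (Triple t : triples) {
            if (!isAbsoluteUri(t.subject())) errors.add("subject is not a URI: " + t.subject());
            if (!isAbsoluteUri(t.predicate())) errors.add("predicate is not a URI: " + t.predicate());
            if (!t.isLiteral() && !isAbsoluteUri(t.object())) errors.add("object is not a URI: " + t.object());
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(check(List.of(
                new Triple("http://example.org/thesis", "http://purl.org/dc/elements/1.1/creator",
                        "Peyman Nasirifard", true),
                new Triple("Thesis", "hasAuthor", "Peyman Nasirifard", false))));
    }
}
```

The short names used in Listing 2.1 (Thesis, hasAuthor) would fail this check, which is why a real RDF document writes them as full URIs.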
2.2.5 Query Languages for RDF
The power of a new data model lies in how easily its data can be accessed, and RDF, as a new data model, should follow this rule. It helps if techniques developed for one data model can be adapted to another. One of the best-known query languages for relational data is the Structured Query Language (SQL), and its ideas and syntax can be adapted for accessing the RDF model too. The results of this adaptation are RDF query languages such as RDQL and SPARQL. The following sections look at these two query languages, which I use in my work.
2.2.5.1 RDQL
RDQL is a simple query language for the RDF data model. Its syntax is very similar to SQL, so people who know SQL can learn RDQL easily. There are several implementations of RDQL, such as Jena. For more information on RDQL, refer to [44]. Listing 2.2 shows a simple RDQL query.
Listing 2.2: A Simple RDQL Query
SELECT ?x
WHERE (?x, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,
           <http://example.com/someType>)
This query finds all statements in the graph whose predicate is http://www.w3.org/1999/02/22-rdf-syntax-ns#type and whose object is http://example.com/someType. The variable ?x is bound to the subject resource of each such statement, and all such values of x are returned. Note that ? introduces a variable but is not part of the variable's name.
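What an RDQL or SPARQL engine does for this single triple pattern can be sketched directly in Java: scan the triples, keep those whose predicate and object equal the two constants, and bind ?x to each subject. The sample data and the http://example.org/ subjects are hypothetical; real engines such as Jena use indexes and join several patterns, which this sketch does not attempt.

```java
import java.util.List;

class TriplePattern {
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Evaluates the pattern (?x, predicate, object) and returns every binding of ?x, in graph order.
    static List<String> selectSubjects(List<Triple> graph, String predicate, String object) {
        return graph.stream()
                .filter(t -> t.p().equals(predicate) && t.o().equals(object))
                .map(Triple::s)
                .toList();
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
                new Triple("http://example.org/a", RDF_TYPE, "http://example.com/someType"),
                new Triple("http://example.org/b", RDF_TYPE, "http://example.com/otherType"),
                new Triple("http://example.org/c", RDF_TYPE, "http://example.com/someType"));
        System.out.println(selectSubjects(graph, RDF_TYPE, "http://example.com/someType"));
    }
}
```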
2.2.5.2 SPARQL
SPARQL is another query language for RDF, designed by the W3C [2]. It is very similar to RDQL, but there are several differences; for example, RDQL uses parentheses "()" in its clauses, whereas SPARQL uses braces "{}". For more information on SPARQL, refer to [48]. Listing 2.3 shows a simple SPARQL query.
Listing 2.3: A Simple SPARQL Query
PREFIX rdfsyntax: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?x
WHERE { ?x rdfsyntax:type <http://example.com/someType> }
This query does exactly what the previous RDQL query does; only the syntax differs. Unfortunately, several well-known SQL features, such as GROUP BY and aggregate functions like SUM() and COUNT(), are not available in the RDQL and SPARQL specifications available at the time of writing; in my view, this is one of the main shortcomings of RDQL and SPARQL. RDF is a very wide topic. For more information on RDF, its data model, query languages, etc., refer to [44].
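Because these aggregates are missing from the specifications, an application can compute them itself over the query results. The Java sketch below emulates GROUP BY with COUNT, assuming (hypothetically) that each solution is available as a map from variable names to bound values; it is a client-side workaround, not a feature of RDQL or SPARQL.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ClientSideAggregation {
    // Emulates "GROUP BY ?groupVar" with a count per group over query solutions.
    // Each solution maps variable names (without '?') to their bound values.
    static Map<String, Long> countBy(List<Map<String, String>> solutions, String groupVar) {
        return solutions.stream()
                .filter(row -> row.containsKey(groupVar)) // skip solutions where the variable is unbound
                .collect(Collectors.groupingBy(
                        row -> row.get(groupVar), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Map<String, String>> solutions = List.of(
                Map.of("x", "http://example.org/a", "type", "Paper"),
                Map.of("x", "http://example.org/b", "type", "Paper"),
                Map.of("x", "http://example.org/c", "type", "Book"));
        System.out.println(countBy(solutions, "type")); // {Book=1, Paper=2}
    }
}
```

A TreeMap is used only so that the groups come out in a stable, sorted order.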
2.3 XSL Family
XSL stands for eXtensible Stylesheet Language and is a family of recommendations for defining XML document transformation and presentation. With XSL, we can access specific elements within an XML document, translate one XML document into another, and work on the visual presentation of XML documents. This section takes a closer look at the XSL family and related technologies, introducing three of its members: XSLT, XSL-FO and XPath.
2.3.1 XSLT
Extensible Stylesheet Language Transformations (XSLT) is an XML language for transforming XML documents into other XML documents (or other text formats). The way it works is straightforward: we define one or more XSL templates that translate a particular XML structure into another format, and then we run a so-called XSLT processor, or engine. Many free, open-source XSLT processors exist for Java, such as Xalan-Java and SAXON. Viewed as a black box, an XSLT processor has two inputs: the input XML file and a list of XSL templates. The processor applies the templates to the input file and generates a new (possibly XML) file; the input file itself is not changed. Figure 2.3 gives a general overview of this process. For more information on XSLT, refer to [9].
Figure 2.3: Applying XSL templates to an input XML file
Listing 2.4 shows a sample XSL template. If we apply this XSL template to Listing 2.5, the output is Listing 2.6.
Listing 2.4: A Simple XSL Template
Listing 2.5: Input XML
<title>Building a Gateway from Text Editing in LaTeX to RDF</title>
Peyman Nasirifard
Listing 2.6: Output XML
Building a Gateway from Text Editing in LaTeX to RDF
Peyman Nasirifard
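The black-box view of an XSLT processor (an XML input and a stylesheet go in, a new document comes out) maps directly onto the JDK's javax.xml.transform API, which ships with a built-in XSLT processor; Xalan-Java or SAXON can be used through the same interface. The stylesheet and the element names thesis, document, name and creator below are hypothetical and are not a reconstruction of Listing 2.4.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDemo {
    // Hypothetical stylesheet: <thesis><title/><author/></thesis> becomes <document><name/><creator/></document>.
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='/thesis'>"
          + "<document><name><xsl:value-of select='title'/></name>"
          + "<creator><xsl:value-of select='author'/></creator></document>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Input 1: the XML document; input 2: the stylesheet. Output: a new document (the input is untouched).
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<thesis><title>Building a Gateway from Text Editing in LaTeX to RDF</title>"
                + "<author>Peyman Nasirifard</author></thesis>";
        System.out.println(transform(xml, XSL));
    }
}
```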
2.3.2 XSL-FO
Extensible Stylesheet Language Formatting Objects (XSL-FO), another member of the XSL family, is an XML language for document formatting. A single XSL-FO document contains both the data and its formatting, such as font family, font size and color. For a better understanding, XSL-FO can be imagined as a combination of HTML and Cascading Style Sheets (CSS) in one document. Just as there are XSLT processors, there are XSL-FO processors, whose task is to apply the formatting to the document and present a readable document to end users. For more information on XSL-FO, refer to [12]. Listing 2.7 shows a simple XSL-FO file. The <fo:root> element is the root element of XSL-FO documents. The <fo:layout-master-set> element contains one or more page templates; in Listing 2.7, it contains only one, named my-page. One or more <fo:page-sequence> elements describe the page contents, and their master-reference attribute refers to a page template defined earlier in the <fo:layout-master-set>. A <fo:page-sequence> contains an <fo:flow>, whose <fo:block> elements hold the actual contents of the document. XSL-FO files can be transformed into other formats, such as HTML or PDF.
Listing 2.7: A Simple XSL-FO File
Hello, world!
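The structure described above can be made concrete by writing a minimal XSL-FO document with the JDK's StAX writer. This Java sketch assumes the smallest common shape of such a file (fo:root, one fo:simple-page-master named my-page inside fo:layout-master-set, and one fo:page-sequence whose fo:flow holds a single fo:block); it is an illustration, not the original Listing 2.7, and real page masters usually also set page size and margins. The resulting string could then be handed to an XSL-FO processor.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class MinimalFo {
    static final String FO = "http://www.w3.org/1999/XSL/Format";

    static String build(String pageName, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("fo", "root", FO);
            w.writeNamespace("fo", FO);

            // Page templates are declared once, inside fo:layout-master-set.
            w.writeStartElement("fo", "layout-master-set", FO);
            w.writeStartElement("fo", "simple-page-master", FO);
            w.writeAttribute("master-name", pageName);
            w.writeEmptyElement("fo", "region-body", FO); // every page master needs a body region
            w.writeEndElement(); // fo:simple-page-master
            w.writeEndElement(); // fo:layout-master-set

            // The content refers to a template by name through master-reference.
            w.writeStartElement("fo", "page-sequence", FO);
            w.writeAttribute("master-reference", pageName);
            w.writeStartElement("fo", "flow", FO);
            w.writeAttribute("flow-name", "xsl-region-body");
            w.writeStartElement("fo", "block", FO);
            w.writeCharacters(text);
            w.writeEndElement(); // fo:block
            w.writeEndElement(); // fo:flow
            w.writeEndElement(); // fo:page-sequence
            w.writeEndElement(); // fo:root
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("my-page", "Hello, world!"));
    }
}
```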
2.3.3 XPath
The XML Path Language (XPath) is a powerful non-XML language for addressing parts of an XML document. It is used together with XSLT to select the parts of a document that templates operate on. XPath also offers many useful functions for testing nodes and working with strings, numbers and so on. Every XSLT processor needs an XPath engine to retrieve the desired parts of an XML document. Table 2.1 shows several XPath expressions and their results. For more information on XPath, refer to [9].
XPath Expression     Result
*                    Matches any element node
/parent/child[1]     Selects the first child element of the parent element
//@att               Selects all attributes named att
//element[@att]      Selects all element elements that have an attribute named att
Table 2.1: Several XPath Expressions and Their Results
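The expressions in Table 2.1 can be tried out with the JDK's built-in XPath engine (javax.xml.xpath). The small document below is hypothetical and only reuses the names parent, child, element and att from the table. Because a bare * evaluated from the document root only matches the root element, the sketch uses //* to match every element node.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical document using the element and attribute names from Table 2.1.
    static final String XML =
            "<parent>"
          + "<child att='1'>first</child>"
          + "<child>second</child>"
          + "<element att='x'/>"
          + "<element/>"
          + "</parent>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Number of nodes selected by the expression.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    // String value of the first node selected by the expression.
    static String text(String expression) {
        try {
            return XPATH.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    public static void main(String[] args) {
        for (String e : new String[] {"//*", "/parent/child[1]", "//@att", "//element[@att]"}) {
            System.out.println(e + " -> " + count(e) + " node(s)");
        }
    }
}
```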
2.4 Discussion and Conclusion
In this chapter, I explained the Semantic Web and XSL technologies and standards that I use in this thesis. I explained what the Semantic Web is, why we use it, and its effect on the current Web. I explained the role of metadata in the Semantic Web, especially the Dublin Core metadata standard used in this thesis. I described ontologies as one of the success factors of the Semantic Web. I covered the Semantic Web tower (stack) and the different technologies and standards that play critical roles in it. I explained RDF, one of the main building blocks of the Semantic Web, together with its model, graph and triples. I then looked at the XSL family: I explained XSLT, described how each element of an XML document can be accessed with a non-XML language (XPath), and looked at XSL-FO and its structure, depicting the role of XSLT and XSL-FO processors and their requirements. I supported most parts with simple examples. The next chapter explains LaTeX and its family.
Chapter 3
LATEX and its Family
In this chapter, I will take a look at LATEX and its family members. I will explain what LATEX is and which advantages it offers. The output of compiling a LATEX document can be presented in different formats, so I will cover the different ways of presenting it and the tools and packages that exist for this purpose. I will also give an overview of the LATEX tools (mostly LATEX converters) that I found during my thesis. Finally, I will look at BibTeX, its structure, and how it can be used for bibliographic purposes.
3.1
LATEX
In general, typesetting systems fall into two main groups. The first group is the so-called "What You See Is What You Get" (WYSIWYG) systems. The second group separates view from content and is therefore not WYSIWYG; [42] calls this second group markup systems. Microsoft Word [35] and OpenOffice [39] are examples of the first group, whereas TEX is an example of the second group. TEX is an extensible and portable typesetting system created by Donald Knuth at Stanford University. For more information on TEX and its structure, refer to [42] or [56]. LATEX is a document preparation system for TEX. It is pronounced "Lah-tech" or "Lay-tech" and was originally developed for mathematicians. LATEX is implemented as a TEX macro package, i.e., a set of predefined commands. Nowadays, many scientists, researchers, professors and students use LATEX to prepare their documents and papers. I prepared this thesis with LATEX myself and have experienced its advantages first-hand: it is comfortable, it is powerful, and it supports all the features I need. [28] provides a user-friendly tutorial on LATEX, from getting started to handling graphics and errors. The only disadvantage of markup systems is that they take more time to master than WYSIWYG systems. End users need some start-up time to become familiar with the environment and its commands; as the saying goes, "every beginning is hard". In my experience, however, LATEX performs much better than typesetting systems like Microsoft Word as a document grows in size and complexity. LATEX supports many kinds of typesetting, and [8] is a comprehensive reference on LATEX typesetting of mathematics, graphics and multilingual documents. Listing 3.1 shows a very simple LATEX document.
Listing 3.1: A Simple LATEX Document
% This is a sample LaTeX file.
%
% A '%' character causes TeX to ignore all remaining text on the line,
% and is used for comments like this one.

\documentclass{article}            % Specifies the document class

                                   % The preamble begins here.
\title{An Example Document}        % Declares the document's title.
\author{Peyman Nasirifard}         % Declares the author's name.
\date{September 15, 2006}          % Deleting this command produces today's date.

\begin{document}                   % End of preamble and beginning of text.

\maketitle                         % Produces the title.

This is an example document.

\section{First Section}            % Produces section heading. Lower-level
                                   % sections are begun with similar
                                   % \subsection and \subsubsection commands.
It is the first section of my document. I can write everything that I want.

\subsection{First Subsection}      % Produces subsection heading.
It is a sample subsection. You can even have subsubsection in your document.

\section{Second Section}
It is another section of my document. I can put image, draw table, write
math formulas and much much more in \LaTeX.  % The \LaTeX command generates the LaTeX logo.

\section{conclusion}
I would like to say \LaTeX{} is a very nice typesetting system. I enjoy using it.

\end{document}                     % End of document.
Listing 3.1 uses several simple LATEX commands. The document has three sections and one subsection, and it contains comments that explain the individual commands. Many LATEX editors are available, both free and commercial. Note that LATEX documents are plain text documents. Source files can therefore be written in any simple text editor and compiled with a LATEX compiler.
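Because LATEX sources are plain text, their structure can already be read with simple pattern matching. The following minimal Python sketch is an illustration only and is not a tool used in the thesis. It lists the sectioning commands of a document shaped like Listing 3.1, and it first drops comments so that commented-out commands are ignored. The assumption is that section titles contain no nested braces.

```python
import re

# \section{...}, \subsection{...}, \subsubsection{...}, optionally starred.
# Titles containing nested braces are not handled by this simple pattern.
SECTION_RE = re.compile(r"\\(section|subsection|subsubsection)\*?\{([^{}]*)\}")


def strip_comments(source: str) -> str:
    """Remove everything after an unescaped % on each line."""
    return "\n".join(re.sub(r"(?<!\\)%.*", "", line) for line in source.splitlines())


def outline(source: str) -> list[tuple[str, str]]:
    """Return (command, title) pairs in document order."""
    text = strip_comments(source)
    return [(m.group(1), m.group(2).strip()) for m in SECTION_RE.finditer(text)]


SAMPLE = r"""
\documentclass{article}
\begin{document}
\section{First Section}      % Produces section heading.
% \section{Draft} is commented out and therefore ignored.
\subsection{First Subsection}
\section{Second Section}
\section{conclusion}
\end{document}
"""

for command, title in outline(SAMPLE):
    print(command, title)
```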
3.1.1
MiKTeX
MiKTeX is a TEX implementation for the Windows platform. In other words, it is a compiler for LATEX and TEX documents under Windows. It supports almost all versions of Windows, from Windows 98 to Windows XP. It contains all the packages needed for compiling and viewing LATEX documents, including the files for generating PDF from a source file. It can be installed online or offline: end users can either download all packages and install them offline, or download an installer that fetches the desired packages from the Web site. A complete MiKTeX installation may take several hours, depending on the system and the number of packages to be installed. For more information on MiKTeX, refer to [36].
3.1.2
LATEX Documents in Different Formats
As mentioned above, the advantage of LATEX is that it separates content from view, so there are many possibilities for the view. LATEX documents can be transformed into many other formats, such as Portable Document Format (PDF) and HTML, and many useful tools exist for this purpose.
3.1.2.1
LATEX to PDF
PDF is currently one of the most widely used formats, so generating PDF from LATEX is one of the most common goals. Several tools, such as pdflatex, generate PDF from a plain LATEX document. Under the Windows platform, pdflatex is part of the MiKTeX project.
3.1.2.2
LATEX to HTML
Generating HTML from LATEX is another goal of LATEX users, especially of those who want to publish their work on the Web. The latex2html project [33] and the tex4ht project [33] are the two most common tools for generating HTML from LATEX. Both are highly configurable. tex4ht can even be configured to generate XML documents from LATEX, but this process is somewhat complex.
3.2
BibTeX
BibTeX is both a file format and a program developed for the LATEX environment. It is used for preparing the bibliography and reference parts of a document. According to [28], BibTeX supports fourteen kinds of documents: article, book, booklet, conference, inbook (part of a book), incollection (part of a book with its own title), inproceedings, manual, master's thesis, misc, Ph.D. thesis, proceedings, technical report, and unpublished. I personally miss an entry type for Web sites. With the help of Persistent Uniform Resource Locators (PURLs), which are an intermediate resolution service [38], we are able to assign persistent URLs to Web pages. I hope that this and/or similar entry types will be supported in BibTeX in the future.
Not long ago, there were no official laws for e-business either, but now there are. For more information on BibTeX, refer to [28]. Listing 3.2 shows a simple bibliographic document that contains an article and a book in BibTeX format.
Listing 3.2: A Simple Bibliographic Document
% ********** Bibliography **********
% The next item is an article.
@ARTICLE{Sem2001,
  AUTHOR  = "Tim Berners-Lee and James Hendler and Ora Lassila",
  TITLE   = "The Semantic Web, A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities",
  JOURNAL = "Scientific American",
  YEAR    = "2001",
  VOLUME  = "",
  NUMBER  = "",
  PAGES   = "",
  MONTH   = "May",
  NOTE    = ""
}
% The next item is a book.
@BOOK{Ltx1994,
  AUTHOR    = "Leslie Lamport",
  TITLE     = "LaTeX: A Document Preparation System, User's Guide and Reference Manual",
  PUBLISHER = "Addison-Wesley",
  YEAR      = "1994",
  VOLUME    = "",
  SERIES    = "",
  ADDRESS   = "",
  EDITION   = "Second",
  MONTH     = "",
  NOTE      = ""
}
Because BibTeX has a simple structure, there have been several efforts to transform BibTeX into XML, and some DTDs for BibTeX have been proposed, such as [33]. There are even efforts to transform BibTeX files into RDF, such as [59] and [34]. The former is implemented in Java. The latter is a Perl script with an online interface for demonstration purposes.
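To illustrate why BibTeX's simple structure makes such conversions easy, here is a minimal Python sketch that turns entries like those of Listing 3.2 into XML. It is not one of the cited tools. The shape of the output is an assumption of this sketch: a <bibliography> element, one element per entry named after the entry type, and one child per non-empty field. Only the FIELD = "value" form is recognised.

```python
import re
import xml.etree.ElementTree as ET

# Header of an entry such as "@ARTICLE{Sem2001,".
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Only the FIELD = "value" form of Listing 3.2 is recognised; braced values,
# @string macros and concatenation with # are outside this sketch.
FIELD_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_bibtex(text):
    """Yield (entry type, citation key, {field: value}) for every entry."""
    for m in ENTRY_RE.finditer(text):
        depth, i = 1, m.end()
        while depth and i < len(text):  # find the brace that closes the entry
            depth += {"{": 1, "}": -1}.get(text[i], 0)
            i += 1
        body = text[m.end():i - 1]
        # Empty fields such as PAGES = "" are skipped.
        fields = {k.lower(): v for k, v in FIELD_RE.findall(body) if v}
        yield m.group(1).lower(), m.group(2), fields


def bibtex_to_xml(text):
    """Build <bibliography> with one child per entry and one grandchild per field."""
    root = ET.Element("bibliography")
    for kind, key, fields in parse_bibtex(text):
        entry = ET.SubElement(root, kind, id=key)
        for name, value in fields.items():
            ET.SubElement(entry, name).text = value
    return root


SAMPLE = '''
% The next item is an article.
@ARTICLE{Sem2001,
  AUTHOR  = "Tim Berners-Lee and James Hendler and Ora Lassila",
  JOURNAL = "Scientific American",
  YEAR    = "2001",
  PAGES   = "",
  MONTH   = "May"
}
% The next item is a book.
@BOOK{Ltx1994,
  AUTHOR    = "Leslie Lamport",
  PUBLISHER = "Addison-Wesley",
  YEAR      = "1994",
  EDITION   = "Second"
}
'''

print(ET.tostring(bibtex_to_xml(SAMPLE), encoding="unicode"))
```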
3.3
An Overview of LATEX Tools
During my thesis, I found many useful tools for different aspects of LATEX, mostly for converting LATEX documents into other formats. This section gives an overview of the tools I found.
• JLatex: An editor for LATEX. For more information, refer to [25].
• JDVI: A Java applet for viewing DVI files in a browser. For more information, refer to [52].
• BibTeX2HTML: A BibTeX to HTML converter. For more information, refer to [24].
• GELLMU: Generalized Extensible LATEX-Like Markup (GELLMU) is a LATEX-like markup for writing documents in an easy plain text format that can be faithfully converted into high-powered documents marked up in SGML. For more information, refer to [57].
• HEVEA: A fairly complete and fast LATEX to HTML translator written in Objective Caml. For more information, refer to [20].
• Hyperlatex: A set of macro definitions that lets users write one document for two media, so that the output looks good both as printed text and on the Web. For more information, refer to [40].
• LaTeX2HTML: A converter written in Perl that generates HTML documents from LATEX source files. For more information, refer to [29].
• TtH: A translator from TEX documents to HTML documents. A commercial version with additional features, called TtHgold, is also available. For more information, refer to [53].
• vulcanize: A very simple Perl script for converting LATEX documents to HTML. According to its documentation, it does not handle nested LATEX commands well. For more information, refer to [30].
Chapter 4
Extracting Metadata, Generating RDF and XSL-FO
In this chapter, I will focus on the theoretical aspects of my thesis. I will explain the motivation of my thesis and what exactly it addresses. I will describe the problems and the relations between the different parts of the thesis. After that, I will focus on the solutions, proposals and algorithms that I developed to solve these problems. I will describe the advantages and, to some extent, the disadvantages of the proposed algorithms, and I will explain why I chose specific tools, technologies and standards. Implementation issues will be discussed in the next chapter. I will start with the problem description.
4.1
Problems
Building a gateway from text editing in LATEX to RDF and XSL-FO, together with the ability to query the generated RDF and visualize its content, can be divided into several subproblems. I explain these subproblems in the following sections.
4.1.1
Extracting Metadata and Generating RDF from LATEX
The main subproblem in my thesis is generating RDF from a plain LATEX document. In this part, a source document should be translated to a machine understandable format, i.e. RDF.
4.1.2
Querying Generated RDF
The main purpose of generating RDF is the ability to query it and get reasonable responses. One of the problems to solve is therefore how to query RDF in a user-friendly manner.
4.1.3
Generating Human Understandable Format from LATEX
After generating a machine-understandable format, i.e., RDF, I will also generate a human-readable format from a plain LATEX document. This stage is mainly intended for end users: it gives them an overview of the different parts of the source document.
4.2
Solutions
Having described the problems, which are the topics of my thesis, I will now explain the solutions to them. The problem divides into the subproblems mentioned above, and the solution divides into corresponding subsolutions. One of the first things I did during my thesis was to draw a use case diagram that visualizes the requirements of the thesis. Figure 4.1 shows this use case diagram. The five main use cases of the project are generating an RDF document, generating queries, generating an XSL-FO document, generating a user-friendly format from XSL-FO (i.e., generating PDF), and executing queries against RDF.
Figure 4.1: Use Case Diagram
Table 4.1 shows the preconditions of each use case. Note that Generated XML is not a use case, but it is a precondition of the Generate RDF and Generate XSL-FO use cases.
Use case | Precondition
Generate RDF | Generated XML
Generate Query | Generate RDF
Generate XSL-FO | Generated XML
Generate PDF | Generate XSL-FO
Execute Query | Generate RDF (sometimes Generate Query)
Table 4.1: Preconditions of Use Cases
4.2.1
Extracting Metadata and Generating RDF from LATEX
I explained in previous chapters what RDF is. In my view, the main problem of my thesis was generating RDF from a plain LATEX document, and there are several approaches to this. All of them require that every element of the LATEX source document can be accessed. One approach is to develop a compiler or API for LATEX documents in order to access the document elements. Unfortunately, no Java API for LATEX documents is available. Such efforts exist only for BibTeX, like javabib [26], which is a BibTeX parser written in Java. After weighing all limitations and possibilities, and because of the time limits of the thesis, I ruled out this approach. Building a compiler or Java API for LATEX could itself be a nice thesis topic. Another approach is to translate a LATEX document into a general, more accessible format like XML, and then translate the XML into the desired RDF. I chose this approach because LATEX to XML converters are available. This approach therefore has two branches, which I explain below. Figure 4.2 shows a general overview of this approach.
Figure 4.2: Overall View of Generating RDF from LATEX Document
As explained, the main task of my thesis was generating RDF. I therefore named my application latex2rdf, as it acts as a LATEX to RDF converter.
4.2.1.1
Generating XML from LATEX
To generate XML from a LATEX document, I looked for the best free tool available on the Web. This phase was very important, because I wanted to minimize the loss of metadata from the LATEX document. I therefore spent several weeks on selecting a good tool. I found several LATEX to XML converters, which I introduce below:
Converter | Docs | Platform | Configuration | Third-party Dependency
latex2xml | Good | Independent (implemented in Java) | Good structure of configuration | No
Tralics | Good | Only Linux (possible under Windows) | Complex structure of configuration | Yes
Table 4.2: Comparison between latex2xml and Tralics
• latex2xml: The first tool I found. It is the result of the diploma work of three students at the Berne University of Applied Sciences. latex2xml transforms a LATEX document into a definable XML structure and is highly configurable by means of XML configuration files. It has a built-in compiler for parsing LATEX documents. It is implemented in Java, so it is platform independent. For more information on latex2xml and its documentation, refer to [4].
• Tralics: Another LATEX to XML converter, developed at the French National Institute for Research in Computer Science and Control (INRIA) using the C++ and Perl programming languages. It runs only under Linux, but with Cygwin [11] it can also be used under Windows. It has comprehensive documentation. For more information on Tralics, refer to [31].
After comparing these two packages, I decided to use latex2xml for my thesis. Table 4.2 shows the comparison between them. Note that latex2rdf does not depend on latex2xml: any LATEX to XML converter can be used in it. Figure 4.3 shows the main approach for producing XML using latex2xml and the latex2rdf configuration. latex2xml depends on JDK 1.4.2 and is not compatible with JDK 1.5. The JDK 1.4.2 home folder must therefore be specified in the latex2rdf configuration file. To convert a LATEX document into XML, latex2rdf first reads its configuration to find the JDK home and then invokes latex2xml. If the transformation fails, a warnings file is generated that tells end users the reasons. The two most common problems are "unknown command" and "no output label". The first problem means that the source file contains one or more LATEX commands that have not been defined in the configuration files. The solution is to open the configuration file LaTeXCommands.xml and add the missing command to it.
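A quick pre-check for the "unknown command" problem can be sketched in Python: collect the control words used in the source and compare them with the commands the converter knows. This is a minimal sketch under stated assumptions and is not part of latex2xml or latex2rdf. The set KNOWN is a hypothetical stand-in for the contents of LaTeXCommands.xml; the sketch does not read that file, whose format is described in the latex2xml documentation.

```python
import re

# A control word: backslash followed by letters, e.g. \section.
# Control symbols such as \% or \\ are not commands in this sense.
COMMAND_RE = re.compile(r"\\([A-Za-z]+)")


def unknown_commands(source: str, known: set[str]) -> list[str]:
    """Return, in order of first use, the commands that are not in `known`."""
    missing: list[str] = []
    for line in source.splitlines():
        code = re.sub(r"(?<!\\)%.*", "", line)  # drop comments (\% is kept)
        for name in COMMAND_RE.findall(code):
            if name not in known and name not in missing:
                missing.append(name)
    return missing


# Hypothetical stand-in for the command names declared in LaTeXCommands.xml;
# the real file also holds further parameters for each command.
KNOWN = {"documentclass", "begin", "end", "section", "subsection", "maketitle"}

SOURCE = r"""
\begin{document}
\section{Results}     % \foo inside a comment is ignored
See \cite{Sem2001} and \href{http://example.org}{this page}.
\end{document}
"""

print(unknown_commands(SOURCE, KNOWN))  # ['cite', 'href']
```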
Adding a command to LaTeXCommands.xml involves several further parameters, which are described in the latex2xml documentation. The second common problem ("no output label") means that latex2xml does not know which tag to use in the XML file. The solution is to open the configuration file LabelAssociations.xml and add the label tag to it. More information is available in the latex2xml documentation [5].
Figure 4.3: General Overview of Converting LATEX Document to XML
4.2.1.2
Transforming XML into RDF
After the XML has been generated, there must be a way to transform it into RDF, and the most practical way is XSLT. The problem is that the generated XML is dynamic, because its structure depends on the source document. Static XSL templates for transforming it into RDF would therefore be neither extensible nor comprehensive, and writing them would be very time-consuming. Instead, I decided to generate the XSL templates for transforming XML into RDF dynamically, based on the XML structure. In the next section, I present the algorithm that I designed for producing these dynamic XSL templates.
4.2.1.3
Algorithm for Generating Dynamic XSL Templates
This algorithm is straightforward. First, I collect all possible children of each element in the XML using a recursive method. Then, based on the children of each element and the absolute paths of the elements, I use XSL and XPath node tests to generate dynamic XSL templates for all possible children of an element. The generation of dynamic XSL templates is configurable by means of a configuration file. For example, end users can choose to summarize the RDF by omitting some elements; an example of this will be presented at the end of the next chapter. Figure 4.4 shows the flowchart of this algorithm. The generated RDF is based on an ontology that I developed for LATEX documents, which I will explain in the following sections.
4.2.1.4
Algorithm for Generating ID
There are two general approaches for generating IDs in RDF: letting the XSLT processor generate them, or generating them with an XSL template. Most XSLT processors have a built-in ID generator for XML nodes (the XSLT generate-id() function), but the IDs it produces carry no semantics. For example, nothing can be extracted from an ID like A02B21C. In contrast, an ID like document1_bodymatter1_chapter1_section2_tabular1 identifies the first tabular of the second section of the first chapter of the document. The second approach is therefore much better than the first. The way such an ID is generated is simple and straightforward. Collect all ancestors of an element (parent, grandparent, and so on) from the XML source together with their indices (positions) in the source, then join the steps with an underscore (_) character. Listing 4.1 shows an XSL template for generating unique IDs in the RDF document.
Listing 4.1: XSL Template for Generating Unique ID in RDF
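The algorithms of Sections 4.2.1.3 and 4.2.1.4 can be illustrated together with a Python sketch. It is not the XSLT-based implementation of latex2rdf, and the following choices are assumptions of the sketch. collect_children performs the recursive collection of possible children. build_stylesheet emits one XSLT 1.0 template per element name, whereas the algorithm described above also uses absolute paths. make_ids computes the same kind of semantic IDs as the XSL template, but it walks the tree top-down instead of collecting ancestors, and it counts positions among siblings with the same name. The namespace http://example.org/latexdoc# is hypothetical. The has-prefixed property names follow the naming rule of the LATEX document ontology.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOC = "http://example.org/latexdoc#"  # hypothetical ontology namespace
for prefix, uri in (("xsl", XSL), ("rdf", RDF), ("doc", DOC)):
    ET.register_namespace(prefix, uri)


def collect_children(elem, table=None):
    """Recursively map every element name to the set of child names seen under it."""
    if table is None:
        table = {}
    kids = table.setdefault(elem.tag, set())
    for child in elem:
        kids.add(child.tag)
        collect_children(child, table)
    return table


def build_stylesheet(table, omit=()):
    """Emit an XSLT stylesheet with one template per (non-omitted) element name."""
    omit = set(omit)
    sheet = ET.Element(f"{{{XSL}}}stylesheet", version="1.0")
    start = ET.SubElement(sheet, f"{{{XSL}}}template", match="/")
    ET.SubElement(ET.SubElement(start, f"{{{RDF}}}RDF"), f"{{{XSL}}}apply-templates")
    for name in sorted(set(table) - omit):
        tmpl = ET.SubElement(sheet, f"{{{XSL}}}template", match=name)
        node = ET.SubElement(tmpl, f"{{{DOC}}}{name}")
        for child in sorted(table[name] - omit):
            # One property element per child element keeps exactly one node
            # element inside each property element, as RDF/XML requires.
            loop = ET.SubElement(node, f"{{{XSL}}}for-each", select=child)
            prop = ET.SubElement(loop, f"{{{DOC}}}has{child}")
            ET.SubElement(prop, f"{{{XSL}}}apply-templates", select=".")
    return sheet


def make_ids(root):
    """Map each element to an ID such as document1_bodymatter1_chapter1_section2.

    Each step is the element name plus its 1-based position among siblings
    with the same name; steps from the root down are joined with underscores.
    """
    ids = {root: f"{root.tag}1"}
    stack = [root]
    while stack:
        parent = stack.pop()
        counts = {}
        for child in parent:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            ids[child] = f"{ids[parent]}_{child.tag}{counts[child.tag]}"
            stack.append(child)
    return ids


source_xml = ET.fromstring(
    "<document><bodymatter><chapter><section/>"
    "<section><par/><tabular/></section></chapter></bodymatter></document>"
)
sheet = build_stylesheet(collect_children(source_xml), omit={"par"})
print(ET.tostring(sheet, encoding="unicode"))
tabular = source_xml.find("bodymatter/chapter/section[2]/tabular")
print(make_ids(source_xml)[tabular])  # document1_bodymatter1_chapter1_section2_tabular1
```

Omitting par in the usage example shows the configurable summarizing: par gets neither a template nor a haspar property.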
4.2.1.5
LATEX Document Ontology
As explained in previous sections, ontologies are main building blocks for enabling the Semantic Web. For my thesis, I needed a reference for LATEX document elements, so I developed a LATEX document ontology. Below, I explain the classes and properties of this ontology. The LATEX document ontology has two main classes: LatexDocument, which represents a LATEX document, and DocumentElement, which represents document elements. Figure 4.5 shows these two top classes of the LATEX document ontology. DocumentElement is divided into several categories, and LATEX commands are sorted into their relevant categories. Figure 4.6 shows the subclasses of DocumentElement.
Figure 4.4: Flowchart of Proposed Algorithm for Generating RDF from XML by Means of Dynamic XSLT
Figure 4.5: Top Classes of LATEX Document Ontology
Figure 4.6: Subclasses of DocumentElement
Each class has its own subclasses. As examples, Figure 4.7 shows the subclasses of the BeginEndCommand class, and Figure 4.8 shows the subclasses of the Sectioning class. The categories are as follows:
• BeginEndCommand: commands that are enclosed by \begin and \end.
• Footnote: footnote command(s).
• Formatting: formatting commands, like center or bold.
• Links: link command(s).
• lstinputlisting: listing command(s).
• MathFormula: math formula(s).
• Misc: miscellaneous commands that do not fit in the other classes.
• Sectioning: sectioning commands, like section or subsection.
• Tabling: table commands, like row and cell.
• TheBibliography: bibliographic command(s).
Figure 4.7: Subclasses of BeginEndCommand
Figure 4.8: Subclasses of Sectioning
Below, I give an overview of the properties of the ontology and their structure. Every frequently used class has a corresponding property. Its name is the word has followed by the name of the class that forms the range of the property. For example, hastable is a property whose range is the table class, and hasquote is a property whose range is the quote class. Each property also needs a domain. In the LATEX document ontology, the domain of a property consists of all classes in which the range of the property can appear. For example, a quote can appear inside paragraphs. par is therefore one of the domains of the hasquote property, whose range is quote. A property may have one or more domains. Table 4.3 shows the domains and ranges of several properties.
Property | Domain | Range
hasrow | tabularx, tabular | row
hasstitle | chapter, section, subsection, subsubsection | stitle
hastable | chapter, par, section, subpar, subsection, subsubsection | table
Table 4.3: Some Properties of LATEX Document Ontology
Listing 4.2 shows an excerpt of the LATEX document ontology. The ontology was developed using Protégé, which I will introduce in the implementation chapter.
Listing 4.2: An Excerpt of LATEX Document Ontology
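The naming rule and the multiple domains can be made concrete with a small Python sketch that writes the properties of Table 4.3 in Turtle. It is not the Protégé-generated ontology of Listing 4.2. The namespace is hypothetical, and Turtle is used only for brevity. The domains are combined with owl:unionOf because several separate rdfs:domain statements would mean the intersection of the classes. A resource using hastable would then be inferred to be a chapter, a par, a section, and so on, all at the same time.

```python
BASE = "http://example.org/latexdoc#"  # hypothetical ontology namespace

# property -> (domain classes, range class), as listed in Table 4.3
PROPERTIES = {
    "hasrow": (["tabularx", "tabular"], "row"),
    "hasstitle": (["chapter", "section", "subsection", "subsubsection"], "stitle"),
    "hastable": (["chapter", "par", "section", "subpar", "subsection", "subsubsection"], "table"),
}


def to_turtle(properties):
    """Render object properties in Turtle, with one owl:unionOf domain per property."""
    lines = [
        f"@prefix : <{BASE}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for name, (domains, range_class) in sorted(properties.items()):
        # Naming rule of the ontology: property name = "has" + range class.
        if name != "has" + range_class:
            raise ValueError(f"{name} does not follow the has<range> naming rule")
        union = " ".join(f":{d}" for d in domains)
        lines += [
            f":{name} a owl:ObjectProperty ;",
            f"    rdfs:domain [ a owl:Class ; owl:unionOf ( {union} ) ] ;",
            f"    rdfs:range :{range_class} .",
        ]
    return "\n".join(lines)


print(to_turtle(PROPERTIES))
```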
4.2.2
Generating Dynamic Queries from RDF
Because the XML and the XSL templates are dynamic, I thought there could also be a way to generate queries dynamically. Two kinds of queries can be generated from RDF, which I call numeric queries and element queries. Numeric queries return the number of occurrences of a specified element. In other words, they count how often an element appears in the RDF, for example, how many chapters my document has. Unfortunately, well-known SQL features such as GROUP BY and aggregate functions like SUM() and COUNT() are not available in RDQL and SPARQL at the time of writing. I therefore defined a simple protocol for counting queries. If a query starts with the special character #, the number of results is returned; otherwise, the results themselves are returned. Element queries return a specific element of the RDF. In other words, end users can access different parts of the LATEX document by means of these queries. For example, they can access the third item of the first itemize of the second section of the fifth chapter of the document. These queries can be generated automatically for all possible elements once the RDF has been generated.
Besides these queries, I will also generate several static RDQL queries, for example, a query that retrieves all definitions from the document. More information about the structure and source code of these queries is available in the implementation chapter.
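The counting convention is small enough to show as a Python sketch. run_query and fake_engine are hypothetical names. fake_engine only returns canned rows and stands in for a real RDQL or SPARQL engine, and the query text (including the doc: prefix) is purely illustrative. One design caveat: in SPARQL, # also starts a comment, so under this convention a query that begins with a comment line would be treated as a counting request.

```python
from typing import Callable, Sequence, Union

# An engine takes query text and returns result rows.
Engine = Callable[[str], Sequence[tuple]]


def run_query(query: str, execute: Engine) -> Union[int, Sequence[tuple]]:
    """Apply the '#' convention: '#<query>' returns only the number of rows."""
    if query.startswith("#"):
        return len(execute(query[1:].lstrip()))
    return execute(query)


# Stand-in for a real RDQL/SPARQL engine: canned rows keyed by query text.
CHAPTERS = "SELECT ?c WHERE { ?c a doc:chapter }"
CANNED = {CHAPTERS: [("chapter1",), ("chapter2",), ("chapter3",)]}


def fake_engine(query: str) -> Sequence[tuple]:
    return CANNED.get(query, [])


print(run_query("#" + CHAPTERS, fake_engine))  # 3
print(run_query(CHAPTERS, fake_engine))
```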
4.2.3
Two Ways for Querying Generated RDF
After the RDF (and possibly the queries) has been generated, end users should be able to query the RDF and get the desired results. There are two general approaches for executing queries against RDF. The first approach is to write a query in RDQL/SPARQL and execute it. Such a query can be generated automatically or selected from the predefined queries described in the previous section, or end users can write their own queries in RDQL/SPARQL and execute them. The second approach is a query language that is closer to human language. In this approach, end users need to be familiar with the structure of the elements and IDs in the RDF, i.e., they need to know how IDs are generated. End users give only the ID of the desired element. An RDQL query is then generated automatically from that ID and executed against the RDF. For example, an end user may ask for the title of the first subsection of the second section of the third chapter of the document.
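The second approach can be sketched in Python. An ordinal description of an element is turned into an ID that follows the scheme of Section 4.2.1.4, and a query is built from that ID. The following are choices of this sketch rather than details of latex2rdf: the base IRI, the assumption that all elements lie below document1_bodymatter1, and the use of SPARQL instead of RDQL. The sketch returns all properties of the element. Narrowing the result to the title would need the ontology's title property, which is not shown here.

```python
BASE = "http://example.org/mydoc#"  # hypothetical base IRI of the generated RDF


def element_id(path):
    """Build an ID such as document1_bodymatter1_chapter3_section2_subsection1.

    `path` lists (element name, 1-based index) pairs below the body matter.
    """
    steps = [("document", 1), ("bodymatter", 1), *path]
    return "_".join(f"{name}{index}" for name, index in steps)


def query_for(resource_id):
    """Return a SPARQL query that lists every property of the identified element."""
    return f"SELECT ?property ?value WHERE {{ <{BASE}{resource_id}> ?property ?value }}"


# "the first subsection of the second section of the third chapter"
target = element_id([("chapter", 3), ("section", 2), ("subsection", 1)])
print(query_for(target))
```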
4.2.4
Generating XSL-FO from LATEX
I generate XSL-FO from a LATEX document in much the same way as RDF. First, I generate dynamic XSL templates, and then I apply them to the XML version of the LATEX document. Figure 4.9 shows this approach, and Figure 4.10 shows its flowchart.
Figure 4.9: Overall View of Generating XSL-FO from LATEX Document
4.2.5
Generating Human Understandable Format from XSL-FO
After the XSL-FO has been generated, end users should be able to see the results in a user-friendly manner. This requires a way to show the content of the XSL-FO elements in a human-understandable format, like HTML or PDF. I decided to use PDF for visualization. There are two general approaches for generating the PDF: LATEX off and LATEX on. I use the phrase LATEX on to emphasize that the output is generated with the LATEX commands taken into account. Note that technologies such as Adobe XMP [3] exist for adding metadata to PDF files. However, they focus on general metadata, such as the author and creation date of a document, and cannot be used for more specific metadata.
Figure 4.10: Flowchart of Proposed Algorithm for Generating XSL-FO from XML Using Dynamic XSLT
4.2.5.1
LATEX Off
In this approach, the output is generated without taking the LATEX commands into account. In other words, the LATEX commands and the plain text of the document are embedded together in the PDF file. The advantage of this approach is that third-party packages like Apache FOP are available for transforming XSL-FO into PDF, so this stage is straightforward. This approach is also faster than the second approach. The disadvantage is that the generated PDF may be hard to read when the text contains many LATEX commands. Figure 4.11 shows the general flowchart for transforming XSL-FO into PDF using Apache FOP. The algorithm starts with the input XSL-FO file. According to the configuration, this file is split into several small XSL-FO files. Each of them is stored in an output folder, whose name also comes from the configuration. After the split, Apache FOP is invoked to generate the PDF files. As mentioned, this method is fast, but it is not a good visualization method, because Apache FOP does not understand LATEX commands.
4.2.5.2
LATEX On
In this approach, the output is generated with the LATEX commands taken into account. In other words, a compiler that understands LATEX commands compiles the source code and generates the PDF files. This stage is harder: it needs text processing algorithms, many configuration items and more. The main problem of this approach is extracting the desired part of the source from the main source file. Once an excerpt of the LATEX source file has been obtained, pdflatex is invoked to generate a PDF from it. Figure 4.12 shows an overview of this approach for getting the desired part of the LATEX source code. I describe this algorithm in detail, as it was one of the most complex algorithms in my thesis. It can be broken into several sub-algorithms: purifying the XML, purifying the LATEX source code, text processing, post-processing, generating source files, and finally invoking pdflatex. Each sub-algorithm is discussed below.
Figure 4.11: Flowchart of an Algorithm for Transforming XSL-FO into PDF Using Apache FOP
Figure 4.12: General Overview of LATEX On Method
• Purifying XML: In this sub-algorithm, several changes are applied to the XML file generated by latex2xml. These changes cannot be configured through the latex2xml configuration files, so the generated XML needs some purification. These purification tasks can be done via XSL templates or text processing algorithms, and I used the more suitable way for each task. I developed a static XSLT for purifying math formulas. Listing 4.3 shows a static XSL template that adds a $ sign at the beginning and end of every math formula in the document. Besides this purification task, special LATEX characters must also be handled. The XML is the best place for this, rather than the LATEX source file or the XSL-FO document, because the characters that must not change do not exist in the XML but do exist in the LATEX source code and the XSL-FO. Text processing algorithms are needed to handle these special characters. The question is which special characters should be handled. The list of characters to be changed comes from a configuration file. Some default characters exist in the default configuration file, but the list is extensible and end users can define new characters. The structure of this configuration file is explained in the configuration section. Another important point during purification is the content that must not change, such as verbatim environments, math formulas and CDATA sections. To achieve this, I temporarily remove these environments from the XML file and restore them after the LATEX characters have been substituted. As mentioned, such tasks work well only on the XML file, not on the source file or the XSL-FO. The aim of purifying the XML is to produce a better XSL-FO: I use the purified XML for generating the dynamic XSL templates and producing the XSL-FO. Purifying XML is a pre-processing task.
Listing 4.3: An XSL template for adding $ sign in math formulas
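<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="…">
    <xsl:copy>$<xsl:value-of select="."/>$</xsl:copy>
  </xsl:template>
</xsl:stylesheet>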
• Purifying LATEX source file: In this sub-algorithm, the LATEX source file is purified for use by the subsequent text processing algorithms; these tasks improve the structure of the source file. First, all verbatim environments are temporarily removed from the source file and stored in a temporary place, so that they can be restored later. The reason is that verbatim environments may contain commands and characters that would make the text processing fail. Next, all comments are deleted from the source file: comments play no role in generating the PDF, but they may contain characters that would make the later text processing fail. Then the source file is checked for a \chapter command between the beginning of the document and its first section. If there is none, an empty chapter command (\chapter{}) is added to the source code, because the text processing algorithms depend heavily on this command. After these steps, the source code is purified and ready for text processing. Purifying the LATEX source file is also a preprocessing task. • Text processing: In this step, the purified XSL-FO and LATEX source code are the inputs of the text processing algorithm. The other input is the desired part of the source file, which is identified by an ID that exists in the XSL-FO file. Given this ID, the algorithm searches for the matching part of the source code. First, the chapter index and the section index are extracted from the ID. After the matching chapter and section have been found, the algorithm digs inside them to find the last element of the ID according to its index. As an example, suppose the ID is document1_bodymatter1_chapter1_section2_definition3. In this case, I find the second section of the first chapter and search inside it for the third definition. This is done by finding the third \begin{definition} and the third \end{definition} in that section and returning the text bounded by these two commands. Note that comments and verbatim environments were removed in the preprocessing steps, so they have no effect on the search result. In some cases, such as section titles or paragraphs, there is no explicit LATEX command that delimits the desired part, so the search in the LATEX source code fails.
In that case, the desired text is extracted from the XSL-FO instead and returned. • Post-processing tasks: After the desired part of the source file has been obtained, some post-processing tasks neutralize the effects of the preprocessing tasks. For example, the verbatim environments that were temporarily removed from the source code during preprocessing are inserted back into the LATEX source code. • Generating source file: In this step, all preamble commands of the main LATEX source file are copied to a new file, the content extracted and purified in the previous steps is added to it, and the file is stored in the output folder, whose path is determined by the configuration file.
Figure 4.13: Flowchart of Proposed Algorithm for Generating PDF Using an Excerpt of LATEX Source File by Means of pdflatex
Figure 4.14: Flowchart of Proposed Algorithm for Finding an Excerpt of LATEX Source File
4.3. DISCUSSION AND CONCLUSION
• Invoking pdflatex: In this stage, pdflatex is invoked according to the configuration. The path to pdflatex, the path to the required source files and the other settings are taken from the configuration file; for details on the configuration items, refer to the implementation chapter. Figure 4.13 shows the general flowchart of the proposed algorithm for transforming XSL-FO into PDF by means of pdflatex. Figure 4.14 shows the text processing algorithm for finding an excerpt of the LATEX source file; this algorithm is used within the algorithm of Figure 4.13.
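The protect, substitute and restore idea of the XML purification step can be illustrated in Java. This is a minimal sketch, not the latex2rdf code: the class name is hypothetical, the protected regions are located with regular expressions on the raw XML (CDATA sections plus assumed `<verbatim>` and `<m>` elements, whose real latex2xml names may differ), and the rules are this/use pairs like the insteadOf entries of config.xml. Instead of physically removing and later restoring the protected parts, it copies them through unchanged, which has the same effect.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: applies insteadOf-style substitution rules outside protected regions. */
class XmlCharacterPurifier {

    // Assumed markup for content that must not change: CDATA sections plus
    // hypothetical <verbatim> and <m> elements (latex2xml's real names may differ).
    private static final Pattern PROTECTED = Pattern.compile(
            "<!\\[CDATA\\[.*?\\]\\]>|<verbatim>.*?</verbatim>|<m>.*?</m>",
            Pattern.DOTALL);

    /** Copies protected regions unchanged and applies the rules to everything else. */
    static String purify(String xml, Map<String, String> rules) {
        StringBuilder out = new StringBuilder();
        Matcher m = PROTECTED.matcher(xml);
        int last = 0;
        while (m.find()) {
            out.append(substitute(xml.substring(last, m.start()), rules)); // ordinary text
            out.append(m.group());                                          // protected, kept as-is
            last = m.end();
        }
        out.append(substitute(xml.substring(last), rules));
        return out.toString();
    }

    // Rules are applied one after another, in the map's iteration order.
    private static String substitute(String text, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            text = text.replace(rule.getKey(), rule.getValue());
        }
        return text;
    }
}
```

Because rules are applied in sequence, their order matters (a rule that introduces a backslash could be re-processed by a later rule), so a `LinkedHashMap` filled in configuration order is a natural input. A fuller version would operate on DOM text nodes so that rules can never touch the markup itself.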
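The LATEX-side purification and the search by chapter, section and environment index can be sketched the same way. All names are hypothetical; the sketch assumes that verbatim environments have already been removed, that environments of the same kind are not nested, and that the indices have already been read from the ID. Note that searching for `\section` does not match `\subsection`, because the backslash there is followed by "sub".

```java
/** Hypothetical helpers for the source purification and text processing steps. */
class LatexExcerptFinder {

    /** Strips % comments and adds \chapter{} if no chapter precedes the first section. */
    static String purify(String tex) {
        // Remove from an unescaped % to the end of the line (verbatim blocks are
        // assumed to have been removed beforehand).
        String s = tex.replaceAll("(?m)(?<!\\\\)%.*$", "");
        int begin = s.indexOf("\\begin{document}");
        int firstSection = begin < 0 ? -1 : s.indexOf("\\section", begin);
        if (firstSection >= 0 && !s.substring(begin, firstSection).contains("\\chapter")) {
            s = s.substring(0, firstSection) + "\\chapter{}\n" + s.substring(firstSection);
        }
        return s;
    }

    /** Text of the given 1-based section inside the given 1-based chapter, or null. */
    static String findSection(String tex, int chapter, int section) {
        String ch = nthSlice(tex, "\\chapter", chapter);
        return ch == null ? null : nthSlice(ch, "\\section", section);
    }

    /** Body between the n-th \begin{env} and the following \end{env}, or null (no nesting assumed). */
    static String findEnvironment(String text, String env, int n) {
        String open = "\\begin{" + env + "}";
        String close = "\\end{" + env + "}";
        int from = 0;
        for (int i = 0; i < n; i++) {
            int start = text.indexOf(open, from);
            if (start < 0) return null;
            from = start + open.length();
        }
        int end = text.indexOf(close, from);
        return end < 0 ? null : text.substring(from, end);
    }

    // From the n-th occurrence of marker up to the next occurrence or the end of the text.
    private static String nthSlice(String text, String marker, int n) {
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = text.indexOf(marker, pos + 1);
            if (pos < 0) return null;
        }
        int next = text.indexOf(marker, pos + marker.length());
        return text.substring(pos, next < 0 ? text.length() : next);
    }
}
```

For the ID document1_bodymatter1_chapter1_section2_definition3, the search would be `findEnvironment(findSection(purified, 1, 2), "definition", 3)`.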
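The last two steps, building a standalone file from the preamble plus the excerpt and invoking pdflatex, might look as follows with the standard `ProcessBuilder` API of a current JDK. The class and method names are hypothetical; the sketch assumes the needed images and external files are reachable from the output folder, and uses pdflatex's `-interaction=nonstopmode` option so that a LATEX error does not leave the process waiting for keyboard input.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "generating source file" and "invoking pdflatex" steps. */
class ExcerptCompiler {

    /** Preamble of the main file (everything before \begin{document}) plus the excerpt. */
    static String buildStandalone(String mainTex, String excerpt) {
        int begin = mainTex.indexOf("\\begin{document}");
        String preamble = begin < 0 ? "" : mainTex.substring(0, begin);
        return preamble + "\\begin{document}\n" + excerpt + "\n\\end{document}\n";
    }

    /** Command line; pdflatexPath would come from the path element of config.xml. */
    static List<String> command(String pdflatexPath, String texFileName) {
        // nonstopmode keeps pdflatex from waiting for keyboard input on errors.
        return Arrays.asList(pdflatexPath, "-interaction=nonstopmode", texFileName);
    }

    /** Writes name.tex into outDir, runs pdflatex there and returns its exit code. */
    static int compile(String pdflatexPath, File outDir, String name, String content)
            throws IOException, InterruptedException {
        Files.write(new File(outDir, name + ".tex").toPath(),
                content.getBytes(StandardCharsets.UTF_8));
        ProcessBuilder pb = new ProcessBuilder(command(pdflatexPath, name + ".tex"));
        pb.directory(outDir);                                       // PDF is written next to the .tex file
        pb.redirectErrorStream(true);
        pb.redirectOutput(new File(outDir, name + ".stdout.txt"));  // avoid blocking on a full pipe
        return pb.start().waitFor();
    }
}
```

Copying the whole preamble keeps user-defined environments (for example a `\newtheorem{definition}`) available to the excerpt, which is why the excerpt can be compiled on its own.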
4.3
Discussion and Conclusion
In this chapter, I presented the problems addressed in my thesis, mainly generating RDF, querying it, and visualizing different parts of the source file. I then presented my solutions, including several algorithms for generating RDF and XSL-FO from a LATEX source file. These algorithms produce dynamic XSL templates for transforming an XML document into RDF or XSL-FO. I also explained an algorithm for generating unique IDs for RDF elements and described the ontology for LATEX documents on which the generated RDF is based. I explained two general ways of generating dynamic queries from the RDF model. For visualization, two methods (LATEX on and LATEX off) were introduced. The LATEX on method uses the LATEX compiler and produces a better view, but it is more complex and needs more configuration. The LATEX off method is faster, but the generated view ignores LATEX commands. In the next chapter, I cover implementation issues.
Chapter 5
Implementation
As I explained before, I named the application latex2rdf, as it is a general LATEX to RDF converter. The name does not cover other functionalities of the application, such as generating XSL-FO or querying RDF, but it is simple and clean. In this chapter, I focus on implementation issues. I describe the software engineering methodology I used and the development timetable. I explain the features and functionalities of latex2rdf, its configuration file, and how latex2rdf is configured by means of an XML file. The Graphical User Interface (GUI) of latex2rdf and its elements are described. The structure of the output folders, the different types of queries and several UML diagrams are presented. I also give an overview of the tools and third-party packages used in my thesis, together with tips for generating XML and several hints regarding latex2rdf. Finally, I present an example to demonstrate latex2rdf and its outputs.
5.1
Methodology
For implementing latex2rdf, I used the IBM Rational Unified Process (RUP) [41] methodology. One of the software development principles of RUP is developing software iteratively, and latex2rdf was built in several iterations. In each iteration, some features were added and some bugs were fixed. Figure 5.1 [41] shows the RUP software development lifecycle. In the next sections, I look at each iteration and at the timetable of developing latex2rdf.
CHAPTER 5. IMPLEMENTATION
Figure 5.1: IBM Rational Unified Process Software Development Lifecycle
5.1.1
Iterations
latex2rdf was developed in three main iterations. The main changes in each iteration are listed below.
Main features of the first iteration:
• Generating a simple RDF from a LATEX document
• A simple graphical user interface
• Ability to execute very simple RDQL queries
Main features of the second iteration:
• Generating a more detailed RDF from LATEX documents
• A better graphical user interface
• Ability to execute different RDQL queries
• User interaction by means of suitable messages and alerts
• Generating XSL-FO and PDF from LATEX documents
• Generating dynamic RDQL queries from RDF
• Generating several general static RDQL queries
• Configuration handling
5.2. QUERIES
Main features of the third iteration:
• Generating a more detailed RDF from LATEX documents
• A better graphical user interface
• Ability to execute SPARQL queries
• Ability to configure latex2rdf in detail (generating RDF and XSL-FO)
• A better visualization of PDF files
In each iteration, I fixed bugs and tried my best to keep the code clean. I added helpful comments and JavaDoc to make the different parts and methods of the source code easier to understand.
5.1.2
Timetable
Roughly speaking, I started my thesis in April 2006. The first month was mainly devoted to understanding concepts, analysis and planning; I became more familiar with the LATEX environment, the different LATEX tools, the available tools for generating XML from LATEX, and so on. From May 2006 until August 2006, I designed, implemented and tested latex2rdf in three main iterations. After that, I focused on writing the thesis, fixing several bugs and adding several features to latex2rdf. A Gantt chart is a useful tool for planning and scheduling projects; Figure 5.2 shows the Gantt chart of my thesis.
Figure 5.2: Gantt Chart of My Thesis
5.2
Queries
In the previous chapter, I explained different types of RDQL/SPARQL queries. In this section, I explain the static and dynamic queries produced by latex2rdf in more detail.
5.2.1
Dynamic Queries
latex2rdf dynamically produces two groups of queries: element queries and numeric queries.
5.2.1.1
Element Queries
Element queries return a specified element of the document; for example, a query for the title of the second section of the third chapter is an element query. latex2rdf uses a simple, user-friendly naming convention for storing these queries: the file name begins with the phrase GiveMe, followed by an underscore (_) and then the ID of the desired element. For example, GiveMe_document1_bodymatter1_chapter1_section1_par5_m4.rdql is an element query for the fourth math formula of the fifth paragraph of the first section of the first chapter of the document. Listing 5.1 shows an element query. Listing 5.1: Content of GiveMe_document1_bodymatter1_chapter1_section1_par12_m4.rdql File
SELECT ?x WHERE ( , , ?x )
5.2.1.2
Numeric Queries
Numeric queries return the number of occurrences of a desired element in the document, for example the number of math formulas, sections or subsections. Their naming convention is equally simple: the file name begins with the phrase GiveMeNumberOf, followed by an underscore (_) and then the name of the element. For example, GiveMeNumberOf_Section.rdql and GiveMeNumberOf_Tabular.rdql are two numeric queries; the former returns the number of sections and the latter the number of tabular environments in the document. Listing 5.2 shows a numeric query. Listing 5.2: Content of GiveMeNumberOf_Section.rdql File
#SELECT ?x WHERE ( ?x, , )
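Both naming conventions, and the structure of the IDs they rely on, can be captured in a few helper methods. This is a sketch under assumptions: the class is hypothetical, and each ID step is assumed to be a run of letters followed by a 1-based index (document1, par5, m4), with steps separated by underscores as in the file names above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for the query naming conventions of latex2rdf. */
class QueryNames {

    /** One step of an element ID, e.g. "par5" is name "par" with index 5. */
    static final class Step {
        final String name;
        final int index;
        Step(String name, int index) { this.name = name; this.index = index; }
    }

    private static final Pattern STEP = Pattern.compile("([A-Za-z]+)(\\d+)");

    /** Splits an ID such as document1_bodymatter1_chapter1_section1_par5_m4 into steps. */
    static List<Step> parse(String id) {
        List<Step> steps = new ArrayList<>();
        for (String part : id.split("_")) {
            Matcher m = STEP.matcher(part);
            if (!m.matches()) throw new IllegalArgumentException("Not an ID step: " + part);
            steps.add(new Step(m.group(1), Integer.parseInt(m.group(2))));
        }
        return steps;
    }

    /** File name of an element query: GiveMe_ followed by the element ID. */
    static String elementQueryFile(String id) {
        parse(id); // reject malformed IDs early
        return "GiveMe_" + id + ".rdql";
    }

    /** File name of a numeric query: GiveMeNumberOf_ followed by the element name. */
    static String numericQueryFile(String elementName) {
        return "GiveMeNumberOf_" + elementName + ".rdql";
    }
}
```

Parsing the ID into (name, index) steps is also what lets a reader, or a program, turn par5_m4 into "the fourth math formula of the fifth paragraph".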
5.2.2
Static Queries
Some general-purpose static queries were also written during my thesis. Because of their structure, these queries are not easy to generate dynamically; therefore,
5.3. GRAPHICAL USER INTERFACE
they were written statically. Listing 5.3 shows a simple static query; this RDQL query returns all footnotes in the document. Listing 5.3: Content of GiveMeAllNote.rdql File
SELECT ?z
WHERE ( ?x, , ),
      ( ?y, , ),
      ( ?y, , ?x ),
      ( ?y, , ?z )
5.3
Graphical User Interface
The Graphical User Interface (GUI) of latex2rdf was developed with Java Swing [27]. Swing, released after Java AWT, is a newer GUI library from Sun Microsystems that supports many features needed for an advanced user interface. Figure 5.3 is a snapshot of the latex2rdf GUI.
Figure 5.3: Main Graphical User Interface of latex2rdf
In the following, I describe each element of the GUI:
• Element 1: The main window for the LATEX or XML document. After a LATEX or XML document is loaded, its content is shown in this element. • Element 2: Pushing this button opens a window for selecting the desired LATEX or XML file; the content of the selected file is copied to Element 1. Figure 5.4 shows this window.
Figure 5.4: Load LATEX /XML Panel
• Element 3: Pushing this button opens a window for selecting a file; the content of Element 1 is then stored in the selected file. Figure 5.5 shows this window.
Figure 5.5: Save LATEX /XML Panel
• Element 4: Pushing this button generates RDF. The content of Element 1 is first examined: if it is an XML document, it is transformed directly into RDF; otherwise, the
content of Element 1 is first translated into XML and then transformed into RDF. • Element 5: The RDF box. The generated RDF is copied into this box. If RDF generation fails, the exception message and/or a guideline is shown in Element 21. After the RDF has been generated, the RDF model is built automatically. The RDF file is also shown in a separate window; Figure 5.6 shows this window.
Figure 5.6: RDF/XSL-FO Window
• Element 6: Pushing this button opens a window for selecting an RDF file. The content of the selected file is copied into Element 5, and the RDF model is built automatically. This means that latex2rdf can also be used as a stand-alone application for generating queries and for executing RDQL/SPARQL queries against RDF. Figure 5.4 shows this window. • Element 7: Pushing this button generates dynamic queries from the RDF model. All queries are stored in the output folder. The path
to the output folder is read from the configuration file. For the structure of these queries, refer to the previous sections. • Element 8: Pushing this button generates XSL-FO. The content of Element 1 is first examined: if it is an XML document, it is transformed directly into XSL-FO; otherwise, it is first translated into XML and then transformed into XSL-FO. • Element 9: The XSL-FO window. The generated XSL-FO is copied into this window. If XSL-FO generation fails, the exception message and/or a guideline is shown in Element 21. The XSL-FO file is also shown in a separate window; Figure 5.6 shows this window. • Element 10: Pushing this button generates PDF files. The content of Element 9 is split into several small XSL-FO or LATEX files, and each of these files is turned into a PDF document. • Element 11: The RDQL/SPARQL query window. End users can load a query into it or simply type one. • Element 12: Selects the query type. latex2rdf can execute two kinds of queries, RDQL and SPARQL, and end users should select the matching type with the radio buttons. • Element 13: A cleaner; pushing this button clears the content of Element 11. • Element 14: Pushing this button opens a window for selecting the desired query file; RDQL and SPARQL queries are supported. The content of the selected file is copied to Element 11. Figure 5.4 shows this window. • Element 15: Pushing this button takes the query from Element 11 and executes it against the RDF in Element 5. • Element 16: The result box. After a query has been executed with Element 15, the results are shown in this box.
The two markers – Begin – and – End – separate the results of different queries. After Element 15 has been used, a better view is also available, mainly for RDQL queries with one variable and SPARQL queries with two variables. Figure 5.7 shows a snapshot of this auxiliary window.
Figure 5.7: Snapshot of a Sample Result Window
• Element 17: A simple query box for users who know the structure of elements and IDs in the RDF. End users type the hierarchy of the desired element, which is actually its ID, into this box. For example, the hierarchy document1_bodymatter1_chapter1_section1_stitle1 denotes the first title of the first section of the first chapter of the first bodymatter of the first document, and its query means “give me all information regarding this ID from the RDF”. As another example, document1_bodymatter1_chapter2_section3_itemize2_item3_par5 denotes the fifth paragraph of the third item of the second itemize of the third section of the second chapter of the first bodymatter of the first document. If such a hierarchy exists among the element IDs, all information (triples) about this ID is extracted from the RDF (Element 5) and shown in the result box (Element 16) after pushing Element 18. If no such ID exists, an empty string surrounded by – Begin – and – End – is shown. • Element 18: Pushing this button builds an RDQL query from the information in Element 17, executes it against Element 5 and shows the results in Element 16. • Element 19: Pushing this button opens a window for selecting a file; the content of Element 16 is then stored in the selected file. Figure 5.5 shows this window. • Element 20: A cleaner; pushing this button clears the content of Element 16. • Element 21: The status box, a one-way communication channel from the application to end users. All exceptions, messages, guidelines and other alerts are presented in this box. After the application has been launched, a welcome message is shown, and the application then tries to
load the path to the JDK 1.4 home folder from the configuration file and shows a message indicating this path. If this value is not set in the configuration file, the current folder is assumed to be the JDK 1.4 home folder and a dot (.) is shown as its path. • Element 22: A cleaner; pushing this button clears the content of Element 21. • Element 23: Pushing this button opens a window with several guidelines and hints about the application. • Element 24: Pushing this button opens a window with several messages about the application. • Element 25: Pushing this button exits the application. • Element 26: The logo of the application, which was created with a free online logo generator [10]. Figure 5.8 shows this logo.
Figure 5.8: latex2rdf Logo
• Element 27: Two radio buttons for generating PDF documents: pdflatex and Apache FOP. When the button in Element 10 is pushed, the selected value is read, and the corresponding tool (pdflatex or Apache FOP) is used to generate the PDF files.
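The query that Element 18 builds from the hierarchy typed into Element 17 could look like the sketch below. It rests on two loud assumptions: the element ID is taken to be the local name of the RDF subject resource, and the namespace `http://example.org/latex2rdf#` is a placeholder, since the URI latex2rdf actually uses is not given in this section. The ID is validated first so that user input cannot break the query syntax.

```java
/** Hypothetical builder for the query behind Elements 17 and 18. */
class HierarchyQuery {

    // Placeholder namespace; the URI latex2rdf really uses for elements is an assumption here.
    static final String BASE = "http://example.org/latex2rdf#";

    /** RDQL query returning all (predicate, object) pairs of the element with this ID. */
    static String toRdql(String id) {
        return "SELECT ?p, ?o WHERE (<" + BASE + checked(id) + ">, ?p, ?o)";
    }

    /** The same query in SPARQL syntax. */
    static String toSparql(String id) {
        return "SELECT ?p ?o WHERE { <" + BASE + checked(id) + "> ?p ?o }";
    }

    // Only letters, digits and underscores may appear, so input cannot break the query.
    private static String checked(String id) {
        String trimmed = id.trim();
        if (!trimmed.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Not an element ID: " + id);
        }
        return trimmed;
    }
}
```

An ID that does not occur in the RDF simply yields no bindings, which matches the empty result between – Begin – and – End – described above.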
5.4
Output
The application generates several kinds of output and stores them in different output folders. Figure 5.9 shows the hierarchy of output folders. The folder names are self-explanatory; in the following, I explain the structure of the folders a bit more. • Output root folder: The root folder of all output. Its path is stored in the configuration file, and end users can change it to point to any desired output root folder. • RDF folder: The RDF file generated by the application is stored in this folder.
5.4. OUTPUT
• PDF folder: The PDF documents generated by the application are stored in this folder. • XSL folder: The XSL files generated dynamically by the application for transforming XML into RDF and XSL-FO are stored in this folder. • XSL-FO folder: The XSL-FO documents generated by the application are stored here; they include the main XSL-FO document and the small XSL-FO files split from it. • TEX folder: The LATEX documents generated by the application are stored here; each represents a separate element of the main source file. • Queries/Elements folder: The element queries generated by the application are stored here. • Queries/Numeric folder: The numeric queries generated by the application are stored here. • Queries/Predefined folder: A static folder to which the application writes nothing; it contains the general predefined static RDQL queries. This folder exists only in the default output root folder of the application (the out folder) and is not copied to other output root folders, so end users can always find these queries there.
Figure 5.9: Hierarchy of Output Folders
Note that existing files with the same name are overwritten without any notification. To avoid this, end users can change the output root folder in the configuration file or move old files elsewhere.
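Preparing this hierarchy below a configurable root is a few lines of `java.io.File` code. In this hypothetical sketch the sub-folder names are assumptions modelled on the list above (Figure 5.9 shows the real ones), and Queries/Predefined is deliberately left out because the application never writes into it.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: creates the output hierarchy below the configured root. */
class OutputFolders {

    // Assumed folder names; Queries/Predefined is omitted because it is read-only.
    static final List<String> WRITABLE = Arrays.asList(
            "RDF", "PDF", "XSL", "XSL-FO", "TEX",
            "Queries" + File.separator + "Elements",
            "Queries" + File.separator + "Numeric");

    /** Creates any missing folder and returns the output root. */
    static File prepare(String outputRoot) {
        File root = new File(outputRoot);
        for (String sub : WRITABLE) {
            File dir = new File(root, sub);
            if (!dir.isDirectory() && !dir.mkdirs()) {
                throw new IllegalStateException("Cannot create " + dir);
            }
        }
        return root;
    }
}
```

Nothing here checks for existing files, which mirrors the overwrite-without-notice behavior described above.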
5.5
Configuration
latex2rdf is configured by means of an XML file named config.xml, which latex2rdf expects in the config/latex2rdf folder. Listing 5.4 shows the DTD of config.xml; this DTD is stored in the same folder as config.xml. Listing 5.4: config.dtd: DTD of config.xml
Listing 5.5 shows an example of config.xml; it is the default configuration file shipped with latex2rdf. Listing 5.5: A Sample Configuration File
C:/java/j2sdk1.4.2_12
C:/Programme/MiKTeX 2.5/miktex/bin/pdflatex.exe
E:/workspace/Testcases/Matthias files
1
5.5. CONFIGURATION
data/purify/afterXSLFO/purify1.xsl
out
0
0
- definition
- example
- proof
- theorem
- lemma
- corollary
- conjecture
- quote
- tabular
- itemize
I now describe the role of each element. • jdk14: Points to the JDK 1.4 folder; latex2xml needs JDK 1.4 to produce XML. • pdflatex: Settings for pdflatex. It has several children, all related to pdflatex: – path: Points to the pdflatex executable; latex2rdf invokes pdflatex using exactly this path. If this element is not set, its default value is pdflatex.exe. – sourceNeededFiles: Points to the folder containing the files needed to compile the LATEX source with pdflatex. For example, if the LATEX source file uses images or other external files, this element should point to the folder where they are stored. – laTeXSubstitutionRules: A list of substitution rules; the reason for these rules was described in previous sections. This element contains zero or more insteadOf elements.
∗ insteadOf: Each insteadOf element defines one substitution rule through two attributes, this and use. The attribute this gives the character or characters to be replaced, and the attribute use gives the replacement. For example, a rule with this="{" and use="\{" replaces every { character in the document with \{. • outputPurify: Used for purifying the output. – beforeRDF: Contains zero or more xsl elements. ∗ xsl: The path of an XSL file whose templates are applied before generating RDF. – afterRDF: Contains zero or more xsl elements. ∗ xsl: The path of an XSL file whose templates are applied after generating RDF. – beforeXSLFO: Contains zero or more xsl elements. ∗ xsl: The path of an XSL file whose templates are applied before generating XSL-FO. – afterXSLFO: Contains zero or more xsl elements. ∗ xsl: The path of an XSL file whose templates are applied after generating XSL-FO. • outputFolder: Points to the root output folder of latex2rdf; the structure of the output folders was described in the previous section. • fullContentItems: A collection of item elements. – item: A child of fullContentItems. Each item tells latex2rdf to generate a content tag in the RDF for that element, containing all of the element's content. • fullRDF: Limits the size of the RDF.
If this element is set to 0, latex2rdf does not generate a full RDF: whenever it reaches an element listed in an item, it does not descend any deeper into that element. If it is set to any other value, latex2rdf generates a full RDF.
5.6. SEQUENCES
• fullXSLFO: Limits the size of the XSL-FO in the same way. If this element is set to 0, latex2rdf does not generate a full XSL-FO: whenever it reaches an element listed in an item, it does not descend any deeper into that element. If it is set to any other value, latex2rdf generates a full XSL-FO.
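Reading these elements with the standard DOM API could look like the following sketch. The class is hypothetical and mirrors only the elements described above; the defaults follow the text (pdflatex.exe when path is missing, the current folder when jdk14 is unset, a full output unless fullRDF or fullXSLFO is 0). Elements are looked up by name, so the sketch does not depend on the root element's name, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the config.xml elements described above. */
class Latex2RdfConfig {
    String jdk14 = ".";                   // current folder when unset (see Element 21)
    String pdflatexPath = "pdflatex.exe"; // default when path is missing
    String outputFolder;
    boolean fullRdf = true;               // only the value 0 limits the output
    boolean fullXslfo = true;
    final Map<String, String> substitutions = new LinkedHashMap<>(); // insteadOf: this -> use
    final List<String> fullContentItems = new ArrayList<>();

    static Latex2RdfConfig load(InputStream in) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in).getDocumentElement();
            Latex2RdfConfig c = new Latex2RdfConfig();
            c.jdk14 = text(root, "jdk14", c.jdk14);
            Element pdflatex = first(root, "pdflatex");
            if (pdflatex != null) c.pdflatexPath = text(pdflatex, "path", c.pdflatexPath);
            c.outputFolder = text(root, "outputFolder", null);
            c.fullRdf = !"0".equals(text(root, "fullRDF", "1"));
            c.fullXslfo = !"0".equals(text(root, "fullXSLFO", "1"));
            NodeList rules = root.getElementsByTagName("insteadOf");
            for (int i = 0; i < rules.getLength(); i++) {
                Element r = (Element) rules.item(i);
                c.substitutions.put(r.getAttribute("this"), r.getAttribute("use"));
            }
            Element items = first(root, "fullContentItems");
            if (items != null) {
                NodeList list = items.getElementsByTagName("item");
                for (int i = 0; i < list.getLength(); i++) {
                    c.fullContentItems.add(list.item(i).getTextContent().trim());
                }
            }
            return c;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot read config.xml", e);
        }
    }

    private static Element first(Element scope, String tag) {
        NodeList nl = scope.getElementsByTagName(tag);
        return nl.getLength() == 0 ? null : (Element) nl.item(0);
    }

    private static String text(Element scope, String tag, String fallback) {
        Element e = first(scope, tag);
        String v = e == null ? "" : e.getTextContent().trim();
        return v.isEmpty() ? fallback : v;
    }
}
```

If config.xml declares its DTD with a relative path, it is safer to parse it from a `File` (or to set a system ID) so that config.dtd in the same folder can be resolved.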
5.6
Sequences
As explained in previous sections, my thesis has five main use cases. In this section, I present a UML sequence diagram for each use case to clarify how it works.
5.6.1
Sequence of Generate RDF Use Case
Figure 5.10 shows the sequence diagram of the Generate RDF use case. Briefly: first, latex2rdfUI checks whether the input content is an XML or a LATEX document. If it is a LATEX document, latex2rdf invokes latex2xml. After a successful invocation, it calls the makeRDF() method of the RdfGenerator class. This method takes the XML file, generates dynamic XSL templates and stores them in an XSL file; the createInit() method generates the header of this XSL file. The path of the XSL file is set according to the configuration file. After storing the XSL file, makeRDF() creates a new instance of the XSLTApplier class and invokes its makeTransform() method, which applies the XSL templates to the XML file and finally generates the RDF.
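The first step of this sequence, deciding whether the input is XML or LATEX, can be sketched with a simple heuristic. Both the heuristic (XML if the first non-blank character is `<`) and the class are assumptions, not the documented latex2rdf check; the `latex2xml` parameter stands in for the real latex2xml invocation.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the first step of the Generate RDF and Generate XSL-FO use cases. */
class InputKind {

    enum Kind { XML, LATEX }

    /** Heuristic: XML if the first non-blank character is '<', otherwise LaTeX. */
    static Kind detect(String content) {
        String s = content.startsWith("\uFEFF") ? content.substring(1) : content; // skip a BOM
        s = s.trim();
        if (s.isEmpty()) {
            // latex2rdf reports this case with LatexWindowIsEmptyException.
            throw new IllegalArgumentException("The LATEX/XML box is empty");
        }
        return s.charAt(0) == '<' ? Kind.XML : Kind.LATEX;
    }

    /** Returns XML input unchanged and sends LaTeX input through the converter first. */
    static String toXml(String content, UnaryOperator<String> latex2xml) {
        return detect(content) == Kind.XML ? content : latex2xml.apply(content);
    }
}
```

A LATEX document normally starts with a command or a % comment, never with `<`, so this cheap test is enough to choose between the two paths.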
5.6.2
Sequence of Generate Query Use Case
Figure 5.11 shows the sequence diagram of the Generate Query use case. As explained before, two general types of query are generated automatically: numeric and element queries. For numeric queries, latex2rdfUI creates a new instance of the NumericQueryGenerator class and invokes its generateQuery() method, which takes the generated RDF model as input and generates the numeric queries from it. As mentioned before, numeric queries return the number of results. Note that a special sign (#) at the beginning of a query marks it as a numeric query. After the numeric queries have been generated, the element queries follow: latex2rdfUI creates a new instance of the ElementQueryGenerator class and invokes its generateQuery() method, which takes the RDF model as input and generates the element queries from it.
Figure 5.10: Sequence of Generate RDF Use Case
Figure 5.11: Sequence of Generate Query Use Case
5.6.3
Sequence of Execute Query Use Case
Figures 5.12 and 5.13 show the Execute Query use case, the former for RDQL queries and the latter for SPARQL queries. For an RDQL query, latex2rdfUI creates a new instance of the RDQLRunner class and invokes its ExecuteRDQLQuery() method, which takes the RDF model and the query as input and runs the query against the model. First, removeSpecialSign() is invoked: it takes the query and checks whether it is a numeric or an element query; for a numeric query, the special sign (#) is removed. After the query has been executed with Jena and the ARQ engine, the addToHashmap() method is called. It takes a key and a value, which together represent the query result, and stores them in a HashMap that is later used by the result window. The process of executing SPARQL queries is very similar, so I do not explain it again.
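The handling of the special sign and of the result map can be sketched without Jena. In this hypothetical version, the query execution itself is left out and its result rows are passed in as strings; a numeric query yields the number of rows, any other query the rows themselves. The method names echo the ones in the text, but the signatures are assumptions.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of removeSpecialSign() and of filling the result map. */
class QueryResults {

    /** A query ready to run, and whether it was marked as numeric with '#'. */
    static final class Prepared {
        final String query;
        final boolean numeric;
        Prepared(String query, boolean numeric) { this.query = query; this.numeric = numeric; }
    }

    static Prepared removeSpecialSign(String query) {
        String q = query.trim();
        boolean numeric = q.startsWith("#");
        return new Prepared(numeric ? q.substring(1).trim() : q, numeric);
    }

    /** Value for the result window: the row count for numeric queries, else the rows. */
    static String resultValue(Prepared p, List<String> rows) {
        return p.numeric ? String.valueOf(rows.size()) : String.join(", ", rows);
    }

    /** Stores one result under its key, in the spirit of addToHashmap(). */
    static void addToHashmap(Map<String, String> results, String key, String value) {
        results.put(key, value);
    }
}
```

Marking numeric queries with a leading # keeps them valid query files: the sign is stripped before execution, and only the way the result is reported changes.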
Figure 5.12: Sequence of Execute Query Use Case (RDQL Query)
5.6.4
Sequence of Generate XSL-FO Use Case
Figure 5.14 shows the sequence diagram of the Generate XSL-FO use case. Briefly: first, latex2rdfUI checks whether the input content is an XML or a LATEX document. If it is a LATEX document, latex2rdf invokes latex2xml. After a successful invocation, it calls the makeXSLFO() method of the XSLFOGenerator class. This method takes the XML file, generates dynamic XSL templates and stores them in an XSL file, whose path is set according to the configuration file. After storing the XSL file, it invokes the makeTransform() method of the XSLTApplier class, which applies the XSL templates to the XML file and finally generates the main XSL-FO file.
Figure 5.13: Sequence of Execute Query Use Case (SPARQL Query)
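The makeTransform() step, applying a stylesheet to an XML document, maps directly onto the standard JAXP API (`javax.xml.transform`). The sketch below is hypothetical in that the thesis does not say which XSLT processor XSLTApplier wraps; it works on strings for brevity, whereas latex2rdf reads and writes files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stand-in for XSLTApplier.makeTransform(), using JAXP. */
class XslTransformSketch {

    /** Applies the XSL stylesheet to the XML document and returns the serialized result. */
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) { // also covers TransformerConfigurationException
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }
}
```

The same method can apply a static stylesheet such as the $-adding template of Listing 4.3 or the dynamically generated templates for RDF and XSL-FO.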
Figure 5.14: Sequence of Generate XSL-FO Use Case
5.6.5
Sequence of Generate PDF Use Case
Figures 5.15 and 5.16 show the sequence diagrams of the Generate PDF use case. As mentioned before, there are two general ways of generating PDF from the source code: the LATEX on method and the LATEX off method. Figure 5.15 shows the first method and Figure 5.16 the second. In the LATEX on method, an object of the FOSeparator2Tex class is created. This object splits the main XSL-FO file into several small LATEX files with the help of the LATEX source code and the XSL-FO file. Next, its separateFOAndGeneratePDF() method is invoked with the path to the main XSL-FO file as input. Inside this method, a recursive method called generateSimpleTex() traverses all elements and, for each one, calls getContent(), which finds suitable content for pdflatex with the help of the LATEX source code and the XSL-FO file. Once the desired content has been obtained, a new instance of the SimpleTexCreator class is created and its createTex() method is invoked; this method takes the content and a file name and stores the content in that file. Finally, a new instance of the Tex2PDF class is created and its convertTex2PDF() method is invoked, which calls pdflatex according to the configuration and the input files. If no problem occurs, the PDF files are generated. The process of generating PDF in the LATEX off method is very similar, so I do not explain it again.
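The recursive traversal performed by generateSimpleTex() can be approximated with a DOM walk over the XSL-FO file. This hypothetical sketch assumes that every part that gets its own file is an `fo:block` carrying an `id` attribute; it only collects those IDs in document order, after which the getContent(), createTex() and convertTex2PDF() steps would run once per ID.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch of the recursive walk over the main XSL-FO file. */
class FoTraversal {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Returns the ids of all fo:block elements, in document order. */
    static List<String> blockIds(String fo) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Element root = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fo.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> ids = new ArrayList<>();
            collect(root, ids);
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XSL-FO", e);
        }
    }

    // Depth-first recursion, visiting a parent before its children.
    private static void collect(Element e, List<String> ids) {
        if (FO_NS.equals(e.getNamespaceURI()) && "block".equals(e.getLocalName())
                && e.hasAttribute("id")) {
            ids.add(e.getAttribute("id")); // one small LaTeX or FO file per id
        }
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) collect((Element) n, ids);
        }
    }
}
```

Visiting parents before children means a chapter is processed before its sections, which matches the nesting of the IDs.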
Figure 5.15: Sequence of Generate PDF Use Case Using pdflatex
Figure 5.16: Sequence of Generate PDF Use Case Using Apache FOP
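The LATEX on path can be pictured as a recursive walk over the XSL-FO tree that writes one small LATEX file per element and then hands each file to pdflatex. The following Java sketch shows only that shape, under hypothetical assumptions: elements are selected by their id attribute, getContent() is replaced by a caller-supplied function, and files are named after the ids. It does not reproduce the logic of FOSeparator2Tex, SimpleTexCreator or Tex2PDF; the pdflatex option -interaction=nonstopmode is standard, but the exact command latex2rdf builds from its configuration is not shown here.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of the FO -> small .tex files -> pdflatex pipeline.
public class FoToTexSketch {

    /** Parses XSL-FO (or any XML) text and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Walks the tree depth-first; every element with an id attribute gets its own .tex file. */
    static List<File> generateAll(Element root, File outDir, Function<Element, String> getContent) {
        List<File> written = new ArrayList<File>();
        generateSimpleTex(root, outDir, getContent, written);
        return written;
    }

    private static void generateSimpleTex(Element e, File outDir,
                                          Function<Element, String> getContent, List<File> written) {
        String id = e.getAttribute("id");          // empty string if the attribute is absent
        if (!id.isEmpty()) {
            File tex = new File(outDir, id + ".tex");
            try {
                Files.write(tex.toPath(), getContent.apply(e).getBytes(StandardCharsets.UTF_8));
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            written.add(tex);
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                generateSimpleTex((Element) c, outDir, getContent, written);
            }
        }
    }

    /** Command line for pdflatex; it is meant to run inside the folder of the .tex file. */
    static List<String> pdflatexCommand(String pdflatexPath, File tex) {
        return Arrays.asList(pdflatexPath, "-interaction=nonstopmode", tex.getName());
    }

    /** Runs pdflatex on one file and returns its exit code (0 usually means success). */
    static int runPdflatex(String pdflatexPath, File tex) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(pdflatexCommand(pdflatexPath, tex))
                .directory(tex.getAbsoluteFile().getParentFile())
                .inheritIO()
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        Element fo = parse("<root><block id=\"b1\">Satz<block id=\"b2\">Beweis</block></block></root>");
        File outDir = Files.createTempDirectory("fo2tex").toFile();
        // Hypothetical stand-in for getContent(): wraps the element text in a minimal document.
        List<File> files = generateAll(fo, outDir, e ->
                "\\documentclass{report}\n\\begin{document}\n" + e.getTextContent() + "\n\\end{document}\n");
        for (File f : files) {
            System.out.println(pdflatexCommand("pdflatex", f));
            // With pdflatex installed: System.out.println(runPdflatex("pdflatex", f));
        }
    }
}
```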
5.7 Source Code
In this section, I explain the source code and its structure. latex2rdf has been developed in the Java programming language, an object-oriented programming (OOP) language developed by Sun Microsystems. I used version 1.5 of the Java Development Kit (JDK) [46], but the application should also work with JDK 1.4. It has been developed and tested mainly under Windows XP Service Pack 2, but as Java is platform independent, it should also run without problems under UNIX-based operating systems and other operating systems with Java support. I have tried to add suitable JavaDoc comments to the source code and its methods throughout.
5.7.1 Packages and Classes
The latex2rdf application is composed of several packages and classes, which I briefly explain in the following.
• Package exceptions: This package contains the exceptions used in the source code. It contains the following classes:
– Class Latex2XmlFailureException: This exception is thrown whenever generating XML from a LATEX document is unsuccessful. In that case, end users should check the latex2xml log files to find the cause of the problem.
– Class LatexWindowIsEmptyException: This exception is thrown whenever the user wants to generate RDF or XSL-FO but the LATEX/XML box is empty.
– Class QueryExecutionException: This exception is thrown whenever the execution of a query is unsuccessful, for example when the query contains an error or the RDF model is not available.
– Class Xml2RdfFailureException: This exception is thrown whenever generating RDF from XML is unsuccessful.
• Package queryengine: This package contains classes for generating dynamic queries and executing different types of queries. It contains the following classes:
– Class ElementQueryGenerator: This class contains methods for generating dynamic element queries from an RDF model and storing them in the output folder.
– Class NumericQueryGenerator: This class contains methods for generating dynamic numeric queries from an RDF model and storing them in the output folder.
– Class RDQLRunner: This class executes an RDQL query against an RDF model. It takes a query and an RDF model as input, executes the query against the model and stores the results in a HashMap for the result window.
– Class SPARQLRunner: This class executes a SPARQL query against an RDF model. It takes a query and an RDF model as input, executes the query against the model and stores the results in a HashMap for the result window.
• Package rdfengine: This package contains the class(es) for generating RDF from LATEX source code. It contains the following class:
– Class RdfGenerator: This is the main class for generating RDF from a LATEX document. It contains all methods for generating XSL templates, applying them to the source, generating RDF and storing it in the output folder.
• Package test: This package contains several tests, especially for the utils package. It contains the following classes:
– Class TestConfigReader: This class contains test cases for the ConfigReader class.
– Class TestFileUtils: This class contains test cases for the FileUtils class.
– Class TestFO2PDF: This class contains test cases for the FO2PDF class.
– Class TestFOSeparator2Tex: This class contains test cases for the FOSeparator2Tex class.
– Class TestLatex2Xml: This class contains test cases for the LATEX to XML converter.
– Class TestLatexUtils: This class contains test cases for the LatexUtils class.
– Class TestOutputPath: This class contains test cases for the ConfigReader class, especially for output folders.
– Class TestQueryGenerator: This class contains test cases for the numeric and element query generator classes.
– Class TestRDQLRunner: This class contains test cases for the RDQLRunner class.
– Class TestRunApp: This class contains test cases for executing an external executable file.
– Class TestString: This class contains test cases for working with strings.
– Class TestXalan: This class contains test cases for working with Xalan-Java.
• Package ui: This package contains the Graphical User Interface (GUI) of latex2rdf. It contains the following classes:
– Class AboutFrame: This class is the main frame of the About window, which appears when the About button is pressed.
– Class HelpFrame: This class is the main frame of the Help window, which appears when the Help button is pressed.
– Class Latex2rdfUI: This class is the main user interface of latex2rdf and contains the main() method of the application.
– Class OutputFrame: This class is the output frame for presenting RDF or XSL-FO content in a separate window.
– Class ResultFrame: This class is the result window, in which the results appear after a query has been executed.
• Package utils: This package contains the utility classes of the application. It contains the following classes:
– Class ConfigReader: This class contains methods for handling configuration issues.
– Class ExecuteFile: This class is used to execute an external executable file.
– Class FileUtils: This class contains utility methods for working with files, such as saving, deleting and loading files.
– Class LatexUtils: This class contains utility methods for working with LATEX documents, for example removing verbatim environments and deleting comments from a source file.
– Class StringUtils: This class contains utility methods for working with strings.
– Class XSLTApplier: This class contains methods for applying XSL templates to an XML file.
• Package xslfoengine: This package contains classes for generating XSL-FO and converting it to PDF using Apache FOP or pdflatex. It contains the following classes:
– Class FO2PDF: This class contains methods for generating a PDF document from an XSL-FO file using Apache FOP.
– Class FOSeparator2FO: This class contains methods for separating an XSL-FO file into several small XSL-FO files.
– Class FOSeparator2Tex: This class contains methods for separating an XSL-FO file into several small LATEX files using the XSL-FO and LATEX source files.
– Class SimpleFOCreator: This class generates a simple XSL-FO file and stores it in the output folder.
– Class SimpleTexCreator: This class generates a simple LATEX document and stores it in the output folder.
– Class Tex2PDF: This class contains methods for generating a PDF document from a LATEX document using pdflatex.
– Class XSLFOGenerator: This is the main class for generating XSL-FO from a LATEX document. It contains all methods for generating XSL templates, applying them to the source, generating XSL-FO and storing it in the output folder.
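As an illustration of the kind of helper LatexUtils provides, the following sketch deletes comments from LATEX source. It is hypothetical and not the latex2rdf implementation: it treats an unescaped % as the start of a comment, keeps \% as a literal percent sign, keeps the line structure, and does not handle verbatim environments (where % has no special meaning), which a real implementation would remove or protect first.

```java
// Hypothetical sketch of a LatexUtils-style comment remover; not the latex2rdf code.
public class LatexCommentStripper {

    /** Removes %-comments from every line, keeping escaped \% characters. */
    static String stripComments(String source) {
        StringBuilder out = new StringBuilder(source.length());
        String[] lines = source.split("\n", -1);   // -1 keeps trailing empty lines
        for (int l = 0; l < lines.length; l++) {
            String line = lines[l];
            int cut = line.length();
            for (int i = 0; i < line.length(); i++) {
                char ch = line.charAt(i);
                if (ch == '\\') {
                    i++;                            // skip the escaped character, e.g. \% or \\
                } else if (ch == '%') {
                    cut = i;                        // comment starts here
                    break;
                }
            }
            out.append(line, 0, cut);
            // Note: in TeX a comment also swallows the line end; this sketch keeps the newline.
            if (l < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tex = "Rate: 50\\% % a comment\n\\\\% comment after a line break\nplain text";
        System.out.println(stripComments(tex));
    }
}
```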
5.7.2 License
Different parts of the latex2rdf application are covered by various licenses, such as the Hewlett-Packard Development Company license for the Jena package. For more information on each license, refer to the Web site of the respective project or package. The latex2rdf application was developed under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License or any later version [18]. Listing 5.6 shows the latex2rdf license agreement.
Listing 5.6: latex2rdf License Agreement
/*****************************************************************
 * Copyright (c) 2006 by Peyman Nasirifard
 * All rights reserved
 *
 * This file is part of the latex2rdf project. The latex2rdf project
 * is free software; you can redistribute it and/or modify it under
 * the terms of the GNU General Public License as published by the
 * Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * The GNU General Public License can be found at
 * http://www.gnu.org/copyleft/gpl.html.
 *
 * This file is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * This copyright notice MUST APPEAR in all copies of the file!
 *****************************************************************/
5.7.3 Installation
In this section, I explain how to install latex2rdf. The application has a main JAR file called latex2rdf.jar, which is invoked to run latex2rdf. The following steps should be done before invoking it:
• Check the configuration file of latex2rdf (config.xml) and set the JDK 1.4.2 home folder, the path to pdflatex, etc.
• Check the configuration files of latex2xml to see whether all options are available.
• Make sure that jena.jar (version 2.4) and its related libraries, fop.jar (version 0.92 beta) and its related libraries, xercesImpl.jar (version 2.8.0), xalan.jar (version 2.7.0), jdom.jar (version 1.0) and latex2xml.jar (version 1.2) are in the same folder as latex2rdf.jar; otherwise, end users should open latex2rdf.jar and correct the Class-Path entry in its manifest file.
After these steps, latex2rdf can be started with a command such as java -jar latex2rdf.jar. The main graphical user interface then appears and end users can work with it. Note that, for backup purposes, all configuration files, the folder structure and so on are also contained in latex2rdf.jar.
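The third step can be checked programmatically. The following Java sketch (the class ClassPathCheck is hypothetical and uses only java.util.jar) reads the Class-Path attribute of a JAR manifest and reports which listed libraries are missing from the folder containing the JAR. Which entries latex2rdf.jar actually lists depends on how the JAR was built, so the sample manifest in the test is only illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for checking that the JARs named in Class-Path are present.
public class ClassPathCheck {

    /** Parses manifest text such as the contents of META-INF/MANIFEST.MF. */
    static Manifest parse(String manifestText) {
        try {
            return new Manifest(new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the space-separated entries of the main Class-Path attribute. */
    static List<String> classPathEntries(Manifest manifest) {
        String cp = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (cp == null || cp.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(cp.trim().split("\\s+"));
    }

    /** Lists the Class-Path entries that do not exist as files in the given folder. */
    static List<String> missingEntries(List<String> entries, File folder) {
        List<String> missing = new ArrayList<String>();
        for (String entry : entries) {
            if (!new File(folder, entry).isFile()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args.length > 0 ? args[0] : "latex2rdf.jar");
        try (JarFile jf = new JarFile(jar)) {
            Manifest manifest = jf.getManifest();
            if (manifest == null) {
                System.out.println("no manifest in " + jar);
                return;
            }
            // Class-Path entries are resolved relative to the folder containing the JAR.
            File folder = jar.getAbsoluteFile().getParentFile();
            System.out.println("missing: " + missingEntries(classPathEntries(manifest), folder));
        }
    }
}
```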
5.8 Lessons Learned
In this section, I describe several lessons I learned during my thesis and give some tips for running the application and getting results.
• latex2xml: latex2rdf does not depend on latex2xml. As explained in previous sections, dynamic XSL templates are generated to transform XML into RDF, and this process does not depend on a particular XML structure. In other words, if end users want to employ another LATEX to XML converter, such as Tralics [31], the RDF can still be generated dynamically from the new XML structure.
• Purification: Sometimes end users need to purify the XML using static XSL templates. For example, I experienced a problem in which latex2xml produced some undesired par/note tags. After generating the XML, I removed these tags with some static XSL templates; I then set the purified XML as input to latex2rdf, and it generated a correct RDF file. As explained before, this process can be automated by means of the configuration file.
• JDK version: latex2xml depends heavily on JDK 1.4.2; therefore, end users should set the path to the JDK 1.4.2 home folder in the configuration file.
• Operating system: latex2rdf is implemented in Java and is therefore platform independent. I developed and tested the application under Windows XP Service Pack 2, but it should also run in other environments, such as Linux or Mac OS.
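The purification step can be sketched with a static identity stylesheet that copies the XML unchanged except for the elements to be dropped. Which par/note tags were undesired is not specified in this thesis, so the rule below (remove par and note elements that have no content at all) is an assumption for illustration, as is the class name XmlPurifier.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical purification sketch; the "empty par/note" rule is an assumption.
public class XmlPurifier {

    /** Identity transform plus an empty template that drops empty par and note elements. */
    static final String PURIFY_XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        // Patterns with a predicate have default priority 0.5, higher than node() (-0.5).
        + "<xsl:template match=\"par[not(node())]|note[not(node())]\"/>"
        + "</xsl:stylesheet>";

    static String purify(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(PURIFY_XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT purification failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(purify("<document><par/><par>Text</par><note></note></document>"));
    }
}
```

Keeping such rules in a separate static stylesheet leaves the dynamic RDF templates untouched; only the input XML changes.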
5.9 Main Tools
In this section, I introduce the tools and main third-party packages that I used during my thesis. I do not intend to advertise any tool or package, but only to introduce it and describe its useful features, advantages and disadvantages.
5.9.1 Eclipse
Most of the development was done in Eclipse, a free and open source Integrated Development Environment (IDE) for Java and J2EE applications. Its plugin-based architecture allows developers around the world to extend it easily. One of the most useful plugins for me was the Visual Editor for Eclipse, which I used to build the Graphical User Interface (GUI). It is a powerful plugin for designing user interfaces and, to some extent, generating their source code; it supports both Swing and AWT. IBM is the main sponsor of the Eclipse project. I used Eclipse version 3.1.2 during development. For more information on the Eclipse platform and project, refer to [15].
5.9.2 Protégé
Protégé is a free, open source tool for developing and working with ontologies, developed at Stanford University. Like Eclipse, it has a plugin-based architecture that makes it easy to extend. I used Protégé version 3.1.1 to develop the LATEX document ontology. For more information on Protégé, refer to [45].
5.9.3 Exchanger XML Editor
Exchanger XML Editor is a powerful XML editor that I used during my work. It is free for academic purposes and requires a commercial license for other purposes. Among its useful features are checking the validity and well-formedness of XML files and applying XSL templates with three different processors (built-in, SAXON and Xalan). I also used it to check XML, RDF and XSL-FO structures. For more information on Exchanger XML Editor, refer to [17].
5.9.4 TeXnicCenter
TeXnicCenter is a free tool for working with TEX and LATEX documents. It has a powerful editor with spell checking in different languages. It can download required LATEX packages and libraries automatically from a server. I used it to create sample LATEX documents for testing purposes and also to prepare this thesis. For more information on TeXnicCenter, refer to [47].
5.10 Main Third Party Packages
In this section, I describe the main open source third-party packages that I used during my thesis. Their licenses are available on their Web sites.
5.10.1 Jena
Jena is an open source Java framework for developing Semantic Web applications. It has a rich API for accessing RDF and RDFS and makes it simple to build an RDF model and query it. Jena was developed at Hewlett-Packard (HP) Labs for Semantic Web research. For more information on the Jena project, refer to [22].
5.10.2 ARQ
ARQ is a SPARQL processor for Jena. ARQ supports multiple query languages, like SPARQL, RDQL and an extended form of SPARQL. I used it for executing SPARQL queries. For more information on ARQ, refer to [21].
5.10.3 JDOM
Java Document Object Model (JDOM) is a free package for reading, writing and manipulating XML documents. In other words, it is an open source Java-based Document Object Model (DOM) for XML. For more information on JDOM, refer to [23].
5.10.4 Apache FOP
Apache FOP (Formatting Objects Processor) is a free, open source Java package that reads XSL-FO documents and renders them into output formats such as PDF or PostScript. I used it to transform simple XSL-FO documents into PDF. For more information on Apache FOP, refer to [6].
5.10.5 Xalan-Java
Xalan-Java is an open source XSLT processor for transforming XML documents into other documents. It uses Xerces for parsing XML files. The generated XSL templates contain no processor-specific tags, so other XSLT processors can be used as well. For more information on Xalan-Java, refer to [7].
5.11 Testing Solutions with an Example
To test the methods and algorithms presented in this thesis, my main supervisor, Prof. Dr. Nicola Henze, provided a LATEX document. From this source file, an RDF document and then the dynamic queries were to be generated. After that, different queries were to be executed against the RDF document and their results presented. Finally, for the visualization part, suitable PDF files were to be generated by means of XSL-FO and the LATEX source file.
5.11.1 Input File
In this part, I present some statistics about the input file. The input is a large LATEX source file called Codes and Designs, about 100 KB in size. It contains many mathematical formulas and is written in German. The PDF file generated from it with pdflatex has 48 pages. The document contains four chapters and twenty-two sections.
5.11.2 Results
In this part, the results are presented using several screenshots of the application. To generate RDF, the input file is first translated to XML and then transformed into RDF using XSL templates. I generated the XML file from the source file with latex2xml outside the application, because the generated XML first had to be purified with several static XSL templates. I then loaded the purified XML into the main window. Figure 5.17 shows the main window of the application after the XML input file has been loaded.
Figure 5.17: After Loading Source XML File
After the RDF button is pressed, the RDF file is generated. First, the dynamic XSL templates are generated; then they are applied to the XML and the RDF is produced. The generated XSL templates are shown in the application log messages and are also stored in the output folder. Figure 5.18 shows the result of generating RDF from the source file.
Figure 5.18: After Generating RDF
Note that after the Generate Query button is pressed, the dynamic queries (element and numeric) are generated and saved in the output folder.
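Numeric queries rely on the protocol proposed in this thesis to make up for the missing COUNT() function in RDQL: a query prefixed with # asks for the number of results rather than the results themselves. A minimal Java sketch of how a runner might honour that prefix is shown below. The QueryEngine interface, QueryResult class and all other names are hypothetical stand-ins for the RDQLRunner/Jena call, and the sample query text is only illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the '#' COUNT() protocol for numeric queries.
public class CountProtocolSketch {

    /** Hypothetical abstraction over an RDQL/SPARQL engine that returns result rows. */
    interface QueryEngine {
        List<String> execute(String query);
    }

    /** Result of a query: the rows, or only their count for '#' queries. */
    static final class QueryResult {
        final List<String> rows;   // null for numeric queries
        final int count;

        QueryResult(List<String> rows, int count) {
            this.rows = rows;
            this.count = count;
        }
    }

    static boolean isNumericQuery(String query) {
        return query.trim().startsWith("#");
    }

    /** Strips a leading '#', runs the remaining query and, if '#' was present, returns only the count. */
    static QueryResult run(QueryEngine engine, String query) {
        String trimmed = query.trim();
        if (isNumericQuery(trimmed)) {
            List<String> rows = engine.execute(trimmed.substring(1).trim());
            return new QueryResult(null, rows.size());
        }
        List<String> rows = engine.execute(trimmed);
        return new QueryResult(rows, rows.size());
    }

    public static void main(String[] args) {
        // Fake engine that returns three rows for any query.
        QueryEngine fake = q -> Arrays.asList("r1", "r2", "r3");
        QueryResult r = run(fake, "# SELECT ?x WHERE (?x, ?p, ?o)");
        System.out.println("count = " + r.count);
    }
}
```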
After the XSL-FO button is pressed, the XSL-FO file is generated. First, the dynamic XSL templates are generated; then they are applied to the XML and the XSL-FO is produced. The generated XSL templates are shown in the application log messages and are also stored in the output folder. Figure 5.19 shows the result of generating XSL-FO from the source file.
Figure 5.19: After Generating XSL-FO
The generated RDF can be queried, and the results are presented in the main and result windows. Figure 5.20 shows the Give Me All Definitions query after its execution; the query results are presented in the result frame.
Figure 5.20: After Executing a Pre-defined Query
Figure 5.21 shows the result of the simple query document1_bodymatter1_chapter1_section1_stitle1 after its execution. All RDF triples related to this ID are shown in the result box. The main result, namely the title of the first section of the first chapter of the document, is marked with an arrow in the figure.
Figure 5.21: After Executing a Simple Query
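An ID-based query of this kind can be turned into RDQL mechanically: the ID becomes the subject URI of a single triple pattern, and all predicates and objects of that subject are selected. The sketch below assumes a hypothetical base URI for the generated resources (latex2rdf's real namespace is not given here) and IDs written with underscores between segments, as in document1_bodymatter1_chapter1_section1_stitle1; the class name IdQueryBuilder is also hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: turning an element ID typed by the user into an RDQL query.
public class IdQueryBuilder {

    // Hypothetical namespace for generated resources; the real base URI may differ.
    static final String BASE = "http://example.org/latex2rdf#";

    // Accept only ID-like input so the text cannot break out of the <...> URI.
    private static final Pattern ID = Pattern.compile("[A-Za-z0-9_]+");

    /** Builds an RDQL query returning every (predicate, object) pair of the element. */
    static String rdqlForId(String base, String id) {
        String trimmed = id.trim();
        if (!ID.matcher(trimmed).matches()) {
            throw new IllegalArgumentException("not an element ID: " + id);
        }
        return "SELECT ?p, ?o WHERE (<" + base + trimmed + ">, ?p, ?o)";
    }

    public static void main(String[] args) {
        System.out.println(rdqlForId(BASE, "document1_bodymatter1_chapter1_section1_stitle1"));
    }
}
```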
Figure 5.22 demonstrates the general view of application after working with it. All parts of this user interface were introduced in previous sections.
Figure 5.22: General View
Finally, end users are able to save the results. Figure 5.23 demonstrates the process of saving results.
Figure 5.23: Saving Results to a File
5.11.2.1 A Deeper Look at One Element
In this section, I take a deeper look at the RDF and the visualization of one element of the results. I consider the first example in the second paragraph of the sixth section of the first chapter of the document. Its ID in RDF is document1_bodymatter1_chapter1_section6_par2_example1. As explained before, RDF generation is configurable, so either a concise or a detailed RDF can be generated. Listing 5.7 shows the concise RDF for this element; in this case, RDF is generated down to the example element and does not go any deeper.
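IDs such as the one above encode the element's position in the document tree: the name of each ancestor followed by its order among same-named siblings. In latex2rdf this is done with static XSL templates; the following Java DOM sketch reproduces the idea outside XSLT for illustration only. The class ElementIdSketch is hypothetical, and the 1-based numbering and underscore separators are assumptions chosen to match the shape of the IDs in this section.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of ancestor-based element IDs; latex2rdf uses XSL for this.
public class ElementIdSketch {

    /** 1-based position of e among its siblings with the same element name. */
    static int ordinal(Element e) {
        int n = 1;
        for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(e.getNodeName())) {
                n++;
            }
        }
        return n;
    }

    /** Joins name+ordinal of the element and all its ancestors, from the root downwards. */
    static String idOf(Element e) {
        Deque<String> parts = new ArrayDeque<String>();
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            parts.addFirst(n.getNodeName() + ordinal((Element) n));
        }
        return String.join("_", parts);
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<document><bodymatter><chapter>"
                + "<section/><section/><section/><section/><section/>"
                + "<section><par/><par><example/></par></section>"
                + "</chapter></bodymatter></document>");
        Element example = (Element) doc.getElementsByTagName("example").item(0);
        System.out.println(idOf(example));
        // prints document1_bodymatter1_chapter1_section6_par2_example1
    }
}
```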
Listing 5.7: Small RDF Description of an Example in Document
(Hadamard--Matrizen der Ordnung 2^m)
\left(\begin{array}{c} +1 \end{array}\right) und S_1 :=\left(\begin{array}{cc} +1 +1 \\ +1 -1 \end{array}\right) sind Hadamard--Matrizen. Für m \geq 1 betrachte man den Vektorraum V:=\F_2^m n:=2^m vielen Elemente von V in einer festen Weise angeordnet und setzen für a,b \in V amp; (-1)^{ab^\top}
Wir denken uns die H(a,b) & ; := &
Hierdurch wird eine n \times n-Matrix H mit Einträgen +1 oder -1 bestimmt. Für a \in V ist \sum\limits_{c \in V}H(a,c)H(c,a)=\sum\limits_{c \in V}(-1)^{ac^\top+ca^\top}=n. Ist a \not= b, so ist (a+b)(i)=1 für wenigstens ein i \in \N_m. Bezeichnen wir den i-ten Einheitsvektor von V mit e_i, so erhalten wir \sum\limits_{c \in V}H(a,c)H(c,b)=\sum\limits_{c \in V} (-1)^{(a+b)c^\top}=\hspace*{-1em} \sum\limits_{c \in V \wedge c(i)=0} \hspace*{-1em} ((-1)^{(a+b)c^\top} + (-1)^{(a+b)(c+e_i)^\top})=\hspace*{-1em} \sum\limits_{c \in V \wedge c(i)=0} \hspace*{-1em} 0=0. Also ist H H^\top = n E_n und folglich H eine Hadamard--Matrix.
Listing 5.8 shows an excerpt of the detailed RDF file, which was generated taking all elements into account.
Listing 5.8: Excerpt of Full RDF Description of an Example in Document
<hasblklist rdf:resource="#document1_bodymatter1_chapter1_section6_par2_example1_par1_blklist1"/>
(Hadamard--Matrizen der Ordnung 2^m)
\left(\begin{array}{c} +1 \end{array}\right) und S_1 :=\left(\begin{array}{cc} +1 +1 \\ +1 -1 \end{array}\right) sind Hadamard--Matrizen. Für m \geq 1 betrachte man den Vektorraum V:=\F_2^m n:=2^m vielen Elemente von V in einer festen Weise angeordnet und setzen für a,b \in V amp; (-1)^{ab^\top}
Wir denken uns die H(a,b) & ; := &
Hierdurch wird eine n \times n-Matrix H mit Einträgen +1 oder -1 bestimmt. Für a \in V ist \sum\limits_{c \in V}H(a,c)H(c,a)=\sum\limits_{c \in V}(-1)^{ac^\top+ca^\top}=n. Ist a \not= b, so ist (a+b)(i)=1 für wenigstens ein i \in \N_m. Bezeichnen wir den i-ten Einheitsvektor von V mit e_i, so erhalten wir \sum\limits_{c \in V}H(a,c)H(c,b)=\sum\limits_{c \in V} (-1)^{(a+b)c^\top}=\hspace*{-1em} \sum\limits_{c \in V \wedge c(i)=0} \hspace*{-1em} ((-1)^{(a+b)c^\top} + (-1)^{(a+b)(c+e_i)^\top})=\hspace*{-1em} \sum\limits_{c \in V \wedge c(i)=0} \hspace*{-1em} 0=0. Also ist H H^\top = n E_n und folglich H eine Hadamard--Matrix. 2^m ....
document1_bodymatter1_chapter1_section6_par2_example1_par1 Hierdurch wird eine n \times n-Matrix H mit Einträgen oder -1 bestimmt. .... H H^\top = n E_n H
blklist1_item2"/>
blklist1_item2_par4_m1
blklist1_item2_par4_m2
blklist1_item2_par4_m3
blklist1_item2_par4_m4
+1
blklist1_item2_par7_m1
blklist1_item2_par7
blklist1_item2_par7_m2
blklist1_item2_par7
5.11.2.2 Visualization of an Element
In this part, the visualization of document1_bodymatter1_chapter1_section6_par2_example1 is demonstrated. For this element, a small LATEX file (Listing 5.9) is generated from the main LATEX source file.
Listing 5.9: LATEX Source of Example
\documentclass{report}
\usepackage{german}
\usepackage{amssymb}
\include{epsf}
\setlength{\parindent}{0em}
\setlength{\parskip}{1.5ex}
\newcounter{blkcounter}[section]
\newcommand{\newblk}[2]%
{\newenvironment{#1}[1]{%
\renewcommand{\theblkcounter}{\arabic{chapter}.\arabic{section}.\arabic{blkcounter}}%
\refstepcounter{blkcounter}%
{\bf \theblkcounter \vspace{0.5em} #2.}%
{\hspace*{0.5em}\em##1}\\*[\parskip]}%
{\vspace*{4.5ex}}}
\newcounter{blklistcounter}[blkcounter]
\newenvironment{blklist}%
{\begin{list}{{\bf (\alph{blklistcounter})}}%
{\renewcommand{\theblklistcounter}{\theblkcounter(\alph{blklistcounter})}%
\usecounter{blklistcounter}\parsep 1.5ex \itemsep 0ex \topsep 0ex \partopsep 0ex}}%
{\end{list}}
\newenvironment{proof}%
{{\bf Beweis.}}%
{\hspace*{\fill} $\Box$\vspace*{3ex}}
\newblk{corollary}{Korollar}
\newblk{conjecture}{Vermutung}
\newblk{definition}{Definition}
\newblk{example}{Beispiel}
\newblk{lemma}{Lemma}
\newblk{theorem}{Satz}
\newcommand{\defem}{\em}
\newcommand{\C}{\mathbb{C}}
\newcommand{\F}{\mathbb{F}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\ggt}{\mbox{ggt}}
\newcommand{\Kern}{\mbox{Kern}}
\newcommand{\mod}{\mbox{mod}}
\newcommand{\supp}{\mbox{supp}}
\newcommand{\wt}{\mbox{wt}}
\newcommand{\mlderr}{\hspace*{0.3em}{\scriptstyle ?}\hspace*{-0.70em}\bigcirc}
\newcommand{\DONOTTEX}[1]{}
\begin{document}
\begin{example}{(Hadamard--Matrizen der Ordnung $2^m$)}
\vspace*{-4.5ex}
\label{hadaexa}
\begin{blklist}
\item $\left(\begin{array}{c} +1 \end{array}\right)$ und $S_1 :=\left(\begin{array}{cc} +1 +1 \\ +1 -1 \end{array}\right)$ sind Hadamard--Matrizen.
\item F\"ur $m \geq 1$ betrachte man den Vektorraum $V:=\F_2^m$
Wir denken uns die $n:=2^m$ vielen Elemente von $V$ in einer festen Weise angeordnet und setzen f\"ur $a,b \in V$
\begin{eqnarray*}
H(a,b) & := & (-1)^{ab^\top}
\end{eqnarray*}
Hierdurch wird eine $n \times n$-Matrix $H$ mit Eintr\"agen $+1$ oder $-1$ bestimmt.
F\"ur $a \in V$ ist $\sum\limits_{c \in V}H(a,c)H(c,a)$ $=$ $\sum\limits_{c \in V}(-1)^{ac^\top+ca^\top}$ $=$ $n$. Ist $a \not= b$, so ist $(a+b)(i)=1$ f\"ur wenigstens ein $i \in \N_m$. Bezeichnen wir den $i$-ten Einheitsvektor von $V$ mit $e_i$, so erhalten wir $\sum\limits_{c \in V}H(a,c)H(c,b)$ $=$ $\sum\limits_{c \in V} (-1)^{(a+b)c^\top}$ $=$ $\hspace*{-1em} \sum\limits_{c \in V \wedge c(i)=0} \hspace*{-1em} ((-1)^{(a+b)c^\top} + (-1)^{(a+b)(c+e_i)^\top})$ $=$ $\hspace*{-1em} \sum\limits_{c \in V \wedge c(i)=0} \hspace*{-1em} 0$ $=$ $0$. Also ist $H H^\top = n E_n$ und folglich $H$ eine Hadamard--Matrix.
\end{blklist}
\end{example}
\end{document}
Figure 5.24 demonstrates the output PDF file of the example element that was generated using pdflatex from the source code in listing 5.9.
Figure 5.24: A Sample Example from Codes and Design Document
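Listing 5.9 has a simple shape: the preamble of the main source file (everything before \begin{document}), followed by the single element wrapped in its own document environment. A hypothetical Java helper that assembles such a file from the main source and an already extracted fragment might look like the following; locating the fragment in the source, which is the harder part, is not shown, and the class name StandaloneTexBuilder is an assumption.

```java
// Hypothetical sketch: building a compilable per-element LaTeX file like Listing 5.9.
public class StandaloneTexBuilder {

    static final String BEGIN = "\\begin{document}";

    /** Everything before the first \begin{document}; assumes comments were already removed. */
    static String preamble(String mainSource) {
        int i = mainSource.indexOf(BEGIN);
        if (i < 0) {
            throw new IllegalArgumentException("no \\begin{document} in main source");
        }
        return mainSource.substring(0, i);
    }

    /** Wraps one extracted element, e.g. an example environment, into a compilable file. */
    static String standalone(String mainSource, String fragment) {
        return preamble(mainSource) + BEGIN + "\n" + fragment.trim() + "\n\\end{document}\n";
    }

    public static void main(String[] args) {
        String main = "\\documentclass{report}\n\\usepackage{amssymb}\n"
                + "\\begin{document}\nText\n\\begin{example}{(Titel)}\nInhalt\n\\end{example}\n\\end{document}\n";
        System.out.print(standalone(main, "\\begin{example}{(Titel)}\nInhalt\n\\end{example}"));
    }
}
```

Reusing the complete preamble keeps all user-defined macros and environments (such as \newblk and blklist in Listing 5.9) available, so the fragment compiles with pdflatex exactly as it does in the full document.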
5.12 Discussion and Conclusion
In this chapter, I discussed the implementation of my thesis and described how I developed the latex2rdf application. I used the IBM Rational Unified Process as the software development process and built the application in several iterations. I explained the features added in each iteration and the time required for each one. Two general approaches for generating RDQL queries (dynamic and static) have been described. The application has a Graphical User Interface (GUI) developed with Java Swing; I described the different parts of the GUI, its structure and their relations. I explained the structure of the output folders in which the results are stored. I covered the configuration of latex2rdf, its configuration structure and the role of each element in the configuration file. I presented sequence diagrams for a better understanding of the processes behind the use cases. The different parts of the source code, its packages and classes have also been described in this chapter. I introduced the main tools and packages that I employed during my work. Finally, I presented an example and used it to demonstrate the different parts of the application.
Chapter 6
Summary
In my M.Sc. thesis, Building a Gateway from Text Editing in LaTeX to RDF, I developed several algorithms and an application for generating RDF from LaTeX documents. The application can be seen as a black box with several kinds of input and output. The input can be a LaTeX source file, an XML file or a query; the output can be an RDF document, an XSL-FO file, several small LaTeX documents, PDF files, dynamically generated queries and the results of executing queries.
At the beginning of my thesis, I had never used the LaTeX typesetting system for my reports and documents, and my knowledge of it was limited. I first learned its architecture and commands and became familiar with its environments and tools; in this period I learned a lot about LaTeX. I also checked whether existing tools in this domain could help me. I found latex2xml, a free LaTeX to XML converter, and used it in my work.
One of the main tasks of my thesis was generating an RDF document from a LaTeX source file. The approach I developed is not the only possible one, but I believe it is an efficient one: the LaTeX source file is first converted to XML, and this XML is then transformed into RDF via XSL templates. One of the main problems arose from the dynamic structure of LaTeX, and therefore of the generated XML, since the grammar of LaTeX documents is very complex and hard to manage. Consequently, the XSL templates cannot reasonably be written statically: that would be very inefficient and would require a huge number of templates. In the approach presented in this thesis, the XSL templates are instead generated dynamically from the XML by means of the configuration file, which makes the approach easily extensible.
Generating a meaningful URI for each element was another problem. It has been addressed by static XSL templates that generate a unique, meaningful ID for each element: a template collects the parent, grandparent, great-grandparent and so on of an element and builds the ID from the names of these ancestors and their positions in the document.
The generated RDF is based on an ontology, called the LaTeX document ontology, that I developed during my thesis. This ontology does not cover every possible LaTeX command, but it includes a reasonable subset of the commands commonly used in LaTeX documents. It defines LaTeX commands as concepts, and whenever a command can appear inside another command, a property expresses this relationship.
After generating the RDF, I provided a dynamic approach for generating RDQL queries from it. These queries fall into two main classes: element queries and numeric queries. Element queries retrieve a specific element from the LaTeX document. Numeric queries are based on the number of results; in other words, they count the occurrences of an element in the document. Since the RDQL specification lacks a COUNT() function or anything similar, I proposed a simple protocol for counting: the query is prefixed with a special character (#). I also generated some general-purpose RDQL queries; these static queries can be applied to many documents. After the queries have been generated, end users can load them into the main Graphical User Interface (GUI) and execute them against the RDF. End users can also write and execute their own RDQL/SPARQL queries. A further way of formulating a query suits users who are not familiar with RDQL but know the structure of the generated RDF document.
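The query conventions described above can be illustrated with plain string handling. The following Java sketch is hypothetical and is not the latex2rdf code: the class and method names are invented, the RDQL text is only one plausible way of writing such queries, the namespace `http://latexontology.org/latex#` is taken from the generated templates in Listing A.3, and the base URI passed to `idQuery` is an assumption. Running the queries against an RDF model (for example with Jena) is deliberately left out.

```java
import java.util.List;

// Hypothetical sketch of element queries, numeric ('#'-prefixed) queries
// and ID-based queries; not the actual latex2rdf implementation.
final class RdqlQuerySketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String LATEX_NS = "http://latexontology.org/latex#";
    static final String USING =
            " USING rdf FOR <" + RDF_NS + ">, latex FOR <" + LATEX_NS + ">";

    // Element query: every resource typed with the given ontology concept.
    static String elementQuery(String concept) {
        return "SELECT ?x WHERE (?x, rdf:type, latex:" + concept + ")" + USING;
    }

    // Numeric query: the same pattern, marked with '#' so that the caller
    // reports the number of results instead of the results themselves.
    static String numericQuery(String concept) {
        return "#" + elementQuery(concept);
    }

    // ID query: all statements about the resource with the given fragment ID.
    // The base URI of the generated RDF document is an assumption here.
    static String idQuery(String id, String baseUri) {
        return "SELECT ?p, ?o WHERE (<" + baseUri + "#" + id + ">, ?p, ?o)";
    }

    static boolean isCountQuery(String query) {
        return query.startsWith("#");
    }

    // Removes the '#' marker before the query is handed to an RDQL engine.
    static String stripCountMarker(String query) {
        return isCountQuery(query) ? query.substring(1) : query;
    }

    // Applies the COUNT() protocol to the rows returned by any engine.
    static String present(String query, List<?> rows) {
        return isCountQuery(query) ? String.valueOf(rows.size()) : rows.toString();
    }

    public static void main(String[] args) {
        String q = numericQuery("section");
        System.out.println(q);
        System.out.println(stripCountMarker(q));
    }
}
```

For example, `numericQuery("section")` yields the element query for `latex:section` prefixed with `#`; a caller would strip the marker with `stripCountMarker`, execute the query, and then report only the number of rows via `present`.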
latex2rdf also accepts ID-based queries: the user types only the ID of the desired element, and latex2rdf transforms it into an RDQL query and executes it. After a query has been executed, the results are presented in the GUI and also in a separate frame.
For visualization, I presented a dynamic approach for generating XSL-FO from LaTeX documents, very similar to the method described above for generating RDF from LaTeX source documents. XSL-FO files can be converted into other user-friendly formats such as HTML or PDF with various tools and packages. I chose PDF for the visualization part because it is portable and easy to handle, and because many tools exist for generating PDF files. Two approaches have been proposed for this purpose: the first uses a third-party package, Apache FOP, and the second uses pdflatex. The first approach is fast and secure, but its rendering is poor: Apache FOP does not understand LaTeX commands, so they appear in the output PDF file as plain text, with no regard to what the commands mean. The second approach uses pdflatex to generate the PDF files. It is slower than the first, but the rendering is much better. For this approach, several text processing algorithms were developed; they locate the desired part of the source file with the help of the LaTeX source and the XSL-FO document. Important issues were handling verbatim environments and comments in the source file, and finding a specified chapter or section in it.
I named my application latex2rdf because it acts as a converter from LaTeX to RDF, although the name does not reflect all of its functionality. latex2rdf is written in Java and is therefore platform-independent; it has been developed and tested under Windows XP Service Pack 2. latex2rdf is configured via an XML configuration file, which covers settings such as the output (RDF, XSL-FO) configuration, the path to the pdflatex application, the path to the required source files and the path of JDK 1.4. latex2rdf offers a Graphical User Interface (GUI), based on Java Swing, that provides all of its functionality. The GUI also contains a status box, a one-way message window that is updated whenever something happens or a problem occurs.
As extensions to latex2rdf, I plan to add several features. Two useful capabilities that I have considered are updating the LaTeX document ontology automatically when a new command occurs, and configuring latex2xml automatically. These features would give end users a more comfortable experience with latex2rdf.
Finally, I learned a great deal in different domains during my thesis. I gained a more detailed understanding of RDF and XSL-FO, and I became familiar with the LaTeX typesetting system and many tools and packages that I had never used before.
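The ancestor-based naming summarized in this chapter can be sketched in Java with the standard DOM API. In latex2rdf this is done by static XSL templates; the class below is only an illustration, and its name, the `_` separator and the `name + position` format are assumptions. Each element on the path from the document element down to the target contributes its name and its 1-based position among same-named siblings, following the pattern of the XPath expression `count(preceding-sibling::*[name()='stitle'])+1` that appears in Listing A.3.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Sketch only: derives a readable, unique ID for an XML element from the
// names and sibling positions of all its ancestors (hypothetical format).
final class MeaningfulIdSketch {

    // 1-based position of e among its preceding siblings with the same name,
    // the DOM counterpart of count(preceding-sibling::*[name()=...]) + 1.
    static int positionAmongSameName(Element e) {
        int position = 1;
        for (Node n = e.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(e.getNodeName())) {
                position++;
            }
        }
        return position;
    }

    // Walks from the element up to the document element, prepending
    // "name + position" for every ancestor, e.g. document1_section2_par1.
    static String idFor(Element e) {
        Deque<String> parts = new ArrayDeque<>();
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            Element current = (Element) n;
            parts.addFirst(current.getNodeName() + positionAmongSameName(current));
        }
        return String.join("_", parts);
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<document><section><par/></section>"
                + "<section><par/><par/></section></document>");
        Element lastPar = (Element) doc.getElementsByTagName("par").item(2);
        System.out.println(idFor(lastPar)); // document1_section2_par2
    }
}
```

Because every step of the path carries a position, two elements can only share an ID if they share the whole ancestor chain and position, which is what makes the ID unique within one document.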
Bibliography
[1] A small collection of OWL Ontologies. http://protege.stanford.edu/plugins/owl/owl-library/. [cited at p. 7]
[2] W3C - The World Wide Web Consortium. http://www.w3.org/. [cited at p. 8, 12]
[3] Adobe Company. XMP Extensible Metadata Platform. http://www.adobe.com/products/xmp/. [cited at p. 34]
[4] Andreas Hirter, and Olivier Fankhauser, and Stefan von Niederhäusern. latex2xml: a LaTeX to XML translator. http://www.latex2xml.org/. [cited at p. 26]
[5] Andreas Hirter, and Olivier Fankhauser, and Stefan von Niederhäusern. latex2xml project documentation. http://www.latex2xml.org/downloads/Projektbericht.pdf. [cited at p. 27]
[6] Apache team. Apache FOP (Formatting Objects Processor) project. http://xmlgraphics.apache.org/fop/. [cited at p. 71]
[7] Apache team. Apache Xalan-Java project. http://xml.apache.org/xalan-j/. [cited at p. 71]
[8] Apostolos Syropoulos, and Antonis Tsolomitis, and Nick Sofroniou. Digital Typography using LaTeX. Springer, first edition, 2002. [cited at p. 18]
[9] Bob DuCharme. XSLT Quickly. Manning, first edition, 2001. [cited at p. 13, 15]
[10] Cool Text team. Cool Text: Logo and Graphics Generator. http://www.cooltext.com/. [cited at p. 54]
[11] Cygwin team. Cygwin: a Linux-like environment for Windows. http://www.cygwin.com/. [cited at p. 26]
[12] Dave Pawson. XSL-FO. O'Reilly Media, first edition, 2002. [cited at p. 14]
[13] David Taniar, and Johanna Wenny Rahayu. Web Semantics & Ontology. Idea Group Publishing, 2006. [cited at p. 7]
[14] Dublin Core Metadata Initiative Group. Dublin Core Metadata Initiative. http://dublincore.org/. [cited at p. 6, 7]
[15] Eclipse team - Sponsor: IBM. Eclipse project. http://www.eclipse.org/. [cited at p. 70]
[16] Erik T. Ray. Learning XML. O'Reilly Media, second edition, 2003. [cited at p. 8]
[17] Exchanger XML Editor team. Exchanger XML Editor. http://www.exchangerxml.com/. [cited at p. 70]
[18] Free Software Foundation. GNU General Public License. http://www.gnu.org/copyleft/gpl.html. [cited at p. 68]
[19] Grigoris Antoniou, and Frank van Harmelen. A Semantic Web Primer (Cooperative Information Systems). The MIT Press, 2004. [cited at p. 9]
[20] HEVEA team. HEVEA - a LaTeX to HTML translator. http://hevea.inria.fr/. [cited at p. 21]
[21] Hewlett-Packard Development Company. ARQ: Query engine for Jena. http://jena.sourceforge.net/ARQ/. [cited at p. 71]
[22] Hewlett-Packard Development Company. Jena: a Java framework for building Semantic Web applications. http://jena.sourceforge.net/. [cited at p. 70]
[23] JDOM team. JDOM project. http://www.jdom.org/. [cited at p. 71]
[24] Jean-Christophe Filliâtre, and Claude Marché. BibTeX2HTML - BibTeX to HTML. http://www.lri.fr/~filliatr/bibtex2html/. [cited at p. 21]
[25] jlatex team. jlatex - an editor for latex2e. http://jlatex.free.fr/. [cited at p. 20]
[26] Johannes Henkel. javabib - a bibtex parser written in Java. http://www-plan.cs.colorado.edu/henkel/stuff/javabib/. [cited at p. 25]
[27] John Zukowski. The Definitive Guide to Java Swing. Apress, third edition, 2005. [cited at p. 49]
[28] Leslie Lamport. LaTeX: A Document Preparation System, User's Guide and Reference Manual. Addison-Wesley, second edition, 1994. [cited at p. 17, 19, 20]
[29] LaTeX2HTML team. LaTeX2HTML - a LaTeX to HTML convertor. http://www.latex2html.org/. [cited at p. 21]
[30] Mark-Jason Dominus. vulcanize - a LaTeX to HTML convertor. http://www.plover.com/~mjd/vulcanize.html. [cited at p. 21]
[31] Miaou team. Tralics: a LaTeX to XML translator. http://www-sop.inria.fr/apics/tralics/. [cited at p. 26, 69]
[32] Michael C. Daconta, and Leo J. Obrst, and Kevin T. Smith. The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management. Wiley, 2003. [cited at p. 9]
[33] Michel Goossens, and Sebastian Rahtz, and Eitan Gurari, and Ross Moore, and Robert Sutor. The LaTeX Web Companion: Integrating TeX, HTML, and XML. Addison-Wesley, first edition, May 1999. [cited at p. 19, 20]
[34] Michel Klein. BibTeX-2-RDF translator. http://www.cs.vu.nl/~mcaklein/bib2rdf/. [cited at p. 20]
[35] Microsoft Corporation. Microsoft Office. http://office.microsoft.com/. [cited at p. 17]
[36] MiKTeX team. MiKTeX project. http://www.miktex.org/. [cited at p. 19]
[37] Natalya F. Noy, and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html. [cited at p. 7]
[38] OCLC Research. Persistent Uniform Resource Locator. http://www.purl.org/. [cited at p. 19]
[39] Open Office team. Free open source office suite. http://www.openoffice.org/. [cited at p. 17]
[40] Otfried Cheong. Hyperlatex. http://hyperlatex.sourceforge.net/. [cited at p. 21]
[41] Philippe Kruchten. Rational Unified Process, The: An Introduction. Addison Wesley Professional, third edition, 2003. [cited at p. 45]
[42] Raymond Seroul, and Silvio Levy, and D. Foata. A Beginner's Book of TeX. Springer, first edition, 1991. [cited at p. 17]
[43] Sean Bechhofer, and Frank van Harmelen, and Jim Hendler, and Ian Horrocks, and Deborah L. McGuinness, and Peter F. Patel-Schneider, and Lynn Andrea Stein. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref/, February 2004. [cited at p. 9]
[44] Shelley Powers. Practical RDF. O'Reilly Media, first edition, 2003. [cited at p. 12]
[45] Stanford University. Protégé. http://protege.stanford.edu/. [cited at p. 70]
[46] Sun Microsystems. Java Development Kit. http://java.sun.com/. [cited at p. 64]
[47] TeXnicCenter team. TeXnicCenter. http://sourceforge.net/projects/texniccenter/. [cited at p. 70]
[48] The World Wide Web Consortium members. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/. [cited at p. 12]
[49] Tim Berners-Lee. Why RDF model is different from the XML model. http://www.w3.org/DesignIssues/RDF-XML.html. [cited at p. 10]
[50] Tim Berners-Lee. What the Semantic Web can represent. http://www.w3.org/DesignIssues/RDFnot.html, September 1998. [cited at p. 5]
[51] Tim Berners-Lee, and James Hendler, and Ora Lassila. The Semantic Web, A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, May 2001. [cited at p. 6, 8]
[52] Tim Hoffmann. jDvi - a viewer for dvi files. http://www-sfb288.math.tu-berlin.de/jdvi/. [cited at p. 20]
[53] TtH team. TtH - a TeX to HTML convertor. http://hutchinson.belmont.ma.us/tth/. [cited at p. 21]
[54] University of Maryland, Baltimore County. Swoogle Semantic Web Search Engine. http://swoogle.umbc.edu/. [cited at p. 7]
[55] Using Dublin Core - The Elements. Dublin Core Metadata Initiative. http://dublincore.org/documents/usageguide/elements.shtml. [cited at p. 6, 7]
[56] Victor Eijkhout. TeX by Topic, A TeXnician's Reference. Addison-Wesley, first edition, 1992. [cited at p. 17]
[57] William F. Hammond. GELLMU - A Bridge for Authors from LaTeX to XML that includes \newcommand with arguments. http://www.albany.edu/~hammond/gellmu/. [cited at p. 21]
[58] Wine Ontology development team. Wine Ontology. http://www.w3.org/TR/owl-guide/wine.rdf. [cited at p. 7]
[59] Wolf Siberski. bibtex2rdf - A configurable BibTeX to RDF Converter. http://www.l3s.de/~siberski/bibtex2rdf/. [cited at p. 20]
[60] World Wide Web Consortium engineers. RDF validator. http://www.w3.org/RDF/Validator/. [cited at p. 11]
Appendices
Appendix A
General Example
In this appendix, I present an example covering the whole process, from a LaTeX source file to RDF and XSL-FO. Listing A.1 shows a sample LaTeX document, which is translated to XML.
Listing A.1: A sample TeX file
% This is a sample LaTeX input file.  (Version of 11 April 1994.)
%
% A '%' character causes TeX to ignore all remaining text on the line,
% and is used for comments like this one.

\documentclass{article}      % Specifies the document class

                             % The preamble begins here.
\title{An Example Document}  % Declares the document's title.
\author{Leslie Lamport}      % Declares the author's name.
\date{January 21, 1994}      % Deleting this command produces today's date.

\newcommand{\ip}[2]{(#1, #2)}
                             % Defines \ip{arg1}{arg2} to mean
                             % (arg1, arg2).

%\newcommand{\ip}[2]{\langle #1 | #2\rangle}
                             % This is an alternative definition of
                             % \ip that is commented out.

\begin{document}             % End of preamble and beginning of text.

\maketitle                   % Produces the title.

This is an example input file.  Comparing it with
the output it generates can show you how to
produce a simple document of your own.

\section{Ordinary Text}  % Produces section heading.  Lower-level
                         % sections are begun with similar
                         % \subsection and \subsubsection commands.

The ends of words and sentences are marked
by spaces. It doesn't matter how many
spaces you type; one is as good as 100.  The
end of a line counts as a space.

One or more blank lines denote the end
of a paragraph.

Since any number of consecutive spaces are treated like a single
one, the formatting of the input file makes no difference to
      \LaTeX,                % The \LaTeX command generates the LaTeX logo.
but it makes a difference to you.  When you use
\LaTeX, making your input file as easy to read
as possible will be a great help as you write
your document and when you change it.  This sample
file shows how you can add comments to your own input
file.

Because printing is different from typewriting, there are a
number of things that you have to do differently when preparing
an input file than if you were just typing the document directly.
Quotation marks like
       ``this''
have to be handled specially, as do quotes within quotes:
       ``\,`this'            % \, separates the double and single quote.
        is what I just
        wrote, not `that'\,''.

Dashes come in three sizes: an
       intra-word
dash, a medium dash for number ranges like
       1--2,
and a punctuation
       dash---like
this.

A sentence-ending space should be larger than the space between words
within a sentence.  You sometimes have to type special commands in
conjunction with punctuation characters to get this right, as in the
following sentence.
       Gnats, gnus, etc.\ all  % `\ ' makes an inter-word space.
       begin with G\@.         % \@ marks end-of-sentence punctuation.
You should check the spaces after periods when reading your output to
make sure you haven't forgotten any special cases.
Generating an ellipsis
       \ldots\    % `\ ' is needed after `\ldots' because TeX
                  % ignores spaces after command names like \ldots
                  % made from \ + letters.
                  %
                  % Note how a `%' character causes TeX to ignore
                  % the end of the input line, so these blank lines
                  % do not start a new paragraph.
                  %
with the right spacing around the periods requires a special
command.

\LaTeX\ interprets some common characters as commands, so you must
type special commands to generate them.  These characters include
the following:
       \$ \& \% \# \{ and \}.

In printing, text is usually emphasized with an
       \emph{italic}
type style.

\begin{em}
   A long segment of text can also be emphasized
   in this way.  Text within such a segment can be
   given \emph{additional} emphasis.
\end{em}

It is sometimes necessary to prevent \LaTeX\ from breaking a line where
it might otherwise do so.  This may be at a space, as between the
``Mr.'' and ``Jones'' in
       ``Mr.~Jones'',        % ~ produces an unbreakable interword space.
or within a word---especially when the word is a symbol like
       \mbox{\emph{itemnum}}
that makes little sense when hyphenated across
       lines.

Footnotes\footnote{This is an example of a footnote.}
pose no problem.

\LaTeX\ is good at typesetting mathematical formulas like
       \( x-3y + z = 7 \)
or
       \( a_{1} > x^{2n} + y^{2n} > x' \)
or
       \( \ip{A}{B} = \sum_{i} a_{i} b_{i} \).
The spaces you type in a formula are
ignored.  Remember that a letter like
       $x$                   % $ ... $  and  \( ... \)  are equivalent
is a formula when it denotes a mathematical symbol, and it should
be typed as one.

\section{Displayed Text}

Text is displayed by indenting it from the left margin.
Quotations are commonly displayed.  There are short quotations
\begin{quote}
   This is a short a quotation.  It consists of a
   single paragraph of text.  See how it is formatted.
\end{quote}
and longer ones.
\begin{quotation}
   This is a longer quotation.  It consists of two paragraphs
   of text, neither of which are particularly interesting.

   This is the second paragraph of the quotation.  It is just
   as dull as the first paragraph.
\end{quotation}
Another frequently-displayed structure is a list.
The following is an example of an \emph{itemized} list.
\begin{itemize}
   \item This is the first item of an itemized list.  Each item
         in the list is marked with a ``tick''.  You don't have to
         worry about what kind of tick mark is used.

   \item This is the second item of the list.  It contains another
         list nested inside it.  The inner list is an \emph{enumerated}
         list.
         \begin{enumerate}
            \item This is the first item of an enumerated list that
                  is nested within the itemized list.

            \item This is the second item of the inner list.  \LaTeX\
                  allows you to nest lists deeper than you really should.
         \end{enumerate}
         This is the rest of the second item of the outer list.  It
         is no more interesting than any other part of the item.
   \item This is the third item of the list.
\end{itemize}

You can even display poetry.
\begin{verse}
   There is an environment for verse \\    % The \\ command separates lines
   Whose features some poets will curse.   % within a stanza.

                           % One or more blank lines separate stanzas.

   For instead of making\\
   Them do \emph{all} line breaking, \\
   It allows them to put too many words on a line when they'd rather be
   forced to be terse.
\end{verse}

Mathematical formulas may also be displayed.  A displayed formula is
one-line long; multiline formulas require special formatting
instructions.
   \[  \ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}\]
Don't start a paragraph with a displayed equation, nor make
one a paragraph by itself.

Here is a sample table added by me:
\begin{tabular}{|l||l|}
TableHead1&TableHead2 \\ \hline
TestCell1&TestCell2 \\
TestCell3&TestCell4 \\
\end{tabular}

\end{document}               % End of document.
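Listing A.1 also shows why the text-processing algorithms of the pdflatex approach have to treat comments and verbatim environments carefully: `%` starts a comment, while `\%` is a literal percent sign. The following Java sketch locates one section in such a source. It is a simplified illustration under stated assumptions, not the latex2rdf code: the class and method names are hypothetical, it works line by line, it recognizes only the plain `verbatim` environment with `\begin{verbatim}` and `\end{verbatim}` on separate lines, it ignores starred and optional-argument forms of `\section`, and it ends a section at the next `\section` or at `\end{document}`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: find the lines of one \section in a LaTeX source,
// ignoring comments and anything inside verbatim environments.
final class SectionExtractorSketch {

    // Cuts a line at the first '%' preceded by an even number of
    // backslashes, so "\%" is kept while "\\%" still starts a comment.
    static String stripComment(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '%') {
                int backslashes = 0;
                for (int j = i - 1; j >= 0 && line.charAt(j) == '\\'; j--) {
                    backslashes++;
                }
                if (backslashes % 2 == 0) {
                    return line.substring(0, i);
                }
            }
        }
        return line;
    }

    // Returns the original lines from "\section{title}" up to (not including)
    // the next \section or \end{document} found outside verbatim text.
    static List<String> extractSection(List<String> source, String title) {
        List<String> result = new ArrayList<>();
        boolean inVerbatim = false;
        boolean inSection = false;
        for (String line : source) {
            // Inside verbatim, commands and '%' are literal text: skip analysis.
            String code = inVerbatim ? "" : stripComment(line).trim();
            if (code.startsWith("\\section{") || code.startsWith("\\end{document}")) {
                if (inSection) {
                    break;
                }
                inSection = code.equals("\\section{" + title + "}");
            }
            if (inSection) {
                result.add(line);
            }
            if (!inVerbatim && code.contains("\\begin{verbatim}")) {
                inVerbatim = true;
            } else if (inVerbatim && line.contains("\\end{verbatim}")) {
                inVerbatim = false;
            }
        }
        return result;
    }
}
```

Applied to the lines of Listing A.1 with the title `Ordinary Text`, this sketch would return everything from `\section{Ordinary Text}` up to, but not including, `\section{Displayed Text}`; those lines could then be wrapped in the original preamble to form one of the small LaTeX documents passed to pdflatex.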
Listing A.2 shows the XML generated from the source file by latex2xml.
Listing A.2: XML File
This is an example input file. Comparing it with the output it generates can show you how to produce a simple document of your own. Ordinary Text The ends of words and sentences are marked by spaces. It doesn't matter how many spaces you type; one is as good as 100. The end of a line counts as a space. One or more blank lines denote the end of a paragraph. Since any number of consecutive spaces are treated like a single one, the formatting of the input file makes no difference to , but it makes a difference to you. When you use , making your input file as easy to read as possible will be a great help as you write your document and when you change it. This sample file shows how you can add comments to your own input file. Because printing is different from typewriting, there are a number of things that you have to do differently when preparing an input file than if you were just typing the document directly. Quotation marks like ``this'' have to be handled specially, as do quotes within quotes: ``\,`this' is what I just wrote, not `that'\,''. Dashes come in three sizes: an intra-word dash, a medium dash for number ranges like 1--2, and a punctuation dash---like this. A sentence-ending space should be larger than the space between words within a sentence. You sometimes have to type special commands in conjunction with punctuation characters to get this right, as in the following sentence. Gnats, gnus, etc. all begin with G. You should check the spaces after periods when reading your output to make sure you haven't forgotten any special cases. Generating an ellipsis ... with the right spacing around the periods requires a special command. interprets some common characters as commands, so you must type special commands to generate them. These characters include the following: \$ \&; \% \# \{ and \}. In printing, text is usually emphasized with an italic type style. A long segment of text can also be emphasized in this way. Text within such a segment can be given additional emphasis. It is sometimes necessary to prevent from breaking a line where it might otherwise do so. This may be at a space, as between the ``Mr.'' and ``Jones'' in ``Mr. Jones'', or within a word---especially when the word is a symbol like itemnum that makes little sense when hyphenated across lines. Footnotes This is an example of a footnote. pose no problem.
is good at typesetting mathematical formulas like $ x-3y + z = 7 $ or $ a_{1} &gt; x^{2n} + y^{2n} &gt; x' $ or $ \ip{A}{B} = \sum_{i} a_{i} b_{i} $. The spaces you type in a formula are ignored. Remember that a letter like $x$ is a formula when it denotes a mathematical symbol, and it should be typed as one. Displayed Text Text is displayed by indenting it from the left margin. Quotations are commonly displayed. There are short quotations This is a short a quotation. It consists of a single paragraph of text. See how it is formatted.
and longer ones. This is a longer quotation. It consists of two paragraphs of text, neither of which are particularly interesting. This is the second paragraph of the quotation. It is just as dull as the first paragraph.
Another frequently-displayed structure is a list. The following is an example of an itemized list. - This is the first item of an itemized list. Each item in the list is marked with a ``tick''. You don't have to worry about what kind of tick mark is used. - This is the second item of the list. It contains another list nested inside it. The inner list is an enumerated list.
- This is the first item of an enumerated list that is nested within the itemized list.
<item>This is the second item of the inner list. allows you to nest lists deeper than you really should. This is the rest of the second item of the outer list. It is no more interesting than any other part of the item. - This is the third item of the list.
You can even display poetry. There is an environment for verse
Whose features some poets will curse. For instead of making
Them do all line breaking,
It allows them to put too many words on a line when they'd rather be forced to be terse.
Mathematical formulas may also be displayed. A displayed formula is one-line long; multiline formulas require special formatting instructions. $ \ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$ Don't start a paragraph with a displayed equation, nor make one a paragraph by itself. Here is a sample table added by me: TableHead1 TableHead2
TestCell1 TestCell2
TestCell3 TestCell4
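The idea behind dynamically generated templates, that is, one template per element name found in an XML file like Listing A.2 instead of a hand-written stylesheet covering every possible LaTeX construct, can be sketched with the standard JAXP transformation API. This is a strongly simplified, hypothetical version: the class name, the `l` namespace prefix and the flat RDF output are assumptions, and the real templates also call helper templates such as `fullNameGenerator` and `parentNameGenerator`, whose exact behavior is not reproduced here. In this sketch, each element only receives an ID made of its name and its position among same-named siblings, using the same `count(preceding-sibling::*[name()='...'])+1` pattern that appears in the generated templates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of generating an XSL stylesheet at run time and
// applying it to produce (flat) RDF/XML; not the latex2rdf templates.
final class DynamicXslSketch {

    // One xsl:template per element name (assumed to be plain XML names,
    // e.g. read from the XML or a configuration file); each template emits
    // a typed RDF node and then recurses into the element's children.
    static String buildStylesheet(List<String> elementNames) {
        StringBuilder xsl = new StringBuilder()
            .append("<xsl:stylesheet version=\"1.0\"")
            .append(" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"")
            .append(" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"")
            .append(" xmlns:l=\"http://latexontology.org/latex#\">")
            .append("<xsl:template match=\"/\"><rdf:RDF><xsl:apply-templates/></rdf:RDF></xsl:template>")
            // Suppress text nodes so only the element structure reaches the RDF.
            .append("<xsl:template match=\"text()\"/>");
        for (String name : elementNames) {
            xsl.append("<xsl:template match=\"").append(name).append("\">")
               .append("<l:").append(name)
               .append(" rdf:about=\"#").append(name)
               .append("{count(preceding-sibling::*[name()='").append(name).append("'])+1}\"/>")
               .append("<xsl:apply-templates/>")
               .append("</xsl:template>");
        }
        return xsl.append("</xsl:stylesheet>").toString();
    }

    // Applies a stylesheet given as a string to an XML document given as a string.
    static String transform(String stylesheet, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("XSL transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        String xsl = buildStylesheet(Arrays.asList("document", "section", "par"));
        System.out.println(transform(xsl,
            "<document><section><par>First</par><par>Second</par></section></document>"));
    }
}
```

Supporting a new LaTeX construct in this scheme only means adding its element name to the list, which mirrors the extensibility argument made for the dynamic approach.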
Listing A.3 shows the XSL templates that have been generated dynamically for producing RDF.
Listing A.3: Dynamic XSL File for Generating RDF
< x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> 1”>
101
25 26 27 28
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66
67 68 69 70 71 72 73 74 75
76 77 78 79 80
< x s l : with−param name=” elementName ” s e l e c t =” ’m’ ” /> < x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> 1”> 1”>
102
81 82
83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105
106 107 108 109 110 111 112
113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
APPENDIX A. GENERAL EXAMPLE
< x s l : with−param name=” elementName ” s e l e c t =” ’ t a b u l a r ’ ” /> < x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> 1”> < x s l : with−param name=” elementName ” s e l e c t =” ’ l a t e x ’ ” /> < x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> < s t i t l e xmlns = ” h t t p : / / l a t e x o n t o l o g y . o r g / l a t e x #” r d f : about =”#{ s u b s t r i n g ( $fName , 1 , s t r i n g −l e n g t h ( $fName ) −1)}”> 1”>
103
135 136
137 138 139 140 141 142 143
144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
168 169 170 171 172 173 174
175 176 177 178 179 180 181 182 183 184 185 186 187
< h a s s t i t l e r d f : r e s o u r c e = ”#{$fName} s t i t l e { c o u n t ( p r e c e d i n g −s i b l i n g : : ∗ [ name ( ) =’ s t i t l e ’ ] ) + 1}”> < x s l : with−param name=” elementName ” s e l e c t =” ’ s t i t l e ’ ” /> < x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> 1”> < x s l : with−param name=” elementName ” s e l e c t =” ’emph ’ ” /> < x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” />
104
188 189 190 191 192 193 194 195 196 197 198
199 200 201 202 203 204 205
206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229
230 231 232 233 234 235 236
237 238 239 240 241
APPENDIX A. GENERAL EXAMPLE
1”> < x s l : with−param name=” elementName ” s e l e c t =” ’emph ’ ” /> < x s l : c a l l −t e m p l a t e name=”f u l l N a m e G e n e r a t o r ” /> < x s l : c a l l −t e m p l a t e name=”parentNameGenerator ” /> 1”>
<xsl:with-param name="elementName" select="'bodymatter'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
1">
<xsl:with-param name="elementName" select="'document'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
<xsl:with-param name="elementName" select="'quote'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
<hasstitle rdf:resource="#{$fName}stitle{count(preceding-sibling::*[name()='stitle'])+1}">
1">
1">
1">
1">
<xsl:with-param name="elementName" select="'section'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
<xsl:with-param name="elementName" select="'par'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
<title xmlns="http://latexontology.org/latex#" rdf:about="#{substring($fName, 1, string-length($fName)-1)}">
1">
<hastitle rdf:resource="#{$fName}title{count(preceding-sibling::*[name()='title'])+1}">
<xsl:with-param name="elementName" select="'title'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
<xsl:with-param name="elementName" select="'itemize'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
<hastitle rdf:resource="#{$fName}title{count(preceding-sibling::*[name()='title'])+1}">
<xsl:with-param name="elementName" select="'frontmatter'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
<xsl:with-param name="elementName" select="'note'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
1">
1">
1">
1">
<xsl:with-param name="elementName" select="'par'"/>
<xsl:call-template name="fullNameGenerator"/>
<xsl:call-template name="parentNameGenerator"/>
1">
<xsl:with-param name="elementName" select="'chapter'"/>
&lt;
<xsl:text disable-output-escaping="yes">&gt;
<xsl:text disable-output-escaping="yes">&lt;/
<xsl:text disable-output-escaping="yes">&gt;
0">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:value-of select=".">
string($testCounter),
'
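The templates above name each RDF resource by chaining the element names of its ancestors. Each name is followed by the element's 1-based position among preceding siblings with the same name, which is the XPath expression count(preceding-sibling::*[name()='stitle'])+1. The Python sketch below reproduces that naming scheme outside XSLT, which can help when checking the identifiers in Listing A.4.

It is a hypothetical reconstruction, not part of latex2rdf. The following are all illustrative assumptions:
* the function name build_ids_and_links;
* the sample XML and its nesting;
* the rule that every child gets a link property named "has" plus its element name. The listing only shows hastitle and hasstitle.

The real fullNameGenerator template may also differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate XML: element names come from the elementName
# values in the listing, the nesting itself is assumed.
SAMPLE = """<document>
  <frontmatter><title>An example</title></frontmatter>
  <bodymatter>
    <chapter>
      <section><stitle>Ordinary Text</stitle></section>
      <section><stitle>Displayed Text</stitle></section>
    </chapter>
  </bodymatter>
</document>"""


def build_ids_and_links(root):
    """Return (ids, links) for an XML tree.

    ids:   (identifier, stripped text) pairs in document order.
    links: (parent_id, "has" + child_tag, child_id) triples; linking every
           child this way is a simplifying assumption of this sketch.
    """
    ids = []
    links = []

    def walk(elem, ident):
        ids.append((ident, (elem.text or "").strip()))
        counts = {}  # per-tag counter over the preceding siblings
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            # parent id + tag + 1-based position among same-named siblings,
            # cf. count(preceding-sibling::*[name()='...']) + 1
            child_id = f"{ident}{child.tag}{counts[child.tag]}"
            links.append((ident, "has" + child.tag, child_id))
            walk(child, child_id)

    walk(root, f"{root.tag}1")
    return ids, links


if __name__ == "__main__":
    ids, links = build_ids_and_links(ET.fromstring(SAMPLE))
    for parent, prop, child in links:
        print(f"#{parent} {prop} #{child}")
```

Applied to the sample, the second section's title receives the identifier document1bodymatter1chapter1section2stitle1. This has the same shape as the identifiers that appear in Listing A.4.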
Listing A.4 shows the RDF produced by applying the dynamically generated XSL templates to the XML document.
Listing A.4: RDF File
<hastitle rdf:resource="#document1frontmatter1title1"/>
<title rdf:about="#document1frontmatter1title1"> This is an example input file. Comparing it with the output it generates can show you how to produce a simple document of your own.
<hasstitle rdf:resource="#document1bodymatter1chapter1section1stitle1"/>
<stitle rdf:about="#document1bodymatter1chapter1section1stitle1"> Ordinary Text The ends of words and sentences are marked by spaces. It doesn't matter how many spaces you type; one is as good as 100. The end of a line counts as a space.
One or more blank lines denote the end of a paragraph.
Since any number of consecutive spaces are treated like a single one, the formatting of the input file makes no difference to , but it makes a difference to you. When you use , making your input file as easy to read as possible will be a great help as you write your document and when you change it. This sample file shows how you can add comments to your own input file. Because printing is different from typewriting, there are a number of things that you have to do differently when preparing an input file than if you were just typing the document directly. Quotation marks like ``this'' have to be handled specially, as do quotes within quotes: ``\,`this' is what I just wrote, not `that'\,''.
Dashes come in three sizes: an intra-word dash, a medium dash for number ranges like 1--2, and a punctuation dash---like this. A sentence-ending space should be larger than the space between words within a sentence. You sometimes have to type special commands in conjunction with punctuation characters to get this right, as in the following sentence. Gnats, gnus, etc. all begin with G. You should check the spaces after periods when reading your output to make sure you haven't forgotten any special cases. Generating an ellipsis ... with the right spacing around the periods requires a special command. interprets some common characters as commands, so you must type special commands to generate them. These characters include the following: \$ \&; \% \# \{ and \}. In printing, text is usually emphasized with an italic type style. italic
A long segment of text can also be emphasized in this way. Text within such a segment can be given additional emphasis. additional
It is sometimes necessary to prevent from breaking a line where it might otherwise do so. This may be at a space, as between the ``Mr.'' and ``Jones'' in ``Mr. Jones'', or within a word---especially when the word is a symbol like itemnum that makes little sense when hyphenated across lines. itemnum
Footnotes This is an example of a footnote. pose no problem. This is an example of a footnote. is good at typesetting mathematical formulas like $x-3y + z = 7$ or $a_{1} &gt; x^{2n} + y^{2n} &gt; x'$ or $\ip{A}{B} = \sum_{i} a_{i} b_{i}$. The spaces you type in a formula are ignored. Remember that a letter like $x$ is a formula when it denotes a mathematical symbol, and it should be typed as one.
$x-3y + z = 7$
$a_{1} &gt; x^{2n} + y^{2n} &gt; x'$
$\ip{A}{B} = \sum_{i} a_{i} b_{i}$
$x$
<hasstitle rdf:resource="#document1bodymatter1chapter1section2stitle1"/>
<stitle rdf:about="#document1bodymatter1chapter1section2stitle1"> Displayed Text Text is displayed by indenting it from the left margin. Quotations are commonly displayed. There are short quotations This is a short a quotation. It consists of a single paragraph of text. See how it is formatted. and longer ones. This is a longer quotation. It consists of two paragraphs of text, neither of which are particularly interesting. This is the second paragraph of the quotation. It is just as dull as the first paragraph.
Another frequently-displayed structure is a list. The following is an example of an itemized list. itemized This is the first item of an itemized list. Each item in the list is marked with a ``tick''. You don't have to worry about what kind of tick mark is used. This is the second item of the list. It contains another list nested inside it. The inner list is an enumerated list. This is the first item of an enumerated list that is nested within the itemized list. This is the second item of the inner list. allows you to nest lists deeper than you really should. This is the rest of the second item of the outer list. It is no more interesting than any other part of the item. This is the third item of the list.
You can even display poetry.
There is an environment for verse Whose features some poets will curse. For instead of makingThem do all line breaking, It allows them to put too many words on a line when they'd rather be forced to be terse. Mathematical formulas may also be displayed. A displayed formula is one-line long; multiline formulas require special formatting instructions. $\ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$ Don't start a paragraph with a displayed equation, nor make one a paragraph by itself. $\ip{\Gamma}{\psi'} = x'' + y^{2} + z_{i}^{n}$ Here is a sample table added by me: |l||l|
TableHead1 TableHead2 TestCell1 TestCell2 TestCell3 TestCell4
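Each hastitle or hasstitle property in Listing A.4 points via rdf:resource at a node whose rdf:about carries the same fragment identifier. Following that link therefore recovers the title text of any part of the document.

The Python sketch below does this using only the standard library's xml.etree.ElementTree. It rests on assumptions: the rdf:RDF wrapper, the frontmatter node and the helper names resolve_links and local_name are hypothetical. A real application would more likely load the file into an RDF library and use an RDF query language.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"
RESOURCE = f"{{{RDF_NS}}}resource"

# Hypothetical fragment shaped like Listing A.4; the rdf:RDF wrapper and
# the frontmatter node are assumed, the namespace comes from the XSL.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://latexontology.org/latex#">
  <frontmatter rdf:about="#document1frontmatter1">
    <hastitle rdf:resource="#document1frontmatter1title1"/>
  </frontmatter>
  <title rdf:about="#document1frontmatter1title1">This is an example
    input file.</title>
</rdf:RDF>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def resolve_links(rdf_xml):
    """Return (property, target, target_text) for every has* link."""
    root = ET.fromstring(rdf_xml)
    # Index every described resource by its rdf:about value,
    # collapsing runs of whitespace in its text content.
    texts = {}
    for elem in root.iter():
        about = elem.get(ABOUT)
        if about is not None:
            texts[about] = " ".join("".join(elem.itertext()).split())
    # Follow each has* property from rdf:resource to the indexed text.
    links = []
    for elem in root.iter():
        target = elem.get(RESOURCE)
        prop = local_name(elem.tag)
        if target is not None and prop.startswith("has"):
            links.append((prop, target, texts.get(target)))
    return links


if __name__ == "__main__":
    for prop, target, text in resolve_links(SAMPLE_RDF):
        print(prop, target, repr(text))
```

Matching on the "has" prefix mirrors the hastitle and hasstitle names in the listing. Other link properties, if any, would need their own rule.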
Listing A.5 shows the XSL templates that were generated dynamically to produce XSL-FO.
Listing A.5: Dynamic XSL File for Generating XSL-FO
<xsl:call-template name="fullNameGenerator"/>
0">
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator"/>
0">
<xsl:value-of select=".">
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator"/>
0">
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator"/>
0">
<xsl:value-of select=".">
<xsl:call-template name="fullNameGenerator"/>