Computer Science E-259 XML with Java

Lecture 2: XML 1.1 and SAX 2.0.2
24 September 2007
David J. Malan
[email protected]

Copyright © 2007, David J. Malan. All Rights Reserved.

Computer Science E-259
Last Time
▪ Computer Science E-259
▪ J2EE
▪ XML
  ▪ What
  ▪ Who
  ▪ When
  ▪ How
  ▪ Why
▪ Computer Science E-259


Computer Science E-259
This Time
▪ XML 1.1
▪ SAX 2.0.2
▪ JAXP 1.3 and Xerces 2.7.1 (2.9.1)
▪ Parsing
▪ My First XML Parser


XML 1.1
A Representative Document

Jim Bob graduate Computer Science & Music Jim Bob! Hi my name is jim. I look like ]]> ...

XML 1.1
XML Declaration
▪ Optional
▪ Must appear at the very top of an XML document
▪ Used to indicate the version of the specification to which the document conforms (and whether the document is “standalone”)
▪ Used to indicate the character encoding of the document
  ▪ UTF-8
  ▪ UTF-16
  ▪ iso-8859-1
  ▪ …


XML 1.1
DOCTYPE
▪ References a Document Type Definition (DTD)
▪ Can refer to an external DTD file or include some DTD information within the tag itself
▪ DTD is the original mechanism for specifying the schema of an XML document
  ▪ Inherited in part from SGML
  ▪ Arcane syntax
  ▪ Limited expressive functionality
▪ More in Lecture 8...


XML 1.1
Elements

Jim Bob

▪ Main structure in an XML document
▪ Only one root element allowed
▪ Start Tag
  ▪ Allows specification of zero or more attributes
▪ End Tag
  ▪ Must match name, case, and nesting level of start tag
▪ Name must start with a letter or underscore and can contain only letters, numbers, hyphens, periods, and underscores
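The naming rule above can be checked mechanically. What follows is only a sketch of the slide's simplified rule, restricted to ASCII as an assumption; the actual XML 1.1 Name production admits more characters (colons and many non-ASCII letters, for instance). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class SimpleXmlNames {
    // Slide's simplified rule, ASCII only (an assumption): the first character
    // is a letter or underscore; the rest are letters, digits, '-', '.', or '_'.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9._-]*");

    static boolean isValidName(String s) {
        return s != null && NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"student", "_id", "first-name", "2nd", "a b"}) {
            System.out.println(s + " -> " + isValidName(s));
        }
    }
}
```

The same check would apply to attribute names, which the slides describe with the same rule.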


XML 1.1
Element Content
▪ Element
  ...
▪ Parsed Character Data (aka PCDATA, aka Text)
  Jim Bob
▪ Mixed Content
  Jim J Bob
▪ No Content


XML 1.1
Attributes
▪ Name
  ▪ Must start with a letter or underscore and can contain only letters, numbers, hyphens, periods, and underscores
▪ Value
  ▪ Can be of several types, but is almost always a string
  ▪ Must be quoted
    ▪ title="Lecture 2"
    ▪ match='item="baseball bat"'
  ▪ Cannot contain < or & (by itself)


XML 1.1
PCDATA

Jim Bob

▪ Text that appears as the content of an element
▪ Can reference entities
▪ Cannot contain < or & (by itself)


XML 1.1
Entities
▪ Used to “escape” content or include content that is hard to enter or repeated frequently
  ▪ Somewhat like macros
▪ Five pre-defined entities
  ▪ &amp; &lt; &gt; &apos; &quot;
▪ Must be declared to be legal
▪ Character entities can refer to a single character by Unicode number
  ▪ e.g., &#169; is ©
▪ Cannot refer to themselves
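Since PCDATA and attribute values cannot contain a bare < or &, an application that writes XML typically escapes text with the five pre-defined entities first. The following is a minimal sketch of that step; the class and method names are hypothetical, and it does not handle characters that would need numeric character references.

```java
final class XmlEscaper {
    // Replaces the five characters that have pre-defined entities.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Computer Science & Music"));
    }
}
```

Escaping all five characters everywhere is more than strictly required (for example, > rarely needs it), but it keeps one routine safe for both element content and quoted attribute values.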


XML 1.1
CDATA

<![CDATA[ Jim Bob! ... ]]>

▪ Parsed in “one chunk” by the XML parser
▪ Data within is not checked for subelements, entities, etc.
▪ Allows you to include badly formed markup or character data that would cause a problem during parsing
▪ Examples
  ▪ Including HTML tags in an XML document
  ▪ Used in XSLT to write out non-XML text
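One way to see that CDATA content is not checked for subelements or entities is to hand a small document to a SAX parser and observe what arrives. The sketch below uses the standard JAXP and SAX classes; the document string, the class name CdataDemo, and its fields are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class CdataDemo extends DefaultHandler {
    int elementCount = 0;                          // start tags seen by the parser
    final StringBuilder text = new StringBuilder(); // all character data seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementCount++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once for one run of text, so accumulate.
        text.append(ch, start, length);
    }

    static CdataDemo parse(String xml) throws Exception {
        CdataDemo handler = new CdataDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: the <h1> tags, '<', and '&' sit inside CDATA.
        CdataDemo d = parse("<description><![CDATA[<h1>Jim Bob!</h1> 1 < 2 & so on]]></description>");
        System.out.println(d.elementCount + " element(s); text: " + d.text);
    }
}
```

Only the description element is reported as an element; the h1 tags inside the CDATA section reach the handler as ordinary characters.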


XML 1.1
Comments
▪ Can include any text inside a comment to make it easier for human readers to understand your document
▪ Generally not available to applications reading the document
▪ Always begin with <!-- and end with -->
▪ Cannot contain --


XML 1.1
Processing Instructions
▪ “Sticky notes” to applications processing an XML document that explain how to handle content
▪ The target portion (e.g., studentdb) of a PI indicates the application that is to process this instruction; cannot start with “xml”
▪ The remainder of the PI can be any text that gives instructions to the application
▪ Examples
  ▪ Instructions to an application to display different versions of an image
  ▪ Instructions to an application to suppress display of certain content
  ▪ ...


SAX 2.0.2
A Sample Document




SAX 2.0.2
Event-Based Parsing

Document

    <students>
     <student id="0001"/>
    </students>

ContentHandler

    startDocument();
    startElement("students", {});
    characters("\n ");
    startElement("student", {("id", "0001")});
    endElement("student");
    characters("\n");
    endElement("students");
    endDocument();
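The event sequence above can be reproduced with a small handler that records each callback. This is a sketch using the standard JAXP and SAX classes, not the course's SAXDemo; the class name SaxEventTrace and the output format (chosen to mimic the slide's notation) are assumptions. A parser is free to deliver one run of text in several characters() calls, so the exact number of characters events may vary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument()"); }
    @Override public void endDocument() { events.add("endDocument()"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so qName carries the name.
        StringBuilder sb = new StringBuilder("startElement(\"" + qName + "\", {");
        for (int i = 0; i < atts.getLength(); i++) {
            if (i > 0) sb.append(", ");
            sb.append("(\"").append(atts.getQName(i)).append("\", \"")
              .append(atts.getValue(i)).append("\")");
        }
        events.add(sb.append("})").toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(\"" + qName + "\")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Show newlines explicitly, as the slide does.
        String text = new String(ch, start, length).replace("\n", "\\n");
        events.add("characters(\"" + text + "\")");
    }

    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxEventTrace handler = new SaxEventTrace();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : trace("<students>\n <student id=\"0001\"/>\n</students>")) {
            System.out.println(e);
        }
    }
}
```

Note that the empty-element tag still produces both a startElement and an endElement event, which is why the handler never needs to know how the element was written.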


JAXP 1.3 and Xerces 2.7.1
SAXDemo

    javax.xml.parsers.SAXParserFactory
    javax.xml.parsers.SAXParser
    org.xml.sax.*
    org.xml.sax.helpers.*
    ...


Parsing
Definition
▪ In linguistics, to divide language into small components that can be analyzed. For example, parsing this sentence would involve dividing it into words and phrases and identifying the type of each component (e.g., verb, adjective, or noun)
▪ For XML, parsing means reading an XML document, identifying its various components, and making them available to an application


Parsing
Grammars in Backus-Naur Form
▪ In order to parse a document, you need to be able to specify exactly what it contains
▪ The XML specification does this for XML using a grammar in Backus-Naur Form (BNF)
▪ A grammar describes a language through a series of rules
  ▪ A rule describes how to produce something (e.g., a start tag) by assembling characters and other non-terminal symbols
  ▪ Made up of
    ▪ non-terminal symbols
    ▪ terminal symbols (data that is taken literally)


Parsing
Arithmetic
▪ A grammar for arithmetic equations

    Eqn   ::= Term '=' Term
    Term  ::= '(' Term Op Term ')' | Value
    Op    ::= '+' | '-' | '/' | '*'
    Value ::= [0-9]+

▪ Produces
  ▪ (4 + 3) = 7
  ▪ (1 + 2) = (3 - 0)
  ▪ ((10 / 2) + 1) = (3 * 2)
  ▪ 4=5
  ▪ ...
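A grammar like this one maps almost directly onto code, with one method per rule. The recognizer below is a sketch under two assumptions that the slide does not state: a Value is one or more decimal digits, and spaces between symbols are ignored. It only checks whether a string can be produced by the grammar; it does not evaluate anything, which is why 4=5 is accepted. The class name is hypothetical.

```java
final class EqnRecognizer {
    private final String in;
    private int pos = 0;

    private EqnRecognizer(String in) { this.in = in; }

    /** Returns true if the whole string matches the Eqn rule. */
    static boolean accepts(String s) {
        EqnRecognizer r = new EqnRecognizer(s);
        try {
            r.eqn();
            r.skipSpaces();
            return r.pos == s.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Eqn ::= Term '=' Term
    private void eqn() { term(); expect('='); term(); }

    // Term ::= '(' Term Op Term ')' | Value
    private void term() {
        skipSpaces();
        if (peek() == '(') {          // one character of lookahead picks the alternative
            expect('(');
            term(); op(); term();
            expect(')');
        } else {
            value();
        }
    }

    // Op ::= '+' | '-' | '/' | '*'
    private void op() {
        skipSpaces();
        char c = peek();
        if (c == '+' || c == '-' || c == '/' || c == '*') pos++;
        else throw error("an operator");
    }

    // Value: one or more decimal digits (an assumption)
    private void value() {
        skipSpaces();
        int start = pos;
        while (peek() >= '0' && peek() <= '9') pos++;
        if (pos == start) throw error("a digit");
    }

    private void expect(char c) {
        skipSpaces();
        if (peek() != c) throw error("'" + c + "'");
        pos++;
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private void skipSpaces() { while (peek() == ' ') pos++; }

    private IllegalStateException error(String expected) {
        return new IllegalStateException("expected " + expected + " at offset " + pos);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"(4 + 3) = 7", "4=5", "4 + 3 = 7"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

The string 4 + 3 = 7 is rejected because the grammar only allows an operator inside parentheses.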


Parsing
XML
▪ A (much simplified) grammar for XML element content

    element ::= STag content ETag
    content ::= (element | CharData)*
    STag    ::= '<' Name '>'
    ETag    ::= '<' '/' Name '>'

  where Name is one or more characters excluding > and CharData is zero or more characters excluding <.


My First XML Parser
Tokenizing and Recognizing
▪ Tokenizing
  ▪ Creates tokens from the character stream
  ▪ Element name, equal sign, start tag
▪ Recognizing
  ▪ Understands the syntax of the document and checks for correctness
  ▪ Builds a syntax tree
▪ In mf.XMLParser, there will be no clear distinction between tokenizing and recognizing
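To make the tokenizing step concrete, here is a sketch of a tokenizer for the much simplified element syntax used in this lecture (no attributes, comments, or entities). It is not how mf.XMLParser works, since that parser does not separate the two phases; the class name and the STAG/ETAG/TEXT labels are invented for illustration. It performs no recognizing at all, so mismatched tags pass through unnoticed.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyTokenizer {
    /** Splits input into start-tag, end-tag, and text tokens. */
    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            if (in.charAt(pos) == '<') {
                int close = in.indexOf('>', pos);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated tag at offset " + pos);
                }
                boolean end = in.startsWith("</", pos);
                String name = in.substring(pos + (end ? 2 : 1), close);
                tokens.add((end ? "ETAG:" : "STAG:") + name);
                pos = close + 1;
            } else {
                // Text runs up to the next '<' or the end of input.
                int next = in.indexOf('<', pos);
                if (next < 0) next = in.length();
                tokens.add("TEXT:" + in.substring(pos, next));
                pos = next;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<name>Jim Bob</name>"));
    }
}
```

A separate recognizer would then consume this token list and check, for example, that each ETAG matches the most recent unmatched STAG.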


My First XML Parser
Recursive Descent Parsing
▪ XML's grammar works well with a parsing technique known as recursive descent parsing
▪ Basically:
  ▪ You write one function for each non-terminal in the grammar, responsible for parsing that non-terminal
  ▪ You assume that the document matches the grammar
  ▪ The correct alternative in a rule can be determined by examining a few tell-tale starting characters (lookahead)
  ▪ You recursively parse the document, calling each non-terminal's parsing function as dictated by the grammar
  ▪ Use exception handling to handle errors when they occur deep in the recursive call tree
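The points above can be sketched against the much simplified element grammar from the Parsing slides (an element is a start tag, content, and an end tag; content is any mix of elements and character data). This is an illustration only, not the course's mf.XMLParser: the class name, the trace format, and the use of java.text.ParseException are all assumptions. CharData is treated here as one or more characters so that the content loop always advances.

```java
import java.text.ParseException;

final class MiniXmlParser {
    private final String in;
    private int pos = 0;
    private final StringBuilder trace = new StringBuilder();

    private MiniXmlParser(String in) { this.in = in; }

    /** Parses one element; returns a trace such as "start(a) text(hi) end(a)". */
    static String parse(String document) throws ParseException {
        MiniXmlParser p = new MiniXmlParser(document);
        p.element();
        if (p.pos != document.length()) throw p.error("unexpected trailing input");
        return p.trace.toString().trim();
    }

    // element ::= STag content ETag
    private void element() throws ParseException {
        String name = sTag();
        content();
        String closing = eTag();
        if (!closing.equals(name)) {
            throw error("end tag </" + closing + "> does not match <" + name + ">");
        }
    }

    // content ::= (element | CharData)*
    private void content() throws ParseException {
        while (pos < in.length()) {
            if (in.startsWith("</", pos)) return;   // lookahead: end tag belongs to the caller
            if (in.charAt(pos) == '<') element();   // lookahead: nested element
            else charData();
        }
    }

    // STag ::= '<' Name '>'
    private String sTag() throws ParseException {
        expect("<");
        String name = name();
        expect(">");
        trace.append("start(").append(name).append(") ");
        return name;
    }

    // ETag ::= '<' '/' Name '>'
    private String eTag() throws ParseException {
        expect("</");
        String name = name();
        expect(">");
        trace.append("end(").append(name).append(") ");
        return name;
    }

    // Name: one or more characters excluding '>'
    private String name() throws ParseException {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) != '>') pos++;
        if (pos == start) throw error("expected a name");
        return in.substring(start, pos);
    }

    // CharData: characters up to the next '<'
    private void charData() {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) != '<') pos++;
        trace.append("text(").append(in, start, pos).append(") ");
    }

    private void expect(String s) throws ParseException {
        if (!in.startsWith(s, pos)) throw error("expected \"" + s + "\"");
        pos += s.length();
    }

    private ParseException error(String message) {
        return new ParseException(message + " at offset " + pos, pos);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("<student><name>Jim Bob</name></student>"));
    }
}
```

Because every method simply throws on bad input, an error found several levels deep in nested elements unwinds the whole call tree to the caller of parse, which is the role the last bullet assigns to exception handling.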


My First XML Parser
Source Code

    cscie259.project1.mf.*


Computer Science E-259
Next Time
▪ The SAX API has a number of important advantages…
  ▪ You can write very fast SAX parsers
    ▪ No memory to allocate, data structures to link
    ▪ “Fire and forget”
  ▪ It is useful for large documents
    ▪ Loading the whole document into memory is prohibitive
  ▪ It is easy to use
▪ …but it doesn't solve every problem
  ▪ Need to have an internal data structure for some applications
    ▪ To follow links in information (especially backwards ones)
    ▪ To perform operations that require having multiple pieces of the document at the same time
▪ Enter the Document Object Model (DOM)…
