Java 6.0 New Features: StAX - A Comprehensive Analysis of Java XML Parsing Technologies

Compared with Tiger (the code name of Java 5.0), Mustang (the code name of Java 6.0) improves in many areas: better performance, support for scripting languages (JavaScript, JRuby, Groovy), extensions to java.io.File, enhancements for desktop applications, and more.

Java 6.0 also adds many new XML features, such as StAX, the Java API for XML Web Services (JAX-WS) 2.0, the Java Architecture for XML Binding (JAXB) 2.0, the XML Digital Signature API, and even support for the SQL:2003 'XML' data type. This article introduces StAX, because it is the one we will use most often in development.

StAX is short for Streaming API for XML; it is a streaming, pull-based API for parsing XML. XML parsing technologies will be familiar to most of us. Before Java 6.0 there were four:

1.DOM: Document Object Model
2.SAX: Simple API for XML
3.JDOM: Java-based Document Object Model
4.DOM4J: Document Object Model for Java

Their principles, performance, advantages and disadvantages are briefly introduced at the end of this article. Here we concentrate on the new parsing approach, StAX.

First, let us clarify two concepts: push parsing and pull parsing.

There are two models for accessing and manipulating XML documents in a program: the DOM (Document Object Model) and the streaming model. Their advantages and disadvantages are as follows:

Quote
DOM advantages: you can edit and update the XML document, access any data in the document at random, and query it with XPath (XML Path Language, a language for selecting nodes in an XML document).
DOM disadvantages: the entire document is loaded into memory at once, which can cause performance problems for large documents.

Quote
Streaming model advantages: the XML file is accessed as a stream, and only the current node is held in memory at any time, which solves the DOM's performance problem.
Streaming model disadvantages: access is read-only and forward-only; you cannot navigate backward in the document.
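The DOM advantages listed above (random access and XPath queries) can be illustrated with a small sketch. It uses the standard JAXP DOM and XPath classes; the class name DomXPathSketch and the inline sample document are assumptions made only for this illustration and are not part of the original article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomXPathSketch {
    // Hypothetical sample data, kept in memory so no file is needed.
    static final String XML =
            "<company><user name=\"Tom\"/><user name=\"Lily\"/></company>";

    // Loads the whole document into memory, then queries it with XPath.
    static String secondUserName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Random access: jump straight to the second user element.
        return xpath.evaluate("/company/user[2]/@name", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(secondUserName(XML));
    }
}
```

The price of this convenience is that the complete tree must exist in memory before the first query can run, which is exactly the drawback the streaming model avoids.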

The DOM itself is covered at the end of the article. A brief word about streams: a stream is a continuous sequence of bytes, which can be thought of as bytes moving continuously from a source to a target.

Back to our topic. The streaming model visits one node of the XML document per iteration, which makes it suitable for large documents and keeps memory consumption low. It has two variants: the "push" model and the "pull" model.

Quote
Push model: this is the familiar SAX, an event-driven model. Each time the parser encounters a node it fires an event, and we have to write handlers for these events. This is cumbersome and inflexible.

Quote
Pull model: while traversing the document, the application pulls the parts it is interested in from the reader. No events are fired, so we can process nodes selectively. This greatly improves flexibility and overall efficiency.

With this, we can pin down the concepts of "push parsing" and "pull parsing":

Quote
Parsing based on the push variant of the streaming model is called push parsing; parsing based on the pull variant of the streaming model is called pull parsing.
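To make the difference concrete, here is a minimal sketch that reads the same data once with SAX (push) and once with StAX (pull). The class name PushVersusPull and the inline document are illustrative assumptions; only standard JAXP classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PushVersusPull {
    // Hypothetical sample data for the illustration.
    static final String XML =
            "<company><user name=\"Tom\"/><user name=\"Lily\"/></company>";

    // Push: the SAX parser drives the loop and calls our handler for every element.
    static List<String> namesWithSax(String xml) throws Exception {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attrs) {
                if ("user".equals(qName)) {
                    names.add(attrs.getValue("name"));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    // Pull: our own loop asks the reader for the next token when it wants one.
    static List<String> namesWithStax(String xml) throws Exception {
        List<String> names = new ArrayList<String>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "user".equals(reader.getLocalName())) {
                names.add(reader.getAttributeValue(null, "name"));
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(namesWithSax(XML));
        System.out.println(namesWithStax(XML));
    }
}
```

Both methods collect the same names. The difference is who owns the loop: in the SAX version the parser calls us, in the StAX version we call the parser and could stop reading at any point.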

StAX is a pull-based XML parsing technology. It also supports generating XML documents, but this article covers only parsing.

From the beginning, JAXP (Java API for XML Processing) offered two ways to process XML: DOM and SAX. StAX is a newer, stream-oriented approach. Its final version was released in March 2004, and it became part of JAXP 1.4 (which is included in Java 6.0). The StAX implementation used here is SJSXP (Sun Java System XML Streaming Parser), which was bundled with JWSDP (Java Web Services Development Pack) 1.6; the API is located in the javax.xml.stream.* packages.

JWSDP is a development kit for building Web Services, Web applications and Java applications (mainly XML processing). It contains the following Java APIs:

• JAXP: Java API for XML Processing
• JAXB: Java Architecture for XML Binding
• JAX-RPC: Java API for XML-based Remote Procedure Calls
• JAX-WS: Java API for XML Web Services
• SAAJ: SOAP with Attachments API for Java
• JAXR: Java API for XML Registries
• Web Services Registry

Earlier versions of JWSDP also included:

• Java Servlet
• JSP: JavaServer Pages
• JSF: JavaServer Faces

JWSDP has since been replaced by GlassFish.

StAX includes two sets of APIs for processing XML, each offering a different level of abstraction: a cursor-based API and an iterator-based API.

Let us start with the cursor-based API. It treats XML as a stream of tokens (or events). The application can query the parser's state to obtain information about the token that was just parsed, then advance to the next token, and so on.

Before exploring the API, we first create an XML document named users.xml for testing, with the following content:

Xml Code

<?xml version="1.0" encoding="UTF-8"?>
<company>
 <depart title="Develop Group">
  <user name="Tom" age="28" gender="male" >Manager</user>
  <user name="Lily" age="26" gender="female" />
 </depart>
 <depart title="Test Group">
  <user name="Frank" age="32" gender="male" >Team Leader</user>
  <user name="Bob" age="45" gender="male" />
  <user name="Kate" age="25" gender="female" />
 </depart>
</company>


The interface of the cursor-based API is javax.xml.stream.XMLStreamReader. Unfortunately, you cannot instantiate it directly; to obtain an instance we need the javax.xml.stream.XMLInputFactory class. In keeping with the traditional JAXP style, the Abstract Factory pattern is used here. If you are familiar with this pattern, you can already picture the general shape of the code we are about to write.

First, obtain an XMLInputFactory instance, either with:

Java code

XMLInputFactory factory = XMLInputFactory.newInstance();

Or:

Java code
XMLInputFactory factory = XMLInputFactory.newFactory();

These two methods are equivalent: both create a new instance, and even the type of the instance is exactly the same, because in the Java 6.0 implementation both are written as:

Java code

{
    return (XMLInputFactory) FactoryFinder.find("javax.xml.stream.XMLInputFactory", "com.sun.xml.internal.stream.XMLInputFactoryImpl");
}

Next we can create an XMLStreamReader instance. The following methods are available:

Java code
XMLStreamReader createXMLStreamReader(java.io.Reader reader) throws XMLStreamException;

XMLStreamReader createXMLStreamReader(javax.xml.transform.Source source) throws XMLStreamException;

XMLStreamReader createXMLStreamReader(java.io.InputStream stream) throws XMLStreamException;

XMLStreamReader createXMLStreamReader(java.io.InputStream stream, String encoding) throws XMLStreamException;

XMLStreamReader createXMLStreamReader(String systemId, java.io.InputStream stream) throws XMLStreamException;

XMLStreamReader createXMLStreamReader(String systemId, java.io.Reader reader) throws XMLStreamException;

These methods all create an XMLStreamReader instance from the given stream. Choose among them according to the type of stream, whether you need to specify an encoding, and whether the parser needs a systemId.
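As a rough illustration of how three of these overloads are called, the sketch below creates readers from a Reader, from an InputStream with an explicit encoding, and from a Reader with a systemId. The class name CreateReadersSketch, the inline document and the systemId value file:///users.xml are assumptions made for the example; the systemId is only a label here and no file is read.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CreateReadersSketch {
    // Hypothetical in-memory document, so the sketch needs no file on disk.
    static final String XML = "<company><user name=\"Tom\"/></company>";

    // Advances to the first START_ELEMENT and returns its local name.
    static String rootName(XMLStreamReader reader) throws XMLStreamException {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // Character stream: the encoding has already been handled by the Reader.
    static String fromReader() throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        return rootName(factory.createXMLStreamReader(new StringReader(XML)));
    }

    // Byte stream plus an explicitly stated encoding.
    static String fromStreamWithEncoding() throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        InputStream in = new ByteArrayInputStream(XML.getBytes("UTF-8"));
        return rootName(factory.createXMLStreamReader(in, "UTF-8"));
    }

    // Character stream plus a systemId (an assumed value, used as a label only).
    static String fromReaderWithSystemId() throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        return rootName(factory.createXMLStreamReader(
                "file:///users.xml", new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fromReader());
        System.out.println(fromStreamWithEncoding());
        System.out.println(fromReaderWithSystemId());
    }
}
```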

Here is a brief word on systemId and how it differs from publicId.

systemId and publicId are two identifiers that often appear in the DOCTYPE declaration of an XML document. Both refer to an external resource and specify where it can be found: systemId references the resource directly, while publicId locates it indirectly. Specifically:

Quote
systemId: the URI of the external resource (usually a DTD file), such as the local file address file:///user/dtd/users.dtd or the network address http://www.w3.org/dtd/users.dtd.

Quote
publicId: the equivalent of a name that stands for an external resource. For example, suppose we stipulate that the string "W3C HTML 4.0.1" corresponds to the resource "http://www.w3.org/dtd/users.dtd". Then publicId="W3C HTML 4.0.1" and systemId="http://www.w3.org/dtd/users.dtd" have the same effect.

Now let us use the first method listed above to create an XMLStreamReader instance:

Java code
try {
    XMLStreamReader reader = factory.createXMLStreamReader(new FileReader("users.xml"));
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (XMLStreamException e) {
    e.printStackTrace();
}

To traverse the XML document, we need the following XMLStreamReader methods:

Java code

int getEventType();

boolean hasNext() throws XMLStreamException;

int next() throws XMLStreamException;

getEventType() returns one of the constants defined in the XMLStreamConstants interface, indicating the type of the token (or event) the cursor currently points to. The application can handle each type of event differently. The constants and their meanings are as follows:

1.START_DOCUMENT: start of the document
2.END_DOCUMENT: end of the document
3.START_ELEMENT: start of an element
4.END_ELEMENT: end of an element
5.PROCESSING_INSTRUCTION: processing instruction
6.CHARACTERS: character data (text or whitespace)
7.COMMENT: comment
8.SPACE: ignorable whitespace
9.ENTITY_REFERENCE: entity reference
10.ATTRIBUTE: element attribute
11.DTD: DTD
12.CDATA: CDATA section
13.NAMESPACE: namespace declaration
14.NOTATION_DECLARATION: notation declaration
15.ENTITY_DECLARATION: entity declaration

next() moves the cursor to the next token and returns the type of that token (or event). Calling getEventType() immediately afterwards returns the same value.

hasNext() determines whether there is another token. Only when it returns true may you call next() or another method that moves the cursor.
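The interplay of the three methods can be seen in a short sketch that records every token type of a tiny document. The class name EventTraceSketch and the inline document are assumptions for illustration. Note that getEventType() is called once before the first next(), because the cursor starts on START_DOCUMENT.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EventTraceSketch {
    // Hypothetical sample document without whitespace between the tags.
    static final String XML = "<a><b>x</b></a>";

    // Collects the token type under the cursor at every position.
    static List<Integer> trace(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Integer> events = new ArrayList<Integer>();
        // The cursor initially points at START_DOCUMENT.
        events.add(reader.getEventType());
        while (reader.hasNext()) {
            // next() advances the cursor and returns the new token type.
            events.add(reader.next());
        }
        reader.close();
        return events;
    }

    public static void main(String[] args) throws XMLStreamException {
        // The numbers printed are the XMLStreamConstants values.
        System.out.println(trace(XML));
    }
}
```

For this document the sequence is START_DOCUMENT, START_ELEMENT, START_ELEMENT, CHARACTERS, END_ELEMENT, END_ELEMENT, END_DOCUMENT.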

From the description of these methods, you can see that traversing an XML document with XMLStreamReader is very easy, because it is used just like the familiar Java Iterator. Now let us use the methods introduced so far to test the XML document given above. If you have forgotten its content, please scroll back and take another look.

Our test code is as follows:

Java code

import java.io.FileNotFoundException;
import java.io.FileReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * List all users
 *
 * @author zangweiren 2010-4-17
 *
 */
public class ListUsers {
    // Get the parser
    public static XMLStreamReader getStreamReader() {
        // users.xml is expected at the root of the classpath
        String xmlFile = ListUsers.class.getResource("/").getFile()
                + "users.xml";
        XMLInputFactory factory = XMLInputFactory.newFactory();
        try {
            XMLStreamReader reader = factory
                    .createXMLStreamReader(new FileReader(xmlFile));
            return reader;
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
        return null;
    }

    // List the names of all users
    public static void listNames() {
        XMLStreamReader reader = ListUsers.getStreamReader();
        // Loop through the XML document
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                // If this is the start of an element
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Print the name attribute of each user element
                    if ("user".equalsIgnoreCase(reader.getLocalName())) {
                        System.out.println("Name:"
                                + reader.getAttributeValue(null, "name"));
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        ListUsers.listNames();
    }
}



Output:

Quote
Name:Tom
Name:Lily
Name:Frank
Name:Bob
Name:Kate

The sample code above uses two new XMLStreamReader methods:

Java code

String getLocalName();

String getAttributeValue(String namespaceURI, String localName);

A related method is:

Java code

QName getName();

These methods involve three XML concepts: namespace, localName (local name) and QName (qualified name). Let us explain them in passing.

Namespaces exist to support XML tags that have the same name but different meanings. A namespace can be declared like this:

Xml Code

<com:company xmlns:com="http://www.zangweiren.com/company">
<!-- Other tags go here -->
</com:company>

Here, com is the namespace prefix, company is a tag in the namespace, and http://www.zangweiren.com/company is the namespace identifier; declarations with the same identifier are considered the same namespace. The identifier is a URI and must be unique; a URI can be either a URL (Uniform Resource Locator) or a URN (Uniform Resource Name). The namespace prefix is simply a shorthand for convenience. Once declared, the namespace can be used like this:

Xml Code

<com:company xmlns:com="http://www.zangweiren.com/company">
<com:depart name="Develop Group" />
</com:company>

In the <com:depart /> tag of this example, com is the namespace prefix and depart is the localName; together they form the QName.

Once these three basic XML concepts are understood, the meaning of getLocalName() and getAttributeValue(String namespaceURI, String localName) becomes clear.
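The three concepts map directly onto the javax.xml.namespace.QName object returned by getName(). The following sketch parses the namespace example above; the class name QNameSketch is an assumption, and the code relies on the factory's default setting, which is namespace-aware.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class QNameSketch {
    // The namespace example from the text, as an in-memory string.
    static final String XML =
            "<com:company xmlns:com=\"http://www.zangweiren.com/company\">"
            + "<com:depart name=\"Develop Group\"/></com:company>";

    // Returns "prefix|localName|namespaceURI" for the root element.
    static String describeRoot(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    QName name = reader.getName();
                    return name.getPrefix() + "|" + name.getLocalPart()
                            + "|" + name.getNamespaceURI();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(describeRoot(XML));
    }
}
```

Passing null as the namespaceURI argument of getAttributeValue, as the earlier examples do, means the namespace is not checked when the attribute is looked up.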

We have now learned how to traverse an XML document with XMLStreamReader and how to parse specific tags.

Let us look at the following two methods:

Java code

String getElementText() throws XMLStreamException;

int nextTag() throws XMLStreamException;

getElementText() returns all the text between an element's start tag (START_ELEMENT) and end tag (END_ELEMENT); it throws an exception if it encounters a nested element.

nextTag() skips whitespace, comments and processing instructions until it reaches a START_ELEMENT or END_ELEMENT. It is useful only when parsing XML with element-only content. Otherwise, if it encounters non-whitespace text before finding a tag, it throws an exception.
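A minimal sketch of both methods follows. The class name ElementTextSketch and the small inline document are assumptions chosen for the illustration: the first method shows nextTag() skipping whitespace and getElementText() reading the text, the second shows nextTag() failing when it meets text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ElementTextSketch {
    // Hypothetical document: whitespace between tags, text inside <user>.
    static final String XML =
            "<depart>\n  <user name=\"Tom\">Manager</user>\n</depart>";

    static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
    }

    // Uses nextTag() to reach <user>, then reads its text content.
    static String firstUserText(String xml) throws XMLStreamException {
        XMLStreamReader reader = open(xml);
        try {
            reader.nextTag(); // now at <depart>
            reader.nextTag(); // whitespace skipped, now at <user>
            // Reads up to the matching end tag; the cursor ends on </user>.
            return reader.getElementText();
        } finally {
            reader.close();
        }
    }

    // Shows that nextTag() rejects non-whitespace text.
    static boolean nextTagFailsOnText(String xml) throws XMLStreamException {
        XMLStreamReader reader = open(xml);
        try {
            reader.nextTag(); // <depart>
            reader.nextTag(); // <user>
            reader.nextTag(); // meets the text "Manager" and throws
            return false;
        } catch (XMLStreamException expected) {
            return true;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstUserText(XML));
        System.out.println(nextTagFailsOnText(XML));
    }
}
```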

For example, let us modify the test program by adding a new method:

Java code

// List the name and age of every user
public static void listNamesAndAges() {
    XMLStreamReader reader = ListUsers.getStreamReader();
    try {
        while (reader.hasNext()) {
            // Skip whitespace, comments and processing instructions,
            // and move to the next START_ELEMENT or END_ELEMENT
            int event = reader.nextTag();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if ("user".equalsIgnoreCase(reader.getLocalName())) {
                    System.out.println("Name:"
                            + reader.getAttributeValue(null, "name")
                            + ";Age:"
                            + reader.getAttributeValue(null, "age"));
                }
            }
        }
        reader.close();
    } catch (XMLStreamException e) {
        e.printStackTrace();
    }
}

Then call it from the main method:

Java code

public static void main(String[] args) {
    ListUsers.listNames();
    ListUsers.listNamesAndAges();
}

Run it. When the parser reaches <user name="Tom" age="28" gender="male">Manager</user>, it fails with an error message like this:

javax.xml.stream.XMLStreamException: ParseError at [row, col]: [4,53]
Message: found: CHARACTERS, expected START_ELEMENT or END_ELEMENT

XMLStreamReader is cursor-based. Although the API documentation speaks of "events", thinking of them as "tokens" is easier to understand and avoids confusion with the event-based API.

Some XMLStreamReader methods can be called regardless of the type of the current token (or event). Their definitions and purposes are as follows:

• String getVersion(); // gets the version declared in the XML document
• String getEncoding(); // gets the encoding of the XML document
• javax.xml.namespace.NamespaceContext getNamespaceContext(); // gets the namespace context currently in effect, including prefixes, URIs and other information
• String getNamespaceURI(); // gets the namespace URI currently in effect
• javax.xml.stream.Location getLocation(); // gets the location of the current token, including line number, column number, etc.
• boolean hasName(); // whether the current token has a name (for example, an element)
• boolean hasText(); // whether the current token has text, such as a comment, characters or CDATA
• boolean isStartElement(); // whether the current token is a start tag
• boolean isEndElement(); // whether the current token is an end tag
• boolean isCharacters(); // whether the current token is character data
• boolean isWhiteSpace(); // whether the current token is whitespace

These methods are easy to understand and remember, so we will not demonstrate each of them in code.
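For readers who would still like to see a few of them in action, here is a small sketch that counts how many tokens of a tiny document have a name, have text, or consist of whitespace only. The class name StateQuerySketch and the inline document are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StateQuerySketch {
    // Hypothetical document: one whitespace-only text node and one real text node.
    static final String XML = "<a> <b>hi</b></a>";

    // Returns { tokens with a name, tokens with text, whitespace-only tokens }.
    static int[] count(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int named = 0;
        int withText = 0;
        int blank = 0;
        while (reader.hasNext()) {
            reader.next();
            // These queries are safe on every token type.
            if (reader.hasName()) {
                named++;
            }
            if (reader.hasText()) {
                withText++;
            }
            if (reader.isWhiteSpace()) {
                blank++;
            }
        }
        reader.close();
        return new int[] { named, withText, blank };
    }

    public static void main(String[] args) throws XMLStreamException {
        int[] c = count(XML);
        System.out.println("named=" + c[0] + " text=" + c[1] + " blank=" + c[2]);
    }
}
```

Start and end tags have names, which is why the two elements account for four named tokens.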

Now let us look at the methods that deal with attributes. First, their definitions:

Java code
int getAttributeCount();

String getAttributeLocalName(int index);

QName getAttributeName(int index);

String getAttributeNamespace(int index);

String getAttributePrefix(int index);

String getAttributeType(int index);

String getAttributeValue(int index);

String getAttributeValue(String namespaceURI, String localName);

These methods are easy to understand; the method names and parameters largely explain what they do. We have already used the last one in the example above. Let us use another simple example to deepen our understanding of these methods.

Java code
// List all attributes of every user
// (reconstructed to match the listAllAttrs() call and the output shown below)
public static void listAllAttrs() {
    XMLStreamReader reader = ListUsers.getStreamReader();
    try {
        int count = 0;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "user".equalsIgnoreCase(reader.getLocalName())) {
                // Print a running number for each user
                System.out.print(++count + ".");
                // Walk through the attributes by index
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    System.out.print(reader.getAttributeLocalName(i)
                            + "=" + reader.getAttributeValue(i) + ";");
                }
                System.out.println();
            }
        }
        reader.close();
    } catch (XMLStreamException e) {
        e.printStackTrace();
    }
}

Call it from the main method:

Java code
public static void main(String[] args) {
    ListUsers.listNames();
    // ListUsers.listNamesAndAges();
    ListUsers.listAllAttrs();
}

Output:

Quote
1.name=Tom;age=28;gender=male;
2.name=Lily;age=26;gender=female;
3.name=Frank;age=32;gender=male;
4.name=Bob;age=45;gender=male;
5.name=Kate;age=25;gender=female;

At this point we are able to parse an XML document successfully with XMLStreamReader.

So far we have covered the cursor-based StAX API. It is highly efficient, but it offers no abstraction of the XML structure, so it is a low-level API.

The higher-level iterator-based API lets the application treat XML as a series of event objects, each of which conveys one part of the XML structure to the application. The application only needs to determine the type of a parsed event, convert it to the corresponding concrete type, and then call its methods to obtain the information belonging to that event object.

The iterator-based StAX API is object-oriented, which is its biggest difference from the cursor-based API. It hands events to the application as objects, so the application can process them in an object-oriented manner, which promotes modularity and code reuse between different components.

The main interfaces of the event iterator API are javax.xml.stream.XMLEventReader and javax.xml.stream.events.XMLEvent. XMLEventReader is much simpler than XMLStreamReader, because all the information about a parsed event is encapsulated in the event object (XMLEvent).

Creating an XMLEventReader instance also requires an XMLInputFactory, which provides the following methods for this purpose:

Java code

XMLEventReader createXMLEventReader(java.io.InputStream stream) throws XMLStreamException;

XMLEventReader createXMLEventReader(java.io.InputStream stream, String encoding) throws XMLStreamException;

XMLEventReader createXMLEventReader(java.io.Reader reader) throws XMLStreamException;

XMLEventReader createXMLEventReader(String systemId, java.io.InputStream stream) throws XMLStreamException;

XMLEventReader createXMLEventReader(String systemId, java.io.Reader reader) throws XMLStreamException;

XMLEventReader createXMLEventReader(Source source) throws XMLStreamException;

XMLEventReader createXMLEventReader(XMLStreamReader reader) throws XMLStreamException;

The last method differs from the others: it converts an XMLStreamReader object into an XMLEventReader object. It is worth noting that XMLInputFactory provides no method for converting an XMLEventReader into an XMLStreamReader. In practice, there should rarely be a need to convert a high-level API into a low-level one.

The XMLEventReader interface extends java.util.Iterator and defines the following methods:

Java code
String getElementText() throws XMLStreamException;

boolean hasNext();

XMLEvent nextEvent() throws XMLStreamException;

XMLEvent nextTag() throws XMLStreamException;

XMLEvent peek() throws XMLStreamException;

Of these, getElementText(), hasNext() and nextTag() have a similar meaning and usage to their XMLStreamReader counterparts, and nextEvent() is similar to XMLStreamReader's next(). So here we explain only the peek() method.

Calling peek() returns the next event object. It differs from nextEvent() in that calling it two or more times in a row returns the same event object each time.
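The behaviour of peek() can be checked with a short sketch. The class name PeekSketch and the one-element document are assumptions for illustration. Event types are compared rather than object identity, since the latter may depend on the implementation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PeekSketch {
    // Hypothetical minimal document.
    static final String XML = "<a/>";

    // Returns the event types seen by: peek, peek, nextEvent, peek.
    static int[] observe(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLEvent firstPeek = reader.peek();    // looks ahead, consumes nothing
        XMLEvent secondPeek = reader.peek();   // still the same upcoming event
        XMLEvent consumed = reader.nextEvent(); // now the event is consumed
        XMLEvent afterwards = reader.peek();   // look-ahead has moved on
        reader.close();
        return new int[] {
            firstPeek.getEventType(),
            secondPeek.getEventType(),
            consumed.getEventType(),
            afterwards.getEventType()
        };
    }

    public static void main(String[] args) throws XMLStreamException {
        int[] types = observe(XML);
        System.out.println(types[0] + " " + types[1] + " "
                + types[2] + " " + types[3]);
    }
}
```

The first three values are identical, and only the peek() after nextEvent() reports a different event, here the start of element a.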

Now let us look at the methods defined by the XMLEvent interface. They fall into three groups. The first group determines the type of the event:

• boolean isAttribute(); // whether the event object is an attribute
• boolean isCharacters(); // whether the event object is character data
• boolean isStartDocument(); // whether the event object is the start of the document
• boolean isEndDocument(); // whether the event object is the end of the document
• boolean isStartElement(); // whether the event object is the start of an element
• boolean isEndElement(); // whether the event object is the end of an element
• boolean isEntityReference(); // whether the event object is an entity reference
• boolean isNamespace(); // whether the event object is a namespace
• boolean isProcessingInstruction(); // whether the event object is a processing instruction

The second group converts an XMLEvent into an object of a specific subtype:

• Characters asCharacters(); // converts the event object to Characters
• StartElement asStartElement(); // converts the event object to a StartElement
• EndElement asEndElement(); // converts the event object to an EndElement

The third group obtains general information about the event object:

• javax.xml.stream.Location getLocation(); // gets the location of the event object, similar to XMLStreamReader's getLocation()
• int getEventType(); // gets the type of the event object, similar to XMLStreamReader's getEventType()

The value returned by getEventType() is one of the constants defined in XMLStreamConstants; its types and meanings are exactly the same as those returned by XMLStreamReader's getEventType().

Let us use some sample code to become familiar with the iterator-based StAX API, and then introduce the sub-interfaces of XMLEvent. We again use the users.xml file for testing:

Java code

// List all the information
// (needs java.util.Iterator, javax.xml.stream.XMLEventReader and
// javax.xml.stream.events.* imported in ListUsers)
@SuppressWarnings("unchecked")
public static void listAllByXMLEventReader() {
    String xmlFile = ListUsers.class.getResource("/").getFile()
            + "users.xml";
    XMLInputFactory factory = XMLInputFactory.newInstance();
    try {
        // Create an iterator-based event reader
        XMLEventReader reader = factory
                .createXMLEventReader(new FileReader(xmlFile));
        // Traverse the XML document
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // If the event object is the start of an element
            if (event.isStartElement()) {
                // Convert it to a StartElement
                StartElement start = event.asStartElement();
                // Print the local name of the element
                System.out.print(start.getName().getLocalPart());
                // Get all attributes
                Iterator attrs = start.getAttributes();
                while (attrs.hasNext()) {
                    // Print each attribute
                    Attribute attr = (Attribute) attrs.next();
                    System.out.print(":" + attr.getName().getLocalPart()
                            + "=" + attr.getValue());
                }
                System.out.println();
            }
        }
        reader.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (XMLStreamException e) {
        e.printStackTrace();
    }
}

Call it from the main program:

Java code
public static void main(String[] args) {
    ListUsers.listNames();
    // ListUsers.listNamesAndAges();
    ListUsers.listAllAttrs();
    ListUsers.listAllByXMLEventReader();
}

Running it produces the following result:

Quote
company
depart:title=Develop Group
user:age=28:name=Tom:gender=male
user:age=26:name=Lily:gender=female
depart:title=Test Group
user:age=32:name=Frank:gender=male
user:age=45:name=Bob:gender=male
user:age=25:name=Kate:gender=female

In this example we used the iterator-based StAX API to print the local names of all elements together with all of their attributes. As you can see, its usage is very similar to that of the cursor-based StAX API, but it follows an object-oriented approach and is easier to understand.

We used two new interfaces: StartElement and Attribute. Both are sub-interfaces of XMLEvent and live in the javax.xml.stream.events package. They are more specific types of event object. In fact, apart from XMLEvent itself, all the other interfaces in javax.xml.stream.events are its sub-interfaces. Their names and the event types they represent are as follows:

1.Attribute: element attribute
2.Characters: character data
3.Comment: comment
4.DTD: DTD
5.StartDocument: start of the document
6.EndDocument: end of the document
7.StartElement: start of an element
8.EndElement: end of an element
9.EntityDeclaration: entity declaration
10.EntityReference: entity reference
11.Namespace: namespace declaration
12.NotationDeclaration: notation declaration
13.ProcessingInstruction: processing instruction

These names may look familiar, because they correspond one to one with the XMLStreamConstants constants returned by XMLStreamReader's getEventType(), with the exception of SPACE (ignorable whitespace) and CDATA (CDATA section). In other words, the event types defined for the cursor-based StAX API are delivered to the application as objects in the iterator-based StAX API, which is why the latter is the more object-oriented, higher-level API.

These interfaces represent not just an event type; they also carry the information of the corresponding event object. Most of their methods are accessors for that information, and their meaning and usage are easy to understand, so we will not go into detail.

You may have noticed that XMLEvent has only three asXXX() methods for converting it to a specific subtype. If you need to work with an event type other than these three, simply use a type cast.
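As a sketch of such a cast, the following code collects the text of all comments in a document. There is no asComment() method, so the event type is checked with getEventType() and the event is then cast to javax.xml.stream.events.Comment. The class name CommentCastSketch and the inline document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Comment;
import javax.xml.stream.events.XMLEvent;

class CommentCastSketch {
    // Hypothetical document containing a single comment.
    static final String XML = "<a><!--note--><b/></a>";

    // Returns the text of every comment in the document.
    static List<String> comments(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> found = new ArrayList<String>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // Check the type first, then cast to the specific sub-interface.
            if (event.getEventType() == XMLStreamConstants.COMMENT) {
                Comment comment = (Comment) event;
                found.add(comment.getText());
            }
        }
        reader.close();
        return found;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(comments(XML));
    }
}
```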

We have now covered the basic use of StAX's pull-based cursor API and iterator API. Next we look at slightly more advanced usage that helps us parse XML documents more effectively.

XMLInputFactory has two more methods for creating readers:

Java code
XMLStreamReader createFilteredReader(XMLStreamReader reader, StreamFilter filter) throws XMLStreamException;

XMLEventReader createFilteredReader(XMLEventReader reader, EventFilter filter) throws XMLStreamException;

They add a filter to an XMLStreamReader or XMLEventReader, screening out content that does not need to be parsed and leaving only the information the application cares about. We could do the same filtering in the application itself, as the earlier sample programs did, but handing the work to a filter lets the application concentrate on parsing, and moving generic filtering (such as dropping comments) into a filter makes that logic reusable. This is in line with the principles of software design.

If you have ever written a file filter with java.io.FileFilter, writing a StreamFilter or EventFilter will be even easier. Let us look at the definitions of the two interfaces:

Java code

public interface StreamFilter {
    public boolean accept(XMLStreamReader reader);
}

public interface EventFilter {
    public boolean accept(XMLEvent event);
}

We use StreamFilter as an example to demonstrate how filters work. For this purpose we write a new program that again uses the users.xml test document:

Java code
import java.io.FileNotFoundException;
import java.io.FileReader;

import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * StreamFilter sample program
 *
 * @author zangweiren 2010-4-19
 *
 */
public class TestStreamFilter implements StreamFilter {

    public static void main(String[] args) {
        TestStreamFilter t = new TestStreamFilter();
        t.listUsers();
    }

    @Override
    public boolean accept(XMLStreamReader reader) {
        // Only inspect the current token; a filter must not move the reader
        int event = reader.getEventType();
        // Accept only the start of an element
        if (event == XMLStreamConstants.START_ELEMENT) {
            // Keep only user elements
            return "user".equalsIgnoreCase(reader.getLocalName());
        }
        // Accept the end of the document so that the loop can terminate
        return event == XMLStreamConstants.END_DOCUMENT;
    }

    public XMLStreamReader getFilteredReader() {
        String xmlFile = TestStreamFilter.class.getResource("/").getFile()
                + "users.xml";
        XMLInputFactory factory = XMLInputFactory.newFactory();
        XMLStreamReader reader;
        try {
            reader = factory.createXMLStreamReader(new FileReader(xmlFile));
            // Create a reader that uses this instance as its filter
            XMLStreamReader freader = factory
                    .createFilteredReader(reader, this);
            return freader;
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
        return null;
    }

    public void listUsers() {
        XMLStreamReader reader = getFilteredReader();
        try {
            // List the names of all users
            while (reader.hasNext()) {
                // Filtering is done by the filter, nothing to check here
                System.out.println("Name="
                        + reader.getAttributeValue(null, "name"));

                if (reader.getEventType() != XMLStreamConstants.END_DOCUMENT) {
                    reader.next();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
    }

}

Test results:

Quote
Name=Tom
Name=Lily
Name=Frank
Name=Bob
Name=Kate

You may have noticed a difference from the earlier examples: here we print the user's information first and then call next(), whereas with java.util.Iterator you call next() first and then use the returned object. Our earlier code followed the Iterator style. This needs some explanation.

The first call to XMLStreamReader's next() returns the second token (or event). To obtain the first token, call getEventType() before calling next(). This is something to watch out for. The earlier code could use the Iterator-style approach because the first token is always START_DOCUMENT, which we do not need to process, so we used the familiar idiom for readability. In the filter example, the current token of the filtered reader is already the first user element, so it must be read before next() is called. XMLEventReader's nextEvent() has no such issue.

EventFilter is used in the same way as StreamFilter, so no example is given.
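For completeness, here is a minimal sketch of an EventFilter, under the assumption that we only want start-element events. The class name EventFilterSketch and the inline document are illustrative and not part of the original test programs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class EventFilterSketch {
    // Hypothetical, shortened version of the users.xml structure.
    static final String XML =
            "<company><depart><user name=\"Tom\"/></depart></company>";

    // Returns the local names of all elements, using a filter to drop other events.
    static List<String> startElementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));
        // The filter only inspects the event it is given.
        XMLEventReader filtered = factory.createFilteredReader(raw, new EventFilter() {
            public boolean accept(XMLEvent event) {
                return event.isStartElement();
            }
        });
        List<String> names = new ArrayList<String>();
        while (filtered.hasNext()) {
            // Every event that arrives here has passed the filter.
            XMLEvent event = filtered.nextEvent();
            names.add(event.asStartElement().getName().getLocalPart());
        }
        filtered.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(startElementNames(XML));
    }
}
```

Because the filter has already removed everything else, the loop body can call asStartElement() without checking the event type again.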

StAX offers another way to separate the filtering logic from the handling of tokens or event objects: the StreamReaderDelegate and EventReaderDelegate classes in the javax.xml.stream.util package. Most of the StAX API consists of interfaces, but these two are real classes. They do the same job: they wrap an XMLStreamReader or XMLEventReader and delegate every method call to it, without adding any method or logic and without changing or removing anything. This is the Strategy pattern. Using the Decorator pattern, we can add new functionality to StreamReaderDelegate or EventReaderDelegate. Consider the following example:

Java code
import java.io.FileNotFoundException;
import java.io.FileReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

/**
 * Test StreamReaderDelegate
 *
 * @author zangweiren 2010-4-19
 *
 */
public class TestStreamDelegate {

    public static void main(String[] args) {
        TestStreamDelegate t = new TestStreamDelegate();
        t.listUsers();
    }

    public XMLStreamReader getDelegateReader() {
        String xmlFile = TestStreamDelegate.class.getResource("/").getFile()
                + "users.xml";
        XMLInputFactory factory = XMLInputFactory.newFactory();
        XMLStreamReader reader;
        try {
            reader = new StreamReaderDelegate(factory
                    .createXMLStreamReader(new FileReader(xmlFile))) {
                // Override next() to add the filtering logic
                @Override
                public int next() throws XMLStreamException {
                    while (true) {
                        int event = super.next();
                        // Keep the start of each user element
                        if (event == XMLStreamConstants.START_ELEMENT
                                && "user".equalsIgnoreCase(getLocalName())) {
                            return event;
                        } else if (event == XMLStreamConstants.END_DOCUMENT) {
                            return event;
                        } else {
                            continue;
                        }
                    }
                }
            };
            return reader;
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
        return null;
    }

    public void listUsers() {
        XMLStreamReader reader = this.getDelegateReader();
        try {
            while (reader.hasNext()) {
                reader.next();
                if (reader.getEventType() != XMLStreamConstants.END_DOCUMENT) {
                    // List the user's name and age
                    System.out.println("Name="
                            + reader.getAttributeValue(null, "name") + ";age="
                            + reader.getAttributeValue(null, "age"));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
    }

}

Test results:

Quote
Name = Tom; age = 28
Name = Lily; age = 26
Name = Frank; age = 32
Name = Bob; age = 45
Name = Kate; age = 25

EventReaderDelegate is used in the same way as StreamReaderDelegate.
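
As a rough illustration of that statement, the sketch below applies the same idea to EventReaderDelegate by overriding nextEvent(). It is an assumption-laden example rather than code from the article's attachment: the inline XML, the class name EventDelegateDemo and the method userNames are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

class EventDelegateDemo {

    // Collects the name attribute of every user start tag.
    static List<String> userNames(String xml) {
        List<String> names = new ArrayList<String>();
        try {
            XMLEventReader parent = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLEventReader reader = new EventReaderDelegate(parent) {
                // Override nextEvent() to add the filtering logic.
                @Override
                public XMLEvent nextEvent() throws XMLStreamException {
                    while (true) {
                        XMLEvent event = super.nextEvent();
                        boolean isUser = event.isStartElement()
                                && "user".equalsIgnoreCase(event
                                        .asStartElement().getName().getLocalPart());
                        // Pass END_DOCUMENT through so the caller's loop can end.
                        if (isUser || event.isEndDocument()) {
                            return event;
                        }
                    }
                }
            };
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    Attribute name = event.asStartElement()
                            .getAttributeByName(new QName("name"));
                    if (name != null) {
                        names.add(name.getValue());
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<users><user name=\"Tom\"/><group/><user name=\"Lily\"/></users>";
        System.out.println(userNames(xml));
    }
}
```

Only nextEvent() is overridden here; hasNext() is still answered by the wrapped reader, which is why END_DOCUMENT has to be let through.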

We have now covered both of the ways StAX parses XML documents, and you can apply them as you see fit. To summarize: XMLStreamReader and XMLEventReader both let an application iterate over the underlying XML stream; they differ in how they expose the pieces of the parsed XML infoset. The former works like a cursor that sits just after the XML token that was parsed most recently and offers methods for getting more information about that token. Because it does not create new objects, it saves memory. The latter is more object-oriented: it is a standard Java iterator, the parser's current state is captured in the event object, and the application does not need access to the parser/reader while it is handling the event object.
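
The difference summarized above can be sketched as follows. With the cursor, values are read while the reader is positioned on the token; with the iterator, the event objects can be kept and examined after the reader has been closed. The class name CursorVsIteratorDemo, its methods and the inline XML are assumptions for this illustration, and the sketch assumes every user element carries a name attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class CursorVsIteratorDemo {

    // Cursor style: ask the reader itself about the current token.
    static List<String> namesWithCursor(String xml) {
        List<String> names = new ArrayList<String>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "user".equals(reader.getLocalName())) {
                    names.add(reader.getAttributeValue(null, "name"));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    // Iterator style: keep the event objects and inspect them later.
    static List<String> namesWithEvents(String xml) {
        List<StartElement> users = new ArrayList<StartElement>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement() && "user".equals(
                        event.asStartElement().getName().getLocalPart())) {
                    users.add(event.asStartElement());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        // The reader is closed, but the events still carry their own state.
        List<String> names = new ArrayList<String>();
        for (StartElement user : users) {
            names.add(user.getAttributeByName(new QName("name")).getValue());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<users><user name=\"Tom\"/><user name=\"Lily\"/></users>";
        System.out.println(namesWithCursor(xml));
        System.out.println(namesWithEvents(xml));
    }
}
```

Both methods return the same names; the second one pays for its convenience by allocating one object per event.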

Pros and cons of the various XML parsing technologies

Besides StAX, the XML parsing technology newly supported in Java 6.0 that we have just introduced, four other parsing approaches are in wide use. We briefly introduce each of them below and compare the strengths, weaknesses and performance of all five, as a reference for choosing a parsing technology during development.

1. DOM (Document Object Model)

The document object model approach. It organizes nodes and fragments of information in a hierarchical (tree-like) structure that mirrors the structure of the XML document, so any part of the document can be accessed and manipulated. It is an official W3C standard.

Advantages:
1. Applications can change both the data and the structure.
2. Access is two-way: you can navigate up and down the tree at any time to read and manipulate any part of the data.

Disadvantages:
1. The entire XML document usually has to be loaded to build the hierarchy, which consumes a lot of resources.
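
The DOM points above (random access, XPath queries, in-place edits) can be sketched with the JAXP classes that ship with the JDK. The class name DomXPathDemo, its methods and the inline XML are assumptions for this illustration, and the sketch assumes the named user exists in the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomXPathDemo {

    // Loads the whole document into memory as a DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Random access: look up one user's age by name with XPath.
    static String ageOf(String xml, String name) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(
                    "/users/user[@name='" + name + "']/@age", parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Editing: change a user's age in the tree, then read it back.
    static String setAge(String xml, String name, String newAge) {
        try {
            Document doc = parse(xml);
            XPath xpath = XPathFactory.newInstance().newXPath();
            Element user = (Element) xpath.evaluate(
                    "/users/user[@name='" + name + "']", doc, XPathConstants.NODE);
            user.setAttribute("age", newAge);
            return xpath.evaluate(
                    "/users/user[@name='" + name + "']/@age", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<users><user name=\"Tom\" age=\"28\"/>"
                + "<user name=\"Lily\" age=\"26\"/></users>";
        System.out.println(ageOf(xml, "Lily"));
        System.out.println(setAge(xml, "Tom", "29"));
    }
}
```

Neither operation is available with the streaming approaches, which is the trade-off for DOM's memory cost.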

2. SAX (Simple API for XML)

The push approach within the streaming model. It is event-driven: each time the parser finds a node it fires an event, and the parsing work is done in callback methods. The logic for interpreting the XML document has to be written by the application.

Advantages:
1. There is no need to wait until all the data has been processed; parsing can begin immediately.
2. Data is only examined as it is read and is not kept in memory.
3. Parsing can stop as soon as some condition is met, without having to parse the entire document.
4. Efficiency and performance are high, and it can parse documents that are larger than system memory.

Disadvantages:
1. The application has to handle the tag-processing logic itself (for example, keeping track of parent/child relationships), which makes it cumbersome to use.
2. Navigation is one-way, so it is hard to access different parts of the same document at the same time, and XPath is not supported.
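
The push model described above can be sketched with the SAX classes bundled in the JDK: the parser drives the loop and calls back into a handler. The class name SaxDemo, the method userNames and the inline XML are assumptions for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {

    // Collects the name attribute of every user element.
    static List<String> userNames(String xml) {
        final List<String> names = new ArrayList<String>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        // Called by the parser for every start tag it finds.
                        @Override
                        public void startElement(String uri, String localName,
                                String qName, Attributes attributes) {
                            if ("user".equals(qName)) {
                                names.add(attributes.getValue("name"));
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<users><user name=\"Tom\"/><user name=\"Lily\"/></users>";
        System.out.println(userNames(xml));
    }
}
```

Compared with the StAX examples, the application never asks for the next token; it only reacts to the callbacks it receives.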

3. JDOM (Java-based Document Object Model)

A document object model specific to Java. It does not include a parser of its own and uses SAX.

Advantages:
1. It uses concrete classes rather than interfaces, which simplifies the DOM API.
2. It makes extensive use of the Java collection classes, which is convenient for Java developers.

Disadvantages:
1. It is not very flexible.
2. Performance is poor.

4. DOM4J (Document Object Model for Java)

Easy to use; it builds on the Java collections framework and fully supports DOM, SAX and JAXP.

Advantages:
1. It makes extensive use of the Java collection classes, which is convenient for Java developers, and it also offers some alternative methods for improving performance.
2. It supports XPath.
3. Performance is very good.

Disadvantages:
1. It makes extensive use of interfaces, so the API is more complex.

5. StAX (Streaming API for XML)

The pull approach within the streaming model. It supports two styles: cursor-based and iterator-based.

Advantages:
1. The interface is simple and easy to use.
2. Because it is stream-based, it performs well.

Disadvantages:
1. Navigation is one-way and XPath is not supported, so it is hard to access different parts of the same document at the same time.

To compare the performance of the five approaches when parsing XML documents, we created three XML documents of different sizes: smallusers.xml (100KB), middleusers.xml (1MB) and bigusers.xml (10MB). We parsed each of the three files with each of the five approaches, printed all the user information and measured the time taken. The test code is provided in the attachment at the end of the article; here we only compare the elapsed times.

Unit: s (seconds)

              100KB     1MB       10MB
DOM           0.146s    0.469s    5.876s
SAX           0.110s    0.328s    3.547s
JDOM          0.172s    0.756s    45.447s
DOM4J         0.161s    0.422s    5.103s
StAX Stream   0.093s    0.334s    3.553s
StAX Event    0.131s    0.359s    3.641s

The test results above show that SAX performs best overall, with StAX Stream essentially level with it (and faster on the 100KB file), followed by StAX Event. DOM and DOM4J also perform well. JDOM performs worst, taking 45.447s on the 10MB file.

So if your application demands high performance, SAX is the first choice. If you need to access and modify arbitrary parts of the data, DOM is a good option, although for Java developers DOM4J is the better choice.

If you only need to parse XML documents, then weighing performance, ease of use and object-oriented design together, StAX Event is undoubtedly the best choice.
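
For readers who want to take measurements of this kind themselves, the sketch below times a single StAX cursor pass over a generated document. It is not the article's test code: the class name StaxTimingSketch, its methods, the generated document and the single warm-up run are all assumptions, and a one-shot timing like this is only a rough indication.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxTimingSketch {

    // Builds an in-memory document with the given number of user elements.
    static String generateUsers(int count) {
        StringBuilder sb = new StringBuilder("<users>");
        for (int i = 0; i < count; i++) {
            sb.append("<user name=\"user").append(i)
              .append("\" age=\"").append(20 + i % 40).append("\"/>");
        }
        return sb.append("</users>").toString();
    }

    // Parses the document with the cursor API and counts the user elements.
    static int countUsers(String xml) {
        int users = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "user".equals(reader.getLocalName())) {
                    users++;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return users;
    }

    public static void main(String[] args) {
        String xml = generateUsers(100000);
        countUsers(xml); // warm-up run so class loading is not timed
        long start = System.nanoTime();
        int users = countUsers(xml);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println(users + " users parsed in " + seconds + "s");
    }
}
```

Swapping countUsers for a DOM, SAX or event-reader version would give a like-for-like comparison on the same input.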

Appendix:

The attachment contains all the sample code used in this article, split into two Eclipse projects, GreatTestProject and XMLTest, both of which compile and run. GreatTestProject contains the StAX API sample code. XMLTest contains usage examples for all five parsing approaches and can be used to test their performance. The jar dependencies of the XMLTest project are managed with Maven by default, which you can change.