NioSax – Sax style xml parser for Java NIO

NioSax (pronounced ‘Neo-Sax’) provides a Java NIO-friendly XML push parser similar in operation to SAX. Unlike SAX, NioSax allows the XML source to contain partial content (i.e. only part of the XML stream has been received over the network). When this occurs, instead of failing with an error, NioSax simply stops. As soon as your application receives more data, you call the same instance of the parser again and it resumes parsing where it left off.

The public API consists of the classes within this package, although the bare minimum required for use are the NioSaxParser, NioSaxParserHandler and NioSaxSource classes.

To use NioSax you simply use NioSaxParserFactory to create a NioSaxParser, implement a SAX ContentHandler and finally create a NioSaxSource which references the content.

Then you can parse one or more ByteBuffers by updating the NioSaxSource with each buffer and passing it to the NioSaxParser.parse(NioSaxSource) method.

The only other two things you must do with the parser are to call NioSaxParser.startDocument() prior to any parsing, and to call NioSaxParser.endDocument() once you are done with the parser so any resources used can be cleaned up.
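To illustrate the "implement a SAX ContentHandler" step, here is a minimal sketch of a handler that simply records the events it receives. The class name RecordingHandler and its record() helper are made up for this illustration, and the handler is exercised with the JDK's own SAX parser rather than NioSax, so it says nothing about how NioSaxParserHandler wraps a handler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: DefaultHandler implements ContentHandler with no-op
// methods, so we only override the callbacks we care about.
class RecordingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Ignore whitespace-only text; record everything else.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    // Assumption: drives the handler with the JDK SAX parser (not NioSax),
    // purely to show which callbacks fire for a complete document.
    static List<String> record(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.events;
    }
}
```

Calling RecordingHandler.record("<a><b>hi</b></a>") returns the list of callbacks in document order. With a push parser such as NioSax the same callbacks would be expected to arrive incrementally as each buffer is parsed.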

Example

First, in Maven, we need to add a dependency on NioSax. For details of the repository, click on the ‘reteptools’ menu above. You’ll need to add the following to your pom:

<dependency>
    <groupId>uk.org.retep</groupId>
    <artifactId>niosax</artifactId>
    <version>10.6</version>
</dependency>

Now we’ll create a parser:

import java.nio.ByteBuffer;
import uk.org.retep.niosax.NioSaxParser;
import uk.org.retep.niosax.NioSaxParserFactory;
import uk.org.retep.niosax.NioSaxParserHandler;
import uk.org.retep.niosax.NioSaxSource;

public class MyParser
{
    private NioSaxParser parser;
    private final NioSaxParserHandler handler;
    private NioSaxSource source;

    // the handler must be supplied before start() is called
    public MyParser( NioSaxParserHandler handler )
    {
        this.handler = handler;
    }

    public void start()
    {
        NioSaxParserFactory factory = NioSaxParserFactory.getInstance();

        parser = factory.newInstance();
        parser.setHandler( handler );
        source = new NioSaxSource();

        parser.startDocument();
    }
}

Next, when you receive data from some NIO source and have the data in a ByteBuffer, you need to pass it to the parser:

    public void parse( ByteBuffer buffer )
    {
        // flip the buffer so the parser starts at the beginning
        buffer.flip();

        // update the source (presuming the buffer has changed)
        source.setByteBuffer( buffer );

        // Parse the available content then compact
        parser.parse( source );
        source.compact();
    }

Finally we must call endDocument() to release any resources:

    public void close()
    {
        // releases any resources and notifies the handler the document has completed
        parser.endDocument();
    }

Now, whenever we receive some data from an external source like a Socket, all we need to do is pass the ByteBuffer to the parse method. This then passes it to the NioSax parser, which in turn calls the ContentHandler as the parse progresses.

When it gets to the end of the available content, it compacts the buffer so that it can be reused.
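The flip, parse, compact cycle can be tried out with a plain ByteBuffer and no parser at all. The sketch below is only an illustration: BufferCycle and its consumeCompleteTags() method are hypothetical stand-ins for the parser, consuming everything up to the last '>' and leaving the rest unread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the parse() method above, using only the JDK.
class BufferCycle {
    private final ByteBuffer buffer = ByteBuffer.allocate(64);
    private final StringBuilder consumed = new StringBuilder();

    // Simulates one network read followed by one parse call.
    void receive(String data) {
        buffer.put(data.getBytes(StandardCharsets.US_ASCII)); // write mode: append
        buffer.flip();          // read mode: position=0, limit=bytes written
        consumeCompleteTags();  // stand-in for parser.parse(source)
        buffer.compact();       // move unread bytes to the front, back to write mode
    }

    // Stand-in consumer: reads up to and including the last '>' that is
    // available and leaves any trailing partial markup unread.
    private void consumeCompleteTags() {
        int lastClose = -1;
        for (int i = buffer.position(); i < buffer.limit(); i++) {
            if (buffer.get(i) == '>') {
                lastClose = i;
            }
        }
        while (buffer.position() <= lastClose) {
            consumed.append((char) buffer.get());
        }
    }

    String consumed() {
        return consumed.toString();
    }

    // After compact(), the position equals the number of bytes carried over.
    int pendingBytes() {
        return buffer.position();
    }
}
```

Feeding it "<a><b" consumes "<a>" and carries two bytes over; a following "/>" completes the second tag and leaves the buffer empty again.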

Usually the buffer will now be empty. However, if there was partial content (for example, only part of a Unicode character was present), the parser stops prior to that character and its bytes remain in the buffer. The next packet received via NIO would have the rest of that character, and the parser would then continue where it left off.
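The partial-character case can be reproduced with the JDK alone. This is a sketch using java.nio.charset.CharsetDecoder, not NioSax's own decoding (which is not shown in this article); the class name PartialCharDemo is made up. It relies on passing endOfInput=false, which tells the decoder that more bytes may follow, so an incomplete trailing sequence is reported as underflow and left in the input buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it decodes UTF-8 that arrives in arbitrary pieces.
class PartialCharDemo {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final ByteBuffer in = ByteBuffer.allocate(16);
    private final CharBuffer out = CharBuffer.allocate(16);

    // Appends the received bytes, decodes every complete character and
    // returns the decoded text; incomplete trailing bytes stay in the buffer.
    String feed(byte... bytes) {
        in.put(bytes);
        in.flip();
        out.clear();
        // endOfInput=false: more input may follow, so a truncated character
        // at the end yields UNDERFLOW rather than a malformed-input error.
        CoderResult result = decoder.decode(in, out, false);
        if (result.isError()) {
            throw new IllegalStateException("malformed input: " + result);
        }
        in.compact(); // keep the undecoded tail for the next call
        out.flip();
        return out.toString();
    }

    // After compact(), the position equals the number of bytes carried over.
    int pendingBytes() {
        return in.position();
    }
}
```

For example, 'é' is the two bytes 0xC3 0xA9 in UTF-8. Feeding 'a' followed by 0xC3 decodes only "a" and leaves one byte pending; feeding 0xA9 afterwards completes the character.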

This was originally posted early in 2009 but the post seemed to have vanished so this article is loosely based on the documentation for NioSax.


12 thoughts on “NioSax – Sax style xml parser for Java NIO”

  1. [...] article: NioSax – Sax style xml parser for Java NIO « Retep Open Source Tags: been-, contain-partial, i-e-only, nio, possible-for, push-parser, sax, stream-has, xml [...]

  2. anon_anon says:

    Instead of SAX, have you investigated vtd-xml?

    • petermount1 says:

      No, because I must support NIO-based XML streams, so the problem there is that I need something similar to StAX but in a push rather than a pull configuration. Also, because character sets other than ASCII must be supported, I had to handle the possibility of the stream stopping part way through a character due to the data being split up as it’s sent over the network. No existing API out there supports that (without parsing from the beginning again), hence ending up writing one from scratch.

      Also, the output had to be DOM as that is then passed on to other frameworks (specifically JAXB in my case) so a non-standard framework would be out.

  3. I have been searching for such a SAX parser quite a few times. It looks impressive.
    Is nioxml completely XML spec compliant? Are there any limitations?
    And what is the license?

    I will look into your project in more detail.
    BTW, I also have a project which contains some Core libraries. If you are interested you can have a look at
    http://code.google.com/p/jlibs/

    • petermount1 says:

      It’s not got any validation in there – the core just parses what it receives into a DOM tree which can then be consumed either in its entirety or in fragments (it was originally written to support XML streams, specifically XMPP/Jabber).

      As for licence, it’s BSD.

  4. Thanks for sharing your code. It works like a charm :) Due to my lack of knowledge about the Charset internals, I would suggest using the JDK NIO Charset API:


    // Snip of NioSaxSource
    public final boolean isValid(final char c) {
        return c != NOT_ENOUGH_DATA && c != INVALID_CHAR;
    }

    public final boolean hasCharacter() {
        return buffer != null && buffer.hasRemaining();
    }

    public final char decode() {
        if (!hasCharacter()) {
            return NOT_ENOUGH_DATA;
        }
        b.rewind();
        // Where decoder = charset.newDecoder() and b = CharBuffer.allocate(1)
        CoderResult result = decoder.decode(buffer, b, true);
        if (result.isError()) {
            return INVALID_CHAR;
        }
        return b.get(0);
    }

    Thanks again for sharing

    • petermount1 says:

      The problem with the NIO Charset API is that it decodes an entire ByteBuffer in one go and it assumes that everything is in that buffer. If the buffer is only partial (for example due to a fragmented network packet) then it will fail.

      Here’s the problem: say you receive from the network a couple of UTF-16 characters, A and B. This would be 4 bytes in total:

      [A0][A1][B0][B1]

      Due to the network fragmenting the packet the second one is only partially received (the second byte is still in transit). In this case our ByteBuffer contains:

      [A0][A1][B0]

      Then the NIO API would fail as the second character is incomplete. What I do is decode up to the beginning of the partial character but leave it in the ByteBuffer, hence the NOT_ENOUGH_DATA state. Then when you return that buffer to NIO, it appends the next block from the network, which happens to have [B1]; the char is then complete and can be decoded.

      • According to the javadoc of method decode(ByteBuffer, CharBuffer, boolean):

        The buffers' positions will be advanced to reflect the bytes read and the characters written, but their marks and limits will not be modified.

        So if we have [A0][A1][B0], invoking the decode method will decode the first char successfully but fail on the second char with a CR_MALFORMED result. My point here is that the buffer will still have 1 byte remaining, so we can resume decoding when more bytes are available. I have set up a unit test to simulate the network packets.


        // Snip of NioSaxSource
        private CharsetDecoder decoder;
        private ByteBuffer buf;
        private final CharBuffer tmp = CharBuffer.allocate(1);
        private int remaning;

        public final boolean hasCharacter() {
            return buf != null && remaning != buf.remaining() && buf.hasRemaining();
        }

        public final char decode() {
            if (hasCharacter()) {
                tmp.rewind();
                CoderResult result = decoder.decode(buf, tmp, true);
                if (result.isError()) {
                    remaning = buf.remaining();
                } else {
                    remaning = 0;
                    return tmp.get(0);
                }
            }
            return NOT_ENOUGH_DATA;
        }

        And the unit test:

        @Test
        public void testRandomSequence() throws Exception {
            byte[] b = createXMLUTF8ByteArray();
            Random rand = new Random();
            // generic type restored so the for-each over Integer below compiles
            List<Integer> parts = new ArrayList<Integer>();
            int min = 0, max = 0;
            // pick random, increasing split points ending at b.length
            while (max != b.length) {
                max = rand.nextInt(b.length - min) + min;
                if (!parts.contains(max)) {
                    if (max == b.length - 1) {
                        max += 1;
                    }
                    parts.add(max);
                }
                min = max;
            }
            System.out.println(parts + " " + b.length);
            start();
            min = 0;
            ByteBuffer f = ByteBuffer.allocate(b.length);
            source.setByteBuffer(f);
            // feed the parser one fragment at a time
            for (Integer i : parts) {
                f.mark();
                f.limit(i);
                f.position(min);
                f.put(b, min, i - min);
                f.reset();
                min = i;
                parser.parse(source);
            }
            stop();
        }

  5. bud says:

    Hi, I’ve started a little project but I’m having trouble with NIO XML decoding.
    I can’t find your source or a package to use. Can you help me?

  6. everybuddy says:

    Hi, I’m stuck on a little project with NIO XML decoding. I can’t find any source or package, can you help me?
