2005/5/10

     
 

ACDK XML

artefaktur

ACDK XML is an XML parsing framework similar to the Java XML API (javax.xml).



Content of this chapter:

   General
     Features
     Interfaces
     Implementation
   Samples
     Sample using libxml2
       Reading an XML file
       Building an XML File
       Sample parsing HTML
     Sample using XML expat DOM



 General

 Features

Feature list:
  • Non-validating and validating XML SAX parser.
  • Non-validating and validating XML DOM parser.
  • Supports ASCII, ISO Latin 1 and UTF-8 encoding.
  • XPath support.
  • Namespace support.
  • Object serialization from/to XML (XMLObjectReader and XMLObjectWriter).
  • All encodings supported by the ACDK locale framework (acdk::locale).
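ISO Latin 1 (ISO 8859-1) maps each byte directly onto the Unicode code points U+0000 to U+00FF, so converting it to UTF-8 only needs the one- and two-byte UTF-8 forms. The standalone C++ sketch below illustrates that mapping. The function names are hypothetical, and it does not use the acdk::locale API.

```cpp
#include <cassert>
#include <string>

// Append the UTF-8 encoding of a code point in the range U+0000..U+07FF.
// Only the one- and two-byte UTF-8 forms are handled, which covers ISO Latin 1.
void appendUtf8(std::string& out, unsigned cp) {
    assert(cp <= 0x7FF && "only 1- and 2-byte UTF-8 sequences are handled here");
    if (cp < 0x80) {
        out += static_cast<char>(cp);                  // ASCII: one byte, unchanged
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte: 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte: 10xxxxxx
    }
}

// Each ISO Latin 1 byte is its own Unicode code point (U+0000..U+00FF).
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);  // worst case: every byte becomes two
    for (unsigned char c : latin1)
        appendUtf8(out, c);
    return out;
}
```

For example, the Latin 1 byte 0xE9 ("é") becomes the two UTF-8 bytes 0xC3 0xA9, while ASCII text passes through unchanged.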

 Interfaces

The ACDK XML interfaces for SAX and DOM are a mixture of the official interface definitions (http://www.saxproject.org and http://www.w3.org/TR/DOM-Level-2-Core/core.html) and the dom4j model (http://www.dom4j.org/), which simplifies the official DOM model.

The interfaces are located in the namespaces org::xml::sax and org::w3c::dom and are implementation-independent.
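SAX and DOM differ in how a document reaches the application: a SAX parser reports start tags, character data and end tags to a handler while it reads, whereas a DOM parser first builds a tree. The toy C++ sketch below illustrates the SAX callback style only. `SimpleHandler`, `parseSimpleXml` and `EventRecorder` are hypothetical names, not the org::xml::sax API, and the parser accepts just a tiny subset of XML (no attributes, entities, comments, CDATA or processing instructions).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical SAX-style callback interface: the parser pushes events
// to the handler instead of building a tree.
struct SimpleHandler {
    virtual ~SimpleHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy event-driven parser for elements without attributes and plain text.
void parseSimpleXml(const std::string& xml, SimpleHandler& handler) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos)
                throw std::runtime_error("unterminated tag");
            assert(close > pos);  // xml[pos] is '<', so xml[pos + 1] is in range
            bool isEnd = xml[pos + 1] == '/';
            std::size_t nameStart = pos + (isEnd ? 2 : 1);
            std::string name = xml.substr(nameStart, close - nameStart);
            if (isEnd) {
                if (open.empty() || open.back() != name)
                    throw std::runtime_error("mismatched end tag: " + name);
                open.pop_back();
                handler.endElement(name);
            } else {
                open.push_back(name);
                handler.startElement(name);
            }
            pos = close + 1;
        } else {
            // character data runs up to the next tag (or the end of input)
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
    if (!open.empty())
        throw std::runtime_error("unclosed element: " + open.back());
}

// Example handler that records every event as a string.
struct EventRecorder : SimpleHandler {
    std::vector<std::string> events;
    void startElement(const std::string& name) override { events.push_back("start " + name); }
    void endElement(const std::string& name) override { events.push_back("end " + name); }
    void characters(const std::string& text) override { events.push_back("text " + text); }
};
```

Parsing `<book><body>Hello</body></book>` with an `EventRecorder` yields the events "start book", "start body", "text Hello", "end body", "end book"; no tree is ever built, which is what keeps SAX memory use low for large documents.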


 Implementation


The acdk_xml module has two alternative implementations:
  • libxml2, a C XML library from the GNOME project
    • Implementation in acdk::xml::libxmldom
    • SAX 2 parser
    • DOM 2
    • External entities
    • Validation against a DTD
    • Namespaces
    • XPath
  • expat, a C XML library

    Note: This implementation is deprecated. Please use the classes in acdk::xml::libxmldom instead.
    • Implementation in acdk::xml::sax, acdk::xml::dom
    • SAX parser
    • DOM 2

To parse an XML file into a DOM tree, use the libxml2 implementation.

 Samples

 Sample using libxml2

 Reading an XML file



// link org_xml and acdk_xml to the application
#include <acdk/io/File.h>
#include <org/w3c/dom/NodeList.h>
#include <acdk/xml/libxmldom/LibXMLDocumentBuilder.h>
using namespace ::org::w3c::dom;
using namespace ::org::xml::sax;
using namespace ::acdk::xml::libxmldom;

LibXMLDocumentBuilder parser;
parser.setExtendedFlags(XMLRF_PARSE_DTDVALID); // validate the XML file against its DTD
::acdk::io::RFile f = new ::acdk::io::File("MyXMLFile.xml");
::acdk::io::RReader in = f->getReader();
// the second argument passes the file's URL as the document location
RLibXMLDocument doc = (RLibXMLDocument)parser.parse(in, ::acdk::net::URL::fileAsUrlName(f));
// absolute XPath: the text of <body> below the root element <book>
RString v = doc->selectNode("/book/body/text()")->getNodeValue();
// the same text, selected relative to the <book> element
RString v2 = doc->selectNode("/book")->selectNode("body/text()")->getNodeValue();
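Both calls above select the same text: the first with an absolute path from the document root, the second by selecting <book> first and then applying a relative path. The standalone C++ sketch below models that path resolution on a hypothetical `Node` tree. It is not the ACDK or libxml2 XPath engine: it supports only child-element steps plus a trailing `text()`, and it follows the first matching child, whereas real XPath returns a node set.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical element node: a name, its text content and child elements.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Split "a/b/c" into {"a", "b", "c"}, skipping empty steps (e.g. a leading '/').
std::vector<std::string> splitSteps(const std::string& path) {
    std::vector<std::string> steps;
    std::stringstream ss(path);
    std::string step;
    while (std::getline(ss, step, '/'))
        if (!step.empty()) steps.push_back(step);
    return steps;
}

// Resolve a path of child element names starting at `context`.
// A leading '/' is only meaningful here when `context` is the document node
// (whose child is the root element); real XPath always resolves it from the root.
// A trailing "text()" step stops the walk; read Node::text from the result.
const Node* selectNode(const Node& context, const std::string& path) {
    const Node* current = &context;
    for (const std::string& step : splitSteps(path)) {
        if (step == "text()") break;
        const Node* next = nullptr;
        for (const Node& child : current->children)
            if (child.name == step) { next = &child; break; }
        if (next == nullptr) return nullptr;  // no such child element
        current = next;
    }
    return current;
}
```

With a document node whose only child is `<book><body>Hello</body></book>`, both `selectNode(doc, "/book/body/text()")->text` and `selectNode(*selectNode(doc, "/book"), "body/text()")->text` return "Hello", mirroring `v` and `v2` in the sample.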

 Building an XML File


RDocument document = LibXMLDocumentBuilder().newDocument();
RElement root = document->addElement("root");
// addElement() returns the new element; addAttribute() and addText()
// return the element itself, so the calls can be chained
RElement author1 = root->addElement("author")
                       ->addAttribute("name", "James")
                       ->addAttribute("location", "UK")
                       ->addText("James Strachan");

RElement author2 = root->addElement("author")
                       ->addAttribute("name", "Bob")
                       ->addAttribute("location", "US")
                       ->addText("Bob McWhirter");

RString xmlstr = document->toXML();
System::out->println("Created Document: " + xmlstr);
Output:

Created Document: <?xml version="1.0"?>
 <root>
  <author name="James" location="UK">James Strachan</author>
  <author name="Bob" location="US">Bob McWhirter</author>
 </root>

Please refer to acdk_xml_dom_LibXML_Test.cpp for more samples showing how to use the parser.
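The Building an XML File sample relies on a dom4j-style fluent interface: addElement() returns the new child, while addAttribute() and addText() return the element itself. The standalone C++ sketch below reproduces that design with a hypothetical `XmlElement` class. It is not the ACDK RElement API; it writes compact XML without the `<?xml ...?>` declaration or indentation, and it escapes special characters in text and attribute values.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Escape characters that may not appear verbatim in XML text
// or in double-quoted attribute values.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Hypothetical element type with a dom4j-like fluent interface.
class XmlElement {
public:
    explicit XmlElement(std::string name) : name_(std::move(name)) {}

    // Creates a child and returns it, so further calls apply to the child.
    XmlElement& addElement(const std::string& name) {
        children_.push_back(std::make_unique<XmlElement>(name));
        return *children_.back();  // stays valid: children live on the heap
    }
    // Attribute and text setters return *this to allow chaining.
    XmlElement& addAttribute(const std::string& name, const std::string& value) {
        assert(!name.empty() && "attribute names must not be empty");
        attributes_.emplace_back(name, value);
        return *this;
    }
    XmlElement& addText(const std::string& text) {
        text_ += text;
        return *this;
    }
    // Compact serialization; text is written before the child elements.
    std::string toXML() const {
        std::string out = "<" + name_;
        for (const auto& attr : attributes_)
            out += " " + attr.first + "=\"" + escapeXml(attr.second) + "\"";
        out += ">" + escapeXml(text_);
        for (const auto& child : children_)
            out += child->toXML();
        return out + "</" + name_ + ">";
    }

private:
    std::string name_;
    std::string text_;
    std::vector<std::pair<std::string, std::string>> attributes_;
    std::vector<std::unique_ptr<XmlElement>> children_;
};
```

Children are held through std::unique_ptr so that a reference returned by addElement() stays valid after further children are added; with a plain std::vector<XmlElement>, a reallocation would leave it dangling. Building the two authors as in the sample produces the same elements as the output shown above, on a single line.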

 Sample parsing HTML

This sample shows how to parse an HTML page using CfgScript:

using acdk.xml.libxmldom;
using org.xml; 
using org.w3c.dom;

String urlString = "http://acdk.sourceforge.net";
URL url = new URL(urlString); 
Reader in = url.openStream();

DocumentBuilder docBuilder = new LibXMLDocumentBuilder();
// tell the parser that the document is a HTML document (and not XML)
docBuilder.setExtendedFlags(acdk.xml.sax.XMLRF_PARSE_HTML_FLAGS);
// do parsing
Document acdkdoc = docBuilder.parse(in, urlString);
// select the text of <title> via xpath
out.println("The Title of the page is: " + acdkdoc.selectNode("/html/head/title/text()").getData());

A more complex sample, which merges two HTML documents, can be found in csf_acdk_tools_docutils_mergewxdocs_csf.

Sample using XML expat DOM

Note: expat is deprecated. Please use the libxml2 implementation in acdk::xml::libxmldom instead.

#include <acdk.h>
#include <acdk/lang/System.h>
#include <acdk/io/CharArrayReader.h>
#include <acdk/io/File.h>
#include <acdk/io/GlobFilenameFilter.h>
#include <acdk/io/MemWriter.h>
#include <acdk/xml/dom/DOMParser.h>
#include <org/w3c/dom/NodeList.h>

// copy the whole file into an in-memory buffer
::acdk::io::RFile f = new ::acdk::io::File("MyXMLFile.xml");
::acdk::io::RReader in = f->getReader();
::acdk::io::MemWriter bout;
in->trans(&bout);
// parse the buffered content into a DOM document
::acdk::xml::dom::RDOMParser parser = new ::acdk::xml::dom::DOMParser();
::acdk::xml::dom::RXMLDocument doc = parser->parse(bout.getBuffer());
// select the text of <body> with an absolute and with a relative XPath expression
RString v = doc->selectNode("/book/body/text()")->getNodeValue();
RString v2 = doc->selectNode("/book")->selectNode("body/text()")->getNodeValue();


Please also refer to the source of the acdk_xml unit tests.