
                         Python/XML Reference Guide
     _________________________________________________________________

                   The Python/XML Special Interest Group

                             xml-sig@python.org
                       (edited by akuchling@acm.org)

  Abstract:

   XML  is  the eXtensible Markup Language, a subset of SGML, intended to
   allow  the  creation  and  processing  of  application-specific markup
   languages. Python makes an excellent language for processing XML data.
   This  document  is  the  reference  manual for the Python/XML package,
   containing several XML modules.

   This  is  a draft document; 'XXX' in the text indicates that something
   has to be filled in later, or rewritten, or verified, or something.

   THIS  DOCUMENT  IS SIGNIFICANTLY OUTDATED, DO NOT USE IT AS A BASIS OF
   APPLICATION DEVELOPMENT.

Contents

     * 1 xml.dom.ext.c14n -- Canonical XML generation
     * 2 xml.ns -- XML Namespace constants
     * 3 xml.parsers.xmllib -- Augmented version of xmllib
     * 4 xml.sax.saxexts
          + 4.1 ExtendedParser methods
          + 4.2 ParserFactory methods
     * 5 xml.sax.saxlib
          + 5.1 AttributeList methods
          + 5.2 DocumentHandler methods
          + 5.3 DTDHandler methods
          + 5.4 EntityResolver methods
          + 5.5 ErrorHandler methods
          + 5.6 Locator methods
          + 5.7 Parser methods
          + 5.8 SAXException methods
          + 5.9 SAXParseException methods
     * 6 xml.sax.saxutils
          + 6.1 Location methods
     * 7 xml.utils.iso8601 
     * About this document ...

                1 xml.dom.ext.c14n -- Canonical XML generation

   This  module  takes  a  DOM  element  node  (and all its children) and
   generates canonical XML as defined by the W3C candidate recommendation
   http://www.w3.org/TR/xml-c14n.  (Unlike  the  specification,  however,
   general document subsets are not supported.)

   The module name, c14n, comes from the standard way of abbreviating the
   word "canonicalization" (a "c", 14 more letters, and an "n"). The
   Canonicalize function is typically imported with the statement
   "from xml.dom.ext import Canonicalize".

   Canonicalize(node[, output[, **keywords]])
          This function generates the canonical format. If output is
          specified, the data is sent to it by invoking its write method,
          and the function returns None. If output is omitted or has the
          value None, Canonicalize returns the data as a string.

          The   keyword  argument  comments,  if  non-zero,  directs  the
          function  to leave any comment nodes in the output. By default,
          they are removed.

          The  keyword  argument  stripspace,  if  non-zero,  directs the
          function  to  strip all extra whitespace from text elements. By
          default,  whitespace is preserved. This argument should be used
          with  caution,  as  the  canonicalization specification directs
          that whitespace be preserved.

          The keyword argument nsdict may be used to provide a namespace
          dictionary that is assumed to be in the node's containing
          context. The keys are namespace prefixes, and the values are
          the namespace URIs. If nsdict is None or an empty dictionary,
          an initial dictionary containing just the URIs for the xml and
          xmlns prefixes is used.
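   PyXML is no longer maintained, so the sketch below uses a different
   API: canonicalize() from xml.etree.ElementTree (Python 3.8 and later).
   It is not xml.dom.ext.Canonicalize, and it implements C14N 2.0 rather
   than the Canonical XML draft cited above. Its options do, however,
   roughly parallel the keywords just described. with_comments plays the
   role of comments, strip_text corresponds to stripspace, and out takes
   the place of output.

```python
import io
from xml.etree.ElementTree import canonicalize

# Stdlib analogue (Python 3.8+), not the PyXML Canonicalize() function.
doc = '<doc b="2" a="1"><!-- note --><item>  text  </item><empty/></doc>'

# No output object: the canonical form is returned as a string.
# Attributes are sorted, empty elements become start/end tag pairs,
# and comments are dropped by default.
plain = canonicalize(doc)

# Keep comment nodes (compare the comments keyword).
with_comments = canonicalize(doc, with_comments=True)

# Strip leading/trailing whitespace from text (compare stripspace).
stripped = canonicalize(doc, strip_text=True)

# With an output object, data goes to out.write() and None is returned.
buf = io.StringIO()
result = canonicalize(doc, out=buf)

print(plain)
print(with_comments)
print(stripped)
print(result, buf.getvalue() == plain)
```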

                      2 xml.ns -- XML Namespace constants

   This module contains the definitions of namespaces (and sometimes
   other URIs) used by a variety of XML standards. Each class has a
   short all-uppercase name, which should follow any (emerging)
   convention for how that standard is commonly used. For example, "ds"
   is almost always used as the namespace prefix for items in XML
   Signature, so the corresponding class is DSIG. Attributes within that
   class define symbolic (ideally evocative) names for the "constants"
   used in that standard.
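   The following is a minimal sketch of how class-per-standard constants
   like these are used. The two classes are hypothetical stand-ins
   written for illustration, not an import of the real xml.ns module. The
   URIs are the standard XLink and XSLT namespace names. The stdlib
   ElementTree spells namespaced names as "{URI}local", so the constants
   match no matter which prefix a document chooses.

```python
import xml.etree.ElementTree as ET


# Hypothetical miniature of the xml.ns layout; not the real module.
class XLINK:
    BASE = "http://www.w3.org/1999/xlink"


class XSLT:
    BASE = "http://www.w3.org/1999/XSL/Transform"


doc = ET.fromstring(
    '<doc xmlns:xl="http://www.w3.org/1999/xlink">'
    '<ref xl:href="chapter2.xml"/></doc>'
)
# The document used the prefix "xl"; lookup goes by URI, not prefix.
ref = doc.find("ref")
print(ref.get("{%s}href" % XLINK.BASE))  # chapter2.xml

sheet = ET.fromstring(
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'
    ' version="1.0"/>'
)
print(sheet.tag == "{%s}stylesheet" % XSLT.BASE)  # True
```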

   class XMLNS
          The  Namespaces  in  XML recommendation defines the concept and
          syntactic constructs relating to XML namespaces.

        BASE
                The  namespace  URI  assigned  to namespace declarations.
                This is assigned to attributes named xmlns and attributes
                which have a namespace prefix of xmlns.

        XML
                The  namespace bound to this URI is used for all elements
                and  attributes  which  start  with  the  letters  "xml",
                regardless  of  case. No other elements or attributes are
                allowed to use this namespace.

        HTML
                This namespace is recommended for use with HTML 4.0.
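   To see the BASE and XML URIs in action, the sketch below uses the
   stdlib's namespace-aware xml.dom.minidom. The XMLNS class here is a
   hypothetical local copy holding the two standard URIs, not the real
   xml.ns import. minidom places xmlns:* declaration attributes in the
   BASE namespace, and the parser binds the reserved xml: prefix to the
   XML namespace.

```python
from xml.dom import minidom


# Hypothetical local copy of the two URIs described above.
class XMLNS:
    BASE = "http://www.w3.org/2000/xmlns/"
    XML = "http://www.w3.org/XML/1998/namespace"


dom = minidom.parseString(
    '<doc xmlns:x="urn:example" xml:lang="en"><x:item/></doc>'
)
root = dom.documentElement

# The xmlns:x declaration is an attribute in the XMLNS.BASE namespace.
decl = root.getAttributeNodeNS(XMLNS.BASE, "x")
print(decl.value)  # urn:example

# xml:lang lives in the namespace bound to XMLNS.XML.
print(root.getAttributeNS(XMLNS.XML, "lang"))  # en
```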

   class XLINK
          The XML Linking Language defines document linking semantics and
          an  attribute  language  that  allows  these  semantics  to  be
          expressed in XML documents.

        BASE
                The  URI  of  the  global attributes defined in the XLink
                specification.  All  attributes  that define the presence
                and behavior of links are in this namespace.

   class SOAP
          Simple  Object Access Protocol defines a means of communicating
          with  objects  on servers. It can be used as a remote procedure
          call  (RPC)  mechanism,  or  as  a  basis  for  message passing
          systems.

        ENV
                This URI is used for the namespace of the "envelope"
                which  contains  the  message. Elements in this namespace
                provide   for   destination   identification   and  other
                information needed to route and decode the message.

        ENC
                The  namespace URI used for the optional payload encoding
                defined in section 5 of the SOAP specification.

        ACTOR_NEXT
                The   URI   specified   in  section  4.2.2  of  the  SOAP
                specification  which  is used to indicate the destination
                of a SOAP message.

   class DSIG
          The  namespace  URIs  given here are defined by the XML digital
          signature specification.

        BASE
                The basic namespace defined by the specification.

        C14N
                The   URI   by  which  Canonical  XML  (Version  1.0)  is
                identified    when    used   as   a   transformation   or
                canonicalization method.

        C14N_COMM
                This URI identifies "canonical XML with comments," as
                described in Canonical XML (Version 1.0), section 2.1.

        C14N_EXCL
                The  URI by which the canonicalization variant defined in
                Exclusive   XML   Canonicalization   (Version   1.0)   is
                identified    when    used   as   a   transformation   or
                canonicalization method.

          The  specification  also  assigns  URIs  to specific methods of
          computing  message  digests  and signatures, and other encoding
          techniques used in the specification.

        DIGEST_SHA1
                The URI for the SHA-1 digest method.

        DIGEST_MD2
                The URI for the MD2 digest method.

        DIGEST_MD5
                The URI for the MD5 digest method.

        SIG_DSA_SHA1
                The  URI  used to specify the Digital Signature Algorithm
                (DSA)  with the SHA-1 hash algorithm. DSA is specified in
                FIPS PUB 186-2, Digital Signature Standard (DSS).

        SIG_RSA_SHA1
                The  URI  indicating  the  RSA  signature algorithm using
                SHA-1 for the secure hash.

        HMAC_SHA1
                URI for the SHA-1 HMAC algorithm.

        ENC_BASE64
                URI used to denote the base64 encoding and transform.

        ENVELOPED
                URI  used  to  specify  the enveloped signature transform
                method (section 6.6.4 of the specification).

        XPATH
                URI  used to specify the XPath filtering transform method
                (section 6.6.3 of the specification).

        XSLT
                URI  used  to  specify the XSLT transform method (section
                6.6.5 of the specification).
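   As an illustration of how the algorithm URIs are consumed, the sketch
   below maps a DigestMethod URI to a hash function and produces the
   base64 DigestValue for data that has already been canonicalized. The
   DSIG class and the digest_value() helper are hypothetical. Only the
   SHA-1 URI from the XML Signature specification is included. Note that
   SHA-1 and MD5 are no longer considered collision-resistant, so this is
   mainly relevant when verifying older signatures.

```python
import base64
import hashlib


# Hypothetical subset of the DSIG constants.
class DSIG:
    DIGEST_SHA1 = "http://www.w3.org/2000/09/xmldsig#sha1"


# Map DigestMethod Algorithm URIs onto hashlib constructors.
DIGESTS = {DSIG.DIGEST_SHA1: hashlib.sha1}


def digest_value(algorithm_uri, canonical_bytes):
    """Return the base64 DigestValue for already-canonicalized bytes."""
    if algorithm_uri not in DIGESTS:
        raise ValueError("unsupported digest method: %s" % algorithm_uri)
    digest = DIGESTS[algorithm_uri](canonical_bytes).digest()
    return base64.b64encode(digest).decode("ascii")


print(digest_value(DSIG.DIGEST_SHA1, b""))  # 2jmj7l5rSw0yVb/vlWAYkK/YBwk=
```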

   class RNG
          The URIs provided here are used with the RELAX NG schema
          language.

        BASE
                The namespace URI of the elements defined by the RELAX NG
                specification.

   class SCHEMA

        BASE

        XSD1

        XSD2

        XSD3

        XSI1

        XSI2

        XSI3

          Two additional convenience attributes are defined:

        XSD_LIST
                A sequence of all ... namespaces.

        XSI_LIST
                A sequence of all ... namespaces.

   class XSLT
          XSLT, defined in the W3C recommendation XSL Transformations
          (XSL being the Extensible Stylesheet Language), defines a
          single namespace:

        BASE
                This  URI  is used as the namespace for all XSLT elements
                and for XSLT attributes attached to non-XSLT elements.

   class WSDL
          The Web Services Description Language (WSDL) defines a language
          to  specify the logical interactions with applications that use
          Web technologies as their access mechanism; this can be thought
          of  as  an  IDL  for  servers that speak HTTP instead of XDR or
          IIOP.

        BASE
                The basic namespace defined in this specification.

        BIND_SOAP
                The URI of the SOAP binding for WSDL.

        BIND_HTTP
                The URI of the namespace for HTTP bindings for WSDL using
                the GET and POST methods.

        BIND_MIME
                The URI of the namespace for MIME-type bindings for WSDL.

              3 xml.parsers.xmllib -- Augmented version of xmllib

   This  is  a  version of the xmllib module from Python 1.5, modified to
   use  the  sgmlop  C  extension  when  it's  available. This produces a
   significant  speedup,  amounting to about a factor of 5. The interface
   is  unchanged  from  the  original  xmllib  module; consult the Python
   Library Reference documentation for that module.

                               4 xml.sax.saxexts

   make_parser([parser])
          A   utility  function  that  returns  a  Parser  object  for  a
          non-validating XML parser. If parser is specified, it must be a
          parser  name; otherwise, a list of available parsers is checked
          and the fastest one chosen.

   HTMLParserFactory
          An  instance  of  the  ParserFactory  class that's already been
          prepared   with  a  list  of  HTML  parsers.  Simply  call  its
          make_parser() method to get a Parser object.

   class ParserFactory()
          A general class to be used by applications for creating parsers
          on  foreign  systems  where  the  list  of installed parsers is
          unknown.

   SGMLParserFactory
          An  instance  of  the  ParserFactory  class that's already been
          prepared   with  a  list  of  SGML  parsers.  Simply  call  its
          make_parser() method to get a parser object.

   XMLParserFactory
          An  instance  of  the  ParserFactory  class that's already been
          prepared  with a list of nonvalidating XML parsers. Simply call
          its make_parser() method to get a parser object.

   XMLValParserFactory
          An  instance  of  the  ParserFactory  class that's already been
          prepared with a list of validating XML parsers. Simply call its
          make_parser() method to get a parser object.

   class ExtendedParser()
          This class is an experimental extended parser interface that
          offers additional, possibly useful functionality. However, that
          functionality is not specified by the SAX specification.

4.1 ExtendedParser methods

   close()
          Called after the last call to feed(), when there is no more
          data.

   feed(data)
          Feeds data to the parser.

   get_parser_name()
          Returns a single-word parser name.

   get_parser_version()
          Returns  the  version  of the imported parser, which may not be
          the one the driver was implemented for.

   is_dtd_reading()
          Returns true if the parser is non-validating but conforms to
          the XML specification by reading the DTD.

   is_validating()
          Returns true if the parser is validating, false otherwise.

   reset()
          Makes the parser start parsing afresh.
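   Python's standard library has no ExtendedParser class. Its
   xml.sax.make_parser() does, however, return an incremental parser whose
   feed() and close() methods have the meaning described above, and its
   reset() method is used to start a fresh document. The sketch below uses
   that stdlib parser to show data arriving in arbitrary chunks, even
   chunks that split tags. The Collector handler is hypothetical.

```python
import xml.sax


class Collector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


parser = xml.sax.make_parser()
collector = Collector()
parser.setContentHandler(collector)

# Chunks may split tags; the parser buffers incomplete input.
for chunk in (b"<doc><it", b"em/><item", b"/></doc>"):
    parser.feed(chunk)
parser.close()  # no more data

print(collector.names)  # ['doc', 'item', 'item']
```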

4.2 ParserFactory methods

   get_parser_list()
          Returns the list of possible drivers. Currently this starts out
          as ["xml.sax.drivers.drv_xmltok",
          "xml.sax.drivers.drv_xmlproc",
          "xml.sax.drivers.drv_xmltoolkit",
          "xml.sax.drivers.drv_xmllib"].

   make_parser([driver_name])
          Returns a SAX driver for the first available parser in the
          list. Because the list contains driver modules, each driver is
          tried first, and importing it also checks whether the
          underlying parser exists. If no parser is available, a
          SAXException is raised.

          Optionally,  driver_name can be a string containing the name of
          the  driver to be used; the stored parser list will then not be
          used at all.

   set_parser_list(list)
          Sets the driver list to list.
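   The driver lookup described above can be sketched as follows. This is
   a hypothetical re-implementation, not the PyXML source. It assumes
   that each driver module exposes a create_parser() function and treats
   a failed import as "driver or parser not installed". The stub driver
   registered at the end is made up so that the example runs without
   PyXML. SAXException is the real one from the stdlib xml.sax package.

```python
import importlib
import sys
import types
from xml.sax import SAXException


class ParserFactory:
    """Hypothetical sketch of the driver-list lookup described above."""

    def __init__(self, drivers):
        self._drivers = list(drivers)

    def get_parser_list(self):
        return self._drivers

    def set_parser_list(self, drivers):
        self._drivers = list(drivers)

    def make_parser(self, driver_name=None):
        # An explicit driver name bypasses the stored list entirely.
        candidates = [driver_name] if driver_name else self._drivers
        for name in candidates:
            try:
                # Fails if the driver, or the parser it wraps, is missing.
                driver = importlib.import_module(name)
            except ImportError:
                continue
            # Assumption: drivers expose a create_parser() function.
            return driver.create_parser()
        raise SAXException("no parser found among %r" % (candidates,))


# Usage: register a stand-in driver module (hypothetical) so the
# lookup has something to find after skipping a missing one.
stub = types.ModuleType("stub_sax_driver")
stub.create_parser = lambda: "stub parser"
sys.modules["stub_sax_driver"] = stub

factory = ParserFactory(["no_such_driver", "stub_sax_driver"])
print(factory.make_parser())  # stub parser
```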

                               5 xml.sax.saxlib

   class AttributeList()
          Interface  for  an  attribute  list.  This  interface  provides
          information  about  a  list  of attributes for an element (only
          specified  or defaulted attributes will be reported). Note that
          the  information  returned  by  this  object will be valid only
          during  the scope of the DocumentHandler.startElement callback,
          and  the  attributes  will  not  necessarily be provided in the
          order declared or specified.

   class DocumentHandler()
          Handle  general  document  events.  This  is  the  main  client
          interface for SAX: it contains callbacks for the most important
          document  events,  such  as  the start and end of elements. You
          need  to  create  an object that implements this interface, and
          then  register  it  with  the  Parser.  If  you  do not want to
          implement  the  entire  interface,  you can derive a class from
          HandlerBase,  which  implements  the default functionality. You
          can  find  the location of any document event using the Locator
          interface supplied by setDocumentLocator().

   class DTDHandler()
          Handle  DTD  events.  This  interface  specifies only those DTD
          events  required  for  basic  parsing  (unparsed  entities  and
          attributes).  If  you  do  not  want  to  implement  the entire
          interface,  you  can  extend  HandlerBase, which implements the
          default behaviour.

   class EntityResolver()
          This is the basic interface for resolving entities. If you
          create an object implementing this interface and register it
          with your Parser instance, the parser will call the method in
          your object to resolve all external entities. Note that
          HandlerBase implements this interface with the default
          behaviour.

   class ErrorHandler()
          This is the basic interface for SAX error handlers. If you
          create an object that implements this interface and register
          it with your Parser, the parser will call the methods in your
          object to report all warnings and errors. There are three
          levels of errors available: warnings, (possibly) recoverable
          errors, and unrecoverable errors. All methods take a
          SAXParseException as the only parameter.

   class HandlerBase()
          Default  base  class  for  handlers.  This class implements the
          default behaviour for four SAX interfaces, inheriting from them
          all:    EntityResolver,    DTDHandler,   DocumentHandler,   and
          ErrorHandler.  Rather  than implementing those full interfaces,
          you  may simply extend this class and override the methods that
          you  need.  Note  that the use of this class is optional, since
          you are free to implement the interfaces directly if you wish.

   class Locator()
          Interface for associating a SAX event with a document location.
          A locator object will return valid results only during calls to
          methods of the DocumentHandler interface; at any other time, the
          results are unpredictable.

   class Parser()
          Basic interface for SAX parsers. All SAX parsers must implement
          this  basic interface: it allows users to register handlers for
          different types of events and to initiate a parse from a URI, a
          character  stream,  or  a  byte stream. SAX parsers should also
          implement a zero-argument constructor.

   class SAXException(msg, exception, locator)
          Encapsulate  an  XML  error  or warning. This class can contain
          basic  error  or warning information from either the XML parser
          or  the  application: you can subclass it to provide additional
          functionality, or to add localization. Note that although you
          will receive a SAXException as the argument to the handlers in
          the ErrorHandler interface, you are not actually required to
          raise the exception; instead, you can simply read the
          information in it.

   class SAXParseException(msg, exception, locator)
          Encapsulate an XML parse error or warning.

          This  exception will include information for locating the error
          in   the   original   XML  document.  Note  that  although  the
          application will receive a SAXParseException as the argument to
          the  handlers in the ErrorHandler interface, the application is
          not actually required to raise the exception; instead, it can
          simply read the information in it and take a different action.

          Since this exception is a subclass of SAXException, it inherits
          the ability to wrap another exception.

5.1 AttributeList methods

   The AttributeList class supports some of the behaviour of Python
   dictionaries: len(), has_key(), and keys() are available, and
   attr['href'] will retrieve the value of the href attribute. There are
   also additional methods specific to AttributeList:

   getLength()
          Return the number of attributes in the list.

   getName(i)
          Return the name of attribute i in the list.

   getType(i)
          Return  the  type  of an attribute in the list. i can be either
          the integer index or the attribute name.

   getValue(i)
          Return  the  value of an attribute in the list. i can be either
          the integer index or the attribute name.
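   To make the index-or-name convention concrete, here is a minimal,
   hypothetical AttributeList built over (name, type, value) triples. It
   is not the PyXML implementation. Parsers supply their own objects, and
   this sketch only shows the behaviour described above.

```python
class AttributeList:
    """Hypothetical SAX1-style attribute list over (name, type, value)."""

    def __init__(self, attrs):
        self._attrs = list(attrs)

    def _find(self, i):
        # i may be an integer index or an attribute name.
        if isinstance(i, int):
            return self._attrs[i]
        for entry in self._attrs:
            if entry[0] == i:
                return entry
        raise KeyError(i)

    def getLength(self):
        return len(self._attrs)

    def getName(self, i):
        return self._attrs[i][0]

    def getType(self, i):
        return self._find(i)[1]

    def getValue(self, i):
        return self._find(i)[2]

    # Dictionary-like conveniences mentioned above.
    def __len__(self):
        return len(self._attrs)

    def keys(self):
        return [name for name, _, _ in self._attrs]

    def has_key(self, name):
        return name in self.keys()

    def __getitem__(self, name):
        return self.getValue(name)


attrs = AttributeList([("href", "CDATA", "a.xml"), ("id", "ID", "x1")])
print(attrs.getLength(), attrs.getName(1), attrs.getType("id"), attrs["href"])
# 2 id ID a.xml
```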

5.2 DocumentHandler methods

   characters(ch, start, length)
          Handle a character data event.

   endDocument()
          Handle an event for the end of a document.

   endElement(name)
          Handle an event for the end of an element.

   ignorableWhitespace(ch, start, length)
          Handle an event for ignorable whitespace in element content.

   processingInstruction(target, data)
          Handle a processing instruction event.

   setDocumentLocator(locator)
          Receive  an  object  for  locating  the  origin of SAX document
          events.  You'll  probably want to store the value of locator as
          an attribute of the handler instance.

   startDocument()
          Handle an event for the beginning of a document.

   startElement(name, attrs)
          Handle an event for the beginning of an element.
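   Python's standard xml.sax package implements SAX2, the successor to
   the interface described here. DocumentHandler became ContentHandler,
   and characters() receives a string rather than (ch, start, length).
   The hedged sketch below uses the stdlib version to show the typical
   handler pattern. It keeps the locator passed to setDocumentLocator()
   and consults it during callbacks. The ElementCounter class is
   hypothetical.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts elements, notes <item> lines."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.counts = {}
        self.item_lines = []
        self.text = []

    def setDocumentLocator(self, locator):
        # Keep the locator; it is only meaningful during callbacks.
        self.locator = locator

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1
        if name == "item" and self.locator is not None:
            self.item_lines.append(self.locator.getLineNumber())

    def characters(self, content):
        # SAX2 passes a string here; SAX1 passed (ch, start, length).
        if content.strip():
            self.text.append(content.strip())


doc = b"<doc>\n<item>one</item>\n<item>two</item>\n</doc>"
handler = ElementCounter()
xml.sax.parseString(doc, handler)
print(handler.counts, handler.item_lines, handler.text)
```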

5.3 DTDHandler methods

   notationDecl(name, publicId, systemId)
          Handle a notation declaration event.

   unparsedEntityDecl(name, publicId, systemId, notationName)
          Handle an unparsed entity declaration event.

5.4 EntityResolver methods

   resolveEntity(name, publicId, systemId)
          Resolve the system identifier of an entity.

5.5 ErrorHandler methods

   error(exception)
          Handle a recoverable error.

   fatalError(exception)
          Handle a non-recoverable error.

   warning(exception)
          Handle a warning.

5.6 Locator methods

   getColumnNumber()
          Return the column number where the current event ends.

   getLineNumber()
          Return the line number where the current event ends.

   getPublicId()
          Return the public identifier for the current event.

   getSystemId()
          Return the system identifier for the current event.

5.7 Parser methods

   parse(systemId)
          Parse an XML document from a system identifier.

   parseFile(fileobj)
          Parse an XML document from a file-like object.

   setDocumentHandler(handler)
          Register an object to receive basic document-related events.

   setDTDHandler(handler)
          Register an object to receive basic DTD-related events.

   setEntityResolver(resolver)
          Register an object to resolve external entities.

   setErrorHandler(handler)
          Register an object to receive error-message events.

   setLocale(locale)
          Allow an application to set the locale for errors and warnings.

          SAX parsers are not required to provide localisation for errors
          and warnings; if they cannot support the requested locale,
          however, they must raise a SAXException. Applications may
          request a locale change in the middle of a parse.

5.8 SAXException methods

   getException()
          Return the embedded exception, if any.

   getMessage()
          Return a message for this exception.

5.9 SAXParseException methods

   The  SAXParseException  class  has  a locator attribute, containing an
   instance  of  the  Locator class, which represents the location in the
   document  where  the  parse  error occurred. The following methods are
   delegated to this instance.

   getColumnNumber()
          Return  the  column  number  of  the  end of the text where the
          exception occurred.

   getLineNumber()
          Return  the  line  number  of  the  end  of  the text where the
          exception occurred.

   getPublicId()
          Return  the public identifier of the entity where the exception
          occurred.

   getSystemId()
          Return  the system identifier of the entity where the exception
          occurred.
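   The sketch below shows an error handler and the location methods of
   SAXParseException working together. It uses the stdlib xml.sax
   package, whose ErrorHandler and SAXParseException keep the method
   names listed above, and not PyXML itself. The RecordingErrorHandler
   class is hypothetical. It records every report and re-raises fatal
   errors so that parsing stops.

```python
import io
import xml.sax


class RecordingErrorHandler(xml.sax.ErrorHandler):
    """Hypothetical handler that records every report it receives."""

    def __init__(self):
        self.warnings = []
        self.errors = []
        self.fatal = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        self.errors.append(exception)

    def fatalError(self, exception):
        self.fatal.append(exception)
        raise exception  # stop parsing; not required by the interface


parser = xml.sax.make_parser()
parser.setContentHandler(xml.sax.ContentHandler())
errors = RecordingErrorHandler()
parser.setErrorHandler(errors)

caught = None
try:
    # <item> is never closed, so </doc> on line 3 is a mismatched tag.
    parser.parse(io.BytesIO(b"<doc>\n  <item>\n</doc>"))
except xml.sax.SAXParseException as exc:
    caught = exc
    print("line %d, column %d: %s"
          % (exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()))
```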

                              6 xml.sax.saxutils

   escape(data[, entities])
          Escape "&", "<", and ">" in a string of data.

          You can escape other strings of data by passing a dictionary as
          the  optional  entities parameter. The keys and values must all
          be  strings;  each  key will be replaced with its corresponding
          value.
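   The same function survives in Python's standard library as
   xml.sax.saxutils.escape(data, entities), so the short example below
   runs without PyXML. The optional dictionary adds replacements on top
   of the three built-in ones.

```python
from xml.sax.saxutils import escape

# "&", "<" and ">" are always escaped.
print(escape("a < b && c > d"))
# a &lt; b &amp;&amp; c &gt; d

# Extra replacements go in the optional entities dictionary.
print(escape('say "hi"', {'"': "&quot;"}))
# say &quot;hi&quot;
```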

   quoteattr(data[, entities])
          Similar  to  escape(),  but also prepares data to be used as an
          attribute  value.  The return value is a quoted version of data
          with  any  additional  required  replacements. quoteattr() will
          select  a  quote  character  based  on  the  content  of  data,
          attempting  to  avoid  encoding  any  quote  characters  in the
          string. If both single- and double-quote characters are already
          in data, the double-quote characters will be encoded and data
          will be wrapped in double quotes. The resulting string can be
          used directly as an attribute value:

>>> from xml.sax.saxutils import quoteattr
>>> print("<element attr=%s>" % quoteattr("ab ' cd \" ef"))
<element attr="ab ' cd &quot; ef">

          This  function  is  useful when generating attribute values for
          HTML or any SGML using the reference concrete syntax.
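   The quote-selection rule can be checked with the stdlib copy of this
   function (xml.sax.saxutils.quoteattr). Double quotes are the default,
   single quotes are chosen when the data contains only double quotes,
   and "<", ">" and "&" are escaped as in escape().

```python
from xml.sax.saxutils import quoteattr

print(quoteattr("plain"))      # "plain"
print(quoteattr('say "hi"'))   # 'say "hi"'
print(quoteattr("it's <ok>"))  # "it's &lt;ok&gt;"
```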

   class Canonizer(writer)
          A  SAX document handler that produces canonicalized XML output.
          writer  must  support  a  write() method which accepts a single
          string.

   class ErrorPrinter()
          A  simple  class  that  just  prints error messages to standard
          error (sys.stderr).

   class ESISDocHandler(writer)
          A  SAX document handler that produces naive ESIS output. writer
          must support a write() method which accepts a single string.

   class EventBroadcaster(list)
          Takes a list of objects and forwards any method calls received
          to all objects in the list. The instance attribute list holds
          these objects and can be freely modified by clients.
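   Forwarding of this kind is naturally written with __getattr__. The
   sketch below is a hypothetical re-implementation rather than the PyXML
   code. It returns a function that calls the same-named method on every
   object currently in list, which makes it possible, for example, to
   drive two handlers from one parser.

```python
class EventBroadcaster:
    """Hypothetical sketch: forward method calls to each object in list."""

    def __init__(self, objects):
        # Public attribute named "list", as described above.
        self.list = objects

    def __getattr__(self, name):
        # Only reached for names not found normally (e.g. startElement).
        if name.startswith("__"):
            raise AttributeError(name)

        def broadcast(*args, **kwargs):
            for obj in self.list:
                getattr(obj, name)(*args, **kwargs)
        return broadcast


class Recorder:
    """Hypothetical receiver that logs startElement calls."""

    def __init__(self):
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))


a, b = Recorder(), Recorder()
broadcaster = EventBroadcaster([a])
broadcaster.list.append(b)  # clients may modify the list freely
broadcaster.startElement("doc", {})
print(a.events, b.events)  # [('start', 'doc')] [('start', 'doc')]
```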

   class Location(locator)
          Represents  a  location  in an XML entity. Initialized by being
          passed a locator, from which it reads off the current location,
          which is then stored internally.

6.1 Location methods

   getColumnNumber()
          Return the column number of the location.

   getLineNumber()
          Return the line number of the location.

   getPublicId()
          Return the public identifier for the location.

   getSystemId()
          Return the system identifier for the location.
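   A Locator is only valid during callbacks, so a Location is the object
   to keep when a position must outlive the current event. The sketch
   below is a hypothetical version that copies the four values at
   construction time. MovingLocator is a made-up stand-in for a parser's
   locator, and its position keeps changing after the snapshot is taken.

```python
class Location:
    """Hypothetical sketch: snapshot a locator's position when created."""

    def __init__(self, locator):
        # Read everything now; the locator itself goes stale later.
        self._column = locator.getColumnNumber()
        self._line = locator.getLineNumber()
        self._public_id = locator.getPublicId()
        self._system_id = locator.getSystemId()

    def getColumnNumber(self):
        return self._column

    def getLineNumber(self):
        return self._line

    def getPublicId(self):
        return self._public_id

    def getSystemId(self):
        return self._system_id


class MovingLocator:
    """Stand-in for a parser's Locator whose position advances."""

    def __init__(self):
        self.line = 1

    def getColumnNumber(self):
        return 0

    def getLineNumber(self):
        return self.line

    def getPublicId(self):
        return None

    def getSystemId(self):
        return "doc.xml"


locator = MovingLocator()
saved = Location(locator)
locator.line = 42  # the parser moves on
print(saved.getLineNumber(), locator.getLineNumber())  # 1 42
```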

                              7 xml.utils.iso8601

   The  xml.utils.iso8601 module provides conversion routines between the
   ISO 8601  representations  of  date/time values and the floating point
   values used elsewhere in Python. The floating point representation is
   particularly useful in conjunction with the standard time module.

   Currently,  this  module  supports  a  small  superset of the ISO 8601
   profile  described  by  the World Wide Web Consortium (W3C). This is a
   subset  of  ISO 8601,  but  covers  the cases expected to be used most
   often  in  the  context of XML processing and Web applications. Future
   versions   of   this   module   may   support   a   larger  subset  of
   ISO 8601-defined formats.

   parse(s)
          Parse   an  ISO 8601  date  representation  (with  an  optional
          time-of-day component) and return the date in seconds since the
          epoch.

   parse_timezone(timezone)
          Parse an ISO 8601 time zone designator and return the offset
          relative to Coordinated Universal Time (UTC) in seconds. If
          timezone is not valid, ValueError is raised.
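   For readers without PyXML, the following hypothetical sketch shows how
   parse() and parse_timezone() can handle the W3C profile: YYYY,
   YYYY-MM, YYYY-MM-DD, and YYYY-MM-DDThh:mm[:ss[.s]] followed by a time
   zone designator. It is not the module's source. The sign convention
   is an assumption. Offsets are in seconds west of UTC, as in
   time.timezone, which matches the note that ctime(t) equals
   tostring(t, time.timezone). Dates without a time are treated as UTC
   midnight, which is also an assumption.

```python
import calendar
import re

# Time zone designator: "Z" or +hh:mm / -hh:mm.
_TZD = re.compile(r"^(?:Z|([+-])(\d{2}):(\d{2}))$")

# W3C profile: YYYY[-MM[-DD[Thh:mm[:ss[.s]]TZD]]].
_W3C = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(\.\d+)?)?"
    r"(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def parse_timezone(timezone):
    """Return the offset in seconds, west of UTC positive (assumed)."""
    m = _TZD.match(timezone)
    if m is None:
        raise ValueError("invalid time zone designator: %r" % (timezone,))
    if m.group(1) is None:  # "Z" means UTC
        return 0
    offset = int(m.group(2)) * 3600 + int(m.group(3)) * 60
    # "+01:00" is east of UTC, so it is negative in this convention.
    return -offset if m.group(1) == "+" else offset


def parse(s):
    """Return seconds since the epoch, as a float, for a W3C value."""
    m = _W3C.match(s)
    if m is None:
        raise ValueError("unsupported ISO 8601 value: %r" % (s,))
    year, month, day, hour, minute, second, fraction, tzd = m.groups()
    # Interpret the fields as UTC wall-clock time first.
    t = float(calendar.timegm((int(year), int(month or 1), int(day or 1),
                               int(hour or 0), int(minute or 0),
                               int(second or 0))))
    if fraction:
        t += float(fraction)
    if tzd:
        # Wall clock plus a west-positive offset gives UTC.
        t += parse_timezone(tzd)
    return t


print(parse("2001-02-03T04:05:06+01:00"))  # 981169506.0
```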

   tostring(t[, timezone])
          Return a formatted date/time value according to the profile
          described  by  the W3C. If timezone is provided, it must be the
          offset  from UTC in seconds specified as a number, or time zone
          designator which can be parsed by parse_timezone(). If timezone
          is   specified   as   a   string   and   cannot  be  parsed  by
          parse_timezone(), ValueError will be raised.

   ctime(t)
          Return a formatted date/time value using the local time zone.
          This is equivalent to "tostring(t, time.timezone)".
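   A hypothetical sketch of tostring() and ctime() for numeric offsets
   follows. It is not the module's source and does not accept
   string designators. Because ctime(t) is documented as
   tostring(t, time.timezone), this sketch assumes that numeric offsets
   follow the time.timezone convention, meaning seconds west of UTC.

```python
import time


def tostring(t, timezone=0):
    """Format t (seconds since the epoch) in the W3C profile.

    Assumption: timezone is in seconds WEST of UTC, like time.timezone.
    """
    # Wall-clock time in the target zone.
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(t - timezone))
    if timezone == 0:
        return stamp + "Z"
    sign = "-" if timezone > 0 else "+"
    minutes = abs(int(timezone)) // 60
    return "%s%s%02d:%02d" % (stamp, sign, minutes // 60, minutes % 60)


def ctime(t):
    # time.timezone ignores daylight saving time, as the note above does.
    return tostring(t, time.timezone)


print(tostring(981173106))         # 2001-02-03T04:05:06Z
print(tostring(981173106, -3600))  # 2001-02-03T05:05:06+01:00
print(tostring(981173106, 18000))  # 2001-02-02T23:05:06-05:00
```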

   See Also:

   International  Organization  for  Standardization.  Data  elements and
   interchange  formats  --  Information interchange -- Representation of
   dates and times. International Organization for Standardization, 1988.

   Gary   Houston.  ISO 8601  date/time  representations.  January  1993.
   Available online as compressed PostScript:
   ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/ISO8601.ps.Z.

   Markus Kuhn. A Summary of the International Standard Date and Time
   Notation. Available online at
   http://www.cl.cam.ac.uk/~mgk25/iso-time.html.

   Misha  Wolf  and  Charles Wicksteed. Date and Time Formats. World Wide
   Web  Consortium  Technical  Note,  September 1998. Available online at
   http://www.w3.org/TR/NOTE-datetime.

                            About this document ...

   Python/XML Reference Guide

   This document was generated using the LaTeX2HTML translator.

   LaTeX2HTML is Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos,
   Computer Based Learning Unit, University of Leeds, and Copyright ©
   1997, 1998, Ross Moore, Mathematics Department, Macquarie University,
   Sydney.

   The  application  of  LaTeX2HTML  to the Python documentation has been
   heavily  tailored by Fred L. Drake, Jr. Original navigation icons were
   contributed by Christopher Petrilli.
     _________________________________________________________________

   Release 0.06.
