PERL Modules | XML-TreePP-XMLPath

CodePin.org



ABOUT

A pure Perl module to complement the pure Perl XML::TreePP module. XMLPath is similar to XPath and attempts to conform to the XPath standard where possible, but it is far from fully XPath compliant.
Its purpose is to implement an XPath-like accessor methodology for nodes in an XML::TreePP parsed XML Document. In contrast, XPath is an accessor methodology for nodes in an unparsed XML Document.


AVAILABILITY

View the XML-TreePP-XMLPath README, CHANGES, online.
View the XML-TreePP-XMLPath Subversion repository online.

Download the source code from Subversion.

  • Subversion: https://dev.codepin.org/svn/perlmod/XML-TreePP-XMLPath/trunk
  • Revisions: https://dev.codepin.org/svn/perlmod/XML-TreePP-XMLPath/tags
  • Latest Revisions:
    date  revision  tarball  summarized description
    2013.05.31 0.72 XML-TreePP-XMLPath-0.72.tgz Fixes documentation about internal use of XML::TreePP. Allows an XML::TreePP object to be provided in the new() method.
    2013.05.30 0.71 XML-TreePP-XMLPath-0.71.tgz Modifies setter and getter to not modify properties of the internally referenced tpp module.
    2013.04.16 0.70 XML-TreePP-XMLPath-0.70.tgz Fixes bug in calling $tpp->($x), such that it dies when $x is not defined. Fixes bug in method charlexsplit() not recognizing strings with escape characters. Removed all use and references to Data::Dump in code and documentation. Added new method assembleXMLPath() for assembling an XMLPath from a provided representative array or hash ref structure. Removed deprecated methods validateAttrValue() and getSubtree().
    2011.01.17 0.63 XML-TreePP-XMLPath-0.63.tgz Fixes a bug exposed in Perl 5.13.1 and higher regarding the sharing of local variables with a sub reference. Replaces the Data::Dump module with Data::Dumper for cloning XML Structures. Data::Dumper quotes hash keys, which is less problematic in Perl 5.13.1 and higher.
    2010.02.04 0.62 XML-TreePP-XMLPath-0.62.tgz This fixes a bug for getValues() method. This method would not return the value of an attribute if the value was "0" (zero).
    2009.11.04 0.61 XML-TreePP-XMLPath-0.61.tgz Bug fix for issue when filtering for the root element (i.e. '/root') filterXMLDoc returned a hash as a result rather than an ARRAY ref.
    2009.10.07 0.60 XML-TreePP-XMLPath-0.60.tgz Major changes to internal functionality of the filterXMLDoc() method, which includes a new mapping option (structure => type) to define the format of the returned results. The filterXMLDoc() method now has good support for use of the special parent (..) indication in an XMLPath.
    2009.09.08 0.56 XML-TreePP-XMLPath-0.56.tgz Fixed bug caused by getElements and getAttributes using the deprecated method getSubtree(). Added new test cases. Expanded documentation to show how to use this module with a non-XML generic Perl code reference tree.
    2009.08.10 0.55 XML-TreePP-XMLPath-0.55.tgz Added new method getValues(), now using carp in place of warn, added Data::Dump dependency to Makefile to allow for successful building.
    2009.05.12 0.52 XML-TreePP-XMLPath-0.52.tgz Removed Params::Validate dependency, and expanded XMLPath filtering support.
    2008.11.10 0.51 XML-TreePP-XMLPath-0.51.tgz Added Params::Validate dependency to Makefile to allow for successful building.
    2008.11.03 0.50 XML-TreePP-XMLPath-0.50.tgz Initial Release

Download from CPAN: http://cpan.perl.org/modules/by-module/XML/


POD DOCUMENTATION



NAME

XML::TreePP::XMLPath - Similar to XPath, defines a path as an accessor to nodes of an XML::TreePP parsed XML Document.


SYNOPSIS

    use XML::TreePP;
    use XML::TreePP::XMLPath;
    
    my $tpp = XML::TreePP->new();
    my $tppx = XML::TreePP::XMLPath->new();
    
    my $tree = { rss => { channel => { item => [ {
        title   => "The Perl Directory",
        link    => "http://www.perl.org/",
    }, {
        title   => "The Comprehensive Perl Archive Network",
        link    => "http://cpan.perl.org/",
    } ] } } };
    my $xml = $tpp->write( $tree );

Get a subtree of the XMLTree:

    my $xmlsub = $tppx->filterXMLDoc( $tree , q{rss/channel/item[title="The Comprehensive Perl Archive Network"]} );
    print $xmlsub->{'link'};

Iterate through all attributes and Elements of each <item> XML element:

    my $xmlsub = $tppx->filterXMLDoc( $tree , q{rss/channel/item} );
    my $h_attr = $tppx->getAttributes( $xmlsub );
    my $h_elem = $tppx->getElements( $xmlsub );
    foreach my $attrHash ( @{ $h_attr } ) {
        while ( my ( $attrKey, $attrVal ) = each %{$attrHash} ) {
            ...
        }
    }
    foreach my $elemHash ( @{ $h_elem } ) {
        while ( my ( $elemName, $elemVal ) = each %{$elemHash} ) {
            ...
        }
    }

EXAMPLE for using XML::TreePP::XMLPath to access a non-XML compliant tree of PERL referenced data.

    use XML::TreePP::XMLPath;
    
    my $tppx = new XML::TreePP::XMLPath;
    my $hashtree = {
        config => {
            nodes => {
                "10.0.10.5" => {
                    options => [ 'option1', 'option2' ],
                    alerts => {
                        email => 'someone@nowhere.org'
                    }
                }
            }
        }
    };
    print $tppx->filterXMLDoc($hashtree, '/config/nodes/10.0.10.5/alerts/email');
    print "\n";
    print $tppx->filterXMLDoc($hashtree, '/config/nodes/10.0.10.5/options[2]');
    print "\n";

Result


    someone@nowhere.org
    option2


DESCRIPTION

A pure Perl module to complement the pure Perl XML::TreePP module. XMLPath is similar to XPath and attempts to conform to the XPath standard where possible, but it is far from fully XPath compliant.

Its purpose is to implement an XPath-like accessor methodology for nodes in an XML::TreePP parsed XML Document. In contrast, XPath is an accessor methodology for nodes in an unparsed (or raw) XML Document.

The advantage of using XML::TreePP::XMLPath over any other PERL implementation of XPath is that XML::TreePP::XMLPath is an accessor to XML::TreePP parsed XML Documents. If you are already using XML::TreePP to parse XML, you can use XML::TreePP::XMLPath to access nodes inside that parsed XML Document without having to convert it into a raw XML Document.

As an additional side benefit, any Perl HASH/ARRAY reference data structure can be accessed via the XPath-like accessor method provided by this module. It does not have to be a parsed XML structure. The last example in the SYNOPSIS illustrates this.


REQUIREMENTS

The following perl modules are depended on by this module: ( Note: Dependency on Params::Validate was removed in version 0.52; Dependency on Data::Dump was removed in version 0.64 )


IMPORTABLE METHODS

When the calling application invokes this module in a use clause, the following methods can be imported into its space.

Example:

    use XML::TreePP::XMLPath qw(parseXMLPath filterXMLDoc getValues getAttributes getElements);


REMOVED METHODS

The following methods are removed in the current release.


XMLPath PHILOSOPHY

General Illustration of XMLPath

Referring to the following XML Data.

    <paragraph>
        <sentence language="english">
            <words>Do red cats eat yellow food</words>
            <punctuation>?</punctuation>
        </sentence>
        <sentence language="english">
            <words>Brown cows eat green grass</words>
            <punctuation>.</punctuation>
        </sentence>
    </paragraph>

Where the path ``paragraph/sentence[@language=english]/words'' has two matches: ``Do red cats eat yellow food'' and ``Brown cows eat green grass''.

Where the path ``paragraph/sentence[@language]'' has the same previous two matches.

Where the path ``paragraph/sentence[2][@language=english]/words'' has one match: ``Brown cows eat green grass''.

And where the path ``paragraph/sentence[punctuation=.]/words'' matches ``Brown cows eat green grass''

So that ``[@attr=val]'' is identified as an attribute inside the ``<tag attr='val'></tag>''

And ``[attr=val]'' is identified as a nested attribute inside the ``<tag><attr>val</attr></tag>''

And ``[2]'' is a positional argument identifying the second node in a list ``<tag><attr>value-1</attr><attr>value-2</attr></tag>''.

And ``@attr'' identifies all nodes containing the @attr attribute. ``<tag><item attr="value-A">value-1</item><item attr="value-B">value-2</item></tag>''.

After XML::TreePP parses the above XML, it looks like this:

    {
      paragraph => {
            sentence => [
                  {
                    "-language" => "english",
                    punctuation => "?",
                    words => "Do red cats eat yellow food",
                  },
                  {
                    "-language" => "english",
                    punctuation => ".",
                    words => "Brown cows eat green grass",
                  },
                ],
          },
    }
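
To make the correspondence between an XMLPath and the parsed structure concrete, the following is a minimal sketch in plain Perl that evaluates two of the paths above by hand against that structure. It does not use XML::TreePP::XMLPath at all. The variable names are illustrative, and the hand-written filters only approximate what the module does on the caller's behalf.

```perl
use strict;
use warnings;

# The parsed structure shown above, written out as a plain Perl data structure.
my $tree = {
    paragraph => {
        sentence => [
            { "-language" => "english", punctuation => "?", words => "Do red cats eat yellow food" },
            { "-language" => "english", punctuation => ".", words => "Brown cows eat green grass" },
        ],
    },
};

# Hand-written equivalent of: paragraph/sentence[punctuation=.]/words
my @matches = map  { $_->{words} }
              grep { defined $_->{punctuation} && $_->{punctuation} eq '.' }
              @{ $tree->{paragraph}{sentence} };

# Hand-written equivalent of: paragraph/sentence[2][@language=english]/words
# XMLPath positions are 1-based, Perl array indexes are 0-based, and the
# attribute @language is stored under the key "-language".
my $second       = $tree->{paragraph}{sentence}[1];
my @second_words = ( $second->{'-language'} eq 'english' ) ? ( $second->{words} ) : ();

print "$_\n" for @matches, @second_words;
```

Both expressions select the second sentence, so the script prints its words twice.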

Noting Attribute Identification in Parsed XML

Note that attributes are specified in the XMLPath as @attribute_name, but after XML::TreePP::parse() parses the XML Document, the attribute name is identified as -attribute_name in the resulting parsed document. This can be changed in Object Oriented mode using the $tppx->tpp->set(attr_prefix=>'@') method to set the attr_prefix attribute in the XML::TreePP object referenced internally. It should only be changed if the XML Document is provided as already parsed, and the attributes are represented with a value other than the default. This document uses the default value of - in its examples.

XMLPath requires attributes to be specified as @attribute_name and takes care of the conversion from @ to - behind the scenes when accessing the XML::TreePP parsed XML document.

Child elements on the next level of a parent element are accessible in the same way as attributes, written as attribute_name. This is the same format as @attribute_name except without the @ symbol. Specifying the attribute without an @ symbol identifies it as a child element of the parent element being evaluated.

Noting Text (CDATA) Identification in Parsed XML

Additionally, the values of child elements are identified in XML parsed by XML::TreePP::parse() with the # pound/hash symbol. This can be changed via the text_node_key property in the XML::TreePP object referenced by XML::TreePP::XMLPath->tpp(). XML::TreePP::XMLPath derives the value to use from this.

Accessing Child Element Values in XMLPath

Child element values are only accessible as CDATA. That is, when the element being evaluated is animal, the attribute (or child element) is cat, and the value of the attribute is tiger, it is presented as this:

    <jungle>
        <animal>
            <cat>tiger</cat>
        </animal>
    </jungle>

The XMLPath used to access the key=value pair of cat=tiger for element animal would be as follows:

    jungle/animal[cat='tiger']

And in version 0.52, in this second case, the above XMLPath is still valid:

    <jungle>
        <animal>
            <cat color="black">tiger</cat>
        </animal>
    </jungle>

In version 0.52, the period (.) is supported as it is in XPath to represent the current context node. As such, the following XMLPaths would also be valid:

    jungle/animal/cat[.='tiger']
    jungle/animal/cat[@color='black'][.='tiger']

Note that in these previous two XMLPaths, the element cat is being evaluated, and not the element animal as in the first case. This is undesirable if you want to evaluate animal for results.

To perform the same evaluation, but return the matching animal node, the following XMLPath can be used:

    jungle/animal[cat='tiger']

To evaluate animal and cat, but return the matching cat node, the following XMLPaths can be used:

    jungle/animal[cat='tiger']/cat
    jungle/animal/cat[.='tiger']

The first path analyzes animal, and the second path analyzes cat. But both match the same node ``<cat color='black'>tiger</cat>''.

Matching Attributes

Prior to version 0.52, attributes could only be used in XMLPath to evaluate an element for a result set. As of version 0.52, attributes can now be matched in XMLPath to return their values.

This next example illustrates:

    <jungle>
        <animal>
            <cat color="black">tiger</cat>
        </animal>
    </jungle>
    
    /jungle/animal/cat[.='tiger']/@color

The result set of this XMLPath would be ``black''.
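
As a rough illustration of what this path evaluates, here is a sketch in plain Perl that walks the parsed form of the document by hand. The data structure is an assumption: it is the shape expected from XML::TreePP with its default attr_prefix of '-' and a text_node_key of '#text'. The module is not used here, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Assumed parsed form of:
#   <jungle><animal><cat color="black">tiger</cat></animal></jungle>
# using the XML::TreePP defaults: attr_prefix '-' and text_node_key '#text'.
my $tree = {
    jungle => { animal => { cat => { '-color' => 'black', '#text' => 'tiger' } } },
};

my $cat = $tree->{jungle}{animal}{cat};

# [.='tiger'] compares the text content of the current node. An element
# without attributes is stored as a plain string rather than a hash.
my $text = ref($cat) eq 'HASH' ? $cat->{'#text'} : $cat;

my @colors;
if ( defined $text && $text eq 'tiger' && ref($cat) eq 'HASH' && exists $cat->{'-color'} ) {
    push @colors, $cat->{'-color'};    # /@color selects the attribute value
}

print "@colors\n";    # prints "black"
```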


METHODS

tpp

This module is an extension of the XML::TreePP module. As such, it uses the module in many different methods to parse XML Documents, and to get the value of XML::TreePP properties like attr_prefix and text_node_key.

The XML::TreePP module, however, is only loaded into XML::TreePP::XMLPath when it becomes necessary to perform the previously described requests. For the aforementioned properties attr_prefix and text_node_key, default values are used if an XML::TreePP object has not been loaded.

To avoid having this module load the XML::TreePP module, do not pass in unparsed XML documents. The caller would instead want to parse the XML document with XML::TreePP::parse() before passing it in. Passing in an unparsed XML document causes this module to load XML::TreePP in order to parse it for processing.

Alternatively, if the caller has loaded a copy of XML::TreePP, that object instance can be assigned to be used by the instance of this module using this method. In doing so, when XML::TreePP is needed, the instance provided is used instead of loading another copy.

If this module has loaded an instance of XML::TreePP, this instance can be directly accessed or retrieved through this method. For example, the aforementioned properties can be set.

    $tppx->tpp->set('attr_prefix','@');  # default is (-) dash
    $tppx->tpp->set('text_node_key','#');  # default is (#) pound

If you want to only get the internally loaded instance of XML::TreePP, but do not want to load a new instance and instead have undef returned if an instance is not already loaded, then use the get() method.

    my $tppobj = $tppx->get( 'tpp' );
    warn "XML::TreePP is not loaded in XML::TreePP::XMLPath.\n" if !defined $tppobj;

This method was added in version 0.52

  • XML::TreePP

    An instance of XML::TreePP that this object should use instead of, when needed, loading its own copy. If not provided, the currently loaded instance is returned. If an instance is not loaded, an instance is loaded and then returned.

  • returns

    Returns the result of setting an instance of XML::TreePP in this object. Or returns the internally loaded instance of XML::TreePP. Or loads a new instance of XML::TreePP and returns it.

        $tppx->tpp( new XML::TreePP );  # Sets the XML::TreePP instance to be used by this object
        $tppx->tpp();  # Retrieve the currently loaded XML::TreePP instance

set

Set the value for a property in this object instance. This method can only be accessed in object oriented style.

This method was added in version 0.52

  • propertyname

    The property to set the value for.

  • propertyvalue

    The value of the property to set. If no value is given, the property is deleted.

  • returns

    Returns the result of setting the value of the property, or the result of deleting the property.

        $tppx->set( 'property_name' );            # deletes the property property_name
        $tppx->set( 'property_name' => 'val' );   # sets the value of property_name

get

Retrieve the value set for a property in this object instance. This method can only be accessed in object oriented style.

This method was added in version 0.52

  • propertyname

    The property to get the value for

  • returns

    Returns the value of the property requested

        $tppx->get( 'property_name' );

new

Create a new object instance of this module.

  • tpp

    An instance of XML::TreePP to be used instead of letting this module load its own.

  • returns

    An object instance of this module.

        $tppx = new XML::TreePP::XMLPath();

charlexsplit

An analysis method for single character boundary and start/stop tokens

  • string

    The string to analyze

  • boundry_start

    The single character starting boundary separating wanted elements

  • boundry_stop

    The single character stopping boundary separating wanted elements

  • tokens

    A { start_char => stop_char } hash reference of start/stop tokens. The characters in string contained within a start_char and stop_char are not evaluated to match boundaries.

  • boundry_begin

    Provide ``1'' if the beginning of the string should be treated as a boundry_start character.

  • boundry_end

    Provide ``1'' if the ending of the string should be treated as a boundry_stop character.

  • escape_char

    The character that indicates the next character in the string is to be escaped. The default value is the backward slash (\). An example is used in the following string:

        'The Cat\'s Meow'

    Without a recognized escape character, the previous string would fail to be recognized properly.

    This optional parameter was introduced in version 0.70.

  • returns

    An array reference of elements

        $elements = charlexsplit (
                            string         => $string,
                            boundry_start  => $charA,   boundry_stop   => $charB,
                            tokens         => \@tokens,
                            boundry_begin  => $char1,   boundry_end    => $char2 );
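
The idea behind this kind of split can be sketched in a few lines of plain Perl. The helper below, split_outside_tokens(), is hypothetical and is not the module's charlexsplit(): it takes a single boundary character rather than separate start and stop boundaries, has no boundry_begin/boundry_end handling, and does not track tokens nested inside other tokens. It only shows how characters enclosed by a start/stop token pair, or preceded by the escape character, are kept from matching a boundary.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's charlexsplit(): split $string on
# $boundary, ignoring boundary characters inside start/stop token pairs.
sub split_outside_tokens {
    my ( $string, $boundary, $tokens, $escape ) = @_;
    $escape = '\\' unless defined $escape;

    my @elements;
    my $current = '';
    my $closer;         # stop character being waited for while inside a token
    my $escaped = 0;    # true when the previous character was the escape char

    for my $ch ( split //, $string ) {
        if ($escaped) {                       # escaped character: take it literally
            $current .= $ch;
            $escaped = 0;
        }
        elsif ( $ch eq $escape ) {            # keep the escape, protect the next char
            $current .= $ch;
            $escaped = 1;
        }
        elsif ( defined $closer ) {           # inside a token: only its stop char matters
            $current .= $ch;
            undef $closer if $ch eq $closer;
        }
        elsif ( exists $tokens->{$ch} ) {     # entering a token
            $current .= $ch;
            $closer = $tokens->{$ch};
        }
        elsif ( $ch eq $boundary ) {          # boundary outside any token
            push @elements, $current;
            $current = '';
        }
        else {
            $current .= $ch;
        }
    }
    push @elements, $current;
    return \@elements;
}

my $el = split_outside_tokens(
    q{abcdefg/xyz/path[@key='a/b']/last},
    '/',
    { '[' => ']', q{'} => q{'}, q{"} => q{"} },
);
print join( ' | ', @{$el} ), "\n";
```

The slash inside the bracketed filter is not treated as a boundary, so the string splits into four elements: abcdefg, xyz, path[@key='a/b'], and last.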

parseXMLPath

Parse a string that represents the XMLPath to a XML element or attribute in a XML::TreePP parsed XML Document.

Note that the XML attributes, known as ``@attr'' are transformed into ``-attr''. The preceding (-) minus in place of the (@) at is the recognized format of attributes in the XML::TreePP module.

Being that this is intended to be a submodule of XML::TreePP, the format of '@attr' is converted to '-attr' to conform with how XML::TreePP handles attributes.

See: XML::TreePP->set( attr_prefix => '@' ) for more information. This module supports the default format, '-attr', of attributes. But this can be changed by setting the 'attr_prefix' property in the internally referenced XML::TreePP object using the set() method in object oriented programming. Example:

    my $tppx = new XML::TreePP::XMLPath();
    $tppx->tpp->set( attr_prefix => '@' );

XMLPath Filter by index and existence

Also, as of version 0.52, there are two additional types of XMLPaths understood.

XMLPath with indexes, which is similar to the way XPath does it

    $path = '/books/book[5]';

This defines the fifth book in a list of book elements under the books root. When using this to get the value, the 5th book is returned. When using this to test an element, there must be 5 or more books to return true.

XMLPath by existence, which is similar to the way XPath does it

    $path = '/books/book[author]';

This XMLPath represents all book elements under the books root which have one or more author child elements. It does not evaluate whether the element or attribute being tested has a value, so it is purely a test for the existence of the element or attribute.
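
The two path forms above can be pictured with a small hand-written sketch in plain Perl. The sample data and variable names are made up for illustration, and the code does not call the module. It only mirrors the index and existence semantics described here.

```perl
use strict;
use warnings;

# Illustrative parsed document; the third book has an author element with no value.
my $doc = {
    books => {
        book => [
            { title => 'First',  author => 'A. Writer' },
            { title => 'Second' },
            { title => 'Third',  author => '' },
        ],
    },
};
my @books = @{ $doc->{books}{book} };

# /books/book[2] : positions are 1-based in the path, 0-based in Perl
my $second = $books[1];

# /books/book[author] : existence only, so the empty author still matches
my @with_author = grep { exists $_->{author} } @books;

print "second: $second->{title}\n";
print "with author: ", join( ', ', map { $_->{title} } @with_author ), "\n";
```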

  • XMLPath

    The XML path to be parsed.

  • returns

    An array reference of array referenced elements of the XMLPath.

        $parsedXMLPath = parseXMLPath( $XMLPath );

assembleXMLPath

Assemble an ARRAY or HASH ref structure representing an XMLPath. This method can be used to construct an XMLPath array ref that has been parsed by the parseXMLPath method.

Note that the XML attributes can be identified as ``-attribute'' or ``@attribute''. When identified as ``-attribute'', they are transformed into ``@attribute'' upon assembly. The preceding minus (-) in place of the at (@) is the recognized format of attributes in the XML::TreePP module, though it can be changed. See the parseXMLPath method for further information.

This method was added in version 0.70.

  • parsed-XMLPath

    The XML path to be assembled, represented as either an ARRAY or HASH reference.

  • returns

    An XMLPath.

        $XMLPath = assembleXMLPath( $parsedXMLPath );

    or

        my $xmlpath = q{/books/book[5]/cats[@author="The Cat's Meow"]/tigers[meateater]};
        
        my $ppath = $tppx->parseXMLPath($xmlpath);
        ## $ppath == [['books',undef],['book',[['5',undef]]],['cats',[['-author','The Cat\'s Meow']]],['tigers',[['meateater',undef]]]]
        my $apath = [ 'books', ['book', 5], ['cats',[['@author' => "The Cat's Meow"]]], ['tigers',['meateater']] ];
        my $hpath = { books => { book => { -attrs => [5], cats => { -attrs => [['-author' => "The Cat's Meow"]], tigers => { -attrs => ["meateater"] } } } } };
        
        print "original: ",$xmlpath,"\n";
        print "      re: ",$tppx->assembleXMLPath($ppath),"\n";
        print "   array: ",$tppx->assembleXMLPath($apath),"\n";
        print "    hash: ",$tppx->assembleXMLPath($hpath),"\n";

    output

        original: /books/book[5]/cats[@author="The Cat's Meow"]/tigers[meateater]
              re: /books/book[5]/cats[@author="The Cat's Meow"]/tigers[meateater]
           array: /books/book[5]/cats[@author="The Cat's Meow"]/tigers[meateater]
            hash: /books/book[5]/cats[@author="The Cat's Meow"]/tigers[meateater]

filterXMLDoc

To filter down to a subtree or set of subtrees of an XML document based on a given XMLPath

This method can also be used to determine if a node within an XML tree is valid based on the given filters in an XML path.

This method replaces the two methods getSubtree() and validateAttrValue().

This method was added in version 0.52

  • XMLDocument

    The XML document tree, or subtree node to validate. This is an XML document given either as a plain text string or as parsed by the XML::TreePP->parse() method.

    The XMLDocument, when parsed, can be an ARRAY of multiple elements to evaluate, which would be validated as follows:

        # when path is: context[@attribute]
        # returning: $subtree[item] if valid (returns all validated [item])
        $subtree[item]->{'-attribute'} exists
        # when path is: context[@attribute="value"]
        # returning: $subtree[item] if valid (returns all validated [item])
        $subtree[item]->{'-attribute'} eq "value"
        $subtree[item]->{'-attribute'}->{'value'} exists
        # when path is: context[5]
        # returning: $subtree[5] if exists (returns the fifth item if validated)
        $subtree['itemnumber']
        # when path is: context[5][element="value"]
        # returning: $subtree[5] if exists (returns the fifth item if validated)
        $subtree['itemnumber']->{'element'} eq "value"
        $subtree['itemnumber']->{'element'}->{'value'} exists

    Or the XMLDocument can be a HASH which would be a single element to evaluate. The XMLSubTree would be validated as follows:

        # when path is: context[element]
        # returning: $subtree if validated
        $subtree{'element'} exists
        # when path is: context[@attribute="value"]
        # returning: $subtree if validated
        $subtree{'-attribute'} eq "value"
        $subtree{'-attribute'}->{'value'} exists

  • XMLPath

    The path within the XML Tree to retrieve. See parseXMLPath()

  • structure => TargetRaw | RootMAP | ParentMAP (optional)

    This optional argument defines the format of the search results to be returned. The default structure is TargetRaw

    TargetRaw - Return references to xml document fragments matching the XMLPath filter. If the matching xml document fragment is a string, then the string is returned as a non-reference.

    RootMap - Return a Map of the entire xml document, a result set (list) of the definitive XMLPath (mapped from the root) to the found targets, which includes: (1) a reference map from root (/) to all matching child nodes (2) a reference to the xml document from root (/) (3) a list of targets as absolute XMLPath strings for the matching child nodes

        { root      => HASHREF,
          path      => '/',
          target    => [ "/nodename[#]/nodename[#]/nodename[#]/targetname" ],
          child     =>
            [{ name => nodename, position => #, child => [{
                [{ name => nodename, position => #, child => [{
                    [{ name => nodename, position => #, target => targetname }]
                }] }]
            }] }]
        }

    ParentMap - Return a Map of the parent nodes to found target nodes in the xml document, which includes: (1) a reference map from each parent node to all matching child nodes (2) a reference to xml document fragments from the parent nodes

        [
        { root      => HASHREF,
          path      => '/nodename[#]/nodename[6]/targetname',
          child => [{ name => nodename, position => 6, target => targetname }]
        },
        { root      => HASHREF,
          path      => '/nodename[#]/nodename[7]/targetname',
          child => [{ name => nodename, position => 7, target => targetname }]
        },
        ]

  • returns

    The parsed XML Document subtrees that are validated, or undef if not validated

    You can retrieve the result set in one of two formats.

        # Option 1 - An ARRAY reference to a list
        my $result = filterXMLDoc( $xmldoc, '/books' );
        # $result is:
        # [ { book => { title => "PERL", subject => "programming" } },
        #   { book => { title => "All About Backpacks", subject => "hiking" } } ]
        
        # Option 2 - A list, or normal array
        my @result = filterXMLDoc( $xmldoc, '/books/book[subject="camping"]' );
        # @result is:
        # ( { title => "campfires", subject => "camping" },
        #   { title => "tents", subject => "camping" } )
        my $result = filterXMLDoc( $XMLDocument , $XMLPath );
        my @result = filterXMLDoc( $XMLDocument , $XMLPath );
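
Returning either an ARRAY reference or a plain list depending on how the function is called is commonly done in Perl with wantarray. The sketch below shows that pattern with a hypothetical routine, books_by_subject(), standing in for a filter. It is an assumption about the general technique, not a copy of how filterXMLDoc() is implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a filtering routine; not the module's code.
sub books_by_subject {
    my ( $books, $subject ) = @_;
    my @found = grep { defined $_->{subject} && $_->{subject} eq $subject } @{$books};
    # List in list context, ARRAY reference in scalar context.
    return wantarray ? @found : \@found;
}

my $books = [
    { title => "campfires", subject => "camping" },
    { title => "tents",     subject => "camping" },
    { title => "PERL",      subject => "programming" },
];

my $result = books_by_subject( $books, "camping" );    # scalar context
my @result = books_by_subject( $books, "camping" );    # list context

print ref($result), " with ", scalar( @{$result} ), " items\n";
print scalar(@result), " items in list\n";
```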

getValues

Retrieve the values found in the given XML Document at the given XMLPath.

This method was added in version 0.53 as getValue, and changed to getValues in 0.54

  • XMLDocument

    The XML Document to search and return values from.

  • XMLPath

    The XMLPath to retrieve the values from.

  • valstring => 1 | 0

    Return values that are strings. (default is 1)

  • valxml => 1 | 0

    Return values that are xml, as raw xml. (default is 0)

  • valxmlparsed => 1 | 0

    Return values that are xml, as parsed xml. (default is 0)

  • valtrim => 1 | 0

    Trim off the white space at the beginning and end of each value in the result set before returning the result set. (default is 0)

  • returns

    Returns the values from the XML Document found at the XMLPath.

        # return the value of @author from all book elements
        $vals = $tppx->getValues( $xmldoc, '/books/book/@author' );
        # return the values of the current node, or XML Subtree
        $vals = $tppx->getValues( $xmldoc_node, "." );
        # return only XML data from the 5th book node
        $vals = $tppx->getValues( $xmldoc, '/books/book[5]', valstring => 0, valxml => 1 );
        # return only XML::TreePP parsed XML from all book nodes having an id attribute
        $vals = $tppx->getValues( $xmldoc, '/books/book[@id]', valstring => 0, valxmlparsed => 1 );
        # return both unparsed XML data and text content from the 3rd book excerpt,
        # and trim off the white space at the beginning and end of each value
        $vals = $tppx->getValues( $xmldoc, '/books/book[3]/excerpt', valstring => 1, valxml => 1, valtrim => 1 );

getAttributes

Retrieve the attributes found in the given XML Document at the given XMLPath.

  • XMLTree

    An XML::TreePP parsed XML document.

  • XMLPath

    The path within the XML Tree to retrieve. See parseXMLPath()

  • returns

    An array reference of [{attribute=>value}], or undef if none found

    In the case where the XML Path points at a multi-same-name element, the return value is a ref array of ref hashes, one hash ref for each element.

    Example Returned Data:

        # XML Path points at a single named element
        [ {attr1=>val,attr2=>val} ]
        # XML Path points at a multi-same-name element
        [ {attr1A=>val,attr1B=>val}, {attr2A=>val,attr2B=>val} ]

        $attributes = getAttributes ( $XMLTree , $XMLPath );

getElements

Gets the child elements found at a specified XMLPath

  • XMLTree

    An XML::TreePP parsed XML document.

  • XMLPath

    The path within the XML Tree to retrieve. See parseXMLPath()

  • returns

    An array reference of [{element=>value}], or undef if none found

    An array reference of hash references of elements (not attributes) and each element's XMLSubTree, or undef if none found. If the XMLPath points at a multi-valued element, then the subelements of each element at the XMLPath are returned as separate hash references in the returning array reference.

    The format of the returning data is the same as the getAttributes() method.

    The XMLSubTree is fetched based on the provided XMLPath. Then all elements found under that XMLPath are placed into a referenced hash table to be returned. If an element found has additional XML data under it, it is all returned just as it was provided.

    Simply, this strips all XML attributes found at the XMLPath, returning the remaining elements found at that path.

    If the XMLPath has no elements under it, then undef is returned instead.

        $elements = getElements ( $XMLTree , $XMLPath );


EXAMPLES

Method: new

It is not necessary to create an object of this module. However, if you choose to do so anyway, here is how you do it.

    my $obj = new XML::TreePP::XMLPath;

This module supports being called by two methods.

  1. By importing the functions you wish to use, as in:
        use XML::TreePP::XMLPath qw( function1 function2 );
        function1( args )

    See IMPORTABLE METHODS section for methods available for import

  2. Or by calling the functions in an object oriented manner, as in:
        my $tppx = new XML::TreePP::XMLPath;
        $tppx->function1( args )

Using either method works the same and returns the same output.

Method: charlexsplit

Here are three steps that can be used to parse values out of a string:

Step 1:

First, parse the entire string delimited by the / character.

    my $el = charlexsplit   (
        string        => q{abcdefg/xyz/path[@key='val'][@key2='val2']/last},
        boundry_start => '/',
        boundry_stop  => '/',
        tokens        => [qw( [ ] ' ' " " )],
        boundry_begin => 1,
        boundry_end   => 1
        );
    print Dumper( $el );

Output:

    ["abcdefg", "xyz", "path[\@key='val'][\@key2='val2']", "last"],

Step 2:

Second, parse the elements from step 1 that have key/val pairs, such that each single key/val is contained by the [ and ] characters

    my $el = charlexsplit (
        string        => q( path[@key='val'][@key2='val2'] ),
        boundry_start => '[',
        boundry_stop  => ']',
        tokens        => [qw( ' ' " " )],
        boundry_begin => 0,
        boundry_end   => 0
        );
    print Dumper( $el );

Output:

    ["\@key='val'", "\@key2='val2'"]

Step 3:

Third, parse each element from step 2 that is a single key/val pair, where the key and val are delimited by the = character

    my $el = charlexsplit (
        string        => q{ @key='val' },
        boundry_start => '=',
        boundry_stop  => '=',
        tokens        => [qw( ' ' " " )],
        boundry_begin => 1,
        boundry_end   => 1
        );
    print Dumper( $el );

Output:

    ["\@key", "'val'"]

Note that in each example the tokens represent a group of escaped characters which, when analyzed, will be collected as part of an element, but will not be allowed to match any starting or stopping boundary.

So if you have a start token without a stop token, you will get undesired results. This example demonstrates this data error.

    my $el = charlexsplit   (
        string        => q{ path[@key='val'][@key2=val2'] },
        boundry_start => '[',
        boundry_stop  => ']',
        tokens        => [qw( ' ' " " )],
        boundry_begin => 0,
        boundry_end   => 0
        );
    print Dumper( $el );

Undesired output:

    ["\@key='val'"]

In this example of bad data being parsed, the boundry_stop character ] was never matched for the key2=val2 element, because the unmatched ' starts a token that is never closed.

No error message is produced. The charlexsplit method silently discards the second element because of the token start and stop mismatch.
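Because the mismatch is silent, a caller may want to check the input before splitting it. The following is a minimal sketch of such a pre-check. `quotes_balanced` is a hypothetical helper, not part of the module, and it assumes symmetric quote tokens and no backslash escapes inside the string.

```perl
use strict;
use warnings;

# Hypothetical pre-check, not part of XML::TreePP::XMLPath: return 1 when
# every quote character from @quotes that opens in $string is also closed,
# and 0 otherwise.
sub quotes_balanced {
    my ($string, @quotes) = @_;
    my %is_quote = map { ( $_ => 1 ) } @quotes;
    my $open = '';    # the quote character currently open, if any

    for my $ch (split //, $string) {
        if    ($open)          { $open = '' if $ch eq $open }
        elsif ($is_quote{$ch}) { $open = $ch }
    }
    return $open eq '' ? 1 : 0;
}

# The first path is well formed; the second is the bad data from above.
for my $path (q{path[@key='val'][@key2='val2']}, q{path[@key='val'][@key2=val2']}) {
    my $state = quotes_balanced($path, q{'}, q{"}) ? 'balanced' : 'unbalanced';
    printf "%-35s %s\n", $path, $state;
}
```

Rejecting or logging an unbalanced string before it reaches charlexsplit makes the data error visible instead of losing an element.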

Method: parseXMLPath

    use XML::TreePP::XMLPath qw(parseXMLPath);
    use Data::Dumper;
    
    my $parsedPath = parseXMLPath(
                                  q{abcdefg/xyz/path[@key1='val1'][key2='val2']/last}
                                  );
    print Dumper ( $parsedPath );

Output:

    [
      ["abcdefg", undef],
      ["xyz", undef],
      ["path", [["-key1", "val1"], ["key2", "val2"]]],
      ["last", undef],
    ]
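As a hedged illustration of how this structure can be consumed, the sketch below walks the parsed path and reports each filter. The data is written out literally so the sketch runs without the module installed. It assumes the "-" prefix seen in the output above marks an attribute, and that a key without the prefix names a child element.

```perl
use strict;
use warnings;

# The structure from the output above, written out literally.
my $parsedPath = [
    [ 'abcdefg', undef ],
    [ 'xyz',     undef ],
    [ 'path',    [ [ '-key1', 'val1' ], [ 'key2', 'val2' ] ] ],
    [ 'last',    undef ],
];

my @report;
for my $segment (@$parsedPath) {
    my ($name, $filters) = @$segment;

    # Segments without filters carry undef in the second slot.
    for my $filter (@{ $filters || [] }) {
        my ($key, $value) = @$filter;
        my $kind = $key =~ /^-/ ? 'attribute' : 'child element';
        push @report, "$name: $kind $key = $value";
    }
}
print "$_\n" for @report;
```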

Method: filterXMLDoc

Filtering an XML Document, using an XMLPath, to find a node within the document.

    #!/usr/bin/perl
    use XML::TreePP;
    use XML::TreePP::XMLPath qw(filterXMLDoc);
    use Data::Dumper;
    #
    # The XML document data
    my $xmldata=<<XMLEND;
        <level1>
            <level2>
                <level3 attr1="val1" attr2="val2">
                    <attr3>val3</attr3>
                    <attr4/>
                    <attrX>one</attrX>
                    <attrX>two</attrX>
                    <attrX>three</attrX>
                </level3>
                <level3 attr1="valOne"/>
            </level2>
        </level1>
    XMLEND
    #
    # Parse the XML document.
    my $tpp = XML::TreePP->new;
    my $xmldoc = $tpp->parse($xmldata);
    print "Output Test #1\n";
    print Dumper( $xmldoc );
    #
    # Retrieve the sub tree of the XML document at path "level1/level2"
    my $xmlSubTree = filterXMLDoc($xmldoc, 'level1/level2');
    print "Output Test #2\n";
    print Dumper( $xmlSubTree );
    #
    # Retrieve the sub tree of the XML document at path "level1/level2/level3[@attr1='val1']"
    $xmlSubTree = filterXMLDoc($xmldoc, 'level1/level2/level3[@attr1="val1"]');
    print "Output Test #3\n";
    print Dumper( $xmlSubTree );

Output:

    Output Test #1
    {
      level1 => {
            level2 => {
                  level3 => [
                        {
                          "-attr1" => "val1",
                          "-attr2" => "val2",
                          attr3    => "val3",
                          attr4    => undef,
                          attrX    => ["one", "two", "three"],
                        },
                        { "-attr1" => "valOne" },
                      ],
                },
          },
    }
    Output Test #2
    {
      level3 => [
            {
              "-attr1" => "val1",
              "-attr2" => "val2",
              attr3    => "val3",
              attr4    => undef,
              attrX    => ["one", "two", "three"],
            },
            { "-attr1" => "valOne" },
          ],
    }
    Output Test #3
    {
      "-attr1" => "val1",
      "-attr2" => "val2",
      attr3    => "val3",
      attr4    => undef,
      attrX    => ["one", "two", "three"],
    }
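To make the path semantics concrete, the sketch below performs by hand roughly what the XMLPath level1/level2/level3[@attr1="val1"] selects. It is an illustration only, not how filterXMLDoc is implemented. The parsed document is written out literally from Output Test #1 so the sketch runs on its own.

```perl
use strict;
use warnings;

# The parsed document from Output Test #1, written out literally.
my $xmldoc = {
    level1 => {
        level2 => {
            level3 => [
                {
                    '-attr1' => 'val1',
                    '-attr2' => 'val2',
                    attr3    => 'val3',
                    attr4    => undef,
                    attrX    => [ 'one', 'two', 'three' ],
                },
                { '-attr1' => 'valOne' },
            ],
        },
    },
};

# Walk level1/level2/level3 by hand. A repeated element is an array
# reference; a single element would be a plain hash reference or scalar.
my $level3 = $xmldoc->{level1}{level2}{level3};
my @nodes  = ref($level3) eq 'ARRAY' ? @$level3 : ($level3);

# Apply the [@attr1="val1"] filter: attributes are stored with a "-" prefix.
my @match = grep { ref($_) eq 'HASH' && ($_->{'-attr1'} // '') eq 'val1' } @nodes;

print scalar(@match), " matching node(s); attr3 = $match[0]{attr3}\n";
```

The hand-written version has to handle the array-versus-single-node case itself, which is the kind of detail the XMLPath accessor hides.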

Validating attribute and value pairs of a given node.

    #!/usr/bin/perl
    use XML::TreePP;
    use XML::TreePP::XMLPath qw(filterXMLDoc);
    use Data::Dumper;
    #
    # The XML document data
    my $xmldata=<<XMLEND;
        <paragraph>
            <sentence language="english">
                <words>Do red cats eat yellow food</words>
                <punctuation>?</punctuation>
            </sentence>
            <sentence language="english">
                <words>Brown cows eat green grass</words>
                <punctuation>.</punctuation>
            </sentence>
        </paragraph>
    XMLEND
    #
    # Parse the XML document.
    my $tpp = XML::TreePP->new;
    my $xmldoc = $tpp->parse($xmldata);
    print "Output Test #1\n";
    print Dumper( $xmldoc );
    #
    # Retrieve the sub tree of the XML document at path "paragraph/sentence"
    my $xmlSubTree = filterXMLDoc($xmldoc, "paragraph/sentence");
    print "Output Test #2\n";
    print Dumper( $xmlSubTree );
    #
    my (@params, $validatedSubTree);
    #
    # Test whether the XML sub tree has an attribute "-language" with value "german"
    @params = (['-language', 'german']);
    $validatedSubTree = filterXMLDoc($xmlSubTree, [ ".", \@params ]);
    print "Output Test #3\n";
    print Dumper( $validatedSubTree );
    #
    # Test whether the XML sub tree has an attribute "-language" with value "english"
    @params = (['-language', 'english']);
    $validatedSubTree = filterXMLDoc($xmlSubTree, [ ".", \@params ]);
    print "Output Test #4\n";
    print Dumper( $validatedSubTree );

Output:

    Output Test #1
    {
      paragraph => {
            sentence => [
                  {
                    "-language" => "english",
                    punctuation => "?",
                    words => "Do red cats eat yellow food",
                  },
                  {
                    "-language" => "english",
                    punctuation => ".",
                    words => "Brown cows eat green grass",
                  },
                ],
          },
    }
    Output Test #2
    [
      {
        "-language" => "english",
        punctuation => "?",
        words => "Do red cats eat yellow food",
      },
      {
        "-language" => "english",
        punctuation => ".",
        words => "Brown cows eat green grass",
      },
    ]
    Output Test #3
    undef
    Output Test #4
    {
      "-language" => "english",
      punctuation => "?",
      words => "Do red cats eat yellow food",
    }

Method: getAttributes

    #!/usr/bin/perl
    #
    use XML::TreePP;
    use XML::TreePP::XMLPath qw(getAttributes);
    use Data::Dumper;
    #
    # The XML document data
    my $xmldata=<<XMLEND;
        <level1>
            <level2>
                <level3 attr1="val1" attr2="val2">
                    <attr3>val3</attr3>
                    <attr4/>
                    <attrX>one</attrX>
                    <attrX>two</attrX>
                    <attrX>three</attrX>
                </level3>
                <level3 attr1="valOne"/>
            </level2>
        </level1>
    XMLEND
    #
    # Parse the XML document.
    my $tpp = XML::TreePP->new;
    my $xmldoc = $tpp->parse($xmldata);
    print "Output Test #1\n";
    print Dumper( $xmldoc );
    #
    # Retrieve the attributes of the nodes at path "level1/level2/level3"
    my $attributes = getAttributes($xmldoc, 'level1/level2/level3');
    print "Output Test #2\n";
    print Dumper( $attributes );
    #
    # Retrieve the attributes of the nodes at path "level1/level2/level3[attr3="val3"]"
    $attributes = getAttributes($xmldoc, 'level1/level2/level3[attr3="val3"]');
    print "Output Test #3\n";
    print Dumper( $attributes );

Output:

    Output Test #1
    {
      level1 => {
            level2 => {
                  level3 => [
                        {
                          "-attr1" => "val1",
                          "-attr2" => "val2",
                          attr3    => "val3",
                          attr4    => undef,
                          attrX    => ["one", "two", "three"],
                        },
                        { "-attr1" => "valOne" },
                      ],
                },
          },
    }
    Output Test #2
    [{ attr1 => "val1", attr2 => "val2" }, { attr1 => "valOne" }]
    Output Test #3
    [{ attr1 => "val1", attr2 => "val2" }]
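Note that the returned attribute names no longer carry the "-" prefix used in the parsed document. A minimal sketch of that mapping for a single node follows. It is an illustration under the assumption that "-" is the attribute prefix in use, not the module's own code.

```perl
use strict;
use warnings;

# One node as it appears in Output Test #1.
my $node = {
    '-attr1' => 'val1',
    '-attr2' => 'val2',
    attr3    => 'val3',
    attr4    => undef,
    attrX    => [ 'one', 'two', 'three' ],
};

# Keep only the keys that start with "-" and strip that prefix.
my %attributes = map  { ( substr($_, 1) => $node->{$_} ) }
                 grep { /^-/ } keys %$node;

print join(', ', map { "$_=$attributes{$_}" } sort keys %attributes), "\n";
```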

Method: getElements

    #!/usr/bin/perl
    #
    use XML::TreePP;
    use XML::TreePP::XMLPath qw(getElements);
    use Data::Dumper;
    #
    # The XML document data
    my $xmldata=<<XMLEND;
        <level1>
            <level2>
                <level3 attr1="val1" attr2="val2">
                    <attr3>val3</attr3>
                    <attr4/>
                    <attrX>one</attrX>
                    <attrX>two</attrX>
                    <attrX>three</attrX>
                </level3>
                <level3 attr1="valOne"/>
            </level2>
        </level1>
    XMLEND
    #
    # Parse the XML document.
    my $tpp = XML::TreePP->new;
    my $xmldoc = $tpp->parse($xmldata);
    print "Output Test #1\n";
    print Dumper( $xmldoc );
    #
    # Retrieve the multiple same-name elements of the XML document at path "level1/level2/level3"
    my $elements = getElements($xmldoc, 'level1/level2/level3');
    print "Output Test #2\n";
    print Dumper( $elements );
    #
    # Retrieve the elements of the XML document at path "level1/level2/level3[attr3="val3"]"
    $elements = getElements($xmldoc, 'level1/level2/level3[attr3="val3"]');
    print "Output Test #3\n";
    print Dumper( $elements );

Output:

    Output Test #1
    {
      level1 => {
            level2 => {
                  level3 => [
                        {
                          "-attr1" => "val1",
                          "-attr2" => "val2",
                          attr3    => "val3",
                          attr4    => undef,
                          attrX    => ["one", "two", "three"],
                        },
                        { "-attr1" => "valOne" },
                      ],
                },
          },
    }
    Output Test #2
    [
      { attr3 => "val3", attr4 => undef, attrX => ["one", "two", "three"] },
      undef,
    ]
    Output Test #3
    [
      { attr3 => "val3", attr4 => undef, attrX => ["one", "two", "three"] },
    ]
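In Output Test #2 the second level3 node has only an attribute, so its entry is undef. The sketch below reproduces that shape by hand for two shortened literal nodes. It assumes the "-" prefix marks attributes and is meant as an illustration rather than the module's implementation.

```perl
use strict;
use warnings;

# Two nodes shaped like the level3 entries above (shortened for brevity).
my @nodes = (
    { '-attr1' => 'val1', attr3 => 'val3', attr4 => undef },
    { '-attr1' => 'valOne' },
);

# For each node keep the keys without the "-" prefix. A node with no child
# elements at all yields undef rather than an empty hash.
my @elements = map {
    my $node     = $_;
    my %children = map { ( $_ => $node->{$_} ) } grep { !/^-/ } keys %$node;
    %children ? \%children : undef;
} @nodes;

for my $i (0 .. $#elements) {
    my $summary = defined $elements[$i]
        ? join(', ', sort keys %{ $elements[$i] })
        : 'undef';
    print "node $i: $summary\n";
}
```

Callers that iterate over the result should therefore test each entry with defined before dereferencing it.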


AUTHOR

Russell E Glaue, http://russ.glaue.org


SEE ALSO

XML::TreePP

XML::TreePP::XMLPath on Codepin: http://www.codepin.org/project/perlmod/XML-TreePP-XMLPath


COPYRIGHT AND LICENSE

Copyright (c) 2008-2013 Russell E Glaue, Center for the Application of Information Technologies, Western Illinois University. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.