HTML::Element - Class for objects that represent HTML
       elements


SYNOPSIS

        require HTML::Element;
        $a = HTML::Element->new('a', href => 'http://www.oslonett.no/');
        $a->push_content("Oslonett AS");

        $tag = $a->tag;
        $tag = $a->starttag;
        $tag = $a->endtag;
        $ref = $a->attr('href');

        $links = $a->extract_links();

        print $a->as_HTML;



DESCRIPTION

       Objects of the HTML::Element class can be used to
       represent elements of HTML.  These objects have attributes
       and content.  The content is an array of text segments and
       other HTML::Element objects.  Thus a tree of HTML::Element
       objects as nodes can represent the syntax tree for an HTML
       document.

       The following methods are available:

       $h = HTML::Element->new('tag', 'attrname' => 'value',...)
           The object constructor.  Takes a tag name as argument.
           Optionally, allows you to specify initial attributes
           at object creation time.

       $h->tag()
           Returns (optionally sets) the tag name for the
           element.  The tag is always converted to lower case.

       $h->starttag()
           Returns the complete start tag for the element,
           including the leading "<", the attributes, and the
           trailing ">".

       $h->endtag()
           Returns the complete end tag.  Includes leading "</"
           and the trailing ">".

       $h->parent([$newparent])
           Returns (optionally sets) the parent for this element.

       $h->implicit([$bool])
           Returns (optionally sets) the implicit attribute.
           This attribute is used to indicate that the element
           was not originally present in the source, but was
           inserted in order to conform to HTML structure.

       $h->is_inside('tag',...)
           Returns true if this tag is contained inside one of
           the specified tags.
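
           A minimal sketch of the implicit flag and of the
           containment test described above.  It assumes the
           containment test is the is_inside() method of
           HTML::Element; the tree and the variable names are
           made up for illustration.

```perl
use strict;
use warnings;
use HTML::Element;

# Build body > p > em by hand (illustrative tree).
my $body = HTML::Element->new('body');
my $p    = HTML::Element->new('p');
my $em   = HTML::Element->new('em');
$body->push_content($p);
$p->push_content($em);
$em->push_content('important');

# Mark <body> as an element that was not in the source text.
$body->implicit(1);
my $is_implicit = $body->implicit ? 1 : 0;

# Ask whether <em> sits somewhere below one of the listed tags.
my $in_p     = $em->is_inside('p')           ? 1 : 0;
my $in_table = $em->is_inside('table', 'ul') ? 1 : 0;

print "implicit=$is_implicit in_p=$in_p in_table=$in_table\n";
```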

       $h->pos()
           Returns (and optionally sets) the current position.
           The position is a reference to an HTML::Element object
           that is part of the tree that has the current object
           as root.  This restriction is not enforced when
           setting pos(), but unpredictable things will happen if
           this is not true.

       $h->attr('attr', [$value])
           Returns (and optionally sets) the value of some
           attribute.
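
           As a small sketch, reading and then changing an
           attribute might look like this.  The text above does
           not say what a setting call returns, so the example
           reads the attribute again rather than relying on the
           return value; the second URL is only a placeholder.

```perl
use strict;
use warnings;
use HTML::Element;

my $link = HTML::Element->new('a', href => 'http://www.oslonett.no/');

# Read the current value.
my $before = $link->attr('href');

# Set a new value, then read it back.
$link->attr('href', 'http://www.example.com/');
my $after = $link->attr('href');

print "before: $before\nafter:  $after\n";
```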

       $h->content()
           Returns the content of this element.  The content is
           represented as a reference to an array of text
           segments and references to other HTML::Element
           objects.

       $h->is_empty()
           Returns true if there is no content.

       $h->insert_element($element, $implicit)
           Inserts a new element at current position and updates
           pos() to point to the inserted element.  Returns
           $element.
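
           The interplay between insert_element() and pos() can
           be sketched as follows.  This is an illustration
           only: the tags are arbitrary, and the $implicit
           argument is simply left out.

```perl
use strict;
use warnings;
use HTML::Element;

my $root = HTML::Element->new('html');
my $body = HTML::Element->new('body');
my $p    = HTML::Element->new('p');

# Each insert goes in at the current position, and the position
# then moves to the new element, so these two calls nest.
$root->insert_element($body);
$root->insert_element($p);

my $pos_tag    = $root->pos->tag;    # innermost inserted element
my $parent_tag = $p->parent->tag;    # <p> ended up inside <body>

# Move the position back to <body> and add a sibling of <p>.
$root->pos($body);
$root->insert_element(HTML::Element->new('h1'));
my $body_children = scalar @{ $body->content };

print "pos=$pos_tag parent=$parent_tag children=$body_children\n";
```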

       $h->push_content($element_or_text,...)
           Adds to the content of the element.  Each argument
           should be a text segment (scalar) or a reference to
           an HTML::Element object.

       $h->delete_content()
           Clears the content.
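
           A short sketch of the content methods working
           together: push_content() to build mixed content,
           content() to walk it, and delete_content() to clear
           it.  The element names are arbitrary.

```perl
use strict;
use warnings;
use HTML::Element;

my $p    = HTML::Element->new('p');
my $bold = HTML::Element->new('b');
$bold->push_content('bold');

# Mix text segments and a child element.
$p->push_content('Some ', $bold, ' text');

# content() gives an array reference; text segments are plain
# scalars, child elements are object references.
my @kinds = map { ref $_ ? 'element:' . $_->tag : 'text' } @{ $p->content };
my $had_content = $p->is_empty ? 0 : 1;

$p->delete_content;
my $empty_after = $p->is_empty ? 1 : 0;

print "@kinds\n";
```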

       $h->delete()
           Frees memory associated with the element and all
           children.  This is needed because Perl's reference
           counting cannot reclaim the tree on its own, since
           the tree contains circular references.

       $h->traverse(\&callback, [$ignoretext])
           Traverse the element and all of its children.  For
           each node visited, the callback routine is called with
           the node, a startflag and the depth as arguments.  If
           the $ignoretext parameter is true, then the callback
           will not be called for text content.  The flag is 1
           when we enter a node and 0 when we leave the node.
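
           The callback protocol can be sketched like this.  The
           list structure and the log format are invented for
           the example; the callback returns a true value so
           that the traversal continues into the children.

```perl
use strict;
use warnings;
use HTML::Element;

# Build <ul> with two <li> children (illustrative tree).
my $ul = HTML::Element->new('ul');
for my $text ('one', 'two') {
    my $li = HTML::Element->new('li');
    $li->push_content($text);
    $ul->push_content($li);
}

# Log every visit: elements on entry and exit, text once.
my @log;
$ul->traverse(sub {
    my ($node, $start, $depth) = @_;
    my $indent = '  ' x $depth;
    if (ref $node) {
        push @log, $indent . ($start ? '<' : '</') . $node->tag . '>';
    }
    else {
        push @log, $indent . "text: $node";
    }
    return 1;    # true: keep going
});

# With $ignoretext set, only elements reach the callback.
my $elements = 0;
$ul->traverse(sub { $elements++ if $_[1]; 1 }, 1);

print "$_\n" for @log;
```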

           If the returned value from the callback is false then
           we will not traverse the children.

       $h->extract_links([@wantedTypes])
           Returns links found by traversing the element and all
           of its children.  The return value is a reference to
           an array.  Each element of the array is a reference to
           an array with 2 values: the link value and a reference
           to the corresponding element.

           You might specify that you just want to extract some
           types of links.  For instance if you only want to
           extract <a href="..."> and <img src="..."> links you
           might code it like this:

             for (@{ $e->extract_links(qw(a img)) }) {
                 my ($link, $linkelem) = @$_;
                 ...
             }
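
           For a self-contained variant of the loop above, the
           following sketch builds a tiny tree by hand and
           extracts its links.  The image path is a placeholder,
           and only the first two values of each entry are used.

```perl
use strict;
use warnings;
use HTML::Element;

my $body   = HTML::Element->new('body');
my $anchor = HTML::Element->new('a',   href => 'http://www.oslonett.no/');
my $img    = HTML::Element->new('img', src  => 'logo.gif');
$anchor->push_content('Oslonett AS');
$body->push_content($anchor, $img);

# All links below <body>.
my @all;
for (@{ $body->extract_links() }) {
    my ($link, $linkelem) = @$_;
    push @all, $linkelem->tag . ' -> ' . $link;
}

# Only links that come from <a> elements.
my @anchors = map { $_->[0] } @{ $body->extract_links('a') };

print "$_\n" for @all;
```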


       $h->dump()
           Prints the element and all its children to STDOUT.
           Mainly useful for debugging.  The structure of the
           document is shown by indentation (no end tags).

       $h->as_HTML()
           Returns a string (the HTML document) that represents
           the element and its children.
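
           The difference between the two output methods can be
           sketched as follows: dump() prints an outline, while
           as_HTML() hands back a string.  The exact spacing and
           letter case of the output may vary between releases,
           so no particular output is shown here.

```perl
use strict;
use warnings;
use HTML::Element;

my $link = HTML::Element->new('a', href => 'http://www.oslonett.no/');
my $bold = HTML::Element->new('b');
$bold->push_content('Oslonett');
$link->push_content($bold, ' AS');

# dump() writes an indented outline of the tree to STDOUT.
$link->dump;

# as_HTML() returns the markup as a string instead of printing it.
my $html = $link->as_HTML;
print $html;
```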


BUGS

       If you want to free the memory associated with a tree
       built of HTML::Element nodes then you will have to delete
       it explicitly.  The reason for this is that Perl currently
       has no proper garbage collector, but depends on reference
       counts in the objects.  This scheme fails because the
       parse tree contains circular references (parents have
       references to their children and children have a reference
       to their parent).
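
       As a rough sketch of what this means in practice, the
       tree below is released with an explicit delete() call
       once it is no longer needed.  The tags are arbitrary, and
       the root variable is not used again after the call.

```perl
use strict;
use warnings;
use HTML::Element;

my $root  = HTML::Element->new('div');
my $child = HTML::Element->new('span');
$root->push_content($child);

# The cycle: $root holds $child in its content, and $child
# points back at $root through parent().
my $had_cycle = ($child->parent->tag eq 'div') ? 1 : 0;

# Break the cycle explicitly when the tree is no longer needed.
$root->delete;

print "cycle existed: $had_cycle\n";
```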


SEE ALSO

       the HTML::AsSubs manpage


COPYRIGHT

       Copyright 1995-1997 Gisle Aas.

       This library is free software; you can redistribute it
       and/or modify it under the same terms as Perl itself.