DOMDocument HTML problem [message #177104] |
Tue, 21 February 2012 19:00 |
Aaron Gray
Hi, I am trying to take a "summary" of an HTML fragment, truncated to a
number of characters, that has no DOCTYPE, HTML, or BODY elements and may be
incomplete and unbalanced, and convert it to a balanced fragment, still
without DOCTYPE, HTML, or BODY, using DOMDocument and friends.
This is what I have so far (it runs without an Error 500 using the php
command-line tool):
~~~~
<?php

$html = '<div>
<p><a href="#test">foo</a></p>
<hr>
<br>
<div>name</div>
';

$dom = new DOMDocument();
$newdom = new DOMDocument();

$dom->loadHTML($html);

$Elements = $dom->childNodes;

foreach ( $Elements as $Element ) {
    $NewElement = $newdom->createElement($Element->nodeName);

    if ($Element->attributes)
        foreach($Element->attributes as $attribute)
            $NewElement->setAttribute($attribute->name, $attribute->value);

    if ($Element->childNodes)
        foreach($Element->childNodes as $child)
            $NewElement->appendChild( $newdom->importNode($child, true));

    $newdom->appendChild( $NewElement);
}

echo $newdom->saveHTML();
?>
~~~~
It balances the unbalanced <div>, but it seems to add a duplicate
<HTML></HTML> at the beginning of the output.
Also, I need to get the fragment back like a JavaScript DOM innerHTML,
without the <HTML> and <BODY> tags.
Many thanks in advance.
Hope you can help,
Aaron
Re: DOMDocument HTML problem [message #177105 is a reply to message #177104] |
Tue, 21 February 2012 19:14 |
Jerry Stuckle
If your document is not well formed, DOMDocument has to guess at what
the document is supposed to represent, and the results are likely to be
rather unpredictable.
This doesn't mean you need DOCTYPE, <head>, <body>, etc., but things
like unbalanced <div> tags are likely to cause problems.
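To see what the parser actually built from the fragment in the original post, a minimal sketch along these lines lists the top-level nodes and any warnings libxml recorded. It assumes the stock libxml-based DOM extension, and the variable names ($topLevel, $errors, $previous) are illustrative only. When loadHTML() is called without extra options it normally adds a default DOCTYPE as well as <html> and <body> wrappers, and the DOCTYPE node's nodeName is also "html", which would explain an empty extra <html></html> when every top-level node is passed to createElement().

~~~~php
<?php
// Illustrative sketch: inspect what loadHTML() builds from an unbalanced fragment.
$html = '<div>
<p><a href="#test">foo</a></p>
<hr>
<br>
<div>name</div>
';

// Collect parser messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// List each top-level node with its numeric node type.
$topLevel = [];
foreach ($dom->childNodes as $node) {
    $topLevel[] = ['name' => $node->nodeName, 'type' => $node->nodeType];
    printf("%s (nodeType %d)\n", $node->nodeName, $node->nodeType);
}

// Report anything the parser complained about; the list may be empty.
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
~~~~

Comparing each nodeType against XML_ELEMENT_NODE before copying a node is one way to skip the DOCTYPE entry.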
Re: DOMDocument HTML problem [message #177106 is a reply to message #177104] |
Tue, 21 February 2012 20:08 |
Aaron Gray
~~~
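<?php
$html = '<div><div><span>
<p><a href="#test">foo</a></p>
<hr>
<br>
<span>name</span>
';

// Remove runs of two or more whitespace characters, then trim the ends.
$html = trim(preg_replace('/\s\s+/', '', $html));
echo $html . "\n\n";

// Parse the fragment; loadHTML() wraps it in <html><body> and closes open tags.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Select the element children of <body> and print the first one as XML.
$xpath = new DOMXPath($dom);
$body = $xpath->query('/html/body/*');
echo $dom->saveXml($body->item(0));
?>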
~~~
This does the job!
Aaron
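The snippet above serializes only the first element under <body> and uses saveXML(), so empty elements such as <hr> come out in XML form. For an innerHTML-style result that covers every child of <body>, one possible variation is sketched below. The helper name balanceFragment() is hypothetical, and the sketch assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument.

~~~~php
<?php
// Hypothetical helper: balance an HTML fragment and return it without the
// DOCTYPE, <html>, and <body> wrappers that loadHTML() adds.
function balanceFragment($html)
{
    // Keep parser messages out of the output while loading.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser puts the fragment's content under <body>.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize every child of <body> (elements and text) as HTML.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo balanceFragment('<div><p><a href="#test">foo</a></p><hr><br><div>name</div>'), "\n";
~~~~

Looping over childNodes rather than taking item(0) keeps fragments that have more than one top-level element, and text nodes between them, intact.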