DOMDocument HTML problem [message #177104] Tue, 21 February 2012 19:00
Aaron Gray
Hi, I am trying to take an incomplete "summary" (a fixed number of characters)
of an HTML fragment that has no DOCTYPE, HTML, or BODY elements and may be
unbalanced, and convert it to a balanced fragment, still without DOCTYPE,
HTML, or BODY, using DOMDocument and friends.

This is what I have so far (it runs without an Error 500 under the php
command-line tool):

~~~~
<?php

$html = '<div>
<p><a href="#test">foo</a></p>
<hr>
<br>
<div>name</div>
';

$dom = new DOMDocument();
$newdom = new DOMDocument();

$dom->loadHTML($html);

$Elements = $dom->childNodes;

foreach ($Elements as $Element) {

    $NewElement = $newdom->createElement($Element->nodeName);

    if ($Element->attributes)
        foreach ($Element->attributes as $attribute)
            $NewElement->setAttribute($attribute->name, $attribute->value);

    if ($Element->childNodes)
        foreach ($Element->childNodes as $child)
            $NewElement->appendChild($newdom->importNode($child, true));

    $newdom->appendChild($NewElement);
}

echo $newdom->saveHTML();
?>
~~~~

It balances the unbalanced <div>, but it seems to add a duplicate
<HTML></HTML> at the beginning of the output.

Also, I need to get the fragment back the way JavaScript's DOM innerHTML
would return it, without the <HTML> and <BODY> tags.

Many thanks in advance.

Hope you can help,

Aaron
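
A likely explanation for the duplicate <HTML></HTML> reported above, offered as a sketch rather than a confirmed diagnosis: loadHTML() normally adds a default DOCTYPE, so $dom->childNodes holds a document type node as well as the <html> element. The doctype node's nodeName is also "html", so the loop calls createElement('html') for it and produces an empty element. The snippet below lists the top-level nodes so this can be checked on a given PHP/libxml build; the second loop, which skips non-element nodes, is an assumed fix and not something from the thread.

```php
<?php
$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML('<div><p><a href="#test">foo</a></p><hr><br><div>name</div>');

// List every top-level node of the parsed document with its node type.
$topLevel = [];
foreach ($dom->childNodes as $node) {
    $topLevel[] = [$node->nodeName, $node->nodeType];
    printf("%s (nodeType %d)\n", $node->nodeName, $node->nodeType);
}

// Assumed fix for the original loop: copy element nodes only,
// which leaves out the document type node.
$elementsOnly = [];
foreach ($dom->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $elementsOnly[] = $node->nodeName;
    }
}
```

XML_ELEMENT_NODE is 1 and XML_DOCUMENT_TYPE_NODE is 10, so any top-level line printed with nodeType 10 is the doctype rather than a real <html> element.
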
Re: DOMDocument HTML problem [message #177105 is a reply to message #177104] Tue, 21 February 2012 19:14
Jerry Stuckle

If your document is not well formed, DOMDocument has to guess at what
the document is supposed to represent. Results are likely to be rather
unpredictable.

This doesn't mean you need DOCTYPE, <head>, <body>, etc., but things
like unbalanced <div> tags are likely to cause problems.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
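
To see where the parser had to guess, as described in the reply above, libxml's error buffer can be inspected after loading. This is a minimal sketch; the sample fragment is made up for illustration, and which messages appear (if any) depends on the libxml version, so no particular output is assumed.

```php
<?php
// Illustrative fragment (not from the thread): the <span> is never closed.
$html = '<div><span><p>foo</p></div>';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError object with line, column, and message properties.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

An empty list does not prove the input was balanced; the HTML parser closes some open elements silently at the end of input.
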
Re: DOMDocument HTML problem [message #177106 is a reply to message #177104] Tue, 21 February 2012 20:08
Aaron Gray
"Aaron Gray" wrote in message news:9qi7vtFfmvU1(at)mid(dot)individual(dot)net...

> Hi, I am trying to take an incomplete "summary" of a number of characters
> of an HTML fragment without DOCTYPE, HTML, or BODY elements and a possibly
> incomplete unbalanced fragment, and convert it to a balanced fragment
> without DOCTYPE, HTML, or BODY, using DOMDocument and friends.

~~~
<?php

$html = '<div><div><span>
<p><a href="#test">foo</a></p>
<hr>
<br>
<span>name</span>
';

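// Remove runs of two or more whitespace characters, then trim the ends.
$html = trim(preg_replace('/\s\s+/', '', $html));

echo $html . "\n\n";

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// loadHTML() wraps the fragment in <html><body>; select the elements directly under <body>.
$body = $xpath->query('/html/body/*');

// Serialise only the first of those elements (and its subtree), so no <html> or <body> wrapper is output.
echo $dom->saveXml($body->item(0));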

?>
~~~

This does the job!

Aaron
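
The script above serialises only the first element under <body>, and saveXml() writes XML-style output. A variant is sketched below, assuming the fragment may have several top-level nodes and that HTML serialisation is wanted. The helper name balanceFragment is hypothetical, and passing a node to saveHTML() requires PHP 5.3.6 or later (the type declarations additionally need PHP 7).

```php
<?php
// Hypothetical helper, not from the thread: returns the parsed fragment
// the way innerHTML of <body> would, without <html>/<body> wrappers.
function balanceFragment(string $html): string
{
    if (trim($html) === '') {
        return ''; // nothing to parse; newer PHP versions reject an empty source
    }

    $dom = new DOMDocument();

    // Suppress warnings about malformed markup while parsing.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialise every child of <body>, not just the first element.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo balanceFragment('<div><p><a href="#test">foo</a></p><hr><br><div>name</div>'), "\n";
```

Looping over childNodes rather than an XPath query for /html/body/* also keeps top-level text nodes, which may or may not be wanted for a summary.
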