comp.lang.php
Thorny string translation issue. [message #180782] Mon, 18 March 2013 19:24
The Natural Philosopher
The problem.

I want the user to enter raw text and embedded HTML into a database. No
trouble with getting it in, but when getting it out, it needs massaging
for display.

Now I can't find a function in the library that will skip HTML styles
like <b>...</b> or <a...></a> code, and yet still translate e.g. '>' to
'&gt;' etc.

Anyone done this before?


--
Ineptocracy

(in-ep-toc’-ra-cy) – a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.
Re: Thorny string translation issue. [message #180786 is a reply to message #180782] Mon, 18 March 2013 20:26
Christoph Becker
The Natural Philosopher wrote:

> I want the user to enter raw text and embedded HTML into a database. No
> trouble with getting it in, but when getting it out, it needs massaging
> for display.
>
> Now I can't find a function in the library that will skip HTML styles
> like <b>...</b> or <a...></a> code, and yet still translate e.g. '>' to
> '&gt;' etc.

You can strip the tags with strip_tags() (note the $allowable_tags
parameter) and later escape the HTML special characters with
htmlspecialchars() (note the $encoding parameter, which may be necessary
to prevent UTF-7 injection attacks). Simplified:

>>> $str='alpha <b>beta < gamma</b> delta';
>>> htmlspecialchars(strip_tags($str));
'alpha beta &lt; gamma delta'
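
A minimal sketch of the two parameters mentioned above, with a made-up input string. It assumes UTF-8 text and passes the flags and encoding to htmlspecialchars() explicitly rather than relying on defaults. Note that escaping the result also escapes any tags that strip_tags() was told to keep, so this combination only suits output where the remaining markup should be shown literally.

```php
<?php
// Illustrative input; the string is invented for this sketch.
$str = 'alpha <b>beta</b> <i>gamma</i> & delta';

// The second argument lists the tags to keep; all other tags are removed.
$kept = strip_tags($str, '<b>');

// Explicit flags and encoding; every remaining <, > and & is escaped,
// including the <b> tags that strip_tags() left in place.
$escaped = htmlspecialchars($kept, ENT_QUOTES, 'UTF-8');

echo $kept, "\n";    // prints: alpha <b>beta</b> gamma & delta
echo $escaped, "\n"; // prints: alpha &lt;b&gt;beta&lt;/b&gt; gamma &amp; delta
```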

--
Christoph M. Becker
Re: Thorny string translation issue. [message #180787 is a reply to message #180782] Mon, 18 March 2013 20:40
SwissCheese
On 03/18/2013 07:24 PM, The Natural Philosopher wrote:
>
>
> The problem.
>
> I want the user to enter raw text and embedded HTML into a database. No
> trouble with getting it in, but when getting it out, it needs massaging
> for display.
>
> Now I can't find a function in the library that will skip HTML styles
> like <b>...</b> or <a...></a> code, and yet still translate e.g. '>' to
> '&gt;' etc.
>
> Anyone done this before?
>
>

http://www.php.net/manual/en/function.htmlspecialchars.php#101592

--
Norman
Registered Linux user #461062
-Have you been to www.php.net yet?-
Re: Thorny string translation issue. [message #180795 is a reply to message #180786] Tue, 19 March 2013 05:33
The Natural Philosopher
On 19/03/13 00:26, Christoph Becker wrote:
> The Natural Philosopher wrote:
>
>> I want the user to enter raw text and embedded HTML into a database. No
>> trouble with getting it in, but when getting it out, it needs massaging
>> for display.
>>
>> Now I can't find a function in the library that will skip HTML styles
>> like <b>...</b> or <a...></a> code, and yet still translate e.g. '>' to
>> '&gt;' etc.
>
> You can strip the tags with strip_tags() (note the $allowable_tags
> parameter) and later escape the HTML special characters with
> htmlspecialchars() (note the $encoding parameter, which may be necessary
> to prevent UTF-7 injection attacks). Simplified:
How do I put the tags I want back, though?


Specifically consider the example

<b> <= </b>

where I want to display a bold 'arrow'

strip_tags() will just remove the tags I want.

But htmlspecialchars() simply turns all tags into (displayed) raw HTML.


>
>>>> $str='alpha <b>beta < gamma</b> delta';
>>>> htmlspecialchars(strip_tags($str));
> 'alpha beta &lt; gamma delta'
>

yes, but that's not the output I want.




Re: Thorny string translation issue. [message #180796 is a reply to message #180787] Tue, 19 March 2013 05:38
The Natural Philosopher
On 19/03/13 00:40, SwissCheese wrote:
> On 03/18/2013 07:24 PM, The Natural Philosopher wrote:
>>
>>
>> The problem.
>>
>> I want the user to enter raw text and embedded HTML into a database. No
>> trouble with getting it in, but when getting it out, it needs massaging
>> for display.
>>
>> Now I can't find a function in the library that will skip HTML styles
>> like <b>...</b> or <a...></a> code, and yet still translate e.g. '>' to
>> '&gt;' etc.
>>
>> Anyone done this before?
>>
>>
>
> http://www.php.net/manual/en/function.htmlspecialchars.php#101592
>
That of course was my first port of call, but it doesn't do the needed job.

I want a version of that which is intelligent: one that knows which
pieces are HTML tags, and leaves those alone, and which are simply
isolated < and > chars.

I know a lot of sites use things like [URL] and [B] to encode markup,
for this reason I suppose: do a htmlspecialchars() on the text and THEN
replace [ with < and so on. But then what happens if you actually want
to encode a '['?
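
A rough sketch of that bracket-markup idea, assuming UTF-8 input. The function name bbcode_to_html() and the two supported tags are invented for illustration and are not taken from any particular forum package. Everything is escaped first, and only complete, whitelisted pairs are then converted, so a stray '[' is never touched.

```php
<?php
// Hypothetical helper: escape everything, then convert a small whitelist
// of bracket tags back into real HTML.
function bbcode_to_html(string $text): string
{
    // 1. Escape all HTML special characters; no user-supplied tag survives.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // 2. Convert only complete [b]...[/b] pairs.
    $html = preg_replace('~\[b\](.*?)\[/b\]~is', '<b>$1</b>', $html);

    // 3. Convert [url=...]...[/url] for http(s) targets only. The URL is
    //    already escaped, so it cannot break out of the href attribute.
    $html = preg_replace(
        '~\[url=(https?://[^\]\s]+)\](.*?)\[/url\]~is',
        '<a href="$1">$2</a>',
        $html
    );

    return $html;
}

echo bbcode_to_html('[b] <= [/b] and a[0] stays literal'), "\n";
// prints: <b> &lt;= </b> and a[0] stays literal
```

A user who wants to display a literal "[b]...[/b]" pair would still need some escape convention, which this sketch does not provide.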




Re: Thorny string translation issue. [message #180797 is a reply to message #180795] Tue, 19 March 2013 07:45
BootNic
In article <ki9bcd$gu7$1(at)news(dot)albasani(dot)net>, The Natural Philosopher
<tnp(at)invalid(dot)invalid> wrote:

>> The Natural Philosopher wrote:

>>> I want the user to enter raw text and embedded HTML into a database.
>>> No trouble with getting it in, but when getting it out, it needs
>>> massaging for display.

>>> Now I can't find a function in the library that will skip HTML styles
>>> like <b>...</b> or <a...></a> code, and yet still translate e.g. '>'
>>> to '&gt;' etc.

[snip]

> Specifically consider the example

> <b> <= </b>

> where I want to display a bold 'arrow'

http://php.net/manual/en/book.tidy.php

function tUp($content) {
    // Tidy options: emit HTML, no indentation or line wrapping, UTF-8 in and out.
    $config = array(
        'indent' => 0,
        'output-html' => 1,
        'wrap' => 0,
        'doctype' => 'strict',
        'sort-attributes' => 'alpha',
        'output-encoding' => 'utf8',
        'char-encoding' => 'utf8'
    );
    // Without the tidy extension the input is returned unchanged.
    if (extension_loaded("tidy")) {
        $tidy = new tidy;
        $tidy->parseString($content, $config, 'utf8');
        $tidy->cleanRepair();
        // Keep only the contents of <body>, minus the <body> tags themselves.
        $body = $tidy->body();
        $content = $body->value;
        $content = trim(preg_replace("`<(/)?body>`", "", $content));
    }
    return $content;
}
print tUp('<b> <= </b>');

[snip]



--
BootNic Tue Mar 19, 2013 07:45 am
The only thing wrong with immortality is that it tends to go on forever.
*Herb Caen*
Re: Thorny string translation issue. [message #180798 is a reply to message #180782] Tue, 19 March 2013 09:01
Jerry Stuckle
On 3/18/2013 7:24 PM, The Natural Philosopher wrote:
>
>
> The problem.
>
> I want the user to enter raw text and embedded HTML into a database. No
> trouble with getting it in, but when getting it out, it needs massaging
> for display.
>
> Now I can't find a function in the library that will skip HTML styles
> like <b>...</b> or <a...></a> code, and yet still translate e.g. '>' to
> '&gt;' etc.
>
> Anyone done this before?
>
>

Yup. Pay someone who knows how to program to do it. It's well beyond
your capabilities.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: Thorny string translation issue. [message #180801 is a reply to message #180782] Tue, 19 March 2013 08:58
Peter H. Coffin
On Mon, 18 Mar 2013 23:24:31 +0000, The Natural Philosopher wrote:
>
>
> The problem.
>
> I want the user to enter raw text and embedded HTML into a database. No
> trouble with getting it in, but when getting it out, it needs massaging
> for display.
>
> Now I can't find a function in the library that will skip HTML styles
> like <b>...</b> or <a...></a> code, and yet still translate e.g. '>' to
> '&gt;' etc.
>
> Anyone done this before?

Sorry to say, either it's HTML or it's not, and bare angle brackets are
not HTML. There's no way to reliably tokenize only some angles without
serious context analysis, and even a reasonably complex regular
expression isn't going to be able to get EVERYTHING right. Your only
path that I see is to load a DTD and essentially pre-render the document
in memory to handle the real tags, convert everything that's left to
entities, and then compare/build the two copies, taking the tags from
the raw file and the entities from the rendered-and-converted one. And
even that could screw things up at least differently from some other
means of rendering things, so results could vary.
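
For a sense of what a whitelist-based split can and cannot do, here is a minimal sketch, assuming UTF-8 input. The function name escape_except_tags() and the default tag list are invented for illustration. It recognises only exact, attribute-free tags and escapes everything else, so it is a simplification, not the DTD-driven approach described above.

```php
<?php
// Hypothetical helper: keep a fixed whitelist of attribute-free tags and
// escape every other character sequence.
function escape_except_tags(string $text, array $tags = ['b', 'i', 'em', 'strong']): string
{
    // Matches only exact tags such as <b> or </b>; anything carrying
    // attributes (for example <a href=...>) does not match and is escaped.
    $names = implode('|', array_map('preg_quote', $tags));
    $pattern = '~(</?(?:' . $names . ')>)~i';

    // With PREG_SPLIT_DELIM_CAPTURE the pieces alternate between plain
    // text (even index) and whitelisted tag (odd index).
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        $out .= ($i % 2 === 1)
            ? $part
            : htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

echo escape_except_tags('<b> <= </b>'), "\n";
// prints: <b> &lt;= </b>
```

The sketch makes no attempt to check that tags are balanced or properly nested, and links are deliberately left out because their attributes would need real validation.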

--
Surely the 98% of DNA we share with monkeys must be enough to stop
people from sinking this low.
-- Frossie
Re: Thorny string translation issue. [message #180810 is a reply to message #180795] Tue, 19 March 2013 19:56
Christoph Becker
The Natural Philosopher wrote:

> yes, but that's not the output I want.

Sorry, I had misunderstood your request. If you want to keep all
(X)HTML tags and replace special characters not belonging to tags,
BootNic's solution using tidy is appropriate. But be aware that tidy
may not filter possibly malicious user input:

>>> tUp('<img src="" onclick="doSomethingBad()">')
'<img src="" onclick="doSomethingBad()">'

If you need this, you may have a look at <http://htmlpurifier.org/>.
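
A hedged sketch of what that might look like, assuming HTML Purifier is installed and its autoloader has already been loaded (for example through Composer). HTMLPurifier_Config::createDefault(), the HTML.Allowed directive and purify() belong to the library. The wrapper name purify_fragment(), the chosen whitelist and the escape-everything fallback are assumptions of this sketch.

```php
<?php
// Hypothetical wrapper around HTML Purifier with a conservative fallback.
function purify_fragment(string $dirty): string
{
    if (!class_exists('HTMLPurifier')) {
        // Library not available: escape everything rather than emit
        // unfiltered markup (an assumption of this sketch).
        return htmlspecialchars($dirty, ENT_QUOTES, 'UTF-8');
    }

    $config = HTMLPurifier_Config::createDefault();
    // Whitelist of elements and attributes; event handlers such as
    // onclick are not listed and are therefore removed.
    $config->set('HTML.Allowed', 'b,i,a[href]');

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}

echo purify_fragment('<b> <= </b>'), "\n";
```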

--
Christoph M. Becker