FUDforum: comp.lang.php » PHP functions to convert markup efficiently

Re: PHP functions to convert markup efficiently [message #183877 is a reply to message #183811]

Sat, 23 November 2013 19:21

James Harris

"James Harris" <james(dot)harris(dot)1(at)gmail(dot)com> wrote in message
news:l6ljfj$fue$1(at)dont-email(dot)me...

> I am looking for a way to mark up text in a way that PHP would be able to
> efficiently and quickly convert to HTML.

In case anyone is interested, here is what I have come up with so far.

The markup is designed to be fast to parse rather than to be beautiful.
However, it doesn't look too bad, IMO. I'll explain the markup first and
then the PHP which carries out the conversion.

This is very much experimental at this stage. I may well have to change any
of this including the tag formats. But it is working code as it stands.

There are simple tags which have a one-to-one translation. Here are some
examples. The markup is on the left and what it translates to on the right.

@(hr) --> <hr>
@(b) --> <b>
@(/b) --> </b>
@(nl) --> <br>
@(at) --> @

For example, "Please @(b)STOP@(/b) here" will print STOP in bold and the
rest non-bold.
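
As an illustration only, the one-to-one tags listed above fit naturally into a lookup table. The names SIMPLE_TAGS and simple_tag() are hypothetical; the posted code dispatches with a switch statement, and an array lookup is swapped in here just for brevity.

```php
<?php
// Hypothetical lookup table for the one-to-one tags listed above.
const SIMPLE_TAGS = [
    'hr' => '<hr>',
    'b'  => '<b>',
    '/b' => '</b>',
    'nl' => '<br>',
    'at' => '@',
];

// Returns the HTML for a simple tag name, or null if the name is unknown.
function simple_tag(string $name): ?string
{
    return SIMPLE_TAGS[$name] ?? null;
}

// "Please @(b)STOP@(/b) here", assembled by hand:
echo 'Please ', simple_tag('b'), 'STOP', simple_tag('/b'), ' here', "\n";
```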

There are markup tags with simple parameters such as these.

@(h,2) --> <h2>
@(/h,2) --> </h2>

And there are tags which are more inclusive such as these.

@(sect,2,Section X) --> <h2>Section X</h2>
@(link,Local Page) --> <a href="Local Page">Local Page</a>
@(link,http://xe.com,XE) --> <a href="http://xe.com">XE</a>
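
A minimal sketch of how the parameterised and inclusive tags above might be rendered once a tag has been split on commas. The function names are hypothetical, clamping the heading level to 1..6 is my assumption, and the parameters are assumed to have been HTML-escaped already.

```php
<?php
// Hypothetical helper: keep heading levels inside the range HTML defines.
function heading_level($raw): int
{
    return max(1, min(6, (int)$raw));
}

// Hypothetical renderer. $parts is the tag already split on commas,
// e.g. ['sect', '2', 'Section X']. Returns null for tags it does not know.
function render_param_tag(array $parts): ?string
{
    switch ($parts[0]) {
        case 'h':    // @(h,2) --> <h2>
            return '<h' . heading_level($parts[1] ?? 1) . '>';
        case '/h':   // @(/h,2) --> </h2>
            return '</h' . heading_level($parts[1] ?? 1) . '>';
        case 'sect': // @(sect,2,Section X) --> <h2>Section X</h2>
            $n = heading_level($parts[1] ?? 1);
            return "<h$n>" . ($parts[2] ?? '') . "</h$n>";
        case 'link': // with one parameter the target doubles as the link text
            $url  = $parts[1] ?? '';
            $text = $parts[2] ?? $url;
            return '<a href="' . $url . '">' . $text . '</a>';
        default:
            return null;
    }
}

echo render_param_tag(['link', 'http://xe.com', 'XE']), "\n";
```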

As you can see, a markup tag is identified by an @ sign followed by an
opening delimiter. The opening delimiter is "(" in all the above cases but
could be a different character. Each opening delimiter character has a
corresponding closing delimiter. For most punctuation characters the closing
delimiter is the same as the opening delimiter but for pairable bracket
characters the logical closing bracket is used. Therefore the following all
mean the same.

@(i)
@[i]
@|i|
@*i*

The point is that the person writing the markup can, and must, choose a closing
delimiter that does not appear in the text between the delimiters. This helps
recognition speed: the complete tag can be isolated without needing
to consider context such as quoted strings.
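
The delimiter rule can be sketched as a small helper. The name closing_delimiter() and the exact set of bracket pairs are assumptions; the rule only says that pairable brackets close with their partner and everything else closes with itself.

```php
<?php
// Hypothetical helper: map an opening delimiter to its closing delimiter.
// "<" is left out of the pair table on purpose: htmlspecialchars() runs
// before tags are parsed, so a literal "<" no longer exists at that point.
function closing_delimiter(string $open): string
{
    static $pairs = ['(' => ')', '[' => ']', '{' => '}'];
    return $pairs[$open] ?? $open;
}

foreach (['(', '[', '|', '*'] as $open) {
    echo '@', $open, 'i', closing_delimiter($open), "\n";
}
```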

I haven't performed timing comparisons but I took Christoph's advice for
speed and chose to use PHP's built-in functions, which are likely written in C.
I try to avoid calling them repeatedly so as to avoid call overhead. As a
result, the markup parsing works as follows. Feel free to criticise.

First, the page of marked-up text has htmlspecialchars() applied and then is
split on @ symbols using a single call to PHP's explode(). This creates an
array of strings which, for the sake of something to name them, I call
Sections. The PHP code is, in essence, as follows.

$contents = file_get_contents($target_page);
$contents = htmlspecialchars($contents, ENT_NOQUOTES);
$sects = explode("@", $contents);
$contents = ""; //Original text no longer needed
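
To see what the split produces, here are the same two calls applied to an in-memory string rather than a file. The wrapper name split_sections() is hypothetical.

```php
<?php
// Escape first, then split on "@", exactly as in the lines above.
function split_sections(string $markup): array
{
    $escaped = htmlspecialchars($markup, ENT_NOQUOTES);
    return explode('@', $escaped);
}

// Element 0 is the text before the first "@"; every later element
// starts with the opening delimiter of a tag.
print_r(split_sections('Please @(b)STOP@(/b) here'));
```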

The first section, $sects[0], is what preceded the first @ sign. It is not
marked up so it is written verbatim and then split off using the following
code.

echo $sects[0];
$sects = array_slice($sects, 1);

Second, for each remaining section the initial character (which followed an
@ sign) is taken as the opening delimiter and the matching closing delimiter is
chosen. Then explode() is called with a limit of 2 to split the section into just
two parts: before and after the closing delimiter. The most important part of
that is

$sectparts = explode($delimiter, substr($sect, 1), 2);

This converts each section into two parts: a tag and some text.
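
A sketch of that step wrapped in a function, with one addition: a fallback for a section whose closing delimiter is missing. The name split_section() and that fallback behaviour are my assumptions; nothing above says how malformed input should be treated.

```php
<?php
// Hypothetical wrapper around the explode(..., 2) call.
// Returns [tag, text]. If the closing delimiter is missing, the tag is
// null and the whole section is handed back as text (an assumption).
function split_section(string $sect): array
{
    if ($sect === '') {
        return [null, ''];
    }
    $pairs = ['(' => ')', '[' => ']', '{' => '}'];   // assumed bracket pairs
    $delimiter = $pairs[$sect[0]] ?? $sect[0];
    $sectparts = explode($delimiter, substr($sect, 1), 2);
    if (count($sectparts) < 2) {
        return [null, $sect];
    }
    return $sectparts;
}

print_r(split_section('(b)STOP'));
```

Because the limit is 2, only the first closing delimiter ends the tag; any later occurrence stays in the text part.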

Third, so that tag parameters can include whatever is necessary, especially
where they include commas in quoted strings, I use PHP's CSV parser as
follows.

$tagparts = str_getcsv($sectparts[0]);

That divides the complete tag into manageable parts. All that's left is to
deal with each part as in

switch ($tagparts[0]) {
    case "at": echo "@"; break;
    case "b": echo "<b>"; break;
    // ... one case for each remaining tag ...
}
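
The CSV split that feeds the switch can be tried on its own. In this sketch the wrapper name tag_parts() is hypothetical, and the separator, enclosure and escape arguments are spelled out rather than left to PHP's defaults. Since the page was escaped with ENT_NOQUOTES, double quotes reach this step intact.

```php
<?php
// Split a tag into its comma-separated parts; quoted fields may contain commas.
function tag_parts(string $tag): array
{
    return str_getcsv($tag, ',', '"', '\\');
}

print_r(tag_parts('link,http://xe.com,"Rates, live"'));
```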

Finally, once the tag has been written the following non-tag text is written
with

echo $sectparts[1];
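
Putting the steps together, here is a compact end-to-end sketch. It is not the posted code: render_markup() is a hypothetical name, it returns the HTML rather than echoing it so that it is easy to test, it handles only a handful of tags, and silently dropping unknown tags is my assumption.

```php
<?php
function render_markup(string $markup): string
{
    $sects = explode('@', htmlspecialchars($markup, ENT_NOQUOTES));
    $html  = array_shift($sects);            // text before the first @
    $pairs = ['(' => ')', '[' => ']', '{' => '}'];   // assumed bracket pairs

    foreach ($sects as $sect) {
        if ($sect === '') {                  // "@@" or a trailing "@"
            continue;
        }
        $delimiter = $pairs[$sect[0]] ?? $sect[0];
        $sectparts = explode($delimiter, substr($sect, 1), 2);
        $tagparts  = str_getcsv($sectparts[0], ',', '"', '\\');

        switch ($tagparts[0]) {
            case 'at': $html .= '@';    break;
            case 'b':  $html .= '<b>';  break;
            case '/b': $html .= '</b>'; break;
            case 'hr': $html .= '<hr>'; break;
            case 'nl': $html .= '<br>'; break;
            // Unknown tags produce no output (an assumption).
        }
        $html .= $sectparts[1] ?? '';        // text that followed the tag
    }
    return $html;
}

echo render_markup('Please @(b)STOP@(/b) here'), "\n";
```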

That's it so far. I may have missed something fundamental but so far it
seems to work well. It is simple and flexible and the code is very short. No
need for a complex package. There are a few functions I would rather have
not had to use but PHP seems to require them. In any case, the code avoids
things which might slow it down such as large packages, char-by-char
processing (except, presumably, in the CSV module) and regular expressions.
So it should be fast as it stands.
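
Since no timings have been taken yet, a rough way to check the speed claim is a micro-benchmark along these lines. The helper time_call(), the test string and the run count are all arbitrary choices of mine, and the figures will vary by machine and PHP version.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn.
function time_call(callable $fn, int $runs = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

// Compare splitting on "@" with explode() against a regular expression.
$page = str_repeat('Please @(b)STOP@(/b) here. ', 2000);
$viaExplode = time_call(function () use ($page) { return explode('@', $page); });
$viaRegex   = time_call(function () use ($page) { return preg_split('/@/', $page); });
printf("explode: %.6f s, preg_split: %.6f s\n", $viaExplode, $viaRegex);
```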

James
