Re: PHP functions to convert markup efficiently [message #183877 is a reply to message #183811]
Sat, 23 November 2013 19:21
James Harris
"James Harris" <james(dot)harris(dot)1(at)gmail(dot)com> wrote in message
news:l6ljfj$fue$1(at)dont-email(dot)me...
> I am looking for a way to mark up text in a way that PHP would be able to
> efficiently and quickly convert to HTML.
In case anyone is interested, here is what I have come up with so far.
The markup is designed to be fast to parse rather than to be beautiful.
However, it doesn't look too bad, IMO. I'll explain the markup first and
then the PHP which carries out the conversion.
This is very much experimental at this stage. I may well have to change any
of this including the tag formats. But it is working code as it stands.
There are simple tags which have a one-to-one translation. Here are some
examples. The markup is on the left and what it translates to on the right.
@(hr) --> <hr>
@(b) --> <b>
@(/b) --> </b>
@(nl) --> <br>
@(at) --> @
For example, "Please @(b)STOP@(/b) here" will print STOP in bold and the
rest non-bold.
There are markup tags with simple parameters such as these.
@(h,2) --> <h2>
@(/h,2) --> </h2>
And there are tags which are more inclusive such as these.
@(sect,2,Section X) --> <h2>Section X</h2>
@(link,Local Page) --> <a href="Local Page">Local Page</a>
@(link,http://xe.com,XE) --> <a href="http://xe.com">XE</a>
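To make the table above concrete, here is a minimal sketch of a function that turns the already-split parts of one tag into HTML. The name render_tag(), the default heading level of 1 and the treatment of unknown tags are assumptions for illustration only, not part of the scheme described here.

```php
<?php
// Hypothetical helper: turns the comma-separated parts of one tag into HTML.
// Only the tags listed in the post are handled.
function render_tag(array $parts): string
{
    $name = $parts[0] ?? "";
    switch ($name) {
        case "hr": return "<hr>";
        case "b":  return "<b>";
        case "/b": return "</b>";
        case "nl": return "<br>";
        case "at": return "@";
        // Assumption: a missing level falls back to 1.
        case "h":  return "<h" . (int) ($parts[1] ?? 1) . ">";
        case "/h": return "</h" . (int) ($parts[1] ?? 1) . ">";
        case "sect":
            $level = (int) ($parts[1] ?? 1);
            $title = $parts[2] ?? "";
            return "<h$level>$title</h$level>";
        case "link":
            // With one parameter the target doubles as the link text.
            $href = $parts[1] ?? "";
            $text = $parts[2] ?? $href;
            return "<a href=\"$href\">$text</a>";
        default:
            // Assumption: unknown tags are written back rather than dropped.
            return "@(" . implode(",", $parts) . ")";
    }
}

echo render_tag(["sect", "2", "Section X"]), "\n";
echo render_tag(["link", "http://xe.com", "XE"]), "\n";
```

The sketch does no escaping of its own. It relies on htmlspecialchars() having already been applied to the whole page, and a double quote inside a link target would still need separate handling.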
As you can see, a markup tag is identified by an @ sign followed by an
opening delimiter. The opening delimiter is "(" in all the above cases but
could be a different character. Each opening delimiter character has a
corresponding closing delimiter. For most punctuation characters the closing
delimiter is the same as the opening delimiter but for pairable bracket
characters the logical closing bracket is used. Therefore the following all
mean the same.
@(i)
@[i]
@|i|
@*i*
The point is that the person writing the markup can, and must, choose a
delimiter pair whose closing delimiter does not appear in the text between
the delimiters. This helps recognition speed: the complete tag can be
isolated without needing to consider context such as quoted strings.
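One possible way to look up the closing delimiter is a single strtr() call, sketched below. The function name closing_delimiter() and the exact set of bracket pairs are assumptions. "<" is left out on purpose: if htmlspecialchars() has already run, a "<" in the input has become "&lt;" and can no longer act as a one-character delimiter.

```php
<?php
// Hypothetical helper: returns the closing delimiter for an opening one.
function closing_delimiter(string $open): string
{
    // strtr() with three arguments maps characters position by position and
    // leaves every other character unchanged, so "|" stays "|".
    return strtr($open, "([{", ")]}");
}

// Prints the four equivalent spellings of the italic tag, one per line.
foreach (["(", "[", "|", "*"] as $open) {
    echo "@", $open, "i", closing_delimiter($open), "\n";
}
```

This prints @(i), @[i], @|i| and @*i*, matching the four forms listed above.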
I haven't performed timing comparisons but I took Christoph's advice for
speed and chose to use PHP's built-in functions, which are likely written in
C. I try to avoid calling them repeatedly so as to avoid call overhead. As a
result, the markup parsing works as follows. Feel free to criticise.
First, the page of marked-up text has htmlspecialchars() applied and then is
split on @ symbols using a single call to PHP's explode(). This creates an
array of strings which, for the sake of something to name them, I call
Sections. The PHP code is, in essence, as follows.
$contents = file_get_contents($target_page);
$contents = htmlspecialchars($contents, ENT_NOQUOTES);
$sects = explode("@", $contents);
$contents = ""; //Original text no longer needed
The first section, $sects[0], is what preceded the first @ sign. It is not
marked up so it is written verbatim and then split off using the following
code.
echo $sects[0];
$sects = array_slice($sects, 1);
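To see what the split produces, here is a small sketch on a literal string instead of a file. The helper name split_sections() is made up for the example, and array_shift() is used as one alternative to the echo plus array_slice() pair, since it removes and returns the first element in one call.

```php
<?php
// Hypothetical helper mirroring the two snippets above: returns the leading
// plain text and the list of sections that each followed an @ sign.
function split_sections(string $contents): array
{
    $contents = htmlspecialchars($contents, ENT_NOQUOTES);
    $sects = explode("@", $contents);
    $lead = array_shift($sects); // text before the first @, never markup
    return [$lead, $sects];
}

[$lead, $sects] = split_sections("Please @(b)STOP@(/b) here");
echo $lead, "\n";
var_export($sects);
```

Here $lead is "Please " and the two remaining sections are "(b)STOP" and "(/b) here". Text without any @ sign yields an empty section list, because explode() always returns at least one element.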
Second, for each remaining section the initial character (which followed an
@ sign) is taken as the opening delimiter and the matching closing delimiter
is looked up. Then explode() is called with a limit of 2 to split the section
into just two parts: before and after the closing delimiter. The most
important part of that is
$sectparts = explode($delimiter, substr($sect, 1), 2);
This converts each section into two parts: a tag and some text.
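The sketch below applies that step to a single section and also covers a case not discussed above: when the closing delimiter never appears, explode() returns only one element, so there is no second part to read. Returning null for that case is an assumption, as are the function name split_section() and the bracket pairs.

```php
<?php
// Hypothetical helper: splits one section into [tag, following text].
// Returns null when the section is empty or has no closing delimiter.
function split_section(string $sect): ?array
{
    if ($sect === "") {
        return null; // an @ with nothing after it
    }
    // Assumption: brackets pair up, every other character closes itself.
    $delimiter = strtr($sect[0], "([{", ")]}");
    $sectparts = explode($delimiter, substr($sect, 1), 2);
    return count($sectparts) === 2 ? $sectparts : null;
}

var_export(split_section("(b)STOP"));
var_export(split_section("(b STOP"));
```

Because of the limit of 2, later occurrences of the delimiter stay in the text part: the section "|i|text|more" splits into the tag "i" and the text "text|more".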
Third, so that tag parameters can include whatever is necessary, especially
where they include commas in quoted strings, I use PHP's CSV parsing function
as follows.
$tagparts = str_getcsv($sectparts[0]);
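A short sketch of what str_getcsv() returns for a plain tag and for one with a quoted, comma-containing parameter. The wrapper name tag_parts() and the sample title are made up. The delimiter, enclosure and escape arguments are spelled out with their traditional default values, since recent PHP versions complain when the escape argument is left implicit.

```php
<?php
// Hypothetical wrapper around str_getcsv() for one tag's contents.
function tag_parts(string $tag): array
{
    return str_getcsv($tag, ",", '"', "\\");
}

print_r(tag_parts('link,http://xe.com,XE'));
print_r(tag_parts('sect,2,"Costs, fees and taxes"'));
```

The second call yields three parts, with "Costs, fees and taxes" kept whole and its surrounding quotes removed. The quotes reach str_getcsv() intact because htmlspecialchars() was called with ENT_NOQUOTES, which leaves quote characters unconverted.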
That divides the complete tag into manageable parts. All that's left is to
deal with each part as in
switch ($tagparts[0]) {
    case "at": echo "@"; break;
    case "b": echo "<b>"; break;
    // ... one case per remaining tag ...
}
Finally, once the tag has been written the following non-tag text is written
with
echo $sectparts[1];
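Putting the steps together, the whole pass might look roughly like the sketch below. It is not the actual code: it returns the HTML instead of echoing it, knows only three tags, and the function name convert_markup() and the fallbacks for stray @ signs and unknown tags are assumptions.

```php
<?php
// A compact sketch of the whole pass under the assumptions stated above.
function convert_markup(string $text): string
{
    $sects = explode("@", htmlspecialchars($text, ENT_NOQUOTES));
    $html = array_shift($sects);          // text before the first @
    foreach ($sects as $sect) {
        if ($sect === "") {               // stray @ with nothing after it
            $html .= "@";
            continue;
        }
        $close = strtr($sect[0], "([{", ")]}");
        $sectparts = explode($close, substr($sect, 1), 2);
        if (count($sectparts) < 2) {      // no closing delimiter: keep as text
            $html .= "@" . $sect;
            continue;
        }
        $tagparts = str_getcsv($sectparts[0], ",", '"', "\\");
        switch ($tagparts[0]) {
            case "at": $html .= "@"; break;
            case "b":  $html .= "<b>"; break;
            case "/b": $html .= "</b>"; break;
            // Assumption: unknown tags are written back unchanged.
            default:   $html .= "@" . $sect[0] . $sectparts[0] . $close; break;
        }
        $html .= $sectparts[1];           // the non-tag text after the tag
    }
    return $html;
}

echo convert_markup("Please @(b)STOP@(/b) here <now>"), "\n";
// prints: Please <b>STOP</b> here &lt;now&gt;
```

Collecting the output in a string makes the function easy to test. Echoing each piece as it is produced, as in the snippets above, avoids building the full page in memory.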
That's it so far. I may have missed something fundamental but so far it
seems to work well. It is simple and flexible and the code is very short. No
need for a complex package. There are a few functions I would rather have
not had to use but PHP seems to require them. In any case, the code avoids
things which might slow it down such as large packages, char-by-char
processing (except, presumably, in the CSV module) and regular expressions.
So it should be fast as it stands.
James