Re: CTYPE [message #182675 is a reply to message #182673] Mon, 26 August 2013 08:25
Thomas 'PointedEars'
Fiver wrote:

> Your original question was answered, but it seems you have a few
> misconceptions about regular expressions. If you are into learning how to
> do things properly, you shouldn't dismiss them out of hand.
>
> ad 1) "they're never clear to me"
>
> That's a fair point, as regexes can quickly become hard to read. There's
> a way around that, however: the /x modifier. Take for example the regex
> in the post you replied to (just as an example, I'm not judging its
> correctness for now):
>
> preg_match("/^[a-z]\d[a-z] \d[a-z]\d$/i", $postcode)
>
> This can also be written as
>
> $regex = '/
> ^ # anchored at the start of the string
> [a-z] # one letter a-z
> \d # one digit 0-9
> [a-z] # one letter a-z
> \s # one character of white space *
> \d # one digit 0-9
> [a-z] # one letter a-z
> \d # one digit 0-9
> $ # anchored at the end of the string
> /ix';
>
> preg_match($regex, $postcode)

JFTR: This applies to *Perl* and *Perl-compatible* regular expressions
(PCRE) only. PHP also supports POSIX Extended Regular Expressions (ERE),
which do not have that feature. However, the POSIX ERE functions (ereg*())
have been deprecated since PHP 5.3 in favor of the PCRE functions
(preg_*()).

PHP also supports commenting in PCRE like so:

$regex = '/'
. '^(?# anchored at the start of the string)'
. '[a-z](?# one letter a-z)'
. '\\d(?# one digit 0-9)'
. '[a-z](?# one letter a-z)'
. '\\s(?# one character of white space)'
. '\\d(?# one digit 0-9)'
. '[a-z](?# one letter a-z)'
. '\\d(?# one digit 0-9)'
. '$(?# anchored at the end of the string)'
. '/ix';

<http://php.net/manual/en/regexp.reference.comments.php>

(In this case, I would pick the PCRE_EXTENDED variant, too. “(?#…)” is
better suited for short inline comments.)
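
To check that both commenting styles describe the same pattern, here is a
minimal sketch that runs them side by side. The sample strings and the
variable names are assumptions made up for illustration, and the inline
comments are shortened compared to the listing above.

```php
<?php
// Variant 1: PCRE_EXTENDED (/x). Whitespace in the pattern is ignored and
// an unescaped "#" starts a comment that runs to the end of the line.
$extended = '/
    ^      # anchored at the start of the string
    [a-z]  # one letter a-z
    \d     # one digit 0-9
    [a-z]  # one letter a-z
    \s     # one character of white space
    \d     # one digit 0-9
    [a-z]  # one letter a-z
    \d     # one digit 0-9
    $      # anchored at the end of the string
/ix';

// Variant 2: inline (?#...) comments. These do not need the /x modifier.
$inline = '/'
    . '^(?# start of the string)'
    . '[a-z]\\d[a-z](?# letter, digit, letter)'
    . '\\s(?# one character of white space)'
    . '\\d[a-z]\\d(?# digit, letter, digit)'
    . '$(?# end of the string)'
    . '/i';

// Hypothetical sample input, not taken from the thread.
$samples = array('K1A 0B1', 'k1a 0b1', 'K1A0B1', 'K1A 0B');
$results = array();
foreach ($samples as $sample) {
    $results[$sample] = array(
        preg_match($extended, $sample),
        preg_match($inline, $sample),
    );
    printf("%-8s extended=%d inline=%d\n",
        $sample, $results[$sample][0], $results[$sample][1]);
}
```

Both variants should agree on every sample; only the first two strings have
the letter-digit-letter, white space, digit-letter-digit shape.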


Another way, which can be combined with either one, is to use another
delimiter. For example, attempting to match an “http:” URI with

'/http:\\/\\/(?:[^\\/\\s]+)\\/?/'

is safe, but rather hard to read. It can be written more readably as

'#http://(?:[^/\\s]+)/?#'

or, with PCRE_EXTENDED enabled,

'! http:// (?:[^/\\s]+) /? !x'

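This can be shortened further to

'! http:// (?: .+?) /? !x'

but note that this is no longer equivalent: with nothing after it to anchor
the match, the lazy “.+?” consumes only a single character. Either form can
be inline-commented as suggested above (“#” changed to “!” because with
PCRE_EXTENDED an unescaped “#” starts a comment that runs to the end of the
line).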
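
The delimiter variants can be compared directly. The following is only a
sketch; the subject string and the variable names are made up for
illustration. It also runs the lazy “.+?” form so that its match can be
compared with the others.

```php
<?php
// The same "http:" pattern with three different delimiters.
$slashes  = '/http:\\/\\/(?:[^\\/\\s]+)\\/?/';
$hashes   = '#http://(?:[^/\\s]+)/?#';
$extended = '! http:// (?:[^/\\s]+) /? !x';

// Hypothetical subject string.
$subject = 'See http://example.com/ for details';

$found = array();
foreach (array($slashes, $hashes, $extended) as $pattern) {
    preg_match($pattern, $subject, $m);
    $found[] = $m[0];
}
print_r($found);

// The lazy variant: ".+?" takes as little as possible, and the optional
// "/?" after it does not force it to take more.
preg_match('! http:// (?: .+?) /? !x', $subject, $lazy);
echo $lazy[0], "\n";
```

The first three patterns find the same text; the lazy one stops after the
first character that follows “http://”.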

> […] a more exact way to match "one space character" would be "[ ]").

Or simply " ", as long as PCRE_EXTENDED is not enabled (with it, unescaped
whitespace outside of a character class is ignored). The character class
has the slight advantage over the literal space that it does not matter how
many spaces you write between the brackets. However, building a character
class might prove to be more expensive than matching a literal, and even if
it looks as if there are only spaces in-between, a tab or a no-break space
could have slipped in: a bug waiting to happen that is easily spotted with
the literal space (it just does not match then).
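
A small sketch of how the three ways of writing the space behave, with and
without PCRE_EXTENDED. The variable names and subject strings are
assumptions for illustration only.

```php
<?php
$withSpace = 'K1A 0B1';
$withTab   = "K1A\t0B1";

// Without /x, a literal space in the pattern matches exactly one space.
$literal = preg_match('/^[a-z]\\d[a-z] \\d[a-z]\\d$/i', $withSpace);

// With /x, unescaped spaces outside a character class are ignored, so this
// pattern effectively requires "K1A0B1" and fails on "K1A 0B1".
$literalExtended = preg_match('/^ [a-z]\\d[a-z]   \\d[a-z]\\d $/ix', $withSpace);

// Inside a character class the space stays significant, even with /x.
$classExtended = preg_match('/^ [a-z]\\d[a-z] [ ] \\d[a-z]\\d $/ix', $withSpace);

// "\s" accepts any white-space character, "[ ]" accepts only the space.
$tabWithS     = preg_match('/^[a-z]\\d[a-z]\\s\\d[a-z]\\d$/i', $withTab);
$tabWithClass = preg_match('/^[a-z]\\d[a-z][ ]\\d[a-z]\\d$/i', $withTab);

var_dump($literal, $literalExtended, $classExtended, $tabWithS, $tabWithClass);
```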

> ad 2) "they're too slow"
>
> That may have been true in other languages and many, many years ago, but
> it's certainly not the case with PHP today. This regex runs over 2
> million times per second on a single core of my middle-aged laptop. It
> should be more than fast enough, considering I could check every
> Canadian's zip code in under 15 seconds with it... I don't know what
> you're building, but I can guarantee that this won't be the reason for
> any noticeable slow-downs.

It is still true that regular expressions are (or can be) comparatively
expensive with regard to runtime and memory usage, which is why some
(non-PHP) runtime environments disable them by default for security reasons.
However, if you are using regular expressions for *pattern* matching, the
program will likely run faster than if you had used plain string operations,
because the matching is then done in native machine code (of PHP) that has
been optimized for that purpose.
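
Anyone who wants to measure this on their own machine could start from a
rough sketch like the following. The iteration count and the sample postcode
are arbitrary assumptions, and the timing will vary with hardware and PHP
version, so no particular figure should be expected.

```php
<?php
// The postcode pattern quoted earlier in this thread.
$regex = '/^[a-z]\\d[a-z] \\d[a-z]\\d$/i';
$postcode = 'K1A 0B1';     // hypothetical sample value
$iterations = 100000;      // arbitrary; raise it for steadier numbers

$matched = 0;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $matched += preg_match($regex, $postcode);
}
$elapsed = microtime(true) - $start;

// Guard against a zero interval on very coarse clocks.
printf("%d matches in %.4f s (about %.0f per second)\n",
    $matched, $elapsed, $iterations / max($elapsed, 1e-9));
```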

(Rule of thumb: Explicitly prevent regular expressions from becoming too
greedy even though in the end they would not match more than they should.
And do not capture what you do not want to process. It is *unnecessary*
backtracking and capturing that makes regular expressions more expensive
than they could be.)
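
As an illustration of that rule of thumb, here is a sketch with a made-up
subject string. Both patterns find the same text, but the second neither
overshoots and backtracks nor captures anything beyond the whole match.

```php
<?php
// Hypothetical subject string containing one quoted value.
$subject = 'name="value" and a long tail without any further quotes';

// Greedy and capturing: ".*" first runs to the end of the subject and then
// backtracks to the last quote; the parentheses store an extra substring.
preg_match('/"(.*)"/', $subject, $greedy);

// Restricted and non-capturing: the negated class cannot run past the
// closing quote, and nothing but the whole match is stored.
preg_match('/"[^"]*"/', $subject, $tight);

print_r($greedy);   // whole match plus one captured group
print_r($tight);    // whole match only
```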

For example,

preg_match('/foo/', 'foobar') > 0

has no advantage over

strpos('foobar', 'foo') !== false

But

preg_match('/\\bfoo/u', 'foobar') > 0

does, because it matches “foo” only at the *start* of a *Unicode* word.
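
The difference shows up as soon as “foo” is not at the start of a word. A
minimal sketch; the two helper functions are hypothetical names introduced
only for this comparison:

```php
<?php
// Plain substring test: true wherever "foo" occurs.
function contains_foo($s)
{
    return strpos($s, 'foo') !== false;
}

// Pattern test: true only if "foo" begins at a word boundary.
function has_word_starting_with_foo($s)
{
    return preg_match('/\\bfoo/u', $s) > 0;
}

foreach (array('foobar', 'barfoo', 'bar foo') as $s) {
    printf("%-8s strpos=%s preg_match=%s\n",
        $s,
        var_export(contains_foo($s), true),
        var_export(has_word_starting_with_foo($s), true));
}
```

Only “barfoo” separates the two: it contains “foo”, but not at the start of
a word.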

It should also be noted that (Perl-compatible) regular expressions are
included in one of the ten areas you need to be knowledgeable in (“Strings &
Patterns”) in order to pass the Zend Certified Engineer PHP 5.3 test:

<http://www.zend.com/services/certification/php-5-certification/>


PointedEars
--
When all you know is jQuery, every problem looks $(olvable).