FUDforum: comp.lang.php

Re: CTYPE [message #182675 is a reply to message #182673]

Mon, 26 August 2013 08:25

Thomas 'PointedEars'
Fiver wrote:

> Your original question was answered, but it seems you have a few
> misconceptions about regular expressions. If you are into learning how to do
> things properly, you shouldn't dismiss them out of hand.
>
> ad 1) "they're never clear to me"
>
> That's a fair point, as regexes can quickly become hard to read. There's
> a way around that, however: the /x modifier. Take for example the regex
> in the post you replied to (just as an example, I'm not judging its
> correctness for now):
>
> preg_match("/^[a-z]\d[a-z] \d[a-z]\d$/i", $postcode)
>
> This can also be written as
>
> $regex = '/
> ^ # anchored at the start of the string
> [a-z] # one letter a-z
> \d # one digit 0-9
> [a-z] # one letter a-z
> \s # one character of white space *
> \d # one digit 0-9
> [a-z] # one letter a-z
> \d # one digit 0-9
> $ # anchored at the end of the string
> /ix';
>
> preg_match($regex, $postcode)

JFTR: This applies to *Perl* and *Perl-compatible* regular expressions
(PCRE) only. PHP also supports POSIX Extended REGular expressions (ERE),
which do not have that feature. However, the POSIX ERE functions (ereg*())
have been deprecated since PHP 5.3 in favor of the PCRE functions (preg_*()).

PHP also supports commenting in PCRE like so:

$regex = '/'
. '^(?# anchored at the start of the string)'
. '[a-z](?# one letter a-z)'
. '\\d(?# one digit 0-9)'
. '[a-z](?# one letter a-z)'
. '\\s(?# one character of white space)'
. '\\d(?# one digit 0-9)'
. '[a-z](?# one letter a-z)'
. '\\d(?# one digit 0-9)'
. '$(?# anchored at the end of the string)'
. '/ix';

<http://php.net/manual/en/regexp.reference.comments.php>

(In this case, I would pick the PCRE_EXTENDED variant, too. “(?#…)” is
better suited for short inline comments.)
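
As a quick check that the two commenting styles describe the same pattern, here is a minimal sketch. The sample postal codes and the variable names $extended, $inline and $samples are only illustrative assumptions, not part of the original posts.

```php
<?php
// PCRE_EXTENDED (/x): unescaped whitespace and "# ..." comments are ignored.
$extended = '/
    ^      # anchored at the start of the string
    [a-z]  # one letter a-z
    \d     # one digit 0-9
    [a-z]  # one letter a-z
    \s     # one character of white space
    \d     # one digit 0-9
    [a-z]  # one letter a-z
    \d     # one digit 0-9
    $      # anchored at the end of the string
/ix';

// Inline "(?#...)" comments; these do not need the /x modifier.
$inline = '/'
    . '^(?# anchored at the start of the string)'
    . '[a-z](?# one letter a-z)'
    . '\\d(?# one digit 0-9)'
    . '[a-z](?# one letter a-z)'
    . '\\s(?# one character of white space)'
    . '\\d(?# one digit 0-9)'
    . '[a-z](?# one letter a-z)'
    . '\\d(?# one digit 0-9)'
    . '$(?# anchored at the end of the string)'
    . '/i';

// Hypothetical sample input: two well-formed codes, two malformed ones.
$samples = ['K1A 0B1', 'k1a 0b1', 'K1A0B1', '12345'];

foreach ($samples as $postcode) {
    printf(
        "%-8s extended:%d inline:%d\n",
        $postcode,
        preg_match($extended, $postcode),
        preg_match($inline, $postcode)
    );
}
```

Both patterns should accept the first two samples and reject the last two.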

Another way, which can be combined with either one, is to use another
delimiter. For example, attempting to match an “http:” URI with

'/http:\\/\\/(?:[^\\/\\s]+)\\/?/'

is safe, but rather hard to read. It can be written more readably as

'#http://(?:[^/\\s]+)/?#'

or, with PCRE_EXTENDED enabled,

'! http:// (?:[^/\\s]+) /? !x'

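It is tempting to simplify this further to

'! http:// (?: .+?) /? !x'

but that is not equivalent: because “/?” is optional, the non-greedy “.+?”
is already satisfied by a single character unless more of the pattern
follows it.

Either form can also be inline-commented as suggested above (“#” changed to
“!” because with PCRE_EXTENDED an unescaped “#” starts a comment that runs
to the end of the line).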
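
To see how the delimiter variants behave, here is a small sketch that runs each pattern against the same subject. The subject string and the array keys are made up for illustration. Note that the non-greedy “.+?” variant is included for comparison: unanchored, it stops after a single character because the trailing “/?” is optional.

```php
<?php
// Hypothetical subject string for illustration.
$subject = 'see http://example.com/path for details';

$patterns = [
    'slash'    => '/http:\\/\\/(?:[^\\/\\s]+)\\/?/',
    'hash'     => '#http://(?:[^/\\s]+)/?#',
    'extended' => '! http:// (?:[^/\\s]+) /? !x',
    'lazy'     => '! http:// (?: .+?) /? !x',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    // $m[0] holds the text matched by the whole pattern.
    preg_match($pattern, $subject, $m);
    $results[$name] = $m[0];
    echo str_pad($name, 9), ': ', $m[0], "\n";
}
```

The first three patterns match “http://example.com/”; the lazy one matches only “http://e”.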

> […] a more exact way to match "one space character" would be "[ ]").

Or simply " ", as long as PCRE_EXTENDED is not set; with it, an unescaped
space outside of a character class is ignored, so there you need "[ ]" or
"\ ". The character class has the slight advantage over the literal space
that it does not matter how many spaces you write between the brackets.
However, character class building might prove to be more expensive than the
literal, and even if it looks as if there are only spaces in-between, a tab
character or another whitespace character (a no-break space, for example)
could have slipped in. That is a bug waiting to happen, and one that is
easily spotted with the literal (it just does not match then).
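
A short sketch of how the different ways of writing “one space” behave, with and without PCRE_EXTENDED. The subject 'a b' and the case labels are illustrative assumptions only.

```php
<?php
// Subject with exactly one U+0020 space between the letters.
$subject = 'a b';

$cases = [
    'literal space, no /x'   => '/a b/',
    // With /x the unescaped space is ignored, so this is effectively /ab/.
    'literal space, with /x' => '/a b/x',
    'class [ ], with /x'     => '/a[ ]b/x',
    'escaped space, with /x' => '/a\\ b/x',
    // Double-quoted string: the class contains a real tab, not a space.
    'tab slipped into class' => "/a[\t]b/",
];

$results = [];
foreach ($cases as $label => $pattern) {
    $results[$label] = preg_match($pattern, $subject);
    echo str_pad($label, 24), ': ', $results[$label], "\n";
}
```

Only the first, third and fourth patterns match; the tab inside the brackets looks like a space in most editors but is a different character.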

> ad 2) "they're too slow"
>
> That may have been true in other languages and many, many years ago, but
> it's certainly not the case with PHP today. This regex runs over 2
> million times per second on a single core of my middle-aged laptop. It
> should be more than fast enough, considering I could check every
> Canadian's zip code in under 15 seconds with it... I don't know what
> you're building, but I can guarantee that this won't be the reason for
> any noticeable slow-downs.

It is still true that regular expressions are (or can be) comparatively
expensive with regard to runtime and memory usage, which is why some
(non-PHP) runtime environments disable them by default for security reasons.
However, if you are using regular expressions for *pattern* matching, the
program will likely run faster than if you had used plain string operations,
because the native machine code (of PHP) has been optimized for that
purpose.

(Rule of thumb: Explicitly prevent regular expressions from becoming too
greedy even though in the end they would not match more than they should.
And do not capture what you do not want to process. It is *unnecessary*
backtracking and capturing that makes regular expressions more expensive
than they could be.)
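
The rule of thumb can be illustrated with a rough sketch. The HTML fragments, the URL and all variable names are hypothetical. It makes no timing claims; it only shows what a greedy “.*” and an extra capturing group do compared with their tighter counterparts.

```php
<?php
// Hypothetical sample input for illustration.
$simple   = '<a href="x.html">';
$twoAttrs = '<a href="x.html" title="X">';

// Greedy ".*" first consumes the rest of the line, then backtracks
// to the last double quote it can find.
preg_match('/href="(.*)"/', $simple, $greedySimple);
preg_match('/href="(.*)"/', $twoAttrs, $greedyTwo);

// The negated class stops at the first closing quote, no backtracking needed.
preg_match('/href="([^"]*)"/', $simple, $boundedSimple);
preg_match('/href="([^"]*)"/', $twoAttrs, $boundedTwo);

echo $greedySimple[1], ' | ', $boundedSimple[1], "\n"; // same result here
echo $greedyTwo[1], ' | ', $boundedTwo[1], "\n";       // results differ here

// Capture only what will be processed: (?:...) groups without capturing.
$url = 'http://example.com/path';
preg_match('#(https?)://([^/\s]+)#', $url, $both);       // scheme and host captured
preg_match('#(?:https?)://([^/\s]+)#', $url, $hostOnly); // host captured only

echo count($both), ' vs ', count($hostOnly), " entries in the match array\n";
```

On the simple input both href patterns return the same value, but the greedy one gets there by backtracking; on the second input it also overshoots.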

For example,

preg_match('/foo/', 'foobar') > 0

has no advantage over

strpos('foobar', 'foo') !== false

But

preg_match('/\\bfoo/u', 'foobar') > 0

has one, as it matches “foo” only at the *start* of a *Unicode* word.
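
A minimal sketch comparing the two approaches on a few subjects. The helper names containsFoo() and hasWordStartingWithFoo() are invented for this example, and the scalar type declarations assume PHP 7 or later. The last subject is “éfoo” written as UTF-8 bytes so that the example does not depend on the encoding of the source file.

```php
<?php
// Plain substring search: true if "foo" occurs anywhere.
function containsFoo(string $s): bool
{
    return strpos($s, 'foo') !== false;
}

// Pattern match: true only if "foo" begins a word.
// With /u, "\b" is evaluated on Unicode characters rather than bytes.
function hasWordStartingWithFoo(string $s): bool
{
    return preg_match('/\\bfoo/u', $s) > 0;
}

$subjects = ['foobar', 'barfoo', 'bar foo', "\xC3\xA9foo"];

foreach ($subjects as $s) {
    printf(
        "%-8s strpos:%s preg_match:%s\n",
        $s,
        var_export(containsFoo($s), true),
        var_export(hasWordStartingWithFoo($s), true)
    );
}
```

strpos() finds “foo” in all four subjects, whereas the pattern rejects “barfoo” and “éfoo” because “foo” does not start a word there.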

It should also be noted that (Perl-compatible) regular expressions are
included in one of the ten areas you need to be knowledgeable in (“Strings &
Patterns”), in order to pass the Zend Certified Engineer PHP 5.3 test:

<http://www.zend.com/services/certification/php-5-certification/>

PointedEars
--
When all you know is jQuery, every problem looks $(olvable).
