comp.lang.php » Zip Codes ctype? Pregmatch?
Re: Zip Codes ctype? Pregmatch? [message #182648 is a reply to message #182643] Wed, 21 August 2013 12:53
Thomas 'PointedEars'
Norman Peelman wrote:

> On 08/20/2013 01:27 PM, Twayne wrote:
>> I'm attempting to check for US and Canadian zip codes (postal codes).
>> The US is easy; mostly just be sure it's five numerics and except
>> "00000" and "99999". But Canadian is a different story because:
>> It consists of alternating alpha and numeric characters (AnAnAn) but
>> not the entire alphabet. 8 N.A. English letters are not used, as in
>> DFIOQUW AND Z or put another way, they only use 18 letters in their
>> postal codes.
>> I haven't seen a single example in all my research to check if the
>> 1st, 3rd, and 5th characters are alpha and the 2nd, 4th and 6th
>> characters are numeric.
>>
>> I've tried preg_match and strpos without success, likely due to my own
>> weakness with preg_match, and regex creates an incredibly long statement
>> I'm sure it's not right to put upon the servers; they slow down even my
>> local server XAMPP & PHP 5.3 on win 7.
>>
>> Might anyone have a better method?
>>
>> Or know of any functions anywhere that could be modified to be used?
>
> US Zip code:
> [0-9]{5}(-{0,1}[0-9]{4}){0,1}
  ^^^^^     ^^^^^^^^^^    ^^^^^
In Perl-Compatible Regular Expressions (PCRE), as also used by PHP's preg_*
functions, the following shorthands are available:

- “*” for “{0,}”
- “?” for “{0,1}”
- “+” for “{1,}”
- “\d” for “[0-9]” (includes more numeric characters in “UTF-8 mode”)

Thus, the above expression can be simplified to

\d{5}(-?\d{4})?
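A minimal PHP sketch to check that the shorthands change nothing: both spellings are run over the same inputs. The "/" delimiters and the "^...$" anchors are assumptions added here, because preg_match() requires delimiters and the quoted expression is unanchored; the sample strings are made up for illustration.

```php
<?php
// The expression as quoted and its shorthand form, both wrapped in "/"
// delimiters and anchored with ^...$ (assumptions) so whole strings are compared.
$longForm  = '/^[0-9]{5}(-{0,1}[0-9]{4}){0,1}$/';
$shortForm = '/^\d{5}(-?\d{4})?$/';

// Hypothetical sample inputs, not taken from the thread.
$samples = array('12345', '12345-6789', '123456789', '1234', '12345-67', 'ABCDE');

foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf("%-12s long=%d short=%d\n", $sample,
        preg_match($longForm, $sample), preg_match($shortForm, $sample));
}
```

Both columns agree for every sample; note that the optional hyphen also lets "123456789" through.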

However, the specification above says that “00000” and “99999” are _not_
valid U.S. ZIP codes, so, to be exact, neither “[0-9]{5}” nor “\d{5}” alone
suffices; instead you would have to use, for example, a zero-width negative
lookahead:

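$possibleZips = array('00000', '00001', '99998', '99999');
foreach ($possibleZips as $possibleZip)
{
  /* preg_* patterns need delimiters ("/" here); the lookahead rejects
     exactly "00000" and "99999", not every run of 0s and 9s */
  preg_match('/^(?!00000|99999)\\d{5}(?:-?\\d{4})?$/', $possibleZip, $matches);
  var_dump($possibleZip);
  var_dump($matches);
}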

(thanks to Anubhava: <http://stackoverflow.com/a/9609624/855543>)

> Canadian zip code (all one line, don't miss the space!):
> ([A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}
^ ^^^^^^^^
> {1}([0-9]{1}[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})

“{1}” is superfluous in all regular expression flavours (in POSIX BREs the
escaped form “\{1\}” is equally superfluous). Any atom matches exactly once
unless a quantifier following it says otherwise.
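A small PHP sketch illustrating this: the same pattern with and without “{1}” is applied to a few made-up inputs, and the two results always agree.

```php
<?php
// Every atom matches exactly once by default, so "{1}" changes nothing.
$withQuantifier    = '/^A{1}1{1}B{1}$/';
$withoutQuantifier = '/^A1B$/';

// Hypothetical inputs; both columns should always agree.
foreach (array('A1B', 'AA1B', 'A11B', 'A1') as $input) {
    printf("%-5s %d %d\n", $input,
        preg_match($withQuantifier, $input),
        preg_match($withoutQuantifier, $input));
}
```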

In a character class expression, ranges are _not_ delimited by commas.
A comma there is a *literal* comma instead (just as most other special
characters lose their special meaning inside a class, while “-” gains one),
and characters listed more than once have no additional effect:

[A-C,E,G-H,J-N,P,R-T,V,X,Y]

matches the same strings as

[A-CEG-HJ-NPR-TVXY,]

So unless you want to allow commas in ZIP codes, you need to remove them
from the respective character class.
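One way to check this in PHP is to test every printable ASCII character against the class as quoted, the class with the commas collected at the end, and the class without any comma. This is only an illustrative sketch; the variable names are made up.

```php
<?php
// The class as quoted, the same class with its commas collected at the end,
// and the class with the comma removed.
$quoted  = '/^[A-C,E,G-H,J-N,P,R-T,V,X,Y]$/';
$reduced = '/^[A-CEG-HJ-NPR-TVXY,]$/';
$clean   = '/^[A-CEG-HJ-NPR-TVXY]$/';

$matchedByQuoted = array();
for ($code = 32; $code <= 126; $code++) {   // printable ASCII
    $char = chr($code);
    if (preg_match($quoted, $char) !== preg_match($reduced, $char)) {
        echo "Classes differ on '$char'\n";   // never printed
    }
    if (preg_match($quoted, $char)) {
        $matchedByQuoted[] = $char;
    }
}
echo implode('', $matchedByQuoted), "\n";   // the comma plus the 18 letters
var_dump(preg_match($clean, ','));          // int(0): no comma any more
```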

Thus, the above expression would have to be changed, and can be simplified
to

^(?:[A-CEG-HJ-NPR-TVXY]\d){3}$

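(The “^” anchors the match at the start of the string and the “$” at its
end, so the first, third, and fifth characters must be letters and the
second, fourth, and sixth must be digits; without the anchors the expression
would also match inside a longer string. Let \s* follow the “^” if you want
to allow leading whitespace, and precede the “$” with \s* to allow trailing
whitespace. Note that, unlike the quoted expression, this one does not allow
the space between the third and fourth characters.)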
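The effect of the anchors and of “\s*” can be seen with a short PHP sketch. The inputs are samples chosen for illustration; the last one, with a space in the middle, is rejected by all three variants because the simplified expression has no place for the space.

```php
<?php
// The simplified expression with anchors, without anchors, and with \s*
// added to tolerate leading and trailing whitespace.
$anchored   = '/^(?:[A-CEG-HJ-NPR-TVXY]\d){3}$/';
$unanchored = '/(?:[A-CEG-HJ-NPR-TVXY]\d){3}/';
$padded     = '/^\s*(?:[A-CEG-HJ-NPR-TVXY]\d){3}\s*$/';

// Sample inputs chosen for illustration.
foreach (array('K1A0B1', '9K1A0B1', ' K1A0B1 ', 'K1A 0B1') as $input) {
    printf("%-10s anchored=%d unanchored=%d padded=%d\n", "'$input'",
        preg_match($anchored, $input),
        preg_match($unanchored, $input),
        preg_match($padded, $input));
}
```

Without the anchors, "9K1A0B1" is accepted because the expression finds "K1A0B1" starting at the second character.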

Anyhow, if an expression is repeated and the repetition cannot be expressed
with a quantifier as above, then in programming languages like PHP that
support it, code is more readable if you assign the repeated expression to a
variable and have the variable reference expanded:

$cdn_letter = '[A-CEG-HJ-NPR-TVXY]';
/* preg_* functions require delimiters, here "/" */
$pattern = "/^{$cdn_letter}\\d{$cdn_letter}\\d{$cdn_letter}\\d\$/";

[In certain programming languages, libraries such as my JSX:regexp.js [1],
which let you define and use your own character class escape sequences, are
useful because they eliminate the need for variable expansion: "\\p{cdnLetter}".]
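A case where a quantifier really cannot cover the repetition is the optional space between the two halves of a Canadian postal code ("A1A 1A1"). A minimal PHP sketch using variable expansion for that, assuming a single optional space is the only separator you want to allow; the variable name $patternWithSpace is made up for illustration.

```php
<?php
// The 18 letters used in Canadian postal codes, as in the class above.
$cdn_letter = '[A-CEG-HJ-NPR-TVXY]';

// Expanding the variable lets us insert an optional space (" ?") between
// the third and fourth characters, which a "(...){3}" quantifier cannot do.
$patternWithSpace = "/^{$cdn_letter}\\d{$cdn_letter} ?\\d{$cdn_letter}\\d\$/";

// Sample inputs chosen for illustration.
foreach (array('K1A 0B1', 'K1A0B1', 'K1A  0B1', 'D1A 0B1') as $input) {
    printf("%-10s %d\n", "'$input'", preg_match($patternWithSpace, $input));
}
```

Two spaces are rejected, as is the unused letter "D".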

Note that expansion/repetition is semantically different from expression
backreferences:

$pattern2 = "/^([A-CEG-HJ-NPR-TVXY])\\d\\1\\d\\1\\d\$/";

$pattern would match "A1B2C3"; $pattern2 would match "A1A2A3", but not
"A1B2C3".
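The difference can be verified with a short PHP sketch. Both patterns are written with "/" delimiters and "^...$" anchors, which preg_match() needs (delimiters) and which restrict the comparison to whole strings (anchors). Note "\\1" in a double-quoted PHP string: a bare "\1" would be read as an octal escape.

```php
<?php
$cdn_letter = '[A-CEG-HJ-NPR-TVXY]';

// Expansion: the class is copied three times, so each letter position may
// hold any of the 18 letters independently.
$pattern = "/^{$cdn_letter}\\d{$cdn_letter}\\d{$cdn_letter}\\d\$/";

// Backreference: \1 must repeat exactly the letter captured by group 1.
$pattern2 = "/^([A-CEG-HJ-NPR-TVXY])\\d\\1\\d\\1\\d\$/";

foreach (array('A1B2C3', 'A1A2A3') as $input) {
    printf("%s: pattern=%d pattern2=%d\n", $input,
        preg_match($pattern, $input), preg_match($pattern2, $input));
}
```

$pattern accepts both strings, since "A1A2A3" is just the special case of three equal letters.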


PointedEars
___________
[1] <http://PointedEars.de/scripts/test/regexp> p.
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)