On 2013-08-21 8:53 AM, Thomas 'PointedEars' Lahn wrote:
> Norman Peelman wrote:
>
>> On 08/20/2013 01:27 PM, Twayne wrote:
>>> I'm attempting to check for US and Canadian zip codes (postal codes).
>>> The US is easy; mostly just be sure it's five numerics and except
>>> "00000" and "99999". But Canadian is a different story because:
....
>>
>> US Zip code:
>> [0-9]{5}(-{0,1}[0-9]{4}){0,1}
>  ^^^^^     ^^^^^^^^^^    ^^^^^
> In Perl-Compatible Regular Expressions (PCRE), as also used by PHP's preg_*
> functions, the following shorthands are available:
>
> - “*” for “{0,}”
> - “?” for “{0,1}”
> - “+” for “{1,}”
> - “\d” for “[0-9]” (includes more numeric characters in “UTF-8 mode”)
>
> Thus, the above expression can be simplified to
>
> \d{5}(-?\d{4})?
>
> However, the specification above says that “00000” and “99999” are _not_
> valid U.S. ZIP codes, so to be exact you cannot just use either “[0-9]{5}”
> or “\d{5}”; but you would have to use, for example, a zero-width negative
> lookahead:
>
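> $possibleZips = array('00000', '00001', '99998', '99999');
> foreach ($possibleZips as $possibleZip)
> {
>   // preg_match() requires pattern delimiters ("/" here); the lookahead
>   // rejects only an all-zeros or all-nines five-digit part
>   preg_match('/^(?!0{5}|9{5})\\d{5}(?:-?\\d{4})?$/', $possibleZip, $matches);
>   var_dump($possibleZip);
>   var_dump($matches);
> }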
>
> (thanks to Anubhava: <http://stackoverflow.com/a/9609624/855543>)
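
The lookahead check above can be wrapped in a small function. This is only a sketch: the helper name isUsZip() is hypothetical, and the D modifier (so that "$" does not match before a trailing newline) is an addition that is not in the quoted post. Note that preg_match() expects delimiters around the pattern.

```php
<?php
// Hypothetical helper; the name isUsZip() does not come from the thread.
function isUsZip(string $candidate): bool
{
    // "/" delimiters are required by preg_match().
    // (?!0{5}|9{5}) rejects a five-digit part of all zeros or all nines.
    // The D modifier keeps "$" from matching before a trailing newline.
    $pattern = '/^(?!0{5}|9{5})\d{5}(?:-?\d{4})?$/D';
    return preg_match($pattern, $candidate) === 1;
}

foreach (array('00000', '00001', '90009', '99999', '12345-6789') as $zip) {
    printf("%-10s %s\n", $zip, isUsZip($zip) ? 'accepted' : 'rejected');
}
```

Listing the two forbidden values explicitly means codes that merely consist of zeros and nines, such as "90009", are still accepted.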
>
>> Canadian zip code (all one line, don't miss the space!):
>> ([A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}
> ^ ^^^^^^^^
>> {1}([0-9]{1}[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})
>
> “{1}” is superfluous in all regular expression flavours (in BRE the escaped
> variant is superfluous). An expression that matches, matches exactly one
> time unless a following quantifier says otherwise.
>
> In a character class expression, ranges are _not_ delimited by commas.
> A comma there is a *literal* comma instead (most other special characters
> lose their special meaning there, while “-” gains one), and repetitions
> are ignored:
>
> [A-C,E,G-H,J-N,P,R-T,V,X,Y]
>
> matches the same strings as
>
> [A-CEG-HJ-NPR-TVXY,]
>
> So unless you want to allow commas in ZIP codes, you need to remove them
> from the respective character class.
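
A quick way to see the effect of the commas is to test single characters against both classes. This is a minimal sketch; the variable names are made up for illustration.

```php
<?php
// Character class as originally posted (with commas) and with commas removed.
$withCommas    = '/^[A-C,E,G-H,J-N,P,R-T,V,X,Y]$/';
$withoutCommas = '/^[A-CEG-HJ-NPR-TVXY]$/';

foreach (array('A', 'D', ',') as $char) {
    printf(
        "%s  with commas: %d  without commas: %d\n",
        $char,
        preg_match($withCommas, $char),    // 1 if $char matches, 0 otherwise
        preg_match($withoutCommas, $char)
    );
}
```

Only the first class accepts the literal comma; both treat the letters the same way.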
>
> Thus, the above expression would have to be changed, and can be simplified
> to
>
> ^(?:[A-CEG-HJ-NPR-TVXY]\d){3}$
>
> (The “^” makes sure that the second, fourth, and so on, character must be a
> digit. Let \s* follow it if you want to allow leading whitespace. Likewise,
> let \s* precede “$” if you want to allow trailing whitespace.)
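
As a sketch of the whitespace handling described above: the pattern below allows leading and trailing whitespace via \s*. The optional single space in the middle is an assumption on my part, based on the "don't miss the space!" remark in the original expression; drop the " ?" if the space must not appear.

```php
<?php
$letter = '[A-CEG-HJ-NPR-TVXY]';

// \s* after "^" and before "$" tolerates surrounding whitespace.
// " ?" tolerates one space between the two halves (an assumption, see above).
$pattern = "/^\\s*{$letter}\\d{$letter} ?\\d{$letter}\\d\\s*\$/";

foreach (array('K1A 0B1', 'K1A0B1', '  K1A 0B1 ', 'K1A  0B1', 'D1A 0B1') as $code) {
    printf("[%s] %s\n", $code, preg_match($pattern, $code) ? 'accepted' : 'rejected');
}
```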
>
> Anyhow, if an expression is repeated, and this repetition cannot be handled
> with a quantifier like above, then in programming languages like PHP that
> allow this, code is easier to read if you assign the repeated expression to
> a variable, and have the variable reference expanded:
>
> $cdn_letter = '[A-CEG-HJ-NPR-TVXY]';
> $pattern = "^{$cdn_letter}\\d{$cdn_letter}\\d{$cdn_letter}\\d\$";
>
> [In certain programming languages, libraries like my JSX:regexp.js [1],
> which allow you to define and use your own character class escape
> sequences, eliminate the need for variable expansion: "\\p{cdnLetter}".]
>
> Note that expansion/repetition is semantically different from expression
> backreferences:
>
> $pattern2 = "([A-CEG-HJ-NPR-TVXY])\\d\\1\\d\\1\\d";
>
> $pattern would match "A1B2C3"; $pattern2 would match "A1A2A3", but not
> "A1B2C3".
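
The difference can be checked directly. In this sketch both patterns get "/" delimiters and anchors so that they can be passed to preg_match(); the anchors on the backreference variant are an addition for comparability and are not in the quoted $pattern2.

```php
<?php
$cdn_letter = '[A-CEG-HJ-NPR-TVXY]';

// Expansion: each position may hold any letter from the class.
$expanded = "/^{$cdn_letter}\\d{$cdn_letter}\\d{$cdn_letter}\\d\$/";

// Backreference: \1 must repeat exactly the letter captured by group 1.
$backref = "/^({$cdn_letter})\\d\\1\\d\\1\\d\$/";

foreach (array('A1B2C3', 'A1A2A3') as $code) {
    printf(
        "%s  expanded: %d  backreference: %d\n",
        $code,
        preg_match($expanded, $code),
        preg_match($backref, $code)
    );
}
```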
>
>
> PointedEars
> ___________
> [1] <http://PointedEars.de/scripts/test/regexp>
>
Woof! A veritable cornucopia of information which I've already dedicated
to a file on my hard drive! I wasn't aware of most of that and it's
going to be really handy soon's I understand it all, for now and the
future.
One slight correction: the Canadian valid letters are:
abc e gh jklmn p rst vxy .
Not sure where it went astray; if you need clarification visit the
Canadian Postal Code reference; don't have the URL itself. Besides, it's
always best to verify ANY information from any source on the 'net.
If you happen to know the Canadian system at all, the fuller breakdown is:
$aRegion = array(
    'nl' => 'a',
    'ns' => 'b',
    'pe' => 'c',
    'nb' => 'e',
    'qc' => array('g', 'h', 'j'),
    'on' => array('k', 'l', 'm', 'n', 'p'),
    'mb' => 'r',
    'sk' => 's',
    'ab' => 't',
    'bc' => 'v',
    'nt' => 'x',
    'nu' => 'x',
    'yt' => 'y'
);
Also verifiable at the Canadian Postal website, including a map.
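
For what it's worth, a table like that can be used to look up the region(s) for a code's first letter. A rough sketch only: the helper name regionsForPostalCode() is made up, and it returns a list because 'x' is shared by 'nt' and 'nu'.

```php
<?php
$aRegion = array(
    'nl' => 'a', 'ns' => 'b', 'pe' => 'c', 'nb' => 'e',
    'qc' => array('g', 'h', 'j'),
    'on' => array('k', 'l', 'm', 'n', 'p'),
    'mb' => 'r', 'sk' => 's', 'ab' => 't', 'bc' => 'v',
    'nt' => 'x', 'nu' => 'x', 'yt' => 'y'
);

// Hypothetical helper: returns every region key whose letter list contains
// the first letter of $postalCode (compared case-insensitively).
function regionsForPostalCode(string $postalCode, array $regions): array
{
    $first = strtolower(substr(ltrim($postalCode), 0, 1));
    $found = array();
    foreach ($regions as $region => $letters) {
        // (array) turns a single letter such as 'a' into array('a').
        if ($first !== '' && in_array($first, (array) $letters, true)) {
            $found[] = $region;
        }
    }
    return $found;
}

var_dump(regionsForPostalCode('K1A 0B1', $aRegion));
var_dump(regionsForPostalCode('X0A 0H0', $aRegion));
```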
Thanks much!
Twayne