Home » Imported messages » comp.lang.php » preg_match() oddities and question
Re: preg_match() oddities and question [message #176065 is a reply to message #176064] Tue, 22 November 2011 12:12
Sandman
In article <jag24m$4nj$1(at)softins(dot)clara(dot)co(dot)uk>,
tony(at)mountifield(dot)org (Tony Mountifield) wrote:

>> So I have this regexp:
>>
>> if (preg_match("/^(.*?)\s*(\d*?)\s*([A-Z,a-z,-]*?)$/", $search, $m)){
>
> You don't need the commas in the character class, unless you want to
> match a literal comma, in which case you only need it once.

Right, thanks :)
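
For reference, a small sketch of what the commas were doing. The variable names are made up for illustration; the two classes are the original one and the same class with the commas dropped.

```php
<?php
// Original class: "," is just another member, so a literal comma is accepted.
$withCommas = '/^[A-Z,a-z,-]+$/';
// Same class without the commas: letters and "-" only.
$withoutCommas = '/^[A-Za-z-]+$/';

var_dump(preg_match($withCommas, 'a,b'));     // int(1): the comma matches
var_dump(preg_match($withoutCommas, 'a,b'));  // int(0): the comma is rejected
var_dump(preg_match($withoutCommas, 'a-b'));  // int(1): a trailing "-" is still literal
```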

>> $streetname = uc_words($m[1]);
>> $streetnumber = trim($m[2]);
>> $streetletter = strtoupper($m[3]);
>> $search = trim($streetname . SPACE . $streetnumber .
>> $streetletter);
>> }
>>
>> The desired result is taking the input ($search) and splitting it into
>> its parts as an address, right? $search can be, for example, "foo
>> street 34", "longstreet 45b", "longstreet 45 b" or just "longstreet".
>
> What about "foo street"? (i.e. with a space, but no number)

Exactly, that gets this:

Array
(
[0] => foo street
[1] => foo
[2] =>
[3] => street
)

Which is incorrect. In fact, the last group SHOULD be defined as
([A-Za-z]{0,1}) but that still messes it up like:

Array
(
[0] => foo street
[1] => foo stree
[2] =>
[3] => t
)

So I've tried variations for that as well.

<snip>

> And you would also get:
>
> Array
> (
> [0] => foo street
> [1] => foo
> [2] =>
> [3] => street
> )
>
>> As you can see, the last group "([A-Z,a-z,-]*?)" matches the entire
>> search term since there are no digits and the first group is
>> non-greedy. And if I make the first group greedy, "longstreet" is
>> matched correctly, but it also catches the entire "longstreet 45b"
>> when searching for that.
>
> Yes, you need to define your rules more closely. Not at the regex level,
> but actually at the logic/decision level. If you can make rules that
> can unambiguously specify how all kinds of input should be parsed,
> then you can look at how to represent that in regexes. You might need
> some additional logic to operate on the parsed result.

What you're basically suggesting is a series of regexps to find out
what "style" an address is given in, and then parse out the parts?
Because I'm not sure how I would be able to do it without a series of
if/else preg_match() calls.

>> Also, when searching for a term in swedish characters, I get this:
>>
>> Array
>> (
>> [0] => vikavägen
>> [1] => vikavä
>> [2] =>
>> [3] => gen
>> )
>>
>> Which is quite odd to me: why isn't "vikavägen" matched the same
>> (undesired) way as "longstreet"? I have tried the /u modifier, and
>> made sure that it was utf8-encoded, but it didn't make a difference
>> (incoming encoding is ISO 8859-1).
>>
>> Why the difference, and how do I correctly parse out parts as needed?
>
> That's because ä is not in the set A-Za-z. If you want a character class
> that properly recognises locale-specific letters, you need to change your
> character class above to this:
>
> [[:alpha:]\-]
>
> Hope this helps!

That explains the difference, thank you very much for that. Now I
still need to figure out a global parse routine or criteria for
parsing out the address parts...
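
A quick check of the character-class point, as a sketch under a few assumptions: it uses the Unicode property \p{L} together with /u rather than the [[:alpha:]] class suggested above (a swapped-in technique, chosen because it does not depend on the locale), it converts the ISO 8859-1 input to UTF-8 first, and it needs the mbstring extension for mb_convert_encoding(). The variable names are illustrative.

```php
<?php
// "\xE4" is "ä" in ISO 8859-1, the incoming encoding mentioned in the post.
$latin1 = "vikav\xE4gen";
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// ASCII-only class: "ä" is outside A-Za-z, so the whole-string match fails.
$asciiOnly = preg_match('/^[A-Za-z-]+$/', $utf8);

// \p{L} matches any Unicode letter; /u makes PCRE read the pattern and
// the subject as UTF-8, so "ä" counts as one letter.
$unicode = preg_match('/^[\p{L}-]+$/u', $utf8);

var_dump($asciiOnly); // int(0)
var_dump($unicode);   // int(1)
```

The /u modifier on its own changes nothing here as long as the class is still A-Za-z, which would fit with it not making a difference earlier.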

--
Sandman[.net]