Extracting multiple matches from a string using preg_replace [message #178582]

Mon, 09 July 2012 08:50

jwcarlton
Messages: 76
Registered: December 2010

Karma: 0

Member

I'm working with a message board database that already has a bunch of YouTube links in the comments, and I'm trying to replace all of the links with a new alternate.

The existing strings are like:

$this_comment = '<a href="" target="_new">http://www.youtube.com/watc...vidid</a> <a href="" target="_new">http://www.youtube.com/watc...vidid_2</a>';

Notice that this string has 2 separate YouTube links.

If you're not familiar, YouTube has several possible link formats, so using parse_url() doesn't really work:

youtube.com/v/{vidid}
youtube.com/vi/{vidid}
youtube.com/?v={vidid}
youtube.com/?vi={vidid}
youtube.com/watch?v={vidid}
youtube.com/watch?vi={vidid}
youtu.be/{vidid}
youtube.com/v/{vidid}?feature=autoshare&version=3&autohide=1&autoplay=1

I've written the regex to find the ID and replace the link correctly, but it only works with the first link that it finds. How do I make it work with all of the matching links in the string?

Here's what I have:

// Fetch the VIDID
$this_id = preg_replace("#.*?<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*(.*?)[&.+]*\" target=\"_new\">.*?<\/a>.*#",
"$2", $this_comment);

// I'm not sure why preg_replace isn't catching the extra variables;
// I thought the [&.+]* would do this? Either way, this is a workaround:
list($this_id) = explode("&", $this_id);

// Replace link
if ($this_id) {
    $new_link = "Example replacement: $this_id";

    $this_comment = preg_replace("#<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*" . $this_id . "[&.+]*\" target=\"_new\">.*?<\/a>#",
        "$new_link", $this_comment);
}


Re: Extracting multiple matches from a string using preg_replace [message #178583 is a reply to message #178582]

Mon, 09 July 2012 12:44

Captain Paralytic
Messages: 204
Registered: September 2010

Karma: 0

Senior Member

On Jul 9, 9:50 am, Jason C <jwcarl...@gmail.com> wrote:
> I'm working with a message board database that already has a bunch of YouTube links in the comments, and I'm trying to replace all of the links with a new alternate.
>
> The existing strings are like:
>
> $this_comment = '<a href="" target="_new">http://www.youtube.com/watc...vidid</a> <a href="" target="_new">http://www.youtube.com/watc...vidid_2</a>';
>
> Notice that this string has 2 separate YouTube links.
>
> If you're not familiar, YouTube has several possible link formats, so using parse_url() doesn't really work:
>
> youtube.com/v/{vidid}
> youtube.com/vi/{vidid}
> youtube.com/?v={vidid}
> youtube.com/?vi={vidid}
> youtube.com/watch?v={vidid}
> youtube.com/watch?vi={vidid}
> youtu.be/{vidid}
> youtube.com/v/{vidid}?feature=autoshare&version=3&autohide=1&autoplay=1
>
> I've written the regex to find the ID and replace the link correctly, but it only works with the first link that it finds. How do I make it work with all of the matching links in the string?
>
> Here's what I have:
>
> // Fetch the VIDID
> $this_id = preg_replace("#.*?<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*(.*?)[&.+]*\" target=\"_new\">.*?<\/a>.*#",
> "$2", $this_comment);
>
> // I'm not sure why preg_replace isn't catching the extra variables;
> // I thought the [&.+]* would do this? Either way, this is a workaround:
> list($this_id) = explode("&", $this_id);
>
> // Replace link
> if ($this_id) {
> $new_link = "Example replacement: $this_id";
>
> $this_comment = preg_replace("#<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*" . $this_id . "[&.+]*\" target=\"_new\">.*?<\/a>#",
> "$new_link", $this_comment);

Well I'm very confused by all this. First of all, why are you using
preg_replace to extract the vidid? I would have thought that a job
better suited to preg_match.

Next, in your string assigned to $this_comment, the first vidid is
different to the other 2, so why are you expecting $this_id to match
all of them?


Re: Extracting multiple matches from a string using preg_replace [message #178584 is a reply to message #178582]

Mon, 09 July 2012 13:29

Peter H. Coffin
Messages: 245
Registered: September 2010

Karma: 0

Senior Member

On Mon, 9 Jul 2012 01:50:39 -0700 (PDT), Jason C wrote:

> I'm working with a message board database that already has a bunch of
> YouTube links in the comments, and I'm trying to replace all of the
> links with a new alternate.
>
> The existing strings are like:
>
> $this_comment = '<a href="" target="_new">http://www.youtube.com/watc...vidid</a> <a href="" target="_new">http://www.youtube.com/watc...vidid_2</a>';
>
> Notice that this string has 2 separate YouTube links.
>
> If you're not familiar, YouTube has several possible link formats, so
> using parse_url() doesn't really work:
>
> youtube.com/v/{vidid}
> youtube.com/vi/{vidid}
> youtube.com/?v={vidid}
> youtube.com/?vi={vidid}
> youtube.com/watch?v={vidid}
> youtube.com/watch?vi={vidid}
> youtu.be/{vidid}
> youtube.com/v/{vidid}?feature=autoshare&version=3&autohide=1&autoplay=1
>
>
> I've written the regex to find the ID and replace the link correctly,
> but it only works with the first link that it finds. How do I make it
> work with all of the matching links in the string?

That's the drawback to using preg_replace() for this. You can't capture
all the bits you want to extract because you *must* enumerate them.
preg_match_all() returns an array of matches, which is what you want if
you don't know how many you're going to get back going in.
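
As a rough illustration of that point, here is a minimal sketch of preg_match_all() collecting every ID in one call. The sample comment, the video IDs, and the simplified pattern are all assumptions made up for this example; they are not the original poster's data or regex.

```php
<?php
// Hypothetical comment with two links; the IDs are invented for illustration.
$this_comment =
    '<a href="http://www.youtube.com/watch?v=abc123" target="_new">one</a> ' .
    '<a href="http://youtu.be/xyz789" target="_new">two</a>';

// Simplified pattern (an assumption for this sketch, covering only two URL forms).
$pattern = '#<a href="http://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)'
         . '([^&"]+)[^"]*" target="_new">.*?</a>#';

// preg_match_all() returns the number of full matches. By default $matches is
// filled in pattern order: $matches[0] holds the full matches, $matches[1]
// holds whatever the first capture group matched in each of them.
$count = preg_match_all($pattern, $this_comment, $matches);

echo $count, "\n";                       // 2
echo implode(', ', $matches[1]), "\n";   // abc123, xyz789
```

Because the result is an array, the number of links does not need to be known in advance.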

> Here's what I have:
>
> // Fetch the VIDID
> $this_id = preg_replace("#.*?<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*(.*?)[&.+]*\" target=\"_new\">.*?<\/a>.*#",
> "$2", $this_comment);
   ^^ -- enumerated result
>
> // I'm not sure why preg_replace isn't catching the extra variables;
> // I thought the [&.+]* would do this? Either way, this is a workaround:
> list($this_id) = explode("&", $this_id);

Define "catching" in this context. If you want it back, you need to
paren-tag it so it goes into an enumerated output slot.
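
To make the "enumerated slot" idea concrete: only the parenthesised parts of a pattern are numbered, and only those can be referenced as $1, $2, and so on in the replacement. The following is a minimal sketch, assuming a simplified pattern and a made-up video ID (neither comes from the original post).

```php
<?php
// Hypothetical link with extra query parameters after the ID.
$link = '<a href="http://www.youtube.com/watch?v=abc123&feature=related" target="_new">video</a>';

// Group 1 captures the ID, group 2 the optional trailing parameters.
// Without the outer parentheses of group 2, that text would still be matched
// and consumed, but it would not be available as $2 in the replacement.
$pattern = '#<a href="http://www\.youtube\.com/watch\?v=([^&"]+)((?:&[^"]*)?)" target="_new">.*?</a>#';

$result = preg_replace($pattern, 'id=$1 extra=$2', $link);

echo $result, "\n"; // id=abc123 extra=&feature=related
```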

> // Replace link
> if ($this_id) {
>     $new_link = "Example replacement: $this_id";
>
>     $this_comment = preg_replace("#<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*" . $this_id . "[&.+]*\" target=\"_new\">.*?<\/a>#",
>         "$new_link", $this_comment);
> }


Re: Extracting multiple matches from a string using preg_replace [message #178586 is a reply to message #178583]

Mon, 09 July 2012 18:58

jwcarlton
Messages: 76
Registered: December 2010

Karma: 0

Member

On Monday, July 9, 2012 8:44:06 AM UTC-4, Captain Paralytic wrote:
> Well I'm very confused by all this. First of all, why are you using
> preg_replace to extract the vidid? I would have thought that a job
> better suited to preg_match.

Probably just a lack of knowledge on my part. I thought that preg_match was used to find if the regex was true or false, and then preg_replace would be used to replace whatever.

From Peter's reply, I don't think that either of them is the right function. But for the sake of my own learning, how would I have modified my script (catching only one link) to use preg_match instead of preg_replace?

And, if preg_replace works, then what's the advantage? Speed?
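
For reference, preg_match() does more than report whether a pattern matched: its optional third argument receives the matched text and the capture groups. A minimal sketch for the single-link case follows, assuming a simplified pattern and an invented video ID rather than the regex from the first post.

```php
<?php
// Hypothetical sample comment containing a single link (the video ID is made up).
$this_comment = '<a href="http://www.youtube.com/watch?v=abc123&feature=related" target="_new">video</a>';

// Simplified pattern (an assumption, not the original poster's regex).
$pattern = '#<a href="http://(?:www\.)?youtube\.com/watch\?v=([^&"]+)[^"]*" target="_new">.*?</a>#';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
// $matches[0] is the full match, $matches[1] the first parenthesised group.
if (preg_match($pattern, $this_comment, $matches) === 1) {
    $this_id = $matches[1];
} else {
    $this_id = null;
}

echo $this_id, "\n"; // abc123
```

Compared with the preg_replace() approach, nothing has to be matched around the link with leading and trailing `.*`, and a failed match is easy to tell apart from a successful one.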

> Next, in your string assigned to $this_comment, the first vidid is
> different to the other 2, so why are you expecting $this_id to match
> all of them?

No, that was the point; they're not going to match, so I need to modify the script to replace ALL of the existing links with the ID that's in that link.

That's why I turned to you guys. My only thought was to put the script in a function, then use a while() loop to keep running the function until there were no more links. I couldn't get it to work, though, and I didn't like the idea of using a loop on it, anyway, so I thought you guys might have a better suggestion.


Re: Extracting multiple matches from a string using preg_replace [message #178587 is a reply to message #178584]

Mon, 09 July 2012 19:00

jwcarlton
Messages: 76
Registered: December 2010

Karma: 0

Member

&feature=related

Returns this as $this_id:

123456&feature=related

when I just want the "123456".


Re: Extracting multiple matches from a string using preg_replace [message #178590 is a reply to message #178584]

Tue, 10 July 2012 09:08

jwcarlton
Messages: 76
Registered: December 2010

Karma: 0

Member

On Monday, July 9, 2012 9:29:15 AM UTC-4, Peter H. Coffin wrote:
> That's the drawback to using preg_replace() for this. You can't capture
> all the bits you want to extract because you *must* enumerate them.
> preg_match_all() returns an array of matches, which is what you want if
> you don't know how many you're going to get back going in.

Just a note for anyone else reading this later, preg_match_all() did work perfectly. I changed:

$this_id = preg_match(...);

to this:

preg_match_all("#<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*(.*?)[&.+]*\" target=\"_new\">.*?<\/a>#m",
$this_comment, $matches);

This gives me a multidimensional array of $matches, where $matches[2] is the array that holds the values from $2.

So after finding the array, it's a simple matter of putting the second preg_replace() in a foreach loop:

foreach ($matches[2] as $this_id) {
    // I'm still not sure why $this_id is keeping the other params
    list($this_id) = explode("&", $this_id);

    $this_comment = preg_replace(...);
}
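
A possible alternative to the match-then-loop approach is preg_replace_callback(), which finds and replaces every link in a single pass. This is only a sketch: the pattern below is a hypothetical simplification meant to cover the URL forms listed in the first post, and the sample comment and IDs are invented.

```php
<?php
// Hypothetical comment with two differently formatted links.
$this_comment =
    '<a href="http://www.youtube.com/watch?v=abc123&feature=related" target="_new">one</a> ' .
    'and <a href="http://youtu.be/xyz789" target="_new">two</a>';

// Assumed pattern: youtube.com/[watch][?]v|vi followed by = or /, or youtu.be/,
// then the ID (group 1) up to the first '&', '?' or '"', then any other parameters.
$pattern = '#<a href="https?://(?:www\.)?'
         . '(?:youtube\.com/(?:watch)?\??vi?[=/]|youtu\.be/)'
         . '([^&"?]+)[^"]*" target="_new">.*?</a>#';

// The callback runs once per matched link; $m[1] is that link's own ID,
// so each link is replaced using the ID it contains.
$this_comment = preg_replace_callback(
    $pattern,
    function (array $m) {
        return 'Example replacement: ' . $m[1];
    },
    $this_comment
);

echo $this_comment, "\n";
// Example replacement: abc123 and Example replacement: xyz789
```

This also avoids building a second regex from each extracted ID, which would otherwise need preg_quote() if an ID ever contained a regex metacharacter.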

Thanks for the help, Peter! If you happen to see the error I'm making with the extra params (forcing me to use explode to get rid of them), I'd appreciate any insight. The workaround is working, though, so it's not a big deal... just sloppy, I guess.
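
One likely explanation for the leftover parameters: inside square brackets, `.` and `+` are literal characters, so `[&.+]*` matches only a run of `&`, `.` and `+` characters, not "an ampersand followed by anything". The lazy `(.*?)` therefore has to keep growing until the closing quote can match. The sketch below compares that behaviour with a negated character class; the shortened patterns and the sample href are assumptions for illustration.

```php
<?php
// Hypothetical href with an extra parameter after the ID.
$href = 'href="http://www.youtube.com/watch?v=123456&feature=related"';

// Same construct as in the thread: [&.+] matches ONE of '&', '.', '+'.
$loose = '#v=(.*?)[&.+]*"#';

// Alternative (an assumption): end the ID at the first '&' or '"',
// then skip whatever else precedes the closing quote.
$strict = '#v=([^&"]+)[^"]*"#';

preg_match($loose, $href, $a);
preg_match($strict, $href, $b);

echo $a[1], "\n"; // 123456&feature=related
echo $b[1], "\n"; // 123456
```

With the stricter group, the explode("&", ...) workaround should no longer be needed.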

