
comp.lang.php » Please test/review my Regex to locate hyperlink in text
Please test/review my Regex to locate hyperlink in text [message #172485] Mon, 21 February 2011 06:06
Simon
Hi,

My requirements are as follows:

1) Find all hyperlinks in a given text document.
2) Parse certain links and replace them if need be (in another
parser function).
3) Any attributes given in the hyperlink can be ignored; they will be
faithfully returned in the matches.
4) All JavaScript and the like is pre-stripped, so the text can be
assumed to be 'safe' (if it is not safe, then it is not the job of this
regex to handle it).

// -----------------------------------
// my pattern...
$pattern = '/<a (.*?)href=[\"\']??(.*?)\/\/(.*?)[\s\"\'](.*?)>(.*?)<\/a>/i';

// replace each match with whatever the callback my_parser() returns
$body = preg_replace_callback($pattern, 'my_parser', $body);

// -----------------------------------
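`my_parser()` itself is not shown in the post. As a hypothetical sketch of what such a callback could look like: preg_replace_callback() hands it an array whose element 0 is the whole match and elements 1 to 5 are the pattern's five capture groups, and whatever it returns replaces the match. The host names old.example.com and new.example.com are made up for illustration.

```php
<?php
// Simon's pattern, unchanged
$pattern = '/<a (.*?)href=[\"\']??(.*?)\/\/(.*?)[\s\"\'](.*?)>(.*?)<\/a>/i';

// Hypothetical callback. preg_replace_callback() passes the full match in
// $m[0] and capture groups 1..5 in $m[1]..$m[5]; the return value replaces
// the match. Group 3 is the text after "//" up to the first whitespace or
// quote, so here it is compared against a made-up host name.
function my_parser(array $m): string
{
    if (strcasecmp($m[3], 'old.example.com') === 0) {
        // Edit the original match rather than rebuilding it from the groups:
        // the quote/space consumed after group 3 is not captured anywhere.
        return str_replace('//' . $m[3], '//new.example.com', $m[0]);
    }
    return $m[0]; // leave every other link as it was
}

$body = '<p><a href="http://old.example.com" target=_blank>old</a></p>';
$body = preg_replace_callback($pattern, 'my_parser', $body);
echo $body . PHP_EOL;
```

Working on `$m[0]` rather than reassembling the tag from groups 1 to 5 matters here, because the character matched by `[\s\"\']` right after group 3 is not captured by any group.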

The way I see it, this should work for:

- <a href='example.com'>some text</a>
- <a href="example.com">some text</a>
- <a href=example.com>some text</a>
- <a href='http://example.com'>some text</a>
- <a href="http://example.com">some text</a>
- <a href=http://example.com>some text</a>

- <a href='example.com' target=_blank>some text</a>
- <a href="example.com" target=_blank>some text</a>
- <a href=example.com target=_blank>some text</a>
- <a href='http://example.com' target=_blank>some text</a>
- <a href="http://example.com" target=_blank>some text</a>
- <a href=http://example.com target=_blank>some text</a>
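A quick way to poke holes is to run the pattern against each listed case on its own and look at what it captures. The harness below is a sketch (the variable names are illustrative, and the attribute is spelt "target", which makes no difference to the regex). Tracing the pattern by hand, only the 4th, 5th, 10th, 11th and 12th cases in the list should match: the scheme-less links never match because the pattern requires "//"; the unquoted `href=http://example.com>some text` case fails because group 3 must be followed by whitespace or a quote; and because `[\"\']??` is lazy, the opening quote ends up inside group 2. The last few lines show that in running text a single match can swallow two links.

```php
<?php
// Simon's pattern, unchanged
$pattern = '/<a (.*?)href=[\"\']??(.*?)\/\/(.*?)[\s\"\'](.*?)>(.*?)<\/a>/i';

// The twelve cases from the post, each tested on its own
$cases = [
    "<a href='example.com'>some text</a>",
    '<a href="example.com">some text</a>',
    '<a href=example.com>some text</a>',
    "<a href='http://example.com'>some text</a>",
    '<a href="http://example.com">some text</a>',
    '<a href=http://example.com>some text</a>',
    "<a href='example.com' target=_blank>some text</a>",
    '<a href="example.com" target=_blank>some text</a>',
    '<a href=example.com target=_blank>some text</a>',
    "<a href='http://example.com' target=_blank>some text</a>",
    '<a href="http://example.com" target=_blank>some text</a>',
    '<a href=http://example.com target=_blank>some text</a>',
];

$matched = [];  // indices of the cases that matched
$group2  = [];  // what group 2 captured for each match
foreach ($cases as $i => $html) {
    if (preg_match($pattern, $html, $m)) {
        $matched[] = $i;
        $group2[$i] = $m[2];
        echo "MATCH     $html   group 2 = {$m[2]}" . PHP_EOL;
    } else {
        echo "NO MATCH  $html" . PHP_EOL;
    }
}

// In running text, the lazy (.*?) before \/\/ can run past a scheme-less
// link and on into the next one, so a single match covers two <a> elements.
$combined = "<a href='example.com'>a</a> <a href='http://example.org'>b</a>";
preg_match($pattern, $combined, $span);
echo "Spanning match: {$span[0]}" . PHP_EOL;
```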

Can you poke holes in my regex, please? :)
Any suggestions or better regexes?

Many thanks

Simon
Re: Please test/review my Regex to locate hyperlink in text [message #172486 is a reply to message #172485] Mon, 21 February 2011 08:21
alvaro.NOSPAMTHANX
On 21/02/2011 7:06, Simon wrote:
> Any suggestions/better regexes?

If you are looking for <a> tags, then it isn't a plain text document;
it's an HTML document. Unless it's just an exercise to learn how to use
regular expressions, you can simply use an HTML parser, something like this:

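<?php

$url = 'http://www.google.com';

// Fetch the page and parse it with libxml's forgiving HTML parser
$html = file_get_contents($url);
$doc = new DOMDocument;

// Real-world HTML is rarely valid: collect the parser warnings instead of
// printing them, then discard them and restore normal error reporting
libxml_use_internal_errors(TRUE);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(FALSE);

// Print the text and the href attribute of every <a> element
$links = $doc->getElementsByTagName('a');
foreach($links as $a){
    echo $a->nodeValue . ': ' . $a->getAttribute('href') . PHP_EOL;
}

?>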
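Applied to the original goal (replacing some links in a string you already have, rather than a fetched page), the same DOM approach might look like the sketch below. `rewrite_href()` and `rewrite_links()` are hypothetical helpers, and the rule they apply (prefixing scheme-less links with "http://") is only a stand-in for whatever replacement logic is actually needed. Other attributes such as target are left untouched because only href is modified, and quoting style stops mattering because libxml parses single-quoted, double-quoted and unquoted attribute values alike.

```php
<?php
// Hypothetical rule: give scheme-less links an explicit "http://".
function rewrite_href(string $href): string
{
    $parts = parse_url($href);
    if ($parts !== false && !isset($parts['scheme'])) {
        return 'http://' . $href;
    }
    return $href;
}

// Parse an HTML string, rewrite every href, and return the document.
function rewrite_links(string $html): DOMDocument
{
    $doc = new DOMDocument;
    // Hide libxml warnings about imperfect markup, then restore the setting
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $a->setAttribute('href', rewrite_href($a->getAttribute('href')));
        }
    }
    return $doc;
}

// Single-quoted, unquoted and double-quoted attribute values
$body = "<p><a href='example.com'>one</a> <a href=example.com target=_blank>two</a>"
      . ' <a href="http://example.com">three</a></p>';

$doc = rewrite_links($body);
$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
    echo $a->nodeValue . ': ' . $a->getAttribute('href') . PHP_EOL;
}

// saveHTML() serialises the whole document, including the <html>/<body>
// wrapper that loadHTML() adds around a fragment
echo $doc->saveHTML();
```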

Afterwards, you can analyse URLs with parse_url():

http://es.php.net/manual/en/function.parse-url.php
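A short sketch of what parse_url() returns, using made-up URLs: an absolute URL is split into its components, whereas a scheme-less href such as 'example.com' (several of the cases in the original post) comes back as a relative path with no host at all, which is worth keeping in mind when deciding which links to replace.

```php
<?php
// An absolute URL is split into scheme, host, path, query and fragment
$absolute = parse_url('http://example.com/some/page.php?id=42#top');
print_r($absolute);

// A scheme-less href is treated as a relative path: there is no 'host' key
$bare = parse_url('example.com');
print_r($bare);

// A single component can be requested directly; a missing one gives null
var_dump(parse_url('http://example.com/some/page.php', PHP_URL_HOST));
var_dump(parse_url('example.com', PHP_URL_HOST));
```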


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My website of satin-finish humour: http://www.demogracia.com
--