Re: Please test/review my Regex to locate hyperlink in text [message #172486 is a reply to message #172485] Mon, 21 February 2011 08:21
alvaro.NOSPAMTHANX
On 21/02/2011 7:06, Simon wrote:
> My requirements are as follows:
>
> 1) Find all hyperlinks in a given text document...
> 2) Parse for certain links and replace them if need be, (in another
> parser function).
> 3) Any attributes given in the hyperlink can be ignored and will be
> faithfully returned in the matches.
> 4) All JavaScript and the like are pre-stripped, so the text can be
> assumed to be 'safe' (if it is not safe, then it is not the job of this
> regex to handle it).
>
> // -----------------------------------
> // my pattern...
> $pattern = '/<a (.*?)href=[\"\']??(.*?)\/\/(.*?)[\s\"\'](.*?)>(.*?)<\/a>/i';
>
> // the call back function
> $body = preg_replace_callback($pattern, 'my_parser', $body);
>
> // -----------------------------------
>
> The way I see it this should work for...
>
> - <a href='example.com'>some text</a>
> - <a href="example.com">some text</a>
> - <a href=example.com>some text</a>
> - <a href='http://example.com'>some text</a>
> - <a href="http://example.com">some text</a>
> - <a href=http://example.com>some text</a>
>
> - <a href='example.com' target=_blank>some text</a>
> - <a href="example.com" target=_blank>some text</a>
> - <a href=example.com target=_blank>some text</a>
> - <a href='http://example.com' target=_blank>some text</a>
> - <a href="http://example.com" target=_blank>some text</a>
> - <a href=http://example.com target=_blank>some text</a>
>
> Can you poke holes in my regex please :)
> Any suggestions/better regexs?

If you are looking for <a> tags then it isn't a plain text document,
it's an HTML document. Unless it's just an exercise to learn how to use
regular expressions, you can simply do something like this:

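<?php

$url = 'http://www.google.com';

// Fetch the page; file_get_contents() returns false on failure
$html = file_get_contents($url);
if ($html === false) {
    exit("Could not fetch $url" . PHP_EOL);
}

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

// Print the text and the href of every link
$links = $doc->getElementsByTagName('a');
foreach ($links as $a) {
    echo $a->nodeValue . ': ' . $a->getAttribute('href') . PHP_EOL;
}

?>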
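
The same DOM approach can probably cover requirement 2 (replacing certain links) as well. What follows is only a rough sketch, assuming PHP 7 or later: rewrite_links() is a hypothetical helper, not something from the original post, and the closure passed to it stands in for the my_parser callback. Other attributes such as target are left untouched.

```php
<?php
// Hypothetical helper: passes each href to $rewrite and stores whatever it returns.
function rewrite_links(string $html, callable $rewrite): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue;                   // named anchors have no href to rewrite
        }
        $a->setAttribute('href', $rewrite($a->getAttribute('href')));
    }

    // saveHTML() returns a whole document, including doctype and <html>/<body> wrappers.
    return $doc->saveHTML();
}

// Example input and replacement rule are made up for illustration.
$body = '<p><a href="http://example.com/page" target=_blank>some text</a></p>';
$result = rewrite_links($body, function (string $href): string {
    return str_replace('http://example.com/', 'http://example.org/', $href);
});
echo $result;
```

Note that the parser normalises the markup on the way out (unquoted attribute values come back quoted, for instance), so the result is equivalent HTML rather than a byte-for-byte copy of the input.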

Afterwards, you can analyse URLs with parse_url():

http://es.php.net/manual/en/function.parse-url.php
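
As a small illustration of what parse_url() gives you for the kinds of href in the question, here is a sketch assuming PHP 7 or later. describe_href() is a hypothetical name used only for this example. One thing worth knowing is that a value without a scheme, such as example.com, is reported as a path and not as a host.

```php
<?php
// Hypothetical helper: summarises the host and path that parse_url() finds in an href.
function describe_href(string $href): string
{
    $parts = parse_url($href);          // array of components, or false if seriously malformed
    if ($parts === false) {
        return 'unparseable';
    }
    $host = $parts['host'] ?? '(none)';
    $path = $parts['path'] ?? '(none)';
    return "host=$host path=$path";
}

foreach (['http://example.com/path?x=1', 'http://example.com', 'example.com'] as $href) {
    echo $href . ': ' . describe_href($href) . PHP_EOL;
}
```

So if links without "http://" must be treated as external hosts, that decision has to be made in your own code before or after calling parse_url().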


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My glossy humour site: http://www.demogracia.com
--