setup of curl-multi: looping over a bunch of sites [how to address the array] [message #170792]
Fri, 26 November 2010 21:52
matze
Messages: 5 Registered: November 2010
Hello dear PHP friends,

I am currently working on a little parser project. I have to find solutions for
a. the fetching part
b. the parser part
Here we go, the target URLs:
See the overview: http://dms-schule.bildung.hessen.de/index.html
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html
Search by pressing the "type" button and then select all schools with
the mouse.
The result is 2400 schools.
Here is some more help for getting at the target. These are some detail pages on the target server:
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9009
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9742
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9871
As you can see, I have to iterate over the pages in a loop, from
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=1000
to show_school=10000.
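A minimal sketch of the URL-building step, assuming the IDs really run contiguously from 1000 to 10000 as described above. The function name `buildSchoolUrls` is hypothetical; the base URL is the one from this post. Plain concatenation with PHP's `.` operator is all that is needed here.

```php
<?php
// Hypothetical helper: builds one detail-page URL per school ID.
// Assumes the IDs are contiguous integers from $first to $last (inclusive).
function buildSchoolUrls($first, $last)
{
    $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=';
    $urls = array();
    for ($id = $first; $id <= $last; $id++) {
        // Key the array by school ID so fetched pages can be matched back later.
        $urls[$id] = $base . $id;
    }
    return $urls;
}

$urls = buildSchoolUrls(1000, 10000);
echo count($urls), "\n"; // prints 9001
echo $urls[9009], "\n";
```

Double-quoted interpolation such as `"...?show_school={$id}"` works equally well; keying by ID is a design choice that keeps the later "is this page empty?" step simple.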
After fetching the pages, I have to check which ones are empty; those do not need to be parsed.
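How to recognise an "empty" page depends on what the server returns for an unused ID, which is not known here. The sketch below assumes a page counts as empty when it has no visible text, or when a marker string that (by assumption) only real school pages contain is missing. `isEmptyPage`, the marker `'Schulnummer'` and the sample pages are all hypothetical and should be replaced after looking at real responses.

```php
<?php
// Hypothetical check: a page is "empty" if it has no visible text, or if it
// lacks a marker string that we assume only real school pages contain.
function isEmptyPage($html, $marker)
{
    $text = trim(strip_tags((string) $html));
    if ($text === '') {
        return true;
    }
    return strpos($html, $marker) === false;
}

// Hypothetical fetched results, keyed by school ID.
$pages = array(
    9009 => '<html><body><h1>Schule A</h1><p>Schulnummer: 9009</p></body></html>',
    9010 => '<html><body></body></html>',
    9011 => '',
);

// Keep only the pages that still need to be parsed.
$toParse = array();
foreach ($pages as $id => $html) {
    if (!isEmptyPage($html, 'Schulnummer')) {
        $toParse[$id] = $html;
    }
}
print_r(array_keys($toParse)); // only 9009 remains
```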
I want to do this with curl-multi, since it seems to be the most advanced way: it fetches several pages in parallel.
I see that I have an array that can be filled, but I have to think about the string concatenation; I guess I have to do some
sophisticated string concatenation.
This one does not fit:
for ($i = 1; $i <= $match[1]; $i++) {
    $url = "http://www.example.com/page?page={$i}";
}
Besides this, I have an array that I can fill.
Can you help me run a loop with this script?
<?php
/************************************\
* Multi interface in PHP with curl *
* Requires PHP 5.0, Apache 2.0 and *
* Curl *
*************************************
* Written By Cyborg 19671897 *
* Bugfixed by Jeremy Ellman *
\***********************************/
$urls = array(
    "http://www.google.com/",
    "http://www.altavista.com/",
    "http://www.yahoo.com/"
);
$mh = curl_multi_init();
$conn = array();
$res = array();
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1); // return data as string
    curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    curl_setopt($conn[$i], CURLOPT_MAXREDIRS, 2); // maximum redirects
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 10); // connect timeout in seconds
    curl_multi_add_handle($mh, $conn[$i]);
}
// Run all transfers; wait for socket activity instead of busy-looping.
do {
    curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh, 1.0);
    }
} while ($active);
foreach ($urls as $i => $url) {
    $res[$i] = curl_multi_getcontent($conn[$i]);
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
print_r($res);
?>
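To run the script above over all the school URLs, one option is to wrap it in a function and feed it the URLs in batches, so that not all of the roughly 9000 connections are opened at once. This is a sketch under assumptions: `fetchBatch`, `fetchAll` and the batch size are hypothetical names and values, not part of the original script, and the PHP curl extension must be installed.

```php
<?php
// Hypothetical wrapper around the curl-multi script above: fetches one batch
// of URLs in parallel and returns the bodies, keyed like the input array.
function fetchBatch(array $urls)
{
    $mh = curl_multi_init();
    $conn = array();
    foreach ($urls as $key => $url) {
        $conn[$key] = curl_init($url);
        curl_setopt($conn[$key], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($conn[$key], CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($conn[$key], CURLOPT_MAXREDIRS, 2);
        curl_setopt($conn[$key], CURLOPT_CONNECTTIMEOUT, 10);
        curl_multi_add_handle($mh, $conn[$key]);
    }
    // Drive all transfers, sleeping in curl_multi_select() instead of spinning.
    do {
        curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh, 1.0);
        }
    } while ($active);
    $res = array();
    foreach ($conn as $key => $ch) {
        $res[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $res;
}

// Splits the URL list into batches (keys preserved) and fetches batch by batch.
function fetchAll(array $urls, $batchSize)
{
    $all = array();
    foreach (array_chunk($urls, $batchSize, true) as $batch) {
        // '+' keeps the school-ID keys; array_merge would renumber them.
        $all = $all + fetchBatch($batch);
    }
    return $all;
}
```

With URLs keyed by school ID, a call such as `fetchAll($urls, 20)` returns the bodies keyed the same way, so empty pages can be dropped by ID before parsing. A batch size of 20 is only a guess at a polite load for the target server.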