Google sitemap of FUDforum [message #28343]
Wed, 19 October 2005 06:40
Hurry
Messages: 33 Registered: October 2005
Karma: 0
Member
Hello! It would be really great if there were a way to make a Google sitemap.xml file of our FUDforum's categories, topics and posts. In my experience, all the sitemap.xml files I have submitted at http://www.google.com/webmasters/sitemaps/ get spidered by Google very quickly and regularly. I hope there is some way to do it, or that Ilia may add this feature.
Re: Google sitemap of FUDforum [message #41155 is a reply to message #41119]
Thu, 29 May 2008 14:48
rush2112
Messages: 15 Registered: April 2005
Karma: 0
Junior Member
Here's a little file I wrote to do that:
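<?php
// Load the forum configuration: $DBHOST, $DBHOST_USER, $DBHOST_PASSWORD,
// $DBHOST_DBNAME, $DBHOST_TBL_PREFIX and $WWW_ROOT.
include ("GLOBALS.php");
// mysqli replaces the old mysql_* functions, which were removed in PHP 7.
$dbh = mysqli_connect($DBHOST, $DBHOST_USER, $DBHOST_PASSWORD, $DBHOST_DBNAME) or die('No Connect: ' . mysqli_connect_error());
// Latest post time for every thread; the table prefix comes from GLOBALS.php instead of being hard-coded.
$query = "SELECT thread_id, MAX(post_stamp) AS last_post FROM {$DBHOST_TBL_PREFIX}msg GROUP BY thread_id";
$result = mysqli_query($dbh, $query) or die("Cannot execute query: $query");
// Trailing newline keeps the first <url> entry off this status line.
echo "Writing forum sitemap to the file<br><br>\n";
while ($row = mysqli_fetch_assoc($result)) {
	$thread_id = $row["thread_id"];
	$post_stamp = (int) $row["last_post"];
	// gmdate() gives UTC, which is what the "+00:00" suffix below claims.
	$post_time = gmdate("H:i:s", $post_stamp);
	$post_date = gmdate("Y-m-d", $post_stamp);
	$filetext = "<url><loc>" . $WWW_ROOT . "index.php/t/$thread_id/</loc>";
	$filetext .= "<lastmod>" . $post_date . "T" . $post_time . "+00:00</lastmod><changefreq>weekly</changefreq></url>\n";
	print $filetext;
}
?>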
Just save the code into a .php file and place it in your FUDforum directory (the one containing GLOBALS.php).
Run the file from a browser, then view the 'Page Source'.
Copy the output (excluding the first line) into your_forum_sitemap.xml file, and Google should be happy.
(Make sure the sitemap protocol header is at the top of your sitemap file and the closing </urlset> tag is at the bottom; see https://www.google.com/webmasters/tools/docs/en/protocol.html )
I use PATH_INFO style URLs, so my output is something like: http://www.mysite.com/forum/index.php/t/3422/
If you don't use PATH_INFO, you will need to change this line:
$filetext = "<url><loc>" . $WWW_ROOT . "index.php/t/$thread_id/</loc>";
to something like this:
$filetext = "<url><loc>" . $WWW_ROOT . "index.php?t=msg&amp;th=" . $thread_id . "&amp;start=0</loc>";
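Whichever URL style is used, the value inside `<loc>` has to be valid XML, so a literal `&` in the query-string form must be written as `&amp;` (the sitemap protocol requires entity-escaped URLs). A minimal sketch using PHP's `htmlspecialchars()` with the `ENT_XML1` flag; `sitemap_url_entry()` is a hypothetical helper name, not part of FUDforum, and the timestamp is formatted in UTC with `gmdate('c')` so the offset really is `+00:00`.

```php
<?php
// Hypothetical helper: build one <url> entry with an XML-escaped <loc>
// and a W3C datetime <lastmod> in UTC.
function sitemap_url_entry(string $loc, int $timestamp, string $changefreq = 'weekly'): string
{
    // ENT_XML1 | ENT_QUOTES escapes &, <, >, " and ' for XML 1.0 output.
    $escaped = htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    // gmdate('c') yields e.g. 2008-05-29T14:48:00+00:00
    $lastmod = gmdate('c', $timestamp);
    return "<url><loc>{$escaped}</loc><lastmod>{$lastmod}</lastmod>"
         . "<changefreq>{$changefreq}</changefreq></url>\n";
}
```

For example, passing `http://www.example.com/forum/index.php?t=msg&th=3422&start=0` produces a `<loc>` containing `...?t=msg&amp;th=3422&amp;start=0`.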
I know it's a bit clunky, and one of these days I'll get around to making the script write the sitemap file directly.
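Writing the file directly would also take care of the protocol header and closing tag mentioned above. A minimal sketch, assuming the `<url>` lines are collected into an array instead of being printed; `build_sitemap()` and `write_sitemap()` are hypothetical names, not FUDforum functions. Only the `xmlns` attribute on `<urlset>` is included; the schema-location attributes some sitemaps carry are optional.

```php
<?php
// Hypothetical helper: wrap ready-made "<url>...</url>" lines in the sitemap
// protocol's XML declaration and <urlset> element.
function build_sitemap(array $urlEntries): string
{
    $xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $xml .= "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
    foreach ($urlEntries as $entry) {
        $xml .= $entry; // each entry already ends with "\n"
    }
    return $xml . "</urlset>\n";
}

// Hypothetical helper: write the finished sitemap to disk; true on success.
function write_sitemap(string $path, array $urlEntries): bool
{
    // file_put_contents() returns the byte count, or false if the write failed.
    return file_put_contents($path, build_sitemap($urlEntries)) !== false;
}
```

In the script above, `print $filetext;` would become `$entries[] = $filetext;`, followed by a single `write_sitemap('sitemap.xml', $entries);` after the loop.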
HTH,
Rush
[Updated on: Thu, 29 May 2008 14:55]
Re: Google sitemap of FUDforum [message #162690 is a reply to message #161265]
Fri, 02 July 2010 06:36
Ernesto
Messages: 413 Registered: August 2005
Karma: 0
Senior Member
This tool worked great; however, perhaps it should check permissions, so that it only lists threads visible to "Guest" or "Registered user". Right now it grabs everything on the site, and on a site such as mine, where 90% of the content is "private" and shouldn't be indexed by Google (since Google will just run into error pages), this script isn't overly helpful, I'm afraid.
I could of course manually alter the SQL to restrict it to the forum IDs that guests have access to, but a real solution would be nice. It's above my head, though, so I can't help with that one.
Ginnunga Gaming
Re: Google sitemap of FUDforum [message #162696 is a reply to message #162694]
Sat, 03 July 2010 10:01
Ernesto
Messages: 413 Registered: August 2005
Karma: 0
Senior Member
Yes, if you change
// Limit topics to what the user has access to.
if ($auth_as_user) {
	$join = 'INNER JOIN fud30_group_cache g1 ON g1.user_id=2147483647 AND g1.resource_id=f.id
		LEFT JOIN fud30_group_cache g2 ON g2.user_id='. $auth_as_user .' AND g2.resource_id=f.id
		LEFT JOIN fud30_mod mm ON mm.forum_id=t.forum_id AND mm.user_id='. $auth_as_user .' ';
	$lmt = '(mm.id IS NOT NULL OR (COALESCE(g2.group_cache_opt, g1.group_cache_opt) & 2) > 0)';
} else {
	$join = 'INNER JOIN fud30_group_cache g1 ON g1.user_id=0 AND g1.resource_id=t.forum_id ';
	$lmt = '(g1.group_cache_opt & 2) > 0';
}
to this, which reads the table prefix from $GLOBALS['DBHOST_TBL_PREFIX'] instead of hard-coding fud30_:
// Limit topics to what the user has access to.
if ($auth_as_user) {
	$join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=2147483647 AND g1.resource_id=f.id
		LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g2 ON g2.user_id='. $auth_as_user .' AND g2.resource_id=f.id
		LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'mod mm ON mm.forum_id=t.forum_id AND mm.user_id='. $auth_as_user .' ';
	$lmt = '(mm.id IS NOT NULL OR (COALESCE(g2.group_cache_opt, g1.group_cache_opt) & 2) > 0)';
} else {
	$join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=0 AND g1.resource_id=t.forum_id ';
	$lmt = '(g1.group_cache_opt & 2) > 0';
}
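The `$lmt` condition decides, per forum, whether a thread may appear in the sitemap. Below is a minimal PHP mirror of that SQL logic, for illustration only: the function names are hypothetical, and the constant assumes, as the query does, that bit value 2 of `group_cache_opt` is the read permission. A moderator (`mm.id IS NOT NULL`) always passes; otherwise the user's own group-cache row overrides the registered-users row (user_id 2147483647), just like `COALESCE(g2.group_cache_opt, g1.group_cache_opt)`.

```php
<?php
// Assumption taken from the query: bit value 2 of group_cache_opt means "may read".
const GROUP_CACHE_READ = 2;

// Mirrors the $auth_as_user branch of $lmt.
// $userOpt: the user's own group_cache row (g2), or null when the LEFT JOIN finds none.
// $registeredOpt: the row for user_id 2147483647, i.e. all registered users (g1).
function user_can_read(bool $isModerator, ?int $userOpt, int $registeredOpt): bool
{
    if ($isModerator) {
        return true; // mm.id IS NOT NULL
    }
    $opt = $userOpt ?? $registeredOpt; // COALESCE(g2.group_cache_opt, g1.group_cache_opt)
    return ($opt & GROUP_CACHE_READ) > 0;
}

// Mirrors the anonymous branch: the group_cache row for user_id 0.
function guest_can_read(int $anonOpt): bool
{
    return ($anonOpt & GROUP_CACHE_READ) > 0;
}
```

With `$auth_as_user = 0`, only the guest check applies, which is what keeps forums that guests cannot read out of the sitemap.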
Ginnunga Gaming
Re: Google sitemap of FUDforum [message #162697 is a reply to message #162696]
Sat, 03 July 2010 10:17
Ernesto
Messages: 413 Registered: August 2005
Karma: 0
Senior Member
Oh yes, there is another slight oversight as well.
$filetext = "<url>\n";
if ($FUD_OPT_2 & 32768) { // USE_PATH_INFO
	$filetext .= "\t<loc>${WWW_ROOT}index.php/t/${thread_id}/</loc>\n";
} else {
	$filetext .= "\t<loc>${WWW_ROOT}index.php?t=msg&th=${thread_id}&start=0</loc>\n";
}
Should index.php really be hard-coded? Shouldn't it be replaced by ${ROOT} or something similar, like below:
$filetext = "<url>\n";
if ($FUD_OPT_2 & 32768) { // USE_PATH_INFO
	$filetext .= "\t<loc>${WWW_ROOT}${ROOT}t/${thread_id}/${thread_title_SEO}/</loc>\n";
} else {
	// A literal "&" is not allowed in XML, so it is written as "&amp;".
	$filetext .= "\t<loc>${WWW_ROOT}${ROOT}?t=msg&amp;th=${thread_id}&amp;start=0</loc>\n";
}
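The choice between the two URL styles hinges on a single bit of `$FUD_OPT_2` (32768, commented as USE_PATH_INFO in the snippet). A minimal sketch of that selection as a standalone function; `thread_loc()` is a hypothetical name, and `$indexUrl` is assumed to be the full URL of index.php (for example `$WWW_ROOT . 'index.php'`), since what `${ROOT}` contains is not spelled out in this thread. The result is XML-escaped so it can go straight into `<loc>`.

```php
<?php
// Bit of $FUD_OPT_2 that the snippet above checks for PATH_INFO-style URLs.
const FUD_OPT_2_USE_PATH_INFO = 32768;

// Hypothetical helper: build the <loc> value for one thread.
// $indexUrl is assumed to be the full URL of index.php, e.g. "http://www.example.com/forum/index.php".
function thread_loc(int $fudOpt2, string $indexUrl, int $threadId, string $slug = ''): string
{
    if ($fudOpt2 & FUD_OPT_2_USE_PATH_INFO) {
        // PATH_INFO style: index.php/t/<id>/ plus an optional <slug>/
        $url = "{$indexUrl}/t/{$threadId}/" . ($slug !== '' ? "{$slug}/" : '');
    } else {
        // Query-string style: index.php?t=msg&th=<id>&start=0
        $url = "{$indexUrl}?t=msg&th={$threadId}&start=0";
    }
    // Escape for XML so every "&" becomes "&amp;" inside <loc>.
    return htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}
```

Because the test is a bitwise AND, other options set in `$FUD_OPT_2` do not affect which style is chosen.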
With my SEO tweak, the whole script now looks like this:
(I inner-joined the msg table to get the thread subject, so I could strip out the special characters and lowercase it.)
Note: without tweaks to users.inc.t, a thread whose subject starts with a number will have that number interpreted as "&start=20" (where 20 is the number), and the sitemap link won't work. I fixed this with an is_numeric check in users.inc.t; it would still break on a thread whose subject actually is a number, but I can live with that. Another fix could be to simply start the SEO subject with a "-".
PLEASE note that my str_replace code is UGLY and should be cleaned up by someone properly skilled with str_replace or regular expressions. I have no clue about those.
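For the str_replace chain, one possible cleanup is to collapse every run of unwanted characters with a single regular expression. A minimal sketch, not tested against FUDforum's own URL handling: `seo_slug()` is a hypothetical name; apostrophes are dropped first so "Ernesto's" becomes "ernestos", which appears to be the intent of the `-s-` replacement in the script; and, following the suggestion above, a slug that starts with a digit gets a leading "-".

```php
<?php
// Hypothetical regex-based replacement for the str_replace chain.
function seo_slug(string $subject): string
{
    // Drop apostrophes so "Ernesto's" becomes "ernestos" rather than "ernesto-s".
    $slug = strtolower(str_replace("'", '', $subject));
    // Turn every run of characters other than a-z and 0-9 into a single "-".
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Remove dashes left at either end by leading/trailing punctuation or spaces.
    $slug = trim($slug, '-');
    // Per the note above: a leading "-" keeps a numeric start from being read as a start offset.
    if (preg_match('/^[0-9]/', $slug)) {
        $slug = '-' . $slug;
    }
    return $slug;
}
```

Inside the loop, the eight `$thread_title_SEO` lines would then reduce to `$thread_title_SEO = seo_slug($r[4]);`.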
#!/usr/bin/php -q
<?php
/**
* copyright : (C) 2001-2010 Advanced Internet Designs Inc.
* email : forum(at)prohost(dot)org
* $Id$
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
* Free Software Foundation; version 2 of the License.
**/
/* Google sitemap settings. */
$frequency = 'weekly';
$priority = '0.5';
$auth_as_user = 0; // User 0 == anonymous.
set_time_limit(0);
ini_set('memory_limit', '128M');
define('forum_debug', 1);
unset($_SERVER['REMOTE_ADDR']);
if (strncmp($_SERVER['argv'][0], '.', 1)) {
	require (dirname($_SERVER['argv'][0]) .'/GLOBALS.php');
} else {
	require (getcwd() .'/GLOBALS.php');
}
fud_use('err.inc');
fud_use('db.inc');
// Limit topics to what the user has access to.
if ($auth_as_user) {
	// Group-cache rows are matched on t.forum_id, since the query below has no forum (f) alias.
	$join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=2147483647 AND g1.resource_id=t.forum_id
		LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g2 ON g2.user_id='. $auth_as_user .' AND g2.resource_id=t.forum_id
		LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'mod mm ON mm.forum_id=t.forum_id AND mm.user_id='. $auth_as_user .' ';
	$lmt = '(mm.id IS NOT NULL OR (COALESCE(g2.group_cache_opt, g1.group_cache_opt) & 2) > 0)';
} else {
	$join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=0 AND g1.resource_id=t.forum_id ';
	$lmt = '(g1.group_cache_opt & 2) > 0';
}
$c = uq('SELECT t.id, t.last_post_date, t.root_msg_id, m.id, m.subject FROM '. $GLOBALS['DBHOST_TBL_PREFIX'] .'thread t '. $join .'
inner join '. $GLOBALS['DBHOST_TBL_PREFIX'] .'msg m ON t.root_msg_id = m.id
WHERE '. $lmt .' ORDER BY t.last_post_date DESC LIMIT 50000');
echo "Writing sitemap.xml file to ${GLOBALS['WWW_ROOT_DISK']}\n";
$fh = fopen($GLOBALS['WWW_ROOT_DISK'].'/sitemap.xml', 'w') or die("Cannot open sitemap.xml for writing\n");
$xmlhead = <<<EOF
<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">\n
EOF;
fwrite($fh, $xmlhead);
while ($r = db_rowarr($c)) {
	$thread_id = $r[0];
	// <lastmod> uses the W3C datetime format (date, "T", time), in UTC to match the "+00:00" added below.
	$post_stamp = gmdate('Y-m-d\TH:i:s', $r[1]);
	$thread_title_SEO = str_replace(" ","-",$r[4]);
	$thread_title_SEO = strtolower($thread_title_SEO);
	$thread_title_SEO = preg_replace('/[^a-z0-9_]/i', '-', $thread_title_SEO);
	$thread_title_SEO = preg_replace('/_[_]*/i', '-', $thread_title_SEO);
	$thread_title_SEO = str_replace('---', '-', $thread_title_SEO);
	$thread_title_SEO = str_replace('--', '-', $thread_title_SEO);
	$thread_title_SEO = str_replace('-s-', 's-', $thread_title_SEO);
	$thread_title_SEO = str_replace("%","",$thread_title_SEO);
	$filetext = "<url>\n";
	if ($FUD_OPT_2 & 32768) { // USE_PATH_INFO
		$filetext .= "\t<loc>${WWW_ROOT}${ROOT}t/${thread_id}/${thread_title_SEO}/</loc>\n";
	} else {
		// A literal "&" is not allowed in XML, so it is written as "&amp;".
		$filetext .= "\t<loc>${WWW_ROOT}${ROOT}?t=msg&amp;th=${thread_id}&amp;start=0</loc>\n";
	}
	$filetext .= "\t<lastmod>${post_stamp}+00:00</lastmod>\n";
	$filetext .= "\t<changefreq>$frequency</changefreq>\n";
	$filetext .= "\t<priority>$priority</priority>\n";
	$filetext .= "</url>\n";
	fwrite($fh, $filetext);
}
fwrite($fh, "</urlset>\n");
fclose($fh);
// Google deprecated this sitemaps ping endpoint in 2023; a "Sitemap:" line in robots.txt or Search Console submission still works.
$google = 'www.google.com';
echo "Notify $google...";
if ($fp = @fsockopen($google, 80)) {
	$req = "GET /webmasters/sitemaps/ping?sitemap=". urlencode($GLOBALS['WWW_ROOT'].'sitemap.xml') ." HTTP/1.1\r\n".
		"Host: $google\r\n".
		"User-Agent: FUDforum $FORUM_VERSION\r\n".
		"Connection: Close\r\n\r\n";
	fwrite($fp, $req);
	while (!feof($fp)) {
		if (@preg_match('~^HTTP/\d\.\d (\d+)~i', fgets($fp, 128), $m)) {
			echo ' status: '. intval($m[1]) ."\n";
			break;
		}
	}
	fclose($fp);
}
echo "Done!\n";
?>
Ginnunga Gaming