FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

Re: Google sitemap of FUDforum [message #162697 is a reply to message #162696] Sat, 03 July 2010 10:17
Ernesto (Sweden)
Oh yes, there is another slight oversight as well.
$filetext = "<url>\n";
if ($FUD_OPT_2 & 32768) {       // USE_PATH_INFO
        $filetext .= "\t<loc>${WWW_ROOT}index.php/t/${thread_id}/</loc>\n";
} else {
        $filetext .= "\t<loc>${WWW_ROOT}index.php?t=msg&amp;th=${thread_id}&amp;start=0</loc>\n";
}

Should index.php really be hard-coded here? Shouldn't it be replaced by ${ROOT} or something similar, like below?

$filetext = "<url>\n";
if ($FUD_OPT_2 & 32768) {       // USE_PATH_INFO
        $filetext .= "\t<loc>${WWW_ROOT}${ROOT}t/${thread_id}/${thread_title_SEO}/</loc>\n";
} else {
        $filetext .= "\t<loc>${WWW_ROOT}${ROOT}?t=msg&amp;th=${thread_id}&amp;start=0</loc>\n";
}




With my SEO tweak, the whole script now looks like the code below.

I inner-joined the msg table to get each thread's subject, so that I could lowercase it and replace every unwanted character.

Note: without tweaks to users.inc.t, a thread whose subject starts with a number is interpreted as "&start=20" (where 20 is that number), and the sitemap link won't work. I fixed this with an is_numeric check in users.inc.t. It would still break on a thread whose subject is nothing but a number, but I can live with that. Another fix could be to simply start the SEO subject with a "-".

PLEASE note that my str_replace code is UGLY and should be cleaned up by someone who is properly skilled with str_replace or regular expressions. I have no clue about that.
#!/usr/bin/php -q
<?php
/**
* copyright            : (C) 2001-2010 Advanced Internet Designs Inc.
* email                : forum(at)prohost(dot)org
* $Id$
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
* Free Software Foundation; version 2 of the License.
**/

        /* Google sitemap settings. */
        $frequency    = 'weekly';
        $priority     = '0.5';
        $auth_as_user = 0;      // User 0 == anonymous.

        set_time_limit(0);
        ini_set('memory_limit', '128M');
        define('forum_debug', 1);
        unset($_SERVER['REMOTE_ADDR']);

        if (strncmp($_SERVER['argv'][0], '.', 1)) {
                require (dirname($_SERVER['argv'][0]) .'/GLOBALS.php');
        } else {
                require (getcwd() .'/GLOBALS.php');
        }

        fud_use('err.inc');
        fud_use('db.inc');

        // Limit topics to what the user has access to.
        if ($auth_as_user) {
                // The query defines no "f" (forum) alias, so match on t.forum_id as the anonymous branch does.
                $join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=2147483647 AND g1.resource_id=t.forum_id
                                LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g2 ON g2.user_id='. $auth_as_user .' AND g2.resource_id=t.forum_id
                                LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'mod mm ON mm.forum_id=t.forum_id AND mm.user_id='. $auth_as_user .' ';
                $lmt  = '(mm.id IS NOT NULL OR (COALESCE(g2.group_cache_opt, g1.group_cache_opt) & 2) > 0)';
        } else {
                $join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=0 AND g1.resource_id=t.forum_id ';
                $lmt  = '(g1.group_cache_opt & 2) > 0';
        }

        // Join the msg table on the thread's root message to fetch its subject for the SEO slug.
        $c = uq('SELECT t.id, t.last_post_date, t.root_msg_id, m.id, m.subject FROM '. $GLOBALS['DBHOST_TBL_PREFIX'] .'thread t '. $join .'
                INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'msg m ON t.root_msg_id = m.id
                WHERE '. $lmt .' ORDER BY t.last_post_date DESC LIMIT 50000');

        echo "Writing sitemap.xml file to ${GLOBALS['WWW_ROOT_DISK']}\n";
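        $fh = fopen($GLOBALS['WWW_ROOT_DISK'].'/sitemap.xml', 'w');
        if (!$fh) {
                // Stop here rather than calling fwrite() on an invalid handle.
                die("Unable to open sitemap.xml for writing.\n");
        }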
        $xmlhead = <<<EOF
<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">\n
EOF;
        fwrite($fh, $xmlhead);

        while ($r = db_rowarr($c)) {
                $thread_id = $r[0];
                // Sitemap <lastmod> uses W3C Datetime (date first, then time); gmdate()
                // keeps the value in UTC to match the "+00:00" offset appended below.
                $post_stamp = gmdate('Y-m-d\TH:i:s', $r[1]);

                $thread_title_SEO = str_replace(" ","-",$r[4]);
                $thread_title_SEO = strtolower($thread_title_SEO);
                $thread_title_SEO = preg_replace('/[^a-z0-9_]/i', '-', $thread_title_SEO);
                $thread_title_SEO = preg_replace('/_[_]*/i', '-', $thread_title_SEO);
                $thread_title_SEO = str_replace('---', '-', $thread_title_SEO);
                $thread_title_SEO = str_replace('--', '-', $thread_title_SEO);
                $thread_title_SEO = str_replace('-s-', 's-', $thread_title_SEO);
                $thread_title_SEO = str_replace("%","",$thread_title_SEO);

                $filetext = "<url>\n";
                if ($FUD_OPT_2 & 32768) {       // USE_PATH_INFO
                        $filetext .= "\t<loc>${WWW_ROOT}${ROOT}t/${thread_id}/${thread_title_SEO}/</loc>\n";
                } else {
                        $filetext .= "\t<loc>${WWW_ROOT}${ROOT}?t=msg&amp;th=${thread_id}&amp;start=0</loc>\n";
                }
                $filetext .= "\t<lastmod>${post_stamp}+00:00</lastmod>\n";
                $filetext .= "\t<changefreq>$frequency</changefreq>\n";
                $filetext .= "\t<priority>$priority</priority>\n";
                $filetext .= "</url>\n";
                fwrite($fh, $filetext);
        }

        fwrite($fh, "</urlset>\n");
        fclose($fh);

        $google = 'www.google.com';
        echo "Notify $google...";
        if($fp = @fsockopen($google, 80)) {
                $req = "GET /webmasters/sitemaps/ping?sitemap=". urlencode($GLOBALS['WWW_ROOT'].'sitemap.xml') ." HTTP/1.1\r\n".
                       "Host: $google\r\n".
                       "User-Agent: FUDforum $FORUM_VERSION\r\n".
                       "Connection: Close\r\n\r\n";
                fwrite($fp, $req);
                while(!feof($fp)) {
                        if( @preg_match('~^HTTP/\d\.\d (\d+)~i', fgets($fp, 128), $m) ) {
                                echo ' status: '. intval($m[1]) ."\n";
                                break;
                        }
                }
                fclose($fp);
        }

        echo "Done!\n";
?>
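
As a possible cleanup of the str_replace chain in the script, here is a minimal sketch of a slug helper. The name seo_slug() is hypothetical and not part of FUDforum. It assumes the subject is plain text (not HTML-encoded), and it assumes that a leading "-" is enough to stop users.inc.t from reading a slug that begins with a digit as the "start" offset, as suggested in the note above. Neither assumption has been tested against a live forum.

```php
<?php
// Hypothetical helper (not part of FUDforum): turn a thread subject into a URL slug.
function seo_slug($subject)
{
    // Lowercase first so one character class covers every letter.
    $slug = strtolower($subject);
    // Drop apostrophes so "user's" becomes "users" rather than "user-s".
    $slug = str_replace("'", '', $slug);
    // Collapse each run of anything that is not a-z or 0-9 into a single hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Remove hyphens left over at either end.
    $slug = trim($slug, '-');
    // Assumption: prefixing "-" keeps a slug that starts with a digit
    // from being mistaken for a numeric "start" offset.
    if (preg_match('/^[0-9]/', $slug)) {
        $slug = '-' . $slug;
    }
    return $slug;
}

echo seo_slug("Google sitemap of FUDforum"), "\n";   // google-sitemap-of-fudforum
echo seo_slug("Ernesto's   SEO -- tweak!"), "\n";    // ernestos-seo-tweak
echo seo_slug("20 tips for admins"), "\n";           // -20-tips-for-admins
```

In the script, the whole block that builds $thread_title_SEO could then shrink to a single call, $thread_title_SEO = seo_slug($r[4]);. Because one regular expression collapses runs of any length, there is no need for separate '---' and '--' replacements.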


