recognise google (/other) rather than having zillions of anon users [message #26846] Wed, 17 August 2005 09:06
djzort   Australia
Messages: 30
Registered: May 2005
Karma: 0
Member
I've noticed that in the 'logged in users' list, Invision (and possibly others) shows your normal list of clickable users plus an unclickable Google.com entry.

That would be nice to have, as currently when Google and others hit, you have 13+ anonymous users (if your forums aren't closed to Google with robots.txt).

Perhaps a handy robots.txt editor would also be a useful feature for admins unaware of it.
Re: recognise google (/other) rather than having zillions of anon users [message #26850 is a reply to message #26846] Wed, 17 August 2005 14:00
Ilia   Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
Most people want their forum to be indexed by search engines, so a default robots.txt would be a bad idea. People who want to avoid indexing usually know enough to use robots.txt already.

FUDforum Core Developer
Re: recognise google (/other) rather than having zillions of anon users [message #26867 is a reply to message #26850] Thu, 18 August 2005 11:51
djzort   Australia
Ilia wrote on Thu, 18 August 2005 00:00

Most people want their forum to be indexed by search engines, so a default robots.txt would be a bad idea. People who want to avoid indexing usually know enough to use robots.txt already.



Just detect Google probing, then.
Re: recognise google (/other) rather than having zillions of anon users [message #27043 is a reply to message #26867] Thu, 25 August 2005 04:00
kenjb   United States
Messages: 67
Registered: September 2004
Karma: 0
Member
I was curious why this topic started; it just didn't seem obvious to me. Recently, though, Google has been rummaging through my forums, small as they are. I can read all the messages in one sitting without a drink of water. BUT Google seems to think they're interesting, so I'll post the stats below, as output by AWStats. Anyone else getting this harsh "attack" from Google?

robot            hits         bandwidth         lastvisit
------------------------------------------------------------
Googlebot      298469+38       5.79 GB        24 Aug 2005 - 22:42


Okay, so that is obviously this month's Googlebot stats alone. As a matter of fact, I'd have to check to be sure, but I'd say that covers ten days, fourteen tops. The next closest bot in bandwidth is at only 48.24 MB.

I am well versed in robots.txt; that's not the point here. :)


kenjb
Re: recognise google (/other) rather than having zillions of anon users [message #27045 is a reply to message #27043] Thu, 25 August 2005 05:08
Anonymous   Australia
Is it just me, or is *every* feature request just rejected with some smart-mouth comment?

Have a robots.txt editor. It would be neat to have a 'property' on topics and forums to allow robots or not, and then FUDforum would just generate the robots.txt. Voila.

Also:

Detect Google, so the 'currently online users' list shows
<link>user</link>,<link>user</link>,<link>user</link>,google bot, <link>user</link>,<link>user</link>

That, or how about we have a pi computation page? Just have PHP compute pi to infinity and spew it out. Or perhaps a fast Fourier transforms page. Or maybe a page that differentiates delta? Just in case little easy things aren't worthy features?
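
For illustration only, here is a rough sketch of the 'currently online users' idea in Python. It is not FUDforum code: the KNOWN_BOTS table, the function names, and the (username, user_agent) session shape are all assumptions made up for this example, and a User-Agent string can be faked, so this only labels what a client claims to be.

```python
from collections import Counter

# Hypothetical table of User-Agent substrings. Only "Googlebot" is mentioned
# in this thread; a real list would need more entries.
KNOWN_BOTS = {
    "googlebot": "Googlebot",
}


def classify(user_agent):
    """Return a bot label if the User-Agent matches a known crawler, else None."""
    ua = user_agent.lower()
    for token, label in KNOWN_BOTS.items():
        if token in ua:
            return label
    return None


def online_list(sessions):
    """Summarise (username, user_agent) pairs for a 'currently online' list.

    username is None for anonymous sessions. Registered users are listed once
    each, every crawler collapses to a single labelled entry with a session
    count, and the remaining anonymous sessions are only counted.
    """
    users = []
    bots = Counter()
    guests = 0
    for username, user_agent in sessions:
        if username is not None:
            if username not in users:
                users.append(username)
            continue
        label = classify(user_agent)
        if label is not None:
            bots[label] += 1
        else:
            guests += 1
    return users, dict(bots), guests
```

The check itself is one substring test per known bot, but it does have to run wherever anonymous sessions are created or listed.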
Re: recognise google (/other) rather than having zillions of anon users [message #27056 is a reply to message #27045] Thu, 25 August 2005 13:42
Ilia   Canada
Good features are added, inappropriate ones are not; that's all there is to it. Detecting search engines requires extra code for EVERY SINGLE request made to the forum, which is highly inefficient, especially if the only result is that on the member list you see 1 vs 50 for Google's spider.

FUDforum Core Developer
Re: recognise google (/other) rather than having zillions of anon users [message #27060 is a reply to message #27056] Thu, 25 August 2005 13:53
kenjb   United States
I'm not looking for a feature, I'm just asking if anyone else is getting this type of inappropriate attention from Googlebot.

As of this morning, just overnight, the bandwidth consumed by Googlebot went from 5.79 GB to 6.45 GB.

So my original question (maybe I should have started my own thread elsewhere, and I apologize for not doing that) is: anyone else seeing Googlebot eat up that much bandwidth while parsing FUDforum forums?


Just wanted to add this:

There are 1 members(s), 0 invisible members and 2240 guest(s) visiting this board. 


kenjb

[Updated on: Thu, 25 August 2005 13:56]

Re: recognise google (/other) rather than having zillions of anon users [message #27062 is a reply to message #27060] Thu, 25 August 2005 13:57
Ilia   Canada
Are you certain these requests are made by the genuine Googlebot and not someone pretending to be Googlebot? I manage a number of forums, all of which get indexed by Google, and generally Google consumes no more than 25 megabytes per day.

FUDforum Core Developer
Re: recognise google (/other) rather than having zillions of anon users [message #27063 is a reply to message #27062] Thu, 25 August 2005 14:03
kenjb   United States
Yep, I'm sure. Here is one line out of my log:

66.249.65.146 - - [22/Aug/2005:13:31:16 -0500] "GET 
/forums/index.php?t=index&cat=2&c=8:1_6:1_7:1_5:1_4:1_3:1_2:1&rid=0
 HTTP/1.1" 200 18389 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


and that IP comes back as Network Owner: Google Inc.
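
For anyone who wants to total this up from their own logs, here is a minimal sketch in Python, assuming the log uses the Apache "combined" format that the line above appears to follow. The pattern and the function name bytes_by_agent are illustrative, not part of FUDforum or AWStats, and grouping by User-Agent only shows what each client claims to be.

```python
import re
from collections import defaultdict

# Apache combined log format: ip, identity, user, [time], "request",
# status, size, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Return {user_agent: (hits, bytes_sent)} for the parseable log lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        size = match.group("size")
        entry = totals[match.group("agent")]
        entry[0] += 1
        # A size of "-" means no response body was sent.
        entry[1] += 0 if size == "-" else int(size)
    return {agent: (hits, sent) for agent, (hits, sent) in totals.items()}
```

Each log entry has to be on a single line for this to match, so a line wrapped for display like the one above would need to be joined back together first.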

If it keeps up, to my dismay I'll have to block her. That kind of defeats the purpose of being on the web then.

EDIT:
made it wrap to prevent scrolling to the right


kenjb

[Updated on: Sun, 04 September 2005 19:08]

Re: recognise google (/other) rather than having zillions of anon users [message #27064 is a reply to message #27063] Thu, 25 August 2005 14:05
Ilia   Canada
The IP may belong to Google, or appear to, but that is not a Googlebot IP.
The valid Googlebot IP range can be found here:
http://www.searchengineworld.com/spiders/ip_addresses/google.htm

And this IP address is not on the list.


FUDforum Core Developer
Re: recognise google (/other) rather than having zillions of anon users [message #27065 is a reply to message #27064] Thu, 25 August 2005 14:16
kenjb   United States
You can fake a lot of things, but it's hard to fake what this site reports:

http://www.senderbase.org/search?searchString=66.249.65.231

There are other IPs from the same block, along with that one, that are parsing the forum as well.

I have (had) a PHP callback on that particular page that calls for a random thumbnail picture. I removed that callback this morning to see if it is part of the reason why Google is running through so many iterations. I'd be happy to snip out a piece of the log if you care to see how it looks.


kenjb
Re: recognise google (/other) rather than having zillions of anon users [message #27068 is a reply to message #27065] Thu, 25 August 2005 14:32
Ilia   Canada
Well, the IP does appear to be valid. That said, I have not encountered or heard of Googlebot going insane requesting a page like this. If anything, it looks like a bug in the bot itself.

FUDforum Core Developer
Re: recognise google (/other) rather than having zillions of anon users [message #27069 is a reply to message #27068] Thu, 25 August 2005 14:36
kenjb   United States
I agree.

Thanks for your replies to this, uhm, thing. I'll post more if I find out anything, just fyi.


kenjb
Re: recognise google (/other) rather than having zillions of anon users [message #27080 is a reply to message #27069] Thu, 25 August 2005 16:28
kenjb   United States
Okay, I've attached a tail of my web log for those web gurus who wonder what this Google thing looks like.

Also, after removing the PHP callback function that normally adds a random thumbnail picture to the page, Google can now load pages FASTER than it did before. It causes no damage, but the board now shows "3217 guest(s) visiting this board" instead of the normal average of about 1500+.
  • Attachment: log_snip.txt
    (Size: 90.54KB, Downloaded 965 times)


kenjb
Re: recognise google (/other) rather than having zillions of anon users [message #27096 is a reply to message #27056] Fri, 26 August 2005 01:04
Anonymous   Australia
Ilia wrote on Thu, 25 August 2005 09:42

Good features are added, inappropriate ones are not; that's all there is to it. Detecting search engines requires extra code for EVERY SINGLE request made to the forum, which is highly inefficient, especially if the only result is that on the member list you see 1 vs 50 for Google's spider.



What's wrong with an interface to robots.txt, then?
That requires NO extra code for EVERY SINGLE request.

Just have an option per forum and then per topic,
something like 'allow search engine probes [x]'.

A robots.txt rule blocks everything under its path, so if you turned crawling off for a forum you couldn't re-enable it on a per-topic basis. But you could enable it for a forum and then disable it per topic.

Sounds simple to me.
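
As a sketch of what such a generator might look like (in Python, purely for illustration): the dictionary keys and URL prefixes below are hypothetical and do not reflect how FUDforum actually stores forums or builds its URLs. It only relies on the standard robots.txt behaviour that a Disallow rule matches by URL prefix.

```python
def build_robots_txt(forums):
    """Render robots.txt text from per-forum and per-topic crawl flags.

    forums is a list of dicts with these (hypothetical) keys:
      "path"           - URL prefix of the forum
      "allow"          - False to block the whole forum
      "blocked_topics" - URL prefixes of topics to block in an allowed forum
    """
    lines = ["User-agent: *"]
    for forum in forums:
        if not forum["allow"]:
            # Blocking the forum prefix also covers every topic under it,
            # so per-topic settings are ignored here.
            lines.append("Disallow: " + forum["path"])
            continue
        for topic_path in forum.get("blocked_topics", []):
            lines.append("Disallow: " + topic_path)
    if len(lines) == 1:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"
```

The file would only need regenerating when an admin changes one of the flags, so nothing extra runs on ordinary page requests.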
Re: recognise google (/other) rather than having zillions of anon users [message #27097 is a reply to message #27096] Fri, 26 August 2005 02:03
Ilia   Canada
Can you clarify what you mean by interface? Right now you can use the forum's file manager to upload your robots.txt and block unwanted spiders. Do you want the forum to generate this file for you based on given rules, or something like that?

FUDforum Core Developer
Re: recognise google (/other) rather than having zillions of anon users [message #27296 is a reply to message #27097] Sat, 03 September 2005 15:19
srchild   United Kingdom
Messages: 88
Registered: December 2003
Location: UK
Karma: 1
Member
Another approach to this might be to add a Google Sitemap:

https://www.google.com/webmasters/sitemaps

I've no idea how much work it might be to implement this to be generated automatically, but if it could be done then Google would be able to distinguish new/updated threads from old threads and could be advised there is no need to crawl old threads.
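
A minimal sketch of generating such a file, written in Python for illustration. It assumes the current sitemaps.org 0.9 XML format, which may differ from what the Google Sitemaps beta accepts, and the function name and the (url, last_modified) input shape are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (an assumption; see above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(threads):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is a date string such as "2005-09-03"; it is what lets a
    crawler tell updated threads from old ones.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in threads:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    # ElementTree escapes characters such as "&" in the URLs.
    return ET.tostring(urlset, encoding="unicode")
```

The thread URLs and last-post dates would have to come from the forum's database; that part is not shown.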



Simon Child
Re: recognise google (/other) rather than having zillions of anon users [message #27300 is a reply to message #27296] Sat, 03 September 2005 19:16
Ilia   Canada
Looks like it is a beta portion of Google's functionality and you still need to submit it manually. It really won't address the anon user "problem" (which I personally don't consider to be a problem), which results from Google's search bots not propagating cookies.
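
To illustrate the cookie point, here is a toy model in Python of cookie-based session tracking. The SessionTable class is invented for this example and is not how FUDforum stores sessions; it only shows why a client that never sends its cookie back is counted as a new guest on every request.

```python
import itertools


class SessionTable:
    """Toy session store keyed by the id handed out in a cookie."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.sessions = set()

    def handle_request(self, cookie=None):
        """Return the session id for a request.

        A request that presents a known id reuses that session; any other
        request gets a brand new one.
        """
        if cookie in self.sessions:
            return cookie
        session_id = next(self._ids)
        self.sessions.add(session_id)
        return session_id


# A browser sends its cookie back, so many requests share one session.
table = SessionTable()
cookie = table.handle_request()
for _ in range(10):
    cookie = table.handle_request(cookie)
browser_sessions = len(table.sessions)

# A crawler that drops the cookie creates one session per request.
for _ in range(50):
    table.handle_request()
total_sessions = len(table.sessions)
```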

FUDforum Core Developer
Re: recognise google (/other) rather than having zillions of anon users [message #27306 is a reply to message #27300] Sat, 03 September 2005 20:11
srchild   United Kingdom
Ilia wrote on Sat, 03 September 2005 20:16

you still need to submit it manually.


You can set that up as a cron job.

Quote:

It really won't address the anon user "problem" (which I personally don't consider to be a problem), which results from Google's search bots not propagating cookies.


No, I was thinking more of those who report a heavy use of bandwidth as googlebot repeatedly crawls their forums.

Just an idea. I have recently installed it (generated automatically, resubmitted by a cron job) on a TYPO3 based website that I run, and it seemed like an interesting idea though I have no stats to prove its benefit.


Simon Child
Re: recognise google (/other) rather than having zillions of anon users [message #27328 is a reply to message #27080] Sun, 04 September 2005 19:04
kenjb   United States
My final update on the Google intrusion ;) or whatever you prefer to call it.

Google stopped parsing my site at 11.99 GB total bandwidth. My web hosting company had a script that automatically detected the high bandwidth usage and shut off web access to my domain, along with sftp, ftp, and telnet access. Then, after I called them on it, they claimed it was a mistake and turned it all back on for me that second.

So now at least I know how much bandwidth it takes to parse every known combination possible on my website's forum. Your mileage may vary.

Now I have become.com going through the forums. become.com will be disallowed in a heartbeat if they try the Google parsing trick.

Trying to keep it on the light side once in a while. :)


kenjb
Re: recognise google (/other) rather than having zillions of anon users [message #31536 is a reply to message #27328] Thu, 04 May 2006 14:30
matthieu_phpmv   France
Messages: 44
Registered: November 2004
Karma: 0
Member
If you have problems with Googlebot or other bots that take too much bandwidth, try this topic; it could help you:
http://fudforum.org/forum/index.php?t=msg&goto=31535&#msg_31535