FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

FUD Question [message #657] Thu, 21 February 2002 11:22
B@rt, Netherlands
Messages: 6
Registered: February 2002
Location: Hilversum, The Netherland...
Karma: 0
Junior Member

Hi all,

I couldn't seem to find a place for FUD user questions, so I'm hoping someone is willing to help me here.. Here's my question:

At this moment, we are running Phorum on a busy community site (100k pageviews/day; > 100k messages in the database). I've been looking for a new forum server and FUD definitely has all the features that I am looking for. BUT....

The fact that so many items are stored on the filesystem instead of in the database (message bodies, user settings) scares the heck out of me. For example, it seems like each thread gets stored in /data/messages. What happens when you have more than, say, 1000 threads? Are they all stored in the same subdirectory? Or does FUD create 'buckets' like for example message 1-500, 501-1000 etc. I'm afraid that I'm not sure if our operating system (FreeBSD) can handle that.

Is there something I've missed while installing? Maybe it's possible to force everything to be stored in the database?

Does anyone else here have experience with running FUD on a large site?

Thx,

B@rt


Cheers,
B@rt
--
Web Monkey, Project Manager, 3D Addict
Re: FUD Question [message #658 is a reply to message #657] Thu, 21 February 2002 11:41
Ilia, Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
B@rt wrote on Thu, 21 February 2002 11:22 AM


The fact that so many items are stored on the filesystem instead of in the database (message bodies, user settings) scares the heck out of me. For example, it seems like each thread gets stored in /data/messages. What happens when you have more than, say, 1000 threads? Are they all stored in the same subdirectory? Or does FUD create 'buckets' like for example message 1-500, 501-1000 etc. I'm afraid that I'm not sure if our operating system (FreeBSD) can handle that.



FUDforum stores each thread in a file of its own and keeps one file for private messages. This is quite fast; in 90% of cases it will be faster than MySQL because there is less overhead. On a rather old test machine running Linux we've tested with over 100,000 individual threads and it was VERY fast. For message reading we rely on fast sequential seeks, which any operating system is heavily optimized for.
If you are using an old file system, at over 100,000 threads you may see very small slowdowns, something like 100 microseconds extra per fopen call (there are 1,000,000 microseconds in a second). Also, commonly viewed threads get faster the more often they are read, because UNIX kernels cache the file system in available RAM, so a commonly opened file is likely to be in this RAM cache, effectively making reads from that file instant.
In addition we create database views for threads, which allows you to scale to a virtually limitless number of threads without taking a performance hit, unlike any other forum software today.
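
To put the 100-microsecond figure in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes one thread file is opened per page view (an assumption, not something stated about FUDforum) and uses the 100k pageviews/day figure from the original question. The function name is made up for illustration.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def extra_seconds_per_day(pageviews_per_day, extra_us_per_open, opens_per_pageview=1):
    """Total extra time per day spent on slower fopen calls, in seconds.

    opens_per_pageview is an assumed value, not a measured one.
    """
    total_us = pageviews_per_day * opens_per_pageview * extra_us_per_open
    return total_us / MICROSECONDS_PER_SECOND


if __name__ == "__main__":
    # 100,000 page views, each paying 100 extra microseconds on one fopen call
    print(extra_seconds_per_day(100_000, 100))  # prints 10.0
```

Under those assumptions the slowdown adds up to about ten seconds of extra work spread over a whole day.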

I am currently running FUDforum on a site with just under 50,000 messages and 1200+ threads. You can take a look for yourself at that forum at: http://www.mediaminer.org/forum/


FUDforum Core Developer
Re: FUD Question [message #659 is a reply to message #658] Thu, 21 February 2002 11:48
B@rt, Netherlands
Messages: 6
Registered: February 2002
Location: Hilversum, The Netherland...
Karma: 0
Junior Member

Hi Protoss,

thanks for your answer! I tried your forum and it's quite fast indeed. Still I'm a bit worried: Am I correct in assuming then that on a system with 100,000 threads there will be 100,000 threads stored in *one* directory? That sounds like a 'dirty' solution to me..

B@rt


Re: FUD Question [message #660 is a reply to message #659] Thu, 21 February 2002 12:03
Ilia, Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
B@rt wrote on Thu, 21 February 2002 11:48 AM

Hi Protoss,

thanks for your answer! I tried your forum and it's quite fast indeed. Still I'm a bit worried: Am I correct in assuming then that on a system with 100,000 threads there will be 100,000 threads stored in *one* directory? That sounds like a 'dirty' solution to me..

B@rt


Yes, all of the threads will be stored in a single directory. This won't be a problem, unless you intend to run ls -l in that directory all the time :P

It may seem "dirty", but consider the alternative SQL approach. First, there will be a limit on the length of a message, because you'll have to define the maximum length of the field. MySQL does not cache TEXT/BLOB and similar fields because it would be a waste of memory, and since all the data is kept in one file, the kernel will not cache that file unless you have a large amount of available RAM (which won't happen in most cases).
When doing an insert, MySQL always sets a lock on the table. So, since all the messages are stored in one table, if you have many people posting at the same time they'll have to wait for all the previous inserts to go through. On a very busy forum that may cause serious delays during message posting.
Another problem is that when you get data from MySQL there is a lot of in-between overhead. MySQL needs to get the data and allocate memory for it, then PHP's mysql module needs to get the data, parse it, and allocate memory for it. Then when you fetch the data for yourself another copy of the data is made, this time for the PHP script. In the end you have 3 copies of the same data in memory, in addition to various PHP wrappers around the data.
This, as you can imagine, takes quite a bit of memory & CPU.
When working with files there is no MySQL step: the data is fetched directly into PHP, and absolutely no parsing of the data is done in real time.
Another problem you may encounter is that file systems (at least most old ones) had a limitation that allowed a file to be no larger than 2 gigabytes. On a large forum that would restrict growth and cause all kinds of problems.

I have a question: which file system do you use? I myself have a few FreeBSD boxes with the stock file system that came with the FreeBSD 3.3 release. Although it is slower than Linux's ReiserFS when there are lots & lots of files, it is still quite fast. I have a directory structure for storing images with ~10 million images split across 1024 directories. I've had absolutely no problems with speed while reading files from those directories.
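
For readers who do want the 'buckets' asked about earlier, a layout like the 1024-directory image store could be sketched as follows in Python. This is only an illustration: the modulo scheme, the file extension, and the function names are assumptions, not the actual layout used on those boxes, and FUDforum itself keeps thread files in a single directory.

```python
import os

NUM_BUCKETS = 1024  # directory count taken from the post; the scheme itself is assumed


def bucket_path(root, item_id, num_buckets=NUM_BUCKETS):
    """Return the path for item_id, spreading files evenly across subdirectories."""
    bucket = item_id % num_buckets
    return os.path.join(root, f"{bucket:04d}", f"{item_id}.dat")


def store(root, item_id, data):
    """Write data (bytes) for item_id into its bucket, creating the directory if needed."""
    path = bucket_path(root, item_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

With 10 million files and 1024 buckets, each directory ends up holding roughly 10,000 entries, which keeps any single directory listing small.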

The bottom line to my rant ;) is that it is MOST UNLIKELY that you will suffer any performance loss from using FUDforum's way of storing messages.


FUDforum Core Developer
Re: FUD Question [message #661 is a reply to message #660] Thu, 21 February 2002 12:11
B@rt, Netherlands
Messages: 6
Registered: February 2002
Location: Hilversum, The Netherland...
Karma: 0
Junior Member

All right - I'll give it a shot! :P Let's see how I can integrate this baby into the new community site I'm building at the moment.

Cheers, and thanks again for taking the time for those answers!

B@rt


Re: FUD Question [message #662 is a reply to message #661] Thu, 21 February 2002 12:12
Ilia, Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
B@rt wrote on Thu, 21 February 2002 12:11 PM

All right - I'll give it a shot! :P Let's see how I can integrate this baby into the new community site I'm building at the moment.

Cheers, and thanks again for taking the time for those answers!

B@rt


NP :)



FUDforum Core Developer
Re: FUD Question [message #668 is a reply to message #657] Thu, 21 February 2002 21:27
hackie, Canada
Messages: 177
Registered: January 2002
Karma: 0
Senior Member
Core Developer

B@rt wrote on Fri, 22 February 2002 1:22 AM

Hi all,

I couldn't seem to find a place for FUD user questions, so I'm hoping someone is willing to help me here.. Here's my question:

At this moment, we are running Phorum on a busy community site (100k pageviews/day; > 100k messages in the database). I've been looking for a new forum server and FUD definitely has all the features that I am looking for. BUT....

The fact that so many items are stored on the filesystem instead of in the database (message bodies, user settings) scares the heck out of me. For example, it seems like each thread gets stored in /data/messages. What happens when you have more than, say, 1000 threads? Are they all stored in the same subdirectory? Or does FUD create 'buckets' like for example message 1-500, 501-1000 etc. I'm afraid that I'm not sure if our operating system (FreeBSD) can handle that.

Is there something I've missed while installing? Maybe it's possible to force everything to be stored in the database?

Does anyone else here have experience with running FUD on a large site?

Thx,

B@rt



Well, we think it is better to store the message bodies on the file system for a rather large number of reasons; prot' listed some of them above, and there are more.

First of all, there is of course file system caching of files, while MySQL does not cache BLOB/TEXT fields. It gets more complicated. Consider the overhead of storing such large chunks of data in the database: to retrieve it you would have to transfer all this data over a socket. Sure, you can make it faster by using UNIX sockets, but that is still a huge amount of pointless overhead compared to the file system!
There is a disadvantage to file system storage of course: you can't run the forum web server on one machine and store the message bodies on a different one (unless you use NFS of course), but we think it's a fine trade-off.

MOREOVER! Early versions of FUDforum did use the DB for storing bodies, but we converted them for performance reasons to the FS code you see today. It took us about 15 minutes :) So, if you want to convert FUDforum to use the DB to store bodies, it would take you all of about.. oh.. 15 minutes and a small script to read them back in.....

In addition, the only things stored in those files are the message bodies; everything else about a message is of course in the database.
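
As an illustration of that split, here is a minimal Python sketch in which bodies are appended to one file per thread and everything needed to find a body again is kept separately. The thread does not describe FUDforum's actual on-disk format, so the (offset, length) bookkeeping, the file naming, and the class name are all assumptions, and a plain dict stands in for the database rows.

```python
import os


class ThreadStore:
    """Bodies live in per-thread files; a dict stands in for the database."""

    def __init__(self, directory):
        self.directory = directory
        # message_id -> (thread_id, offset, length); hypothetical stand-in for DB rows
        self.index = {}
        os.makedirs(directory, exist_ok=True)

    def _path(self, thread_id):
        return os.path.join(self.directory, f"thread_{thread_id}.msg")

    def append(self, thread_id, message_id, body):
        """Append a body to its thread file and record where it landed."""
        data = body.encode("utf-8")
        with open(self._path(thread_id), "ab") as fh:
            offset = fh.seek(0, os.SEEK_END)  # current end of file
            fh.write(data)
        self.index[message_id] = (thread_id, offset, len(data))

    def read(self, message_id):
        """Seek straight to the body and read exactly its length."""
        thread_id, offset, length = self.index[message_id]
        with open(self._path(thread_id), "rb") as fh:
            fh.seek(offset)
            return fh.read(length).decode("utf-8")
```

Reading a whole thread in this layout means opening one file and reading it front to back, which is the access pattern the posts above describe as fast.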


cc intelligence.c -o intelligence
$ ./intelligence
Segmentation fault
Re: FUD Question [message #674 is a reply to message #657] Fri, 22 February 2002 00:09
ironstorm, Canada
Messages: 89
Registered: February 2002
Location: Toronto, Ontario, Canada
Karma: 0
Member

My two cents: FUD is very well designed with respect to the storage of threads and such...

It's a very good idea to stay away from storing large blocks of text in database fields for tables with a lot of rows... Most DB servers have no choice but to dump memo/text/blob fields into a big file...

When you hit 10,000 threads there is no way that all of that data is going to fit into the DB's memory (at least until we get 64-bit processors to address more than 4GB ;) )... so that means that the DB server must open up and find your text body within that very large file...

IMO you'll find that most OSes are much better at dealing with 10,000 small files adding up to 3GB than 1 big file adding up to 3GB, especially when you have to rewrite the file when you are "compacting" all the rows that have been deleted (typically a DB will just mark a record's memo space as deleted and wait until a compact is issued to rewrite the memo file, because of the performance hit of rewriting large files).

There are other drawbacks too, like putting all your eggs in one basket. I don't know how well you can recover a 3GB memo/blob/text file if you find you have a couple of bad sectors on your HDD (whereas separate files just means you lose some threads). :D

Re: FUD Question [message #2172 is a reply to message #657] Fri, 03 May 2002 11:46
snafu, United States
Messages: 27
Registered: April 2002
Karma: 0
Junior Member
If switching between full and partial database storage is a small task, maybe there should be an admin utility that converts between the two?

Being able to use "full database mode" might conceivably be handy, in spite of reduced performance. Especially in environments where data and databases are shared between many applications and machines.

What do you think?
Re: FUD Question [message #2173 is a reply to message #2172] Fri, 03 May 2002 12:03
Ilia, Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
To "truly" do full database mode, all data would need to be stored in the database, including file attachments, avatars, smilies, etc. No bulletin board does this, since it would be extremely silly to store binary data in the database. Just dumping the message bodies into the DB like vBulletin and phpBB2 do won't accomplish anything beyond making the forum slower. Adding such functionality would simply be a hack to make FUDforum implement the bad functionality of other BB systems, which is why I resist adding it.

On both UNIX and Windows, data on a drive can be shared by many machines using NFS (UNIX) or Samba (Windows); it is common practice to share data like this for other applications. So, there is nothing to stop anyone from sharing the data across a network or even the Internet.

You should also realize that THE only major data block stored on disk is the message bodies; all other info (presumably user settings, thread info, etc.) is in the MySQL database, so it is already VERY easily accessible.

If you open the FUD2 code, you can reasonably easily make it write the message bodies to the database rather than to disk. If you are familiar with PHP it should take no more than 1 hour or so for you to do that.
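
A conversion of that kind might look roughly like the Python sketch below. It is not FUD2 code: the table and column names are hypothetical, sqlite3 stands in for MySQL so the example runs on its own, and the bodies are assumed to have already been read from disk into (message id, body) pairs.

```python
import sqlite3


def import_bodies(conn, bodies):
    """Load (msg_id, body) pairs into a hypothetical msg_body table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS msg_body ("
        "msg_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO msg_body (msg_id, body) VALUES (?, ?)",
            bodies,
        )


def fetch_body(conn, msg_id):
    """Return the stored body for msg_id, or None if there is no such row."""
    row = conn.execute(
        "SELECT body FROM msg_body WHERE msg_id = ?", (msg_id,)
    ).fetchone()
    return row[0] if row else None
```

The forum code that reads a body from disk would then call something like `fetch_body` instead, which is where the extra database round trip discussed earlier in the thread comes in.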


FUDforum Core Developer