FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

Yahoo Groups -> mbox -> maillist.php -> missing posts [message #15672] Tue, 30 December 2003 20:34
srchild   United Kingdom
Messages: 88
Registered: December 2003
Location: UK
Karma: 1
Member
I'm looking at converting a Yahoo Groups group to FUD, so I'm trying to transfer the existing archive.

I've collected the archive from Yahoo using this script:

http://www.lpthe.jussieu.fr/~zeitlin/yahoo2mbox.html

Now it is in mbox format, and it appears to be valid mbox: for example, if I view it with Elm it shows the correct number of messages and they are readable.

I load it into FUD 2.5.2 using:

cat archive | formail -s /path/to/php /path/to/maillist.php 1

With 'Slow Reply Match' to recreate the threads, subject mangling to remove the [listname], and body mangling to remove some of the advertising dross, it all looks good.

But it only loads about half of the messages, and the rest go missing for no obvious reason. It's not just dying early: it is missing messages out from early on in the archive. I've examined the archive file and can see no clues as to why some messages are imported and others are not. Some are email postings and some were posted from the website. A user might have some messages imported whilst others by the same user are not.

I've experimented with tidying up the archive file manually (removing adverts and wrapped Received lines from the first few messages, that sort of thing). I tried reordering the first few messages: one that loaded fine when it was first in the archive no longer loaded when moved to second place.

I've tried feeding the archive through formail:

cat archive | formail > archive2

and it appears to quote lots of (all?) the From_ lines, except the first line, whereas I thought it was supposed to quote only bogus From_ lines. So perhaps there is a problem with my archive, and formail is not breaking it up properly? (But note that Elm can read it properly.)
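
One check that does not depend on formail is to count the lines that start with "From " and see how many of them are not preceded by a blank line. This is only a sketch: the name `check_from_lines` is made up for illustration, and it assumes a traditional mbox in which every message starts with a "From " line. Without `-s`, formail appears to treat its whole input as a single message, which would account for every later From_ line being quoted. The formail man page also documents `-e` (do not require an empty line before a new message) and `-Y` (ignore Content-Length: fields), so separators with no blank line before them are one thing that could make formail and Elm disagree; nothing in this thread confirms that as the cause.

```shell
# Hypothetical helper: prints "<total> <suspect>", where <total> is the number
# of lines starting with "From " and <suspect> is how many of those are not
# preceded by an empty line (the first line of the file is never suspect).
check_from_lines() {
    awk '
        /^From / { total++; if (NR > 1 && prev != "") suspect++ }
        { prev = $0 }
        END { printf "%d %d\n", total, suspect }
    ' "$1"
}
```

For example, `check_from_lines archive` can be compared with the message count Elm reports; a non-zero second number marks separators worth inspecting by hand.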

I've found some fragments of text in messages/msg_1 but can't see how to interpret that - maybe there are clues in there?

Anyone got any clues for me?

Thanks


Simon Child
Re: Yahoo Groups -> mbox -> maillist.php -> missing posts [message #15673 is a reply to message #15672] Tue, 30 December 2003 21:52
Ilia   Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
FUDforum's import script can only handle 1 message at a time. So unless you modify the script to handle mbox format you need something else to do it and pipe the messages one at a time to the script.

FUDforum Core Developer
Re: Yahoo Groups -> mbox -> maillist.php -> missing posts [message #15678 is a reply to message #15673] Tue, 30 December 2003 22:27
srchild   United Kingdom
Messages: 88
Registered: December 2003
Location: UK
Karma: 1
Member
Ilia wrote on Wed, 31 December 2003 03:52

FUDforum's import script can only handle 1 message at a time. So unless you modify the script to handle mbox format you need something else to do it and pipe the messages one at a time to the script.


That's what 'formail -s' does - it breaks up an mbox into single messages and sends them to the script one at a time. From the man page:

"The input will be split up into separate mail messages, and piped into a program one by one (a new program is started for every part)".

So that part is working (I got 308 messages in by running that command once, but another 296 didn't make it).
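
To narrow down which messages went missing, it may help to number the messages in the archive and list their subjects, then compare that list with what appears in the forum. A rough sketch, assuming every message starts with a "From " line and that body lines beginning with "From " have been quoted; `list_subjects` is a hypothetical name, only the first line of a folded Subject is shown, and messages without a Subject header are not listed.

```shell
# Hypothetical helper: prints "<message number><TAB><subject>" for each
# message in an mbox file, reading only the header part of every message.
list_subjects() {
    awk '
        /^From / { n++; inhdr = 1; next }          # new message, headers begin
        inhdr && /^$/ { inhdr = 0 }                # blank line ends the headers
        inhdr && /^[Ss]ubject:/ {
            sub(/^[Ss]ubject:[ \t]*/, "")
            printf "%d\t%s\n", n, $0
        }
    ' "$1"
}
```

With subject mangling switched on, the [listname] prefix will still be present in this list but not in the forum.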

I got that idea from this message:

http://fud.prohost.org/forum/index.php?t=msg&goto=9774#msg_9774

My problem is that not all the messages are getting converted into FUD. I'm not clear whether this is due to my archive format (seems alright) or my use of formail (seems alright, and has worked for others) or something else.

One thought is whether some messages are being dropped since the script is sending them too fast - will FUD cope if there are several instances of maillist.php running at the same time, as there may well be since formail will start up a new instance of it for each message that it extracts from the mbox?

Another possibility is that the archive appears alright (e.g. to my eye, and to Elm) but in fact the message boundaries are unclear and formail is struggling with them.

Thanks for your interest - one more question - is there some documentation to tell me about the file appearing in messages/msg_1 - will that give me any clues?







Simon Child
Re: Yahoo Groups -> mbox -> maillist.php -> missing posts [message #15680 is a reply to message #15678] Tue, 30 December 2003 22:38
Ilia   Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
There should not be a problem with more than one instance of the script running at one time. However, I would not recommend running more instances than you have CPUs, since doing so would hurt performance.

If you are importing messages through multiple processes, make sure that they are imported sequentially (from oldest to newest), otherwise the message association may be broken.

If you can isolate a few messages that cannot be imported, feel free to send those to me and I'll try to determine why they are not being imported.


FUDforum Core Developer
Re: Yahoo Groups -> mbox -> maillist.php -> missing posts [message #15681 is a reply to message #15680] Wed, 31 December 2003 00:28
srchild   United Kingdom
Messages: 88
Registered: December 2003
Location: UK
Karma: 1
Member
Ilia wrote on Wed, 31 December 2003 04:38

If you can isolate a few messages that cannot be imported, feel free to send those to me and I'll try to determine why they are not being imported.


Thanks, I'll get back to you on that offer if I remain stuck. But I've thought of a couple more things I can try first:

  • Feed in the suspect messages singly and see if they are accepted.
  • Using mb2md (http://batleth.sapienti-sat.org/projects/mb2md/) I have split the mbox into single messages, so if I can work out the shell scripting I can feed genuine single messages to maillist.php.

I can't do this one message at a time by hand - I'm trying to migrate two lists - one list has >600 posts and the other has >11,000.
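
For the first idea, a single suspect message can be pulled out of the mbox by its position and piped to the import script on its own. This is a sketch under the assumption that every message starts with a "From " line; `extract_message` is a hypothetical name, and the message number 17 in the usage comment is only an example.

```shell
# Hypothetical helper: print message number N (1-based) from an mbox file.
# usage: extract_message N MBOX
extract_message() {
    awk -v want="$1" '
        /^From / { n++ }        # each "From " line starts the next message
        n == want { print }     # copy the lines of the wanted message
        n > want { exit }       # stop once the following message begins
    ' "$2"
}

# Example, reusing the command from the first post:
#   extract_message 17 archive | /path/to/php /path/to/maillist.php 1
```

Whether maillist.php reports a rejected message on its output or through its exit status is not stated in this thread, so the forum itself is the place to check the result.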


Simon Child
Re: Yahoo Groups -> mbox -> maillist.php -> missing posts [message #21573 is a reply to message #15681] Sat, 04 December 2004 02:04
srchild   United Kingdom
Messages: 88
Registered: December 2003
Location: UK
Karma: 1
Member
srchild wrote on Wed, 31 December 2003 06:28

I've thought of a couple more things I can try first:

  • Feed in the suspect messages singly and see if they are accepted.
  • Using mb2md (http://batleth.sapienti-sat.org/projects/mb2md/) I have split the mbox into single messages, so if I can work out the shell scripting I can feed genuine single messages to maillist.php.



Almost 12 months later I have returned to this project, and am now succeeding :)

So I thought I'd post the success details in case others are trying this (Migrate Yahoogroup -> FUD).

I got the archive from Yahoo using Yahoo2mbox http://www.tt-solutions.com/en/products/yahoo2mbox/

I tidied it up a bit to remove some of the adverts etc. (some done by hand, some done using regex search and replace in vim).

I converted it to maildir format using mb2md.pl http://batleth.sapienti-sat.org/projects/mb2md/

I then fed it to maillist.php, using a sleep so that it could keep up. This seemed to be the key point. With a sleep of 0.5 (FreeBSD supports sleep for fractions of a second) I still lost about 50% of messages, but with a sleep of one second between messages I didn't lose a single one.

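for i in /path/to/maildir/cur/*
do
    # One PHP process per message; "$i" is quoted so that unusual filenames survive
    /usr/local/bin/php /path/to/FUDforum/scripts/maillist.php 4 < "$i"
    # Pause so that maillist.php can keep up (0.5 s was not enough here)
    sleep 1
done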
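
The loop above can be wrapped so that it also reports how many runs exited cleanly. A minimal sketch, assuming a POSIX shell; `import_maildir` is a hypothetical name, and it is an assumption that the exit status of maillist.php says anything useful about whether a message was stored (the losses described in this thread were silent), so the forum's own message count remains the real check. Shell globs expand in sorted order, which gives oldest-first processing only if the maildir filenames sort chronologically; that is also an assumption about what mb2md produces.

```shell
# Hypothetical wrapper: feed every file in DIR to an importer command,
# pausing DELAY seconds after each one.
# usage: import_maildir DIR DELAY IMPORTER [ARGS...]
import_maildir() {
    dir=$1
    delay=$2
    shift 2
    ok=0
    failed=0
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue          # skip subdirectories and empty globs
        if "$@" < "$msg"; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
            printf '%s\n' "$msg" >&2       # name the file whose run failed
        fi
        sleep "$delay"
    done
    printf 'imported=%d failed=%d\n' "$ok" "$failed"
}

# Example, with the paths and list argument used above:
#   import_maildir /path/to/maildir/cur 1 /usr/local/bin/php /path/to/FUDforum/scripts/maillist.php 4
```

Keeping the delay as a parameter makes it easy to repeat the comparison between 0.5 and 1 second on a test forum before committing to the 11,000-message list.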



Simon Child