Re: Parsing mbox files with Windows Php [message #181682 is a reply to message #181678] Fri, 24 May 2013 23:40
Chuck Anderson
Jerry Stuckle wrote:
> On 5/24/2013 5:57 PM, Chuck Anderson wrote:
>>
>> I have been slowly building a Php/MySQL based IMAP email client. The
>> Php IMAP functions are well documented and easy to use. I store
>> messages in a MySQL database (attachments in the file system) for ease
>> of organization, maintenance, backup and searching. I like it enough
>> that I want to add my existing archive of email in Thunderbird. To do
>> so, I need to parse mbox files and extract message headers, parts, and
>> attachments. Thankfully, the Php IMAP functions can be used to open and
>> parse an mbox file (or even a single .eml file) as well as opening a
>> stream to the server.
>>
>> Using this functionality I can write a script to run on my WAMP
>> development machine that reads the Thunderbird folder structure, parses
>> the mbox files and saves individual messages along with their
>> folder/subfolder path in the Thunderbird folder hierarchy. It should be
>> as easy as pointing it to the top of the folder structure and letting it
>> do all the work from there.
>>
>> Unfortunately, it appears that the Windows Php binary is unable to
>> connect to an mbox file, so to make my job easy, I would have to upload
>> the entire folder structure (it is about 200MBs) to my shared host and
>> process it there. I would much rather "toy around" with this process on
>> my Windows development machine and not on the shared host.
>>
>> There is a Php bug filed for this, but it was determined that the
>> "underlying c-client function is unable to open a file."
>> https://bugs.php.net/bug.php?id=39880 - closed as "not a bug."
>>
>> $mbox = imap_open('pathto/mboxfile', '', ''); // works on *nix, but not on Windows.
>> - Warning: imap_open(): Couldn't open stream mboxfiles/Inbox ....
>> - Notice: Unknown: Can't open mailbox mboxfiles/Inbox: no such mailbox
>> (errflg=2) in Unknown on line 0.
>> (This second error is the one coming from the underlying c-client
>> function.)
>>
>> I believe this is a dead end but want to check if anyone has happened to
>> figure out a way to make this work in Windows - or if you know of a
>> separate mbox file parser that would be fairly simple to integrate with
>> my current Php IMAP based scripts - dependent on the output of
>> imap_fetchstructure, imap_headerinfo, imap_fetchbody(parts).
>>
>> If not, I will probably build a form that lets me select individual
>> (smaller) sections of the folder hierarchy to process individually
>> (tedious and prone to error).
>>
>
> Are you sure Thunderbird's files are in imap format?

I don't have the spec, but I have read as much. And the imap functions
do parse them nicely - on *nix.

> I didn't think they were - I thought they were in some
> Thunderbird-specific format.

Thunderbird (I am still on version 2) uses the mbox format. (I believe
that means the start of each new email is marked by a blank line
followed by a line beginning "From - ....") The rest is the usual email
format: headers, then a blank line marking the beginning of the body.
(Body lines that happen to begin with "From " have to have "From"
escaped in those files.)
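
As a rough illustration of the layout described above, here is a minimal
sketch of splitting an mbox file into raw messages in plain Php, without
the imap functions. The names split_mbox() and finish_message() are
hypothetical helpers, not part of Php or c-client. It assumes a line
starting with "From " at the top of the file or right after a blank line
is a separator, and that escaped body lines look like ">From ". Each
returned string is still a raw message whose headers and MIME parts need
separate parsing.

```php
<?php
// Hypothetical helper: split raw mbox text into individual raw messages.
// Assumes a separator is a line starting with "From " that is either the
// first line of the file or directly follows a blank line.
function split_mbox(string $raw): array
{
    // Normalise line endings so files saved on Windows behave the same.
    $raw = str_replace(["\r\n", "\r"], "\n", $raw);

    $messages  = [];
    $current   = null;  // lines of the message being collected
    $prevBlank = true;  // start of file counts as "after a blank line"

    foreach (explode("\n", $raw) as $line) {
        if ($prevBlank && strncmp($line, 'From ', 5) === 0) {
            // Separator line: flush the previous message, start a new one.
            if ($current !== null) {
                $messages[] = finish_message($current);
            }
            $current = [];
        } elseif ($current !== null) {
            // Assumed escaping: undo one level (">From " becomes "From ").
            $current[] = preg_replace('/^>(>*From )/', '$1', $line);
        }
        $prevBlank = ($line === '');
    }

    if ($current !== null) {
        $messages[] = finish_message($current);
    }
    return $messages;
}

// Hypothetical helper: join the lines, dropping the trailing blank
// line(s) that sit in front of the next separator.
function finish_message(array $lines): string
{
    while ($lines && end($lines) === '') {
        array_pop($lines);
    }
    return implode("\n", $lines);
}
```

For a real Thunderbird folder the input would come from something like
file_get_contents() on the mbox file. Note that the separator line
itself is discarded, and that undoing the ">From " escaping is a guess
that can alter a body line that really did start with ">From ".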

When saving a single email to a .eml file, you must insert a dummy
"From - " line (separate from the usual From: header; this is
"From - date") at the beginning of the headers before you can parse it
with the imap functions. Thunderbird places those lines into the mbox
(multiple emails) file.
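
A small sketch of adding that dummy line to a saved .eml file. The
function name eml_to_mbox_entry() is hypothetical, and the date layout
is an assumption modelled on the "From - date" lines described above,
not a confirmed Thunderbird format.

```php
<?php
// Hypothetical helper: prepend a dummy "From - date" separator line to
// the raw text of a single .eml message so it resembles an mbox entry.
function eml_to_mbox_entry(string $eml, int $timestamp): string
{
    // Normalise line endings first.
    $eml = str_replace(["\r\n", "\r"], "\n", $eml);

    // Leave the text alone if it already starts with a separator line.
    if (strncmp($eml, 'From ', 5) === 0) {
        return $eml;
    }

    // Assumed date layout, e.g. "Fri May 24 23:40:00 2013" (UTC).
    return 'From - ' . gmdate('D M d H:i:s Y', $timestamp) . "\n" . $eml;
}
```

The result could then be written to a file with file_put_contents()
before handing it to the imap functions on a platform where they can
open local files.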

>
> But also, if you're opening 'pathto/mboxfile', why is it complaining
> about 'mboxfiles...'?

Sorry. I was using generic terms and switched terms (sloppy). The
error says it can not open the folder/file I specified. When I run the
exact same script on my shared Linux server (after uploading a sample
Thunderbird mail file), the imap functions produce output exactly like
they would if reading from a stream on the imap server.

--
*****************************
Chuck Anderson • Boulder, CO
http://cycletourist.com
Turn Off, Tune Out, Drop In
*****************************