Re: Parsing mbox files with Windows Php [message #181707 is a reply to message #181705]
Sat, 25 May 2013 22:39
Chuck Anderson
Peter H. Coffin wrote:
> On Fri, 24 May 2013 15:57:14 -0600, Chuck Anderson wrote:
>
>> I have been slowly building a Php/MySQL based IMAP email client. The
>> Php IMAP functions are well documented and easy to use. I store
>> messages in a MySQL database (attachments in the file system) for ease
>> of organization, maintenance, backup and searching. I like it enough
>> that I want to add my existing archive of email in Thunderbird. To do
>> so, I need to parse mbox files and extract message headers, parts, and
>> attachments. Thankfully, the Php IMAP functions can be used to open and
>> parse an mbox file (or even a single .eml file) as well as opening a
>> stream to the server.
>>
>> Using this functionality I can write a script to run on my WAMP
>> development machine that reads the Thunderbird folder structure, parses
>> the mbox files and saves individual messages along with their
>> folder/subfolder path in the Thunderbird folder hierarchy. It should be
>> as easy as pointing it to the top of the folder structure and letting it
>> do all the work from there.
>>
>> Unfortunately, it appears that the Windows Php binary is unable to
>> connect to an mbox file, so to make my job easy, I would have to upload
>> the entire folder structure (it is about 200MBs) to my shared host and
>> process it there. I would much rather "toy around" with this process on
>> my Windows development machine and not on the shared host.
>>
>> There is a Php bug filed for this, but it was determined that the
>> "underlying c-client function is unable to open a file."
>> https://bugs.php.net/bug.php?id=39880 - closed as "not a bug."
>>
>> $mbox = imap_open('pathto/mboxfile', '', ''); // works on *nix, but not on Windows.
>> - Notice: Unknown: Can't open mailbox mboxfiles/Inbox: no such mailbox
>> (errflg=2) in Unknown on line 0.
>> (This second error is the one coming from the underlying c-client
>> function.)
>>
>
> Okay, step one: quit munging stuff around and give us the EXACT code,
> the EXACT contents of variables involved, and the EXACT error messages.
> You're not revealing national secrets by posting paths to filenames, and
> what the problem is may be in what you're changing to be more general.
>
I used generic paths and filenames because I had tried several variations.
I have learned that, on a *nix web server, imap_open only works on a local mbox file if the path to the file is relative to $HOME. I have not found any official documentation for this, just forum posts saying so, but it is the only way I have been able to make it work.

So imap_open works for me (on the remote Linux host) if, and only if, I use:
imap_open('public_html/mboxfiles/Trash', '', '');
// Trash is an mbox file I uploaded directly from my Thunderbird profile

On Windows:
- c:/localhost is the document root.
- c:/localhost/imap is where the Php scripts are located.
- c:/localhost/imap/mboxfiles is where the mbox file "Trash" is located.

I have tried:
imap_open('localhost/imap/mboxfiles/Trash', '', ''); // equivalent of being relative to $HOME on *nix
imap_open('/imap/mboxfiles/Trash', '', ''); // absolute path from the document root
imap_open('mboxfiles/Trash', '', ''); // relative path
imap_open('c:/localhost/imap/mboxfiles/Trash', '', ''); // real path on disk

On Windows I always get these two errors (the file path changes accordingly):
- Warning: imap_open(): Couldn't open stream localhost/utilities/imap/eml/Trash in localhost\utilities\imap\imap_save_mbox_file.php on line 127
- Notice: Unknown: Can't open mailbox localhost/utilities/imap/eml/Trash: no such mailbox (errflg=2) in Unknown on line 0

I have read that the second error comes from the c-client library itself (that is where "errflg=2" originates).
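
A small diagnostic loop can make this kind of trial and error easier to read. The sketch below relies only on the standard imap_open(), imap_errors() and imap_close() functions; the helper name probe_mbox_paths() and the candidate list are hypothetical. For each path it records whether PHP itself can see the file (is_file(), realpath()) and then collects c-client's messages with imap_errors(), which also clears them, so they no longer surface as the "in Unknown on line 0" notice at the end of the request. It does not work around the Windows limitation; it only separates "PHP cannot find the file" from "c-client refuses to open it".

```php
<?php
// Hypothetical diagnostic helper: for each candidate mbox path, report whether
// PHP can see the file and what c-client says when imap_open() is tried.
function probe_mbox_paths(array $candidates): array
{
    $report = [];
    foreach ($candidates as $path) {
        $entry = [
            'path'     => $path,
            'exists'   => is_file($path),          // can PHP itself see the file?
            'realpath' => realpath($path) ?: null, // resolved absolute path, if any
            'opened'   => false,
            'errors'   => [],
        ];
        if (function_exists('imap_open')) {
            // Silence the warning; collect the c-client messages explicitly instead.
            $stream = @imap_open($path, '', '');
            if ($stream !== false) {
                $entry['opened'] = true;
                imap_close($stream);
            }
            // imap_errors() returns false when nothing is queued; it also clears
            // the queue, so no "in Unknown on line 0" notice is emitted later.
            $entry['errors'] = imap_errors() ?: [];
        }
        $report[] = $entry;
    }
    return $report;
}

// Candidate paths taken from the attempts above (adjust to the real layout).
$candidates = [
    'localhost/imap/mboxfiles/Trash',
    '/imap/mboxfiles/Trash',
    'mboxfiles/Trash',
    'c:/localhost/imap/mboxfiles/Trash',
];

foreach (probe_mbox_paths($candidates) as $r) {
    printf("%s: exists=%s opened=%s\n", $r['path'],
        $r['exists'] ? 'yes' : 'no', $r['opened'] ? 'yes' : 'no');
    foreach ($r['errors'] as $msg) {
        echo "    c-client: $msg\n";
    }
}
```

If "exists" is yes for a path but "opened" is still no, the problem is on the c-client side rather than in the path itself.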
>> I believe this is a dead end but want to check if anyone has happened to
>> figure out a way to make this work in Windows - or if you know of a
>> separate mbox file parser that would be fairly simple to integrate with
>> my current Php IMAP based scripts - dependent on the output of
>> imap_fetchstructure, imap_headerinfo, imap_fetchbody(parts).
>>
>> If not, I will probably build a form that lets me select individual
>> (smaller) sections of the folder hierarchy to process individually
>> (tedious and prone to error).
>>
>
> Of course, by the time you're done with that, the 200MB transfer would
> have been LONG finished and you'd have your goal accomplished on the
> hosted server.... (:
>
I know ô¿Ô¬ .... It is not the upload that concerns me. It is the resource usage of testing the script on the entire structure all at once (certainly more than one time). Perhaps it is no big deal; I simply do not know.

Also, if I do it on the server, I will need to upload the files, process them into a database, download a database dump, and then load that into the local database. For simplicity alone, I would rather do it locally in one step: process straight into the database. It is always easier to test and debug on a local development machine.
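
If c-client cannot open local files on Windows, one hedged alternative is to split the mbox file in plain PHP and handle each raw message separately. This is a swapped-in technique, not part of the IMAP extension: mbox_messages() and split_message() are hypothetical helpers, and they assume the classic mbox layout in which every message starts with a "From " line at the start of the file or right after an empty line, with any body line beginning "From " already escaped by the writer (for example as ">From "). The header block could then be passed to imap_rfc822_parse_headers(), which parses a string instead of opening a mailbox, although this has not been verified on Windows; the part-by-part output of imap_fetchstructure() and imap_fetchbody() is not reproduced and would need a MIME parser of its own.

```php
<?php
// Minimal mbox splitter sketch (pure PHP, no c-client). Assumes each message
// starts with a "From " separator at the start of the file or after an empty
// line. Escaped ">From " body lines are left untouched.
function mbox_messages(string $file): Generator
{
    $fh = fopen($file, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }
    $current = null;    // raw text of the message being collected
    $prevBlank = true;  // the start of the file counts as "after an empty line"
    try {
        while (($line = fgets($fh)) !== false) {
            if ($prevBlank && strncmp($line, 'From ', 5) === 0) {
                if ($current !== null) {
                    yield strip_separator_blank($current);
                }
                $current = '';  // the separator line itself is not part of the message
            } elseif ($current !== null) {
                $current .= $line;
            }
            $prevBlank = rtrim($line, "\r\n") === '';
        }
        if ($current !== null) {
            yield strip_separator_blank($current);
        }
    } finally {
        fclose($fh);
    }
}

// The empty line before the next "From " belongs to the mbox framing, not the message.
function strip_separator_blank(string $msg): string
{
    if (substr($msg, -4) === "\r\n\r\n") {
        return substr($msg, 0, -2);
    }
    if (substr($msg, -2) === "\n\n") {
        return substr($msg, 0, -1);
    }
    return $msg;
}

// Headers end at the first empty line; returns [headers, body].
function split_message(string $raw): array
{
    $parts = preg_split('/\r?\n\r?\n/', $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

// Demo with a throwaway two-message mbox written to a temp file.
$tmp = tempnam(sys_get_temp_dir(), 'mbox');
file_put_contents($tmp,
    "From alice@example.com Sat May 25 22:39:00 2013\nSubject: one\n\nHello\n\n" .
    "From bob@example.com Sat May 25 22:40:00 2013\nSubject: two\n\n>From the body\n\n");
foreach (mbox_messages($tmp) as $i => $raw) {
    [$headers, $body] = split_message($raw);
    preg_match('/^Subject:\s*(.*)$/mi', $headers, $m);
    echo ($i + 1) . ': ' . trim($m[1] ?? '') . "\n";  // prints "1: one", then "2: two"
}
unlink($tmp);
```

Reading line by line with fgets() keeps memory use close to the size of one message, which matters for a 200MB archive.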
I hoped someone else had cracked this nut, but I'm beginning to believe
that it is not crackable, so .... I will probably upload the entire
Thunderbird Mail folder structure to the remote host and try running my
script during low traffic hours. Once I verify it has worked 100%
(archived everything and saved the folder structure properly), I will
not have a need to do it again.
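
To process the whole Thunderbird hierarchy in one run, locally or on the host, the script first needs the list of mbox files and their folder paths. A minimal sketch, assuming Thunderbird's default mbox storage (not the optional maildir format): a folder such as Inbox is a file named Inbox, its subfolders sit in a sibling directory named Inbox.sbd, and the .msf files are index summaries rather than mail. The helper names and the example path are hypothetical, and treating a file as an mbox only when its first five bytes are "From " is a heuristic that also skips empty folders and other profile files. Point it at one account directory (for example Mail/Local Folders), since the account directories themselves are not .sbd folders.

```php
<?php
// Sketch: list mbox files under one Thunderbird account directory together with
// their logical folder path, e.g. "Inbox/Receipts".
function thunderbird_mboxes(string $dir, string $logical = ''): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        throw new RuntimeException("Cannot read $dir");
    }
    $found = [];
    foreach ($entries as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (is_dir($path)) {
            // "Foo.sbd" holds the subfolders of the mbox file "Foo".
            if (substr($name, -4) === '.sbd') {
                $base = substr($name, 0, -4);
                $child = $logical === '' ? $base : "$logical/$base";
                $found = array_merge($found, thunderbird_mboxes($path, $child));
            }
            continue;
        }
        if (looks_like_mbox($path)) {
            $found[] = [
                'folder' => $logical === '' ? $name : "$logical/$name",
                'file'   => $path,
            ];
        }
    }
    return $found;
}

// Heuristic: a non-empty mbox file begins with a "From " separator line.
function looks_like_mbox(string $path): bool
{
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $head = fread($fh, 5);
    fclose($fh);
    return $head === 'From ';
}

// Hypothetical location; replace with the real profile path.
$root = 'c:/path/to/profile/Mail/Local Folders';
if (is_dir($root)) {
    foreach (thunderbird_mboxes($root) as $box) {
        echo $box['folder'], ' => ', $box['file'], "\n";
    }
}
```

The "folder" value can be stored next to each message so the hierarchy is preserved in the database.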
--
*****************************
Chuck Anderson • Boulder, CO
http://cycletourist.com
Turn Off, Tune Out, Drop In
*****************************