fgetcsv -- No error reporting? [message #172810]
Thu, 03 March 2011 16:30
matt[1]
Hi all,
Trying to process a fairly large CSV file, and it's bombing out early on me. This quick test script describes the problem:
# cat test.php
<?php

ini_set("display_errors", true);
error_reporting(E_ALL | E_STRICT);

$file = "my.csv";
$fHandle = fopen($file, "r");

$rowNum = 0;
while (fgetcsv($fHandle)) ++$rowNum;

printf ("Lines: %d\nLastRow: %d\n", count(file($file)), $rowNum);

# php test.php
Lines: 329360
LastRow: 328141
There are no multi-line entries in the file, so it seems to be legitimately returning false for some reason about 1200 lines early. A visual inspection of the file around line 328,141 doesn't reveal any errors, and no errors are being triggered from PHP/fgetcsv.
Any ideas on how to diagnose what's going on here?
Thanks,
Matt
Re: fgetcsv -- No error reporting? [message #172811 is a reply to message #172810]
Thu, 03 March 2011 17:09
alvaro.NOSPAMTHANX
On 03/03/2011 17:30, matt wrote:
> Hi all,
>
> Trying to process a fairly large csv file, and it's bombing out early on me. This quick test script describes the problem:
>
> # cat test.php
> <?php
>
> ini_set("display_errors", true);
> error_reporting(E_ALL | E_STRICT);
>
> $file = "my.csv";
> $fHandle = fopen($file, "r");
>
> $rowNum = 0;
> while (fgetcsv($fHandle)) ++$rowNum;
>
> printf ("Lines: %d\nLastRow: %d\n", count(file($file)), $rowNum);
>
> # php test.php
> Lines: 329360
> LastRow: 328141
>
>
> There are no multi-line entries in the file, so it seems to be legitimately returning false for some reason about 1200 lines early. A visual inspection of the file around line 328,141 doesn't reveal any errors, and no errors are being triggered from PHP/fgetcsv.
>
> Any ideas on how to diagnose what's going on here?
Since a record can legitimately span more than one line, you
can't just load the file into an editor and go to line X. I'm not sure
exactly how fgetcsv() works internally, but calling ftell($handle)
should let you keep track of the file position where each loop
iteration starts reading. You can then fseek() and fread() to print
the file fragment for manual inspection.
(I suppose that you already thought about using var_dump() to print/log
the output of successful calls and identify the first broken record.)
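For reference, the ftell()/fseek()/fread() idea above might be sketched roughly like this. The helper names csvRecordOffsets() and rawSlice() are made up for illustration, and the explicit delimiter, enclosure and escape arguments simply spell out fgetcsv()'s long-standing defaults. Treat it as a starting point under those assumptions, not a tested tool.

```php
<?php
// Hypothetical helper: byte offset at which each fgetcsv() record starts.
function csvRecordOffsets(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $offsets = [];
    while (true) {
        $start = ftell($handle); // position before this record is read
        // 0 = no line length limit; the rest are the classic defaults.
        if (fgetcsv($handle, 0, ',', '"', '\\') === false) {
            break; // end of file (or read error)
        }
        $offsets[] = $start;
    }
    fclose($handle);

    return $offsets;
}

// Hypothetical helper: raw bytes of the file starting at $offset.
function rawSlice(string $path, int $offset, int $length = 200): string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, $offset);
    $data = fread($handle, max(1, $length)); // fread() needs a positive length
    fclose($handle);

    return $data === false ? '' : $data;
}
```

With the offsets in hand, something like rawSlice('my.csv', $offsets[$n], 500) prints the raw bytes where record $n begins, which can then be compared with what fgetcsv() returned for it.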
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My site of satin-finish humor: http://www.demogracia.com
--
Re: fgetcsv -- No error reporting? [message #172812 is a reply to message #172811]
Thu, 03 March 2011 17:44
matt[1]
On Thursday, March 3, 2011 12:09:15 PM UTC-5, Álvaro G. Vicario wrote:
> On 03/03/2011 17:30, matt wrote:
>> Any ideas on how to diagnose what's going on here?
>
>
> Since a record can legitimately expand over more than one line, you
> can't just load the file into an editor and go to line X. I'm not sure
> about how fgetcsv() works but it's possible that calling ftell($handle)
> allows you to keep track of the file position where each loop starts
> reading from. You can then fseek() and fread() to print the file
> fragment for manual inspection.
No, I understand that. I made a faulty assumption that I had no multi-line data (more on that later). The last field of each record is a year, and a regex test showed that every line of the file did indeed end with /\,\d{4}/.
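A per-line check of that kind could look something like the sketch below. The function name is hypothetical, and the pattern is anchored with $ on the assumption that "ends with" was the intent. Note that it only looks at physical lines, so it cannot tell how fgetcsv() will group those lines into records.

```php
<?php
// Hypothetical helper: 1-based numbers of the physical lines that do NOT
// end in a comma followed by four digits (the year field).
function linesNotEndingInYear(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $bad = [];
    $lineNo = 0;
    // Stream the file line by line so a large CSV is never held in memory.
    while (($line = fgets($handle)) !== false) {
        ++$lineNo;
        $line = rtrim($line, "\r\n"); // drop LF or CRLF before matching
        if (preg_match('/,\d{4}$/', $line) !== 1) {
            $bad[] = $lineNo;
        }
    }
    fclose($handle);

    return $bad;
}
```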
> (I suppose that you already thought about using var_dump() to print/log
> the output of successful calls and identify the first broken record.)
Yes, I did--and got the data from the last line of the file as the last successful record!
Finally, I thought of stepping through with two file handles, one being read by fgets and one by fgetcsv, and doing a line-by-line comparison. The culprit turned out to be a number of unmatched double quotes throughout the file, which caused fgetcsv to pull several records into single fields mid-document.
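A rough sketch of the two-handle comparison, for anyone hitting the same thing. It is not the script actually used here: findMultiLineRecords() is a hypothetical name, and it compares the ftell() positions of the two handles instead of the line contents, reporting every record that swallowed more than one physical line.

```php
<?php
// Hypothetical helper: lists records that fgetcsv() built from more than
// one physical line, as ['line' => first line number, 'span' => line count].
function findMultiLineRecords(string $path): array
{
    $lines   = fopen($path, 'r'); // read with fgets(): one physical line
    $records = fopen($path, 'r'); // read with fgetcsv(): one CSV record
    if ($lines === false || $records === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $suspects = [];
    $lineNo = 0;
    // Explicit arguments spell out fgetcsv()'s classic defaults.
    while (fgetcsv($records, 0, ',', '"', '\\') !== false) {
        $firstLine = $lineNo + 1;
        $consumed = 0;
        // Let the fgets() handle catch up with the fgetcsv() handle,
        // counting how many physical lines the record covered.
        while (ftell($lines) < ftell($records) && fgets($lines) !== false) {
            ++$lineNo;
            ++$consumed;
        }
        if ($consumed > 1) {
            $suspects[] = ['line' => $firstLine, 'span' => $consumed];
        }
    }
    fclose($lines);
    fclose($records);

    return $suspects;
}
```

In a file that is supposed to have no multi-line fields, every entry returned points at a line where a quote opened and did not close before the end of that line.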
I've forwarded the RFC to the guy who is sending me the CSV files :)
Thanks for your suggestions.
Re: fgetcsv -- No error reporting? [message #172815 is a reply to message #172812]
Thu, 03 March 2011 19:44
Peter H. Coffin
On Thu, 3 Mar 2011 09:44:46 -0800 (PST), matt wrote:
> Finally, I thought of stepping through with two file handles, one
> being read by fgets and one by fgetcsv and doing a line-by-line
> comparison. Culprit turned out to be a number of unmatched double
> quotes through the file, causing fgetcsv to pull several records into
> single fields mid-document.
>
> I've forwarded the RFC to the guy who is sending me the CSV files :)
Best of luck with that. 99.5% of the CSV files I've ever dealt with were
created with stuff that was completely out of the control of the user.
Hell, 80% of them were from Excel alone.
--
40. I will be neither chivalrous nor sporting. If I have an unstoppable
superweapon, I will use it as early and as often as possible instead
of keeping it in reserve.
--Peter Anspach's list of things to do as an Evil Overlord
Re: fgetcsv -- No error reporting? [message #172816 is a reply to message #172815]
Thu, 03 March 2011 20:18
matt[1]
On Thursday, March 3, 2011 2:44:56 PM UTC-5, Peter H. Coffin wrote:
> On Thu, 3 Mar 2011 09:44:46 -0800 (PST), matt wrote:
>
>> Finally, I thought of stepping through with two file handles, one
>> being read by fgets and one by fgetcsv and doing a line-by-line
>> comparison. Culprit turned out to be a number of unmatched double
>> quotes through the file, causing fgetcsv to pull several records into
>> single fields mid-document.
>>
>> I've forwarded the RFC to the guy who is sending me the CSV files :)
>
> Best of luck with that. 99.5% of the CSV files I've ever dealt with were
> created with stuff that was completely out of the control of the user.
> Hell, 80% of them were from Excel alone.
I'm dealing with PeopleSoft over here...