Re: reading files with accents in the filename from PHP [message #183116 is a reply to message #183107] |
Wed, 09 October 2013 20:54 |
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Thomas Mlynarczyk wrote:
> Erwin Moller wrote:
>> How can PHP open files on the local filesystem that contain certain
>> characters, like umlauts, accents, etc?
>
> $path = __DIR__ . '\Eugène.txt';
> var_dump( PHP_VERSION, file_exists( $path ) );
>
> Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
> (="Windows") encoding.
There is no “ANSI encoding”. Usually “ANSI encoding” means Windows-1252.
[0] It would be either coincidence or strange if this worked everywhere,
because FAT32 uses the “OEM character set”, i.e. one of the various IBM code
pages, 437 for English, and NTFS uses UTF-16LE [1]. The letter “è” has
Windows-1252 code 0xE8, IBM437/IBM850 code 0x8A, and Unicode code point
U+00E8 [2] (encoded in UTF-16 as the code unit 0x00E8 [3]). It follows that
a Windows-1252 byte can match the stored name only where the Unicode value
is used, not where the OEM character set is.
[0] <http://en.wikipedia.org/wiki/Windows-1252>
[1] <http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx>
[2]
<http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)#Comparison_table>
[3] <http://rishida.net/tools/conversion/>
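Byte values like these can be checked from PHP itself instead of being read
off a table. A minimal sketch, assuming the mbstring extension is loaded and
knows the encoding names used; hexIn() is a made-up helper name, not a PHP
function:

```php
<?php
// Hypothetical helper (not from the thread): hex bytes of a UTF-8 string
// after conversion to $encoding. Assumes the mbstring extension is loaded.
function hexIn($utf8, $encoding)
{
    return strtoupper(bin2hex(mb_convert_encoding($utf8, $encoding, 'UTF-8')));
}

// "è" (U+00E8) given as explicit UTF-8 bytes, so that the result does not
// depend on the encoding in which this script itself is saved.
$e = "\xC3\xA8";

// One line per encoding: name, then the encoded bytes in hexadecimal.
foreach (array('Windows-1252', 'CP850', 'UTF-16LE', 'UTF-16BE', 'UTF-8') as $enc) {
    printf("%-12s %s\n", $enc, hexIn($e, $enc));
}
```

Writing the character as explicit bytes keeps the check independent of the
editor's "save as" setting, which is the very thing under discussion.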
BTW, you want to upgrade soon:
<http://blogs.technet.com/b/security/archive/2013/08/15/the-risk-of-running-windows-xp-after-support-ends.aspx>
> Doesn't work if stored in UTF8.
The file, or the string?
> So I suspect it's an encoding issue: the accented character "è" is stored
> as \xE8 in the file system,
Yes, it is, but only with NTFS.
> but if your script is UTF8, then your $path will contain \xC3\xA8 instead.
Which cannot work with NTFS.
> With an "explicit" $path = __DIR__ . "\Eug\xE8ne.txt" it works when the
> script is UTF8.
But only with NTFS and compatible filesystems.
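For a script saved as UTF-8, an alternative to hard-coding "\xE8" would be to
convert the path before it reaches the file functions. A minimal sketch,
assuming the mbstring extension is loaded and that the file functions expect
Windows-1252 bytes, as the quoted test suggests; toFilesystemEncoding() is a
made-up name, not a PHP function, and the target encoding is an assumption
that has to match the system at hand:

```php
<?php
/**
 * Hypothetical helper (not part of PHP): converts a UTF-8 path to the
 * byte encoding that the file functions are assumed to expect.
 * Windows-1252 is an assumption; adjust it to the system code page.
 */
function toFilesystemEncoding($utf8Path, $fsEncoding = 'Windows-1252')
{
    return mb_convert_encoding($utf8Path, $fsEncoding, 'UTF-8');
}

// "è" as explicit UTF-8 bytes (\xC3\xA8), as a UTF-8 script would contain it.
$path = __DIR__ . "\\Eug\xC3\xA8ne.txt";

// The converted path contains the single byte \xE8 instead of \xC3\xA8.
var_dump(file_exists(toFilesystemEncoding($path)));
```

Characters that the target encoding cannot represent are replaced by a
substitute character, so this only helps for names that fit into that code
page.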
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee