reading files with accents in the filename from PHP [message #183106] Wed, 09 October 2013 06:33
Erwin Moller
Hello comp.lang.php,

How can PHP open files on the local filesystem that contain certain
characters, like umlauts, accents, etc?

I am currently developing on Win2008 with PHP 5.5.1 (CGI/FastCGI).
So the underlying filesystem is NTFS.

Files with names like the following are inaccessible:
ierländer.pdf
Eugène.pdf
etc.
All files without these characters ARE readable.

I test with:
$bIsReadable = ((file_exists($path)) && (is_readable($path)));
and for these kinds of filenames it ALWAYS returns false.

(The files with the troublesome names open fine with a PDF reader.)

(It wouldn't be my own choice to store files under such names, but
that's how it is now.)

Background: Currently I am working on a Full Text Index on a whole bunch
of PDF files, and I need to access them from PHP in the process.

Does anybody know how to fix this problem?
How do I open files with accents in their filename?

Regards,
Erwin Moller

PS: I found this, which might be relevant:
http://evertpot.com/filesystem-encoding-and-php/
but it doesn't solve the issue.


--
"That which can be asserted without evidence, can be dismissed without
evidence."
-- Christopher Hitchens
Re: reading files with accents in the filename from PHP [message #183107 is a reply to message #183106] Wed, 09 October 2013 06:58
Thomas Mlynarczyk
Erwin Moller wrote:

> How can PHP open files on the local filesystem that contain certain
> characters, like umlauts, accents, etc?

$path = __DIR__ . '\Eugène.txt';
var_dump( PHP_VERSION, file_exists( $path ) );

Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
(="Windows") encoding. Doesn't work if stored in UTF8. So I suspect it's
an encoding issue: the accented character "è" is stored as \xE8 in the
file system, but if your script is UTF8, then your $path will contain
\xC3\xA8 instead. With an "explicit" $path = __DIR__ . "\Eug\xE8ne.txt"
it works when the script is UTF8.
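
To make the mismatch visible, one can dump the bytes of the same name in both encodings. This is only a sketch: it assumes the mbstring extension is loaded and that the "ANSI" code page in question is Windows-1252. The variable names are made up for the illustration.

```php
<?php
// "è" is written as explicit UTF-8 bytes so the example does not depend
// on how this source file itself is saved.
$utf8Name = "Eug\xC3\xA8ne.txt";

// Assumption: the "ANSI" code page is Windows-1252.
$ansiName = mb_convert_encoding($utf8Name, 'Windows-1252', 'UTF-8');

echo bin2hex($utf8Name), "\n"; // 457567c3a86e652e747874
echo bin2hex($ansiName), "\n"; // 457567e86e652e747874
```

The two hex dumps differ only where the "è" sits: C3 A8 in the UTF-8 string, E8 in the converted one.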

Greetings,
Thomas

--
It is not because many of them are wrong that they are right!
(Coluche)
Re: reading files with accents in the filename from PHP [message #183108 is a reply to message #183107] Wed, 09 October 2013 07:16
Erwin Moller
On 10/9/2013 12:58 PM, Thomas Mlynarczyk wrote:
> Erwin Moller wrote:
>
>> How can PHP open files on the local filesystem that contain certain
>> characters, like umlauts, accents, etc?
>
> $path = __DIR__ . '\Eugène.txt';
> var_dump( PHP_VERSION, file_exists( $path ) );
>

That didn't help, since my files are not stored in the working dir.

> Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
> (="Windows") encoding. Doesn't work if stored in UTF8.

Strange situation.
I changed my PHP-files encoding to UTF-8, but the problem still occurred.


> So I suspect it's
> an encoding issue: the accented character "è" is stored as \xE8 in the
> file system, but if your script is UTF8, then your $path will contain
> \xC3\xA8 instead. With an "explicit" $path = __DIR__ . "\Eug\xE8ne.txt"
> it works when the script is UTF8.
>

THAT helped!

I added a replace:
$path = str_replace("è","\xE8",$path);

and now it IS readable from PHP.

Now I wonder if I should make a whole list of such replaces....
Sounds horrid, doesn't it?

But your idea brought me to the following idea:
$path = utf8_decode($path);

Which works flawlessly (on my set of only 13000 filenames)!

So at least I have it fixed for NTFS.
Thanks for pointing my head in the right direction.
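
One caveat on utf8_decode(), sketched under assumptions: it converts to ISO-8859-1, which is not quite the same as Windows-1252 (characters such as "œ" or "€" exist only in the latter), and the function is deprecated as of PHP 8.2. The sketch below assumes mbstring is available and that the file names are in Windows-1252. It spells the ISO-8859-1 conversion with mb_convert_encoding() instead of calling the deprecated function, and the sample name is invented.

```php
<?php
// "cœur.pdf" as explicit UTF-8 bytes (œ = C5 93); a made-up sample name.
$utf8Name = "c\xC5\x93ur.pdf";

// The target of utf8_decode(): ISO-8859-1, which has no "œ".
$latin1Name = mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8');

// Windows-1252 does have "œ" (byte 9C).
$cp1252Name = mb_convert_encoding($utf8Name, 'Windows-1252', 'UTF-8');

echo bin2hex($cp1252Name), "\n";       // 639c75722e706466
var_dump($latin1Name === $cp1252Name); // bool(false)
```

For names that only use characters in the range U+00A0 to U+00FF, both conversions give the same bytes, which would explain why the 13000 filenames went through without trouble.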

Regards,
Erwin Moller


> Greetings,
> Thomas
>


Re: reading files with accents in the filename from PHP [message #183109 is a reply to message #183108] Wed, 09 October 2013 07:18
The Natural Philosopher
On 09/10/13 12:16, Erwin Moller wrote:
> [...]
> But your idea brought me to the following idea:
> $path = utf8_decode($path);
>
> Which works flawlessly (on my set of only 13000 filenames)!
>
> So at least I have it fixed for NTFS.
> Thanks for pointing my head in the right direction.

and thanks for raising and solving that one..

tucked away in case it's ever needed..

--
Ineptocracy

(in-ep-toc’-ra-cy) – a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.
Re: reading files with accents in the filename from PHP [message #183110 is a reply to message #183109] Wed, 09 October 2013 11:06
Erwin Moller
On 10/9/2013 1:18 PM, The Natural Philosopher wrote:
> On 09/10/13 12:16, Erwin Moller wrote:
>> [...]
>> But your idea brought me to the following idea:
>> $path = utf8_decode($path);
>>
>> Which works flawlessly (on my set of only 13000 filenames)!
>
> and thanks for raising and solving that one..
>
> tucked away in case it's ever needed..
>

I don't have a good feeling about my "fix".
It worked, but I don't know exactly what is going on.

I actually hoped PHP would handle such things 'the right way', whatever
that might be. ;-)
Now I wonder what happens if my code happens to run on some *nix OS.
Ideally my PHP code is OS agnostic.

Regards,
Erwin Moller


Re: reading files with accents in the filename from PHP [message #183111 is a reply to message #183110] Wed, 09 October 2013 11:51
Peter H. Coffin
On Wed, 09 Oct 2013 17:06:21 +0200, Erwin Moller wrote:
> I don't have a good feeling about my "fix".
> It worked, but I don't know exactly what is going on.
>
> I actually hoped PHP would handle such things 'the right way', whatever
> that might be. ;-)
> Now I wonder what happens if my code happens to run on some *nix OS.
> Ideally my PHP code is OS agnostic.

PHP is; it's the OS you're running on that's playing up with how it's
encoding the file names. That's beyond PHP's control and PHP is bending
to the will of whatever encoding your source file is in, mostly by
ignoring it.

--
"'I'm not sleeping with a jr. high schooler! I have a life-sized doll
that looks like one.' Uh huh. That sounds SO much less pathetic."
-- Piro's Conscience www.megatokyo.com
Re: reading files with accents in the filename from PHP [message #183113 is a reply to message #183110] Wed, 09 October 2013 14:06
J.O. Aho
On 09/10/13 17:06, Erwin Moller wrote:
>> On 09/10/13 12:16, Erwin Moller wrote:
>>> On 10/9/2013 12:58 PM, Thomas Mlynarczyk wrote:
>>>> Erwin Moller wrote:
>>>>
>>>> > How can PHP open files on the local filesystem that contain certain
>>>> > characters, like umlauts, accents, etc?
>>>> So I suspect it's
>>>> an encoding issue: the accented character "è" is stored as \xE8 in the
>>>> file system, but if your script is UTF8, then your $path will contain
>>>> \xC3\xA8 instead. With an "explicit" $path = __DIR__ . "\Eug\xE8ne.txt"
>>>> it works when the script is UTF8.
>>> THAT helped!
>>>
>>> I added a replace:
>>> $path = str_replace("è","\xE8",$path);

It is easier to use iconv and convert the string to the file system's
character set: http://www.php.net/manual/en/function.iconv.php

But to avoid workarounds altogether, write your scripts in the same charset
as the file system; a mixed charset setup usually causes issues as soon as
you forget to convert.
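
A minimal sketch of the iconv route, under stated assumptions: the function name toFilesystemCharset() is made up, CP1252 is only a guess at the target code page, and the round-trip check is there because iconv implementations differ in how they treat characters the target charset cannot represent.

```php
<?php
/**
 * Hypothetical helper (not from the thread): convert a UTF-8 path to the
 * charset the file system API expects, refusing lossy conversions.
 */
function toFilesystemCharset(string $utf8Path, string $charset = 'CP1252'): string
{
    // @ hides the notice some iconv builds raise on unconvertible input.
    $converted = @iconv('UTF-8', $charset, $utf8Path);

    // Round-trip check: also catches builds that substitute a placeholder
    // character instead of returning false.
    if ($converted === false || @iconv($charset, 'UTF-8', $converted) !== $utf8Path) {
        throw new RuntimeException("Path cannot be represented in $charset");
    }
    return $converted;
}

echo bin2hex(toFilesystemCharset("t\xC3\xA8st")), "\n"; // 74e87374
```

Throwing on a lossy conversion is a design choice here: a silently mangled name would just produce another "file not found" further down the line.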


>>> and now it IS readable from PHP.
>>>
>>> Now I wonder if I should make a whole list of such replaces....
>>> Sounds horrid, doesn't it?
> I don't have a good feeling about my "fix".
> It worked, but I don't know exactly what is going on.
>
> I actually hoped PHP would handle such things 'the right way', whatever
> that might be. ;-)

No, there is no magic in PHP that changes what you have hardcoded in the
script into something else just because you are using a file system that
doesn't use the same character set as the one you wrote the script in.

> Now I wonder what happens if my code happens to run on some *nix OS.
> Ideally my PHP code is OS agnostic.

Depends on which charset the file system uses: if it uses UTF-8, then
no issue; if it uses Big5 or something else, then you have an issue again.
You will most likely also run into issues with the file paths, as other
operating systems use / instead of \ (which is also used as an escape
character).
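
For the separator part, a small sketch: build the path from segments with DIRECTORY_SEPARATOR instead of hard-coding one of the two characters. The directory and file names are placeholders. PHP on Windows also accepts forward slashes, so writing '/' throughout is another option.

```php
<?php
// Placeholder segments; only the joining technique matters here.
$segments = [__DIR__, 'pdf', 'Eugene.pdf'];

// DIRECTORY_SEPARATOR is "\" on Windows and "/" elsewhere.
$path = implode(DIRECTORY_SEPARATOR, $segments);

echo $path, "\n";
```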

--

//Aho
Re: reading files with accents in the filename from PHP [message #183116 is a reply to message #183107] Wed, 09 October 2013 16:54
Thomas 'PointedEars' Lahn
Thomas Mlynarczyk wrote:

> Erwin Moller wrote:
>> How can PHP open files on the local filesystem that contain certain
>> characters, like umlauts, accents, etc?
>
> $path = __DIR__ . '\Eugène.txt';
> var_dump( PHP_VERSION, file_exists( $path ) );
>
> Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
> (="Windows") encoding.

There is no “ANSI encoding”. Usually “ANSI encoding” means Windows-1252.
[0] It would be either coincidence or strange if this worked, because FAT32
uses the “OEM character set”, i.e. one of the various IBM code pages, 437
for English, and NTFS uses UTF-16BE [1]. The letter “è” has Windows-1252
code 0xE6, IBM437/IBM850 code 0x8A, and Unicode code point U+00E8 [2]
(encoded in UTF-16 as 0xE8 [3]). It follows that you cannot mean
Windows-1252 by “ANSI”.

[0] <http://en.wikipedia.org/wiki/Windows-1252>
[1] <http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx>
[2] <http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)#Comparison_table>
[3] <http://rishida.net/tools/conversion/>

BTW, you want to upgrade soon:

<http://blogs.technet.com/b/security/archive/2013/08/15/the-risk-of-running-windows-xp-after-support-ends.aspx>

> Doesn't work if stored in UTF8.

The file, or the string?

> So I suspect it's an encoding issue: the accented character "è" is stored
> as \xE8 in the file system,

Yes, it is, but only with NTFS.

> but if your script is UTF8, then your $path will contain \xC3\xA8 instead.

Which cannot work with NTFS.

> With an "explicit" $path = __DIR__ . "\Eug\xE8ne.txt" it works when the
> script is UTF8.

But only with NTFS and compatible filesystems.


PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Re: reading files with accents in the filename from PHP [message #183119 is a reply to message #183116] Wed, 09 October 2013 17:33
Christoph Michael Becker
Thomas 'PointedEars' Lahn wrote:

> Thomas Mlynarczyk wrote:
>
>> Erwin Moller wrote:
>>> How can PHP open files on the local filesystem that contain certain
>>> characters, like umlauts, accents, etc?
>>
>> $path = __DIR__ . '\Eugène.txt';
>> var_dump( PHP_VERSION, file_exists( $path ) );
>>
>> Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
>> (="Windows") encoding.
>
> There is no “ANSI encoding”. Usually “ANSI encoding” means Windows-1252.
> [0] It would be either coincidence or strange if this worked, because FAT32
> uses the “OEM character set”, i.e. one of the various IBM code pages, 437
> for English, and NTFS uses UTF-16BE [1]. The letter “è” has Windows-1252
> code 0xE6, IBM437/IBM850 code 0x8A, and Unicode code point U+00E8 [2]
> (encoded in UTF-16 as 0xE8 [3]). It follows that you cannot mean
> Windows-1252 by “ANSI”.

The letter "è" is encoded in CP-1252 as /0xE8/[1]. In UTF-16 it is
encoded by *two* bytes: 0x00 0xE8 (or vice versa, depending on the
endianness).
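
These byte values can be checked from PHP itself. A minimal sketch, assuming the mbstring extension is loaded; hexIn() is a made-up helper name.

```php
<?php
// Hex dump of a UTF-8 string after conversion to $encoding (needs mbstring).
function hexIn(string $utf8, string $encoding): string
{
    return bin2hex(mb_convert_encoding($utf8, $encoding, 'UTF-8'));
}

$e = "\xC3\xA8"; // "è" as explicit UTF-8 bytes

echo hexIn($e, 'Windows-1252'), "\n"; // e8
echo hexIn($e, 'UTF-16BE'), "\n";     // 00e8
echo hexIn($e, 'UTF-16LE'), "\n";     // e800
```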

I have created a file "tèst" on a German Windows XP on NTFS, and started
a PHP shell:

>>> $fs = glob('t?st')
>>> $fs[0]
't\350st'

Apparently, the file name is *read* by PHP as if it was encoded in
CP-1252. Either the description on MSDN[2] is wrong, or PHP uses a
Windows API that converts the filename's encoding. I presume the
latter, being aware (but not (yet) convinced) that there might be
another reason for this behavior.
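
The same experiment can be run over a whole directory. The following is only a diagnostic sketch; filenameBytes() is a hypothetical name, and what it prints depends on the platform, the PHP version and the file system.

```php
<?php
/**
 * Hypothetical diagnostic: map each entry of $dir to the hex dump of the
 * bytes PHP returns for its name.
 */
function filenameBytes(string $dir): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        throw new RuntimeException("Cannot read directory $dir");
    }

    $result = [];
    foreach ($entries as $name) {
        if ($name === '.' || $name === '..') {
            continue; // skip the directory self and parent entries
        }
        $result[$name] = bin2hex($name);
    }
    return $result;
}

print_r(filenameBytes(__DIR__));
```

A name that shows e8 for the "è" came back in a single-byte code page; c3a8 would mean PHP handed it over as UTF-8.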

[1] <http://en.wikipedia.org/wiki/Windows-1252>
[2] <http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx>

> BTW, you want to upgrade soon:

Me too. :)

--
Christoph M. Becker
Re: reading files with accents in the filename from PHP [message #183121 is a reply to message #183119] Wed, 09 October 2013 18:41
Thomas 'PointedEars' Lahn
Christoph Michael Becker wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Thomas Mlynarczyk wrote:
>>> Erwin Moller wrote:
>>>> How can PHP open files on the local filesystem that contain certain
>>>> characters, like umlauts, accents, etc?
>>>
>>> $path = __DIR__ . '\Eugène.txt';
>>> var_dump( PHP_VERSION, file_exists( $path ) );
>>>
>>> Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
>>> (="Windows") encoding.
>>
>> There is no “ANSI encoding”. Usually “ANSI encoding” means Windows-1252.
>> [0] It would be either coincidence or strange if this worked, because
>> FAT32 uses the “OEM character set”, i.e. one of the various IBM code
>> pages, 437 for English, and NTFS uses UTF-16BE [1]. The letter “è” has
>> Windows-1252 code 0xE6, IBM437/IBM850 code 0x8A, and Unicode code point
>> U+00E8 [2] (encoded in UTF-16 as 0xE8 [3]). It follows that you cannot
>> mean Windows-1252 by “ANSI”.
>
> The letter "è" is encoded in CP-1252 as /0xE8/[1].

You are correct (to some extent); I must have slipped into the wrong row.

My point is, however, that _Windows_-1252 is very likely _not_ what is
expected by the filesystem. By “coincidence”, the code *points* for
Windows-1252 and Unicode are the same from U+00A0 to U+00FF, and the used
character is within that range. This code will break for characters whose
Unicode code point is above U+007F but outside this range. In general, it
will be unreliable because Windows-1252 does not have the interleaved zero-
octet that UTF-16 has (NTFS), and Windows-1252 and IBM437 & friends (FAT32)
are incompatible above 0x7F.

> In UTF-16 it is encoded by *two* bytes:

Two octets, to be precise. I was aware of that (as you could see further
below) but I oversimplified here.

> 0x00 0xE8 (or vice versa, depending on the endianess).

Because NTFS uses UTF-16_LE_ (as Windows uses _little-endian_ throughout),
it is E8 00 there.

> I have created a file "tèst" on a German Windows XP on NTFS, and started
> a PHP shell:
>
>>>> $fs = glob('t?st')
>>>> $fs[0]
> 't\350st'
>
> Apparently, the file name is *read* by PHP as if it was encoded in
> CP-1252.

Interesting. 0350 would correspond to 232 and 0xE8, indeed.

> Either the description on MSDN[2] is wrong,

Unlikely.

> or PHP uses a Windows API that converts the filename's encoding.

It would suffice if it discarded all zero octets in *this* case as the code
would be {74 00} {E8 00} {73 00} {74 00}.

> I presume the latter, being aware (but not (yet) convinced) that there
> might be another reason for this behavior.

It would be interesting to see how this works with NTFS with characters
outside the specified range whose Unicode code point is above U+007F. For
example, U+0100 (“Ā”; LATIN CAPITAL LETTER A WITH MACRON) would be encoded
in one UTF-16 code unit, 0100, which would be encoded in UTF-16LE as 00 10.
Just stripping the zero-octets would result in <LF> (whose code point is
0x10 which is 020). Just reading the octet with the lower address would
result in 0x00 which terminates a C string. If the result is _not_
something equivalent to 't\020st' or 't', something else is happening.
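
The byte sequences in question can be computed rather than worked out by hand. A minimal sketch, assuming the mbstring extension is loaded; it only shows what the UTF-16LE form of the name and the naive zero-stripping would look like, not what PHP or Windows actually do with the file name.

```php
<?php
$name = "t\xC4\x80st"; // "tĀst" as UTF-8 (U+0100 = C4 80)

$utf16le = mb_convert_encoding($name, 'UTF-16LE', 'UTF-8');
echo bin2hex($utf16le), "\n"; // 7400000173007400

// Naively dropping every zero octet, as hypothesized above:
$stripped = str_replace("\x00", '', $utf16le);
echo bin2hex($stripped), "\n"; // 74017374
```

By this computation, U+0100 comes out as the octets 00 01 in UTF-16LE, so stripping the zero octets would leave 0x01 in the middle of the name.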


PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300dec7(at)news(dot)demon(dot)co(dot)uk>
Re: reading files with accents in the filename from PHP [message #183122 is a reply to message #183110] Wed, 09 October 2013 21:07
Thomas 'PointedEars' Lahn
Erwin Moller wrote:

> On 10/9/2013 1:18 PM, The Natural Philosopher wrote:
>> On 09/10/13 12:16, Erwin Moller wrote:
>>> On 10/9/2013 12:58 PM, Thomas Mlynarczyk wrote:
>>>> Erwin Moller wrote:
>>>> > How can PHP open files on the local filesystem that contain certain
>>>> > characters, like umlauts, accents, etc?
>>>>
>>>> $path = __DIR__ . '\Eugène.txt';
>>>> var_dump( PHP_VERSION, file_exists( $path ) );
>>>
>>> That didn't help since my files are not stored in working dir.
>>>
>>>> Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
>>>> (="Windows") encoding. Doesn't work if stored in UTF8.
>>>
>>> Strange situation.
>>> I changed my PHP-files encoding to UTF-8, but the problem still
>>> occurred.
>>> […]
>>> I added a replace:
>>> $path = str_replace("è","\xE8",$path);
>>>
>>> and now it IS readable from PHP.
> […]
> I don't have a good feeling about my "fix".

And you should not.

> It worked, but I don't know exactly what is going on.

Exactly.

> I actually hoped PHP would handle such things 'the right way', whatever
> that might be. ;-)

PHP has no built-in support for character encodings (but it has extensions
for that). Your strings are read octet-wise from lowest to highest address
as they are, that is, as the *editor* encoded the characters between the
string delimiters. If you write “"è"” in a UTF-8 encoded source file, the
character between the delimiters will be encoded C3 A8. If you write the
*same* character in a Windows-1252-encoded source file, it will be encoded
E8.
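
A short illustration of strings being plain octet sequences. The two literals are written with explicit escapes, so the result does not depend on how the source file is saved; mb_strlen() assumes the mbstring extension.

```php
<?php
$utf8   = "\xC3\xA8"; // "è" as a UTF-8 editor would save it
$cp1252 = "\xE8";     // "è" as a Windows-1252 editor would save it

var_dump(strlen($utf8));             // int(2)
var_dump(strlen($cp1252));           // int(1)
var_dump(mb_strlen($utf8, 'UTF-8')); // int(1)
var_dump($utf8 === $cp1252);         // bool(false)
```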

If your filesystem is FAT32, it will probably expect 8A if its locale is
English (IBM437) or Western European (IBM850), for example. If your
filesystem is NTFS, it will expect E8 00 (UTF-16_LE_; my mistake); if you
omit the zero octet it *might* work, but it does not work reliably.

> Now I wonder what happens if my code happens to run on some *nix OS.

The operating system is not the issue; the filesystem is. However, usually
Linux will run on ext2 to ext4, where AFAIK any character encoding can be
used. So there is a good chance that your code will break there.

> Ideally my PHP code is OS agnostic.

In that case you will probably have to detect the filesystem, and its
encoding, and use the encoding that is expected by the filesystem. Or
prevent such filenames from occurring in the first place.
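
One possible shape for that, as a sketch under loud assumptions: paths are kept as UTF-8 inside the application, the legacy Windows code page is guessed to be Windows-1252, and (as far as I know) PHP 7.1 and later accept UTF-8 paths on Windows, so the conversion is only needed for older versions. pathForFilesystem() is a made-up name, and no real file system detection is attempted.

```php
<?php
/**
 * Hypothetical helper: return the byte string to hand to the file
 * functions for a path held as UTF-8. The default code page is a guess.
 * No scalar type hints, so the function also parses on PHP 5.
 */
function pathForFilesystem($utf8Path, $legacyCodePage = 'Windows-1252')
{
    $isWindows = strncasecmp(PHP_OS, 'WIN', 3) === 0;

    // Assumption: PHP 7.1+ on Windows takes UTF-8 paths; on other systems
    // the bytes are handed to the file system unchanged.
    if (!$isWindows || PHP_VERSION_ID >= 70100) {
        return $utf8Path;
    }
    return mb_convert_encoding($utf8Path, $legacyCodePage, 'UTF-8');
}

$path = pathForFilesystem("Eug\xC3\xA8ne.pdf");
var_dump(file_exists($path) && is_readable($path));
```

On a *nix system this passes the name through untouched, so it only helps if the names on disk really are UTF-8.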

I suggest encoding PHP source files with UTF-8 _without BOM_. If you write
non-ASCII characters, you know what the encoding is, and you have a greater
character set so that fewer characters need to be escaped.


PointedEars
Re: reading files with accents in the filename from PHP [message #183562 is a reply to message #183121] Thu, 31 October 2013 16:34
Christoph Michael Becker
Thomas 'PointedEars' Lahn wrote:

> It would be interesting to see how this works with NTFS with characters
> outside the specified range whose Unicode code point is above U+007F. For
> example, U+0100 (“Ā”; LATIN CAPITAL LETTER A WITH MACRON) would be encoded
> in one UTF-16 code unit, 0100, which would be encoded in UTF-16LE as 00 10.
> Just stripping the zero-octets would result in <LF> (whose code point is
> 0x10 which is 020). Just reading the octet with the lower address would
> result in 0x00 which terminates a C string. If the result is _not_
> something equivalent to 't\020st' or 't', something else is happening.

U+0010 denotes <DLE>, <LF> is U+000A[1]. Anyway, I created a file
"tĀst" and did:

>>> glob('*')
Array
(
)

Apparently, something else is happening.

FWIW, I tried the following, too:

>>> touch("test")
true

>>> touch("t\x00\x10st")
Warning: touch() expects parameter 1 to be a valid path, string given
in ...

>>> touch("t\x10\x10st")
Warning: touch(): Unable to create file t►►st because Invalid
argument in ...

>>> file_exists("tAAst")
false
>>> touch("t\x41\x41st")
true
>>> file_exists("tAAst")
true

[1] <http://www.unicode.org/charts/PDF/U0000.pdf>

--
Christoph M. Becker