Re: Random string from selected Unicode character set (test data) [message #177386 is a reply to message #177359] Fri, 23 March 2012 05:36
gordonb.9boif
> I am implementing the script from generatedata.com
>
> But I would like for it to also display Unicode chars, so I can test
> other languages.
>
> I have looked everywhere, but can't seem to find a PHP function that
> lets me do something like
>
> $outstring .= $make_unicode_random('katakana');
>
> thereby selecting things from the code points U+30A0 .. U+30FF

First, get a function that takes a code point and generates the UTF-8
bytes for that code point. See below. I chose to put the data in
urlencode format (e.g. "%E3%82%A0" for U+30A0) and then urldecode()
it to a raw byte sequence. It may not be super-efficient but it
works. Note that the code point must be passed as an integer, not as
a string such as "U+30A0". A hex literal (e.g. 0x30A0) is convenient
because it matches the U+ notation, although decimal works also.
UTF-8 encoding is essentially a lot of bit shifting and masking.
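
To make the shifting and masking concrete, here is a small worked sketch for the single code point U+30A0, which falls in the three-byte range. The variable names are only for illustration.

```php
<?php
// Worked example: encode U+30A0 (a three-byte code point) by hand.
$code = 0x30A0;

$b1 = (($code >> 12) & 0x0f) | 0xe0; // top 4 bits, prefixed with 1110
$b2 = (($code >> 6) & 0x3f) | 0x80;  // middle 6 bits, prefixed with 10
$b3 = ($code & 0x3f) | 0x80;         // low 6 bits, prefixed with 10

// Same urlencode-then-urldecode approach as described above.
$encoded = sprintf("%%%02X%%%02X%%%02X", $b1, $b2, $b3);
echo $encoded, "\n";        // %E3%82%A0
$raw = urldecode($encoded); // the raw UTF-8 byte sequence
echo strlen($raw), "\n";    // 3
```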

This function is great as a base for generating code charts (that
is, *ALL* of the characters in a certain range, in a table labelled
with the codepoint, to test out your fonts).
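
A code chart along those lines might be sketched as follows. This is only a rough plain-text version: chart_utf8() and code_chart() are names made up for this example, the helper only handles code points up to U+FFFF, and it makes no attempt to skip unassigned code points, control characters or surrogates.

```php
<?php
// Hypothetical helper: UTF-8 bytes for a code point up to U+FFFF.
function chart_utf8($cp)
{
    if ($cp <= 0x7f) {
        return chr($cp);
    }
    if ($cp <= 0x7ff) {
        return chr(0xc0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3f));
    }
    return chr(0xe0 | ($cp >> 12))
         . chr(0x80 | (($cp >> 6) & 0x3f))
         . chr(0x80 | ($cp & 0x3f));
}

// Build a plain-text chart with 16 code points per row. Rows are
// labelled with the code point minus its last hex digit, e.g. "10_".
// Whole rows are always printed, so $first is rounded down to a
// multiple of 16.
function code_chart($first, $last)
{
    $out = "   ";
    for ($col = 0; $col < 16; $col++) {
        $out .= sprintf(" _%X", $col);
    }
    $out .= "\n";
    for ($row = $first & ~0xf; $row <= $last; $row += 16) {
        $out .= sprintf("%X_", $row >> 4);
        for ($col = 0; $col < 16; $col++) {
            $out .= "  " . chart_utf8($row + $col);
        }
        $out .= "\n";
    }
    return $out;
}

echo code_chart(0x0100, 0x01FF);
```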

Second, make some tables so that, given a language, you can determine
some code point range(s) which contain characters from that language.
The unicode.org tables of blocks of characters and what scripts
they belong to may be useful here.
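
Such a table could look something like the sketch below. The array name, the keys and the choice of ranges are my own assumptions, hand-picked for illustration. They are nowhere near complete coverage of each script, and the 'latin_accents' entry also contains a couple of symbols (U+00D7 and U+00F7).

```php
<?php
// Hypothetical table: script name => list of [first, last] code point
// ranges. A hand-picked subset, not complete script coverage.
$script_ranges = array(
    'katakana'      => array(array(0x30A0, 0x30FF)),
    'hiragana'      => array(array(0x3041, 0x3096)),
    // Greek capital and small letters; U+03A2 is unassigned, hence the gap.
    'greek'         => array(array(0x0391, 0x03A1),
                             array(0x03A3, 0x03A9),
                             array(0x03B1, 0x03C9)),
    'hangul'        => array(array(0xAC00, 0xD7A3)),
    'cjk'           => array(array(0x4E00, 0x9FA5)),
    'latin_accents' => array(array(0x00C0, 0x00FF),
                             array(0x0100, 0x017F)),
);

// Count how many code points a list of ranges covers.
function count_code_points(array $ranges)
{
    $total = 0;
    foreach ($ranges as $r) {
        $total += $r[1] - $r[0] + 1;
    }
    return $total;
}

foreach ($script_ranges as $name => $ranges) {
    printf("%-14s %6d code points\n", $name, count_code_points($ranges));
}
```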

Third, make a function that picks some random code points from a
particular language range and outputs them.
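
A minimal sketch of that third step, assuming the ranges are passed in as a list of [first, last] pairs. Both function names are invented for this example, and encode_utf8() is just a compact stand-in for a code point encoder. mt_rand() is fine for test data but not for anything security related.

```php
<?php
// Hypothetical helper: UTF-8 bytes for a code point up to U+10FFFF.
function encode_utf8($cp)
{
    if ($cp <= 0x7f) {
        return chr($cp);
    }
    if ($cp <= 0x7ff) {
        return chr(0xc0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3f));
    }
    if ($cp <= 0xffff) {
        return chr(0xe0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3f))
             . chr(0x80 | ($cp & 0x3f));
    }
    return chr(0xf0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3f))
         . chr(0x80 | (($cp >> 6) & 0x3f))
         . chr(0x80 | ($cp & 0x3f));
}

// Return $length random characters drawn from a list of [first, last]
// code point ranges. Every code point in the ranges is equally likely.
function random_from_ranges(array $ranges, $length)
{
    $total = 0;
    foreach ($ranges as $r) {
        $total += $r[1] - $r[0] + 1;
    }
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        // Pick an index into the concatenation of all ranges, then
        // walk the ranges to find the code point it falls on.
        $n = mt_rand(0, $total - 1);
        foreach ($ranges as $r) {
            $size = $r[1] - $r[0] + 1;
            if ($n < $size) {
                $out .= encode_utf8($r[0] + $n);
                break;
            }
            $n -= $size;
        }
    }
    return $out;
}

$katakana = array(array(0x30A0, 0x30FF));
echo random_from_ranges($katakana, 10), "\n";
```

Picking an index over the combined size of all ranges, rather than picking a range first, keeps a small range from being over-represented next to a large one.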

Be sure to declare the character set of your web page (if that's
where the output is going) as UTF-8 in the HTTP headers:

header('Content-type: text/html; charset=UTF-8');

> The strings could be defined more broadly like "japanese" etc for
> language separation. Not important.
>
> Point is to get a string of 1..n chars of a certain language group, be
> it Greek, European accents, Japanese, Mandarin, Hangul etc



<?php

/*
 * Encode an integer code point as UTF-8 by building a urlencoded
 * string ("%XX%XX...") and urldecode()ing it to raw bytes.
 * Returns an empty string if the value cannot be encoded.
 */
function codepoint_to_utf8($code)
{
    if ($code < 0 || $code > 0x7fffffff) {
        return '';
    }
    if ($code <= 0x7f) {
        /* 1 byte: 0xxxxxxx */
        $s = sprintf("%%%02X", $code);
    } else if ($code <= 0x7ff) {
        /* 2 bytes: 110xxxxx 10xxxxxx */
        $s = sprintf("%%%02X%%%02X",
            (($code >> 6) & 0x1f) | 0xc0,
            (($code >> 0) & 0x3f) | 0x80
        );
    } else if ($code <= 0xffff) {
        /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        $s = sprintf("%%%02X%%%02X%%%02X",
            (($code >> 12) & 0x0f) | 0xe0,
            (($code >> 6) & 0x3f) | 0x80,
            (($code >> 0) & 0x3f) | 0x80
        );
    } else if ($code <= 0x1fffff) {
        /* 4 bytes; legal Unicode stops at 0x10ffff */
        $s = sprintf("%%%02X%%%02X%%%02X%%%02X",
            (($code >> 18) & 0x07) | 0xf0,
            (($code >> 12) & 0x3f) | 0x80,
            (($code >> 6) & 0x3f) | 0x80,
            (($code >> 0) & 0x3f) | 0x80
        );
    } else if ($code <= 0x3ffffff) {
        /* 5 bytes: actually, this is beyond legal Unicode */
        $s = sprintf("%%%02X%%%02X%%%02X%%%02X%%%02X",
            (($code >> 24) & 0x03) | 0xf8,
            (($code >> 18) & 0x3f) | 0x80,
            (($code >> 12) & 0x3f) | 0x80,
            (($code >> 6) & 0x3f) | 0x80,
            (($code >> 0) & 0x3f) | 0x80
        );
    } else {
        /* 6 bytes: actually, this is beyond legal Unicode */
        $s = sprintf("%%%02X%%%02X%%%02X%%%02X%%%02X%%%02X",
            (($code >> 30) & 0x01) | 0xfc,
            (($code >> 24) & 0x3f) | 0x80,
            (($code >> 18) & 0x3f) | 0x80,
            (($code >> 12) & 0x3f) | 0x80,
            (($code >> 6) & 0x3f) | 0x80,
            (($code >> 0) & 0x3f) | 0x80
        );
    }
    return urldecode($s);
}
?>



UTF-8 Character set (U+0100 - U+01FF)
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F

10_ Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď
11_ Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
12_ Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į
13_ İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
14_ ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ
15_ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
16_ Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů
17_ Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
18_ ƀ Ɓ Ƃ ƃ Ƅ ƅ Ɔ Ƈ ƈ Ɖ Ɗ Ƌ ƌ ƍ Ǝ Ə
19_ Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ ƙ ƚ ƛ Ɯ Ɲ ƞ Ɵ
1A_ Ơ ơ Ƣ ƣ Ƥ ƥ Ʀ Ƨ ƨ Ʃ ƪ ƫ Ƭ ƭ Ʈ Ư
1B_ ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ ƹ ƺ ƻ Ƽ ƽ ƾ ƿ
1C_ ǀ ǁ ǂ ǃ Ǆ ǅ ǆ Ǉ ǈ ǉ Ǌ ǋ ǌ Ǎ ǎ Ǐ
1D_ ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ
1E_ Ǡ ǡ Ǣ ǣ Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ
1F_ ǰ Ǳ ǲ ǳ Ǵ ǵ Ƕ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ