FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

Home » Imported messages » comp.lang.php » Sort using utf characters in mysql or php ? best solutions...
Show: Today's Messages :: Polls :: Message Navigator
Switch to threaded view of this topic Create a new topic Submit Reply
Sort using utf characters in mysql or php ? best solutions... [message #172818] Fri, 04 March 2011 00:58 Go to next message
SM is currently offline  SM
Messages: 3
Registered: March 2011
Karma: 0
Junior Member
Hello,
Using MySQL, i'm selecting a list of songs in spanish that i would
like to sort. Here's a list of names that is returned by the query:

¡Decirevilla!
Alhambra
123 pasitos
África
Arroz
Decir

The sorted list should look like this:

123 pasitos
África
Alhambra
Arroz
¡Decirevilla!
Decir


After all of the research i've read, i've concluded that there is no
reasonable way to achieve this using MySQL. I've tried collation,
charset, etc... but there is no way the character ¡, ?, etc... can by
sorted accordingly to my desired result. Even the Á is not sorted the
way i want to...

Question 1: Is this a reasonable conclusion?

I believe the only way to achieve this is by passing the results to an
array in php and then sort the array using a custom function... all
this using the function usort (need to sort by value and i don't care
about maintaning the key association). Something similar to this:


function normalize($a, $b) {
if ($a == $b) {
return 0;
}

return ($a < $b) ? -1 : 1;
}


$tracks = array();

while ($row = $result->fetch_assoc()) {
$tracks[] = $row;
}

usort($tracks, 'normalize');


Question 2: Is this the best way to achieve a custom sorting?


Here's where i'm hitting a wall:

Question 3: I have no idea how to create the normalize function to
sort the names accordingly to my needs. How do i ignore certain
characters (¡, ?, ', !, ¿) and how do i replace other characters with
the natural equivalent (Á -> A, É -> E, etc..)
I believe that by ignoring certain characters and replacing others, i
can achieve the sorting i'm loojing for...

Question 4: All this make sense? Am i on the right path?


Thanks in advance for all your advice.
Marco
Re: Sort using utf characters in mysql or php ? best solutions... [message #172823 is a reply to message #172818] Fri, 04 March 2011 03:11 Go to previous messageGo to next message
Jerry Stuckle is currently offline  Jerry Stuckle
Messages: 2598
Registered: September 2010
Karma: 0
Senior Member
On 3/3/2011 7:58 PM, SM wrote:
> Hello,
> Using MySQL, i'm selecting a list of songs in spanish that i would
> like to sort. Here's a list of names that is returned by the query:
>
> ¡Decirevilla!
> Alhambra
> 123 pasitos
> África
> Arroz
> Decir
>
> The sorted list should look like this:
>
> 123 pasitos
> África
> Alhambra
> Arroz
> ¡Decirevilla!
> Decir
>
>
> After all of the research i've read, i've concluded that there is no
> reasonable way to achieve this using MySQL. I've tried collation,
> charset, etc... but there is no way the character ¡, ?, etc... can by
> sorted accordingly to my desired result. Even the Á is not sorted the
> way i want to...
>
> Question 1: Is this a reasonable conclusion?
>

You're asking this question in the wrong newsgroup. Try
comp.databases.mysql.

> I believe the only way to achieve this is by passing the results to an
> array in php and then sort the array using a custom function... all
> this using the function usort (need to sort by value and i don't care
> about maintaning the key association). Something similar to this:
>
>
> function normalize($a, $b) {
> if ($a == $b) {
> return 0;
> }
>
> return ($a< $b) ? -1 : 1;
> }
>
>
> $tracks = array();
>
> while ($row = $result->fetch_assoc()) {
> $tracks[] = $row;
> }
>
> usort($tracks, 'normalize');
>
>
> Question 2: Is this the best way to achieve a custom sorting?
>

Possibly - but first ensure it can't be done in MySQL (see my response
to Question 1).

>
> Here's where i'm hitting a wall:
>
> Question 3: I have no idea how to create the normalize function to
> sort the names accordingly to my needs. How do i ignore certain
> characters (¡, ?, ', !, ¿) and how do i replace other characters with
> the natural equivalent (Á -> A, É -> E, etc..)
> I believe that by ignoring certain characters and replacing others, i
> can achieve the sorting i'm loojing for...
>

Replacing characters is one way. Another way would be to create your
own collating sequence in an array of characters. There are many other
ways.

> Question 4: All this make sense? Am i on the right path?
>
>
> Thanks in advance for all your advice.
> Marco
>

I'd still try to do it in MySQL if it's possible.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================
Re: Sort using utf characters in mysql or php ? best solutions... [message #172825 is a reply to message #172823] Fri, 04 March 2011 03:49 Go to previous messageGo to next message
SM is currently offline  SM
Messages: 3
Registered: March 2011
Karma: 0
Junior Member
On Mar 3, 10:11 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> On 3/3/2011 7:58 PM, SM wrote:
>
>
>
>> Hello,
>> Using MySQL, i'm selecting a list of songs in spanish that i would
>> like to sort. Here's a list of names that is returned by the query:
>
>> Decirevilla!
>> Alhambra
>> 123 pasitos
>> frica
>> Arroz
>> Decir
>
>> The sorted list should look like this:
>
>> 123 pasitos
>> frica
>> Alhambra
>> Arroz
>> Decirevilla!
>> Decir
>
>> After all of the research i've read, i've concluded that there is no
>> reasonable way to achieve this using MySQL. I've tried collation,
>> charset, etc... but there is no way the character , ?, etc... can by
>> sorted accordingly to my desired result. Even the is not sorted the
>> way i want to...
>
>> Question 1: Is this a reasonable conclusion?
>
> You're asking this question in the wrong newsgroup.  Try
> comp.databases.mysql.
>
>
>
>> I believe the only way to achieve this is by passing the results to an
>> array in php and then sort the array using a custom function... all
>> this using the function usort (need to sort by value and i don't care
>> about maintaning the key association). Something similar to this:
>
>> function normalize($a, $b) {
>>    if ($a == $b) {
>>      return 0;
>>    }
>
>>    return ($a<  $b) ? -1 : 1;
>> }
>
>> $tracks = array();
>
>> while ($row = $result->fetch_assoc()) {
>>    $tracks[] = $row;
>> }
>
>> usort($tracks, 'normalize');
>
>> Question 2: Is this the best way to achieve a custom sorting?
>
> Possibly - but first ensure it can't be done in MySQL (see my response
> to Question 1).
>
>
>
>> Here's where i'm hitting a wall:
>
>> Question 3: I have no idea how to create the normalize function to
>> sort the names accordingly to my needs. How do i ignore certain
>> characters ( , ?, ', !, ) and how do i replace other characters with
>> the natural equivalent ( ->  A, ->  E, etc..)
>> I believe that by ignoring certain characters and replacing others, i
>> can achieve the sorting i'm loojing for...
>
> Replacing characters is one way.  Another way would be to create your
> own collating sequence in an array of characters.  There are many other
> ways.
>
>> Question 4: All this make sense? Am i on the right path?
>
>> Thanks in advance for all your advice.
>> Marco
>
> I'd still try to do it in MySQL if it's possible.
>
> --
> ==================
> Remove the "x" from my email address
> Jerry Stuckle
> JDS Computer Training Corp.
> jstuck...@attglobal.net
> ==================

i will try posting this in the mysql forum.... in the meantime, i will
continue searching ... will try the collating sequence in an array
option...
Re: Sort using utf characters in mysql or php ? best solutions... [message #172828 is a reply to message #172818] Fri, 04 March 2011 08:53 Go to previous messageGo to next message
alvaro.NOSPAMTHANX is currently offline  alvaro.NOSPAMTHANX
Messages: 277
Registered: September 2010
Karma: 0
Senior Member
El 04/03/2011 1:58, SM escribió/wrote:
> Using MySQL, i'm selecting a list of songs in spanish that i would
> like to sort. Here's a list of names that is returned by the query:
>
> ¡Decirevilla!
> Alhambra
> 123 pasitos
> África
> Arroz
> Decir
>
> The sorted list should look like this:
>
> 123 pasitos
> África
> Alhambra
> Arroz
> ¡Decirevilla!
> Decir

"Decir" should come before "¡Decirevilla!" but I get your point.

>
>
> After all of the research i've read, i've concluded that there is no
> reasonable way to achieve this using MySQL. I've tried collation,
> charset, etc... but there is no way the character ¡, ?, etc... can by
> sorted accordingly to my desired result. Even the Á is not sorted the
> way i want to...

These are two different issues. You can absolutely sort Spanish letters
at MySQL with the appropriate rules. You just need to assign a proper
collation to the column: utf8_spanish_ci

CREATE TABLE test (
word VARCHAR(50) NULL
)
COLLATE='utf8_spanish_ci'
ENGINE=InnoDB;

INSERT INTO test (word)
VALUES ('ae'), ('Ád'), ('Ac'), ('áb'), ('aa');

SELECT word FROM test ORDER BY word;

"word"
"aa"
"áb"
"Ac"
"Ád"
"ae"

Compare with:

CREATE TABLE test (
word VARCHAR(50) NULL
)
COLLATE='utf8_bin'
ENGINE=InnoDB;

INSERT INTO test (word)
VALUES ('ae'), ('Ád'), ('Ac'), ('áb'), ('aa');

SELECT word FROM test ORDER BY word;

"word"
"Ac"
"aa"
"ae"
"Ád"
"áb"

The second issue is that you want to ignore character that are not
letters or numbers. Well, just remove them from the string before
comparing. Unluckily, MySQL does not provide regexp based replacement.
So my advice is that you add a secondary column and populate it with PHP:

CREATE TABLE test (
word VARCHAR(50) NULL,
sort_order VARCHAR(50) NULL
)
COLLATE='utf8_spanish_ci'
ENGINE=InnoDB;

INSERT INTO test (word, sort_order) VALUES
('¡Decirevilla!', 'Decirevilla'),
('Alhambra', 'Alhambra'),
('123 pasitos', '123 pasitos'),
('África', 'África'),
('Arroz', 'Arroz'),
('Decir', 'Decir');

SELECT word FROM test ORDER BY sort_order;

"word"
"123 pasitos"
"África"
"Alhambra"
"Arroz"
"Decir"
"¡Decirevilla!"


In order to populate the secondary column, you can use several
approaches, from preg_replace()
(<http://es.php.net/manual/en/function.preg-replace.php>) to
internationalization functions
(<http://es.php.net/manual/en/book.intl.php>).

echo preg_replace('/\\W/iu', '', '¡Hola, mundo! ¿Cómo está usted?');
// Prints: HolamundoCómoestáusted

(adjust the regexp to your liking.)


> Question 1: Is this a reasonable conclusion?
>
> I believe the only way to achieve this is by passing the results to an
> array in php and then sort the array using a custom function... all
> this using the function usort (need to sort by value and i don't care
> about maintaning the key association).

This implies that a query that depends on sort order (e.g., "get first
20 items that start with 'B') will need to retrieve the complete table.

[...]

> Question 3: I have no idea how to create the normalize function to
> sort the names accordingly to my needs. How do i ignore certain
> characters (¡, ?, ', !, ¿) and how do i replace other characters with
> the natural equivalent (Á -> A, É -> E, etc..)

You can use iconv() for that but it's probably not necessary:

http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php



--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--
Re: Sort using utf characters in mysql or php ? best solutions... [message #172844 is a reply to message #172828] Sat, 05 March 2011 00:53 Go to previous message
SM is currently offline  SM
Messages: 3
Registered: March 2011
Karma: 0
Junior Member
On Mar 4, 3:53 am, "Álvaro G. Vicario"
<alvaro.NOSPAMTH...@demogracia.com.invalid> wrote:
> El 04/03/2011 1:58, SM escribi /wrote:
>
>
>
>> Using MySQL, i'm selecting a list of songs in spanish that i would
>> like to sort. Here's a list of names that is returned by the query:
>
>> Decirevilla!
>> Alhambra
>> 123 pasitos
>> frica
>> Arroz
>> Decir
>
>> The sorted list should look like this:
>
>> 123 pasitos
>> frica
>> Alhambra
>> Arroz
>> Decirevilla!
>> Decir
>
> "Decir" should come before " Decirevilla!" but I get your point.
>
>
>
>> After all of the research i've read, i've concluded that there is no
>> reasonable way to achieve this using MySQL. I've tried collation,
>> charset, etc... but there is no way the character , ?, etc... can by
>> sorted accordingly to my desired result. Even the is not sorted the
>> way i want to...
>
> These are two different issues. You can absolutely sort Spanish letters
> at MySQL with the appropriate rules. You just need to assign a proper
> collation to the column: utf8_spanish_ci
>
> CREATE TABLE test (
>         word VARCHAR(50) NULL
> )
> COLLATE='utf8_spanish_ci'
> ENGINE=InnoDB;
>
> INSERT INTO test (word)
> VALUES ('ae'), (' d'), ('Ac'), (' b'), ('aa');
>
> SELECT word FROM test ORDER BY word;
>
> "word"
> "aa"
> " b"
> "Ac"
> " d"
> "ae"
>
> Compare with:
>
> CREATE TABLE test (
>         word VARCHAR(50) NULL
> )
> COLLATE='utf8_bin'
> ENGINE=InnoDB;
>
> INSERT INTO test (word)
> VALUES ('ae'), (' d'), ('Ac'), (' b'), ('aa');
>
> SELECT word FROM test ORDER BY word;
>
> "word"
> "Ac"
> "aa"
> "ae"
> " d"
> " b"
>
> The second issue is that you want to ignore character that are not
> letters or numbers. Well, just remove them from the string before
> comparing. Unluckily, MySQL does not provide regexp based replacement.
> So my advice is that you add a secondary column and populate it with PHP:
>
> CREATE TABLE test (
>         word VARCHAR(50) NULL,
>         sort_order VARCHAR(50) NULL
> )
> COLLATE='utf8_spanish_ci'
> ENGINE=InnoDB;
>
> INSERT INTO test (word, sort_order) VALUES
> (' Decirevilla!', 'Decirevilla'),
> ('Alhambra', 'Alhambra'),
> ('123 pasitos', '123 pasitos'),
> (' frica', ' frica'),
> ('Arroz', 'Arroz'),
> ('Decir', 'Decir');
>
> SELECT word FROM test ORDER BY sort_order;
>
> "word"
> "123 pasitos"
> " frica"
> "Alhambra"
> "Arroz"
> "Decir"
> " Decirevilla!"
>
> In order to populate the secondary column, you can use several
> approaches, from preg_replace()
> (<http://es.php.net/manual/en/function.preg-replace.php>) to
> internationalization functions
> (<http://es.php.net/manual/en/book.intl.php>).
>
> echo preg_replace('/\\W/iu', '', ' Hola, mundo! C mo est usted?');
> // Prints: HolamundoC moest usted
>
> (adjust the regexp to your liking.)
>
>> Question 1: Is this a reasonable conclusion?
>
>> I believe the only way to achieve this is by passing the results to an
>> array in php and then sort the array using a custom function... all
>> this using the function usort (need to sort by value and i don't care
>> about maintaning the key association).
>
> This implies that a query that depends on sort order (e.g., "get first
> 20 items that start with 'B') will need to retrieve the complete table.
>
> [...]
>
>> Question 3: I have no idea how to create the normalize function to
>> sort the names accordingly to my needs. How do i ignore certain
>> characters ( , ?, ', !, ) and how do i replace other characters with
>> the natural equivalent ( ->  A, ->  E, etc..)
>
> You can use iconv() for that but it's probably not necessary:
>
> http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in...
>
> --
> --http://alvaro.es- lvaro G. Vicario - Burgos, Spain
> -- Mi sitio sobre programaci n web:http://borrame.com
> -- Mi web de humor satinado:http://www.demogracia.com
> --

that's exactly my point... no collation or charset will work on this
case... I mean, why would you want to sort a character like ¡ or ? or
¿... anyways, that's not the point...

After a day of researching and asking around, i believe that i'm
better off letting MySQL handle the sorting and not the PHP. For that,
i need to build an artificial column in my table that does sort
properly... a column called, say, sortable_title; then ORDER BY
sortable_title... When doing INSERTs and UPDATE's, I will use use a
normalize() function in PHP that produces a list with no punctuation,
all lower case, accents stripped, etc...

Gracias Álvaro for all of your tips and guiding.
  Switch to threaded view of this topic Create a new topic Submit Reply
Previous Topic: Want to free hosting
Next Topic: [ANN]VTD-XML 2.10
Goto Forum:
  

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ]

Current Time: Wed Dec 04 08:52:24 GMT 2024

Total time taken to generate the page: 0.02707 seconds