Sorting an array alphabetically, human-style

Posted: Wed Oct 11, 2006 11:57 pm
by CobraCards
I have some code that pulls records from a MySQL database and displays them in alphabetical order. This works just fine as far as a computer is concerned, but to a human the results look out of whack. For example, take these four arbitrary phrases:

Lois Lane
Superman Robots
Superman, Man of Steel
The Fortress of Solitude

This is correct character-by-character alphabetical order, but most humans don't sort that way. I would say that articles (a, an, the) and punctuation are generally ignored, so the order I want is this:

The Fortress of Solitude
Lois Lane
Superman, Man of Steel
Superman Robots

How can I reorder an array like this?

Thanks!

Posted: Thu Oct 12, 2006 12:23 am
by Christopher
You probably need to make a separate column in the table for the sort order. It could contain alternate text or simply a number, but you would sort on that column rather than the actual displayed text.
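
As a rough sketch of that idea, the sort text could be computed in PHP whenever a record is saved and then stored in its own column. The function name make_sort_key and the column name sort_title are hypothetical (nothing in the thread names them), and dropping only a leading article is an assumption about what "human" order means:

```php
<?php
// Hypothetical helper: builds the text to store in a separate sort column.
function make_sort_key($title)
{
    $key = strtolower($title);
    // Drop one leading article (assumption: only leading articles are ignored).
    $key = preg_replace('/^(a|an|the)\s+/', '', $key);
    // Remove punctuation, keeping letters, digits, underscores and whitespace.
    $key = preg_replace('/[^\w\s]/', '', $key);
    // Collapse runs of whitespace left behind by the removals.
    return trim(preg_replace('/\s+/', ' ', $key));
}

// The key would be written alongside the title, e.g. into a hypothetical
// sort_title column, and the query would then use ORDER BY sort_title.
echo make_sort_key('The Fortress of Solitude'), "\n"; // fortress of solitude
echo make_sort_key('Superman, Man of Steel'), "\n";   // superman man of steel
```

The displayed title stays untouched; only the extra column is used for ordering, so the database does the sorting.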

Posted: Thu Oct 12, 2006 6:42 am
by feyd
Alternatively, the array could be supplemented (or replaced) with a copy of the text stripped of punctuation and passed through natcasesort(). The records could then be rearranged using those results. In all likelihood, though, performing the sort in the database would give the best performance.
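
A minimal sketch of that approach, assuming that a leading article should be dropped along with the punctuation (the variable names $keys and $sorted are only illustrative):

```php
<?php
$titles = array('Superman Robots', 'Lois Lane', 'Superman, Man of Steel', 'The Fortress of Solitude');

// Surrogate array: same keys as $titles, values stripped for comparison.
$keys = array();
foreach ($titles as $i => $title) {
    // Assumption: drop a leading article, then all punctuation.
    $stripped = preg_replace('/^(a|an|the)\s+/i', '', $title);
    $keys[$i] = trim(preg_replace('/[^\w\s]/', '', $stripped));
}

// natcasesort() sorts case-insensitively in "natural" order and keeps the keys.
natcasesort($keys);

// Rearrange the original records in the order of the sorted surrogate keys.
$sorted = array();
foreach (array_keys($keys) as $i) {
    $sorted[] = $titles[$i];
}

print_r($sorted);
```

For the four example titles this yields The Fortress of Solitude, Lois Lane, Superman, Man of Steel, Superman Robots.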

Posted: Thu Oct 12, 2006 9:21 am
by bokehman
Here's one possible method:

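<?php

$titles = array('Superman Robots', 'Lois Lane', 'Superman, Man of Steel', 'The Fortress of Solitude');
$noise_words = array('a', 'an', 'the');

order($titles, $noise_words);

// Sorts $titles in place, ignoring case, punctuation and the given noise words.
function order(&$titles, $noise_words)
{
	$temp = $titles;
	foreach($noise_words as $k => $v)
	{
		// Escape each word so it is safe to embed in the pattern.
		$noise_words[$k] = preg_quote($v, '/');
	}
	// Matches any noise word as a whole word, anywhere in the title.
	$exp = '/\b('.implode('|', $noise_words).')\b/';
	foreach($temp as $k => $v)
	{
		// Comparison key: lowercased, with noise words and punctuation removed.
		$temp[$k] = trim(preg_replace('/[^\w\s]/', '', preg_replace($exp, '', strtolower($v))));
	}
	// Sort the keys as strings and reorder $titles to match. The flags must follow
	// $temp, because array_multisort() applies them to the array that precedes them.
	array_multisort($temp, SORT_ASC, SORT_STRING, $titles);
}

# test it
print_r($titles);

# prints: Array ( [0] => The Fortress of Solitude [1] => Lois Lane [2] => Superman, Man of Steel [3] => Superman Robots )

?>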

Posted: Thu Oct 12, 2006 9:41 am
by timvw
AFAIK, 'alphabetic' sorting is locale-dependent. E.g., in Finnish the characters V and W are treated the same...

Posted: Thu Oct 12, 2006 10:01 am
by bokehman
timvw wrote: AFAIK, 'alphabetic' sorting is locale-dependent. E.g., in Finnish the characters V and W are treated the same...
I didn't see anything about Finnish in the original post; the examples were in English. Nevertheless the following takes the locale into account:

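// Sorts $titles in place using the collation rules of $locale.
// Locale names are platform dependent; pass the one(s) your system provides.
function order(&$titles, $noise_words, $locale = array('esp'))
{
	setlocale(LC_ALL, $locale);
	$temp = $titles;
	foreach($noise_words as $k => $v)
	{
		$noise_words[$k] = preg_quote($v, '/');
	}
	$exp = '/\b('.implode('|', $noise_words).')\b/';
	foreach($temp as $k => $v)
	{
		// Comparison key: lowercased, with noise words and punctuation removed.
		$temp[$k] = trim(preg_replace('/[^\w\s]/', '', preg_replace($exp, '', strtolower($v))));
	}
	// asort() keeps the keys, so they can be used to look up the original titles.
	asort($temp, SORT_LOCALE_STRING);
	// Initialise the result so an empty input yields an empty array.
	$rtn = array();
	foreach(array_keys($temp) as $k)
	{
		$rtn[] = $titles[$k];
	}
	$titles = $rtn;
}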