Sorting things out
I recently packed up everything I own and moved. I'd lived in my old place for about nine years and I have the packrat gene on both sides of the family tree, so I had a lot of crap to sort through to figure out what to move and what to trash, as well as which box what should go in. Now that I'm here in the new place, I've had to sort through the remaining stuff to figure out where it all goes. So you might appreciate that sorting has been on my mind a lot lately. (See? It's a topical tie-in. I don't do those often, so I hope it wasn't too awkward.)
I've also been working on an application that does a lot of sorting to prioritize tasks in a workflow. These items often need to be sorted based on multiple criteria, such as how long an application has been waiting for approval, how many times a customer has been called recently, etc. We also have to sort names that have anywhere from two to four components (Hispanic names can have both paternal and maternal names instead of just a surname).
Ruby enumerables can be sorted using the #sort method, which orders contents using the <=> trinary comparison operator. It's pretty simple to override <=> if instances of a class are only ever sorted one way, but when you get to a situation where you need to sort things in different ways depending on the context, you need a more flexible solution
The next obvious thing to try is passing a block to the #sort method. This works great, but it has two drawbacks. One, if the sort criteria are complex, you can end up with some ugly looking code. And B, it's not very DRY for re-using sort criteria.
The approach I came up with for my application makes use of the sorting properties of Ruby arrays. Ruby arrays sort quite nicely using the <=> operator. <=> compares two arrays by comparing all their values until a non-equal result is found for any pairings of values in the two arrays. Here are some example comparisons:
>> [1, 2, 3] <=> [1, 2, 3]
=> 0
>> [0, 2, 3] <=> [1, 2, 3]
=> -1
>> [1, 2, 3] <=> [0, 2, 3]
=> 1
>> [1, 2, [4, 5]] <=> [1, 2, [6, 7]]
=> -1
>> [1, 2, [4, 5]] <=> [1, 2, [2, 3]]
=> 1
There are some things to watch out for, like don't have nil anywhere in the array contents, and shorter arrays sort less than longer arrays where all else is equal. But this is a great feature to build upon.
Based on the array sorting functionality, I defined a concept I call a sorter. A sorter is an array that holds multiple values to use to sort the collection by in order of precedence. For example, if you want to sort people by name using the precedence paternal, maternal, first, middle, you can construct an array of those values and sort the list using that.
class Person < ActiveRecord::Base
def name_sorter
[(paternal_name || ""), (maternal_name || ""), (first_name || ""), (middle_name || "")]
end
end
@people = Person.find(:all).sort_by { |p| p.name_sorter }
Notice I'm making sure that I never have a nil as a value in the sorter, cause that will make things blow up like the ending of a Die Hard movie. If you want to avoid that verbiage, set up the model so that the names default to empty strings. Also notice that I'm using the #sort_by method, instead of the #sort method with a 2-arg block. Actually, let's look at all our options. The following three forms will give equivalent results:
@people = Person.find(:all).sort_by { |p| p.name_sorter }
@people = Person.find(:all).sort_by(&:name_sorter)
@people = Person.find(:all).sort { |a,b| a.name_sorter <=> b.name_sorter }
The first line is the one I'd use in most situations. The second line is slightly more compact, but the symbol-to-proc coercion should be avoided in any code where you care about performance (at least until Ruby 1.9 is ready for prime time). The third line is usually going to be a bad choice, because a sorter array would be created for every time its person was compared, which will be roughly ln(n) times if you are willing to believe my naive assumptions about Ruby's sorting algorithm. If you don't want to trust naiveté, check out the simple performance comparison I ran. This benchmark sorts an array of 60 items using a sorter on 4 values.
user system total real
sort 6.430000 0.000000 6.430000 ( 6.466645)
sort_by 1.510000 0.000000 1.510000 ( 1.509017)
That kind of difference is pretty significant to me. Of course, if your sorter can be cached so you can reuse it for multiple comparisons, a #sort might be better. As always, measure it if you care and you're not sure.
One last thing. What if we just rolled the sort by hand using a block? (The code below assumes names will default to an empty string.)
@people = Person.find(:all).sort do |a,b|
if 0 != (comp = a.last_name <=> b.last_name)
comp
elsif 0 != (comp = a.maternal_name <=> b.maternal_name)
comp
elsif 0 != (comp = a.first_name <=> b.first_name)
comp
else
a.middle_name <=> b.middle_name
end
end
That's fairly ugly and it's actually slower than the #sort_by approach too.
And because I don't believe in secret benchmarks, here is a gist of the benchmark code I used.
Now if you'll excuse me, I have a lot of empty cardboard boxes to toss in the recycling.
- Programming Language:
- Technology:


Recent comments
1 year 23 weeks ago
1 year 23 weeks ago
1 year 25 weeks ago
1 year 27 weeks ago
1 year 42 weeks ago
1 year 45 weeks ago
1 year 45 weeks ago
1 year 45 weeks ago
1 year 46 weeks ago
1 year 48 weeks ago