Unusual ordering

from the Artful SQL Server & Access Tips List


You can sort a resultset in any way you please, so long as you can describe the sorting rules. A colleague once encountered an unusual problem concerning sort orders. The most concise way to explain the problem is to show you the following code. Paste it into Query Analyzer and run it against any test database:

create table TestSort(CodeToSort varchar(10));
insert into TestSort(CodeToSort) values('a1');
insert into TestSort(CodeToSort) values('a2');
insert into TestSort(CodeToSort) values('a21');
insert into TestSort(CodeToSort) values('a10x');
insert into TestSort(CodeToSort) values('a2a');
insert into TestSort(CodeToSort) values('a10');

A sort on this column yields the following result set:

select * from TestSort order by CodeToSort;
CodeToSort
a1
a10
a10x
a2
a21
a2a

But the application's requirements weren't that simple. Management wanted the results in the following sort order:

a1
a2
a2a
a10
a10x
a21

Since it may not be apparent what the rules are, I'll describe the sample strings:

- The 2nd through nth characters are numerics.
- Numeric characters are followed by zero or more alphabetic characters.

Now let's reverse-engineer the rules governing the sort order (the sample doesn't make rules four and five below explicitly clear, but we can infer them):

1. The first character is insignificant.

2. Sort on the numeric value of the numeric characters within the string.

3. In the event of an alphabetic suffix, that string should follow a similar string without the alphabetic suffix (i.e., a10x succeeds a10).

4. The numeric part of the string might be any reasonable number of digits.

5. In the event of two alphabetic suffixes, sort them alphabetically (i.e., a10x precedes a10y).

Our biggest problem is rule number four. We can't just use the SUBSTRING() function to grab the numeric portion. Instead, we have to grab all the contiguous numeric characters, turn them into an integer, and sort on that value.

My solution is to use a user-defined-function (UDF) that extracts the numeric digits from the character string and returns them to us as an integer. Here is a function that performs this task:

CREATE FUNCTION dbo.InnerNumeric_fn
(
    @Source Varchar(20) = 'a20x'
)
RETURNS integer
AS
    BEGIN
        declare @Result varchar(10)
        declare @i int
        declare @j int
        declare @c char(1)
        declare @flag int
        set @j = len(@Source)
        set @i = 2
        set @Result = ''
        while @i <= @j
            begin
            set @c = substring(@Source,@i,1)
            set @flag = isnumeric(@c)
            if (@flag = 1)
                begin
                    set @Result = @Result + @c
                    set @i = @i+1
                end
            else
                break
            end
    
        RETURN CAST(@Result AS INT)
    END

We could generalize the pattern somewhat by supposing that the initial alphabetic portion of the string might be more than one character. In that case, we wouldn't assign the value 2 to the variable I; we would have to step over the alphabetic characters. In addition, the alphabetic suffix might consist of more than one character.

In a worse case, we could also assume that there might be more than one numeric string in the sequence (i.e., axyz200bcd300xy), but then we would need much more clearly defined rules as to the desired sort order. I'll leave these variations as exercises to the interested reader.

The trick isn't in the code but in the accurate and complete description of the rules. Make sure that the person who dictates the rules supplies you with an adequate test set.

The suppliers of said rules will almost always overlook something, so look for holes and exceptions in supplied sets because, if you don't, they may come back to bite you. Before you code a line, make sure that all the stakeholders agree on the rules.



Return to the Artful SQL Server & Access Tips page