sort
Operator
sort — sort values
Synopsis
sort [-r] [-nulls first|last] [<expr> [asc|desc] [, <expr> [asc|desc] ...]]
Description
The sort
operator sorts its input by reading all values until the end of input,
sorting the values according to the provided sort expression(s), and emitting
the values in the sorted order.
The sort expressions act as primary key, secondary key, and so forth. By
default, the sort order is ascending, from lowest value to highest. If
desc
is specified in a sort expression, the sort order for that key is
descending.
SuperPipe follows the SQL convention that, by default, null
values appear last
in either case of ascending or descending sort. This can be overridden
by specifying -nulls first
.
If no sort expression is provided, a sort key is guessed based on heuristics applied
to the values present.
The heuristic examines the first input record and finds the first field in
left-to-right order that is an integer, or if no integer field is found,
the first field that is floating point. If no such numeric field is found, sort
finds
the first field in left-to-right order that is not of the time
data type.
Note that there are some cases (such as the output of a grouped aggregation performed on heterogeneous data) where the first input record to sort
may vary even when the same query is executed repeatedly against the same data.
If you require a query to show deterministic output on repeated execution,
explicit sort expressions must be provided.
If -r
is specified, the sort order for each key is reversed. For clarity
when sorting by named fields, specifying desc
is recommended instead of -r
,
particularly when multiple sort expressions are present. However, sort -r
provides a shorthand if the heuristics described above suffice but reversed
output is desired.
If not all data fits in memory, values are spilled to temporary storage and sorted with an external merge sort.
SuperPipe's sort
is stable
such that values with identical sort keys always have the same relative order
in the output as they had in the input, such as provided by the -s
option in
Unix's "sort" command-line utility.
Note that a total order is defined over the space of all values even between values of different types so sort order is always well-defined even when comparing heterogeneously typed values.
TBD: document the definition of the total order
Examples
A simple sort with a null
echo '2 null 1 3' | super -z -c 'sort this' -
=>
1
2
3
null
With no sort expression, sort will sort by this
for non-records
echo '2 null 1 3' | super -z -c sort -
=>
1
2
3
null
The "nulls last" default may be overridden
echo '2 null 1 3' | super -z -c 'sort -nulls first' -
=>
null
1
2
3
With no sort expression, sort's heuristics will find a numeric key
echo '{s:"bar",k:2}{s:"bar",k:3}{s:"foo",k:1}' | super -z -c sort -
=>
{s:"foo",k:1}
{s:"bar",k:2}
{s:"bar",k:3}
It's best practice to provide the sort key
echo '{s:"bar",k:2}{s:"bar",k:3}{s:"foo",k:1}' | super -z -c 'sort k' -
=>
{s:"foo",k:1}
{s:"bar",k:2}
{s:"bar",k:3}
Sort with a secondary key
echo '{s:"bar",k:2}{s:"bar",k:3}{s:"foo",k:2}' | super -z -c 'sort k,s' -
=>
{s:"bar",k:2}
{s:"foo",k:2}
{s:"bar",k:3}
Sort by secondary key in reverse order when the primary keys are identical
echo '{s:"bar",k:2}{s:"bar",k:3}{s:"foo",k:2}' | super -z -c 'sort k,s desc' -
=>
{s:"foo",k:2}
{s:"bar",k:2}
{s:"bar",k:3}
Sort with a numeric expression
echo '{s:"sum 2",x:2,y:0}{s:"sum 3",x:1,y:2}{s:"sum 0",x:-1,y:-1}' |
super -z -c 'sort x+y' -
=>
{s:"sum 0",x:-1,y:-1}
{s:"sum 2",x:2,y:0}
{s:"sum 3",x:1,y:2}
Case sensitivity affects sorting "lowest value to highest" in string values
echo '{word:"hello"}{word:"Hi"}{word:"WORLD"}' |
super -z -c 'sort' -
=>
{word:"Hi"}
{word:"WORLD"}
{word:"hello"}
Case-insensitive sort by using a string expression
echo '{word:"hello"}{word:"Hi"}{word:"WORLD"}' |
super -z -c 'sort lower(word)' -
=>
{word:"hello"}
{word:"Hi"}
{word:"WORLD"}
Shorthand to reverse the sort order for each key
echo '2 null 1 3' | super -z -c 'sort -r' -
=>
3
2
1
null