Lateral Subqueries

Lateral subqueries provide a powerful means to apply a Zed query to each subsequence of values generated from an outer sequence of values. The inner query may be any pipeline operator sequence (excluding from operators) and may refer to values from the outer sequence.

Note

This pattern rhymes with the SQL pattern of a "lateral join", which runs a subquery for each row of the outer query's results.

Lateral subqueries are created using the scoped form of the over operator. They may be nested to arbitrary depth, and access to variables in parent lateral query bodies follows lexical scoping.

For example,

echo '{s:"foo",a:[1,2]} {s:"bar",a:[3]}' |
super -z -c 'over a with name=s => (yield {name,elem:this})' -

produces

{name:"foo",elem:1}
{name:"foo",elem:2}
{name:"bar",elem:3}

Here the lateral scope, described below, creates a subquery

yield {name,elem:this}

for each subsequence of values derived from each outer input value. In the example above, there are two input values:

{s:"foo",a:[1,2]}
{s:"bar",a:[3]}

which imply two subqueries derived from the over operator traversing a. The first subquery operates on the input values 1 and 2 with the variable name set to "foo". It assigns 1 and then 2 to this, thereby emitting

{name:"foo",elem:1}
{name:"foo",elem:2}

and the second subquery operates on the input value 3 with the variable name set to "bar", emitting

{name:"bar",elem:3}
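The walk-through above can be sketched as a rough Python model. This is not how SuperDB is implemented; the helper names `lateral_over` and `yield_name_elem` are hypothetical and exist only to illustrate the per-outer-value binding and the per-subsequence subquery.

```python
def lateral_over(outer_values, over_field, bindings, subquery):
    """Hypothetical model of `over <field> with <var>=<field> => ( <query> )`."""
    for outer in outer_values:
        # The `with` bindings are evaluated in the outer scope, once per outer value.
        variables = {var: outer[field] for var, field in bindings.items()}
        # `over` turns the array into a subsequence; the subquery runs on it to completion.
        subsequence = list(outer[over_field])
        yield from subquery(subsequence, variables)


def yield_name_elem(subsequence, variables):
    # Models `yield {name,elem:this}`: `this` takes each value of the subsequence in turn.
    for this in subsequence:
        yield {"name": variables["name"], "elem": this}


outer_values = [{"s": "foo", "a": [1, 2]}, {"s": "bar", "a": [3]}]
rows = list(lateral_over(outer_values, "a", {"name": "s"}, yield_name_elem))
for row in rows:
    print(row)
```

The two outer values produce two subquery runs, and the rows come out in the same order as the output shown above.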

You can also import a parent-scope field reference into the inner scope by simply referring to its name without assignment, e.g.,

echo '{s:"foo",a:[1,2]} {s:"bar",a:[3]}' |
super -z -c 'over a with s => (yield {s,elem:this})' -

produces

{s:"foo",elem:1}
{s:"foo",elem:2}
{s:"bar",elem:3}

Lateral Scope

A lateral scope has the form => ( <query> ). It currently appears only in the context of an over operator, as illustrated above, where the full form is:

over ... with <elem> [, <elem> ...] => ( <query> )

where <elem> has either an assignment form

<var>=<expr>

or a field reference form

<field>

For each input value to the outer scope, the assignment form creates a binding between each <expr> evaluated in the outer scope and each <var>, which represents a new symbol in the inner scope of the <query>. In the field reference form, a single identifier <field> refers to a field in the parent scope and makes that field's value available in the lateral scope via the same name.

Note that any such variable definitions override implied field references of this. If both a field named x and a variable named x need to be referenced in the lateral scope, the field reference should be qualified as this.x while the variable is referenced simply as x.
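The shadowing rule can be pictured with a small Python model of name lookup. The function `resolve` and the sample values are hypothetical, chosen only to illustrate the rule as stated here; they do not reflect SuperDB internals.

```python
def resolve(name, variables, this):
    """Hypothetical lookup of a bare identifier inside a lateral scope."""
    if name in variables:
        # A variable brought in by `with` overrides the implied field reference.
        return variables[name]
    # Otherwise a bare name is treated as the field reference this.<name>.
    return this[name]


variables = {"x": "outer variable"}       # as if bound by `with x=...`
this = {"x": "inner field", "y": 2}       # current value in the lateral scope

bare_x = resolve("x", variables, this)    # bare `x` resolves to the variable
qualified_x = this["x"]                   # `this.x` always reaches the field
bare_y = resolve("y", variables, this)    # no variable named y, so the field is used

print(bare_x, qualified_x, bare_y)
```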

The <query> is evaluated once per outer value, on the sequence generated by the over expressions. In the lateral scope, this refers to each value of that inner sequence. The query runs to completion for each inner sequence and emits its results as the traversal of that inner sequence completes.

This structure is powerful because any pipeline operator sequence (excluding from operators) can appear in the body of the lateral scope. In contrast to the yield example above, a sort could be applied to each subsequence in the subquery, where sort reads all values of the subsequence, sorts them, emits them, then repeats the process for the next subsequence. For example,

echo '[3,2,1] [4,1,7] [1,2,3]' |
super -z -c 'over this => (sort this | collect(this))' -

produces

[1,2,3]
[1,4,7]
[1,2,3]
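The per-subsequence behavior of sort can be sketched in Python as follows. This is a simplified model under the assumption that each outer array becomes one subsequence; `run_lateral` and `sort_then_collect` are hypothetical names, not part of any SuperDB API.

```python
def run_lateral(outer_values, subquery):
    """Hypothetical model of `over this => ( <query> )`."""
    for outer in outer_values:
        # Each outer array becomes its own subsequence.
        subsequence = list(outer)
        # The subquery runs to completion before the next outer value is read.
        yield from subquery(subsequence)


def sort_then_collect(subsequence):
    # Models `sort this | collect(this)`: sort reads the whole subsequence,
    # then collect gathers the sorted values into a single array.
    yield sorted(subsequence)


results = list(run_lateral([[3, 2, 1], [4, 1, 7], [1, 2, 3]], sort_then_collect))
for result in results:
    print(result)
```

Because the sort is restarted for every subsequence, values from different outer arrays are never mixed together.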

Lateral Expressions

Lateral subqueries can also appear in expression context using the parenthesized form:

( over <expr> [, <expr>...] [with <var>=<expr> [, ... <var>[=<expr>]]] | <lateral> )

Tip

The parentheses disambiguate a lateral expression from a lateral pipeline operator.

This form must always include a lateral scope as indicated by <lateral>.

The lateral expression is evaluated by evaluating each <expr> and feeding the results as inputs to the <lateral> pipeline. Each time the lateral expression is evaluated, the lateral operators are run to completion, e.g.,

echo '[3,2,1] [4,1,7] [1,2,3]' | super -z -c 'yield (over this | sum(this))' -

produces

6
12
6

This structure generalizes to more complicated expression contexts, e.g., we can embed multiple lateral expressions inside a record literal and use the spread operator to tighten up the output:

echo '[3,2,1] [4,1,7] [1,2,3]' |
super -z -c '
{...(over this | sort this | sorted:=collect(this)),
...(over this | sum:=sum(this))}' -

produces

{sorted:[1,2,3],sum:6}
{sorted:[1,4,7],sum:12}
{sorted:[1,2,3],sum:6}
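As a rough analogy, the record-with-spread example behaves like merging two dictionaries in Python, each produced by its own pass over the same array. The helper names below are hypothetical and the model ignores everything except the data flow.

```python
def sorted_part(values):
    # Models `(over this | sort this | sorted:=collect(this))`.
    return {"sorted": sorted(values)}


def sum_part(values):
    # Models `(over this | sum:=sum(this))`.
    return {"sum": sum(values)}


inputs = [[3, 2, 1], [4, 1, 7], [1, 2, 3]]

# Each lateral expression is evaluated once per input value, and the two
# resulting records are spread into a single record.
records = [{**sorted_part(values), **sum_part(values)} for values in inputs]
for record in records:
    print(record)
```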

Because Zed expressions evaluate to a single result, if multiple values remain at the conclusion of the lateral pipeline, they are automatically wrapped in an array, e.g.,

echo '{x:1} {x:[2]} {x:[3,4]}' |
super -z -c 'yield {s:(over x | yield this+1)}' -

produces

{s:2}
{s:3}
{s:[4,5]}

To handle such dynamic input data, you can ensure your downstream pipeline always receives consistently packaged values by explicitly wrapping the result of the lateral scope, e.g.,

echo '{x:1} {x:[2]} {x:[3,4]}' |
super -z -c 'yield {s:(over x | yield this+1 | collect(this))}' -

produces

{s:[2]}
{s:[3]}
{s:[4,5]}

Similarly, a primitive value may be consistently produced by concluding the lateral scope with an operator such as head or tail, or by applying certain aggregate functions, as done with sum above.
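The packaging rule for lateral expressions can be modeled in Python as shown below. This is only a sketch under stated assumptions: the function names are hypothetical, `over` is assumed to pass a non-array value through as a single-element sequence (consistent with the `{x:1}` case above), and the empty-result case is left unhandled because the text does not describe it.

```python
def over(value):
    # Assumption: an array is traversed element by element, and any other
    # value is passed through as a one-element sequence.
    return list(value) if isinstance(value, list) else [value]


def package(results):
    """Hypothetical model of how a lateral expression's results become one value."""
    if len(results) == 1:
        return results[0]          # a single result is used as is
    if len(results) > 1:
        return list(results)       # multiple results are wrapped in an array
    raise ValueError("empty result: behavior not described in the text above")


inputs = [{"x": 1}, {"x": [2]}, {"x": [3, 4]}]

# Models `{s:(over x | yield this+1)}`: the shape of s depends on the input.
auto = [{"s": package([v + 1 for v in over(rec["x"])])} for rec in inputs]

# Models `{s:(over x | yield this+1 | collect(this))}`: collect always emits
# exactly one array, so s has a consistent shape.
wrapped = [{"s": package([[v + 1 for v in over(rec["x"])]])} for rec in inputs]

print(auto)
print(wrapped)
```

Ending the pipeline with a step that always emits exactly one value is what makes the downstream shape predictable.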