Version: Next

Lateral Subqueries

Lateral subqueries provide a powerful means to apply a Zed query to each subsequence of values generated from an outer sequence of values. The inner query may be any Zed query and may refer to values from the outer sequence.

Lateral subqueries are created using the scoped form of the over operator and may be nested to arbitrary depth.

For example,

echo '{s:"foo",a:[1,2]} {s:"bar",a:[3]}' | zq -z 'over a with name=s => (yield {name,elem:this})' -

produces

{name:"foo",elem:1}
{name:"foo",elem:2}
{name:"bar",elem:3}

Here the lateral scope, described below, creates a subquery

yield {name,elem:this}

for each subsequence of values derived from each outer input value. In the example above, there are two input values:

{s:"foo",a:[1,2]}
{s:"bar",a:[3]}

which imply two subqueries derived from the over operator traversing a. The first subquery operates on the input values 1 and 2 with the variable name set to "foo", assigning 1 and then 2 to this, thereby emitting

{name:"foo",elem:1}
{name:"foo",elem:2}

and the second subquery operates on the input value 3 with the variable name set to "bar", emitting

{name:"bar",elem:3}
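
The evaluation order described above can be modeled outside of Zed. The following Python sketch is only an analogy and makes no claim about how zq is implemented; the names lateral_over and yield_record are hypothetical and exist purely for illustration. It runs the inner query once per outer value, binding name from the outer value's s field and feeding the elements of a as the inner sequence.

```python
def lateral_over(outer_values, subquery):
    """Model of: over a with name=s => ( <subquery> ).

    Hypothetical helper for illustration only; it is not part of Zed.
    """
    results = []
    for outer in outer_values:
        # The with clause is evaluated in the outer scope, once per outer value.
        name = outer["s"]
        # The over expression generates the inner sequence for this outer value.
        inner_sequence = list(outer["a"])
        # The subquery runs to completion on that sequence before the next one starts.
        results.extend(subquery(inner_sequence, name))
    return results


def yield_record(inner_sequence, name):
    # Models: yield {name, elem:this}, where this is each inner value in turn.
    return [{"name": name, "elem": this} for this in inner_sequence]


outer_values = [{"s": "foo", "a": [1, 2]}, {"s": "bar", "a": [3]}]
records = lateral_over(outer_values, yield_record)
for record in records:
    print(record)
```

The printed dictionaries correspond one-to-one with the three records emitted by the zq command above: the binding for name changes only when the outer value changes, while this advances through each inner value.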

You can also import a parent-scope field reference into the inner scope by simply referring to its name without assignment, e.g.,

echo '{s:"foo",a:[1,2]} {s:"bar",a:[3]}' | zq -z 'over a with s => (yield {s,elem:this})' -

produces

{s:"foo",elem:1}
{s:"foo",elem:2}
{s:"bar",elem:3}

Lateral Scope

A lateral scope has the form => ( <query> ) and currently appears only in the context of an over operator, as illustrated above. The complete form is:

over ... with <elem> [, <elem> ...] => ( <query> )

where <elem> has either an assignment form

<var>=<expr>

or a field reference form

<field>

For each input value to the outer scope, the assignment form creates a binding between each <expr> evaluated in the outer scope and each <var>, which represents a new symbol in the inner scope of the <query>. In the field reference form, a single identifier <field> refers to a field in the parent scope and makes that field's value available in the lateral scope with the same name.

The <query>, which may be any Zed query, is evaluated once per outer value on the sequence generated by the over expressions. In the lateral scope, this refers in turn to each value of that inner sequence. The query runs to completion for each inner sequence and emits its results as each inner sequence traversal completes.

This structure is powerful because any Zed query can appear in the body of the lateral scope. In contrast to the yield example, a sort could be applied to each subsequence in the subquery, where sort reads all values of the subsequence, sorts them, emits them, then repeats the process for the next subsequence. For example,

echo '[3,2,1] [4,1,7] [1,2,3]' | zq -z 'over this => (sort this | collect(this))' -

produces

[1,2,3]
[1,4,7]
[1,2,3]
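
To make the per-subsequence behavior concrete, here is a rough Python analogy (not Zed's implementation; lateral and sort_and_collect are hypothetical names). It contrasts running the inner query once per outer array with what a single sort over every element at once would give.

```python
def lateral(outer_values, subquery):
    """Run subquery to completion once per outer value (illustrative model only)."""
    results = []
    for outer in outer_values:
        # Each outer array becomes its own inner sequence.
        results.extend(subquery(list(outer)))
    return results


def sort_and_collect(inner_sequence):
    # Models: sort this | collect(this), which emits one array per inner sequence.
    return [sorted(inner_sequence)]


arrays = [[3, 2, 1], [4, 1, 7], [1, 2, 3]]

# Scoped: the sort sees one subsequence at a time.
per_array = lateral(arrays, sort_and_collect)

# For contrast: one sort over all elements, ignoring subsequence boundaries.
flattened = sorted(elem for array in arrays for elem in array)

print(per_array)
print(flattened)
```

The scoped result keeps one sorted array per input value, matching the output above, whereas the flattened sort loses the boundaries between the original arrays.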

Lateral Expressions

Lateral subqueries can also appear in expression context using the parenthesized form:

( over <expr> [, <expr>...] [with <var>=<expr> [, ... <var>[=<expr>]]] | <lateral> )

Note that the parentheses disambiguate a lateral expression from a lateral dataflow operator.

This form must always include a lateral scope as indicated by <lateral>, which can be any dataflow operator sequence that does not contain a from operator. As with the over operator, values from the outer scope can be brought into the lateral scope using the with clause.

The lateral expression is evaluated by evaluating each <expr> and feeding the results as inputs to the <lateral> dataflow operators. Each time the lateral expression is evaluated, the lateral operators are run to completion, e.g.,

echo '[3,2,1] [4,1,7] [1,2,3]' | zq -z 'yield (over this | sum(this))' -

produces

6
12
6

This structure generalizes to any more complicated expression context, e.g., we can embed multiple lateral expressions inside of a record literal and use the spread operator to tighten up the output:

echo '[3,2,1] [4,1,7] [1,2,3]' | zq -z '{...(over this | sort this | sorted:=collect(this)),...(over this | sum:=sum(this))}' -

produces

{sorted:[1,2,3],sum:6}
{sorted:[1,4,7],sum:12}
{sorted:[1,2,3],sum:6}
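
As a loose analogy in Python (a sketch of the semantics only, with hypothetical helper names lateral_expr, sorted_record, and sum_record), each lateral expression can be thought of as a function call that consumes the whole inner sequence and returns a value, and the record spread as a merge of the resulting records.

```python
def lateral_expr(value, subquery):
    """Model of ( over <value> | <subquery> ): run the subquery to completion.

    Illustrative only; this is not how Zed evaluates lateral expressions.
    """
    return subquery(list(value))


def sorted_record(inner_sequence):
    # Models: sort this | sorted:=collect(this)
    return {"sorted": sorted(inner_sequence)}


def sum_record(inner_sequence):
    # Models: sum:=sum(this)
    return {"sum": sum(inner_sequence)}


inputs = [[3, 2, 1], [4, 1, 7], [1, 2, 3]]

# The ** unpacking plays the role of the ... spread in the Zed record literal.
merged = [
    {**lateral_expr(value, sorted_record), **lateral_expr(value, sum_record)}
    for value in inputs
]

for record in merged:
    print(record)
```

Both lateral expressions are evaluated independently against the same outer value, so each output record combines one sorted array and one sum for that value.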