Lateral Subqueries
Lateral subqueries provide a powerful means to apply a Zed query to each subsequence of values generated from an outer sequence of values. The inner query may be any pipeline operator sequence (excluding from operators) and may refer to values from the outer sequence. This pattern rhymes with the SQL pattern of a "lateral join", which runs a subquery for each row of the outer query's results.
Lateral subqueries are created using the scoped form of the over operator. They may be nested to arbitrary depth, and access to variables in parent lateral query bodies follows lexical scoping.
For example,
echo '{s:"foo",a:[1,2]} {s:"bar",a:[3]}' |
zq -z 'over a with name=s => (yield {name,elem:this})' -
produces
{name:"foo",elem:1}
{name:"foo",elem:2}
{name:"bar",elem:3}
Here the lateral scope, described below, creates a subquery
yield {name,elem:this}
for each subsequence of values derived from each outer input value. In the example above, there are two input values:
{s:"foo",a:[1,2]}
{s:"bar",a:[3]}
which imply two subqueries derived from the over operator traversing a.
The first subquery thus operates on the input values 1 and 2 with the variable name set to "foo", assigning 1 and then 2 to this, thereby emitting
{name:"foo",elem:1}
{name:"foo",elem:2}
and the second subquery operates on the input value 3 with the variable name set to "bar", emitting
{name:"bar",elem:3}
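The walk-through above can be mirrored by a small Python model. This is a conceptual sketch, not Zed's implementation: Zed records are modeled as Python dicts, arrays as lists, and the helper name over_with_yield is hypothetical. It shows the with clause being evaluated once per outer value, before the subquery body runs over that value's inner sequence.

```python
# Conceptual model (not Zed's implementation) of:
#   over a with name=s => (yield {name,elem:this})
# Records are modeled as dicts; over_with_yield is a hypothetical name.

def over_with_yield(outer_values):
    results = []
    for outer in outer_values:
        # The "with" binding is evaluated once per outer value, in the outer scope.
        name = outer["s"]
        # "over a" generates the inner sequence for this outer value.
        inner_sequence = outer["a"]
        # The subquery body runs over the inner sequence; "this" is each element in turn.
        for this in inner_sequence:
            results.append({"name": name, "elem": this})
    return results


inputs = [{"s": "foo", "a": [1, 2]}, {"s": "bar", "a": [3]}]
for record in over_with_yield(inputs):
    print(record)
```

Each outer value gets its own subquery run, which is why name changes from "foo" to "bar" between the two groups of output.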
You can also import a parent-scope field reference into the inner scope by simply referring to its name without assignment, e.g.,
echo '{s:"foo",a:[1,2]} {s:"bar",a:[3]}' |
zq -z 'over a with s => (yield {s,elem:this})' -
produces
{s:"foo",elem:1}
{s:"foo",elem:2}
{s:"bar",elem:3}
Lateral Scope
A lateral scope has the form => ( <query> ) and currently appears only in the context of an over operator, as illustrated above, where it takes the form:
over ... with <elem> [, <elem> ...] => ( <query> )
where <elem> has either an assignment form <var>=<expr> or a field reference form <field>.
For each input value to the outer scope, the assignment form creates a binding between each <expr> evaluated in the outer scope and each <var>, which represents a new symbol in the inner scope of the <query>.
In the field reference form, a single identifier <field> refers to a field in the parent scope and makes that field's value available in the lateral scope via the same name.
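The two <elem> forms can be pictured as two ways of populating the inner scope's symbol table. The sketch below is a hypothetical Python model, not Zed's implementation: each element is a tuple tagged "assign" or "field", and expressions are assumed to be Python callables applied to the outer value.

```python
# Hypothetical model of how a "with" clause builds the lateral scope's variables.
# ("assign", var, expr) models <var>=<expr>; ("field", name) models <field>.
# Modeling expressions as callables on the outer value is an assumption.

def bind_lateral_scope(outer_value, elems):
    scope = {}
    for elem in elems:
        if elem[0] == "assign":
            _, var, expr = elem
            # <expr> is evaluated in the outer scope, once per outer value.
            scope[var] = expr(outer_value)
        elif elem[0] == "field":
            _, field = elem
            # The parent field's value becomes visible under the same name.
            scope[field] = outer_value[field]
        else:
            raise ValueError(f"unknown elem kind: {elem[0]!r}")
    return scope


outer = {"s": "foo", "a": [1, 2]}
print(bind_lateral_scope(outer, [("assign", "name", lambda v: v["s"])]))  # {'name': 'foo'}
print(bind_lateral_scope(outer, [("field", "s")]))  # {'s': 'foo'}
```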
Note that any such variable definitions override implied field references of this. If both a field named x and a variable named x need to be referenced in the lateral scope, the field reference should be qualified as this.x while the variable is referenced simply as x.
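The precedence rule can be sketched as a two-step lookup. In this hypothetical Python model (the names resolve, resolve_this_field, and MISSING are illustrative only), a bare identifier checks lateral-scope variables before falling back to a field of this, while an explicit this.x always reaches the field.

```python
# Conceptual sketch of identifier resolution inside a lateral scope.
# MISSING is a hypothetical sentinel for "no such name".

MISSING = object()


def resolve(ident, variables, this):
    # A bare identifier first checks the lateral-scope variables...
    if ident in variables:
        return variables[ident]
    # ...and only then falls back to an implied field reference of this.
    if isinstance(this, dict) and ident in this:
        return this[ident]
    return MISSING


def resolve_this_field(ident, this):
    # An explicit this.x always refers to the field, bypassing variables.
    if isinstance(this, dict):
        return this.get(ident, MISSING)
    return MISSING


variables = {"x": "variable"}
this = {"x": "field"}
print(resolve("x", variables, this))  # variable
print(resolve_this_field("x", this))  # field
```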
The <query> is evaluated once per outer value on the sequence generated by the over expression. In the lateral scope, the value this refers, in turn, to each value of the inner sequence generated from the over expressions. The query runs to completion for each inner sequence, and its results are emitted as each inner sequence traversal completes.
This structure is powerful because any pipeline operator sequence (excluding from operators) can appear in the body of the lateral scope. In contrast to the yield example above, a sort could be applied to each subsequence in the subquery, where sort reads all values of the subsequence, sorts them, emits them, then repeats the process for the next subsequence. For example,
echo '[3,2,1] [4,1,7] [1,2,3]' |
zq -z 'over this => (sort this | collect(this))' -
produces
[1,2,3]
[1,4,7]
[1,2,3]
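The per-subsequence behavior of sort can be modeled by treating the subquery as a function that consumes an entire inner sequence before returning. This Python sketch is conceptual only; lateral_over_this and subquery_sort_collect are hypothetical names, and Zed arrays are modeled as lists.

```python
# Conceptual model of: over this => (sort this | collect(this))
# Helper names are hypothetical; arrays are modeled as Python lists.

def subquery_sort_collect(inner_sequence):
    # sort must read every value of the subsequence before emitting any...
    ordered = sorted(inner_sequence)
    # ...and collect(this) gathers the sorted values into one array value.
    return [ordered]


def lateral_over_this(outer_values, subquery):
    out = []
    for outer in outer_values:
        # A fresh subquery run per outer value; its results are emitted
        # once that inner traversal completes.
        out.extend(subquery(outer))
    return out


print(lateral_over_this([[3, 2, 1], [4, 1, 7], [1, 2, 3]], subquery_sort_collect))
# [[1, 2, 3], [1, 4, 7], [1, 2, 3]]
```

Because each run starts fresh, values from one subsequence never leak into the sort of the next.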
Lateral Expressions
Lateral subqueries can also appear in expression context using the parenthesized form:
( over <expr> [, <expr> ...] [with <elem> [, <elem> ...]] | <lateral> )
The parentheses disambiguate a lateral expression from a lateral pipeline operator.
This form must always include a lateral scope as indicated by <lateral>.
The lateral expression is evaluated by evaluating each <expr> and feeding the results as inputs to the <lateral> pipeline. Each time the lateral expression is evaluated, the lateral operators are run to completion, e.g.,
echo '[3,2,1] [4,1,7] [1,2,3]' | zq -z 'yield (over this | sum(this))' -
produces
6
12
6
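In expression context, the lateral pipeline behaves like a function called once per input value. The following Python sketch is a conceptual model, not Zed's implementation; lateral_expression is a hypothetical name, and Python's built-in sum stands in for the sum aggregate.

```python
# Conceptual model of: yield (over this | sum(this))
# The lateral expression is re-evaluated for every input value, and each
# evaluation runs its pipeline to completion before producing a result.

def lateral_expression(value, pipeline):
    # "over this": an array is traversed element by element (modeled as a list).
    inner_sequence = value if isinstance(value, list) else [value]
    # The pipeline consumes the whole inner sequence and returns one result.
    return pipeline(inner_sequence)


inputs = [[3, 2, 1], [4, 1, 7], [1, 2, 3]]
print([lateral_expression(v, sum) for v in inputs])  # [6, 12, 6]
```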
This structure generalizes to more complicated expression contexts; e.g., we can embed multiple lateral expressions inside a record literal and use the spread operator to tighten up the output:
echo '[3,2,1] [4,1,7] [1,2,3]' |
zq -z '
{...(over this | sort this | sorted:=collect(this)),
...(over this | sum:=sum(this))}' -
produces
{sorted:[1,2,3],sum:6}
{sorted:[1,4,7],sum:12}
{sorted:[1,2,3],sum:6}
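The spread operator's role here resembles Python dict unpacking: each lateral expression yields one single-field record, and spreading both into a record literal merges their fields. This is a conceptual sketch; sorted_record, sum_record, and combined are hypothetical helper names.

```python
# Conceptual model of:
#   {...(over this | sort this | sorted:=collect(this)),
#    ...(over this | sum:=sum(this))}
# Each lateral expression yields one record; spreading merges their fields.

def sorted_record(value):
    # Models: over this | sort this | sorted:=collect(this)
    return {"sorted": sorted(value)}


def sum_record(value):
    # Models: over this | sum:=sum(this)
    return {"sum": sum(value)}


def combined(value):
    # Spreading both records into one literal, like {**a, **b} in Python.
    return {**sorted_record(value), **sum_record(value)}


for v in [[3, 2, 1], [4, 1, 7], [1, 2, 3]]:
    print(combined(v))
```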
Because Zed expressions evaluate to a single result, if multiple values remain at the conclusion of the lateral pipeline, they are automatically wrapped in an array, e.g.,
echo '{x:1} {x:[2]} {x:[3,4]}' |
zq -z 'yield {s:(over x | yield this+1)}' -
produces
{s:2}
{s:3}
{s:[4,5]}
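The wrapping rule can be expressed as a small post-processing step on the pipeline's results. In this hypothetical Python model, one remaining value is returned as-is and several are wrapped in a list. The section does not describe what happens when no values remain, so returning None for that case is purely an assumption of the sketch.

```python
# Conceptual model of the wrapping rule for: yield {s:(over x | yield this+1)}
# traverse and lateral_plus_one are hypothetical names.

def traverse(x):
    # "over x" on a non-array yields the value itself (per the {x:1} example);
    # on an array it yields each element.
    return x if isinstance(x, list) else [x]


def lateral_plus_one(x):
    results = [v + 1 for v in traverse(x)]
    if len(results) == 1:
        # A single remaining value is returned unwrapped.
        return results[0]
    if not results:
        return None  # assumption: empty-result behavior is not specified here
    # Multiple remaining values are wrapped in an array.
    return results


for rec in [{"x": 1}, {"x": [2]}, {"x": [3, 4]}]:
    print({"s": lateral_plus_one(rec["x"])})
```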
To handle such dynamic input data, you can ensure your downstream pipeline always receives consistently packaged values by explicitly wrapping the result of the lateral scope, e.g.,
echo '{x:1} {x:[2]} {x:[3,4]}' |
zq -z 'yield {s:(over x | yield this+1 | collect(this))}' -
produces
{s:[2]}
{s:[3]}
{s:[4,5]}
Similarly, a primitive value may be consistently produced by concluding the lateral scope with an operator such as head or tail, or by applying certain aggregate functions, as done with sum above.
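The strategies for getting a predictable result shape can be compared side by side. This Python sketch is a conceptual model under the assumption that head with no argument keeps only the first value; the function names with_collect, with_head, and with_sum are hypothetical.

```python
# Conceptual comparison of result-shaping strategies for inputs
# {x:1} {x:[2]} {x:[3,4]}. Function names are hypothetical.

def traverse(x):
    # "over x": a non-array yields itself, an array yields its elements.
    return x if isinstance(x, list) else [x]


def with_collect(x):
    # Models: over x | yield this+1 | collect(this)  -> always one array.
    return [v + 1 for v in traverse(x)]


def with_head(x):
    # Models: over x | yield this+1 | head  -> only the first value survives.
    return [v + 1 for v in traverse(x)][0]


def with_sum(x):
    # Models: over x | sum(this+1)  -> an aggregate reduces to one value.
    return sum(v + 1 for v in traverse(x))


for x in [1, [2], [3, 4]]:
    print(with_collect(x), with_head(x), with_sum(x))
```

The collect form always yields an array, while head and sum always yield a single primitive, so downstream operators see one consistent shape either way.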