Skip to main content
Version: v1.8.0

Zed and Schools Data

This document provides a beginner's overview of the Zed language using the zq command and real-world data relating to California schools and test scores.

1. Getting Started

If you want to follow along by running the examples, simply install zq and copy the data files used here into your working directory:

curl https://raw.githubusercontent.com/brimdata/zed/main/testdata/edu/schools.zson > schools.zson
curl https://raw.githubusercontent.com/brimdata/zed/main/testdata/edu/testscores.zson > testscores.zson
curl https://raw.githubusercontent.com/brimdata/zed/main/testdata/edu/webaddrs.zson > webaddrs.zson

These files are all encoded in the human-readable ZSON format so you can easily have a look at them. ZSON is not optimized for speed but these files are small enough that the example queries here will all run fast enough.

2. Exploring the Data

It's always a good idea to get a feel for any new data, which is easy to do with Zed. Zed's sample operator is just the ticket --- sample will select one representative value from each "shape" of data present in the input, e.g.,

zq -Z 'sample | sort this' schools.zson testscores.zson webaddrs.zson

displays

{
AvgScrMath: null (uint16),
AvgScrRead: null (uint16),
AvgScrWrite: null (uint16),
cname: "Riverside",
dname: "Beaumont Unified",
sname: "21st Century Learning Institute"
}
{
School: "'3R' Middle",
District: "Nevada County Office of Education",
City: "Nevada City",
County: "Nevada",
Zip: "95959",
Latitude: null (float64),
Longitude: null (float64),
Magnet: null (bool),
OpenDate: 1995-10-30T00:00:00Z,
ClosedDate: 1996-06-28T00:00:00Z,
Phone: null (string),
StatusType: "Merged",
Website: null (string)
}
{
Website: "abbott.lynwood.edlioschool.com",
addr: 151.101.0.80
}

Note that the -Z option tells zq to "pretty print" the output in the ZSON format. Furthermore, you will notice these examples often include a -z to indicate line-oriented ZSON, which is the default when zq is writing to standard output. You can omit -z when running these commands on the terminal but we include them here for clarity and because all of the examples are tied to automated testing, which does not utilize a terminal for standard output.

You can also quickly see a list of the leaf-value data types with this query:

zq -Z "sample | over this | by typeof(value) | yield typeof | sort" schools.zson testscores.zson webaddrs.zson

which emits

<uint16>
<time>
<float64>
<bool>
<string>
<ip>

Nothing too tricky here. After a quick review of the shapes and types, you will notice they are just three relatively simple tables, which is no surprise since we obtained the original data from SQLite database files.

3. Searching

Searching with Zed is easy but powerful because it blends together the keyword search patterns of Web or email search with the more precise predicate matching patterns of query languages like SQL.

With this in mind, you can simply start typing keyword search phrases in Zed and they will usually do the right thing.

With keyword search, you can just type a keyword that you want to look for, e.g.,

zq -z Ygnacio schools.zson

which gives the one matching record:

{School:"Valencia (Ygnacio) High (Alternative)",District:"Delano Joint Union High",City:"Delano",County:"Kern",Zip:"93215-1526",Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:2009-08-01T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}
{School:"Ygnacio Valley Elementary",District:"Mt. Diablo Unified",City:"Concord",County:"Contra Costa",Zip:"94518-2595",Latitude:37.950182,Longitude:-122.0283,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(925) 682-9336",StatusType:"Active",Website:"www.mdusd.org"}
{School:"Ygnacio Valley High",District:"Mt. Diablo Unified",City:"Concord",County:"Contra Costa",Zip:"94518-2899",Latitude:37.936674,Longitude:-122.02325,Magnet:true,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(925) 685-8414",StatusType:"Active",Website:"www.mdusd.org"}

As with keyword search, you can simply concantenate keywords to require both of them to match (i.e., a "logical AND" of the two search predicates), e.g. we can whittle down the two records above by adding the keyword Delano

zq -z 'Ygnacio Delano' schools.zson

and we get just the one record that matches:

{School:"Valencia (Ygnacio) High (Alternative)",District:"Delano Joint Union High",City:"Delano",County:"Kern",Zip:"93215-1526",Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:2009-08-01T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}

Under the covers, a keyword search translates to Zed's grep function, which lets you search specific fields instead of the entire input value, e.g., we can search for the string "bar" in the City field and list all the unique cities that match with a group-by:

zq -f text 'grep("bar", City) | by City | yield City | sort' schools.zson

produces

Barstow
Big Bar
Diamond Bar
Long Barn
Santa Barbara
Sawyers Bar
Somes Bar

In this example, we use the yield operator here to pull the City field out of the record result and we used -f text to output the results in "text" format instead of ZSON so the strings are printed without quotes. The text format is often useful for piping the output to other Unix tools that might not expect quotes.

When the keyword you want to search for doesn't fit into the keyword syntax, i.e., it has spaces or special characters, you should use a literal string search.

3.2 Globs

To find values that may contain arbitrary substrings between or alongside the desired word(s), one or more glob-style wildcards can be used.

For example, the following search finds records that contain school names that have some additional text between ACE and Academy:

zq -z 'ACE*Academy' schools.zson

produces

{School:"ACE Empower Academy",District:"Santa Clara County Office of Education",City:"San Jose",County:"Santa Clara",Zip:"95116-3423",Latitude:37.348601,Longitude:-121.8446,Magnet:false,OpenDate:2008-08-26T00:00:00Z,ClosedDate:null(time),Phone:"(408) 729-3920",StatusType:"Active",Website:"www.acecharter.org"}
{School:"ACE Inspire Academy",District:"San Jose Unified",City:"San Jose",County:"Santa Clara",Zip:"95112-6334",Latitude:37.350981,Longitude:-121.87205,Magnet:false,OpenDate:2015-08-03T00:00:00Z,ClosedDate:null(time),Phone:"(408) 295-6008",StatusType:"Active",Website:"www.acecharter.org"}

Glob wildcards only have effect when used within keywords searches. An asterisk in a string literal search will match using the literal asterisk character embedded in the string.

3.3 Regular Expressions

For pattern matching beyond glob wildcards, regular expressions (regexps) are also available. To use them, simply place a / character before and after the regexp.

For example, since there are many high schools in our sample data, to find only records containing strings that begin with the word High:

zq -z '/^High /' schools.zson

produces

{School:"High Desert",District:"Soledad-Agua Dulce Union Eleme",City:"Acton",County:"Los Angeles",Zip:"93510",Latitude:34.490977,Longitude:-118.19646,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:1993-06-30T00:00:00Z,Phone:null(string),StatusType:"Merged",Website:null(string)}
{School:"High Desert",District:"Acton-Agua Dulce Unified",City:"Acton",County:"Los Angeles",Zip:"93510-1757",Latitude:34.492578,Longitude:-118.19039,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(661) 269-0310",StatusType:"Active",Website:null(string)}
{School:"High Desert Academy",District:"Eastern Sierra Unified",City:"Benton",County:"Mono",Zip:"93512-0956",Latitude:37.818597,Longitude:-118.47712,Magnet:null(bool),OpenDate:1996-09-03T00:00:00Z,ClosedDate:2012-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:"www.esusd.org"}
{School:"High Desert Academy of Applied Arts and Sciences",District:"Victor Valley Union High",City:"Victorville",County:"San Bernardino",Zip:"92394",Latitude:34.531144,Longitude:-117.31697,Magnet:null(bool),OpenDate:2004-09-07T00:00:00Z,ClosedDate:2011-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:"www.hdaaas.org"}
...

Further details for regular expressions are available in the Zed language documention.

Sometimes you want to search for values that aren't strings, e.g., numbers or IP addresses. Zed can search for any primitive-type value just typing that value like a keyword. In this case, the search looks for both fields of the value's type for an exact match as well as a substring match for the value as typed in any strings encountered.

For example, searching across both our school and test score data sources for the number 596 matches records that contain numeric fields of this precise value (such as from the test scores) and also records that contain string fields (such as the ZIP code and phone number fields in the school data), e.g.,

zq -z '596' testscores.zson schools.zson

finds these records

{AvgScrMath:591(uint16),AvgScrRead:610(uint16),AvgScrWrite:596(uint16),cname:"Los Angeles",dname:"William S. Hart Union High",sname:"Academy of the Canyons"}
{AvgScrMath:614(uint16),AvgScrRead:596(uint16),AvgScrWrite:592(uint16),cname:"Alameda",dname:"Pleasanton Unified",sname:"Amador Valley High"}
{AvgScrMath:620(uint16),AvgScrRead:596(uint16),AvgScrWrite:590(uint16),cname:"Yolo",dname:"Davis Joint Unified",sname:"Davis Senior High"}
{School:"Achieve Charter School of Paradise Inc.",District:"Paradise Unified",City:"Paradise",County:"Butte",Zip:"95969-3913",Latitude:39.760323,Longitude:-121.62078,Magnet:false,OpenDate:2005-09-12T00:00:00Z,ClosedDate:null(time),Phone:"(530) 872-4100",StatusType:"Active",Website:"www.achievecharter.org"}
{School:"Alliance Ouchi-O'Donovan 6-12 Complex",District:"Los Angeles Unified",City:"Los Angeles",County:"Los Angeles",Zip:"90043-2622",Latitude:33.993484,Longitude:-118.32246,Magnet:false,OpenDate:2006-09-05T00:00:00Z,ClosedDate:null(time),Phone:"(323) 596-2290",StatusType:"Active",Website:"http://ouchihs.org"}
...

Literal search also works for string values. This is useful when the string value to search cannot be represented as a keyword due to embedded spaces or special characters.

Let's say we've noticed that a couple of the school names in our sample data include the string Defunct=. An attempt to enter this as a keyword search causes a parse error, e.g.,

zq -z 'Defunct=' *.zson

produces

zq: error parsing Zed at column 8:
Defunct=
=== ^ ===

However, wrapping in quotes to performa a string-literal search gives the desired result:

zq -z '"Defunct="' schools.zson

produces

{School:"Lincoln Elem 'Defunct=",District:"Modesto City Elementary",City:null(string),County:"Stanislaus",Zip:null(string),Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:1989-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}
{School:"Lovell Elem 'Defunct=",District:"Cutler-Orosi Joint Unified",City:null(string),County:"Tulare",Zip:null(string),Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:1989-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}

Quoted strings are particularly handy when you're looking for long, specific strings that may have several special characters in them. For example, let's say we're looking for information on the Union Hill Elementary district. Entered without quotes, we end up matching far more records than we intended since each space character between words is treated as a Boolean and, e.g.,

zq -z 'Union Hill Elementary' schools.zson

produces

{School:"A. M. Thomas Middle",District:"Lost Hills Union Elementary",City:"Lost Hills",County:"Kern",Zip:"93249-0158",Latitude:35.615269,Longitude:-119.69955,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(661) 797-2626",StatusType:"Active",Website:null(string)}
{School:"Alview Elementary",District:"Alview-Dairyland Union Elementary",City:"Chowchilla",County:"Madera",Zip:"93610-9225",Latitude:37.050632,Longitude:-120.4734,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(559) 665-2275",StatusType:"Active",Website:null(string)}
{School:"Anaverde Hills",District:"Westside Union Elementary",City:"Palmdale",County:"Los Angeles",Zip:"93551-5518",Latitude:34.564651,Longitude:-118.18012,Magnet:false,OpenDate:2005-08-15T00:00:00Z,ClosedDate:null(time),Phone:"(661) 575-9923",StatusType:"Active",Website:null(string)}
{School:"Apple Blossom",District:"Twin Hills Union Elementary",City:"Sebastopol",County:"Sonoma",Zip:"95472-3917",Latitude:38.387396,Longitude:-122.84954,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(707) 823-1041",StatusType:"Active",Website:null(string)}
...

However, wrapping the entire search term in quotes allows us to search for the complete string, including the spaces, e.g.,

zq -z '"Union Hill Elementary"' schools.zson

produces

{School:"Highland Oaks Elementary",District:"Union Hill Elementary",City:"Grass Valley",County:"Nevada",Zip:"95945",Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1997-09-02T00:00:00Z,ClosedDate:2003-07-02T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}
{School:"Union Hill 3R Community Day",District:"Union Hill Elementary",City:"Grass Valley",County:"Nevada",Zip:"95945",Latitude:39.229055,Longitude:-121.07127,Magnet:null(bool),OpenDate:2003-08-20T00:00:00Z,ClosedDate:2011-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:"www.uhsd.k12.ca.us"}
{School:"Union Hill Charter Home",District:"Union Hill Elementary",City:"Grass Valley",County:"Nevada",Zip:"95945-8805",Latitude:39.204457,Longitude:-121.03829,Magnet:false,OpenDate:1995-07-14T00:00:00Z,ClosedDate:2015-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:"www.uhsd.k12.ca.us"}
{School:"Union Hill Elementary",District:"Union Hill Elementary",City:"Grass Valley",County:"Nevada",Zip:"95945-8805",Latitude:39.204457,Longitude:-121.03829,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(530) 273-8456",StatusType:"Active",Website:"www.uhsd.k12.ca.us"}
{School:"Union Hill Middle",District:"Union Hill Elementary",City:"Grass Valley",County:"Nevada",Zip:"94945-8805",Latitude:39.205006,Longitude:-121.03778,Magnet:false,OpenDate:2013-08-14T00:00:00Z,ClosedDate:null(time),Phone:"(530) 273-8456",StatusType:"Active",Website:"www.uhsd.k12.ca.us"}
{School:null(string),District:"Union Hill Elementary",City:"Grass Valley",County:"Nevada",Zip:"95945-8730",Latitude:39.208869,Longitude:-121.03551,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(530) 273-0647",StatusType:"Active",Website:"www.uhsd.k12.ca.us"}

Search terms can also be include Boolean predicates adhering to Zed's expression syntax.

In particular, a search result can be narrowed down to include only records that contain a certain value in a particular named field. For example, the following search will only match records containing the field called District where it is set to the precise string value Winton:

zq -z 'District=="Winton"' schools.zson

produces

{School:"Frank Sparkes Elementary",District:"Winton",City:"Winton",County:"Merced",Zip:"95388-0008",Latitude:37.382084,Longitude:-120.61847,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(209) 357-6180",StatusType:"Active",Website:null(string)}
{School:"Sybil N. Crookham Elementary",District:"Winton",City:"Winton",County:"Merced",Zip:"95388-0130",Latitude:37.389501,Longitude:-120.61636,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(209) 357-6182",StatusType:"Active",Website:null(string)}
{School:"Winfield Elementary",District:"Winton",City:"Winton",County:"Merced",Zip:"95388",Latitude:37.389121,Longitude:-120.60442,Magnet:false,OpenDate:2007-08-13T00:00:00Z,ClosedDate:null(time),Phone:"(209) 357-6891",StatusType:"Active",Website:null(string)}
{School:"Winton Middle",District:"Winton",City:"Winton",County:"Merced",Zip:"95388-1477",Latitude:37.379938,Longitude:-120.62263,Magnet:false,OpenDate:1990-07-20T00:00:00Z,ClosedDate:null(time),Phone:"(209) 357-6189",StatusType:"Active",Website:null(string)}
{School:null(string),District:"Winton",City:"Winton",County:"Merced",Zip:"95388-0008",Latitude:37.389467,Longitude:-120.6147,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(209) 357-6175",StatusType:"Active",Website:"www.winton.k12.ca.us"}

Because the right-hand-side value to which we were comparing was a string, it was necessary to wrap it in quotes. If this string were written as a keyword, it would have been interpreted as a field name as Zed field references look like keywords in the context of an expression.

For example, to see the records in which the school and district name are the same:

zq -z 'District==School' schools.zson

produces

{School:"Adelanto Elementary",District:"Adelanto Elementary",City:"Adelanto",County:"San Bernardino",Zip:"92301-1734",Latitude:34.576166,Longitude:-117.40944,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(760) 246-5892",StatusType:"Active",Website:null(string)}
{School:"Allensworth Elementary",District:"Allensworth Elementary",City:"Allensworth",County:"Tulare",Zip:"93219-9709",Latitude:35.864487,Longitude:-119.39068,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(661) 849-2401",StatusType:"Active",Website:null(string)}
{School:"Alta Loma Elementary",District:"Alta Loma Elementary",City:"Alta Loma",County:"San Bernardino",Zip:"91701-5007",Latitude:34.12597,Longitude:-117.59744,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(909) 484-5000",StatusType:"Active",Website:null(string)}
...

3.5.1 Type Dependence

When comparing values to the named fields, the value must be comparable to the data type of the field.

For instance, the "Zip" field in the schools data is a string rather than a number because of the extended ZIP+4 format that includes a hyphen and four additional digits and hence could not be represented in a numeric type, e.g.,

zq -z 'cut Zip' schools.zson

produces

{Zip:"95959"}
{Zip:"94607-1404"}
{Zip:"92395-3360"}
...

Because Zed does not coerce strings to numbers in expressions, the predicate Zip==95959 would not match the top record shown, since Zed recognizes the bare value 95959 as a number before comparing it to all the fields named Zip. However, Zip=="95959" would match, since the quotes cause Zed to treat the value as a string.

When confronted with messy data like this, you can usually cleaned it up to achieve the intent of your searches. For example, the dash suffix of the ZIP codes could be dropped, the string converted to an integer, then integer comparisons performed, i.e.,

zq -z 'cut Zip | int64(Zip[0:5])==94607' schools.zson

produces

{Zip:"94607-1404"}
...

3.5.2 Grep Predicates

When comparing a named field to a literal string, the quoted value is treated as an exact match.

For example, let's say we know there are several school names that start with Luther but only a couple district names that do. Because Luther only appears as a substring of the district names in our sample data, the following example produces no output, e.g.,

zq -z 'District=="Luther"' schools.zson

produces an empty output


To perform string searches inside of nested values, we can utilize the grep function with a glob, e.g.,

zq -z 'grep(Luther*, District)' schools.zson

produces

{School:"Luther Burbank Elementary",District:"Luther Burbank",City:"San Jose",County:"Santa Clara",Zip:"95128-1931",Latitude:37.323556,Longitude:-121.9267,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(408) 295-1814",StatusType:"Active",Website:null(string)}
{School:null(string),District:"Luther Burbank",City:"San Jose",County:"Santa Clara",Zip:"95128-1931",Latitude:37.323556,Longitude:-121.9267,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(408) 295-2450",StatusType:"Active",Website:"www.lbsd.k12.ca.us"}

Regular expressions can also be used with grep, e.g.,

zq -z 'grep(/^Sunset (Ranch|Ridge) Elementary/, School)' schools.zson

produces

{School:"Sunset Ranch Elementary",District:"Rocklin Unified",City:"Rocklin",County:"Placer",Zip:"95765-5441",Latitude:38.826425,Longitude:-121.2864,Magnet:false,OpenDate:2010-08-17T00:00:00Z,ClosedDate:null(time),Phone:"(916) 624-2048",StatusType:"Active",Website:"www.rocklin.k12.ca.us"}
{School:"Sunset Ridge Elementary",District:"Pacifica",City:"Pacifica",County:"San Mateo",Zip:"94044-2029",Latitude:37.653836,Longitude:-122.47919,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(650) 738-6687",StatusType:"Active",Website:null(string)}

3.5.3 Containment

Rather than testing for strict equality or pattern matches, you may want to determine if a value is among the many possible elements of a complex field. This is performed with in.

Since our sample data doesn't contain complex fields, we'll make one by using the union aggregate function to create a set-typed field called Schools that contains all unique school names per district. From these we'll find each set that contains a school named Lincoln Elementary, e.g.,

zq -Z 'Schools:=union(School) by District | "Lincoln Elementary" in Schools | sort this' schools.zson

produces

{
District: "Tulare City",
Schools: |[
"Alpine Vista",
"Mulcahy Middle",
"Tulare Support",
"Live Oak Middle",
"Los Tules Middle",
"Maple Elementary",
"Garden Elementary",
"Wilson Elementary",
"Cypress Elementary",
"Lincoln Elementary",
"Heritage Elementary",
"Pleasant Elementary",
"Cherry Avenue Middle",
"Roosevelt Elementary",
"Frank Kohn Elementary",
"Mission Valley Elementary",
"Tulare City Community Day"
]|
}
...

3.5.4 Comparisons

In addition to testing for equality via == and testing containment via in, the other common methods of comparison !=, <, >, <=, and >= are also available.

For example, the following search finds the schools that reported the highest math test scores,

zq -z 'AvgScrMath > 690' testscores.zson

produces

{AvgScrMath:698(uint16),AvgScrRead:639(uint16),AvgScrWrite:664(uint16),cname:"Santa Clara",dname:"Fremont Union High",sname:"Lynbrook High"}
{AvgScrMath:699(uint16),AvgScrRead:653(uint16),AvgScrWrite:671(uint16),cname:"Alameda",dname:"Fremont Unified",sname:"Mission San Jose High"}
{AvgScrMath:691(uint16),AvgScrRead:638(uint16),AvgScrWrite:657(uint16),cname:"Santa Clara",dname:"Fremont Union High",sname:"Monta Vista High"}

The same approach can be used to compare characters in string-type values, such as this search that finds school names at the end of the alphabet, e.g.,

zq -z 'School > "Z"' schools.zson

produces

{School:"Zamora Elementary",District:"Woodland Joint Unified",City:"Woodland",County:"Yolo",Zip:"95695-5137",Latitude:38.658609,Longitude:-121.79355,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(530) 666-3641",StatusType:"Active",Website:null(string)}
{School:"Zamorano Elementary",District:"San Diego Unified",City:"San Diego",County:"San Diego",Zip:"92139-2989",Latitude:32.680338,Longitude:-117.03864,Magnet:true,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(619) 430-1400",StatusType:"Active",Website:"http://new.sandi.net/schools/zamorano"}
{School:"Zane (Catherine L.) Junior High",District:"Eureka City High",City:"Eureka",County:"Humboldt",Zip:"95501-3140",Latitude:40.788118,Longitude:-124.14903,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:1998-06-30T00:00:00Z,Phone:null(string),StatusType:"Merged",Website:null(string)}
...

3.6 Boolean Logic

Search terms can be combined with Boolean logic as detailed in the Zed language documentation.

In particular, search terms separated by blank space implies Boolean and between the concatenated terms.

Let's say we're earching for information about academies that are flagged as being in a Pending status. We can simply concatenate the predicate for "Pending" and the keyword search for academy, e.g.,

zq -z 'StatusType=="Pending" academy' schools.zson

produces

{School:"Equitas Academy 4",District:"Los Angeles Unified",City:"Los Angeles",County:"Los Angeles",Zip:"90015-2412",Latitude:34.044837,Longitude:-118.27844,Magnet:false,OpenDate:2017-09-01T00:00:00Z,ClosedDate:null(time),Phone:"(213) 201-0440",StatusType:"Pending",Website:"http://equitasacademy.org"}
{School:"Pinnacle Academy Charter - Independent Study",District:"South Monterey County Joint Union High",City:"King City",County:"Monterey",Zip:"93930-3311",Latitude:36.208934,Longitude:-121.13286,Magnet:false,OpenDate:2016-08-08T00:00:00Z,ClosedDate:null(time),Phone:"(831) 385-4661",StatusType:"Pending",Website:"www.smcjuhsd.org"}
{School:"Rocketship Futuro Academy",District:"SBE - Rocketship Futuro Academy",City:"Concord",County:"Contra Costa",Zip:"94521-1522",Latitude:37.965658,Longitude:-121.96106,Magnet:false,OpenDate:2016-08-15T00:00:00Z,ClosedDate:null(time),Phone:"(301) 789-5469",StatusType:"Pending",Website:"www.rsed.org"}
{School:"Sherman Thomas STEM Academy",District:"Madera Unified",City:"Madera",County:"Madera",Zip:"93638",Latitude:36.982843,Longitude:-120.06665,Magnet:false,OpenDate:2017-08-09T00:00:00Z,ClosedDate:null(time),Phone:"(559) 674-1192",StatusType:"Pending",Website:"www.stcs.k12.ca.us"}
{School:null(string),District:"SBE - Rocketship Futuro Academy",City:"Concord",County:"Contra Costa",Zip:"94521-1522",Latitude:37.965658,Longitude:-121.96106,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(301) 789-5469",StatusType:"Pending",Website:"www.rsed.org"}

Of course, the logical AND may also be explicit and the above query can be written explicitly as

StatusType=="Pending" and academy

You can also combine predicates in a logical OR. Let'a revisit two of our previous example searches that each only returned a couple records, searching now with or to see them all at once, e.g.,

zq -z '"Defunct=" or ACE*Academy' schools.zson

produces

{School:"ACE Empower Academy",District:"Santa Clara County Office of Education",City:"San Jose",County:"Santa Clara",Zip:"95116-3423",Latitude:37.348601,Longitude:-121.8446,Magnet:false,OpenDate:2008-08-26T00:00:00Z,ClosedDate:null(time),Phone:"(408) 729-3920",StatusType:"Active",Website:"www.acecharter.org"}
{School:"ACE Inspire Academy",District:"San Jose Unified",City:"San Jose",County:"Santa Clara",Zip:"95112-6334",Latitude:37.350981,Longitude:-121.87205,Magnet:false,OpenDate:2015-08-03T00:00:00Z,ClosedDate:null(time),Phone:"(408) 295-6008",StatusType:"Active",Website:"www.acecharter.org"}
{School:"Lincoln Elem 'Defunct=",District:"Modesto City Elementary",City:null(string),County:"Stanislaus",Zip:null(string),Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:1989-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}
{School:"Lovell Elem 'Defunct=",District:"Cutler-Orosi Joint Unified",City:null(string),County:"Tulare",Zip:null(string),Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:1989-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}

Use not to invert the matching logic in the term that comes to the right of it in your search.

For example, to find schools in the Dixon Unified district other than elementary schools, we invert the logic of a search term:

zq -z 'not elementary District=="Dixon Unified"' schools.zson

produces

{School:"C. A. Jacobs Intermediate",District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620-3209",Latitude:38.446472,Longitude:-121.83631,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(707) 693-6350",StatusType:"Active",Website:"www.dixonusd.org"}
{School:"Dixon Adult",District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620",Latitude:38.444818,Longitude:-121.82287,Magnet:null(bool),OpenDate:1996-09-09T00:00:00Z,ClosedDate:2016-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}
{School:"Dixon Community Day",District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620",Latitude:38.44755,Longitude:-121.82001,Magnet:false,OpenDate:2003-08-23T00:00:00Z,ClosedDate:null(time),Phone:"(707) 693-6340",StatusType:"Active",Website:"www.dixonusd.org"}
{School:"Dixon High",District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620-9301",Latitude:38.436088,Longitude:-121.81672,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(707) 693-6330",StatusType:"Active",Website:null(string)}
{School:"Dixon Montessori Charter",District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620-2702",Latitude:38.447984,Longitude:-121.83186,Magnet:false,OpenDate:2010-08-11T00:00:00Z,ClosedDate:null(time),Phone:"(707) 678-8953",StatusType:"Active",Website:"www.dixonmontessori.org"}
{School:"Dixon Unified Alter. Educ.",District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620",Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1993-08-26T00:00:00Z,ClosedDate:1994-06-30T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}
{School:"Maine Prairie High (Continuation)",District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620-3019",Latitude:38.447549,Longitude:-121.81986,Magnet:false,OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(707) 693-6340",StatusType:"Active",Website:null(string)}
{School:null(string),District:"Dixon Unified",City:"Dixon",County:"Solano",Zip:"95620-3447",Latitude:38.44468,Longitude:-121.82249,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(707) 693-6300",StatusType:"Active",Website:"www.dixonusd.org"}

Note that ! can also be used as alternative shorthand for not, e.g.,

! elementary District=="Dixon Unified"

3.6.1 Logical Grouping

Unless wrapped in parentheses, a search is evaluated in left-to-right order. Terms wrapped in parentheses will be evaluated first, overriding the default left-to-right evaluation.

For example, we've noticed there are some test score records that have null values for all three test scores:

zq -z 'AvgScrMath==null AvgScrRead==null AvgScrWrite==null' testscores.zson

produces

{AvgScrMath:null(uint16),AvgScrRead:null(uint16),AvgScrWrite:null(uint16),cname:"Riverside",dname:"Beaumont Unified",sname:"21st Century Learning Institute"}
{AvgScrMath:null(uint16),AvgScrRead:null(uint16),AvgScrWrite:null(uint16),cname:"Los Angeles",dname:"ABC Unified",sname:"ABC Secondary (Alternative)"}
...

We can easily filter these out by negating the search for these records, e.g.,

zq -z 'not (AvgScrMath==null AvgScrRead==null AvgScrWrite==null)' testscores.zson

produces

{AvgScrMath:371(uint16),AvgScrRead:376(uint16),AvgScrWrite:368(uint16),cname:"Los Angeles",dname:"Los Angeles Unified",sname:"APEX Academy"}
{AvgScrMath:367(uint16),AvgScrRead:359(uint16),AvgScrWrite:369(uint16),cname:"Alameda",dname:"Oakland Unified",sname:"ARISE High"}
{AvgScrMath:491(uint16),AvgScrRead:489(uint16),AvgScrWrite:484(uint16),cname:"Santa Clara",dname:"San Jose Unified",sname:"Abraham Lincoln High"}
...

Parentheses can also be nested, e.g.,

zq -z 'grep(*High*, sname) and (not (AvgScrMath==null AvgScrRead==null AvgScrWrite==null) and dname=="San Francisco Unified")' testscores.zson

produces

{AvgScrMath:504(uint16),AvgScrRead:467(uint16),AvgScrWrite:467(uint16),cname:"San Francisco",dname:"San Francisco Unified",sname:"Balboa High"}
{AvgScrMath:480(uint16),AvgScrRead:443(uint16),AvgScrWrite:431(uint16),cname:"San Francisco",dname:"San Francisco Unified",sname:"Burton (Phillip and Sala) Academic High"}
{AvgScrMath:413(uint16),AvgScrRead:410(uint16),AvgScrWrite:395(uint16),cname:"San Francisco",dname:"San Francisco Unified",sname:"City Arts and Tech High"}
...

Except when writing the most common searches that leverage only the implicit and, it's generally good practice to use parentheses even when not strictly necessary, just to make sure your queries clearly communicate their intended logic.

4. Record Operators

As with the data sets explored here, a very typical use case for Zed is to operate over structured logs or events that are all represented as Zed records. While Zed queries may operate over any sequence of values, the following operators are designed specifically to work on sequences of records:

  • cut - extract subsets of record fields into new records
  • drop - drop fields from record values
  • fuse - coerce all input values into a merged type
  • put - add or modify fields of records
  • rename - change the name of record fields

4.1 cut

cut produces output records from input records containing only the specified named fields.

This example returns only the name and opening date from our school records:

zq -Z 'cut School,OpenDate' schools.zson

produces

{
School: "'3R' Middle",
OpenDate: 1995-10-30T00:00:00Z
}
{
School: "100 Black Men of the Bay Area Community",
OpenDate: 2012-08-06T00:00:00Z
}
...

As long as some of the named fields are present, they are returned while absent fields are error("missing"). For instance, the following query is run against all three of our data sources and returns values from our school data that includes fields for both School and Website, values from our web address data that have the Website and addr fields, and the missing value from the test score data since it has none of these fields:

zq -z 'yosemiteuhsd | cut School,Website,addr' *.zson

produces

{School:null(string),Website:"www.yosemiteuhsd.com",addr:error("missing")}
{School:error("missing"),Website:"www.yosemiteuhsd.com",addr:104.253.209.210}

Here, we return only the sname and dname fields of the test scores while also renaming the fields:

zq -z 'cut School:=sname,District:=dname' testscores.zson

produces

{School:"21st Century Learning Institute",District:"Beaumont Unified"}
{School:"ABC Secondary (Alternative)",District:"ABC Unified"}
...

4.2 drop

drop produces output records from input records with the indicated fields dropped from the output.

This example return all the fields other than the score values in our test score data:

zq -z 'drop AvgScrMath,AvgScrRead,AvgScrWrite' testscores.zson

produces

{cname:"Riverside",dname:"Beaumont Unified",sname:"21st Century Learning Institute"}
{cname:"Los Angeles",dname:"ABC Unified",sname:"ABC Secondary (Alternative)"}
...

4.3 fuse

fuse produces output records from input records where the outputs all have a uniform type consisting of a fusion of the input types. Note that fuse operates in two passes: the first pass computes the output type and the second pass tranforms the records. Thus, all input must be read before any output is produced. If the input does not fit in memory, it is spilled to temporary storage.

Let's say you'd started with table-formatted output of all records in our data that reference the town of Geyserville, e.g.,

zq -f table 'Geyserville' *.zson

produces

School                            District            City        County Zip        Latitude  Longitude  Magnet OpenDate             ClosedDate           Phone          StatusType Website
Buena Vista High Geyserville Unified Geyserville Sonoma 95441-9670 38.722005 -122.89123 F 1980-07-01T00:00:00Z (707) 857-3592 Active -
Geyserville Community Day Geyserville Unified Geyserville Sonoma 95441 38.722005 -122.89123 - 2004-09-01T00:00:00Z 2010-06-30T00:00:00Z - Closed -
Geyserville Educational Park High Geyserville Unified Geyserville Sonoma 95441 38.722005 -122.89123 - 1980-07-01T00:00:00Z 2014-06-30T00:00:00Z - Closed -
Geyserville Elementary Geyserville Unified Geyserville Sonoma 95441-0108 38.705895 -122.90296 F 1980-07-01T00:00:00Z (707) 857-3410 Active www.gusd.com
Geyserville Middle Geyserville Unified Geyserville Sonoma 95441 38.722005 -122.89123 - 1980-07-01T00:00:00Z 2014-06-30T00:00:00Z - Closed -
Geyserville New Tech Academy Geyserville Unified Geyserville Sonoma 95441-9670 38.72015 -122.88534 F 2014-07-01T00:00:00Z (707) 857-3592 Active www.gusd.com
- Geyserville Unified Geyserville Sonoma 95441-9670 38.722005 -122.89123 - (707) 857-3592 Active www.gusd.com
AvgScrMath AvgScrRead AvgScrWrite cname dname sname
- - - Sonoma Geyserville Unified Geyserville New Tech Academy
- - - Sonoma Geyserville Unified -

School records were output first, so the preceding header row describes the names of those fields. Later on, two test score records were also output, so a header row describing its fields was also printed. This presentation accurately conveys the heterogeneous nature of the data, but changing schemas mid-stream is not allowed in formats such as CSV or other downstream tooling such as SQL. Indeed, zq halts its output in this case, e.g.,

zq -f csv 'Geyserville' *.zson

produces

School,District,City,County,Zip,Latitude,Longitude,Magnet,OpenDate,ClosedDate,Phone,StatusType,Website
Buena Vista High,Geyserville Unified,Geyserville,Sonoma,95441-9670,38.722005,-122.89123,false,1980-07-01T00:00:00Z,,(707) 857-3592,Active,
Geyserville Community Day,Geyserville Unified,Geyserville,Sonoma,95441,38.722005,-122.89123,,2004-09-01T00:00:00Z,2010-06-30T00:00:00Z,,Closed,
Geyserville Educational Park High,Geyserville Unified,Geyserville,Sonoma,95441,38.722005,-122.89123,,1980-07-01T00:00:00Z,2014-06-30T00:00:00Z,,Closed,
Geyserville Elementary,Geyserville Unified,Geyserville,Sonoma,95441-0108,38.705895,-122.90296,false,1980-07-01T00:00:00Z,,(707) 857-3410,Active,www.gusd.com
Geyserville Middle,Geyserville Unified,Geyserville,Sonoma,95441,38.722005,-122.89123,,1980-07-01T00:00:00Z,2014-06-30T00:00:00Z,,Closed,
Geyserville New Tech Academy,Geyserville Unified,Geyserville,Sonoma,95441-9670,38.72015,-122.88534,false,2014-07-01T00:00:00Z,,(707) 857-3592,Active,www.gusd.com
,Geyserville Unified,Geyserville,Sonoma,95441-9670,38.722005,-122.89123,,,,(707) 857-3592,Active,www.gusd.com
CSV output requires uniform records but multiple types encountered (consider 'fuse')

By using fuse, the unified schema of field names and types across all records is assembled in a first pass through the data stream, which enables the presentation of the results under a single, wider header row with no further interruptions between the subsequent data rows, e.g.,

zq -f csv 'Geyserville | fuse' *.zson

produces

School,District,City,County,Zip,Latitude,Longitude,Magnet,OpenDate,ClosedDate,Phone,StatusType,Website,AvgScrMath,AvgScrRead,AvgScrWrite,cname,dname,sname
Buena Vista High,Geyserville Unified,Geyserville,Sonoma,95441-9670,38.722005,-122.89123,false,1980-07-01T00:00:00Z,,(707) 857-3592,Active,,,,,,,
Geyserville Community Day,Geyserville Unified,Geyserville,Sonoma,95441,38.722005,-122.89123,,2004-09-01T00:00:00Z,2010-06-30T00:00:00Z,,Closed,,,,,,,
Geyserville Educational Park High,Geyserville Unified,Geyserville,Sonoma,95441,38.722005,-122.89123,,1980-07-01T00:00:00Z,2014-06-30T00:00:00Z,,Closed,,,,,,,
Geyserville Elementary,Geyserville Unified,Geyserville,Sonoma,95441-0108,38.705895,-122.90296,false,1980-07-01T00:00:00Z,,(707) 857-3410,Active,www.gusd.com,,,,,,
Geyserville Middle,Geyserville Unified,Geyserville,Sonoma,95441,38.722005,-122.89123,,1980-07-01T00:00:00Z,2014-06-30T00:00:00Z,,Closed,,,,,,,
Geyserville New Tech Academy,Geyserville Unified,Geyserville,Sonoma,95441-9670,38.72015,-122.88534,false,2014-07-01T00:00:00Z,,(707) 857-3592,Active,www.gusd.com,,,,,,
,Geyserville Unified,Geyserville,Sonoma,95441-9670,38.722005,-122.89123,,,,(707) 857-3592,Active,www.gusd.com,,,,,,
,,,,,,,,,,,,,,,,Sonoma,Geyserville Unified,Geyserville New Tech Academy
,,,,,,,,,,,,,,,,Sonoma,Geyserville Unified,

In addition to the csv format, the arrows, parquet, table, and zeek formats also benefit from fused records.

4.4 put

put produces output records from input records and either mutates or adds fields indicated by the expressions.

If multiple fields are written by put, the new field values are computed first and then they are all written simultaneously. As a result, a computed value cannot be referenced in another expression. If you need to re-use a computed result, this can be done by chaining multiple put operators. For example, this will not work

put N:=len(somelist), isbig:=N>10

but it could be written instead as

put N:=len(somelist) | put isbig:=N>10

For example, to add a field to our test score records representing the computed average of the math, reading, and writing scores for each school that reported them, we could say:

zq -Z 'AvgScrMath!=null | put AvgAll:=(AvgScrMath+AvgScrRead+AvgScrWrite)/3.0' testscores.zson

which produces

{
AvgScrMath: 371 (uint16),
AvgScrRead: 376 (uint16),
AvgScrWrite: 368 (uint16),
cname: "Los Angeles",
dname: "Los Angeles Unified",
sname: "APEX Academy",
AvgAll: 371.6666666666667
}
...

We can also use put to create derived tables and display them in tabular form using -f table, e.g.,

zq -f table 'AvgScrMath != null | put combined_scores:=AvgScrMath+AvgScrRead+AvgScrWrite | cut sname,combined_scores,AvgScrMath,AvgScrRead,AvgScrWrite | head 5' testscores.zson

produces

sname                       combined_scores AvgScrMath AvgScrRead AvgScrWrite
APEX Academy 1115 371 376 368
ARISE High 1095 367 359 369
Abraham Lincoln High 1464 491 489 484
Abraham Lincoln Senior High 1319 462 432 425
Academia Avance Charter 1148 386 380 382

As noted above the put keyword is entirely optional. Here we omit it and create a new field to hold the lowercase representation of the school District field:

zq -Z 'cut District | lower_district:=lower(District)' schools.zson

produces

{
District: "Nevada County Office of Education",
lower_district: "nevada county office of education"
}
...

4.5 rename

rename produces output records from input records where field mays are change. Note that a field's name can only be renamed as it exists inside of the record and cannot be moved between sub-records in a nested value.

The rename steps are applied left-to-right.

Here is a simple example that renames some fields in our test score data to match the field names from our school data:

zq -Z 'rename School:=sname,District:=dname,City:=cname' testscores.zson

produces

{
AvgScrMath: null (uint16),
AvgScrRead: null (uint16),
AvgScrWrite: null (uint16),
City: "Riverside",
District: "Beaumont Unified",
School: "21st Century Learning Institute"
}
...

As mentioned above, a field can only be renamed within its own record. In other words, a field cannot move between nested levels when being renamed.

For example, consider this sample input data nested.zson:

{
outer: {
inner: "MyValue"
}
}

The field inner can be renamed within that nested record, e.g.,

zq -Z 'rename outer.renamed:=outer.inner' nested.zson

produces

{
outer: {
renamed: "MyValue"
}
}

However, an attempt to rename it to a top-level field will fail, e.g.,

zq -Z 'rename toplevel:=outer.inner' nested.zson

produces this compile-time error message and the query is not run:

cannot rename outer.inner to toplevel

This goal could instead be achieved by combining put and drop, e.g.,

zq -Z 'put toplevel:=outer.inner | drop outer.inner' nested.zson

produces

{
toplevel: "MyValue"
}

5. Aggregates

The summarize operator performs zero or more aggregations with zero or more group-by expressions. Each aggregation is performed by an aggregate function that operates on batches of records to carry out a running computation over the values they contain. The summarize keyword is optional as the operato can be inferred from context.

As with SQL, multiple aggregate functions may be invoked at the same time. For example, to simultaneously calculate the minimum, maximum, and average of the math test scores:

zq -f table 'min(AvgScrMath),max(AvgScrMath),avg(AvgScrMath)' testscores.zson

produces

min max avg
289 699 484.99019042123484

5.1 Output Field Names

The output of an aggregation is a sequence of records that form a table, and the field names are specified in the assignments of the aggregate functions and the group-by assignemnts. When an expression is given without a field name, a name is derived from the expression. If a name cannot be derived, then a compile-time error is reported and the query does not run.

As just shown, by default the result returned is placed in a field with the same name as the aggregate function. You may instead use := to specify an explicit name for the generated field, e.g.,

zq -f table 'lowest:=min(AvgScrMath),highest:=max(AvgScrMath),typical:=avg(AvgScrMath)' testscores.zson

produces

lowest highest typical
289 699 484.99019042123484

5.2 Grouping

All aggregate functions may be invoked with one or more group-by expressions, which forms one or more group-by keys. Each unique group-by set defines input values upon which each aggregate function instance operates. If no group-by expression is provided, the aggregate function operates over all values in the input stream and a single record is the result.

5.3 Where Clause

A where clause may also be added to filter the values on which an aggregate function will operate. For example, this query calculates average math test scores for the cities of Los Angeles and San Francisco:

zq -Z 'LA_Math:=avg(AvgScrMath) where cname=="Los Angeles", SF_Math:=avg(AvgScrMath) where cname=="San Francisco"' testscores.zson

produces

{
LA_Math: 456.27341772151897,
SF_Math: 485.3636363636364
}

5.4 Aggregate Functions

This section depicts examples of various aggregate functions operating over thes "schools data set".

5.4.1 and

The and function accumulates a Boolean truth value based on the logical AND of all of its input.

Many of the school records in our sample data include websites, but many do not. The following query shows the cities in which all schools have a website. e.g.,

zq -Z 'all_schools_have_website:=and(Website!=null) by City | sort City' schools.zson

produces

{
City: "Acampo",
all_schools_have_website: false
}
{
City: "Acton",
all_schools_have_website: false
}
{
City: "Acton, CA",
all_schools_have_website: true
}
...

5.4.2 any

The any function produces one value from all of its input, chosen in an undefined manner.

This query gives the name of one of the schools in our sample data:

zq -z 'any(School)' schools.zson

For small inputs that fit in memory, this will typically be the first such field in the stream, but in general you should not rely upon this. In this case, the output is:

"'3R' Middle"

5.4.3 avg

The avg function computes an arithmetic mean over all of all of its input.

This query calculates the average of the math test scores:

zq -f table 'avg:=avg(AvgScrMath)' testscores.zson

and produces

avg
484.99019042123484

5.4.4 collect

The collect function accumulates all of its input into an array.

For schools in Fresno county that include websites, the following query constructs an ordered list per city of their websites along with a parallel list of which school each website represents:

zq -Z 'County=="Fresno" Website!=null | Websites:=collect(Website),Schools:=collect(School) by City | sort City' schools.zson

and produces

{
City: "Auberry",
Websites: [
"www.sierra.k12.ca.us",
"www.sierra.k12.ca.us",
"www.pineridge.k12.ca.us",
"www.pineridge.k12.ca.us"
],
Schools: [
"Auberry Elementary",
"Balch Camp Elementary",
"Pine Ridge Elementary"
]
}
{
City: "Big Creek",
Websites: [
"www.bigcreekschool.com",
"www.bigcreekschool.com"
],
Schools: [
"Big Creek Elementary"
]
}
...

5.4.5 count

The count function produces a count of all of its input values.

This query counts the number of records in each of our example data sources:

zq -z 'count()' schools.zson
zq -z 'count()' testscores.zson
zq -z 'count()' webaddrs.zson

and produces

17686(uint64)
2331(uint64)
2223(uint64)

The Website field is known to be in our school and website address data sources, but not in the test score data. To confirm this, we can count across all data sources and specify the named field, e.g.,

zq -z 'count(Website)' *.zson

produces

19909(uint64)

Since 17686 + 2223 = 19909, the count result is what we expected.

5.4.6 dcount

The dcount function produces a distinct count of all of its input values, i.e., the number of unique values in its input.

For large inputs, this value is an approximation of the actual value. The approcimation error is described in detail in the code and research linked from the HyperLogLog repository.

This query generates an approcimate count the number of unique school names in our sample data set:

zq -z 'dcount(School)' schools.zson

and produces

13804(uint64)

To see the precise value, which may take longer to execute, this query

zq -z 'count() by School | count()' schools.zson

produces

13876(uint64)

Here we saw the approximation was off by 0.3%.

5.4.7 max

The max function computes the maximum numeric value over all of its input.

To see the highest reported math test score, this query:

zq -f table 'max:=max(AvgScrMath)' testscores.zson

produces

max
699

5.4.8 min

The min function computes the minimum numeric value over all of its input.

To see the lowest reported math test score, this query

zq -f table 'min:=min(AvgScrMath)' testscores.zson

produces

min
289

5.4.9 or

The or function accumulates a Boolean truth value based on the logical OR of all of its input.

Many of the school records in our sample data include websites, but many do not. The following query shows the cities for which at least one school has a listed website:

zq -Z 'has_at_least_one_school_website:=or(Website!=null) by City | sort City' schools.zson

and produces

{
City: "Acampo",
has_at_least_one_school_website: true
}
{
City: "Acton",
has_at_least_one_school_website: true
}
{
City: "Acton, CA",
has_at_least_one_school_website: true
}
{
City: "Adelanto",
has_at_least_one_school_website: true
}
{
City: "Adin",
has_at_least_one_school_website: false
}
...

5.4.10 sum

The sum function computes the minimum numeric value over all of its input.

This query calculates the total of all the math, reading, and writing test scores across all schools:

zq -Z 'AllMath:=sum(AvgScrMath),AllRead:=sum(AvgScrRead),AllWrite:=sum(AvgScrWrite)' testscores.zson

and produces

{
AllMath: 840488 (uint64),
AllRead: 832260 (uint64),
AllWrite: 819632 (uint64)
}

5.4.11 union

The union function computes a set union over all of this input.

For schools in Fresno county that include websites, the following query constructs a set per city of all the unique websites for the schools in that city:

zq -Z 'County=="Fresno" Website!=null | Websites:=union(Website) by City | sort City' schools.zson

and produces

{
City: "Auberry",
Websites: |[
"www.sierra.k12.ca.us",
"www.pineridge.k12.ca.us"
]|
}
{
City: "Big Creek",
Websites: |[
"www.bigcreekschool.com"
]|
}
...

5.5 Group-by Examples

As mentioned above, the summarize operator may include group-by expressions that partitions the input sequence into groups that are processed independently from on another.

The output order of values from each grouped aggregation is undefined. To ensure a deterministic order, a sort operator may be used downstream of the aggregation.

In many of the examples, you will see a sort tacked onto the end of the computation. This ensures a deterministic order and reliable testing since all of these examples are subject to automated testing.

The simplest group-by example summarizes the unique values of the named field(s), which requires no aggregate function

For example, to see the different categories of status for the schools in our example data, this query:

zq -z 'by StatusType | sort' schools.zson

produces

{StatusType:"Active"}
{StatusType:"Closed"}
{StatusType:"Merged"}
{StatusType:"Pending"}

If you work a lot at the UNIX/Linux shell, you might have sought to accomplish the same via a familiar idiom: sort | uniq. This works in Zed, but the by shorthand is preferable, e.g.,

zq -z 'cut StatusType | sort | uniq' schools.zson

produces

{StatusType:"Active"}
{StatusType:"Closed"}
{StatusType:"Merged"}
{StatusType:"Pending"}

When specifying multiple comma-separated field names, a group is formed for each unique combination of values found in those fields. To see the average reading test scores and school count for each county/district pairing, this query:

zq -f table 'avg(AvgScrRead),count() by cname,dname | sort -r count' testscores.zson

produces

cname           dname                                              avg                count
Los Angeles Los Angeles Unified 416.83522727272725 202
San Diego San Diego Unified 472 44
Alameda Oakland Unified 414.95238095238096 27
San Francisco San Francisco Unified 454.36842105263156 26
...

Instead of a simple field name, any of the comma-separated group-by elements can be any Zed expression, which may appear in the form of a field assignment field:=expr

To see a count of how many school names of a particular character length appear in our example data, this query:

zq -f table 'count() by Name_Length:=len(School) | sort -r' schools.zson

produces

Name_Length count
89 2
85 2
84 2
83 1
...

The fields referenced in a by grouping may or may not be present, or may be inconsistently present, in the given input records, in which case, the group-by aggregation still proceeds but embeds any error conditions in the result, When a value is missing for a specified field, it will appear as error("missing").

For instance, if we'd made an typographical error in our prior example when attempting to reference the dname field, the misspelled field would appear as embedded missing errors, e.g.,

zq -Z 'avg(AvgScrRead),count() by cname,dnmae | sort -r count' testscores.zson

produces

{
cname: "Los Angeles",
dnmae: error("missing"),
avg: 450.83037974683543,
count: 469 (uint64)
}
{
cname: "San Diego",
dnmae: error("missing"),
avg: 496.74789915966386,
count: 168 (uint64)
}
...

6. Sorting

Zed provides a convenient way to sort data using the sort operator. All values in Zed have a well-defined sort order, even complex values and values of different data types, so you can easily sort heterogenous sequences of values.

This query sorts our test score records by average reading score:

zq -z 'sort AvgScrRead' testscores.zson

and produces

{AvgScrMath:352(uint16),AvgScrRead:308(uint16),AvgScrWrite:327(uint16),cname:"Alameda",dname:"Oakland Unified",sname:"Oakland International High"}
{AvgScrMath:289(uint16),AvgScrRead:314(uint16),AvgScrWrite:312(uint16),cname:"Contra Costa",dname:"West Contra Costa Unified",sname:"Gompers (Samuel) Continuation"}
{AvgScrMath:450(uint16),AvgScrRead:321(uint16),AvgScrWrite:318(uint16),cname:"San Francisco",dname:"San Francisco Unified",sname:"S.F. International High"}
{AvgScrMath:314(uint16),AvgScrRead:324(uint16),AvgScrWrite:321(uint16),cname:"Los Angeles",dname:"Norwalk-La Mirada Unified",sname:"El Camino High (Continuation)"}
{AvgScrMath:307(uint16),AvgScrRead:324(uint16),AvgScrWrite:328(uint16),cname:"Contra Costa",dname:"West Contra Costa Unified",sname:"North Campus Continuation"}
...

Now we'll sort the test score records first by average reading score and then by average math score. Note how this changed the order of the bottom two records in the result, e.g.,

zq -z 'sort AvgScrRead,AvgScrMath' testscores.zson

produces

{AvgScrMath:352(uint16),AvgScrRead:308(uint16),AvgScrWrite:327(uint16),cname:"Alameda",dname:"Oakland Unified",sname:"Oakland International High"}
{AvgScrMath:289(uint16),AvgScrRead:314(uint16),AvgScrWrite:312(uint16),cname:"Contra Costa",dname:"West Contra Costa Unified",sname:"Gompers (Samuel) Continuation"}
{AvgScrMath:450(uint16),AvgScrRead:321(uint16),AvgScrWrite:318(uint16),cname:"San Francisco",dname:"San Francisco Unified",sname:"S.F. International High"}
{AvgScrMath:307(uint16),AvgScrRead:324(uint16),AvgScrWrite:328(uint16),cname:"Contra Costa",dname:"West Contra Costa Unified",sname:"North Campus Continuation"}
{AvgScrMath:314(uint16),AvgScrRead:324(uint16),AvgScrWrite:321(uint16),cname:"Los Angeles",dname:"Norwalk-La Mirada Unified",sname:"El Camino High (Continuation)"}
...

Here we'll find the counties with the most schools by using the count() aggregate function and piping its output to a sort in reverse order. Note that even though we didn't list a field name as an explicit argument, the sort operator did what we wanted because it found a field of the uint64 data type, e.g.,

zq -z 'count() by County | sort -r' schools.zson

produces

{County:"Los Angeles",count:3636(uint64)}
{County:"San Diego",count:1139(uint64)}
{County:"Orange",count:886(uint64)}
...

Next we'll count the number of unique websites mentioned in our school records. Since we know some of the records don't include a website, we'll deliberately put the null values at the front of the list so we can see how many there are, e.g.,

zq -z 'count() by Website | sort -nulls first Website' schools.zson

produces

{Website:null(string),count:10722(uint64)}
{Website:"acornstooakscharter.org",count:1(uint64)}
{Website:"atlascharter.org",count:1(uint64)}
{Website:"bizweb.lightspeed.net/~leagles",count:1(uint64)}
...

7. Sequence Filters

Several Zed operators manipulate a sequence of values based on the order in which they appear in the input:

  • head - copy leading values of input sequence
  • tail - copy trailing values of input sequence
  • uniq - deduplicate adjacent values

7.1 head

The head operator takes an integer argument N and copies the first N values of its input to its output.

For example, this query selects the first school record:

zq -Z 'head' schools.zson

and produces

{
School: "'3R' Middle",
District: "Nevada County Office of Education",
City: "Nevada City",
County: "Nevada",
Zip: "95959",
Latitude: null (float64),
Longitude: null (float64),
Magnet: null (bool),
OpenDate: 1995-10-30T00:00:00Z,
ClosedDate: 1996-06-28T00:00:00Z,
Phone: null (string),
StatusType: "Merged",
Website: null (string)
}

To see the first five school records in Los Angeles county, this query

zq -z 'County=="Los Angeles" | head 5' schools.zson

produces

{School:"ABC Adult",District:"ABC Unified",City:"Cerritos",County:"Los Angeles",Zip:"90703-2801",Latitude:33.878924,Longitude:-118.07128,Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:null(time),Phone:"(562) 229-7960",StatusType:"Active",Website:"www.abcadultschool.com"}
{School:"ABC Charter Middle",District:"Los Angeles Unified",City:"Los Angeles",County:"Los Angeles",Zip:"90017",Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:2008-09-03T00:00:00Z,ClosedDate:2009-06-10T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:"www.abcsf.us"}
{School:"ABC Evening High School",District:"ABC Unified",City:"Cerritos",County:"Los Angeles",Zip:"90701",Latitude:null(float64),Longitude:null(float64),Magnet:null(bool),OpenDate:1980-07-01T00:00:00Z,ClosedDate:1994-11-23T00:00:00Z,Phone:null(string),StatusType:"Closed",Website:null(string)}
{School:"ABC Secondary (Alternative)",District:"ABC Unified",City:"Cerritos",County:"Los Angeles",Zip:"90703-2301",Latitude:33.881547,Longitude:-118.04635,Magnet:false,OpenDate:1991-09-05T00:00:00Z,ClosedDate:null(time),Phone:"(562) 229-7768",StatusType:"Active",Website:null(string)}
{School:"APEX Academy",District:"Los Angeles Unified",City:"Los Angeles",County:"Los Angeles",Zip:"90028-8526",Latitude:34.052234,Longitude:-118.24368,Magnet:false,OpenDate:2008-09-03T00:00:00Z,ClosedDate:null(time),Phone:"(323) 817-6550",StatusType:"Active",Website:null(string)}

7.2 tail

The tail operator takes an integer argument N and copies the last N values of its input to its output.

For example, this query selects the last school record:

zq -Z 'tail' schools.zson

and produces

{
School: null (string),
District: "Wheatland Union High",
City: "Wheatland",
County: "Yuba",
Zip: "95692-9798",
Latitude: 38.998968,
Longitude: -121.45497,
Magnet: null (bool),
OpenDate: null (time),
ClosedDate: null (time),
Phone: "(530) 633-3100",
StatusType: "Active",
Website: "www.wheatlandhigh.org"
}

To see the last five school records in Los Angeles county, this query

zq -z 'County=="Los Angeles" | tail 5' schools.zson

produces

{School:null(string),District:"Wiseburn Unified",City:"Hawthorne",County:"Los Angeles",Zip:"90250-6462",Latitude:33.920462,Longitude:-118.37839,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(310) 643-3025",StatusType:"Active",Website:"www.wiseburn.k12.ca.us"}
{School:null(string),District:"SBE - Anahuacalmecac International University Preparatory of North America",City:"Los Angeles",County:"Los Angeles",Zip:"90032-1942",Latitude:34.085085,Longitude:-118.18154,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(323) 352-3148",StatusType:"Active",Website:"www.dignidad.org"}
{School:null(string),District:"SBE - Academia Avance Charter",City:"Highland Park",County:"Los Angeles",Zip:"90042-4005",Latitude:34.107313,Longitude:-118.19811,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(323) 230-7270",StatusType:"Active",Website:"www.academiaavance.com"}
{School:null(string),District:"SBE - Prepa Tec Los Angeles High",City:"Huntington Park",County:"Los Angeles",Zip:"90255-4138",Latitude:33.983752,Longitude:-118.22344,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(323) 800-2741",StatusType:"Active",Website:"www.prepatechighschool.org"}
{School:null(string),District:"California Advancing Pathways for Students in Los Angeles County ROC/P",City:"Bellflower",County:"Los Angeles",Zip:"90706",Latitude:33.882509,Longitude:-118.13442,Magnet:null(bool),OpenDate:null(time),ClosedDate:null(time),Phone:"(562) 866-9011",StatusType:"Active",Website:"www.CalAPS.org"}

7.3 uniq

The uniq operator copies input values that are different from the previous input to the output.

Let's say you'd been looking at the contents of just the District and County fields in the order they appear in the school data, e.g.,

zq -z 'cut District,County' schools.zson

produces

{District:"Nevada County Office of Education",County:"Nevada"}
{District:"Oakland Unified",County:"Alameda"}
{District:"Victor Elementary",County:"San Bernardino"}
{District:"Novato Unified",County:"Marin"}
{District:"Beaumont Unified",County:"Riverside"}
{District:"Nevada County Office of Education",County:"Nevada"}
{District:"Nevada County Office of Education",County:"Nevada"}
{District:"San Bernardino City Unified",County:"San Bernardino"}
{District:"San Bernardino City Unified",County:"San Bernardino"}
{District:"Ojai Unified",County:"Ventura"}
...

To eliminate the adjacent lines that share the same field/value pairs, this query

zq -z 'cut District,County | uniq' schools.zson

produces

{District:"Nevada County Office of Education",County:"Nevada"}
{District:"Oakland Unified",County:"Alameda"}
{District:"Victor Elementary",County:"San Bernardino"}
{District:"Novato Unified",County:"Marin"}
{District:"Beaumont Unified",County:"Riverside"}
{District:"Nevada County Office of Education",County:"Nevada"}
{District:"San Bernardino City Unified",County:"San Bernardino"}
{District:"Ojai Unified",County:"Ventura"}
...

8. Value Construction

The yield operator creates one or more output values for each input value based on the one or more expressions provided as arguments to yield.

This example produce two simpler records for every school record listing the average math score with the school name and the county name:

zq -Z 'AvgScrMath!=null | yield {school:sname,avg:AvgScrMath}, {county:cname,zvg:AvgScrMath}' testscores.zson

which produces

{
school: "APEX Academy",
avg: 371 (uint16)
}
{
county: "Los Angeles",
zvg: 371 (uint16)
}
{
school: "ARISE High",
avg: 367 (uint16)
}
{
county: "Alameda",
zvg: 367 (uint16)
}
...

In earlier example, we used put to create a table using this query:

zq -f table 'AvgScrMath != null | put combined_scores:=AvgScrMath+AvgScrRead+AvgScrWrite | cut sname,combined_scores,AvgScrMath,AvgScrRead,AvgScrWrite | head 5' testscores.zson

produces

sname                       combined_scores AvgScrMath AvgScrRead AvgScrWrite
APEX Academy 1115 371 376 368
ARISE High 1095 367 359 369
Abraham Lincoln High 1464 491 489 484
Abraham Lincoln Senior High 1319 462 432 425
Academia Avance Charter 1148 386 380 382

The same result can be achieved by yielding a record literal, sometimes with a more intuitive structure, e.g.,

zq -f table 'AvgScrMath != null | yield  {sname,combined_scores:AvgScrMath+AvgScrRead+AvgScrWrite,AvgScrMath,AvgScrRead,AvgScrWrite} | head 5' testscores.zson

produces

sname                       combined_scores AvgScrMath AvgScrRead AvgScrWrite
APEX Academy 1115 371 376 368
ARISE High 1095 367 359 369
Abraham Lincoln High 1464 491 489 484
Abraham Lincoln Senior High 1319 462 432 425
Academia Avance Charter 1148 386 380 382