Zed Data Model
Zed data is defined as an ordered sequence of one or more typed data values. Each value's type is either a "primitive type", a "complex type", the "type type", a "named type", or the "null type".
1. Primitive Types
Primitive types include signed and unsigned integers, IEEE binary and decimal floating point, string, byte sequence, Boolean, IP address, IP network, null, and a first-class type type.
There are 30 types of primitive values with syntax defined as follows:
|unsigned 8-bit integer
|unsigned 16-bit integer
|unsigned 32-bit integer
|unsigned 64-bit integer
|unsigned 128-bit integer
|unsigned 256-bit integer
|signed 8-bit integer
|signed 16-bit integer
|signed 32-bit integer
|signed 64-bit integer
|signed 128-bit integer
|signed 256-bit integer
|signed 64-bit integer as nanoseconds
|signed 64-bit integer as nanoseconds from epoch
|the Boolean value
|a bounded sequence of 8-bit bytes
|a UTF-8 string
|an IPv4 or IPv6 address
|an IPv4 or IPv6 address and net mask
|a Zed type value
|the null type
The type type provides for first-class types and even though a type value can represent a complex type, the value itself is a singleton.
Two type values are equivalent if their underlying types are equal. Since every type in the Zed type system is uniquely defined, type values are equal if and only if their corresponding types are uniquely equal.
The null type is a primitive type representing only a
null value can have any type.
timevalues correspond to 64-bit epoch nanoseconds and thus not every valid RFC 3339 date and time string represents a valid Zed time. In addition, nanosecond epoch times overflow on April 11, 2262. For the world of 2262, a new epoch can be created well in advance and the old time epoch and new time epoch can live side by side with the old using a named type for the new epoch time referring to the old
time. An app that wants more than 64 bits of timestamp precision can always use a named type of a
bytestype and do its own conversions to and from the corresponding bytes values. A time with a local time zone can be represented as a Zed record of a time field and a zone field
2. Complex Types
Complex types are composed of primitive types and/or other complex types. The categories of complex types include:
- record - an ordered collection of zero or more named values called fields,
- array - an ordered sequence of zero or more values called elements,
- set - a set of zero or more unique values called elements,
- map - a collection of zero or more key/value pairs where the keys are of a uniform type called the key type and the values are of a uniform type called the value type,
- union - a type representing values whose type is any of a specified collection of two or more unique types,
- enum - a type representing a finite set of symbols typically representing categories, and
- error - any value wrapped as an "error".
The type system comprises a total order:
- The order of primitive types corresponds to the order in the table above.
- All primitive types are ordered before any complex types.
- The order of complex type categories corresponds to the order above.
- For complex types of the same category, the order is defined below.
A record comprises an ordered set of zero or more named values
called "fields". The field names must be unique in a given record
and the order of the fields is significant, e.g., type
is distinct from type
A field name is any UTF-8 string.
A field value is any Zed value.
In contrast to many schema-oriented data formats, Zed has no way to specify a field as "optional" since any field value can be a null value.
If an instance of a record value omits a value by dropping the field altogether rather than using a null, then that record value corresponds to a different record type that elides the field in question.
A record type is uniquely defined by its ordered list of field-type pairs.
The type order of two records is as follows:
- Record with fewer columns than other is ordered before the other.
- Records with the same number of columns are ordered as follows according to:
- the lexicographic order of the field names from left to right,
- or if all the field names are the same, the type order of the field types from left to right.
An array is an ordered sequence of zero or more Zed values called "elements" all conforming to the same Zed type.
An array value may be empty. An empty array may have element type
An array type is uniquely defined by its single element type.
The type order of two arrays is defined as the type order of the two array element types.
Note that mixed-type JSON arrays are representable as a Zed array with elements of type union.
A set is an unordered sequence of zero or more Zed values called "elements" all conforming to the same Zed type.
A set may be empty. An empty set may have element type
A set of mixed-type values is representable as a Zed set with elements of type union.
A set type is uniquely defined by its single element type.
The type order of two sets is defined as the type order of the two set element types.
A map represents a list of zero or more key-value pairs, where the keys have a common Zed type and the values have a common Zed type.
Each key across an instance of a map value must be a unique value.
A map value may be empty.
A map type is uniquely defined by its key type and value type.
The type order of two map types is as follows:
- the type order of their key types,
- or if they are the same, then the order of their value types.
A union represents a value that may be any one of a specific enumeration of two or more unique Zed types that comprise its "union type".
A union type is uniquely defined by an ordered set of unique types (which may be other union types) where the order corresponds to the Zed type system's total order.
Union values are tagged in that any instance of a union value explicitly conforms to exactly one of the union's types. The union tag is an integer indicating the position of its type in the union type's ordered list of types.
The type order of two union types is as follows:
- The union type with fewer types than other is ordered before the other.
- Two union types with the same number of types are ordered according to the type order of the constituent types in left to right order.
An enum represents a symbol from a finite set of one or more unique symbols referenced by name. An enum name may be any UTF-8 string.
An enum type is uniquely defined by its ordered set of unique symbols, where the order is significant, e.g., two enum types with the same set of symbols but in different order are distinct.
The type order of two enum types is as follows:
- The enum type with fewer symbols than other is ordered before the other.
- Two enum types with the same number of symbols are ordered according to the type order of the constituent types in left to right order.
An error represents any value designated as an error.
The type order of an error is the type order of the type of its contained value.
3. Named Type
A named type is a name for a specific Zed type. Any value can have a named type and the named type is a distinct type from the underlying type. A named type can refer to another named type.
The binding between a named type and its underlying type is local in scope and need not be unique across a sequence of values.
A type name may be any UTF-8 string exclusive of primitive type names.
For example, if "port" is a named type for
uint16, then two values of
type "port" have the same type but a value of type "port" and a value of type
do not have the same type.
The type order of a named type is the type order of its underlying type with two exceptions:
- A named type is ordered after its underlying type.
- Named types sharing an underlying type are ordered lexicographically by name.
While the Zed data model does not include explicit support for schema versioning, named types provide a flexible mechanism to implement versioning on top of the Zed serialization formats. For example, a Zed-based system could define a naming convention of the form
<type>is the type name of a record representing the schema and
<version>is a decimal string indicating the version of that schema. Since types need only be parsed once per stream in the Zed binary serialization formats, a Zed type implementation could efficiently support schema versioning using such a convention.
4. Null Values
All Zed types have a null representation. It is up to an implementation to decide how external data structures map into and out of values with nulls. Typically, a null value is either the zero value or, in the case of record fields, an optional field whose value is not present, though these semantics are not explicitly defined by the Zed data model.