Candid Specification
Version: 0.1.0
Date: May 4, 2020
Motivation
To document, discover, and interact with canisters (also known as services or actors) on the Internet Computer, we need an interface description language (IDL) for specifying the signature of a canister.
Goals
-
Language-independent description of canister interfaces and the data they exchange (names, parameter and result formats of service methods)
-
Simple and canonical constructs (C-like; algebraically: sums, products, exponentials)
-
Extensible, backwards-compatible
-
Well-formedness is checkable and guaranteed by the platform
-
Deterministic mapping to serialised representation
-
Human-readable and machine-readable
-
Declarative, usable as input to binding code generators
Non-Goals
-
Specification of semantic constraints beyond representation concerns (Rationale: (1) there is no natural boundary to what should be expressible, a scalable solution would quickly lead into the realm of program logics and/or dependent types; (2) cost and complexity for the platform, the hypervisor would have to check and guarantee these properties on every message send; (3) in general, interesting properties cannot be formulated or checked without contextual information such as a canister’s state.)
-
Prescribing the wire format used internally by the network to transport data (though it will make sense to use an extension of the serialisation format described, which is fairly generic)
Inspiration
-
Protocol buffers
-
Thrift
-
Fuchsia IDL
-
CORBA
-
JSON, YAML
-
Fisher, Mandelbaum, Walker, … : The next 700 data description languages
Why not Protocol Buffers or others?
Vanilla protocol buffers are not sufficient or well-suited for describing canisters on the Internet Computer:
-
They are primarily a data description language, not an IDL. There is syntax for defining ``services'', but it assumes RPCs not messaging and requires developing a plugin (replacing the gRPC a.k.a. Stubby one) to provide semantics.
-
They deserialise data into abstract protobuf objects, not actual language data structures. The ``message'' (a.k.a. objects/records/structs) format is designed to be represented as its own abstract in-memory type.
-
They are an inherently first-order data format, that is, they cannot describe functions/callbacks or object/actor parameters with methods.
-
They lack various data types that we want to be able to handle, such as proper arrays, big nums, variants, references.
-
They provide no place to express other information we might need to incorporate into IDL descriptions, such as ``read-onlyness'' or access control specifications.
-
They do not have a semantically sound and compositional foundation for defining safe upgradeability (for example, as a relation on interface types).
-
They do have a number of features we do not need, or may want to define differently.
Given all of the above, I expect there would be fairly little we would be able to reuse from existing protobuf bindings. At the same time, there would be no easy way to incorporate various extensions we require.
Type Structure
The purpose of an IDL is defining the signature, and thereby the type of an actor (service), that is, the set of messages and their parameter and result types. To that end, the grammar for Candid consists mostly of a type grammar.
Core Grammar
This is a summary of the grammar proposed:
<prog> ::= <def>;* <actor>;? <def> ::= type <id> = <datatype> | import <text> <actor> ::= service <id>? : (<actortype> | <id>) <actortype> ::= { <methtype>;* } <methtype> ::= <name> : (<functype> | <id>) <functype> ::= ( <argtype>,* ) -> ( <argtype>,* ) <funcann>* <funcann> ::= oneway | query <argtype> ::= <datatype> <fieldtype> ::= <nat> : <datatype> <datatype> ::= <id> | <primtype> | <constype> | <reftype> <primtype> ::= | nat | nat8 | nat16 | nat32 | nat64 | int | int8 | int16 | int32 | int64 | float32 | float64 | bool | text | null | reserved | empty <constype> ::= | opt <datatype> | vec <datatype> | record { <fieldtype>;* } | variant { <fieldtype>;* } <reftype> ::= | func <functype> | service <actortype> | principal <name> ::= <id> | <text> <id> ::= (A..Z|a..z|_)(A..Z|a..z|_|0..9)* <text> ::= "<char>*" <nat> ::= (0..9)(_? 0..9)* | 0x(0..9|a..f|A..F)(_? 0..9|a..f|A..F)*
A <char>
is a Unicode scalar value (i.e., a codepoint that is not
a surrogate part).
Syntactic Shorthands
In addition to this basic grammar, a few syntactic shorthands are supported that can be reduced to the basic forms:
<argtype> ::= ... | <name> : <datatype> := <datatype> <constype> ::= ... | blob := vec nat8 <fieldtype> ::= ... | <name> : <datatype> := <hash(name)> : <datatype> | <datatype> := N : <datatype> where N is either 0 or previous + 1 (only in records) | <nat> := <nat> : null (only in variants) | <name> := <name> : null (only in variants)
Services
A service is a standalone actor on the platform that can communicate with other services via sending and receiving messages. Messages are sent to a service by invoking one of its methods, that is, the functions that the service provides.
Note: Candid is in fact agnostic to the exact nature of services. In particular, it could be applied to a setting where services are synchronous (objects with RPCs) instead of asynchronous (actors with bidirectional message sends).
Structure
A service’s signature is described by an actor type, which defines the list of methods that the service provides. Each method is described by its name and a function type describing its signature. The function type can also be given by referring to a type definition naming a function reference type.
<actortype> ::= { <methtype>;* } <methtype> ::= <name> : (<functype> | <id>)
We identify <methtype>
lists in an actor type up to reordering.
Names
A name is given either in the syntax of a typical programming language identifier, or as an arbitrary string in quotes:
<name> ::= <id> | <text> <id> ::= (A..Z|a..z|_)(A..Z|a..z|_|0..9)* <text> ::= "<char>*"
Identifiers cannot be keywords of the Candid grammar. In case a name is needed that coincides with a keyword, it has to be quoted as a text string.
Functions
Functions are endpoints for communication.
A typical function invocation is a bidirectional communication, with parameters and results, also known as a request and response.
A oneway
function invocation is a uni-directional communication with zero or more parameters but no
results, intended for fire-and-forget scenarios.
Note: Candid is in fact agnostic to the question of whether communication via functions is synchronous (like RPCs) or asynchronous (like messaging with callbacks as response continuations). However, it assumes that all invocations have the same semantics, so that there is no need to distinguish between synchronous and asynchronous messages.
Note: In a synchronous interpretation of functions, invocation of a oneway function would return immediately, without waiting for completion
of the service-side invocation of the function.
In an asynchronous interpretation of functions, the invocation of a oneway
function does not accept a callback (to invoke on completion).
Structure
A function type describes the list of parameters and results and their respective types. It can optionally be annotated to be query, which indicates that it does not modify any state and can potentially be executed more efficiently (e.g., on cached state). (Other annotations may be added in the future.)
<functype> ::= ( <argtype>,* ) -> ( <argtype>,* ) <funcann>* <funcann> ::= oneway | query <argtype> ::= <datatype>
We identify <funcann>
lists in a function type up to reordering.
The list of parameters must be shorter than 2^32 values; the same restriction apply to the result list. The result list of a oneway
function must be empty.
Shorthand: Named Parameters and Results
As a ``shorthand'', the types of parameters and results in a function type may be prefixed by a name:
<argtype> ::= <name> : <datatype> := <datatype>
The chosen names only serve documentation purposes and have no semantic significance. However, duplicate names are not allowed.
Example
(text, text, nat16) -> (text, nat64) (name : text, address : text, nat16) -> (text, id : nat64) (name : text, address : text, nr : nat16) -> (nick : text, id : nat64)
All three types are equivalent.
Data
The content of message arguments and results is data. Three basic forms of data types can be distinguished: primitive data, which are basic values, and constructed data, which are compound forms of data types, and references, which point to a resource in the network.
<datatype> ::= <primtype> | <constype> | <reftype>
Natural Numbers
The type nat
describes a natural number (unsigned integer) of unlimited range.
There are also variants limited to 8, 16, 32, or 64 bit value range with fixed-size representations.
<primtype> ::= nat | nat8 | nat16 | nat32 | nat64 | ...
Note: Values of type nat
have variable length representations in the binary serialisation format, and hence take up space proportional to (the logarithm of) their value.
As long as typical values are small, they may hence be more space-efficient than the fixed size types.
Integer Numbers
The type int
describes an integer number (signed) of unlimited range.
There are also variants limited to 8, 16, 32, or 64 bit value range with fixed-size representations.
<primtype> ::= ... | int | int8 | int16 | int32 | int64 | ...
Note: Values of type nat
have variable length representations in the binary serialisation format, and hence take up space proportional to (the logarithm of) their value.
As long as typical values are small, they may hence be more space-efficient than the fixed size types.
Floating-Point Numbers
Floating-point values are represented in IEEE 754 binary format and are supported in single precision (32 bit) and double precision (64 bit).
<primtype> ::= ... | float32 | float64 | ...
Text
Text strings are represented by the type text
and consist of a sequence of Unicode scalar values.
<primtype> ::= ... | text | ...
Note: The text
type is distinguished from vec nat8
(a UTF-8 string) or vec nat32
(a sequence of code points) in order to allow
bindings to map it to a suitable string type, and enable the binary format to select an efficient internal representation independently.
Null
The type null
has exactly one value (the null value) and therefore carries no information. It can e.g. be used as a placeholder for optional fields that ought to be added to a record in future upgrades,
or for variant cases that do not need any value, see below.
<primtype> ::= ... | null | ...
Reserved
The type reserved
is a type with unknown content that ought to be ignored.
Its purpose is to occupy field ids in records in order to prevent backwards/forwards compatibility problems, see the description of record types below.
<primtype> ::= ... | reserved
Note: This type has a similar role as reserved fields in proto buffers.
Empty
The type empty
is the type of values that are not present. Its purpose is to mark variants that are not actually there, or – as argument types of function reference – indicate that they will not be
called.
<primtype> ::= ... | empty
Options
An option is a value of a specific data type that may be absent.
<constype> ::= opt <datatype> | ...
Vectors
A vector is a homogeneous sequence of values of the same data type.
<constype> ::= ... | vec <datatype> | ...
Shorthand: Blobs
A shorthand exists for the specific vector blob, which is an arbitrary sequence of bytes:
<constype> ::= .... | blob := vec nat8
Records
A record is a heterogeneous sequence of values of different data types. Each value is tagged by a field id which is a numeric value that has to be unique within the record and carries a single value of specified data type. The order in which fields are specified is immaterial.
<constype> ::= ... | record { <fieldtype>;* } | ... <fieldtype> ::= <nat> : <datatype>
We identify <fieldtype>
lists in a record type up to reordering.
The id is described as a simple unsigned integer that has to fit the 32 bit value range. It can be given in either decimal or hexadecimal notation:
<nat> ::= (0..9)(_? 0..9)* | 0x(0..9|a..f|A..F)(_? 0..9|a..f|A..F)*
An id value must be smaller than 2^32 and no id may occur twice in the same record type.
Shorthand: Symbolic Field Ids
An id can also be given as a name, which is a shorthand for a numeric id that is the hash of that name:
<fieldtype> ::= ... | <name> : <datatype> := <hash(name)> : <datatype>
The purpose of identifying fields by unique (numeric or textual) ids is to support safe upgrading of the record type returned by a function: a new version of a function can safely add fields to an old record as long as their id has not been used before. See the discussion on upgrading below for more details.
The hash function is specified as
hash(id) = ( Sum_(i=0..k) id[i] * 223^(k-i) ) mod 2^32 where k = |id|-1
This expansion implies that a hash collision between field names within a single record is disallowed.
This hash function has the the following useful properties:
-
Collisions are sufficiently rare. It has no collisions for names up to length 4.
-
It is rather simple to implement, compared to, say, a cryptographic hash function (we do not need resistence against collision attacks).
The hash function does not have the property that every numeric value can be turned into a human-readable preimage. Host languages that cannot support numeric field names will have to come up with a suitable encoding textual encoding of numeric field names, as well as of field names that are not valid in the host language.
Shorthand: Tuple Fields
Field ids can also be omitted entirely, which is just a shorthand for picking either 0 (for the first field) or N+1 when the previous field has id N.
<fieldtype> ::= ... | <datatype> := N : <datatype>
Examples
record { name : text; street : text; num : nat; city : text; zip : nat; } record { nat; nat } record { 0 : nat; 1 : nat }
The latter two records are equivalent.
Variants
A variant is a tagged union of different possible data types. The tag is given by a numeric id that uniquely determines the variant case. Each case is described as a field. The order in which fields are specified is immaterial.
<constype> ::= ... | variant { <fieldtype>;* } | ...
We identify <fieldtype>
lists in a variant type up to reordering.
A field id must be smaller than 2^32 and no id may occur twice in the same variant type.
Shorthand: Symbolic Tag Ids
Like for record fields, the id for a variant tag can also be given as a name, which is a shorthand for its hash.
Shorthand: Enumeration Types
The type of a variant field can be omitted, in which case it is null
.
<fieldtype> ::= ... | <nat> := <nat> : null | <name> := <name> : null
This abbreviation only applies to variants. At the same time, variants do not allow the tuple field abbreviation for omitting the field id.
Example
type color = variant { red; green; blue }; type tree = variant { leaf : int; branch : record {left : tree; val : int; right : tree}; }
References
A third form of value are references. They represent first-class handles to (possibly remote) functions, services, or principals.
Actor References
An actor reference points to a service and is described by an actor type. Through this, services can communicate connections to other services.
<reftype> ::= ... | service <actortype>
Example
type broker = service { findCounterService : (name : text) -> (service {up : () -> (); current : () -> nat}); }
Function References
A function reference is described by its function type. For example, they allow passing callbacks to other functions.
<reftype> ::= func <functype> | ...
Principal References
A principal reference points to an identity, such as a canister or a user. Through this, we can authenticate or authorize other services or users.
<reftype> ::= ... | principal | ...
Type Definitions
Types can be named via type definitions.
<def> ::= type <id> = <datatype>
Type definitions are mutually recursive, i.e., they can refer to themselves or each other. However, every type cycle must be productive, that is, go through a type expression that is not just an identifier. A type definition that is vacuous—only equal to itself—is not allowed.
Examples
type stream = opt record {head : nat; next : func () -> stream};
type node = record {head : nat; tail : list}; type list = opt node;
type A = B; type B = A; // error: cyclic type definition
Imports
In order to allow splitting interface definitions up into multiple files or share common definitions between multiple interfaces, import declarations are provided.
<def> ::= ... | import <text>
An import refers to another interface file by URL. The semantics is that of textual inclusion, except that definitions from the imported file must not refer to definitions from the importing file.
Example
File A.dfn
:
type A = service { f : () -> () };
File B.dfn
:
import "A.dfn" service B : A ;
Open Question: Instead of a flat name space, should we require qualified names for imports?
Interfaces
An interface description consists of a sequence of imports and type definitions, possibly followed by a service declaration. A service declaration names and specifies a service actor by specifying an actor type. The actor type may also be given by referring to the name of a type definition for an actor reference type.
<desc> ::= <def>;* <service>;? <service> ::= service <id>? : (<actortype> | <id>)
The optional name given to the service in an interface description is immaterial; it only serves as documentation.
Upgrading and Subtyping
Interfaces are allowed to evolve over time in a manner that is robust, meaning that the interface cannot break existing client code.
To capture this notion precisely, a service of type T
is upgradable to a version with another type T'
if and only if T'
is structural subtype of
T
, written T' <: T
. This defines that T'
is more specialised than T
. (Note: A more specialised type is less general, that is, denotes a smaller set of possible values, thus the
direction of the subtype ordering, even though a subtype record can have more fields.)
For upgrading data structures passed between service and client, it is important to distinguish the direction in which the data flows, as the upgrading requirements are opposite to each other:
-
Outbound data returned from service to client as message results is provided by the service; an upgrade may provide more or more refined data without breaking clients. For example, an outbound record may provide additional fields after an upgrade.
-
Inbound data passed from client to service as message parameters is required by the service; an upgrade may only require less or less specific data without breaking clients. For example, an inbound record may require fewer fields after an upgrade.
That is, outbound message results can only be replaced with a subtype (more fields) in an upgrade, while inbound message parameters can only be replaced with a supertype (fewer fields). This corresponds to the notions of co-variance and contra-variance in type systems.
Subtyping applies recursively to the types of the fields themselves. Moreover, the directions get inverted for inbound function and actor references, in compliance with standard rules.
Primitive Types
Most primitive types cannot be changed in an upgrade.
------------------------ <primtype> <: <primtype>
An exception are integers, which can be specialised to natural numbers:
----------- nat <: int
Additional rules apply to empty
and reserved
, which makes these a bottom resp. top type:
------------------------- <datatype> <: reserved -------------------- empty <: <datatype>
Options and Vectors
An option or vector type can be specialised via its constituent type.
<datatype> <: <datatype'> --------------------------------- opt <datatype> <: opt <datatype'> <datatype> <: <datatype'> --------------------------------- vec <datatype> <: vec <datatype'>
Furthermore, an option type can be specialised to either null
or to its constituent type:
------------------------ null <: opt <datatype> not (null <: <datatype>) <datatype> <: <datatype'> ----------------------------- <datatype> <: opt <datatype'>
The premise means that the rule does not apply when the constituent type is itself null
, an option or reserved
.
That restriction is necessary so that there is no ambiguity.
For example, otherwise there would be two ways to interpret null
when going from opt nat
to opt opt nat
, either as null
or as ?null
.
Q: The negated nature of this premise isn’t really compatible with parametric polymorphism. Is that a problem? We could always introduce a supertype of all non-nullable types and rephrase it with that.
Records
In a specialised record type, the type of a record field can be specialised, or a field can be added.
--------------------------------------- record { <fieldtype'>;* } <: record { } <datatype> <: <datatype'> record { <fieldtype>;* } <: record { <fieldtype'>;* } ---------------------------------------------------------------------------------------------- record { <nat> : <datatype>; <fieldtype>;* } <: record { <nat> : <datatype'>; <fieldtype'>;* }
NOTE: There is a need for a mechanism to also remove fields (which means adding a field when a record appears as an argument). The precise mechanism is still work in progress.
Variants
For a specialised variant, the type of a tag can be specialised, or a tag can be removed.
----------------------------------------- variant { } <: variant { <fieldtype'>;* } <datatype> <: <datatype'> variant { <fieldtype>;* } <: variant { <fieldtype'>;* } ------------------------------------------------------------------------------------------------ variant { <nat> : <datatype>; <fieldtype>;* } <: variant { <nat> : <datatype'>; <fieldtype'>;* }
Functions
For a specialised function, any parameter type can be generalised and any result type specialised. Moreover, arguments can be dropped while results can be added. That is, the rules mirror those of tuple-like records in that they are ordered and can only be extended at the end.
record { (N1' : <datatype1'>);* } <: record { (N1 : <datatype1>);* } record { (N2 : <datatype2>);* } <: record { N2' : <datatype2'>);* } ------------------------------------------------------------------------------------------------------------------- func ( <datatype1>,* ) -> ( <datatype2>,* ) <funcann>* <: func ( <datatype1'>,* ) -> ( <datatype2'>,* ) <funcann>*
where NI*
is the <nat>
sequence 1
..|<datatypeNI>*|
, respectively.
Viewed as sets, the annotations on the functions must be equal.
Services
For a service, a method can be specialised (by specialising its function type), or a method added. Essentially, they are treated like records of functions.
---------------------------------------- service { <methtype'>;* } <: service { } <functype> <: <functype'> service { <methtype>;* } <: service { <methtype'>;* } ------------------------------------------------------------------------------------------------ service { <name> : <functype>; <methtype>;* } <: service { <name> : <functype'>; <methtype'>;* }
Elaboration
To define the actual coercion function, we extend the subtyping relation to a ternary elaboration relation T <: T' ~> f
, where f
is a suitable coercion function of type T -> T'
.
Primitive Types
-------------------------------- <primtype> <: <primtype> ~> \x.x ------------------ Nat <: Int ~> \x.x ------------------------------ <datatype> <: reserved ~> \x.x ------------------------------ empty <: <datatype> ~> \_.unreachable
Options and Vectors
<datatype> <: <datatype'> ~> f --------------------------------------------------- opt <datatype> <: opt <datatype'> ~> \x.map_opt f x <datatype> <: <datatype'> ~> f --------------------------------------------------- vec <datatype> <: vec <datatype'> ~> \x.map_vec f x not (null <: <datatype>) --------------------------------- null <: opt <datatype> ~> \x.null not (null <: <datatype>) <datatype> <: <datatype'> ~> f ------------------------------------------ <datatype> <: opt <datatype'> ~> \x.?(f x)
Records
------------------------------------------------ record { <fieldtype'>;* } <: record { } ~> \x.{} <datatype> <: <datatype'> ~> f1 record { <fieldtype>;* } <: record { <fieldtype'>;* } ~> f2 ---------------------------------------------------------------------------------------------- record { <nat> : <datatype>; <fieldtype>;* } <: record { <nat> : <datatype'>; <fieldtype'>;* } ~> \x.{f2 x with <nat> = f1 x.<nat>}
Variants
------------------------------------------------- variant { } <: variant { <fieldtype'>;* } ~> \x.x <datatype> <: <datatype'> ~> f1 variant { <fieldtype>;* } <: variant { <fieldtype'>;* } ~> f2 ------------------------------------------------------------------------------------------------ variant { <nat> : <datatype>; <fieldtype>;* } <: variant { <nat> : <datatype'>; <fieldtype'>;* } ~> \x.case x of <nat> y => <nat> (f1 y) | _ => f2 x
Functions
record { N1':<datatype1'>;* } <: record { N1:<datatype1>;* } ~> f1 record { N2:<datatype2>;* } <: record { N2':<datatype2'>;* } ~> f2 ------------------------------------------------------------------------------------------------------------------ func ( <datatype1>,* ) -> ( <datatype2>,* ) <funcann>* <: func ( <datatype1'>,* ) -> ( <datatype2'>,* ) <funcann>* ~> \x.\y.f2 (x (f1 y))
Services
------------------------------------------------- service { <methtype'>;* } <: service { } ~> \x.{} <functype> <: <functype'> ~> f1 service { <methtype>;* } <: service { <methtype'>;* } ~> f2 ------------------------------------------------------------------------------------------------ service { <name> : <functype>; <methtype>;* } <: service { <name> : <functype'>; <methtype'>;* } ~> \x.{f1 x; <name> = f2 x.<name>}
Open Questions
-
Support default field values?
-
Better upgradability for variants?
-
Support generic type definitions?
-
Namespaces for imports?
Binary Format
At runtime, every Candid value is serialised into a triple (T, M, R), where T (type'') and M (
memory'') are sequences of bytes and R(``references'') is a sequence of references.
If R is empty, it can be omitted.
By making the type of the data explicit, (1) the serialised data becomes self-describing, which is useful for tooling, (2) error discovery and error handling is improved, (3) the binary format is decoupled from versioning concerns, so that the latter can be designed more flexible.
By using references, (1) the wire representation of reference values (which may be complex and involve system meta data such as types) need not be exposed to client code, and (2) the system knows where the references are in the serialised data, such that it can rewrite/map/filter/adjust them as it sees fit.
Accordingly, serialisation is defined by three mapping functions, T
, M
and R
, producing the respective components.
Note:
-
While not required, a serialiser can minimise the size of the serialised type by first computing a value’s principal type with respect to the subtyping relation. In particular, a variant type need not include more fields than necessary. For example, if
E = variant { A, B, C }
and the value to serialise isA
of typeE
, then it can be serialised with typevariant { A }
. Similarly, if the value is the vector[C, A, C]
, then the principal typevec (variant { A, C })
suffices.
Serialisation
This section describes how abstract Candid values of the types described by Candid are serialised into a binary representation for transfer between services.
Serialisation is defined by three functions T
, M
, and R
given below.
Most Candid values are self-explanatory, except for references. There are two forms of Candid values for service references and principal references:
-
ref(r)
indicates an opaque reference, understood only by the underlying system. -
id(b)
, indicates a transparent reference to a service addressed by the blobb
.
Likewise, there are two forms of Candid values for function references:
-
ref(r)
indicates an opaque reference, understood only by the underlying system. -
pub(s,n)
, indicates the public method namen
of the service referenced bys
.
Notation
T
and M
create a byte sequence described below in terms of natural storage types (i<N>
for N = 8, 16, 32, 64
, f<N>
for N = 32, 64
).
The bytes are sequenced according to increasing
significance (least significant byte first, also known as little-endian).
The following notation is used:
-
.
is the empty byte sequence -
x1 x2
is concatenation -
t^N
,t+
,t*
,t?
are sequences ofN
,N>0
,N>=0
, orN<=1
repetitions, respectively -
leb128
andsleb128
are the shortest unsigned and signed LEB128 encodings of a number, respectively -
utf8
is the UTF-8 encoding of a text string (not 0-terminated)
Types
T
maps an Candid type to a byte sequence representing that type.
Each type constructor is encoded as a negative opcode; positive numbers index auxiliary type definitions that define more complex types.
We assume that the fields in a record or variant type are sorted by increasing id and the methods in a service are sorted by name.
T : <primtype> -> i8* T(null) = sleb128(-1) = 0x7f T(bool) = sleb128(-2) = 0x7e T(nat) = sleb128(-3) = 0x7d T(int) = sleb128(-4) = 0x7c T(nat8) = sleb128(-5) = 0x7b T(nat16) = sleb128(-6) = 0x7a T(nat32) = sleb128(-7) = 0x79 T(nat64) = sleb128(-8) = 0x78 T(int8) = sleb128(-9) = 0x77 T(int16) = sleb128(-10) = 0x76 T(int32) = sleb128(-11) = 0x75 T(int64) = sleb128(-12) = 0x74 T(float32) = sleb128(-13) = 0x73 T(float64) = sleb128(-14) = 0x72 T(text) = sleb128(-15) = 0x71 T(reserved) = sleb128(-16) = 0x70 T(empty) = sleb128(-17) = 0x6f T : <constype> -> i8* T(opt <datatype>) = sleb128(-18) I(<datatype>) // 0x6e T(vec <datatype>) = sleb128(-19) I(<datatype>) // 0x6d T(record {<fieldtype>^N}) = sleb128(-20) T*(<fieldtype>^N) // 0x6c T(variant {<fieldtype>^N}) = sleb128(-21) T*(<fieldtype>^N) // 0x6b T : <fieldtype> -> i8* T(<nat>:<datatype>) = leb128(<nat>) I(<datatype>) T : <reftype> -> i8* T(func (<datatype1>*) -> (<datatype2>*) <funcann>*) = sleb128(-22) T*(<datatype1>*) T*(<datatype2>*) T*(<funcann>*) // 0x6a T(service {<methtype>*}) = sleb128(-23) T*(<methtype>*) // 0x69 T(principal)= sleb128(-24) // 0x68 T : <methtype> -> i8* T(<name>:<datatype>) = leb128(|utf8(<name>)|) i8*(utf8(<name>)) I(<datatype>) T : <funcann> -> i8* T(query) = i8(1) T(oneway) = i8(2) T* : <X>* -> i8* T*(<X>^N) = leb128(N) T(<X>)^N
Every nested type is encoded as either a primitive type or an index into a list of type definitions. This allows for recursive types and sharing of types occurring multiple times:
I : <datatype> -> i8* I(<primtype>) = T(<primtype>) I(<datatype>) = sleb128(i) where type definition i defines T(<datatype>)
Type definitions themselves are represented as a list of serialised data types:
T*(<datatype>*)
The data types in this list can themselves refer to each other (or themselves) via I
.
Note:
-
Due to the type definition prefix, there are always multiple possible ways to represent any given serialised type. Type serialisation hence is not technically a function but a relation.
-
The serialised data type representing a method type must denote a function type.
-
Because recursion goes through
T
, this format by construction rules out non-well-founded definitions liketype t = t
.
Memory
M
maps an Candid value to a byte sequence representing that value.
The definition is indexed by type.
We assume that the fields in a record value are sorted by increasing id.
M : <val> -> <primtype> -> i8* M(n : nat) = leb128(n) M(i : int) = sleb128(i) M(n : nat<N>) = i<N>(n) M(i : int<N>) = i<N>(signed_N^-1(i)) M(z : float<N>) = f<N>(z) M(b : bool) = i8(if b then 1 else 0) M(t : text) = leb128(|utf8(t)|) i8*(utf8(t)) M(_ : null) = . M(_ : reserved) = . // NB: M(_ : empty) will never be called M : <val> -> <constype> -> i8* M(null : opt <datatype>) = i8(0) M(?v : opt <datatype>) = i8(1) M(v : <datatype>) M(v* : vec <datatype>) = leb128(N) M(v : <datatype>)* M(kv* : record {<fieldtype>*}) = M(kv : <fieldtype>)* M(kv : variant {<fieldtype>*}) = leb128(i) M(kv : <fieldtype>*[i]) M : (<nat>, <val>) -> <fieldtype> -> i8* M((k,v) : k:<datatype>) = M(v : <datatype>) M : <val> -> <reftype> -> i8* M(ref(r) : service <actortype>) = i8(0) M(id(v*) : service <actortype>) = i8(1) M(v* : vec nat8) M(ref(r) : func <functype>) = i8(0) M(pub(s,n) : func <functype>) = i8(1) M(s : service {}) M(n : text) M(ref(r) : principal) = i8(0) M(id(v*) : principal) = i8(1) M(v* : vec nat8)
References
R
maps an Candid value to the sequence of references contained in that value.
The definition is indexed by type. We assume that the fields in a record value are sorted by increasing id.
R : <val> -> <primtype> -> <ref>* R(_ : <primtype>) = . R : <val> -> <constype> -> <ref>* R(null : opt <datatype>) = . R(?v : opt <datatype>) = R(v : <datatype>) R(v* : vec <datatype>) = R(v : <datatype>)* R(kv* : record {<fieldtype>*}) = R(kv : <fieldtype>)* R(kv : variant {<fieldtype>*}) = R(kv : <fieldtype>*[i]) R : (<nat>, <val>) -> <fieldtype> -> <ref>* R((k,v) : k:<datatype>) = R(v : <datatype>) R : <val> -> <reftype> -> <ref>* R(ref(r) : service <actortype>) = r R(id(b*) : service <actortype>) = . R(ref(r) : func <functype>) = r R(pub(s,n) : func <functype>) = . R(ref(r) : principal) = r R(id(b*) : principal) = .
Note:
-
It is unspecified how references r are represented, neither internally nor externally. When binding to Wasm, their internal representation is expected to be based on Wasm reference types, i.e.,
anyref
or subtypes thereof. It is up to the system how to represent or translate the reference table on the wire.
Deserialisation
Deserialisation is the parallel application of the inverse functions of T
, M
, and R
defined above, with the following mechanism for robustness towards future extensions:
-
A serialised type may be headed by an opcode other than the ones defined above (i.e., less than -24). Any such opcode is followed by an LEB128-encoded count, and then a number of bytes corresponding to this count. A type represented that way is called a future type.
-
A value corresponding to a future type is called a future value. It is represented by two LEB128-encoded counts, m and n, followed by a m bytes in the memory representation M and accompanied by n corresponding references in R.
These measures allow the serialisation format to be extended with new types in the future, as long as their representation and the representation of the corresponding values include a length prefix matching the above scheme, and thereby allowing an older deserialiser not understanding them to skip over them. The subtyping rules ensure that upgradability is maintained in this situation, i.e., an old deserialiser has no need to understand the encoded data.
Parameters and Results
A
defines the argument mapping. Essentially, an argument list is serialised into the triple (T,M,R) as if it was a single closed record.
T and M are combined into a single byte stream B, where they are preceded by the string ``DIDL'' as a magic number and a possible list of type definitions.
We assume that the argument values are sorted by increasing id.
A(kv* : <datatype>*) = ( B(kv* : <datatype>*), R(kv* : <datatype>*) ) B(kv* : <datatype>*) = i8('D') i8('I') i8('D') i8('L') magic number T*(<datatype>*) type definition table I*(<datatype>*) types of the argument list M(kv* : <datatype>*) values of argument list
The vector T*(<datatype>*)
contains an arbitrary sequence of type definitions (see above), to be referenced in the serialisation of the other <datatype>
vector.
The same representation is used for function results.
Note:
-
It is unspecified how the pair (B,R) representing a serialised value is bundled together in an external environment.
Text Format
To enable convenient debugging, we also specify a text format for Candid values. The types of these values are assumed to be known from context, so the syntax does not attempt to be self-describing.
<val> ::= | <primval> | <consval> | <refval> | ( <annval> ) <annval> ::= | <val> | <val> : <datatype> <primval> ::= | <nat> | <int> | <float> (TODO: same as Motoko grammar plus sign) | <text> (TODO: same as Motoko grammar) | true | false | null <consval> ::= | opt <val> | vec { <annval>;* } | record { <fieldval>;* } | variant { <fieldval> } <fieldval> ::= <nat> = <annval> <refval> ::= | service <text> (canister URI) | func <text> . <id> (canister URI and message name) | principal <text> (principal URI) <arg> ::= ( <annval>,* )
Syntactic Shorthands
Analoguous to types, a few syntactic shorthands are supported that can be reduced to the basic value forms:
<consval> ::= ... | blob <text> := vec { N;* } where N* are of bytes in the string, interpreted [as in the WebAssembly textual format](https://webassembly.github.io/spec/core/text/values.html#strings) <fieldval> ::= ... | <name> = <annval> := <hash(name)> = <annval> | <annval> := N = <annval> where N is either 0 or previous + 1 (only in records) | <nat> := <nat> = null (only in variants) | <name> := <name> = null (only in variants)