TL-B Overview [deprecated]

advanced level

This information is very low-level and could be hard to understand for newcomers.
So feel free to read about it later.

TL-B stands for "Typed Language - Binary". It is used to describe a scheme of (de)serialization of objects to cells. Here are detailed and complete TL-B schemes for all objects in TOS: https://github.com/tos-network/tos/blob/master/crypto/block/block.tlb.

Scheme

Each TL-B scheme consists of declarations. Each declaration describes a constructor for some type. For instance, a Bool type may have two constructors for true and false values.

Typical TL-B declarations are shown below:

bool_false$0 = Bool;
bool_true$1 = Bool;

unary_zero$0 = Unary ~0;
unary_succ$1 {n:#} x:(Unary ~n) = Unary ~(n + 1);

acc_trans#5 account_addr:bits256
transactions:(HashmapAug 64 ^Transaction CurrencyCollection)
state_update:^(HASH_UPDATE Account)
= AccountBlock;

Each TL-B declaration consists of:

  • Constructor: a constructor name immediately followed by an optional constructor tag
  • a list of explicit and implicit field definitions separated by whitespace (" ", "\n", etc.)
  • the = sign
  • an (optionally parametrized) type name

Example: two constructors (with different binary prefixes) for a Bool type.

bool_false$0 = Bool;
bool_true$1 = Bool;

Constructor

A constructor is declared via a constructor_name[separator,tag].

A constructor_name consists of [A-Za-z0-9_] symbols. By convention, constructor names use snake_case.

A constructor name can be followed by a separator. If there is no separator, the tag is calculated automatically as a 32-bit CRC32 sum of the constructor declaration. If a separator is present, it is either # or $: the former means that the tag is given in hexadecimal form, the latter that it is given in binary form. After either separator there may be an underscore symbol _, which stands for an empty tag.

There is also a special constructor name _ (called the 'anonymous constructor'), which means that there is only one unnamed constructor with an empty tag for a given type.

The table below displays possible tag definitions.

Constructor    Tag
_              empty tag for anonymous constructor
some           automatically calculated 32-bit tag
some#bba       12-bit tag equal to 0b101110111010
some$01011     5-bit tag equal to 0b01011
some#_         empty tag
some$_         empty tag

Note that automatically generated tags are not usually used; explicitly declared tags are preferred.
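The tag rules above can be sketched in a few lines of Python. This is only an illustration, not the parser used by any real tool: the function name parse_constructor and the representation of a tag as a string of '0' and '1' characters are assumptions made for this example. Automatically calculated tags are not computed here; the function returns None for them.

```python
def parse_constructor(decl: str):
    """Split 'name', 'name#hex' or 'name$bin' into (name, tag_bits).

    tag_bits is a string of '0'/'1' characters, '' for an empty tag,
    or None when the tag would have to be calculated automatically.
    """
    if decl == "_":
        return "_", ""                      # anonymous constructor, empty tag
    for sep in ("#", "$"):
        if sep in decl:
            name, tag = decl.split(sep, 1)
            if tag == "_":
                return name, ""             # explicitly empty tag
            if sep == "#":
                # each hexadecimal digit contributes exactly four bits
                return name, "".join(f"{int(d, 16):04b}" for d in tag)
            if set(tag) - {"0", "1"}:
                raise ValueError(f"not a binary tag: {tag!r}")
            return name, tag
    return decl, None                       # no separator: automatic 32-bit tag


print(parse_constructor("some#bba"))
print(parse_constructor("some$01011"))
```

Applied to the rows of the table, some#bba yields the 12 bits 101110111010 and some$01011 yields the 5 bits 01011, while some#_, some$_ and _ all yield an empty tag.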

Field definitions

Explicit

Each field definition has the form ident : type-expr, where ident is an identifier for the name of the field (replaced by an underscore _ for anonymous fields) and type-expr is the field type. The type provided here is a type expression, which may include simple types or parametrized types with suitable parameters. Variables, i.e. the (identifiers of the) previously defined fields of types # (natural numbers) or Type (type of types), may be used as parameters for the parametrized types.

There are a few predefined types:

  • # - means an unsigned 32-bit number
  • ## N - the same as uintN - means an unsigned N-bit number
  • #<= N - means a number between 0 and N (including both). Such a number is stored in ceil(log2(N+1)) bits.
  • N * Bit - means N-bit slice
  • ^Cell - means an arbitrary cell in reference
  • ^[ field_definitions ] - means that field definitions are stored in the referenced cell
  • Type - stands for an arbitrary type (but appears only in implicit definitions).
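As a quick check of the ceil(log2(N+1)) rule for #<= N, here is a minimal Python sketch. The helper names bits_for_le and store_le are invented for this example, and a bit string is modelled as a Python str of '0' and '1' characters. For non-negative integers, int.bit_length() gives the same value as ceil(log2(N+1)) without floating-point arithmetic.

```python
def bits_for_le(n: int) -> int:
    """Width in bits of a `#<= n` field, i.e. ceil(log2(n + 1))."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n.bit_length()


def store_le(value: int, n: int) -> str:
    """Serialize a `#<= n` value as a big-endian bit string of fixed width."""
    if not 0 <= value <= n:
        raise ValueError(f"{value} is not in the range 0..{n}")
    width = bits_for_le(n)
    # a zero-width field (n == 0) occupies no bits at all
    return format(value, f"0{width}b") if width else ""


for n in (0, 1, 4, 7, 8):
    print(n, bits_for_le(n))
```

So #<= 7 takes 3 bits, #<= 8 takes 4 bits, and #<= 0 takes no bits because the only possible value is 0.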

A type-expr usually consists of an (optionally parametrized) type only, as in last_trans_lt:uint64 or _:StateInit. However, a type-expr may also contain a condition. In that case, the field definition consists of ident, :, condition, ?, type. If the condition (which can refer to previously defined fields) evaluates to false, the corresponding field is not present. For instance, prev:n?^(ProofChain n) means that the prev field is present only for objects with n>0.

Implicit

Some fields may be implicit. Their definitions are surrounded by curly brackets, which indicate that the field is not actually present in the serialization, but that its value must be deduced from some other data (usually the parameters of the type being serialized). For instance

nothing$0 {X:Type} = Maybe X;
just$1 {X:Type} value:X = Maybe X;

means the following: some other constructor may define a field var:(Maybe #). In that case, the field is serialized either as a 1 bit followed by the serialization of # (uint32) if var is present, or as a single 0 bit if var is absent. In this way Maybe is declared as a C++-like template type for an arbitrary type X. If, however, Maybe were declared as nothing$0 {X:#} = Maybe X;, then Maybe would be parametrized by an arbitrary number rather than by an arbitrary type X.
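The serialization of a Maybe # field described above can be sketched as follows. This is an illustrative example only: the function names store_maybe_uint32 and load_maybe_uint32 are invented here, an absent value is represented by None, and a bit string is modelled as a Python str of '0' and '1' characters.

```python
def store_maybe_uint32(value):
    """Serialize a `Maybe #` value: None uses nothing$0, an int uses just$1."""
    if value is None:
        return "0"                              # nothing$0: tag only
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit into 32 bits")
    return "1" + format(value, "032b")          # just$1 followed by value:#


def load_maybe_uint32(bits: str):
    """Deserialize a `Maybe #` value; return (value_or_None, remaining_bits)."""
    if not bits:
        raise ValueError("empty slice")
    if bits[0] == "0":
        return None, bits[1:]
    if len(bits) < 33:
        raise ValueError("not enough bits for a 32-bit value")
    return int(bits[1:33], 2), bits[33:]


print(store_maybe_uint32(None))
print(store_maybe_uint32(5))
```

An absent value therefore costs one bit, and a present value costs 33 bits.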

Type definition

A type name consists of [A-Za-z0-9_] symbols. By convention, type names use CamelCase.

It can be parametrized by one or more parameters.

Some occurrences of “variables” are prefixed by a tilde (~). This means that, prior to deserialization, the exact value of that variable is not known, but instead will be computed during deserialization.

Let's consider:

unary_zero$0 = Unary ~0;
unary_succ$1 {n:#} x:(Unary ~n) = Unary ~(n + 1);

and the case when we want to deserialize a Unary ~N object from a slice containing the bit string 0b1111111100101. Saying that we want to deserialize Unary ~N means that we do not yet know whether we are deserializing Unary 0, Unary 7 or Unary 1020. We start with 0b1111111100101 and compare it with the constructor prefixes 0b0 for unary_zero and 0b1 for unary_succ. The slice starts with 1, so we have unary_succ, but the value of N still cannot be deduced directly; it has to be obtained from the deserialization of the variable x.

This variable has type Unary ~(N-1), and the value of N-1 can be deduced from the deserialization of the remaining bits in the slice. We take the remaining bits, try to deserialize Unary ~(N-1), and again see the unary_succ tag. In this way we recursively dive into Unary until we reach Unary ~(N-8). At that level the rest of the slice starts with the unary_zero tag and thus constitutes a Unary 0 object.

Popping back up, we see that we initially had a Unary 8 object. So after deserializing Unary ~N from Slice(0b1111111100101) we get a Unary 8 object and the remaining slice 0b0101, from which subsequent variables of the constructor can be deserialized.
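The walk-through above can be reproduced with a short Python sketch. The function name load_unary is invented for this example, and a slice is modelled as a str of '0' and '1' characters. The sketch uses a loop instead of the recursion described in the text; each iteration corresponds to one unary_succ level.

```python
def load_unary(bits: str):
    """Deserialize `Unary ~n` from the start of bits; return (n, remaining_bits)."""
    n = 0
    pos = 0
    while True:
        if pos >= len(bits):
            raise ValueError("slice ended before a unary_zero tag was found")
        if bits[pos] == "0":            # unary_zero$0: stop, n is now known
            return n, bits[pos + 1:]
        n += 1                          # unary_succ$1: one more level of nesting
        pos += 1


n, rest = load_unary("1111111100101")
print(n, rest)  # prints: 8 0101
```

The value of n is not an input of the function; it is computed during deserialization, which is exactly what the tilde in Unary ~n expresses.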

Constraints

Some implicit fields may contain constraints, for instance {n <= m}. It means that the previously defined variables n and m should satisfy the corresponding inequality. This inequality is an inherent property of the constructor. It should be checked during serialization and objects with variables which do not satisfy these constraints are invalid.

An example of constructors with constraints:

hml_short$0 {m:#} {n:#} len:(Unary ~n) {n <= m} s:(n * Bit) = HmLabel ~n m;
hml_long$10 {m:#} n:(#<= m) s:(n * Bit) = HmLabel ~n m;
hml_same$11 {m:#} v:Bit n:(#<= m) = HmLabel ~n m;
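To show where the constraint is checked in practice, here is a hedged Python sketch that deserializes an HmLabel ~n m for a known m. The names take and load_hm_label are invented for this example, and slices are modelled as str values of '0' and '1' characters. The sketch assumes that a value read for n:(#<= m) must also be rejected when it exceeds m, since the field width alone does not rule that out.

```python
def take(bits: str, count: int):
    """Split off the first count bits; return (taken, rest)."""
    if count > len(bits):
        raise ValueError("not enough bits in the slice")
    return bits[:count], bits[count:]


def load_hm_label(bits: str, m: int):
    """Deserialize `HmLabel ~n m`; return (label_bits, remaining_bits)."""
    tag, rest = take(bits, 1)
    if tag == "0":                              # hml_short$0
        n = 0
        while True:                             # len:(Unary ~n)
            bit, rest = take(rest, 1)
            if bit == "0":
                break
            n += 1
        if n > m:                               # the {n <= m} constraint
            raise ValueError("constraint n <= m is violated")
        return take(rest, n)                    # s:(n * Bit)
    tag, rest = take(rest, 1)
    width = m.bit_length()                      # width of a `#<= m` field
    if tag == "0":                              # hml_long$10
        raw, rest = take(rest, width)
        n = int(raw, 2) if width else 0
        if n > m:
            raise ValueError("n exceeds m")
        return take(rest, n)
    v, rest = take(rest, 1)                     # hml_same$11: v:Bit
    raw, rest = take(rest, width)
    n = int(raw, 2) if width else 0
    if n > m:
        raise ValueError("n exceeds m")
    return v * n, rest                          # the bit v repeated n times


print(load_hm_label("0111010111", 8))
```

With m = 8, the bits 0 1110 101 decode through hml_short as the label 101; the same input with m = 2 is rejected because n = 3 violates n <= m.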

Comments

TL-B schemas support C-like comments:

/* 
This is a
multiline
comment
*/

// This is one line comment

(De)serialization

Given a TL-B scheme, any object can be serialized to a builder and deserialized from a slice. In particular, to deserialize an object we first determine the corresponding constructor from its tag and then deserialize the variables one by one from left to right (recursively descending into the variables that are TL-B objects themselves). Serialization goes the other way: we find the tag that corresponds to the given object of that type, write it to the builder, and then continue from left to right with each variable.

For parsers, it is recommended to read the scheme once and generate a serializer and a deserializer for each type, instead of referring to the scheme on the fly.
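The tag-then-fields procedure can be sketched in Python as below. Everything here is an assumption made for illustration: the SCHEME table layout, the load and store names, the str-of-bits model of builders and slices, and the Item type, which is a hypothetical declaration (item#a5 flag:Bool count:(## 4) = Item;) invented for this example. For brevity the sketch interprets a small table at run time; a real generator would emit dedicated code for each type.

```python
# Table built once. Bool is taken from the text above; Item is hypothetical:
#   item#a5 flag:Bool count:(## 4) = Item;
# A field type is either the name of another type or an int (bit width of ## N).
SCHEME = {
    "Bool": [("bool_false", "0", []), ("bool_true", "1", [])],
    "Item": [("item", "10100101", [("flag", "Bool"), ("count", 4)])],
}


def load(type_name: str, bits: str):
    """Deserialize one object; return ((constructor, fields), remaining_bits)."""
    for name, tag, fields in SCHEME[type_name]:
        if not bits.startswith(tag):
            continue                            # tag does not match, try the next one
        rest = bits[len(tag):]
        values = {}
        for field, ftype in fields:             # fields go left to right
            if isinstance(ftype, int):
                if len(rest) < ftype:
                    raise ValueError("not enough bits in the slice")
                values[field], rest = int(rest[:ftype], 2), rest[ftype:]
            else:
                values[field], rest = load(ftype, rest)   # nested TL-B object
        return (name, values), rest
    raise ValueError(f"no constructor of {type_name} matches the tag")


def store(type_name: str, obj) -> str:
    """Serialize (constructor, fields) as the tag followed by each field in order."""
    cname, values = obj
    for name, tag, fields in SCHEME[type_name]:
        if name != cname:
            continue
        out = tag
        for field, ftype in fields:
            if isinstance(ftype, int):
                if not 0 <= values[field] < 2**ftype:
                    raise ValueError(f"{field} does not fit into {ftype} bits")
                out += format(values[field], f"0{ftype}b")
            else:
                out += store(ftype, values[field])
        return out
    raise ValueError(f"{cname} is not a constructor of {type_name}")


item = ("item", {"flag": ("bool_true", {}), "count": 9})
print(store("Item", item))
```

Deserializing the resulting bits with load("Item", ...) returns the same object together with any bits that follow it in the slice.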

BNF grammar

The Backus–Naur form can be found at TlbParser.bnf, thanks to @andreypfau.

TL-B is also supported by the intellij-tos plugin.

Documentation on TL-B can be found in the TVM Whitepaper and, collected in one place in a more concise form, here.

Generator of serializers and deserializers

An example of a generator used by a TOS node can be found in the Tos node sources.

What's next?

If you want to know more about TL-B serialization and see some examples of complex structures parsing, you can continue by reading: