
TL-B Language

advanced level

This information is very low-level and could be hard to understand for newcomers, so feel free to read about it later.

TL-B (Type Language - Binary) serves to describe the type system, constructors and existing functions. For example, we can use TL-B schemes to build binary structures associated with the TOS Blockchain. Special TL-B parsers can read schemes to deserialize binary data into different objects.

Overview

We refer to any set of TL-B constructs as a TL-B document. A TL-B document usually consists of declarations of types (i.e. their constructors) and functional combinators. The declaration of each combinator ends with a semicolon (;).

Constructors

Constructors are used to specify the type of a combinator, including its state at serialization. For example, constructors can be used when you want to specify an op in a query to a smart contract in TOS.

// ....
hm_edge#_ {n:#} {X:Type} {l:#} {m:#} label:(HmLabel ~l n)
{n = (~m) + l} node:(HashmapNode m X) = Hashmap n X;

hmn_leaf#_ {X:Type} value:X = HashmapNode 0 X;
// ....

The left-hand side of each equation describes the way to define, or serialize, a value of the type indicated on the right-hand side. Such a description begins with the name of a constructor, such as hm_edge or hml_long, immediately followed by an optional constructor tag, such as #_ or $10, which describes the bitstring used to encode (serialize) the constructor in question.

Learn by examples!

| constructor | serialization |
| --- | --- |
| some#3f5476ca | 32-bit uint serialized from the hex value |
| some$0101 | the raw bits 0101 |
| some | crc32(equation) \| 0x80000000 |
| some#_ or some$_ | nothing |
| some# or some$ | TL-B parsers will get it wrong |

Tags may be given in either binary (after a dollar sign) or hexadecimal notation (after a hash sign). If a tag is not explicitly provided, the TL-B parser must compute a default 32-bit constructor tag by hashing the text of the “equation” that defines this constructor (normalized in a certain fashion) with the CRC32 algorithm and applying | 0x80000000. Therefore, an empty tag must be given explicitly as #_ or $_.
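To illustrate how explicit tags map to bits, here is a minimal sketch that turns the tag notations from the table above into a string of '0'/'1' characters. The function name tag_bits is hypothetical, and it only handles the plain forms '#<hex>', '$<binary>', '#_' and '$_'; it does not compute default CRC32 tags, since that requires the parser's normalization of the equation text.

```python
def tag_bits(tag: str) -> str:
    """Return the bits encoded by an explicit constructor tag (sketch)."""
    if tag in ("#_", "$_"):
        return ""  # explicit empty tag: nothing is serialized
    if tag.startswith("#") and len(tag) > 1:
        # Each hex digit contributes exactly 4 bits, leading zeros included.
        return "".join(format(int(d, 16), "04b") for d in tag[1:])
    if tag.startswith("$") and len(tag) > 1:
        bits = tag[1:]
        if set(bits) - {"0", "1"}:
            raise ValueError(f"not a binary tag: {tag!r}")
        return bits  # binary tags are copied bit for bit
    # A bare '#' or '$' is malformed, as the last table row warns.
    raise ValueError(f"unsupported or malformed tag: {tag!r}")


print(len(tag_bits("#3f5476ca")))  # 32
print(tag_bits("$0101"))           # 0101
print(repr(tag_bits("#_")))        # ''
```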

All constructor names must be distinct and constructor tags for the same type must constitute a prefix code (otherwise the deserialization would not be unique); i.e. no tag can be a prefix of any other.
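The prefix-code rule is easy to check mechanically. This sketch (the helper name is_prefix_code is illustrative, not part of any TL-B tool) takes the tag bitstrings of one type's constructors and reports whether any tag is a prefix of another. For instance, the HmLabel constructors hml_short$0, hml_long$10 and hml_same$11 pass, while tags 1 and 10 would not.

```python
from itertools import combinations


def is_prefix_code(tags: list[str]) -> bool:
    """Return True if no tag bitstring is a prefix of another one."""
    for a, b in combinations(tags, 2):
        # Equal tags also fail here, since a string is a prefix of itself.
        if a.startswith(b) or b.startswith(a):
            return False
    return True


print(is_prefix_code(["0", "1"]))         # True: nothing$0 / just$1
print(is_prefix_code(["0", "10", "11"]))  # True: hml_short / hml_long / hml_same
print(is_prefix_code(["1", "10"]))        # False: "1" is a prefix of "10"
```

Note that an empty tag is a prefix of every bitstring, so it only works when a type has a single constructor.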

This example from the TosToken repository shows how to implement a TL-B scheme for an internal message:

extra#_ amount:Grams = Extra;

addr_std$10 anycast:(## 1) {anycast = 0}
workchain_id:int8 address:bits256 = MsgAddrSmpl;

transfer#4034a3c0 query_id:uint64
reciever:MsgAddrSmpl amount:Extra body:Any = Request;

In this example, transfer#4034a3c0 will be serialized as a 32-bit unsigned integer from the hex value after the hash sign (#). This meets the standard requirements for an op in the smart contract guidelines.
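As a byte-level illustration, the sketch below packs the first two parts of a Request built with transfer#4034a3c0: the 32-bit tag followed by query_id:uint64, both as big-endian unsigned integers. The function name transfer_prefix is hypothetical, and the remaining fields (reciever, amount, body) are deliberately left out.

```python
import struct

TRANSFER_OP = 0x4034a3c0  # the explicit tag of transfer#4034a3c0


def transfer_prefix(query_id: int) -> bytes:
    """Serialize the constructor tag and query_id:uint64 (sketch only)."""
    # '>' = big-endian, no padding; 'I' = 32-bit unsigned, 'Q' = 64-bit unsigned.
    return struct.pack(">IQ", TRANSFER_OP, query_id)


print(transfer_prefix(1).hex())  # 4034a3c00000000000000001
```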

To meet the standard described in paragraph 5 of the smart contract guidelines, it is not enough to calculate the CRC32 alone: for a request the highest bit is cleared (& 0x7fffffff), and for a response it is set (| 0x80000000). You can use the following example to define an op in requests or responses from smart contracts in a TL-B scheme:

import binascii


def main():
    # Request op: CRC32 of the name with the highest bit cleared.
    # '08x' zero-pads to 8 hex digits, so the tag is always 32 bits long.
    req_text = "some_request"
    req = format(binascii.crc32(bytes(req_text, "utf-8")) & 0x7fffffff, '08x')
    print(f"{req_text}#{req} = Request;")  # some_request#733d0d35 = Request;

    # Response op: CRC32 of the name with the highest bit set.
    rsp_text = "some_response"
    rsp = format(binascii.crc32(bytes(rsp_text, "utf-8")) | 0x80000000, '08x')
    print(f"{rsp_text}#{rsp} = Response;")  # some_response#88b0eb8f = Response;


if __name__ == "__main__":
    main()

Field definitions

The constructor and its optional tag are followed by field definitions. Each field definition is of the form ident:type-expr, where ident is an identifier with the name of the field (replaced by an underscore for anonymous fields), and type-expr is the field’s type. The type provided here is a type expression, which may include simple types or parametrized types with suitable parameters. Variables, i.e. the identifiers of previously defined fields of type # (natural numbers) or Type (type of types), may be used as parameters for the parametrized types. The serialization process recursively serializes each field according to its type, and the serialization of a value ultimately consists of the concatenation of bitstrings representing the constructor (i.e. the constructor tag) and the field values.
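To make the concatenation rule concrete, here is a minimal sketch that serializes the addr_std$10 constructor of MsgAddrSmpl from the TosToken example into a string of '0'/'1' characters: tag bits, then anycast, then workchain_id, then address. Representing bits as a Python string and the helper names uint_bits, int_bits and serialize_addr_std are illustrative assumptions; a real implementation would write into a cell builder.

```python
def uint_bits(value: int, n: int) -> str:
    """Unsigned big-endian n-bit field (n >= 1)."""
    if not 0 <= value < (1 << n):
        raise ValueError(f"{value} does not fit in {n} unsigned bits")
    return format(value, f"0{n}b")


def int_bits(value: int, n: int) -> str:
    """Signed n-bit field in two's complement."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError(f"{value} does not fit in {n} signed bits")
    return uint_bits(value & ((1 << n) - 1), n)


def serialize_addr_std(workchain_id: int, address: bytes) -> str:
    if len(address) != 32:
        raise ValueError("bits256 needs exactly 32 bytes")
    return (
        "10"                                              # constructor tag $10
        + uint_bits(0, 1)                                 # anycast:(## 1), must be 0
        + int_bits(workchain_id, 8)                       # workchain_id:int8
        + uint_bits(int.from_bytes(address, "big"), 256)  # address:bits256
    )


bits = serialize_addr_std(-1, bytes(32))
print(len(bits))   # 267
print(bits[:11])   # 10011111111
```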

Some fields may be implicit. Their definitions are surrounded by curly brackets ({, }), which indicate that the field is not actually present in the serialization, but that its value must be deduced from other data (usually the parameters of the type being serialized). Example:

nothing$0 {X:Type} = Maybe X;
just$1 {X:Type} value:X = Maybe X;
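In the Maybe scheme the implicit {X:Type} contributes no bits; only the tag and, for just$1, the value are written. In the hypothetical sketch below, the type parameter X is modelled by a serializer callback supplied by the caller, and Python's None stands in for nothing (which means the sketch cannot represent a Maybe whose inner value is itself None).

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def serialize_maybe(value: Optional[T], serialize_x: Callable[[T], str]) -> str:
    """Serialize Maybe X as a '0'/'1' string; X is known only via serialize_x."""
    if value is None:
        return "0"                   # nothing$0: just the tag
    return "1" + serialize_x(value)  # just$1: tag followed by value:X


def uint8_bits(v: int) -> str:
    return format(v, "08b")


print(serialize_maybe(None, uint8_bits))  # 0
print(serialize_maybe(5, uint8_bits))     # 100000101
```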

Finally, some equalities/inequalities may be included in curly brackets as well. These are certain “equations” which must be satisfied by the “variables” included in them. If one of the variables is prefixed by a tilde (~), its value will be uniquely determined by the values of all other variables participating in the equation (which must be known at this point) when the definition is processed from the left to the right. For example:

addr_std$10 anycast:(## 1) {anycast = 0}
workchain_id:int8 address:bits256 = MsgAddrSmpl;
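During deserialization such an equation acts as a check on the data. The hypothetical parse_addr_std below reads the MsgAddrSmpl layout from the front of a '0'/'1' string and rejects input that violates {anycast = 0}. It is a sketch under that bit-string assumption, not the parser of any particular TL-B library.

```python
def parse_addr_std(bits: str):
    """Read addr_std$10 from the front of bits; return (workchain_id, address, rest)."""
    if len(bits) < 267:
        raise ValueError("need 2 + 1 + 8 + 256 = 267 bits")
    if bits[:2] != "10":
        raise ValueError("constructor tag $10 expected")
    anycast = int(bits[2], 2)                            # anycast:(## 1)
    if anycast != 0:
        raise ValueError("constraint {anycast = 0} violated")
    workchain_id = int(bits[3:11], 2)                    # workchain_id:int8
    if workchain_id >= 128:
        workchain_id -= 256                              # undo two's complement
    address = int(bits[11:267], 2).to_bytes(32, "big")   # address:bits256
    return workchain_id, address, bits[267:]


print(parse_addr_std("10" + "0" + "11111111" + "0" * 256)[0])  # -1
```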

Some occurrences of “variables” (i.e. already-defined fields) are prefixed by a tilde (~). This indicates that the variable’s occurrence is used in the opposite way to the default behavior: on the left-hand side of the equation, it means that the variable will be deduced (computed) from this occurrence, instead of substituting its previously computed value; on the right-hand side, conversely, it means that the variable will not be deduced from the type being serialized, but will be computed during the deserialization process. In other words, a tilde transforms an “input argument” into an “output argument” or vice versa.

For example, we can use this to write a TL-B scheme for a simple transaction in TOS with a comment (which must be serialized as a sequence of cells):

empty#_ b:bits = Snake ~0;
cons#_ {n:#} b:bits next:^(Snake ~n) = Snake ~(n + 1);

op:#0 comment:Snake = Request;

A caret (^) preceding a type X means that instead of serializing a value of type X as a bitstring inside the current cell, we place this value into a separate cell and add a reference to it into the current cell. Therefore ^X means “the type of references to cells containing values of type X”.
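The Snake scheme combines both ideas: each cons#_ node keeps some bits in the current cell and puts the rest behind a ^ reference, and the ~n output argument is computed while reading. The sketch below models a cell as a hypothetical Cell dataclass holding whole bytes (at most 127, since a cell holds up to 1023 data bits) and at most one reference; first_capacity lets the first node leave room for other fields, e.g. 123 bytes when a 32-bit op shares the root cell. These capacities and names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

CELL_BYTES = 127  # 127 whole bytes = 1016 bits, within the 1023-bit cell limit


@dataclass
class Cell:
    data: bytes
    ref: Optional["Cell"] = None  # a snake node needs at most one reference


def build_snake(payload: bytes, first_capacity: int = CELL_BYTES) -> Cell:
    """Split payload into cons#_ nodes that end with one empty#_ node."""
    head, rest = payload[:first_capacity], payload[first_capacity:]
    if not rest:
        return Cell(head)                    # empty#_ b:bits = Snake ~0
    return Cell(head, build_snake(rest))     # cons#_ ... next:^(Snake ~n)


def read_snake(cell: Cell) -> tuple[bytes, int]:
    """Return (payload, n); n is the output argument deduced via ~n."""
    if cell.ref is None:
        return cell.data, 0                  # empty#_: Snake ~0
    tail, n = read_snake(cell.ref)
    return cell.data + tail, n + 1           # cons#_: Snake ~(n + 1)


print(read_snake(build_snake("Hello, TOS!".encode())))  # (b'Hello, TOS!', 0)
```

The reader never receives n as input: it is computed from the shape of the data, which is exactly what the tilde in Snake ~n expresses.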

Parametrized type #<= p with p : # (this notation means “p of type #”, i.e. a natural number) denotes the subtype of the natural numbers type # consisting of the integers 0, ..., p; it is serialized into $\lceil \log_2(p + 1) \rceil$ bits as an unsigned big-endian integer. Type # by itself is serialized as an unsigned 32-bit integer. Parametrized type ## b with b : #<=31 is equivalent to #<= $2^b - 1$ (i.e. it is an unsigned b-bit integer). For example:

action_send_msg#0ec3c86d mode:(## 8) 
out_msg:^(MessageRelaxed Any) = OutAction;

In this scheme mode:(## 8) will be serialized as an 8-bit unsigned integer.
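The bit widths above can be computed directly: for p >= 0, $\lceil \log_2(p + 1) \rceil$ equals Python's p.bit_length(), and ## b therefore takes exactly b bits. The helper names width_le, width_uint and serialize_uint are illustrative, and bits are again shown as a '0'/'1' string.

```python
def width_le(p: int) -> int:
    """Bits used by #<= p: ceil(log2(p + 1)), i.e. p.bit_length() for p >= 0."""
    if p < 0:
        raise ValueError("p must be a natural number")
    return p.bit_length()


def width_uint(b: int) -> int:
    """## b is #<= 2^b - 1, so it occupies exactly b bits."""
    return width_le((1 << b) - 1)


def serialize_uint(value: int, width: int) -> str:
    """Unsigned big-endian field of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b") if width else ""


print(width_le(7), width_le(8))  # 3 4
print(serialize_uint(3, width_uint(8)))  # 00000011, e.g. mode:(## 8) = 3
```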

Conditional (optional) fields

Whether a conditional field is serialized is determined by other, previously specified fields.

For example (from block.tlb):

block_info#9bc7a987 version:uint32 
not_master:(## 1)
after_merge:(## 1) before_split:(## 1)
after_split:(## 1)
want_split:Bool want_merge:Bool
key_block:Bool vert_seqno_incr:(## 1)
flags:(## 8) { flags <= 1 }
seq_no:# vert_seq_no:# { vert_seq_no >= vert_seqno_incr }
{ prev_seq_no:# } { ~prev_seq_no + 1 = seq_no }
shard:ShardIdent gen_utime:uint32
start_lt:uint64 end_lt:uint64
gen_validator_list_hash_short:uint32
gen_catchain_seqno:uint32
min_ref_mc_seqno:uint32
prev_key_block_seqno:uint32
gen_software:flags . 0?GlobalVersion
master_ref:not_master?^BlkMasterInfo
prev_ref:^(BlkPrevInfo after_merge)
prev_vert_ref:vert_seqno_incr?^(BlkPrevInfo 0)
= BlockInfo;

In this example, the cell reference ^BlkMasterInfo is serialized only if not_master > 0, and GlobalVersion is serialized only if the bit at index 0 of the binary representation of flags is set.
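The presence rules can be written down directly. The hypothetical optional_fields_present below takes the already-read values of not_master, flags and vert_seqno_incr and reports which conditional fields a parser should expect next. It assumes that bit 0 in flags . 0 is the least-significant bit, which makes no difference here because { flags <= 1 } leaves only 0 and 1. prev_ref has no condition, so it is always present and is not listed.

```python
def optional_fields_present(not_master: int, flags: int, vert_seqno_incr: int) -> dict:
    """Report which conditional BlockInfo fields follow (sketch)."""
    if not 0 <= flags <= 1:
        raise ValueError("constraint { flags <= 1 } violated")
    return {
        # flags . 0?GlobalVersion: present only if bit 0 of flags is set
        "gen_software": ((flags >> 0) & 1) == 1,
        # not_master?^BlkMasterInfo: reference present only if not_master > 0
        "master_ref": not_master > 0,
        # vert_seqno_incr?^(BlkPrevInfo 0): present only if vert_seqno_incr > 0
        "prev_vert_ref": vert_seqno_incr > 0,
    }


print(optional_fields_present(not_master=1, flags=0, vert_seqno_incr=0))
# {'gen_software': False, 'master_ref': True, 'prev_vert_ref': False}
```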

Namespaces

Namespaces are available in Telegram's TL, but, as it turned out, they are not used in TL-B.

Comments

Comments use the same syntax as in C++:

/* 
This is
a comment
*/

// This is a one-line comment

Library usage

You can use TL-B libraries to extend your documents and to avoid writing repetitive schemes. We have prepared a set of ready-made libraries that you can use. They are mostly based on block.tlb but we have also added some combinators of our own.

  • tosstdlib.tlb
  • tosextlib.tlb
  • hashmap.tlb

In TL-B libraries there is no concept of cyclic import. Just indicate the dependency on some other document (library) at the top of the document with the keyword dependson. For example:

file mydoc.tlb:

//
// dependson "libraries/tosstdlib.tlb"
//

op:uint32 data:Any = MsgBody;
something$0101 data:(Maybe ^MsgBody) = SomethingImportant;

Dependencies must be specified with the correct relative path. The example above is located in the following tree:

.
├── mydoc.tlb
├── libraries
│ ├── ...
│ └── tosstdlib.tlb
└── ...
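Resolving such paths is simple string work. The hypothetical dependencies helper below scans a document for comment lines of the form // dependson "path" and resolves each path relative to the directory of the document, matching the tree above. It is a sketch, not how any particular TL-B tool implements dependson; for simplicity it scans every line rather than only the top of the file.

```python
import os
import re

# Matches lines such as: // dependson "libraries/tosstdlib.tlb"
DEPENDSON = re.compile(r'^\s*//\s*dependson\s+"([^"]+)"\s*$')


def dependencies(doc_path: str, text: str) -> list[str]:
    """Return dependson paths resolved relative to the document's directory."""
    base = os.path.dirname(doc_path)
    deps = []
    for line in text.splitlines():
        match = DEPENDSON.match(line)
        if match:
            deps.append(os.path.normpath(os.path.join(base, match.group(1))))
    return deps


doc = '//\n// dependson "libraries/tosstdlib.tlb"\n//\n\nop:uint32 data:Any = MsgBody;\n'
print(dependencies("mydoc.tlb", doc))
```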

IDE Support

The intellij-tos plugin supports Fift, FunC and also TL-B.
The TL-B grammar is described in the TlbParser.bnf file.

Useful sources

Thanks to Vudi and cryshado for contributing to the community!