Encoding
Accumulate uses a custom binary encoding loosely based on Protocol Buffers. The fields of structured types (i.e. structs) are numbered, and each encoded field value is prefixed with the field's number. Unlike Protocol Buffers, however, Accumulate does not permit fields to be reordered: fields must appear in order of their field number. For example, fields 2 and 4 of the same value must appear in that order; if they appear in the reverse order, the value is rejected.
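As a rough illustration of the ordering rule, the sketch below checks a list of field numbers in the order they were read from an encoded value. The function name checkFieldOrder is hypothetical and not part of Accumulate. Equal consecutive numbers are accepted on the assumption that a repeated field reuses its own number.

```go
package main

import "fmt"

// checkFieldOrder is a hypothetical helper: it reports an error if the field
// numbers, listed in the order they appear in an encoded value, ever decrease.
// Equal numbers are allowed (assumption: a repeated field reuses its number).
func checkFieldOrder(numbers []uint) error {
	var last uint
	for _, n := range numbers {
		if n < last {
			return fmt.Errorf("field %d appears after field %d", n, last)
		}
		last = n
	}
	return nil
}

func main() {
	fmt.Println(checkFieldOrder([]uint{2, 4})) // <nil>
	fmt.Println(checkFieldOrder([]uint{4, 2})) // field 2 appears after field 4
}
```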
Value formats
Field numbers must be between 1 and 31 (inclusive) and thus they can be (and are) encoded in a single byte. Structured types cannot have more than 31 fields.
Integer values with a max size of 64 bits or smaller are encoded as base 128 varints (signed).
All other variable length values (i.e. excluding varints) are encoded to bytes, then prefixed with their byte length encoded as an unsigned varint.
Arbitrary precision integer values are encoded as bytes (length-prefixed) in big-endian order.
Floating point values are encoded in IEEE 754 double-precision binary floating-point format (float64/binary64).
Boolean values are encoded as 0x00 (false) or 0x01 (true).
Dates and timestamps are converted to a UTC Unix timestamp (seconds) and encoded as a signed varint.
Durations are split into seconds and nanoseconds and encoded as a pair of unsigned varints.
Enumerations are converted to an integer and encoded as an unsigned varint.
Hashes (32-byte values, aka 256-bit integers) are encoded raw, as 32 bytes (with no length prefix).
Byte sequences are encoded raw, with a length prefix.
Strings are encoded as a byte sequence.
URLs and transaction IDs are converted to a string and encoded as a byte sequence.
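A few of the formats listed above can be sketched with Go's standard encoding/binary package. This is a minimal sketch, not Accumulate's implementation: the helper names are hypothetical, and it assumes that signed varints follow Go's convention (zig-zag encoding), which the text above does not state.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarint encodes v as an unsigned base-128 varint.
func uvarint(v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return buf[:binary.PutUvarint(buf, v)]
}

// varint encodes v as a signed varint.
// Assumption: Go's zig-zag convention, as implemented by binary.PutVarint.
func varint(v int64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return buf[:binary.PutVarint(buf, v)]
}

// lengthPrefixed prefixes b with its byte length as an unsigned varint.
func lengthPrefixed(b []byte) []byte {
	return append(uvarint(uint64(len(b))), b...)
}

// encodeString encodes a string as a length-prefixed byte sequence.
func encodeString(s string) []byte {
	return lengthPrefixed([]byte(s))
}

// encodeBool encodes a boolean as 0x00 (false) or 0x01 (true).
func encodeBool(v bool) []byte {
	if v {
		return []byte{0x01}
	}
	return []byte{0x00}
}

func main() {
	fmt.Printf("% x\n", uvarint(300))        // ac 02
	fmt.Printf("% x\n", varint(-1))          // 01 (zig-zag)
	fmt.Printf("% x\n", encodeString("foo")) // 03 66 6f 6f
	fmt.Printf("% x\n", encodeBool(true))    // 01
}
```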
Forwards compatibility and epilogues
Scenario: Two systems, A and B, wish to communicate with each other via binary encoded records. However, system A is using newer type definitions for records, and these new definitions include additional fields that system B is not aware of. If A sends a message that includes a record with one of these additional fields, B will record the additional fields as the record's epilogue.
The epilogue is essentially extra bytes that are appended to the end of the record when converting it to binary. However, the epilogue has a condition: it must be prefixed with a field number higher than that of any of the known fields, such that it appears to be an additional field. For example, if record R defines three fields (numbered 1, 2, and 3), then the epilogue must be prefixed with 04. This allows the binary encoding format to be forwards compatible. If system A thinks record R has four fields, and system B thinks record R has three fields, and system A sends a message including a record of type R with all four fields set, the value of the fourth field will be prefixed with its field number and thus system B will treat it as an epilogue.
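The following sketch shows how system B might separate the known fields of record R from the epilogue. It is only an illustration under stated assumptions: splitEpilogue and maxKnownField are hypothetical names, and all three known fields are assumed to be unsigned varints so the decoder stays short.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// maxKnownField is the highest field number system B knows for the
// hypothetical record R.
const maxKnownField = 3

// splitEpilogue decodes the known fields of R (assumed to be unsigned
// varints) and returns any trailing bytes, starting at the first unknown
// field number, as the epilogue.
func splitEpilogue(data []byte) (map[uint64]uint64, []byte, error) {
	known := map[uint64]uint64{}
	for len(data) > 0 {
		num := uint64(data[0])
		if num == 0 || num > 31 {
			return nil, nil, errors.New("invalid field number")
		}
		if num > maxKnownField {
			// Unknown field: everything from here on, including the
			// field number itself, is kept as the epilogue.
			return known, data, nil
		}
		v, n := binary.Uvarint(data[1:])
		if n <= 0 {
			return nil, nil, errors.New("invalid varint")
		}
		known[num] = v
		data = data[1+n:]
	}
	return known, nil, nil
}

func main() {
	// System A sends four fields; system B only knows the first three.
	msg := []byte{0x01, 0x07, 0x02, 0x08, 0x03, 0x09, 0x04, 0x0a}
	known, epilogue, err := splitEpilogue(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(known)            // map[1:7 2:8 3:9]
	fmt.Printf("% x\n", epilogue) // 04 0a
}
```

Because B keeps the epilogue bytes verbatim, it can append them unchanged when it re-encodes the record.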
This is not a license to append private data to objects. If system A adds a fourth, private field knowing that other systems will interpret that data as an epilogue, and then the other systems add a fourth field of a different type, the other systems will not be able to decode messages from system A due to the conflicting data type of the fourth field.
Repeatable fields
A repeatable field (usually typed as an array/slice/list) is encoded simply by repeating the field. For example, given struct A with a repeatable unsigned integer field X (number 1), A{X: [7, 8, 9]} would be encoded as 01 07 01 08 01 09.
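The example above can be reproduced with a short sketch. The function names encodeA and appendUvarint are hypothetical, and the sketch assumes the unsigned integer elements are written as unsigned varints.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// A has a single repeatable unsigned integer field X, numbered 1.
type A struct {
	X []uint64
}

// appendUvarint appends v to out as an unsigned varint.
func appendUvarint(out []byte, v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return append(out, buf[:binary.PutUvarint(buf, v)]...)
}

// encodeA writes the field number once per element of X.
func encodeA(a A) []byte {
	var out []byte
	for _, x := range a.X {
		out = append(out, 0x01) // field number of X
		out = appendUvarint(out, x)
	}
	return out
}

func main() {
	fmt.Printf("% x\n", encodeA(A{X: []uint64{7, 8, 9}})) // 01 07 01 08 01 09
}
```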
Nested structured values
A field that is itself a structured value is encoded by encoding the structured value, then encoding the resulting bytes as a byte sequence (i.e. with a length prefix). For example, given struct A with a field X of type B, and struct B with an unsigned integer field Y, B{Y: 15} would be encoded as 01 0F, thus A{X: B{Y: 15}} would be encoded as 01 02 01 0F.
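A minimal sketch of the nested example, assuming both X and Y are field number 1 (as the example bytes imply) and that Y is written as an unsigned varint. The helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// B has an unsigned integer field Y, numbered 1.
type B struct {
	Y uint64
}

// A has a field X of type B, numbered 1.
type A struct {
	X B
}

// appendUvarint appends v to out as an unsigned varint.
func appendUvarint(out []byte, v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return append(out, buf[:binary.PutUvarint(buf, v)]...)
}

// encodeB writes field 1 (Y) as a varint.
func encodeB(b B) []byte {
	return appendUvarint([]byte{0x01}, b.Y)
}

// encodeA encodes the nested value first, then writes it as a
// length-prefixed byte sequence under field 1 (X).
func encodeA(a A) []byte {
	inner := encodeB(a.X)
	out := []byte{0x01}
	out = appendUvarint(out, uint64(len(inner)))
	return append(out, inner...)
}

func main() {
	fmt.Printf("% x\n", encodeB(B{Y: 15}))       // 01 0f
	fmt.Printf("% x\n", encodeA(A{X: B{Y: 15}})) // 01 02 01 0f
}
```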
Unions
Accumulate uses tagged unions. Each union type has a corresponding enumeration that enumerates the members of the union. Each union member has an implicit type field, numbered 1, with its value set to the enumeration value that corresponds to the member type. For example, the Account union type corresponds to the AccountType enumeration, and the KeyBook struct type (a member of the Account union) corresponds to AccountTypeKeyBook (a value of the AccountType enumeration).
Union values are encoded in the same way as any other structured type. Since the first field of a member of a union type is always the corresponding enumeration value, union values are decoded by decoding that first field, looking up the corresponding union member, then decoding the remainder of the fields using that type definition.
For example, KeyBook{Url: "foo", PageCount: 1} (with Type: AccountTypeKeyBook implicit) is encoded as 01 0a 02 03 66 6f 6f 05 01. To decode this, first the type field (numbered 1) is decoded; since AccountTypeKeyBook (0x0a) corresponds to the KeyBook struct type, the remaining values are decoded using that type's field definitions.
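The decoding steps can be sketched as follows. This is an illustration, not Accumulate's decoder: decodeAccount and decodeKeyBook are hypothetical names, the field numbers (Url = 2, PageCount = 5) are read off the example bytes, Url is simplified to a plain string, and only the KeyBook member is handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// accountTypeKeyBook is the enumeration value from the example (0x0a).
const accountTypeKeyBook = 0x0a

// KeyBook is a simplified stand-in for the real struct.
type KeyBook struct {
	Url       string
	PageCount uint64
}

// decodeAccount reads the type field (number 1), then dispatches to the
// decoder of the matching union member.
func decodeAccount(data []byte) (interface{}, error) {
	if len(data) == 0 || data[0] != 0x01 {
		return nil, errors.New("missing type field")
	}
	typ, n := binary.Uvarint(data[1:])
	if n <= 0 {
		return nil, errors.New("invalid type value")
	}
	switch typ {
	case accountTypeKeyBook:
		kb, err := decodeKeyBook(data[1+n:])
		if err != nil {
			return nil, err
		}
		return kb, nil
	default:
		return nil, fmt.Errorf("unknown account type %#x", typ)
	}
}

// decodeKeyBook decodes the remaining fields of a KeyBook.
func decodeKeyBook(data []byte) (*KeyBook, error) {
	kb := new(KeyBook)
	for len(data) > 0 {
		num := data[0]
		v, n := binary.Uvarint(data[1:])
		if n <= 0 {
			return nil, errors.New("invalid varint")
		}
		data = data[1+n:]
		switch num {
		case 2: // Url: v is the byte length of the string
			if uint64(len(data)) < v {
				return nil, errors.New("truncated string")
			}
			kb.Url = string(data[:v])
			data = data[v:]
		case 5: // PageCount: v is the value itself
			kb.PageCount = v
		default:
			return nil, fmt.Errorf("unexpected field %d", num)
		}
	}
	return kb, nil
}

func main() {
	msg := []byte{0x01, 0x0a, 0x02, 0x03, 0x66, 0x6f, 0x6f, 0x05, 0x01}
	v, err := decodeAccount(msg)
	fmt.Printf("%+v %v\n", v, err) // &{Url:foo PageCount:1} <nil>
}
```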