Serialization¶
Serialization, as described here, refers to the process of turning a variable of some type in the sending model’s language into bytes and then reversing the process (deserialization) by turning the bytes into a variable of an analogous type in the language of the receiving model.
For most types, yggdrasil uses an enhanced version of the JSON data interchange format to serialize data structures and uses JSON Schema to describe, validate, and normalize them. Using JSON and JSON Schema has the advantage that many existing tools can read, write, and validate JSON-serialized data, which will make it easier to expand support to other languages in the future. For more complex data structures that are not easily or efficiently represented in JSON, yggdrasil supports serialization classes, each associated with a new type that extends the core set of JSON types.
The mapping between datatypes in the different supported languages is shown below.
| Schema | R | C | C++ | Fortran | Julia | Matlab | Python | Notes |
|---|---|---|---|---|---|---|---|---|
| array | | | | | | | | |
| boolean | | | | | | | | |
| bytes | | | | | | | | Precision X is preserved. |
| class | | | | | | | | |
| complex | | | | | | | | Precision X is preserved. |
| float | | | | | | | | Precision X is preserved. |
| function | | | | | | | | |
| instance | | | | | | | | |
| int | | | | | | | | Precision X is preserved. |
| integer | | | | | | | | |
| ndarray | | | | | | | | |
| null | | | | | | | | |
| number | | | | | | | | This covers the JSON default for floating point or integer values. |
| obj | | | | | | | | |
| object | | | | | | | | |
| ply | | | | | | | | |
| scalar | | | | | | | | yggdrasil defines scalars as an umbrella type encompassing int, uint, float, bytes, and unicode. |
| schema | | | | | | | | |
| string | | | | | | | | User can specify an encoding for scalar strings. |
| uint | | | | | | | | Precision X is preserved. |
| unicode | | | | | | | | Precision X is preserved. |
Message Structure¶
Each message consists of two parts, the header and the body, which are joined by a dedicated string, YGG_MSG_HEAD, such that:
message = 'YGG_MSG_HEAD<header>YGG_MSG_HEAD<body>'
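As a minimal sketch of this framing (not yggdrasil's actual implementation), the following joins and splits a message using the YGG_MSG_HEAD delimiter. The function names `join_message` and `split_message` are hypothetical, and the sketch assumes the header is UTF-8 encoded JSON that does not itself contain the delimiter.

```python
import json

YGG_MSG_HEAD = b"YGG_MSG_HEAD"  # delimiter string named in the text


def join_message(header, body):
    """Frame a header dict and body bytes (hypothetical helper)."""
    header_bytes = json.dumps(header).encode("utf-8")
    return YGG_MSG_HEAD + header_bytes + YGG_MSG_HEAD + body


def split_message(message):
    """Recover (header dict, body bytes) from a framed message."""
    if not message.startswith(YGG_MSG_HEAD):
        raise ValueError("message does not start with the header delimiter")
    rest = message[len(YGG_MSG_HEAD):]
    # The first delimiter found after the header text ends the header.
    header_bytes, sep, body = rest.partition(YGG_MSG_HEAD)
    if not sep:
        raise ValueError("closing header delimiter not found")
    return json.loads(header_bytes.decode("utf-8")), body


message = join_message({"type": "string", "size": 7, "id": "abc123"}, b'"hello"')
header, body = split_message(message)
```

Splitting on the first delimiter after the header keeps the body opaque, so the body may contain arbitrary bytes without affecting how the header is located.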
Message Header¶
Message headers are serialized JSON objects with properties that describe the message and can be used to ensure the message is complete on receipt. Although a header can contain any information, at a minimum it must include the following properties:
| Property | Type | Description | Purpose |
|---|---|---|---|
| type | string | Name of body datatype. | Deserialization and message validation. |
| size | int | Size of message body. | Verifying that the received message is complete. |
| id | string | Unique message identifier. | Message tracking. |
The header is also used to pass information about communication between models. This information can include the type of messages that should be expected, metadata about the messages that was not serialized, addresses of temporary communication resources, acknowledgement that a message was received, and notifications about new or broken connections.
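The required properties can be illustrated with a small sketch. The helpers `make_header` and `is_complete` are hypothetical names, and the sketch assumes that `size` counts the bytes of the serialized body and that `id` can be any unique string (a random UUID here); yggdrasil's own conventions may differ.

```python
import uuid

# Minimum header properties and their expected Python types.
REQUIRED_PROPERTIES = {"type": str, "size": int, "id": str}


def make_header(datatype, body, **extra):
    """Build a header for serialized body bytes (hypothetical helper)."""
    header = dict(extra)  # optional, free-form information
    header.update(type=datatype, size=len(body), id=uuid.uuid4().hex)
    return header


def is_complete(header, body):
    """Return True if the received body matches the size in the header."""
    for name, kind in REQUIRED_PROPERTIES.items():
        if not isinstance(header.get(name), kind):
            raise ValueError("header is missing a valid %r property" % name)
    return header["size"] == len(body)


body = b"[1, 2, 3]"
header = make_header("array", body)
```

A receiver that gets only part of the body (for example `body[:4]`) would see `is_complete` return `False` and could wait for the remainder.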
Message Body¶
The message body is also a serialized JSON document, but data of some types undergo an additional pre-encoding step that transforms them into a form that can be JSON encoded more efficiently. This is handled in an extended version of the rapidjson package with a corresponding Python wrapper. For example, encoding one million 64-bit floats as plain JSON would 1) produce a very large JSON array and 2) require a large number of digits per value to preserve precision to the level of round-off error, as would be expected when passing floats between programs written in the same language. For such datatypes, it is more efficient to pre-encode the data into a compact form before JSON encoding.
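The size argument can be checked with the Python standard library alone. This sketch does not use yggdrasil or rapidjson; it only compares a plain JSON array of floats with a base64 string of the same values packed as raw 64-bit floats (little-endian byte order is an assumption made here).

```python
import base64
import json
import struct

values = [i / 7 for i in range(1000)]

# Plain JSON: every float is written out with enough digits to round-trip.
json_text = json.dumps(values)

# Pre-encoded: pack the raw 8-byte floats, then base64 them into one JSON string.
raw = struct.pack("<%dd" % len(values), *values)
packed_text = json.dumps(base64.b64encode(raw).decode("ascii"))

# Both forms restore the values exactly; the packed form is shorter.
restored = list(struct.unpack("<%dd" % len(values),
                              base64.b64decode(json.loads(packed_text))))
```

Base64 costs a fixed 4 characters per 3 bytes, so the packed form grows by roughly 10.7 characters per 64-bit float regardless of how many digits the value needs in decimal.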
Scalars¶
As described above, precision can be lost when using JSON encoding for floating point numbers. In addition, there are some number types that JSON encoding doesn’t support. More specifically, JSON encoding does not support complex numbers, unsigned integers, or explicit precision numbers (e.g. float32_t vs. float64_t in C). To overcome this limitation, yggdrasil passes scalar numbers by pre-encoding their raw bytes into an ASCII string representation using the standard base64 encoding. To allow decoding and validation on receipt of encoded scalars, yggdrasil includes the subtype (float, int, uint, complex), precision (in bits), and units in the message header.
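A minimal sketch of this idea, using only the standard library, is shown here. The helpers `encode_scalar` and `decode_scalar`, the metadata key names, and the little-endian byte layout are assumptions for illustration; they are not taken from yggdrasil's source.

```python
import base64
import struct

# struct formats for a few (subtype, precision-in-bits) pairs; "<" = little-endian.
_FORMATS = {
    ("float", 32): "<f", ("float", 64): "<d",
    ("int", 32): "<i", ("int", 64): "<q",
    ("uint", 32): "<I", ("uint", 64): "<Q",
    ("complex", 128): "<dd",  # real and imaginary parts as two 64-bit floats
}


def encode_scalar(value, subtype, precision, units=""):
    """Return (metadata, base64 ASCII string) for a scalar number."""
    fmt = _FORMATS[(subtype, precision)]
    parts = (value.real, value.imag) if subtype == "complex" else (value,)
    raw = struct.pack(fmt, *parts)
    meta = {"subtype": subtype, "precision": precision, "units": units}
    return meta, base64.b64encode(raw).decode("ascii")


def decode_scalar(meta, text):
    """Rebuild the scalar from its metadata and base64 string."""
    fmt = _FORMATS[(meta["subtype"], meta["precision"])]
    parts = struct.unpack(fmt, base64.b64decode(text))
    return complex(*parts) if meta["subtype"] == "complex" else parts[0]


meta, text = encode_scalar(2**64 - 1, "uint", 64)
value = decode_scalar(meta, text)
```

Because the raw bytes travel unchanged, a 64-bit unsigned integer or a complex value survives the round trip even though neither has a native JSON representation.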
Strings¶
Because there are a large number of string encodings (e.g. ASCII, UTF-8) with varying degrees of support in different languages, yggdrasil supports three different string-related types. The core JSON string type is encoded and mapped to a datatype in the programming language of the receiving model according to the JSON implementation that is used (the JSON spec indicates UTF-8 should be used by default). yggdrasil also supports scalar strings that have explicit encodings. Supported string scalar encodings currently include “ASCII”, “UTF8”, “UTF16”, “UTF32”, and “UCS4”. If no encoding is specified for a string scalar, “ASCII” is assumed.
In addition to string scalars, two additional string types are supported for backwards compatibility: unicode and bytes. If no explicit encoding is provided with these types, bytes is treated as implying “ASCII” encoding and unicode is treated as implying “UCS4” encoding.
Like other scalars, string scalars are encoded as ASCII via base64 on serialization and decoded when deserialized. If the receiving language doesn’t have a built-in unicode type (e.g. C), the message will be preserved in the encoded UTF-32 bytes format.
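The handling of string scalars can be sketched as follows. The mapping from the encoding names above to Python codec names, the helper names, and the `encoding` metadata key are assumptions for illustration; in particular, Python's `utf-16` and `utf-32` codecs prepend a byte order mark, which may not match yggdrasil's byte layout.

```python
import base64

# Assumed mapping from the encoding names in the text to Python codecs.
_CODECS = {"ASCII": "ascii", "UTF8": "utf-8", "UTF16": "utf-16",
           "UTF32": "utf-32", "UCS4": "utf-32"}


def encode_string_scalar(text, encoding="ASCII"):
    """Encode text with an explicit encoding, then base64 it to ASCII."""
    raw = text.encode(_CODECS[encoding])
    return {"encoding": encoding}, base64.b64encode(raw).decode("ascii")


def decode_string_scalar(meta, payload):
    """Reverse encode_string_scalar using the encoding in the metadata."""
    return base64.b64decode(payload).decode(_CODECS[meta["encoding"]])


meta, payload = encode_string_scalar("abc")  # "ASCII" is the default
```

A receiver without a built-in unicode type could stop after the base64 step and keep the decoded bytes as they are, rather than calling `.decode(...)`.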
Homogeneous Arrays¶
As mentioned above, large arrays can produce large JSON documents that can dramatically increase the size of messages and thereby the time required to send/receive data. However, as most languages provide built-in support for arrays that are contiguous in memory, it is much more efficient to pass the arrays in a contiguous format. Array elements are encoded in the same way as scalars, using base64, in row-major order. On receipt, languages that use column-major order must re-order the data. As is done for scalars, yggdrasil passes the data subtype, precision, and units for arrays and also sends the array size (for one-dimensional arrays) or array shape (for multi-dimensional arrays). Arrays of strings are also serialized in this way, but strings are padded so that every string has the same width (i.e. elements are fixed width).
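The row-major layout can be sketched for a two-dimensional array of 64-bit floats using nested lists and the standard library. The helper names, metadata keys, and little-endian byte order are assumptions for illustration, not yggdrasil's wire format.

```python
import base64
import struct


def encode_array(rows):
    """Encode a 2-D list of floats in row-major order with its shape."""
    shape = [len(rows), len(rows[0])]
    flat = [x for row in rows for x in row]  # row-major: rows one after another
    raw = struct.pack("<%dd" % len(flat), *flat)
    meta = {"subtype": "float", "precision": 64, "shape": shape}
    return meta, base64.b64encode(raw).decode("ascii")


def decode_array(meta, text):
    """Rebuild the 2-D list from the metadata and base64 string."""
    nrow, ncol = meta["shape"]
    flat = struct.unpack("<%dd" % (nrow * ncol), base64.b64decode(text))
    return [list(flat[r * ncol:(r + 1) * ncol]) for r in range(nrow)]


def column_major_order(rows):
    """Flat element order a column-major language would store in memory."""
    return [rows[r][c] for c in range(len(rows[0])) for r in range(len(rows))]


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
meta, text = encode_array(rows)
```

Here `column_major_order(rows)` gives `[1.0, 4.0, 2.0, 5.0, 3.0, 6.0]`, which is the re-ordering a column-major receiver has to perform after decoding.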
For languages that allow for mixed-type arrays, yggdrasil sends these data as JSON arrays of the differently typed elements (or JSON objects if field names are provided).
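For illustration, a mixed-type record maps onto plain JSON in either of the two forms described above. The field names below are hypothetical.

```python
import json

values = [1, "a", 2.5]
field_names = ["count", "label", "weight"]  # hypothetical field names

# Without field names: a JSON array of differently typed elements.
as_array = json.dumps(values)

# With field names: a JSON object keyed by field name.
as_object = json.dumps(dict(zip(field_names, values)))
```

The first form serializes to `[1, "a", 2.5]` and the second to `{"count": 1, "label": "a", "weight": 2.5}`.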
Ply/Obj¶
Ply and Obj file formats are designed to efficiently represent 3D structures containing large numbers of elements that would require large JSON structures containing nested objects and arrays. As such, yggdrasil defines Ply and Obj objects as their own data types that are serialized using the Ply and Obj file standards during pre-encoding.
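To show why these formats are compact, the sketch below writes a single triangle as ASCII Ply text. The helper `to_ply` is hypothetical and covers only vertices and faces; whether yggdrasil uses the ASCII or a binary variant of the format during pre-encoding is not stated here, so ASCII is an assumption.

```python
def to_ply(vertices, faces):
    """Write vertices and faces as ASCII Ply text (hypothetical helper)."""
    lines = [
        "ply",
        "format ascii 1.0",
        "element vertex %d" % len(vertices),
        "property float x",
        "property float y",
        "property float z",
        "element face %d" % len(faces),
        "property list uchar int vertex_index",
        "end_header",
    ]
    # One line per vertex: its x, y, z coordinates.
    lines += ["%g %g %g" % tuple(v) for v in vertices]
    # One line per face: the vertex count followed by the vertex indices.
    lines += [" ".join(str(n) for n in [len(f)] + list(f)) for f in faces]
    return "\n".join(lines) + "\n"


triangle = to_ply([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

Each element takes one short line, whereas an equivalent JSON structure would repeat property names and nesting for every vertex and face.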