Adding Support for a New Language¶
The yggdrasil package has been redesigned to make adding support for a new language as easy as possible, but developers will need some Python programming knowledge and a descent familiarity with the language being added.
Write the Language Driver¶
The first step in adding language support is to write a driver for the language.
Model drivers take care of things like writting any necessary wrappers, compiling
the code (if necessary), and running the code. Generally, new languages will fall
into one of two categories, interpreted or compiled. Based on the category that
the language falls under, developers should use the associated base class
(yggdrasil.drivers.InterpretedModelDriver.InterpretedModelDriver
or
yggdrasil.drivers.CompiledModelDriver.CompiledModelDriver
) as a parent class.
These base classes parameterize the required model driver operations so that
developers should not have to write a large amount of code.
In addition to the type specific steps below, developers can control the behavior of their class by defining the following class attributes:
and the class method is_library_installed
, which is used to determine if
dependencies are installed.
Model drivers should go in the yggdrasil/drivers
directory and tests should
go in the yggdrasil/drivers/tests
directory.
Interpreted Languages¶
Additional class attributes specific to interpreted model drivers include:
Compiled Languages¶
Additional class attributes specific to compiled model drivers include:
Compilation Tools¶
For compiled languages, yggdrasil allows multiple compilation tools to be defined for the same language, particularly when different tools are required on different operating systems. In these cases, developers should create classes for the compilation tools (i.e. compilers, linkers, archivers) associated with the language. yggdrasil defines several base classes for this purpose which should be used as parent classes for any new tools.
For compilers, the class is yggdrasil.drivers.CompiledModelDriver.CompilerBase
.
The behavior of the compiler is defined by these class attributes:
Most compilers, also serve as linkers so it is unlikely that developers will
need to define new linkers (outside of the linker related compiler class attributes
above), but there is also a linker base class if developers need finer tuned access
to the class’s behavior. For linkers, the class is
yggdrasil.drivers.CompiledModelDriver.LinkerBase
adn the behavior of the
linker is defined by these class attributes:
Many archivers can be used for multiple languages so check the other languages
before adding a new one. If the target archiver already exists for other languages,
developers should add the new language to the accepted list of languages on the
class associated with the archiver. For archivers, the base class is
yggdrasil.drivers.CompiledModelDriver.ArchiverBase
.
The behavior of the archiver is defined by these class attributes:
Write the Language Communication Interface¶
The second phase of adding support for a new language is to write the language interface. This step is more involved than writing the model driver for the language, but the majority of the required development will be in the language being added. Tools required for language support that are not meant to be accessed via the yggdrasil Python package (e.g. the language interface or conversion functions) should go in specific language directory under yggdrasil/languages with a name identifying the languagye (e.g. yggdrasil/languages/MATLAB for the MATLAB interface contains conversion functions and the interface classes/functions written in MATLAB).
For new languages, developers should first do a review to identify existing tools for calling code in one of the languages that yggdrasil already supports (e.g. the R interface uses the reticulate package to call the Python interface from R). If such a tool exists, then the developers task is must easier.
From an Interface in a Supported Language (Recommnded)¶
If there is an existing tool for accessing code written in one of the supporting languages, the developer will use that tool to wrap the interface from the already supported language. Examples of this can be found in the Matlab and R interface which both wrap the Python interface. The wrapper interface must have, at minimum:
Functions/Classes for creating communicator objects. The created functions/classes should take as input a channel name (and optional format string for creating communicator objects for output), calls the wrapped interface, and returns the class/object representing the communicator in a form that can be used in the language being added. For object oriented languages, it may be easiest to create a new class that wraps access to the object returned by the wrapped interface. There must be a way to distinguish from input and output communicators either by exposing separate functions/classes or via an explicit argument.
Functions/Methods for calling the wrapped send/recv functions/methods. The created functions/classes must be able to access the wrapped communicator class or data object and call the appropriate send/recv function or method, converting the inputs and outputs of these functions into forms that make sense for the language being added (See next point).
Conversion functions/methods. While tools for calling external programming languages often handle most of the type conversion necessary for the two languages to interact, these conversions are often incomplete or insufficient for the purposes of yggdrasil (e.g. R does not have built-in support for variable precision integers and float). In such cases, the developer adding the language may need to write a conversion function that handles these inconsistencies.
From Scratch¶
Create “Comm” Class/Object¶
For a language to added, there must be an interface to at least one of the supported communication mechanisms. Because it is widely supported in different programming languages, we recommend adding a ZeroMQ communication interface as a starting point. The new interface will need to defined a class or data object that wraps access to the underlying communication mechanism (e.g. ZeroMQ).
This includes creating
the communication connection based on a channel name. yggdrasil will store
information about the communication associated with the connection in an
environment variable with the name '<channel_name>_IN'
if the channel
is providing input to the model and '<channel_name>_OUT'
if the channel is
handling output from the model. The exact content of the environment variable
depends on the communication mechanisms as shown in the table below.
Required Methods/Functions¶
The interface must also provide methods/functions for accesing the underlying communication class’s methods for sending and receiving messages. This is usually straight forward (assuming serialization is implemented), but there are some communication mechanisms with require some additional features (e.g. ZeroMQ). The table below includes some notes on implementing each of the supported communication mechanisms.
Comm |
Address |
Developer Notes |
---|---|---|
buffer |
||
ipc |
An IPC message queue key. |
The default size limit for IPC message queues is 2048 bytes on Mac operating systems so it is important that implementation of this communication mechanism properly split and send messages larger than this limit as more than one message. |
mpi |
The partner communicator ID(s). |
|
rest |
||
rmq |
AMPQ queue address of the form |
It is not advised that new language implement a RabbitMQ communication interface. Rather RMQ communication is included explicitly for connections between models that are not co-located on the same machine and are used by the yggdrasil framework connections on the Python side. |
value |
[] |
|
zmq |
A ZeroMQ endpoint of the form <transport>://<address>, where the format of address depends on the transport. Additional information can be found here. |
yggdrasil uses the tcp transport by default with a PAIR socket type. For every connection, yggdrasil establishes a second request/reply connection that is used to confirm messages passed between the primary PAIR of sockets. On the first send, the model should create a REP socket on an open tcp address and send that address in the header of the first message under the key ‘zmq_reply’. Receiving models should check message headers for this key and, on receipt, establish the partner REQ socket with the specified address (receiving comms can receive from more than one source so they can have more than one request addresses at at time for this purpose). Following every message, the sending model should wait for a message on the reply socket and, on receipt, return the message. Following every message, the receiving model should send the message ‘YGG_REPLY’ on the request socket and wait for a reply. When creating worker comms for sending large messages, the sending model should create the reply comm for the worker in advanced and send it in the header with the worker address under the key ‘zmq_reply_worker’. |
Implement Serialization¶
The most involved part of writing the interface will be writing the serialization and deserialization routines. The specific serialization protocol used by yggdrasil is described in detail here here. The developer adding support for the new language will need to add support for serialization of all of the datatypes listed in this table.
Sending Large Messages¶
Most communication mechanisms have limits on the sizes of messages that can be sent as single messages (e.g. 2048 for IPC on MacOS). To overcome this yggdrasil splits up serialized messages that are larger than the limit (including the serialized header) into smaller messages and sending them through a new connection created explicitly for carrying the large message.
When a model is sending to output, it should check the size of each outgoing message (including the header). If a message exceeds the limit, it should
Create a new work comm and add the work comm’s address to the message header under the ‘address’ key.
Send the revised header with as much of the message as will fit within the limit.
Send the remains of the message as chunks with sizes set by the limit through the new work comm.
When a model is receiving, it should check that the message is not smaller than the size indicated by the header. If the message is smaller, it should
Create a new work comm using the address in the message header.
Continue receiving messages through the work comm until the message is complete (or the connection is closed).
Combine the received message chunks to form the complete messages and deserialize them.
When creating an interface in a new language, the developer must replicate this behavior.
Installation Script [OPTIONAL]¶
If there are additional steps that should be taken during the installation of
yggdrasil to allow a language to be supported (e.g. installing dependencies
that are not covered by a Python package manager), developers can add these
to a script called install.py
in the directory they create for their language
under yggdrasil/languages
. This file should, at minimum, include a function
called install
that dosn’t require any input and returns a boolean indicating
the success or failuer of the additional installation steps. This function can
also be used to check for the existance of dependencies so that a warning is
printed during install to advise the user. In addition to the install
function
developer can also set a name_in_pragmas
variable. This should be a string
that is used to set coverage pragmas that will be ignored during coverage if
a language will not be set. (e.g. lines marked by # pragma: matlab
are
not covered if MATLAB is not supported while lines marked with # pragma: no matlab
are not covered if MATLAB is installed). If not set, the lower case version of
the language directory name is assumed for the pragmas. This does not change the
behavior of the code, only how the coverage report is generated.