Adding Support for a New Language
#################################

The |yggdrasil| package has been redesigned to make adding support for a new language as easy as possible, but developers will need some Python programming knowledge and a decent familiarity with the language being added.

Write the Language Driver
=========================

The first step in adding language support is to write a driver for the language. Model drivers take care of things like writing any necessary wrappers, compiling the code (if necessary), and running the code. Generally, new languages will fall into one of two categories: interpreted or compiled. Based on the category that the language falls under, developers should use the associated base class (:class:`yggdrasil.drivers.InterpretedModelDriver.InterpretedModelDriver` or :class:`yggdrasil.drivers.CompiledModelDriver.CompiledModelDriver`) as a parent class. These base classes parameterize the required model driver operations so that developers should not have to write a large amount of code.

In addition to the type-specific steps below, developers can control the behavior of their class by defining the following class attributes:

.. include:: ../class_tables/class_table_ModelDriver_classattr.rst

and the class method ``is_library_installed``, which is used to determine if dependencies are installed.

Model drivers should go in the ``yggdrasil/drivers`` directory and tests should go in the ``yggdrasil/drivers/tests`` directory.

Interpreted Languages
---------------------

Additional class attributes specific to interpreted model drivers include:

.. include:: ../class_tables/class_table_InterpretedModelDriver_classattr.rst

Compiled Languages
------------------

Additional class attributes specific to compiled model drivers include:

.. include:: ../class_tables/class_table_CompiledModelDriver_classattr.rst

Compilation Tools
.................

For compiled languages, |yggdrasil| allows multiple compilation tools to be defined for the same language, which is particularly useful when different tools are required on different operating systems. In these cases, developers should create classes for the compilation tools (i.e. compilers, linkers, archivers) associated with the language. |yggdrasil| defines several base classes for this purpose, which should be used as parent classes for any new tools.

For compilers, the base class is :class:`yggdrasil.drivers.CompiledModelDriver.CompilerBase`. The behavior of the compiler is defined by these class attributes:

.. include:: ../class_tables/class_table_CompilerBase_classattr.rst

Most compilers also serve as linkers, so it is unlikely that developers will need to define new linkers (beyond setting the linker-related compiler class attributes above), but there is also a linker base class if developers need finer control over the class's behavior. For linkers, the base class is :class:`yggdrasil.drivers.CompiledModelDriver.LinkerBase` and the behavior of the linker is defined by these class attributes:

.. include:: ../class_tables/class_table_LinkerBase_classattr.rst

Many archivers can be used for multiple languages, so check the archivers defined for other languages before adding a new one. If the target archiver already exists for other languages, developers should add the new language to the accepted list of languages on the class associated with the archiver. For archivers, the base class is :class:`yggdrasil.drivers.CompiledModelDriver.ArchiverBase`. The behavior of the archiver is defined by these class attributes:

.. include:: ../class_tables/class_table_ArchiverBase_classattr.rst

Write the Language Communication Interface
==========================================

The second phase of adding support for a new language is to write the language interface. This step is more involved than writing the model driver for the language, but the majority of the required development will be in the language being added.
Tools required for language support that are not meant to be accessed via the |yggdrasil| Python package (e.g. the language interface or conversion functions) should go in a language-specific directory under ``yggdrasil/languages`` with a name identifying the language (e.g. ``yggdrasil/languages/MATLAB`` contains the conversion functions and the interface classes/functions written in MATLAB).

For new languages, developers should first do a review to identify existing tools for calling code in one of the languages that |yggdrasil| already supports (e.g. the R interface uses the `reticulate `_ package to call the Python interface from R). If such a tool exists, then the developer's task is much easier.

From an Interface in a Supported Language (Recommended)
-------------------------------------------------------

If there is an existing tool for accessing code written in one of the supported languages, the developer will use that tool to wrap the interface from the already supported language. Examples of this can be found in the MATLAB and R interfaces, which both wrap the Python interface. The wrapper interface must have, at minimum:

#. *Functions/Classes for creating communicator objects.* The created functions/classes should take as input a channel name (and an optional format string when creating communicator objects for output), call the wrapped interface, and return the class/object representing the communicator in a form that can be used in the language being added. For object-oriented languages, it may be easiest to create a new class that wraps access to the object returned by the wrapped interface. There must be a way to distinguish between input and output communicators, either by exposing separate functions/classes or via an explicit argument.
#. *Functions/Methods for calling the wrapped send/recv functions/methods.* The created functions/methods must be able to access the wrapped communicator class or data object and call the appropriate send/recv function or method, converting the inputs and outputs of these functions into forms that make sense for the language being added (see the next point).
#. *Conversion functions/methods.* While tools for calling external programming languages often handle most of the type conversion necessary for the two languages to interact, these conversions are often incomplete or insufficient for the purposes of |yggdrasil| (e.g. R does not have built-in support for variable precision integers and floats). In such cases, the developer adding the language may need to write a conversion function that handles these inconsistencies.

From Scratch
------------

Create "Comm" Class/Object
..........................

For a language to be added, there must be an interface to at least one of the supported communication mechanisms. Because it is widely supported in different programming languages, we recommend adding a ZeroMQ communication interface as a starting point.

The new interface will need to define a class or data object that wraps access to the underlying communication mechanism (e.g. ZeroMQ). This includes creating the communication connection based on a channel name. |yggdrasil| will store information about the communication associated with the connection in an environment variable with the name ``'_IN'`` if the channel is providing input to the model and ``'_OUT'`` if the channel is handling output from the model. The exact content of the environment variable depends on the communication mechanism, as shown in the table below.

Required Methods/Functions
..........................

The interface must also provide methods/functions for accessing the underlying communication class's methods for sending and receiving messages.
This is usually straightforward (assuming serialization is implemented), but there are some communication mechanisms which require some additional features (e.g. ZeroMQ). The table below includes some notes on implementing each of the supported communication mechanisms.

.. include:: ../tables/comm_devnotes_table.rst

Implement Serialization
.......................

The most involved part of writing the interface will be writing the serialization and deserialization routines. The specific serialization protocol used by |yggdrasil| is described in detail :ref:`here `. The developer adding support for the new language will need to add support for serialization of all of the datatypes listed in :ref:`this table `.

..

Message Headers
...............

|yggdrasil| uses message headers to send information about the data contained in the messages, as well as information about the communication pattern. New language interfaces should define and send the information listed in :ref:`this table `, as well as any metadata required to deserialize the message.

Sending Large Messages
......................

Most communication mechanisms have limits on the sizes of messages that can be sent as single messages (e.g. 2048 bytes for IPC on macOS). To overcome this, |yggdrasil| splits serialized messages that are larger than the limit (including the serialized header) into smaller messages and sends them through a new connection created explicitly for carrying the large message.

When a model is sending to output, it should check the size of each outgoing message (including the header). If a message exceeds the limit, it should

#. Create a new work comm and add the work comm's address to the message header under the 'address' key.
#. Send the revised header with as much of the message as will fit within the limit.
#. Send the remainder of the message through the new work comm as chunks with sizes set by the limit.

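
The sending steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the |yggdrasil| implementation: the JSON header encoding, the ``'size'`` header key, the ``LIMIT`` value, and the names ``encode_header``, ``plan_send``, and ``new_work_address`` are all hypothetical stand-ins, and the function only plans what would be sent rather than opening real connections.

.. code-block:: python

    import json
    import uuid

    LIMIT = 2048  # assumed per-message size limit in bytes


    def encode_header(header):
        # Stand-in header encoding (JSON plus a newline separator).
        # This is NOT the real yggdrasil wire format.
        return json.dumps(header, sort_keys=True).encode("utf-8") + b"\n"


    def plan_send(body, limit=LIMIT, new_work_address=lambda: uuid.uuid4().hex):
        """Return (first_message, work_address, chunks) for a serialized body.

        work_address is None and chunks is empty when the whole message
        (header included) fits within the limit.
        """
        header = {"size": len(body)}
        first = encode_header(header) + body
        if len(first) <= limit:
            return first, None, []
        # Step 1: create a work comm (represented here only by its address)
        # and record the address in the header under the 'address' key.
        address = new_work_address()
        header["address"] = address
        head = encode_header(header)
        if len(head) >= limit:
            raise ValueError("header alone does not fit within the limit")
        # Step 2: the first message carries the revised header plus as much
        # of the body as fits within the limit.
        room = limit - len(head)
        first = head + body[:room]
        # Step 3: the remainder is sent through the work comm in chunks
        # no larger than the limit.
        rest = body[room:]
        chunks = [rest[i:i + limit] for i in range(0, len(rest), limit)]
        return first, address, chunks

Because the header grows once the address is added, the sketch re-encodes the header before computing how much of the body fits in the first message. Concatenating the body fragment from the first message with the chunks, in order, recovers the original serialized body.
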
When a model is receiving, it should check that the message is not smaller than the size indicated by the header. If the message is smaller, it should

#. Create a new work comm using the address in the message header.
#. Continue receiving messages through the work comm until the message is complete (or the connection is closed).
#. Combine the received message chunks to form the complete message and deserialize it.

When creating an interface in a new language, the developer must replicate this behavior.

Installation Script [OPTIONAL]
==============================

If there are additional steps that should be taken during the installation of |yggdrasil| to allow a language to be supported (e.g. installing dependencies that are not covered by a Python package manager), developers can add these to a script called ``install.py`` in the directory they create for their language under ``yggdrasil/languages``. This file should, at minimum, include a function called ``install`` that doesn't require any input and returns a boolean indicating the success or failure of the additional installation steps. This function can also be used to check for the existence of dependencies so that a warning is printed during install to advise the user.

In addition to the ``install`` function, developers can also set a ``name_in_pragmas`` variable. This should be a string used to build the coverage pragmas that exclude lines from coverage depending on whether the language is available (e.g. lines marked with ``# pragma: matlab`` are not covered if MATLAB is not installed, while lines marked with ``# pragma: no matlab`` are not covered if MATLAB is installed). If not set, the lower case version of the language directory name is assumed for the pragmas. This does not change the behavior of the code, only how the coverage report is generated.

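
A minimal ``install.py`` along these lines might look like the sketch below. Only the ``install`` function (no arguments, boolean return) and the ``name_in_pragmas`` variable come from the description above; the language name ``mylang``, the ``REQUIRED_EXECUTABLE`` constant, and the choice to check for an executable on the ``PATH`` are hypothetical and stand in for whatever steps a real language needs.

.. code-block:: python

    import shutil

    # Hypothetical language name; it would be used for coverage pragmas
    # such as "# pragma: mylang" and "# pragma: no mylang".
    name_in_pragmas = "mylang"

    # Hypothetical executable that the language support depends on.
    REQUIRED_EXECUTABLE = "mylang"


    def install():
        """Run the extra installation steps for the language.

        Takes no input and returns True on success, False otherwise.
        Here the only "step" is checking that a dependency exists.
        """
        if shutil.which(REQUIRED_EXECUTABLE) is None:
            # Warn the user instead of raising so the main install can continue.
            print("WARNING: '%s' was not found on the PATH; support for this "
                  "language will be disabled." % REQUIRED_EXECUTABLE)
            return False
        return True

Returning ``False`` rather than raising an exception keeps the check advisory, which matches the use of ``install`` for printing a warning about missing dependencies.
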