Getting started

The yggdrasil framework runs user-defined models and orchestrates asynchronous communication between them using drivers that coordinate the different components via threads. Model drivers run the models as separate processes and monitor them to redirect output to stdout and to determine whether the model is still running, needs to be terminated, or has encountered an error. Input/output drivers connect communication channels (comms) between models and/or files. On the model side, interface API functions/classes are provided in different programming languages to allow models to access these comms.
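To make the model-driver role described above more concrete, the following is a minimal conceptual sketch using only the Python standard library. It is not yggdrasil's implementation; the function name run_and_monitor is hypothetical. It starts a program as a separate process, uses a thread to forward its output to stdout, and reports the exit code so the caller can tell whether the program finished cleanly.

```python
import subprocess
import sys
import threading


def run_and_monitor(cmd, name):
    """Run cmd as a separate process and forward its output line by line.

    Returns (return_code, captured_lines). Hypothetical helper that only
    imitates the role the text gives to a model driver.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    captured = []

    def forward():
        # Redirect each line of model output to stdout, tagged with the name
        for line in proc.stdout:
            captured.append(line.rstrip("\n"))
            print("[%s] %s" % (name, captured[-1]))

    watcher = threading.Thread(target=forward)
    watcher.start()
    return_code = proc.wait()  # block until the model process exits
    watcher.join()             # make sure all remaining output was forwarded
    proc.stdout.close()
    return return_code, captured


code, lines = run_and_monitor(
    [sys.executable, "-c", "print('Hello from Python')"], "python_model")
print("model exited with code %d" % code)
```

A non-zero return code is how this sketch would detect that the model encountered an error.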

Running a model

Models are run by creating a YAML file that specifies the location of the model code and the type of model. Consider the following model which just prints a single line of output to stdout:

Model Code:

print('Hello from Python')

(Example in other languages)

The YAML file to run this model would then be:

Model YAML:

models:
  - name: python_model
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson1.py

(Example in other languages)

The first line signals that there is a model, the second line is the name that should be associated with the model for logging, the third line tells the framework which language the model is written in (and therefore which driver should be used to execute the model), and the fourth line is the path to the model source code that should be run. There are specialized drivers for simple source written in Python, Matlab, C, and C++, but any executable can be run as a model by using language: executable and passing the path to the executable as the args parameter. Additional information on the format yggdrasil YAML files should take can be found in the YAML Files section.

This model can then be run using the yggdrasil framework by calling the command-line entry point yggrun followed by the path to the YAML file:

$ yggrun model.yml
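As noted above, any executable can be run as a model with language: executable. The sketch below renders the text of a one-model YAML file for both the Python example and a precompiled program. The helper render_model_yaml, the model name compiled_model, and the path ./bin/my_model are hypothetical placeholders used only for illustration.

```python
def render_model_yaml(name, language, args):
    """Return the text of a one-model YAML file (hypothetical helper)."""
    lines = [
        "models:",
        "  - name: %s" % name,
        "    language: %s" % language,
        "    args: %s" % args,
    ]
    return "\n".join(lines) + "\n"


# The Python model from the example above
print(render_model_yaml("python_model", "python", "./src/gs_lesson1.py"))

# A precompiled program; the name and path are placeholders
print(render_model_yaml("compiled_model", "executable", "./bin/my_model"))
```

Writing either string to a file and passing that file to yggrun would follow the same pattern as the model.yml example.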

Running multiple models

Multiple models can be run by either passing multiple YAML files to yggrun:

$ yggrun model1.yml model2.yml

or by including multiple models in a single YAML file:

Model YAML:

models:
  - name: python_model1
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson2.py

  - name: python_model2
    language: python
    args: ./src/gs_lesson2.py

(Example in other languages)

Running remote models

Models stored in remote Git repositories can be run by prepending ‘git:’ to the location of the YAML file:

$ yggrun git:http://github.com/foo/bar/yam/remote_model.yml

yggrun will clone the repo (foo/bar in this example) and then process remote_model.yml as normal. The host site need not be specified if it is github.com:

$ yggrun git:foo/bar/yam/remote_model.yml

will behave identically to the first example. Remote and local models can be mixed on the command line:

$ yggrun model1.yml git:foo/bar/yam/remote_model.yml model2.yml
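The argument forms shown in this section can be summarized with a small parsing sketch. This is an illustration of the conventions described above, not the code yggrun uses; the function parse_yaml_arg is hypothetical, and it assumes that a host is only present when a URL scheme such as http:// is given and that the repository is always the first two path components.

```python
def parse_yaml_arg(arg, default_host="github.com"):
    """Split a yggrun-style argument into its parts (illustrative only).

    Returns a dict with 'remote' (bool) and 'path'. Remote arguments also
    get 'host' and 'repo'; 'path' is then the YAML location in the repo.
    """
    if not arg.startswith("git:"):
        return {"remote": False, "path": arg}
    spec = arg[len("git:"):]
    if "://" in spec:
        # Drop the URL scheme, then peel off the host name
        spec = spec.split("://", 1)[1]
        host, spec = spec.split("/", 1)
    else:
        # Assumption from the text: the host defaults to github.com
        host = default_host
    owner, repo, path = spec.split("/", 2)
    return {"remote": True, "host": host,
            "repo": "%s/%s" % (owner, repo), "path": path}


for arg in ["model1.yml", "git:foo/bar/yam/remote_model.yml", "model2.yml"]:
    print(parse_yaml_arg(arg))
```

Under these assumptions the long and short forms of the remote argument resolve to the same host, repository, and path.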

Model file input/output

Models can get input from or send output to files via input and output channels. To do so, yggdrasil provides several functions for interfacing with these channels. In the example below, the model receives input from a channel named input and sends output to a channel named output.

Model Code:

# Import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('input')
out_channel = YggOutput('output')

# Loop until there is no longer input or the queues are closed
while True:
    
    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("No more input.")
        break

    # Print received message
    print(msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        print("Error sending output.")
        break

(Example in other languages)
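The receive/send loop above relies on one convention: recv returns a (flag, msg) pair and send returns a flag, with False signalling the end of input or an error. The sketch below exercises the same loop without the framework by using in-memory stand-ins. FakeInput, FakeOutput, and relay are hypothetical names and are not part of the yggdrasil interface.

```python
class FakeInput:
    """Hypothetical stand-in for an input channel backed by a list."""

    def __init__(self, messages):
        self._messages = list(messages)

    def recv(self):
        # Mirror the (flag, msg) convention: flag is False once input ends
        if not self._messages:
            return False, None
        return True, self._messages.pop(0)


class FakeOutput:
    """Hypothetical stand-in for an output channel that stores messages."""

    def __init__(self):
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)
        return True


def relay(in_channel, out_channel):
    """Forward every message from in_channel to out_channel; return the count."""
    count = 0
    while True:
        flag, msg = in_channel.recv()
        if not flag:
            break  # no more input
        if not out_channel.send(msg):
            raise RuntimeError("Error sending output.")
        count += 1
    return count


out = FakeOutput()
n = relay(FakeInput([b"line 1\n", b"line 2\n"]), out)
print(n, out.sent)
```

Keeping the loop in a function that takes the channels as arguments makes it easy to test the model logic separately from the framework.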

Note

Real model YAMLs should use more descriptive names for the input and output channels to make it easier for collaborators to determine the information being passed through the channel.

In the YAML used to run this model, those channels are declared in the model definition and then linked to files by entries in the connections section of the YAML.

Model YAML:

models:
  - name: python_model
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson3.py
    inputs:
      - input
    outputs:
      - output

connections:
  - input_file: ./Input/input.txt
    output: input
  - input: output
    output: ./output.txt

(Example in other languages)

The input_file and output_file connection fields can be either the path to the file (absolute or relative to the directory containing the YAML file) or a mapping with fields describing the file. In particular, the filetype keyword specifies the format of the file being read/written. Supported values include:

Filetype	Description
ascii	This file is read/written as encoded text one line at a time.
bam	bam sequence I/O
bcf	bcf sequence I/O
binary	[DEFAULT] The entire file is read/written all at once as bytes.
bmp	bmp image I/O
cabo	The file is a CABO parameter file.
cram	cram sequence I/O
eps	eps image I/O
excel	The file is read/written as Excel
fasta	fasta sequence I/O
fastq	fastq sequence I/O
gif	gif image I/O
jpeg	jpeg image I/O
json	The file contains a JSON serialized object.
map	The file contains a key/value mapping with one key/value pair per line and separated by some delimiter.
mat	The file is a Matlab .mat file containing one or more serialized Matlab variables.
netcdf	The file is read/written as netCDF.
obj	The file is in the Obj data format for 3D structures.
pandas	The file is a Pandas frame output as a table.
pickle	The file contains one or more pickled Python objects.
ply	The file is in the Ply data format for 3D structures.
png	png image I/O
sam	sam sequence I/O
table	The file is an ASCII table that will be read/written one row at a time. If `as_array` is `True`, the table will be read/written all at once.
tiff	tiff image I/O
vcf	vcf sequence I/O
yaml	The file contains a YAML serialized object.

The above example shows the basic case of receiving raw messages from a channel. There are also interface functions that process these raw messages to extract variables, along with fields on the model inputs and outputs that specify how that processing should be done. For examples of how to use formatted messages with the above file types and input/output options, see Formatted I/O.
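The practical difference between the binary and ascii entries in the filetype table is the granularity of the messages a model receives. The sketch below imitates those two behaviors with plain Python file I/O so the difference can be seen directly. It does not call yggdrasil, and read_messages is a hypothetical helper.

```python
import os
import tempfile


def read_messages(path, filetype="binary"):
    """Return the list of messages a reader would produce for a file.

    Only the 'binary' and 'ascii' behaviors described in the table are
    imitated here.
    """
    if filetype == "binary":
        with open(path, "rb") as fd:
            return [fd.read()]      # the entire file as a single message
    if filetype == "ascii":
        with open(path, "rb") as fd:
            return fd.readlines()   # one message per line
    raise ValueError("unsupported filetype: %r" % filetype)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")
    with open(path, "wb") as fd:
        fd.write(b"first line\nsecond line\n")
    print(len(read_messages(path, "binary")))  # 1 message
    print(len(read_messages(path, "ascii")))   # 2 messages
```

With the default binary type, the receive loop shown earlier would run once for this file; with ascii it would run once per line.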

Model-to-model communication (with connections)

Models can also communicate with each other in the same fashion. In the example below, model A receives input from a channel named ‘inputA’ and sends output to a channel named ‘outputA’, while model B receives input from a channel named ‘inputB’ and sends output to a channel named ‘outputB’.

Model Code:

# Model A (./src/gs_lesson4_modelA.py): import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('inputA')
out_channel = YggOutput('outputA')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("Model A: No more input.")
        break

    # Print received message
    print('Model A: %s' % msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        raise RuntimeError("Model A: Error sending output.")

# Model B (./src/gs_lesson4_modelB.py): import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('inputB')
out_channel = YggOutput('outputB')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("Model B: No more input.")
        break

    # Print received message
    print('Model B: %s' % msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        raise RuntimeError("Model B: Error sending output.")

(Example in other languages)

In the connections section of the YAML, ‘inputA’ is connected to a local file, ‘outputA’ is connected to ‘inputB’, and ‘outputB’ is connected to a local file.

Model YAML:

models:
  - name: python_modelA
    language: python
    args: ./src/gs_lesson4_modelA.py
    inputs: inputA
    outputs: outputA

  - name: python_modelB
    language: python
    args: ./src/gs_lesson4_modelB.py
    inputs: inputB
    outputs: outputB

connections:
  - input: outputA  # Connection between model A output & model B input
    output: inputB
  - input: ./Input/input.txt  # Connection between file and model A input
    output: inputA
  - input: outputB  # Connection between model B output and file
    output: ./output.txt

(Example in other languages)
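One way to read the connections section above is to ask, for each entry, whether each end is a model channel or a file. The sketch below mirrors the YAML as Python dictionaries and labels every connection accordingly. It is an illustration only: classify_connections and as_list are hypothetical helpers, and treating any endpoint that is not a declared channel as a file path is a simplifying assumption.

```python
def as_list(value):
    """Accept either a single channel name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def classify_connections(models, connections):
    """Label each connection by what sits at its two ends.

    Simplifying assumption: an endpoint that is not a declared model
    channel is treated as a file path.
    """
    model_inputs, model_outputs = set(), set()
    for model in models:
        model_inputs.update(as_list(model.get("inputs", [])))
        model_outputs.update(as_list(model.get("outputs", [])))

    labels = []
    for conn in connections:
        from_model = conn["input"] in model_outputs
        to_model = conn["output"] in model_inputs
        if from_model and to_model:
            labels.append("model-to-model")
        elif to_model:
            labels.append("file-to-model")
        elif from_model:
            labels.append("model-to-file")
        else:
            raise ValueError("connection touches no model channel: %r" % conn)
    return labels


models = [
    {"name": "python_modelA", "inputs": "inputA", "outputs": "outputA"},
    {"name": "python_modelB", "inputs": "inputB", "outputs": "outputB"},
]
connections = [
    {"input": "outputA", "output": "inputB"},
    {"input": "./Input/input.txt", "output": "inputA"},
    {"input": "outputB", "output": "./output.txt"},
]
labels = classify_connections(models, connections)
print(labels)
```

A check like this can catch a misspelled channel name before the models are run, since such a connection would not match any declared channel.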

Model-to-model communication (with drivers)

For backwards compatibility, connections can also be specified in terms of the underlying drivers without an explicit connections section. The exact same models from the previous example can be connected using the following YAML.

Model YAML:

models:
  - name: python_modelA
    language: python
    args: ./src/gs_lesson4b_modelA.py

    inputs:
      - name: inputA
        driver: FileInputDriver
        args: ./Input/input.txt

    outputs:
      - name: outputA
        driver: OutputDriver  # Output to another channel
        args: A_to_B  # Connection to inputB

  - name: python_modelB
    language: python
    args: ./src/gs_lesson4b_modelB.py

    inputs:
      - name: inputB
        driver: InputDriver  # Input from another channel
        args: A_to_B  # Connection to outputA

    outputs:
      - name: outputB
        driver: FileOutputDriver
        args: ./output.txt

(Example in other languages)

In this schema, model input and output entries must have the following fields:

Field	Description
name	The name of the channel that will be used by the model.
driver	The name of the driver that should be used to process input/output.
args	A string matching the args field of an opposing `input` / `output` field in another model or the path to a file that should be read/written.

A list of possible Input/Output drivers can be found here.
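The relationship between the driver-style schema and the connections section can be sketched by pairing entries whose args strings match, as the field table describes. The code below mirrors the driver-style YAML above as Python dictionaries and derives the equivalent connection entries. It is a hedged illustration, not how yggdrasil performs the translation; drivers_to_connections is a hypothetical helper, and it only knows the four driver names used in this example.

```python
def drivers_to_connections(models):
    """Derive connection entries from driver-style inputs/outputs."""
    connections = []
    senders = {}  # args string -> name of the output channel that uses it
    for model in models:
        for out in model.get("outputs", []):
            if out["driver"] == "FileOutputDriver":
                # args is the path of the file to write
                connections.append({"input": out["name"], "output": out["args"]})
            else:
                senders[out["args"]] = out["name"]
    for model in models:
        for inp in model.get("inputs", []):
            if inp["driver"] == "FileInputDriver":
                # args is the path of the file to read
                connections.append({"input": inp["args"], "output": inp["name"]})
            else:
                # Pair with the output whose args string matches
                connections.append({"input": senders[inp["args"]],
                                    "output": inp["name"]})
    return connections


models = [
    {"name": "python_modelA",
     "inputs": [{"name": "inputA", "driver": "FileInputDriver",
                 "args": "./Input/input.txt"}],
     "outputs": [{"name": "outputA", "driver": "OutputDriver",
                  "args": "A_to_B"}]},
    {"name": "python_modelB",
     "inputs": [{"name": "inputB", "driver": "InputDriver",
                 "args": "A_to_B"}],
     "outputs": [{"name": "outputB", "driver": "FileOutputDriver",
                  "args": "./output.txt"}]},
]
for conn in drivers_to_connections(models):
    print(conn)
```

The three entries produced correspond to the three connections listed in the previous section, which is why the two YAML styles describe the same integration.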
