Getting started

The yggdrasil framework runs user-defined models and orchestrates asynchronous communication between them using drivers that coordinate the different components via threads. Model drivers run the models as separate processes and monitor them to redirect output to stdout and to determine whether the model is still running, needs to be terminated, or has encountered an error. Input/output drivers connect communication channels (comms) between models and/or files. On the model side, interface API functions/classes are provided in different programming languages to allow models to access these comms.
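To make the model-driver idea concrete, here is a minimal sketch in plain Python of a thread that launches a model as a separate process, forwards its output to stdout, and records whether it exited cleanly. It uses only the standard library. The class name ModelMonitor is hypothetical, and this is an illustration of the pattern rather than how yggdrasil implements its drivers.

```python
import subprocess
import sys
import threading


class ModelMonitor(threading.Thread):
    """Hypothetical stand-in for a model driver (not yggdrasil's own class)."""

    def __init__(self, model_name, command):
        super().__init__()
        self.model_name = model_name
        self.command = command
        self.returncode = None
        self.lines = []

    def run(self):
        # Start the model as a separate process and capture its output.
        with subprocess.Popen(
            self.command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        ) as proc:
            # Forward each line to stdout, tagged with the model name.
            for line in proc.stdout:
                self.lines.append(line.rstrip("\n"))
                print("[%s] %s" % (self.model_name, line), end="")
            # A non-zero return code signals that the model hit an error.
            self.returncode = proc.wait()


monitor = ModelMonitor(
    "python_model", [sys.executable, "-c", "print('Hello from Python')"]
)
monitor.start()
monitor.join()
print("exit code:", monitor.returncode)
```

Running the monitor in its own thread leaves the main thread free to start other models or shut everything down, which is the role the text above assigns to the drivers.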

Running a model

Models are run by creating a YAML file that specifies the location of the model code and the type of model. Consider the following model, which prints a single line of output to stdout:

Model Code:

print('Hello from Python')

(Example in other languages)

The YAML file to run this model would then be:

Model YAML:

models:
  - name: python_model
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson1.py

(Example in other languages)

The first line signals that there is a model, the second line is the name that should be associated with the model for logging, the third line tells the framework which language the model is written in (and therefore which driver should be used to execute the model), and the fourth line is the path to the model source code that should be run. There are specialized drivers for simple source written in Python, Matlab, C, and C++, but any executable can be run as a model by using language: executable and passing the path to the executable in the args parameter. Additional information on the format of yggdrasil YAML files can be found in the YAML Files section.
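The way the language field selects how a model is launched can be sketched as a small dispatch function. This is only an illustration under stated assumptions: build_command is a hypothetical helper, the model entry is written as a Python dict rather than loaded from YAML, and "default Python" is assumed to mean the interpreter running the sketch.

```python
import sys


def build_command(model):
    """Return the argv list for a model entry (hypothetical helper)."""
    language = model["language"]
    args = model["args"]
    if language == "python":
        # Assumption: run the script with the current Python interpreter.
        return [sys.executable, args]
    if language == "executable":
        # Any executable can be run directly from its path.
        return [args]
    raise ValueError("no sketch for language %r" % language)


model = {
    "name": "python_model",
    "language": "python",
    "args": "./src/gs_lesson1.py",
}
print(build_command(model))
```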

This model can then be run using the yggdrasil framework by calling the command-line entry point yggrun followed by the path to the YAML file:

$ yggrun model.yml

Running multiple models

Multiple models can be run by either passing multiple YAML files to yggrun:

$ yggrun model1.yml model2.yml

or including multiple models in a single YAML file.

Model YAML:

models:
  - name: python_model1
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson2.py

  - name: python_model2
    language: python
    args: ./src/gs_lesson2.py

(Example in other languages)

Running remote models

Models stored in remote Git repositories can be run by prepending ‘git:’ to the location of the YAML file:

$ yggrun git:http://github.com/foo/bar/yam/remote_model.yml

yggrun will clone the repo (foo/bar in this example) and then process remote_model.yml as normal. The host site need not be specified if it is github.com:

$ yggrun git:foo/bar/yam/remote_model.yml

will behave identically to the first example. Remote and local models can be mixed on the command line:

$ yggrun model1.yml git:foo/bar/yam/remote_model.yml model2.yml
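The rules above for remote YAML arguments can be sketched as a small parser. The function parse_yaml_spec is hypothetical and not part of yggdrasil. It assumes that an explicit host is always given with a URL scheme and that the repository is the first two path components (foo/bar in the examples).

```python
from urllib.parse import urlsplit


def parse_yaml_spec(spec, default_host="github.com"):
    """Split a yggrun argument into (host, repo, path); hypothetical helper.

    Local files return (None, None, spec).
    """
    if not spec.startswith("git:"):
        return None, None, spec
    remainder = spec[len("git:"):]
    if "://" in remainder:
        # Explicit host, e.g. git:http://github.com/foo/bar/...
        parts = urlsplit(remainder)
        host, full_path = parts.netloc, parts.path.lstrip("/")
    else:
        # No host given: assume the default site.
        host, full_path = default_host, remainder
    owner, repo, path = full_path.split("/", 2)
    return host, "%s/%s" % (owner, repo), path


for arg in [
    "model1.yml",
    "git:http://github.com/foo/bar/yam/remote_model.yml",
    "git:foo/bar/yam/remote_model.yml",
]:
    print(arg, "->", parse_yaml_spec(arg))
```

Both remote forms resolve to the same host, repository, and file path, which matches the statement that they behave identically.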

Model file input/output

Models can get input from or send output to files via input and output channels. To do so, yggdrasil provides several functions for interfacing with these channels. In the example below, the model receives input from a channel named input and sends output to a channel named output.

Model Code:

# Import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('input')
out_channel = YggOutput('output')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("No more input.")
        break

    # Print received message
    print(msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        print("Error sending output.")
        break

(Example in other languages)
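To experiment with the receive/send loop without the framework installed, the two channels can be replaced by simple stand-ins. The classes FakeInput and FakeOutput below are hypothetical and only mimic the (flag, msg) return convention used in the model code. Unlike a real channel, this recv does not wait for new messages, and the byte-string messages are an assumption.

```python
import queue


class FakeInput:
    """Hypothetical stand-in for an input channel, backed by a queue."""

    def __init__(self, messages):
        self._queue = queue.Queue()
        for msg in messages:
            self._queue.put(msg)

    def recv(self):
        # Mimic the (flag, msg) pair: flag is False once nothing is left.
        try:
            return True, self._queue.get_nowait()
        except queue.Empty:
            return False, None


class FakeOutput:
    """Hypothetical stand-in for an output channel that records messages."""

    def __init__(self):
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)
        return True


in_channel = FakeInput([b"first line\n", b"second line\n"])
out_channel = FakeOutput()

# Same loop structure as the model code above.
while True:
    flag, msg = in_channel.recv()
    if not flag:
        print("No more input.")
        break
    print(msg)
    flag = out_channel.send(msg)
    if not flag:
        print("Error sending output.")
        break
```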

Note

Real model YAMLs should use more descriptive names for the input and output channels to make it easier for collaborators to determine the information being passed through the channel.

In the YAML used to run this model, those channels are declared in the model definition and then linked to files by entries in the connections section of the YAML.

Model YAML:

models:
  - name: python_model
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson3.py
    inputs:
      - input
    outputs:
      - output

connections:
  - input_file: ./Input/input.txt
    output: input
  - input: output
    output: ./output.txt

(Example in other languages)

The input_file and output_file connection fields can either be the path to the file (absolute or relative to the directory containing the YAML file) or a mapping with fields describing the file. In particular, the filetype keyword specifies the format of the file being read/written. Supported values include:

========  ============================================================
Filetype  Description
========  ============================================================
ascii     This file is read/written as encoded text one line at a time.
bam       bam sequence I/O
bcf       bcf sequence I/O
binary    [DEFAULT] The entire file is read/written all at once as bytes.
bmp       bmp image I/O
cabo      The file is a CABO parameter file.
cram      cram sequence I/O
eps       eps image I/O
excel     The file is read/written as Excel.
fasta     fasta sequence I/O
fastq     fastq sequence I/O
gif       gif image I/O
jpeg      jpeg image I/O
json      The file contains a JSON serialized object.
map       The file contains a key/value mapping with one key/value pair per line and separated by some delimiter.
mat       The file is a Matlab .mat file containing one or more serialized Matlab variables.
netcdf    The file is read/written as netCDF.
obj       The file is in the Obj data format for 3D structures.
pandas    The file is a Pandas frame output as a table.
pickle    The file contains one or more pickled Python objects.
ply       The file is in the Ply data format for 3D structures.
png       png image I/O
sam       sam sequence I/O
table     The file is an ASCII table that will be read/written one row at a time. If as_array is True, the table will be read/written all at once.
tiff      tiff image I/O
vcf       vcf sequence I/O
yaml      The file contains a YAML serialized object.
========  ============================================================
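The difference between the binary and ascii filetypes described above (whole file at once versus one line at a time) can be illustrated with plain file reads. This is a sketch of the behavior as described, not yggdrasil's reader code. The helper names are hypothetical, and returning each message as bytes is an assumption.

```python
import os
import tempfile


def read_binary(path):
    """Return the whole file as a single bytes message."""
    with open(path, "rb") as fd:
        return [fd.read()]


def read_ascii(path):
    """Return one message per line of the file."""
    with open(path, "rb") as fd:
        return list(fd)


# Write a small two-line file to compare the two modes.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "input.txt")
    with open(path, "wb") as fd:
        fd.write(b"line one\nline two\n")
    binary_messages = read_binary(path)
    ascii_messages = read_ascii(path)

print(len(binary_messages), "binary message(s)")
print(len(ascii_messages), "ascii message(s)")
```

With the default binary filetype, a model loop like the one shown earlier would receive the file in a single message. With ascii, it would receive one message per line.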

The above example shows the basic case of receiving raw messages from a channel. There are also interface functions that process these raw messages to extract variables, and fields on the model inputs and outputs that specify how that processing should be done. For examples of how to use formatted messages with the above file types and input/output options, see Formatted I/O.

Model-to-model communication (with connections)

Models can also communicate with each other in the same fashion. In the example below, model A receives input from a channel named ‘inputA’ and sends output to a channel named ‘outputA’, while model B receives input from a channel named ‘inputB’ and sends output to a channel named ‘outputB’.

Model A Code:

# Import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('inputA')
out_channel = YggOutput('outputA')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("Model A: No more input.")
        break

    # Print received message
    print('Model A: %s' % msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        raise RuntimeError("Model A: Error sending output.")

Model B Code:

# Import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('inputB')
out_channel = YggOutput('outputB')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("Model B: No more input.")
        break

    # Print received message
    print('Model B: %s' % msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        raise RuntimeError("Model B: Error sending output.")

(Example in other languages)

In the connections section of the YAML, ‘inputA’ is connected to a local file, ‘outputA’ is connected to ‘inputB’, and ‘outputB’ is connected to a local file.

Model YAML:

models:
  - name: python_modelA
    language: python
    args: ./src/gs_lesson4_modelA.py
    inputs: inputA
    outputs: outputA

  - name: python_modelB
    language: python
    args: ./src/gs_lesson4_modelB.py
    inputs: inputB
    outputs: outputB

connections:
  - input: outputA  # Connection between model A output & model B input
    output: inputB
  - input: ./Input/input.txt  # Connection between file and model A input
    output: inputA
  - input: outputB  # Connection between model B output and file
    output: ./output.txt

(Example in other languages)
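The data flow set up by these connections (file to model A, model A to model B, model B to file) can be imitated with threads and queues. This is only a sketch of the asynchronous hand-off: real models run as separate processes, the CLOSED sentinel is a hypothetical way of marking a closed channel, and the main thread plays the role of the input and output files.

```python
import queue
import threading

CLOSED = object()  # hypothetical sentinel marking a closed channel


def model(name, inbox, outbox):
    """Forward every message from inbox to outbox until inbox closes."""
    while True:
        msg = inbox.get()
        if msg is CLOSED:
            print("Model %s: No more input." % name)
            outbox.put(CLOSED)
            break
        print("Model %s: %s" % (name, msg))
        outbox.put(msg)


# One queue per connection: file -> inputA, outputA -> inputB, outputB -> file
file_to_a = queue.Queue()
a_to_b = queue.Queue()
b_to_file = queue.Queue()

threads = [
    threading.Thread(target=model, args=("A", file_to_a, a_to_b)),
    threading.Thread(target=model, args=("B", a_to_b, b_to_file)),
]
for thread in threads:
    thread.start()

# Play the role of the input file.
for line in ["first", "second"]:
    file_to_a.put(line)
file_to_a.put(CLOSED)

for thread in threads:
    thread.join()

# Play the role of the output file.
results = []
while True:
    msg = b_to_file.get()
    if msg is CLOSED:
        break
    results.append(msg)
print(results)
```

The order in which the two models print may vary between runs because they execute concurrently, but each queue preserves message order, so the collected results arrive in the order they were sent.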

Model-to-model communication (with drivers)

For backwards compatibility, connections can also be specified in terms of the underlying drivers without an explicit connections section. The exact same models from the previous example can be connected using the following YAML.

Model YAML:

models:
  - name: python_modelA
    language: python
    args: ./src/gs_lesson4b_modelA.py

    inputs:
      - name: inputA
        driver: FileInputDriver
        args: ./Input/input.txt

    outputs:
      - name: outputA
        driver: OutputDriver  # Output to another channel
        args: A_to_B  # Connection to inputB

  - name: python_modelB
    language: python
    args: ./src/gs_lesson4b_modelB.py

    inputs:
      - name: inputB
        driver: InputDriver  # Input from another channel
        args: A_to_B  # Connection to outputA

    outputs:
      - name: outputB
        driver: FileOutputDriver
        args: ./output.txt

(Example in other languages)

In this schema, model input and output entries must have the following fields:

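======  ============================================================
Field   Description
======  ============================================================
name    The name of the channel that will be used by the model.
driver  The name of the driver that should be used to process input/output.
args    A string matching the args field of an opposing input / output field in another model or the path to a file that should be read/written.
======  ============================================================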
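The matching rule for args can be sketched as follows: an output and an input are linked when their args strings are equal. The function match_channels is a hypothetical helper, and the models are written as Python dicts mirroring the YAML above rather than being loaded by yggdrasil.

```python
models = [
    {
        "name": "python_modelA",
        "inputs": [{"name": "inputA", "driver": "FileInputDriver",
                    "args": "./Input/input.txt"}],
        "outputs": [{"name": "outputA", "driver": "OutputDriver",
                     "args": "A_to_B"}],
    },
    {
        "name": "python_modelB",
        "inputs": [{"name": "inputB", "driver": "InputDriver",
                    "args": "A_to_B"}],
        "outputs": [{"name": "outputB", "driver": "FileOutputDriver",
                     "args": "./output.txt"}],
    },
]


def match_channels(models):
    """Pair each output with inputs sharing its args (hypothetical helper)."""
    inputs_by_args = {}
    for model in models:
        for entry in model.get("inputs", []):
            inputs_by_args.setdefault(entry["args"], []).append(entry["name"])
    pairs = []
    for model in models:
        for entry in model.get("outputs", []):
            for target in inputs_by_args.get(entry["args"], []):
                pairs.append((entry["name"], target))
    return pairs


print(match_channels(models))
```

Only outputA and inputB share an args value (A_to_B), so they form the single model-to-model link. The remaining entries carry file paths and are handled by the file drivers.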

A list of possible Input/Output drivers can be found here.
