Getting started
The yggdrasil framework runs user-defined models and orchestrates asynchronous communication between them using drivers that coordinate the different components via threads. Model drivers run the models as separate processes and monitor them to redirect output to stdout and determine whether a model is still running, needs to be terminated, or has encountered an error. Input/output drivers connect communication channels (comms) between models and/or files. On the model side, interface API functions/classes are provided in different programming languages to allow models to access these comms.
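The idea of a driver thread that runs a model as a separate process, relays its output, and notices errors can be illustrated with the standard library alone. The following is a minimal sketch under stated assumptions, not yggdrasil's implementation: the names run_model and run_all, and the two inline models, are hypothetical.

```python
import subprocess
import sys
import threading


def run_model(name, argv, log):
    """Run one model as a child process and relay its output to `log`."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Relay the model's output line by line while it is running.
    for line in proc.stdout:
        log.append(f"{name}: {line.rstrip()}")
    proc.stdout.close()
    returncode = proc.wait()
    # A non-zero exit code is treated as the model reporting an error.
    if returncode != 0:
        log.append(f"{name}: exited with error code {returncode}")
    return returncode


def run_all(models):
    """Start one monitoring thread per model and wait for all of them."""
    log = []
    threads = [
        threading.Thread(target=run_model, args=(name, argv, log))
        for name, argv in models.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


# Two hypothetical models: one prints a line, one exits with an error.
models = {
    "model_ok": [sys.executable, "-c", "print('Hello from Python')"],
    "model_bad": [sys.executable, "-c", "import sys; sys.exit(3)"],
}
log = run_all(models)
print("\n".join(sorted(log)))
```

Each model gets its own thread, so a slow or failing model does not block the monitoring of the others.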
Running a model
Models are run by creating a YAML file that specifies the location of the model code and the type of model. Consider the following model, which prints a single line of output to stdout:
Model Code:
print('Hello from Python')
The YAML file to run this model would then be:
Model YAML:
models:
  - name: python_model
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson1.py
The first line signals that there is a model, the second line is the name that
should be associated with the model for logging, the third line tells the
framework which language the model is written in (and therefore which driver
should be used to execute the model), and the fourth line is
the path to the model source code that should be run. There are specialized
drivers for simple source written in Python, Matlab, C, and C++, but any
executable can be run as a model using language: executable and passing the
path to the executable to the args parameter. Additional information on
the format yggdrasil YAML files should take can be found in the
YAML Files section.
This model can then be run using the yggdrasil framework by calling the command-line entry point yggrun followed by the path to the YAML file:
$ yggrun model.yml
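When a run needs to be launched from a script rather than a shell, the same command can be issued through Python's subprocess module. This is only a sketch: the helper names yggrun_command and run_models are hypothetical, and the only thing taken from the text above is that yggrun accepts the path to a YAML file.

```python
import shutil
import subprocess


def yggrun_command(*yaml_paths):
    """Build the argument list for the yggrun entry point."""
    if not yaml_paths:
        raise ValueError("at least one YAML path is required")
    return ["yggrun", *yaml_paths]


def run_models(*yaml_paths):
    """Run yggrun if it is on PATH; return its exit code, or None if missing."""
    command = yggrun_command(*yaml_paths)
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command).returncode


print(yggrun_command("model.yml"))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.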
Running multiple models
Multiple models can be run by either passing multiple YAML files to yggrun:
$ yggrun model1.yml model2.yml
or including multiple models in a single YAML file.
Model YAML:
models:
  - name: python_model1
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson2.py

  - name: python_model2
    language: python
    args: ./src/gs_lesson2.py
Running remote models
Models stored in remote Git repositories can be run by prepending ‘git:’ to the location of the YAML file:
$ yggrun git:http://github.com/foo/bar/yam/remote_model.yml
yggrun will clone the repo (foo/bar in this example) and then process remote_model.yml as normal. The host site need not be specified if it is github.com:
$ yggrun git:foo/bar/yam/remote_model.yml
will behave identically to the first example. Remote and local models can be mixed on the command line:
$ yggrun model1.yml git:foo/bar/yam/remote_model.yml model2.yml
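To make the two equivalent forms concrete, the sketch below splits a ‘git:’ argument into a repository URL and the path of the YAML file inside that repository. It is an illustration only, not yggdrasil's parser: the function name split_git_spec is hypothetical, the repository is assumed to be the first two path components (as in the foo/bar examples), and the https scheme used when no host is given is an assumption.

```python
from urllib.parse import urlparse


def split_git_spec(spec, default_host="github.com"):
    """Split a 'git:' argument into (clone URL, YAML path inside the repo)."""
    if not spec.startswith("git:"):
        raise ValueError(f"not a git spec: {spec!r}")
    target = spec[len("git:"):]
    if "://" in target:
        # Full URL form: take scheme and host from the URL itself.
        parsed = urlparse(target)
        scheme, host, path = parsed.scheme, parsed.netloc, parsed.path
    else:
        # Short form: the host defaults to github.com (scheme is assumed).
        scheme, host, path = "https", default_host, target
    # Assumption: the first two components name the repository (owner/name).
    owner, repo, *rest = path.strip("/").split("/")
    if not rest:
        raise ValueError(f"no YAML path inside the repository: {spec!r}")
    return f"{scheme}://{host}/{owner}/{repo}", "/".join(rest)


print(split_git_spec("git:foo/bar/yam/remote_model.yml"))
```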
Model file input/output
Models can get input from or send output to files via input and output channels.
To do so, yggdrasil provides several functions for interfacing with
these channels. In the example below, the model receives input from a channel
named input and sends output to a channel named output.
Model Code:
# Import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('input')
out_channel = YggOutput('output')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("No more input.")
        break

    # Print received message
    print(msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        print("Error sending output.")
        break
Note
Real model YAMLs should use more descriptive names for the input and output channels to make it easier for collaborators to determine the information being passed through each channel.
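The receive/send loop in the model code can be exercised without the framework by substituting stand-in channels that follow the same flag convention. The classes FakeInput and FakeOutput and the function relay below are hypothetical test doubles, not part of yggdrasil, and the value returned alongside a False flag is an assumption.

```python
from collections import deque


class FakeInput:
    """Hypothetical stand-in for an input channel backed by a list."""

    def __init__(self, messages):
        self._messages = deque(messages)

    def recv(self):
        # Mirror the (flag, msg) convention: flag is False once input ends.
        if not self._messages:
            return False, None
        return True, self._messages.popleft()


class FakeOutput:
    """Hypothetical stand-in for an output channel that records messages."""

    def __init__(self):
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)
        return True


def relay(in_channel, out_channel):
    """Forward every message from in_channel to out_channel."""
    count = 0
    while True:
        flag, msg = in_channel.recv()
        if not flag:
            break
        if not out_channel.send(msg):
            raise RuntimeError("Error sending output.")
        count += 1
    return count


out = FakeOutput()
n = relay(FakeInput([b"line 1\n", b"line 2\n"]), out)
print(n, out.sent)
```

Because relay only relies on recv and send, the same function body would work with any objects exposing that pair of methods.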
In the YAML used to run this model, those channels are declared in the model
definition and then linked to files by entries in the connections section
of the YAML.
Model YAML:
models:
  - name: python_model
    language: python  # Runs the python script using default Python
    args: ./src/gs_lesson3.py
    inputs:
      - input
    outputs:
      - output

connections:
  - input_file: ./Input/input.txt
    output: input
  - input: output
    output: ./output.txt
The input_file and output_file connection fields can either be
the path to the file (either absolute or relative to the directory
containing the YAML file) or a mapping with fields describing the
file. In particular, the filetype keyword specifies the format of
the file being read/written. Supported values include:
Filetype | Description |
---|---|
ascii | This file is read/written as encoded text one line at a time. |
bam | bam sequence I/O |
bcf | bcf sequence I/O |
binary | [DEFAULT] The entire file is read/written all at once as bytes. |
bmp | bmp image I/O |
cabo | The file is a CABO parameter file. |
cram | cram sequence I/O |
eps | eps image I/O |
excel | The file is read/written as Excel |
fasta | fasta sequence I/O |
fastq | fastq sequence I/O |
gif | gif image I/O |
jpeg | jpeg image I/O |
json | The file contains a JSON serialized object. |
map | The file contains a key/value mapping with one key/value pair per line and separated by some delimiter. |
mat | The file is a Matlab .mat file containing one or more serialized Matlab variables. |
netcdf | The file is read/written as netCDF. |
obj | The file is in the Obj data format for 3D structures. |
pandas | The file is a Pandas frame output as a table. |
pickle | The file contains one or more pickled Python objects. |
ply | The file is in the Ply data format for 3D structures. |
png | png image I/O |
sam | sam sequence I/O |
table | The file is an ASCII table that will be read/written one row at a time. If |
tiff | tiff image I/O |
vcf | vcf sequence I/O |
yaml | The file contains a YAML serialized object. |
The above example shows the basic case of receiving raw messages from a channel,
but there are also interface functions that process these raw messages to
extract variables, and fields in the model inputs and outputs entries that
specify how that should be done. For examples of how to use formatted messages
with the above file types and input/output options, see
Formatted I/O.
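The practical difference between the binary and ascii file types listed above is how many messages a model receives for the same file. The sketch below mimics the two descriptions with plain Python file reads. It does not use yggdrasil, the function names are hypothetical, and whether the framework keeps line endings on ascii messages is not stated here, so that detail is an assumption.

```python
import os
import tempfile


def read_binary(path):
    """Yield the whole file as a single bytes message ('binary' description)."""
    with open(path, "rb") as f:
        yield f.read()


def read_ascii(path):
    """Yield one message per line ('ascii' description)."""
    with open(path, "rb") as f:
        for line in f:
            yield line


# Write a three-line file and read it back both ways.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")
    with open(path, "wb") as f:
        f.write(b"one\ntwo\nthree\n")
    binary_messages = list(read_binary(path))
    ascii_messages = list(read_ascii(path))

print(len(binary_messages), len(ascii_messages))
```

With the default binary type, a receive loop like the one in the model code would therefore run once per file rather than once per line.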
Model-to-model communication (with connections)
Models can also communicate with each other in the same fashion. In the example below, model A receives input from a channel named ‘inputA’ and sends output to a channel named ‘outputA’, while model B receives input from a channel named ‘inputB’ and sends output to a channel named ‘outputB’.
Model Code:
# Import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('inputA')
out_channel = YggOutput('outputA')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("Model A: No more input.")
        break

    # Print received message
    print('Model A: %s' % msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        raise RuntimeError("Model A: Error sending output.")

# Import classes for input/output channels
from yggdrasil.interface.YggInterface import YggInput, YggOutput

# Initialize input/output channels
in_channel = YggInput('inputB')
out_channel = YggOutput('outputB')

# Loop until there is no longer input or the queues are closed
while True:

    # Receive input from input channel
    # If there is an error, the flag will be False
    flag, msg = in_channel.recv()
    if not flag:
        print("Model B: No more input.")
        break

    # Print received message
    print('Model B: %s' % msg)

    # Send output to output channel
    # If there is an error, the flag will be False
    flag = out_channel.send(msg)
    if not flag:
        raise RuntimeError("Model B: Error sending output.")
In the YAML, ‘inputA’ is connected to a local file, ‘outputA’ is connected to
‘inputB’, and ‘outputB’ is connected to a local file in the connections
section of the YAML.
Model YAML:
models:
  - name: python_modelA
    language: python
    args: ./src/gs_lesson4_modelA.py
    inputs: inputA
    outputs: outputA

  - name: python_modelB
    language: python
    args: ./src/gs_lesson4_modelB.py
    inputs: inputB
    outputs: outputB

connections:
  - input: outputA  # Connection between model A output & model B input
    output: inputB
  - input: ./Input/input.txt  # Connection between file and model A input
    output: inputA
  - input: outputB  # Connection between model B output and file
    output: ./output.txt
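The wiring described by these connections (file to inputA, outputA to inputB, outputB to file) can be pictured as a chain of queues with one worker per model. The sketch below uses only the standard library. It is a conceptual illustration rather than yggdrasil's communication layer, and the END sentinel, the model function, and the in-memory stand-ins for the two files are all hypothetical.

```python
import queue
import threading

END = object()  # sentinel marking the end of a channel's messages


def model(name, inbox, outbox, log):
    """Receive from inbox, record the message, and pass it on unchanged."""
    while True:
        msg = inbox.get()
        if msg is END:
            outbox.put(END)  # propagate end-of-input downstream
            break
        log.append(f"Model {name}: {msg}")
        outbox.put(msg)


# One queue per connection in the YAML.
input_a, a_to_b, output_b = queue.Queue(), queue.Queue(), queue.Queue()
log = []
threads = [
    threading.Thread(target=model, args=("A", input_a, a_to_b, log)),
    threading.Thread(target=model, args=("B", a_to_b, output_b, log)),
]
for t in threads:
    t.start()

# Stand-in for the connection from ./Input/input.txt to inputA.
for line in ["first", "second"]:
    input_a.put(line)
input_a.put(END)

for t in threads:
    t.join()

# Stand-in for the connection from outputB to ./output.txt.
results = []
while True:
    item = output_b.get()
    if item is END:
        break
    results.append(item)
print(results)
```

Model B can start processing the first message while model A is still handling the second, which is the sense in which the communication is asynchronous.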
Model-to-model communication (with drivers)
For backwards compatibility, connections can also be specified in terms of
the underlying drivers without an explicit connections section. The
exact same models from the previous example can be connected using the
following YAML.
Model YAML:
models:
  - name: python_modelA
    language: python
    args: ./src/gs_lesson4b_modelA.py

    inputs:
      - name: inputA
        driver: FileInputDriver
        args: ./Input/input.txt

    outputs:
      - name: outputA
        driver: OutputDriver  # Output to another channel
        args: A_to_B  # Connection to inputB

  - name: python_modelB
    language: python
    args: ./src/gs_lesson4b_modelB.py

    inputs:
      - name: inputB
        driver: InputDriver  # Input from another channel
        args: A_to_B  # Connection to outputA

    outputs:
      - name: outputB
        driver: FileOutputDriver
        args: ./output.txt
In this schema, model input and output entries
must have the following fields:
Field | Description |
---|---|
name | The name of the channel that will be used by the model. |
driver | The name of the driver that should be used to process input/output. |
args | A string matching the args field of an opposing |
A list of possible Input/Output drivers can be found here.
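The role of the shared args string in the driver-style YAML can be shown by pairing channels whose args values match. The sketch below works on a plain Python structure mirroring that YAML. The function pair_channels is hypothetical and is not how yggdrasil resolves drivers internally. It only illustrates that outputA and inputB are linked because both use A_to_B.

```python
def pair_channels(models):
    """Pair output and input channels whose `args` strings match."""
    outputs, inputs = {}, {}
    for m in models:
        for out in m.get("outputs", []):
            if out["driver"] == "OutputDriver":
                outputs[out["args"]] = out["name"]
        for inp in m.get("inputs", []):
            if inp["driver"] == "InputDriver":
                inputs[inp["args"]] = inp["name"]
    # Keep only the args strings that appear on both sides.
    shared = outputs.keys() & inputs.keys()
    return sorted((outputs[key], inputs[key]) for key in shared)


# Plain-Python mirror of the driver-style YAML above.
models = [
    {"name": "python_modelA",
     "inputs": [{"name": "inputA", "driver": "FileInputDriver",
                 "args": "./Input/input.txt"}],
     "outputs": [{"name": "outputA", "driver": "OutputDriver",
                  "args": "A_to_B"}]},
    {"name": "python_modelB",
     "inputs": [{"name": "inputB", "driver": "InputDriver",
                 "args": "A_to_B"}],
     "outputs": [{"name": "outputB", "driver": "FileOutputDriver",
                  "args": "./output.txt"}]},
]
print(pair_channels(models))
```

The file drivers are left out of the pairing because their args values are file paths rather than names shared with another channel.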