YAML Files

Models and the communication between models during a run are specified by the user in one or more YAML files. YAML files have a human-readable structure that can be parsed by many different programming languages to recreate data structures. While the YAML language can express very complex data structures (see the YAML specification for more information), only a few key concepts are needed to create a YAML file for use with the yggdrasil framework.

  • Indentation: Entries with the same indentation belong to the same collection.

  • Sequences: Entries that begin with a dash and a space (- ) are members of a sequence collection. Members of a sequence can be text, collections, or a mix of both.

  • Mappings: Mapping entries use a colon and a space (: ) to separate a key: value pair. Keys are text and values can be text or a collection.
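As a minimal sketch of these three concepts, the generic fragment below combines a mapping, a nested mapping, and a sequence. The keys and values are illustrative placeholders only, not yggdrasil options.

```yaml
# Mapping: "key: value" pairs at the same indentation belong to the same collection.
title: example            # value is text
settings:                 # value is a nested mapping (a collection)
  mode: fast
  retries: 3
# Sequence: each "- " entry is one member; members can be text or collections.
steps:
  - prepare               # text member
  - name: run             # mapping member
    verbose: true
```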

Models

The root level of a yggdrasil YAML file should contain a mapping key, either model: or models:. This key holds the information about the model(s) that should be run. Its value can be a single model entry:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py

or a sequence of model entries:

models:
  - name: modelA
    language: python
    args: ./src/gs_lesson4_modelA.py
  - name: modelB
    language: c
    args: ./src/gs_lesson4_modelB.c

Inputs and outputs to the models are then controlled via the input/inputs and/or output/outputs keys. Each input/output entry for the models need only contain a unique name for the communication channel. This can be specified as text:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input: channel_name

or a key, value mapping:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input:
    name: channel_name

The key/value mapping form should be used when other information about the communication channel needs to be provided (e.g. message format, field names, units). (See Input/Output Options for information about the available options for communication channels).
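For instance, a channel declared in the mapping form can carry extra options alongside its name. The sketch below adds the datatype option (described under Input/Output Options) set to its documented default of bytes; the channel name is the same placeholder used in the examples above.

```yaml
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input:
    name: channel_name
    # JSON schema constraining the messages on this channel
    # (bytes is the documented default, written out explicitly here)
    datatype:
      type: bytes
```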

Models can also contain more than one input and/or output:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  inputs:
    - in_channel_name1
    - in_channel_name2
  outputs:
    - out_channel_name1
    - out_channel_name2
    - out_channel_name3

Connections

In order to connect model inputs/outputs to files and/or other model inputs/outputs, the YAML file(s) must contain a connections key/value pair. The corresponding value for the connections key should be one or more mapping collections, each describing a connection entry. At a minimum, each connection entry should have one input key (input, inputs, input_file) and one output key (output, outputs, output_file):

connections:
  - input: out_channel_name1
    output: in_channel_name1
  - input: file1.txt
    output: in_channel_name2
  - inputs:
      - out_channel_name2
      - out_channel_name3
    output: file2.txt

| Key | Description |
| --- | --- |
| input/input_file | The channel/file that messages should be received from. To specify a model channel, this should be the name of an entry in a model’s outputs section. If this is a file, it should be the absolute path to the file or the relative path to the file from the directory containing the YAML. |
| output/output_file | The channel/file that messages received from the input channel/file should be sent to. If the input value is a file, the output value cannot be a file. To specify a model channel, this should be the name of an entry in a model’s inputs section. |

Additional information about connection entries, including the full list of available options, can be found here.

The connection entries are used to determine which driver should be used to connect communication channels/files. Any additional keys in the connection entry will be passed to the input/output driver that is created for the connection.
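As a sketch of how additional keys travel with a connection entry, the fragment below adds the filetype option (listed under File Options) to a file-to-channel connection. The file and channel names are the placeholders from the example above, and whether table is the appropriate file type depends on the actual contents of the file.

```yaml
connections:
  - input_file: file1.txt        # read from a file (path relative to this YAML)
    output: in_channel_name2     # deliver to this model input channel
    filetype: table              # additional key passed on to the connection's driver
```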

Validation

yggdrasil uses a JSON schema to validate the provided YAML specification files. If you would like to validate a set of YAML specification files without running the integration, this can be done via the yggvalidate CLI:

$ yggvalidate name1.yml name2.yml ...

Model Options

General Model Options

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| args | array | X | The path to the file containing the model program that will be run by the driver for the model’s language and/or a list of arguments that should be passed as input to the model program or language executable (e.g. source code or configuration file for a domain specific language). |
| inputs | array | X | Zero or more channels carrying input to the model. A full description of channel entries and the options available for channels can be found under Input/Output Options. |
| name | string | X | Name used for component in log messages. |
| outputs | array | X | Zero or more channels carrying output from the model. A full description of channel entries and the options available for channels can be found under Input/Output Options. |
| working_dir | string | X | Working directory. If not provided, the current working directory is used. |
| additional_dependencies | object |  | A mapping between languages and lists of packages in those languages that are required by the model. |
| allow_threading | boolean |  | If True, comm connections will be set up so that the model-side comms can be used by more than one thread. Defaults to False. |
| client_of | array |  | The names of one or more models that this model will call as a server. If there is more than one, this should be specified as a sequence collection (list). The corresponding channel(s) that should be passed to the yggdrasil API will be the name of the server model joined with the name of the client model by an underscore: <server_model>_<client_model>. There will be one channel created for each server the model is a client of. Defaults to empty list. Use of client_of with function is not currently supported. |
| contact_email | string |  | Email address that should be used to contact the maintainer of the model. This parameter is only used in the model repository. |
| copies | integer |  | The number of copies of the model that should be created. Defaults to 1. |
| dependencies | array |  | A list of packages required by the model that are written in the same language as the model. If the package requires dependencies outside the language of the model, use the additional_dependencies parameter to provide them. If you need a version of the package from a specific package manager, a mapping with ‘package’ and ‘package_manager’ fields can be provided instead of just the name of the package. |
| description | string |  | Description of the model. This parameter is only used in the model repository or when providing the model as a service. |
| driver | string |  | [DEPRECATED] Name of driver class that should be used. |
| env | object |  | Dictionary of environment variables that should be set when the driver starts. Defaults to {}. |
| function | string |  | If provided, an integrated model is created by wrapping the function named here. The function must be located within the source file listed in the first argument. If not provided, the model must contain its own calls to the yggdrasil interface. |
| is_server |  |  | If True, the model is assumed to be a server for one or more client models and an instance of yggdrasil.drivers.ServerDriver is started. The corresponding channel that should be passed to the yggdrasil API will be the name of the model. If is_server is a dictionary, it should contain an ‘input’ key and an ‘output’ key. These are required to be the names of existing input and output channels in the model that will be co-opted by the server. (Note: This requires that the co-opted output channel’s send method is called once for each time the co-opted input channel’s recv method is called.) If used with the function parameter, is_server must be a dictionary. Defaults to False. |
| iter_function_over | array |  | Variable(s) that should be received or sent as an array, but iterated over. Defaults to an empty array and is ignored. |
| language | string |  | The programming language that the model is written in. See Available Languages for the list of supported values. |
| logging_level | string |  | The level of logging messages that should be displayed by the model. Defaults to the logging level as determined by the configuration file and environment variables. |
| outputs_in_inputs | boolean |  | If True, outputs from wrapped model functions are passed by pointer as inputs for modification and the return value will be a flag. If False, outputs are limited to return values. Defaults to the value of the class attribute outputs_in_inputs. |
| overwrite | boolean |  | If True, any existing model products (compilation products, wrapper scripts, etc.) are removed prior to the run. If False, the products are not removed. Defaults to True. Setting this to False can improve the performance, particularly for models that take a long time to compile, but this should only be done once the model has been fully debugged to ensure that each run is tested on a clean copy of the model. The value of this keyword also determines whether or not products are removed after a run. |
| preserve_cache | boolean |  | If True, model products will be kept following the run; otherwise all products will be cleaned up. Defaults to False. This keyword is superseded by overwrite. |
| products | array |  | Paths to files created by the model that should be cleaned up when the model exits. Entries can be absolute paths or paths relative to the working directory. Defaults to []. |
| repository_commit | string |  | Commit that should be checked out in the model repository specified by repository_url. If not provided, the most recent commit on the default branch will be used. |
| repository_url | string |  | URL for the git repository containing the model source code. If provided, relative paths in the model YAML definition will be considered relative to the repository root directory. |
| source_products | array |  | Files created by running the model that are source files. These files will be removed without checking their extension so users should avoid adding files to this list unless they are sure they should be deleted. Defaults to []. |
| strace_flags | array |  | Flags to pass to strace (or dtrace). Defaults to []. |
| timesync | array |  | If set, the model is assumed to call a send then receive of the state at each timestep for synchronization with other models that are also integrating in time. If a string is provided, it is assumed to be the name of the server that will handle timestep synchronization. If a boolean is provided, the name of the server will be assumed to be ‘timestep’. Defaults to False. |
| valgrind_flags | array |  | Flags to pass to valgrind. Defaults to []. |
| validation_command | string |  | Path to a validation command that can be used to verify that the model ran as expected. A non-zero return code is taken to indicate failure. |
| with_debugger | string |  | Debugger tool that should be used to run models. This string should include the tool executable and any flags that should be passed to it. |
| with_strace | boolean |  | If True, the command is run with strace (on Linux) or dtrace (on MacOS). Defaults to False. |
| with_valgrind | boolean |  | If True, the command is run with valgrind. Defaults to False. |
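To illustrate how is_server and client_of fit together, here is a minimal sketch with one server model and one client model. The model names and source paths are hypothetical placeholders; only the option names come from the table above.

```yaml
models:
  - name: server_model            # hypothetical name
    language: python
    args: ./src/server_model.py   # hypothetical path
    is_server: true               # server-side channel is named after the model
  - name: client_model            # hypothetical name
    language: python
    args: ./src/client_model.py   # hypothetical path
    client_of: server_model       # name of the model acting as the server
```

Following the naming rule in the client_of description, the client code in this sketch would open the channel server_model_client_model, while the server code would use the channel server_model.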

Available Languages

| Language | Description | Aliases |
| --- | --- | --- |
| R | Model is written in R. | [‘r’] |
| c | Model is written in C. |  |
| c++ | Model is written in C++. | [‘cpp’, ‘cxx’] |
| cmake | Model is written in C/C++ and has a CMake build system. |  |
| dummy | The programming language that the model is written in. A list of available languages can be found here. |  |
| executable | [DEFAULT] Model is an executable. |  |
| fortran | Model is written in Fortran. |  |
| julia | Model is written in Julia. |  |
| lpy | Model is an LPy system. |  |
| make | Model is written in C/C++ and has a Makefile for compilation. |  |
| matlab | Model is written in Matlab. |  |
| mpi | Model is being run on another MPI process and this driver is used as a stand-in to monitor it on the root process. |  |
| osr | Model is an OSR model. |  |
| python | Model is written in Python. |  |
| pytorch | Model is a PyTorch model. |  |
| sbml | Model is an SBML model. |  |
| timesync | Model is dedicated to synchronizing timesteps between other models. |  |

Language Specific Model Options

| Option | Type | Valid For ‘Language’ Of | Description |
| --- | --- | --- | --- |
| additional_variables | object | [‘timesync’] |  |
| aggregation |  | [‘timesync’] |  |
| args | array | [‘python’, ‘cxx’, ‘R’, ‘lpy’, ‘executable’, ‘r’, ‘fortran’, ‘c’, ‘mpi’, ‘pytorch’, ‘make’, ‘cmake’, ‘osr’, ‘dummy’, ‘c++’, ‘matlab’, ‘sbml’, ‘cpp’, ‘julia’, ‘timesync’] | The path to the file containing the model program that will be run by the driver for the model’s language and/or a list of arguments that should be passed as input to the model program or language executable (e.g. source code or configuration file for a domain specific language). |
| builddir | string | [‘make’, ‘cmake’] | Directory where the build should be saved. Defaults to <sourcedir>/build. It can be relative to working_dir or absolute. |
| buildfile | string | [‘make’, ‘cmake’] |  |
| compiler | string | [‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] | Command or path to executable that should be used to compile the model. If not provided, the compiler will be determined based on configuration options for the language (if present) and the registered compilers that are available on the current operating system. |
| compiler_flags | array | [‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] | Flags that should be passed to the compiler during compilation. If not provided, the compiler flags will be determined based on configuration options for the language (if present), the compiler defaults, and the default_compiler_flags class attribute. |
| configuration | string | [‘cmake’] | Build type/configuration that should be built. Defaults to ‘Release’. |
| copy_xml_to_osr | boolean | [‘osr’] | If True, the XML file(s) will be copied to the OSR repository InputFiles directory before running. This is necessary if the XML file(s) use any of the files located there since OSR always assumes the included file paths are relative. Defaults to False. |
| disable_python_c_api | boolean | [‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘osr’, ‘c++’] | If True, the Python C API will be disabled. Defaults to False. |
| driver | string | [‘python’, ‘cxx’, ‘R’, ‘lpy’, ‘executable’, ‘r’, ‘fortran’, ‘c’, ‘mpi’, ‘pytorch’, ‘make’, ‘cmake’, ‘osr’, ‘dummy’, ‘c++’, ‘matlab’, ‘sbml’, ‘cpp’, ‘julia’, ‘timesync’] | [DEPRECATED] Name of driver class that should be used. |
| env_compiler | string | [‘make’, ‘cmake’] | Environment variable where the compiler executable should be stored for use within the Makefile. If not provided, this will be determined by the target language driver. |
| env_compiler_flags | string | [‘make’, ‘cmake’] | Environment variable where the compiler flags should be stored (including those required to compile against the yggdrasil interface). If not provided, this will be determined by the target language driver. |
| env_linker | string | [‘make’, ‘cmake’] | Environment variable where the linker executable should be stored for use within the Makefile. If not provided, this will be determined by the target language driver. |
| env_linker_flags | string | [‘make’, ‘cmake’] | Environment variable where the linker flags should be stored (including those required to link against the yggdrasil interface). If not provided, this will be determined by the target language driver. |
| input_transform | function | [‘pytorch’] | Transformation that should be applied to input to get it into the format expected by the model (including transformation to pytorch tensors as necessary). This function should return a tuple of arguments for the model. |
| integrator | string | [‘sbml’] | Name of integrator that should be used. Valid options include [‘cvode’, ‘gillespie’, ‘rk4’, ‘rk45’]. Defaults to ‘cvode’. |
| integrator_settings | object | [‘sbml’] | Settings for the integrator. Defaults to empty dict. |
| interpolation |  | [‘timesync’] |  |
| interpreter | string | [‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’] | Name or path of interpreter executable that should be used to run the model. If not provided, the interpreter will be determined based on configuration options for the language (if present) and the default_interpreter class attribute. |
| interpreter_flags | array | [‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’] | Flags that should be passed to the interpreter when running the model. If not provided, the flags are determined based on configuration options for the language (if present) and the default_interpreter_flags class attribute. |
| linker | string | [‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] | Command or path to executable that should be used to link the model. If not provided, the linker will be determined based on configuration options for the language (if present) and the registered linkers that are available on the current operating system. |
| linker_flags | array | [‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] | Flags that should be passed to the linker during compilation. If not provided, the linker flags will be determined based on configuration options for the language (if present), the linker defaults, and the default_linker_flags class attribute. |
| makedir | string | [‘make’] | Directory where make should be invoked from if it is not the same as the directory containing the makefile. Defaults to directory containing makefile if provided, otherwise working_dir. |
| makefile | string | [‘make’] | Path to make file either absolute, relative to makedir (if provided), or relative to working_dir. Defaults to Makefile. |
| only_output_final_step | boolean | [‘sbml’] | If True, only the final timestep is output. Defaults to False. |
| output_transform | function | [‘pytorch’] | Transformation that should be applied to model output to get it into a format that can be serialized by yggdrasil (i.e. not a pytorch Tensor or model specific type). |
| reset | boolean | [‘sbml’] | If True, the simulation will be reset to its initial values before each call (including the start time). Defaults to False. |
| selections | array | [‘sbml’] | Variables to include in the output. Defaults to None and the time/floating selections will be returned. |
| skip_interpreter | boolean | [‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’] | If True, no interpreter will be added to the arguments. This should only be used for subclasses that will not be invoking the model via the command line. Defaults to False. |
| skip_start_time | boolean | [‘sbml’] | If True, the results for the initial time step will not be output. Defaults to False. This option is ignored if only_output_final_step is True. |
| source_files | array | [‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] | Source files that should be compiled into an executable. Defaults to an empty list and the driver will search for a source file based on the model executable (the first model argument). |
| sourcedir | string | [‘cmake’] | Source directory to call cmake on. If not provided it is set to working_dir. This should be the directory containing the CMakeLists.txt file. It can be relative to working_dir or absolute. |
| standard | string | [‘fortran’] | Fortran standard that should be used. Defaults to ‘f2003’. |
| start_time | number | [‘sbml’] | Time that simulation should be started from. If ‘reset’ is True, the start time will always be the provided value, otherwise, the start time will be the end of the previous call after the first call. Defaults to 0.0. |
| steps | integer | [‘sbml’] | Number of steps that should be output. Defaults to None. |
| sync_vars_in | array | [‘osr’] | Variables that should be synchronized from other models. Defaults to []. |
| sync_vars_out | array | [‘osr’] | Variables that should be synchronized to other models. Defaults to []. |
| synonyms | object | [‘timesync’] | Mapping from model names to mappings from base variables names to information about one or more alternate variable names used by the named model that should be converted to the base variable. Values for providing information about alternate variables can either be strings (implies equivalence with the base variable in everything but name and units) or mappings with the keys: |
| target | string | [‘make’, ‘cmake’] | Make target that should be built to create the model executable. Defaults to None. |
| target_compiler | string | [‘make’, ‘cmake’] | Compilation tool that should be used to compile the target language. Defaults to None and will be set based on the selected language driver. |
| target_compiler_flags | array | [‘make’, ‘cmake’] | Compilation flags that should be passed to the target language compiler. Defaults to []. |
| target_language | string | [‘make’, ‘cmake’] | Language that the target is written in. Defaults to None and will be set based on the source files provided. |
| target_linker | string | [‘make’, ‘cmake’] | Compilation tool that should be used to link the target language. Defaults to None and will be set based on the selected language driver. |
| target_linker_flags | array | [‘make’, ‘cmake’] | Linking flags that should be passed to the target language linker. Defaults to []. |
| update_interval | object | [‘osr’] | Max simulation interval at which synchronization should occur (in days). Defaults to 1.0 if not provided. If the XML input file loads additional export modules that output at a shorter rate, the existing table of values will be extrapolated. |
| use_symunit | boolean | [‘matlab’] | If True, input/output variables with units will be represented in Matlab using symunit. Defaults to False. |
| weights | string | [‘pytorch’] | Path to file where model weights are saved. |
| with_asan | boolean | [‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘osr’, ‘c++’] | If True, the model will be compiled and linked with the address sanitizer enabled (if there is one available for the selected compiler). |
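As a sketch of how language-specific options sit beside the general ones, the entries below configure a Fortran model and an SBML model. The model names, paths, compiler flag, and step count are hypothetical; the option names and the rk4 integrator value are taken from the table above.

```yaml
models:
  - name: fortran_model             # hypothetical name
    language: fortran
    args: ./src/fortran_model.f90   # hypothetical path
    standard: f2003                 # documented default, written out explicitly
    compiler_flags:
      - "-O2"                       # hypothetical flag; depends on the compiler in use
  - name: sbml_model                # hypothetical name
    language: sbml
    args: ./src/sbml_model.xml      # hypothetical path
    integrator: rk4                 # one of cvode, gillespie, rk4, rk45
    start_time: 0.0                 # documented default
    steps: 10                       # hypothetical number of output steps
    skip_start_time: true           # omit the results for the initial time step
```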

Input/Output Options

General Input/Output Comm Options

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| datatype | schema | X | JSON schema (with expanded core types defined by yggdrasil) that constrains the type of data that should be sent/received by this object. Defaults to {‘type’: ‘bytes’}. Additional information on specifying datatypes can be found here. |
| name | string | X | Name used for component in log messages. |
| address | string |  | Communication info. Defaults to None and the address is taken from the environment variable. |
| as_array | boolean |  | [DEPRECATED] If True and the datatype is table-like, tables are sent/received as columns rather than row by row. Defaults to False. |
| commtype | string |  | Communication mechanism that should be used. (See Available Comm Types.) |
| default_file | object |  | Schema for file components. |
| default_value | any |  | Value that should be returned in the event that a YAML file does not pair the comm with another model comm or a file. |
| dont_copy | boolean |  | If True, the comm will not be duplicated in the event a model is duplicated via the ‘copies’ parameter. Defaults to False except for in the case that a model is wrapped and the comm is inside the loop or that a model is a RPC input to a model server. |
| driver | string |  | [DEPRECATED] Name of driver class that should be used. |
| field_names | array |  | [DEPRECATED] Field names that should be used to label fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, field names are created based on the field order. |
| field_units | array |  | [DEPRECATED] Field units that should be used to convert fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, all fields are assumed to be unitless. |
| filter | object |  | Schema for filter components. |
| for_service | boolean |  | If True, this comm bridges the gap to an integration running as a service, possibly on a remote machine. Defaults to False. |
| format_str | string |  | String that should be used to format/parse messages. Defaults to None. |
| is_default | boolean |  | If True, this comm was created to handle all input/output variables to/from a model. Defaults to False. This variable is used internally and should not be set explicitly in the YAML. |
| length_map | object |  | Map from pointer variable names to the names of variables where their length will be stored. Defaults to {}. |
| onexit | string |  | [DEPRECATED] Method of input/output driver to call when the connection closes. |
| outside_loop | boolean |  | If True and the comm is an input/output to/from a model being wrapped, the receive/send calls for this comm will be outside the loop for the model. Defaults to False. |
| serializer | object |  | Schema for serializer components. |
| transform | array |  | One or more transformations that will be applied to messages that are sent/received. Ignored if not provided. |
| vars | array |  | Names of variables to be sent/received by this comm. Defaults to []. |
| working_dir | string |  | Working directory. If not provided, the current working directory is used. |
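A short sketch of channel-level options inside a model entry follows. The default_value shown is a hypothetical placeholder; according to the description above, it would only come into play if no connection pairs in_channel_name1 with another channel or a file.

```yaml
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  inputs:
    - name: in_channel_name1
      default_value: 1.5          # hypothetical fallback value
    - name: in_channel_name2      # mapping form with only the required name
```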

Available Comm Types

| Commtype | Description |
| --- | --- |
| buffer | Communication mechanism that should be used. |
| default | [DEFAULT] Communication mechanism selected based on the current platform. |
| ipc | Interprocess communication (IPC) queue. |
| mpi | MPI communicator. |
| rest | RESTful API. |
| rmq | RabbitMQ connection. |
| rmq_async | Asynchronous RabbitMQ connection. |
| value | Constant value. |
| zmq | ZeroMQ socket. |
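To select a communication mechanism explicitly rather than relying on the platform default, the commtype option can be set on a channel. The following sketch requests a ZeroMQ socket for one output channel; whether zmq is usable depends on what is installed on the system.

```yaml
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  outputs:
    - name: out_channel_name1
      commtype: zmq               # ZeroMQ socket instead of the platform default
```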

File Options

General File Options

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | X | Name used for component in log messages. |
| working_dir | string | X | Working directory. If not provided, the current working directory is used. |
| address | string |  | Communication info. Defaults to None and the address is taken from the environment variable. |
| append | boolean |  | If True and writing, file is opened in append mode. If True and reading, file is kept open even if the end of the file is reached to allow for another process to write to the file in append mode. Defaults to False. |
| as_array | boolean |  | [DEPRECATED] If True and the datatype is table-like, tables are sent/received as columns rather than row by row. Defaults to False. |
| count | integer |  | When reading a file, read the file this many times. Defaults to 0. |
| driver | string |  | [DEPRECATED] Name of driver class that should be used. |
| field_names | array |  | [DEPRECATED] Field names that should be used to label fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, field names are created based on the field order. |
| field_units | array |  | [DEPRECATED] Field units that should be used to convert fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, all fields are assumed to be unitless. |
| filetype | string |  | The type of file that will be read from or written to. (See Available File Types.) |
| filter | object |  | Schema for filter components. |
| for_service | boolean |  | If True, this comm bridges the gap to an integration running as a service, possibly on a remote machine. Defaults to False. |
| format_str | string |  | String that should be used to format/parse messages. Defaults to None. |
| in_temp | boolean |  | If True, the path will be considered relative to the platform temporary directory. Defaults to False. |
| is_series | boolean |  | If True, input/output will be done to a series of files. If reading, each file will be processed until the end is reached. If writing, each output will be to a new file in the series. The address is assumed to contain a format for the index of the file. Defaults to False. |
| length_map | object |  | Map from pointer variable names to the names of variables where their length will be stored. Defaults to {}. |
| onexit | string |  | [DEPRECATED] Method of input/output driver to call when the connection closes. |
| transform | array |  | One or more transformations that will be applied to messages that are sent/received. Ignored if not provided. |
| vars | array |  | Names of variables to be sent/received by this comm. Defaults to []. |
| wait_for_creation | number |  | Time (in seconds) to wait for the file to be created, if it doesn’t exist, before opening it. Defaults to 0 s, in which case an attempt to open the file is made immediately. |
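File options can accompany a file in a connection entry. The sketch below assumes the options are given directly in the connection entry as additional keys (as described under Connections); it appends model output to one file and lets the reader of another file wait for that file to appear. The 5-second wait is an arbitrary illustrative value.

```yaml
connections:
  - input: out_channel_name1
    output_file: file2.txt
    append: true                  # open the output file in append mode
  - input_file: file1.txt
    output: in_channel_name2
    wait_for_creation: 5          # seconds to wait for file1.txt to exist
```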

Available File Types

| Filetype | Description |
| --- | --- |
| ascii | This file is read/written as encoded text one line at a time. |
| bam | bam sequence I/O |
| bcf | bcf sequence I/O |
| binary | [DEFAULT] The entire file is read/written all at once as bytes. |
| bmp | bmp image I/O |
| cabo | The file is a CABO parameter file. |
| cram | cram sequence I/O |
| eps | eps image I/O |
| excel | The file is read/written as Excel. |
| fasta | fasta sequence I/O |
| fastq | fastq sequence I/O |
| gif | gif image I/O |
| jpeg | jpeg image I/O |
| json | The file contains a JSON serialized object. |
| map | The file contains a key/value mapping with one key/value pair per line and separated by some delimiter. |
| mat | The file is a Matlab .mat file containing one or more serialized Matlab variables. |
| netcdf | The file is read/written as netCDF. |
| obj | The file is in the Obj data format for 3D structures. |
| pandas | The file is a Pandas frame output as a table. |
| pickle | The file contains one or more pickled Python objects. |
| ply | The file is in the Ply data format for 3D structures. |
| png | png image I/O |
| sam | sam sequence I/O |
| table | The file is an ASCII table that will be read/written one row at a time. If as_array is True, the table will be read/written all at once. |
| tiff | tiff image I/O |
| vcf | vcf sequence I/O |
| yaml | The file contains a YAML serialized object. |

File Type Specific Options

Option

Type

Valid For ‘Filetype’ Of

Description

args

string

[‘ply’, ‘mat’, ‘pickle’, ‘obj’, ‘map’, ‘ascii’, ‘binary’, ‘table’, ‘pandas’]

[DEPRECATED] Arguments that should be provided to the driver.

columns

array

[‘excel’]

Names of columns to read/write.

comment

string

[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’]

One or more characters indicating a comment. Defaults to ‘# ‘.

datatype

schema

[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’]

JSON schema defining the type of object that the serializer will be used to serialize/deserialize. Defaults to default_datatype.

default_flow_style

boolean

[‘yaml’]

If True, nested collections will be serialized in the block style. If False, they will always be serialized in the flow style. See PyYAML Documentation.

delimiter

string

[‘map’, ‘table’, ‘cabo’, ‘pandas’]

Delimiter that should be used to separate name/value pairs in the map. Defaults to t.

driver

string

[‘netcdf’, ‘gif’, ‘mat’, ‘bcf’, ‘obj’, ‘sam’, ‘fastq’, ‘binary’, ‘png’, ‘ply’, ‘excel’, ‘jpeg’, ‘ascii’, ‘json’, ‘yaml’, ‘tiff’, ‘cabo’, ‘cram’, ‘pickle’, ‘map’, ‘eps’, ‘bmp’, ‘bam’, ‘vcf’, ‘table’, ‘pandas’, ‘fasta’]

[DEPRECATED] Name of driver class that should be used.

encoding

string

[‘yaml’]

Encoding that should be used to serialize the object. Defaults to ‘utf-8’.

endcol

integer

[‘excel’]

Column to stop read at (non-inclusive).

endrow

integer

[‘excel’]

Row to stop read at (non-inclusive).

flush_on_write

boolean

[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]

If true, the file will be flushed when written to.

header

object

[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]

Header defining sequence identifiers. A header is required for writing SAM, BAM, and CRAM files.

indent

[‘string’, ‘int’]

[‘yaml’, ‘json’]

String or number of spaces that should be used to indent each level within the seiralized structure. Defaults to ‘t’.

index

string

[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]

Path to file containing index if different from the standard naming convention for BAM and CRAM files.

name

string

[‘netcdf’, ‘gif’, ‘mat’, ‘bcf’, ‘obj’, ‘sam’, ‘fastq’, ‘binary’, ‘png’, ‘ply’, ‘excel’, ‘jpeg’, ‘ascii’, ‘json’, ‘yaml’, ‘tiff’, ‘cabo’, ‘cram’, ‘pickle’, ‘map’, ‘eps’, ‘bmp’, ‘bam’, ‘vcf’, ‘table’, ‘pandas’, ‘fasta’]

Name used for component in log messages.

newline

string

[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’]

One or more characters indicating a newline. Defaults to ‘n’.

no_header

boolean

[‘pandas’]

If True, headers will not be read or serialized from/to tables. Defaults to False.

params

object

[‘gif’, ‘jpeg’, ‘eps’, ‘png’, ‘tiff’, ‘bmp’]

Parameters that should be passed to the PIL.Image save/open command.

piecemeal

boolean

[‘fastq’, ‘fasta’]

If possible, read the file incrementally in multiple messages. This should be used for large files that cannot be loaded into memory.

prune_duplicates

boolean

[‘obj’, ‘ply’]

If True, serialized meshes in array format will be pruned of duplicates when being normalized into a Ply object. If False, duplicates will not be pruned. Defaults to True.

read_attributes

boolean

[‘netcdf’]

If True, the attributes are read in as well as the variables. Defaults to False.

read_meth

string

[‘binary’]

Method that should be used to read data from the file. Defaults to ‘read’. Ignored if direction is ‘send’.

record_ids

array

[‘fastq’, ‘fasta’]

IDs of records to read/write. Other records will be ignored.

regions

array

[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]

Region parameters (reference name, start, and end) defining the regions that should be read. If not provided, all regions will be read.

serializer

[‘binary’]

Class with serialize and deserialize methods that should be used to process sent and received messages or a dictionary describing a serializer that obeys the serializer schema.

sheet_template

string

[‘excel’]

Format string that can be completed with the % operator to generate names for each subsequent sheet when writing.

sheets

array

[‘excel’]

Name(s) of one or more sheets that should be read/written. If not provided during read, all sheets will be read.

sort_keys

boolean

[‘json’]

If True, the serialization of dictionaries will be in key sorted order. Defaults to True.

startcol

integer

[‘excel’]

Column to start read/write at.

startrow

integer

[‘excel’]

Row to start read/write at.

str_as_bytes

boolean

[‘excel’, ‘pandas’]

If True, strings in columns are read as bytes.

use_astropy

boolean

[‘table’, ‘pandas’]

If True, the astropy package will be used to serialize/deserialize table. Defaults to False.

variables

array

[‘netcdf’]

List of variables to read in. If not provided, all variables will be read.

version

integer

[‘netcdf’]

Version of netCDF format that should be used. Defaults to 1. Options are 1 (classic format) and 2 (64-bit offset format).
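As a rough illustration of how the type-specific options in the table above might be combined, the sketch below reads a block of cells from an Excel file and writes a JSON file. Only the option names come from the table. The file paths and channel names are hypothetical, and nesting the options inside a file entry that gives its path under name is an assumption about the file-entry layout.

```yaml
connections:
  # Read part of one sheet. endrow and endcol are non-inclusive, so
  # reading stops before row 10 and before column 4.
  - inputs:
      - name: ./Input/measurements.xlsx  # hypothetical path
        filetype: excel
        sheets: [Sheet1]
        startrow: 1
        endrow: 10
        startcol: 0
        endcol: 4
    outputs:
      - inputA  # hypothetical model input channel

  # Write JSON with sorted keys, indenting each level by two spaces.
  - inputs:
      - outputA  # hypothetical model output channel
    outputs:
      - name: ./Output/result.json  # hypothetical path
        filetype: json
        indent: 2
        sort_keys: true
```

Each option is only meaningful for the file types listed beside it in the table, so startrow or sheets on the JSON entry would not be expected to have any effect.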

Connection Options

General Connection Options

Option

Type

Required

Description

inputs

array

X

One or more name(s) of model output channel(s) and/or new channel/file objects that the connection should receive messages from. A full description of file entries and the available options can be found here.

outputs

array

X

One or more name(s) of model input channel(s) and/or new channel/file objects that the connection should send messages to. A full description of file entries and the available options can be found here.

args

string

[DEPRECATED] Arguments that should be provided to the driver.

connection_type

string

Connection between one or more comms/files and one or more comms/files. (Options described here)

driver

string

[DEPRECATED] Name of driver class that should be used.

input_pattern

string

The communication pattern that should be used to handle incoming messages when more than one input communicator is present. Defaults to ‘cycle’. Options include: ‘cycle’: Receive from the next available input communicator. ‘gather’: Receive lists of messages with one element from each communicator, where a message is only returned when there is a message from each.

onexit

string

Class method that should be called when a model that the connection interacts with exits, but before the connection driver is shut down. Defaults to None.

output_pattern

string

The communication pattern that should be used to handle outgoing messages when more than one output communicator is present. Defaults to ‘broadcast’. Options include: ‘cycle’: Rotate through output comms, sending one message to each. ‘broadcast’: Send the same message to each comm. ‘scatter’: Send part of message (must be a list) to each comm.

read_meth

string

transform

array

Function or string specifying function that should be used to translate messages from the input communicator before passing them to the output communicator. If a string, the format should be “<package.module>:<function>” so that <function> can be imported from <package.module>. Defaults to None, in which case messages are passed directly. This can also be a list of functions/strings that will be called on the messages in the order they are provided.

working_dir

string

Working directory. If not provided, the current working directory is used.

write_meth

string
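The general connection options above can be sketched as follows. This is a minimal example under stated assumptions: all channel names are hypothetical, and mypackage.mymodule:rescale stands in for any importable function that follows the “<package.module>:<function>” form.

```yaml
connections:
  # gather: wait until both inputs have a message, then forward the
  # pair as one list.
  - inputs:
      - outputA  # hypothetical output channel of a first model
      - outputB  # hypothetical output channel of a second model
    outputs:
      - inputC   # hypothetical input channel of a third model
    input_pattern: gather

  # broadcast (the default): each output receives the same message,
  # after it has passed through the transform.
  - inputs:
      - outputC
    outputs:
      - inputD
      - inputE
    output_pattern: broadcast
    transform:
      - "mypackage.mymodule:rescale"  # hypothetical function
```

Setting output_pattern to scatter in the second connection would instead send one part of each message to inputD and another to inputE, which requires each message to be a list.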

Available Connection Types

Connection_Type

Description

connection

Connection between one or more comms/files and one or more comms/files.

file_input

Connection between a file and a model.

file_output

Connection between a model and a file.

input

Connection between one or more comms/files and a model.

output

Connection between a model and one or more comms/files.

rpc_request

Connection between one or more comms/files and one or more comms/files.

rpc_response

Connection between one or more comms/files and one or more comms/files.

Additional Options

In addition to the options above, several comm (channel/file) options are also accepted, for convenience, at the level of the connection rather than as part of the connection’s input/output values. These options include:

Key

Description

format_str

A C-style format string specifying how messages should be formatted/parsed from/to language-specific types (see C-Style Format Strings).

field_names

A sequence collection of names for the fields present in the format string.

field_units

A sequence collection of units for the fields present in the format string (see Units).

as_array

True or False. If True and filetype is table, the table will be read in its entirety and passed as an array.

filetype

Only valid for connections that direct messages from a file to a model input channel or from a model output channel to a file. Values indicate how messages should be read from the file. See this table for a list of available file types.
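A short sketch of these connection-level options, assuming hypothetical file paths, channel names, and field names. The first connection reads a whole table from a file as a single array. The second writes rows to a table file, with one name and one unit for each field in the format string.

```yaml
connections:
  # File to model: read the entire table and pass it as one array.
  - inputs:
      - ./Input/input.txt  # hypothetical path
    outputs:
      - inputA             # hypothetical model input channel
    filetype: table
    as_array: true

  # Model to file: format each message as a tab-separated row.
  - inputs:
      - outputA            # hypothetical model output channel
    outputs:
      - ./output.txt       # hypothetical path
    filetype: table
    # Double quotes make YAML turn \t and \n into a tab and a newline.
    format_str: "%d\t%f\n"
    field_names: [count, size]
    field_units: [umol, cm]
```

The field_names and field_units sequences each have two entries because the format string contains two fields.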

Driver Method

For backwards compatibility, yggdrasil also allows connections to be specified using drivers. In this scheme, there is no connections section in the YAML file(s). When specifying communication via drivers, each input/output entry for the models should be a mapping collection with, at minimum, the following keys:

Key

Description

name

The name of the channel that will be provided by the model to the yggdrasil API. This can be any text, but should be unique.

driver

The name of the input/output driver class that should be used. A list of available input/output drivers can be found here.

args

For connections made to other models, this should be text that matches that of the other model’s corresponding driver. For connections made to files, this should be the path to the file, relative to the location of the YAML file.

To make a connection between two models’ inputs and outputs, the values for their args keys should match.
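The matching rule can be illustrated with the sketch below, in which the two models are linked only because both entries use the same args value. The model paths, channel names, and the args text A_to_B are hypothetical, and the driver class names InputDriver and OutputDriver are assumptions that should be checked against the list of available drivers.

```yaml
models:
  - name: modelA
    language: python
    args: ./src/modelA.py     # hypothetical path
    outputs:
      - name: outputA
        driver: OutputDriver  # assumed driver class name
        args: A_to_B
  - name: modelB
    language: python
    args: ./src/modelB.py     # hypothetical path
    inputs:
      - name: inputB
        driver: InputDriver   # assumed driver class name
        args: A_to_B          # same text as modelA's output args
```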

Any additional keys in the input/output entry will be passed to the input/output driver. A full description of the available input/output drivers and potential arguments can be found here.

In general, this method of specifying connections is not recommended.