YAML Files

Models and the communication between them are specified by the user in one or more YAML files. YAML files have a human-readable structure that can be parsed by many different programming languages to recreate data structures. While the YAML language can express very complex data structures (more information can be found here), only a few key concepts are needed to create a YAML file for use with the yggdrasil framework.

Indentation: Entries with the same indentation belong to the same collection.
Sequences: Entries that begin with a dash and a space (- ) are members of a sequence collection. Members of a sequence can be text, collections, or a mix of both.
Mappings: Mapping entries use a colon and a space (: ) to separate a key: value pair. Keys are text and values can be text or a collection.
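
As a minimal sketch of these three concepts together (the keys and values below are placeholders for illustration, not yggdrasil options):

```yaml
# A mapping at the root level: each "key: value" pair at the same
# indentation belongs to the same collection.
title: example
# The value of a key can itself be a collection. Here it is a sequence,
# whose members each begin with "- ".
items:
  - plain text member
  # A sequence member can also be a mapping.
  - name: nested
    size: 3
```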

Models

At the root level of a yggdrasil YAML file, there should be a mapping key, model: or models:. This key denotes information pertaining to the model(s) that should be run. The value for this key can be a single model entry:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py

or a sequence of model entries:

models:
  - name: modelA
    language: python
    args: ./src/gs_lesson4_modelA.py
  - name: modelB
    language: c
    args: ./src/gs_lesson4_modelB.c

Inputs and outputs to the models are then controlled via the input/inputs and/or output/outputs keys. Each input/output entry for the models need only contain a unique name for the communication channel. This can be specified as text:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input: channel_name

or a key, value mapping:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input:
    name: channel_name

The key/value mapping form should be used when other information about the communication channel needs to be provided (e.g. message format, field names, units). (See Input/Output Options for information about the available options for communication channels).
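
For instance, a mapping entry might pair the channel name with another option from the Input/Output Options tables on this page. This is a sketch only; the datatype value shown is simply the documented default, written out explicitly for illustration:

```yaml
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input:
    name: channel_name
    # Optional extra information about the channel (illustrative).
    datatype:
      type: bytes   # the documented default datatype
```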

Models can also contain more than one input and/or output:

models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  inputs:
    - in_channel_name1
    - in_channel_name2
  outputs:
    - out_channel_name1
    - out_channel_name2
    - out_channel_name3

Connections

In order to connect model inputs/outputs to files and/or other model inputs/outputs, the YAML file(s) must contain a connections key/value pair. The corresponding value for the connections key should be one or more mapping collections, each describing a connection entry. At a minimum, each connection entry should have one input key (input, inputs, input_file) and one output key (output, outputs, output_file):

connections:
  - input: out_channel_name1
    output: in_channel_name1
  - input: file1.txt
    output: in_channel_name2
  - inputs:
      - out_channel_name2
      - out_channel_name3
    output: file2.txt

Key	Description
input/input_file	The channel/file that messages should be received from. To specify a model channel, this should be the name of an entry in a model’s `outputs` section. If this is a file, it should be the absolute path to the file or the relative path to the file from the directory containing the YAML.
output/output_file	The channel/file that messages received from the `input` channel/file should be sent to. If the `input` value is a file, the `output` value cannot be a file. To specify a model channel, this should be the name of an entry in a model’s `inputs` section.

Additional information about connection entries, including the full list of available options, can be found here.

The connection entries are used to determine which driver should be used to connect communication channels/files. Any additional keys in the connection entry will be passed to the input/output driver that is created for the connection.
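
As an illustrative sketch of such an additional key, the entry below adds filetype (described under File Options) to a connection that reads from a file. Placing the key at the connection level is an assumption based on the description above; the result can be checked with yggvalidate:

```yaml
connections:
  - input: file1.txt
    output: in_channel_name2
    # Additional key passed on to the driver created for the connection.
    filetype: table
```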

Validation

yggdrasil uses a JSON schema to validate the provided YAML specification files. If you would like to validate a set of YAML specification files without running the integration, this can be done via the yggvalidate CLI:

$ yggvalidate name1.yml name2.yml ...

Model Options

General Model Options

Option	Type	Required	Description
args	array	X	The path to the file containing the model program that will be run by the driver for the model’s language and/or a list of arguments that should be passed as input to the model program or language executable (e.g. source code or configuration file for a domain specific language).
inputs	array	X	Zero or more channels carrying input to the model. A full description of channel entries and the options available for channels can be found here.
name	string	X	Name used for component in log messages.
outputs	array	X	Zero or more channels carrying output from the model. A full description of channel entries and the options available for channels can be found here.
working_dir	string	X	Working directory. If not provided, the current working directory is used.
additional_dependencies	object		A mapping between languages and lists of packages in those languages that are required by the model.
allow_threading	boolean		If True, comm connections will be set up so that the model-side comms can be used by more than one thread. Defaults to False.
client_of	array		The names of one or more models that this model will call as a server. If there is more than one, this should be specified as a sequence collection (list). The corresponding channel(s) that should be passed to the yggdrasil API will be the name of the server model joined with the name of the client model with an underscore (<server_model>_<client_model>). There will be one channel created for each server the model is a client of. Defaults to an empty list. Use of client_of with function is not currently supported.
contact_email	string		Email address that should be used to contact the maintainer of the model. This parameter is only used in the model repository.
copies	integer		The number of copies of the model that should be created. Defaults to 1.
dependencies	array		A list of packages required by the model that are written in the same language as the model. If the package requires dependencies outside the language of the model, use the additional_dependencies parameter to provide them. If you need a version of the package from a specific package manager, a mapping with ‘package’ and ‘package_manager’ fields can be provided instead of just the name of the package.
description	string		Description of the model. This parameter is only used in the model repository or when providing the model as a service.
driver	string		[DEPRECATED] Name of driver class that should be used.
env	object		Dictionary of environment variables that should be set when the driver starts. Defaults to {}.
function	string		If provided, an integrated model is created by wrapping the function named here. The function must be located within the source file listed in the first argument. If not provided, the model must contain its own calls to the yggdrasil interface.
is_server			If True, the model is assumed to be a server for one or more client models and an instance of `yggdrasil.drivers.ServerDriver` is started. The corresponding channel that should be passed to the yggdrasil API will be the name of the model. If is_server is a dictionary, it should contain an ‘input’ key and an ‘output’ key. These are required to be the names of existing input and output channels in the model that will be co-opted by the server. (Note: This requires that the co-opted output channel’s send method is called once for each time the co-opted input channel’s recv method is called.) If used with the function parameter, is_server must be a dictionary. Defaults to False.
iter_function_over	array		Variable(s) that should be received or sent as an array, but iterated over. Defaults to an empty array and is ignored.
language	string		The programming language that the model is written in. A list of available languages can be found here. (Options described here)
logging_level	string		The level of logging messages that should be displayed by the model. Defaults to the logging level as determined by the configuration file and environment variables.
outputs_in_inputs	boolean		If True, outputs from wrapped model functions are passed by pointer as inputs for modification and the return value will be a flag. If False, outputs are limited to return values. Defaults to the value of the class attribute outputs_in_inputs.
overwrite	boolean		If True, any existing model products (compilation products, wrapper scripts, etc.) are removed prior to the run. If False, the products are not removed. Defaults to True. Setting this to False can improve the performance, particularly for models that take a long time to compile, but this should only be done once the model has been fully debugged to ensure that each run is tested on a clean copy of the model. The value of this keyword also determines whether or not products are removed after a run.
preserve_cache	boolean		If True, model products will be kept following the run; otherwise all products will be cleaned up. Defaults to False. This keyword is superseded by overwrite.
products	array		Paths to files created by the model that should be cleaned up when the model exits. Entries can be absolute paths or paths relative to the working directory. Defaults to [].
repository_commit	string		Commit that should be checked out in the model repository specified by repository_url. If not provided, the most recent commit on the default branch will be used.
repository_url	string		URL for the git repository containing the model source code. If provided, relative paths in the model YAML definition will be considered relative to the repository root directory.
source_products	array		Files created by running the model that are source files. These files will be removed without checking their extension so users should avoid adding files to this list unless they are sure they should be deleted. Defaults to [].
strace_flags	array		Flags to pass to strace (or dtrace). Defaults to [].
timesync	array		If set, the model is assumed to call a send then receive of the state at each timestep for synchronization with other models that are also integrating in time. If a string is provided, it is assumed to be the name of the server that will handle timestep synchronization. If a boolean is provided, the name of the server will be assumed to be ‘timestep’. Defaults to False.
valgrind_flags	array		Flags to pass to valgrind. Defaults to [].
validation_command	string		Path to a validation command that can be used to verify that the model ran as expected. A non-zero return code is taken to indicate failure.
with_debugger	string		Debugger tool that should be used to run models. This string should include the tool executable and any flags that should be passed to it.
with_strace	boolean		If True, the command is run with strace (on Linux) or dtrace (on MacOS). Defaults to False.
with_valgrind	boolean		If True, the command is run with valgrind. Defaults to False.
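
A sketch of how a few of these options might be combined for a server/client pair. The model names, source paths, and environment variable are hypothetical; only the option names come from the table above:

```yaml
models:
  - name: server_model               # hypothetical name
    language: python
    args: ./src/server_model.py      # hypothetical path
    is_server: true
  - name: client_model               # hypothetical name
    language: python
    args: ./src/client_model.py      # hypothetical path
    client_of: server_model
    # Optional settings with illustrative values.
    env:
      EXAMPLE_VARIABLE: "1"          # hypothetical variable
    overwrite: false
```

Following the descriptions of is_server and client_of, the server code in this sketch would use a channel named server_model and the client code a channel named server_model_client_model.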

Available Languages

Language	Description	Aliases
R	Model is written in R.	[‘r’]
c	Model is written in C.
c++	Model is written in C++.	[‘cpp’, ‘cxx’]
cmake	Model is written in C/C++ and has a CMake build system.
dummy	The programming language that the model is written in. A list of available languages can be found here.
executable	[DEFAULT] Model is an executable.
fortran	Model is written in Fortran.
julia	Model is written in Julia.
lpy	Model is an LPy system.
make	Model is written in C/C++ and has a Makefile for compilation.
matlab	Model is written in Matlab.
mpi	Model is being run on another MPI process and this driver is used as a stand-in to monitor it on the root process.
osr	Model is an OSR model.
python	Model is written in Python.
pytorch	Model is a PyTorch model.
sbml	Model is an SBML model.
timesync	Model is dedicated to synchronizing timesteps between other models.

Language Specific Model Options

Option	Type	Valid For ‘Language’ Of	Description
additional_variables	object	[‘timesync’]
aggregation		[‘timesync’]
args	array	[‘python’, ‘cxx’, ‘R’, ‘lpy’, ‘executable’, ‘r’, ‘fortran’, ‘c’, ‘mpi’, ‘pytorch’, ‘make’, ‘cmake’, ‘osr’, ‘dummy’, ‘c++’, ‘matlab’, ‘sbml’, ‘cpp’, ‘julia’, ‘timesync’]	The path to the file containing the model program that will be run by the driver for the model’s language and/or a list of arguments that should be passed as input to the model program or language executable (e.g. source code or configuration file for a domain specific language).
builddir	string	[‘make’, ‘cmake’]	Directory where the build should be saved. Defaults to <sourcedir>/build. It can be relative to working_dir or absolute.
buildfile	string	[‘make’, ‘cmake’]
compiler	string	[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’]	Command or path to executable that should be used to compile the model. If not provided, the compiler will be determined based on configuration options for the language (if present) and the registered compilers that are available on the current operating system.
compiler_flags	array	[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’]	Flags that should be passed to the compiler during compilation. If not provided, the compiler flags will be determined based on configuration options for the language (if present), the compiler defaults, and the default_compiler_flags class attribute.
configuration	string	[‘cmake’]	Build type/configuration that should be built. Defaults to ‘Release’.
copy_xml_to_osr	boolean	[‘osr’]	If True, the XML file(s) will be copied to the OSR repository InputFiles directory before running. This is necessary if the XML file(s) use any of the files located there since OSR always assumes the included file paths are relative. Defaults to False.
disable_python_c_api	boolean	[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘osr’, ‘c++’]	If True, the Python C API will be disabled. Defaults to False.
driver	string	[‘python’, ‘cxx’, ‘R’, ‘lpy’, ‘executable’, ‘r’, ‘fortran’, ‘c’, ‘mpi’, ‘pytorch’, ‘make’, ‘cmake’, ‘osr’, ‘dummy’, ‘c++’, ‘matlab’, ‘sbml’, ‘cpp’, ‘julia’, ‘timesync’]	[DEPRECATED] Name of driver class that should be used.
env_compiler	string	[‘make’, ‘cmake’]	Environment variable where the compiler executable should be stored for use within the Makefile. If not provided, this will be determined by the target language driver.
env_compiler_flags	string	[‘make’, ‘cmake’]	Environment variable where the compiler flags should be stored (including those required to compile against the yggdrasil interface). If not provided, this will be determined by the target language driver.
env_linker	string	[‘make’, ‘cmake’]	Environment variable where the linker executable should be stored for use within the Makefile. If not provided, this will be determined by the target language driver.
env_linker_flags	string	[‘make’, ‘cmake’]	Environment variable where the linker flags should be stored (including those required to link against the yggdrasil interface). If not provided, this will be determined by the target language driver.
input_transform	function	[‘pytorch’]	Transformation that should be applied to input to get it into the format expected by the model (including transformation to pytorch tensors as necessary). This function should return a tuple of arguments for the model.
integrator	string	[‘sbml’]	Name of integrator that should be used. Valid options include [‘cvode’, ‘gillespie’, ‘rk4’, ‘rk45’]. Defaults to ‘cvode’.
integrator_settings	object	[‘sbml’]	Settings for the integrator. Defaults to empty dict.
interpolation		[‘timesync’]
interpreter	string	[‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’]	Name or path of interpreter executable that should be used to run the model. If not provided, the interpreter will be determined based on configuration options for the language (if present) and the default_interpreter class attribute.
interpreter_flags	array	[‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’]	Flags that should be passed to the interpreter when running the model. If not provided, the flags are determined based on configuration options for the language (if present) and the default_interpreter_flags class attribute.
linker	string	[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’]	Command or path to executable that should be used to link the model. If not provided, the linker will be determined based on configuration options for the language (if present) and the registered linkers that are available on the current operating system.
linker_flags	array	[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’]	Flags that should be passed to the linker during compilation. If not provided, the linker flags will be determined based on configuration options for the language (if present), the linker defaults, and the default_linker_flags class attribute.
makedir	string	[‘make’]	Directory where make should be invoked from if it is not the same as the directory containing the makefile. Defaults to directory containing makefile if provided, otherwise working_dir.
makefile	string	[‘make’]	Path to make file either absolute, relative to makedir (if provided), or relative to working_dir. Defaults to Makefile.
only_output_final_step	boolean	[‘sbml’]	If True, only the final timestep is output. Defaults to False.
output_transform	function	[‘pytorch’]	Transformation that should be applied to model output to get it into a format that can be serialized by yggdrasil (i.e. not a pytorch Tensor or model specific type).
reset	boolean	[‘sbml’]	If True, the simulation will be reset to its initial values before each call (including the start time). Defaults to False.
selections	array	[‘sbml’]	Variables to include in the output. Defaults to None and the time/floating selections will be returned.
skip_interpreter	boolean	[‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’]	If True, no interpreter will be added to the arguments. This should only be used for subclasses that will not be invoking the model via the command line. Defaults to False.
skip_start_time	boolean	[‘sbml’]	If True, the results for the initial time step will not be output. Defaults to False. This option is ignored if only_output_final_step is True.
source_files	array	[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’]	Source files that should be compiled into an executable. Defaults to an empty list and the driver will search for a source file based on the model executable (the first model argument).
sourcedir	string	[‘cmake’]	Source directory to call cmake on. If not provided it is set to working_dir. This should be the directory containing the CMakeLists.txt file. It can be relative to working_dir or absolute.
standard	string	[‘fortran’]	Fortran standard that should be used. Defaults to ‘f2003’.
start_time	number	[‘sbml’]	Time that simulation should be started from. If ‘reset’ is True, the start time will always be the provided value, otherwise, the start time will be the end of the previous call after the first call. Defaults to 0.0.
steps	integer	[‘sbml’]	Number of steps that should be output. Defaults to None.
sync_vars_in	array	[‘osr’]	Variables that should be synchronized from other models. Defaults to [].
sync_vars_out	array	[‘osr’]	Variables that should be synchronized to other models. Defaults to [].
synonyms	object	[‘timesync’]	Mapping from model names to mappings from base variables names to information about one or more alternate variable names used by the named model that should be converted to the base variable. Values for providing information about alternate variables can either be strings (implies equivalence with the base variable in everything but name and units) or mappings with the keys:
target	string	[‘make’, ‘cmake’]	Make target that should be built to create the model executable. Defaults to None.
target_compiler	string	[‘make’, ‘cmake’]	Compilation tool that should be used to compile the target language. Defaults to None and will be set based on the selected language driver.
target_compiler_flags	array	[‘make’, ‘cmake’]	Compilation flags that should be passed to the target language compiler. Defaults to [].
target_language	string	[‘make’, ‘cmake’]	Language that the target is written in. Defaults to None and will be set based on the source files provided.
target_linker	string	[‘make’, ‘cmake’]	Compilation tool that should be used to link the target language. Defaults to None and will be set based on the selected language driver.
target_linker_flags	array	[‘make’, ‘cmake’]	Linking flags that should be passed to the target language linker. Defaults to [].
update_interval	object	[‘osr’]	Max simulation interval at which synchronization should occur (in days). Defaults to 1.0 if not provided. If the XML input file loads additional export modules that output at a shorter rate, the existing table of values will be extrapolated.
use_symunit	boolean	[‘matlab’]	If True, input/output variables with units will be represented in Matlab using symunit. Defaults to False.
weights	string	[‘pytorch’]	Path to file where model weights are saved.
with_asan	boolean	[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘osr’, ‘c++’]	If True, the model will be compiled and linked with the address sanitizer enabled (if there is one available for the selected compiler).
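
As a hedged sketch, language specific options sit alongside the general options in a model entry. The compiler flag, the second model, and its path are assumptions for illustration; the option names and the rk4 integrator value are taken from the table above:

```yaml
models:
  - name: modelB
    language: c
    args: ./src/gs_lesson4_modelB.c
    # Options valid for compiled languages such as 'c'.
    compiler_flags:
      - "-O2"                  # illustrative flag
    with_asan: true
  - name: modelC               # hypothetical SBML model
    language: sbml
    args: ./src/modelC.xml     # hypothetical path
    # Options valid only for 'sbml'.
    integrator: rk4
    steps: 10
    skip_start_time: true
```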

Input/Output Options

General Input/Output Comm Options

Option	Type	Required	Description
datatype	schema	X	JSON schema (with expanded core types defined by yggdrasil) that constrains the type of data that should be sent/received by this object. Defaults to {‘type’: ‘bytes’}. Additional information on specifying datatypes can be found here.
name	string	X	Name used for component in log messages.
address	string		Communication info. Defaults to None, in which case the address is taken from the environment variable.
as_array	boolean		[DEPRECATED] If True and the datatype is table-like, tables are sent/received as columns rather than row by row. Defaults to False.
commtype	string		Communication mechanism that should be used. (Options described here)
default_file	object		Schema for file components.
default_value	any		Value that should be returned in the event that a yaml does not pair the comm with another model comm or a file.
dont_copy	boolean		If True, the comm will not be duplicated in the event a model is duplicated via the ‘copies’ parameter. Defaults to False except in the case that a model is wrapped and the comm is inside the loop or that a model is an RPC input to a model server.
driver	string		[DEPRECATED] Name of driver class that should be used.
field_names	array		[DEPRECATED] Field names that should be used to label fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, field names are created based on the field order.
field_units	array		[DEPRECATED] Field units that should be used to convert fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, all fields are assumed to be unitless.
filter	object		Schema for filter components.
for_service	boolean		If True, this comm bridges the gap to an integration running as a service, possibly on a remote machine. Defaults to False.
format_str	string		String that should be used to format/parse messages. Defaults to None.
is_default	boolean		If True, this comm was created to handle all input/output variables to/from a model. Defaults to False. This variable is used internally and should not be set explicitly in the YAML.
length_map	object		Map from pointer variable names to the names of variables where their length will be stored. Defaults to {}.
onexit	string		[DEPRECATED] Method of input/output driver to call when the connection closes
outside_loop	boolean		If True and the comm is an input/output to/from a model being wrapped, the receive/send calls for this comm will be outside the loop for the model. Defaults to False.
serializer	object		Schema for serializer components.
transform	array		One or more transformations that will be applied to messages that are sent/received. Ignored if not provided.
vars	array		Names of variables to be sent/received by this comm. Defaults to [].
working_dir	string		Working directory. If not provided, the current working directory is used.
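
A brief sketch of channel entries that set some of these options. The option names come from the table above; the values are illustrative assumptions, and whether a given default_value suits a channel depends on its datatype:

```yaml
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  inputs:
    - name: in_channel_name1
      # Returned if no connection pairs this channel with a model or file.
      default_value: 0         # illustrative value
    - name: in_channel_name2
      commtype: zmq            # one of the comm types listed on this page
  outputs:
    - out_channel_name1        # plain text form, no extra options
```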

Available Comm Types

Commtype	Description
buffer	Communication mechanism that should be used.
default	[DEFAULT] Communication mechanism selected based on the current platform.
ipc	Interprocess communication (IPC) queue.
mpi	MPI communicator.
rest	RESTful API.
rmq	RabbitMQ connection.
rmq_async	Asynchronous RabbitMQ connection.
value	Constant value.
zmq	ZeroMQ socket.

File Options

General File Options

Option	Type	Required	Description
name	string	X	Name used for component in log messages.
working_dir	string	X	Working directory. If not provided, the current working directory is used.
address	string		Communication info. Defaults to None, in which case the address is taken from the environment variable.
append	boolean		If True and writing, file is opened in append mode. If True and reading, file is kept open even if the end of the file is reached to allow for another process to write to the file in append mode. Defaults to False.
as_array	boolean		[DEPRECATED] If True and the datatype is table-like, tables are sent/received as columns rather than row by row. Defaults to False.
count	integer		When reading a file, read the file this many times. Defaults to 0.
driver	string		[DEPRECATED] Name of driver class that should be used.
field_names	array		[DEPRECATED] Field names that should be used to label fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, field names are created based on the field order.
field_units	array		[DEPRECATED] Field units that should be used to convert fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, all fields are assumed to be unitless.
filetype	string		The type of file that will be read from or written to. (Options described here)
filter	object		Schema for filter components.
for_service	boolean		If True, this comm bridges the gap to an integration running as a service, possibly on a remote machine. Defaults to False.
format_str	string		String that should be used to format/parse messages. Defaults to None.
in_temp	boolean		If True, the path will be considered relative to the platform temporary directory. Defaults to False.
is_series	boolean		If True, input/output will be done to a series of files. If reading, each file will be processed until the end is reached. If writing, each output will be to a new file in the series. The address is assumed to contain a format for the index of the file. Defaults to False.
length_map	object		Map from pointer variable names to the names of variables where their length will be stored. Defaults to {}.
onexit	string		[DEPRECATED] Method of input/output driver to call when the connection closes
transform	array		One or more transformations that will be applied to messages that are sent/received. Ignored if not provided.
vars	array		Names of variables to be sent/received by this comm. Defaults to [].
wait_for_creation	number		Time (in seconds) to wait for the file to be created before opening it if it doesn’t exist. Defaults to 0 s, in which case an attempt to open the file is made immediately.

Available File Types

Filetype	Description
ascii	This file is read/written as encoded text one line at a time.
bam	bam sequence I/O
bcf	bcf sequence I/O
binary	[DEFAULT] The entire file is read/written all at once as bytes.
bmp	bmp image I/O
cabo	The file is a CABO parameter file.
cram	cram sequence I/O
eps	eps image I/O
excel	The file is read/written as Excel
fasta	fasta sequence I/O
fastq	fastq sequence I/O
gif	gif image I/O
jpeg	jpeg image I/O
json	The file contains a JSON serialized object.
map	The file contains a key/value mapping with one key/value pair per line and separated by some delimiter.
mat	The file is a Matlab .mat file containing one or more serialized Matlab variables.
netcdf	The file is read/written as netCDF.
obj	The file is in the Obj data format for 3D structures.
pandas	The file is a Pandas frame output as a table.
pickle	The file contains one or more pickled Python objects.
ply	The file is in the Ply data format for 3D structures.
png	png image I/O
sam	sam sequence I/O
table	The file is an ASCII table that will be read/written one row at a time. If `as_array` is `True`, the table will be read/written all at once.
tiff	tiff image I/O
vcf	vcf sequence I/O
yaml	The file contains a YAML serialized object.

File Type Specific Options

Option	Type	Valid For ‘Filetype’ Of	Description
args	string	[‘ply’, ‘mat’, ‘pickle’, ‘obj’, ‘map’, ‘ascii’, ‘binary’, ‘table’, ‘pandas’]	[DEPRECATED] Arguments that should be provided to the driver.
columns	array	[‘excel’]	Names of columns to read/write.
comment	string	[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’]	One or more characters indicating a comment. Defaults to ‘# ‘.
datatype	schema	[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’]	JSON schema defining the type of object that the serializer will be used to serialize/deserialize. Defaults to default_datatype.
default_flow_style	boolean	[‘yaml’]	If True, nested collections will be serialized in the block style. If False, they will always be serialized in the flow style. See PyYAML Documentation.
delimiter	string	[‘map’, ‘table’, ‘cabo’, ‘pandas’]	Delimiter that should be used to separate name/value pairs in the map. Defaults to \t.
driver	string	[‘netcdf’, ‘gif’, ‘mat’, ‘bcf’, ‘obj’, ‘sam’, ‘fastq’, ‘binary’, ‘png’, ‘ply’, ‘excel’, ‘jpeg’, ‘ascii’, ‘json’, ‘yaml’, ‘tiff’, ‘cabo’, ‘cram’, ‘pickle’, ‘map’, ‘eps’, ‘bmp’, ‘bam’, ‘vcf’, ‘table’, ‘pandas’, ‘fasta’]	[DEPRECATED] Name of driver class that should be used.
encoding	string	[‘yaml’]	Encoding that should be used to serialize the object. Defaults to ‘utf-8’.
endcol	integer	[‘excel’]	Column to stop read at (non-inclusive).
endrow	integer	[‘excel’]	Row to stop read at (non-inclusive).
flush_on_write	boolean	[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]	If true, the file will be flushed when written to.
header	object	[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]	Header defining sequence identifiers. A header is required for writing SAM, BAM, and CRAM files.
indent	[‘string’, ‘int’]	[‘yaml’, ‘json’]	String or number of spaces that should be used to indent each level within the serialized structure. Defaults to ‘\t’.
index	string	[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]	Path to file containing index if different from the standard naming convention for BAM and CRAM files.
name	string	[‘netcdf’, ‘gif’, ‘mat’, ‘bcf’, ‘obj’, ‘sam’, ‘fastq’, ‘binary’, ‘png’, ‘ply’, ‘excel’, ‘jpeg’, ‘ascii’, ‘json’, ‘yaml’, ‘tiff’, ‘cabo’, ‘cram’, ‘pickle’, ‘map’, ‘eps’, ‘bmp’, ‘bam’, ‘vcf’, ‘table’, ‘pandas’, ‘fasta’]	Name used for component in log messages.
newline	string	[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’]	One or more characters indicating a newline. Defaults to ‘\n’.
no_header	boolean	[‘pandas’]	If True, headers will not be read or serialized from/to tables. Defaults to False.
params	object	[‘gif’, ‘jpeg’, ‘eps’, ‘png’, ‘tiff’, ‘bmp’]	Parameters that should be passed to the PIL.Image save/open command.
piecemeal	boolean	[‘fastq’, ‘fasta’]	If possible, read the file incrementally in multiple messages. This should be used for large files that cannot be loaded into memory.
prune_duplicates	boolean	[‘obj’, ‘ply’]	If True, serialized meshes in array format will be pruned of duplicates when being normalized into a Ply object. If False, duplicates will not be pruned. Defaults to True.
read_attributes	boolean	[‘netcdf’]	If True, the attributes are read in as well as the variables. Defaults to False.
read_meth	string	[‘binary’]	Method that should be used to read data from the file. Defaults to ‘read’. Ignored if direction is ‘send’.
record_ids	array	[‘fastq’, ‘fasta’]	IDs of records to read/write. Other records will be ignored.
regions	array	[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’]	Region parameters (reference name, start, and end) defining the regions that should be read. If not provided, all regions will be read.
serializer		[‘binary’]	Class with serialize and deserialize methods that should be used to process sent and received messages or a dictionary describing a serializer that obeys the serializer schema.
sheet_template	string	[‘excel’]	Format string that can be completed with % operator to generate names for each subsequent sheet when writing.
sheets	array	[‘excel’]	Name(s) of one or more sheets that should be read/written. If not provided during read, all sheets will be read.
sort_keys	boolean	[‘json’]	If True, the serialization of dictionaries will be in key sorted order. Defaults to True.
startcol	integer	[‘excel’]	Column to start read/write at.
startrow	integer	[‘excel’]	Row to start read/write at.
str_as_bytes	boolean	[‘excel’, ‘pandas’]	If True, strings in columns are read as bytes.
use_astropy	boolean	[‘table’, ‘pandas’]	If True, the astropy package will be used to serialize/deserialize table. Defaults to False.
variables	array	[‘netcdf’]	List of variables to read in. If not provided, all variables will be read.
version	integer	[‘netcdf’]	Version of netCDF format that should be used. Defaults to 1. Options are 1 (classic format) and 2 (64-bit offset format).
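
As a rough sketch of how the file options above might be combined, the connection below reads part of one sheet of a spreadsheet into a model input. It assumes that a file entry is a mapping whose name key gives the path to the file. The path, sheet name, and channel name are hypothetical:

connections:
  - inputs:
      - name: ./Input/measurements.xlsx  # hypothetical path to the file
        filetype: excel
        sheets: [Sheet1]                 # hypothetical sheet name
        startrow: 1                      # row to start reading at
        endrow: 11                       # row to stop reading at (non-inclusive)
    outputs:
      - modelA_input                     # hypothetical model input channel

Options that are not listed for the chosen file type (e.g. sort_keys for an excel file) should not be expected to have any effect.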

Connection Options¶

General Connection Options¶

Option	Type	Required	Description
inputs	array	X	One or more name(s) of model output channel(s) and/or new channel/file objects that the connection should receive messages from. A full description of file entries and the available options can be found here.
outputs	array	X	One or more name(s) of model input channel(s) and/or new channel/file objects that the connection should send messages to. A full description of file entries and the available options can be found here.
args	string		[DEPRECATED] Arguments that should be provided to the driver.
connection_type	string		Connection between one or more comms/files and one or more comms/files. (Options described here)
driver	string		[DEPRECATED] Name of driver class that should be used.
input_pattern	string		The communication pattern that should be used to handle incoming messages when there is more than one input communicator present. Defaults to ‘cycle’. Options include: ‘cycle’: Receive from the next available input communicator. ‘gather’: Receive lists of messages with one element from each communicator, where a message is only returned when there is a message from each.
onexit	string		Class method that should be called when a model that the connection interacts with exits, but before the connection driver is shut down. Defaults to None.
output_pattern	string		The communication pattern that should be used to handle outgoing messages when there is more than one output communicator present. Defaults to ‘broadcast’. Options include: ‘cycle’: Rotate through output comms, sending one message to each. ‘broadcast’: Send the same message to each comm. ‘scatter’: Send part of message (must be a list) to each comm.
read_meth	string
transform	array		Function or string specifying function that should be used to translate messages from the input communicator before passing them to the output communicator. If a string, the format should be “<package.module>:<function>” so that <function> can be imported from <package.module>. Defaults to None, in which case messages are passed directly. This can also be a list of functions/strings that will be called on the messages in the order they are provided.
working_dir	string		Working directory. If not provided, the current working directory is used.
write_meth	string
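
The pattern and transform options above could be used along the following lines. This is a minimal sketch rather than a tested configuration: all channel names are hypothetical, and my_package.my_module:scale_values stands in for a user-supplied function that is not part of yggdrasil:

connections:
  # Wait until both upstream models have sent a message, then deliver
  # the pair to modelC as a single list.
  - inputs:
      - modelA_output
      - modelB_output
    outputs:
      - modelC_input
    input_pattern: gather
  # Apply a transform to each message from modelC, then send the same
  # result to both downstream models.
  - inputs:
      - modelC_output
    outputs:
      - modelD_input
      - modelE_input
    output_pattern: broadcast
    transform:
      - my_package.my_module:scale_values  # hypothetical function

Since ‘cycle’ and ‘broadcast’ are the defaults for input_pattern and output_pattern, these keys only need to be set when a different behaviour is wanted; output_pattern is spelled out here for clarity.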

Available Connection Types¶

Connection_Type	Description
connection	Connection between one or more comms/files and one or more comms/files.
file_input	Connection between a file and a model.
file_output	Connection between a model and a file.
input	Connection between one or more comms/files and a model.
output	Connection between a model and one or more comms/files.
rpc_request	Connection between one or more comms/files and one or more comms/files.
rpc_response	Connection between one or more comms/files and one or more comms/files.

Additional Options¶

In addition to the options above, several comm (channel/file) options are, for convenience, also valid at the level of the connection rather than as part of the connection’s input/output values. These options include:

Key	Description
format_str	A C-style format string specifying how messages should be formatted/parsed from/to language-specific types (see C-Style Format Strings).
field_names	A sequence collection of names for the fields present in the format string.
field_units	A sequence collection of units for the fields present in the format string (see Units).
as_array	True or False. If True and filetype is table, the table will be read in its entirety and passed as an array.
filetype	Only valid for connections that direct messages from a file to a model input channel or from a model output channel to a file. Values indicate how messages should be read from the file. See this table for a list of available file types.
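
A sketch of these connection-level options in use is shown below. It assumes that a plain path given as an input or output is treated as a file when filetype is set. The paths, channel names, field names, and units are hypothetical:

connections:
  # Read an entire table from a file and pass it to the model as an array.
  - inputs:
      - ./Input/input.txt
    outputs:
      - modelA_input
    filetype: table
    as_array: True
  # Write two-column rows from the model to a table file.
  - inputs:
      - modelA_output
    outputs:
      - ./output.txt
    filetype: table
    format_str: "%f\t%f\n"
    field_names: [mass, length]
    field_units: [g, cm]

The format string is double quoted so that YAML interprets \t and \n as a tab and a newline rather than as literal backslash sequences.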

Driver Method¶

For backwards compatibility, yggdrasil also allows connections to be specified using drivers. In this scheme, there is no connections section in the YAML file(s). When specifying communication via drivers, each input/output entry for the models should be a mapping collection with, at minimum, the following keys:

Key	Description
name	The name of the channel that will be provided by the model to the yggdrasil API. This can be any text, but should be unique.
driver	The name of the input/output driver class that should be used. A list of available input/output drivers can be found here.
args	For connections made to other models, this should be text that matches that of the other model’s corresponding driver. For connections made to files, this should be the path to the file, relative to the location of the YAML file.

To make a connection between two models’ inputs and outputs, the values for their args keys should match.
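
For illustration only, a driver-style specification might look like the sketch below. The driver class names (FileInputDriver, OutputDriver, InputDriver) are assumptions and should be checked against the list of available input/output drivers; the model names, channel names, and paths are hypothetical:

models:
  - name: modelA
    language: python
    args: ./src/modelA.py
    inputs:
      - name: inputA
        driver: FileInputDriver  # assumed driver class name
        args: ./Input/input.txt  # path relative to this YAML file
    outputs:
      - name: outputA
        driver: OutputDriver     # assumed driver class name
        args: A_to_B             # matches the args of inputB below
  - name: modelB
    language: python
    args: ./src/modelB.py
    inputs:
      - name: inputB
        driver: InputDriver      # assumed driver class name
        args: A_to_B             # matches the args of outputA above

Here the shared args value A_to_B is what links outputA to inputB; no connections section is present.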

Any additional keys in the input/output entry will be passed to the input/output driver. A full description of the available input/output drivers and potential arguments can be found here.

In general, this method of specifying connections is not recommended.