YAML Files
Models and the communication between them are specified by the user in one or more YAML files. YAML files have a human-readable structure that can be parsed by many different programming languages to recreate data structures. While the YAML language can express very complex data structures (more information can be found here), only a few key concepts are needed to create a YAML file for use with the yggdrasil framework.
Indentation: Entries with the same indentation belong to the same collection.
Sequences: Entries that begin with a dash and a space (- ) are members of a sequence collection. Members of a sequence can be text, collections, or a mix of both.
Mappings: Mapping entries use a colon and a space (: ) to separate a key: value pair. Keys are text and values can be text or a collection.
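The three concepts above can be seen together in a small generic fragment. This is only an illustrative sketch: the key names (title, settings, items, and so on) are hypothetical and are not yggdrasil options.

```yaml
# A mapping: each "key: value" pair at the same indentation
# belongs to the same collection.
title: example
settings:
  # A nested mapping, indented under its key.
  verbose: true
  retries: 3
# A sequence: each member begins with a dash and a space.
items:
  - plain text member
  - name: mapping member   # a member that is itself a collection
    tags:
      - nested
      - sequence
```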
Models
At the root level of a yggdrasil YAML file there should be a mapping key, model: or models:. This denotes information pertaining to the model(s) that should be run. The value for this key can be a single model entry:
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
or a sequence of model entries:
models:
  - name: modelA
    language: python
    args: ./src/gs_lesson4_modelA.py
  - name: modelB
    language: c
    args: ./src/gs_lesson4_modelB.c
Inputs and outputs to the models are then controlled via the input/inputs and/or output/outputs keys. Each input/output entry for a model need only contain a unique name for the communication channel. This can be specified as text:
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input: channel_name
or a key, value mapping:
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input:
    name: channel_name
The key/value mapping form should be used when other information about the communication channel needs to be provided (e.g. message format, field names, units). (See Input/Output Options for information about the available options for communication channels).
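As a rough sketch of the mapping form carrying extra channel information, the entry below adds two options taken from the Input/Output Options tables (commtype and vars). The variable names x and y are hypothetical, and whether this particular combination suits a given model is an assumption.

```yaml
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  input:
    name: channel_name
    commtype: zmq   # one of the listed comm types
    vars:           # hypothetical variable names
      - x
      - y
```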
Models can also contain more than one input and/or output:
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  inputs:
    - in_channel_name1
    - in_channel_name2
  outputs:
    - out_channel_name1
    - out_channel_name2
    - out_channel_name3
Connections
In order to connect model inputs/outputs to files and/or other model inputs/outputs, the YAML file(s) must also contain a connections key/value pair.
The corresponding value for the connections key should be one or more mapping collections, each describing a connection entry. At a minimum, each connection entry should have one input key (input, inputs, or input_file) and one output key (output, outputs, or output_file):
connections:
  - input: out_channel_name1
    output: in_channel_name1
  - input: file1.txt
    output: in_channel_name2
  - inputs:
      - out_channel_name2
      - out_channel_name3
    output: file2.txt
Key |
Description |
---|---|
input/input_file |
The channel/file that messages should be received from. To specify a model channel, this should be the name of an entry in a model’s |
output/output_file |
The channel/file that messages received from the |
Additional information about connection entries, including the full list of available options, can be found here.
The connection entries are used to determine which driver should be used to connect communication channels/files. Any additional keys in the connection entry will be passed to the input/output driver that is created for the connection.
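A minimal sketch of connection entries that carry an additional key is shown below. It uses the input_file/output_file keys and the filetype option from the File Options tables. That filetype is accepted at this level is inferred from the statement above about additional keys being passed to the driver, not verified here.

```yaml
connections:
  # Read a file into a model input channel.
  - input_file: file1.txt
    output: in_channel_name1
    filetype: ascii   # additional key passed on to the driver
  # Write a model output channel to a file.
  - input: out_channel_name1
    output_file: file2.txt
    filetype: ascii
```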
Validation
yggdrasil uses a JSON schema to validate the provided YAML specification files. If you would like to validate a set of YAML specification files without running the integration, this can be done via the yggvalidate CLI:
$ yggvalidate name1.yml name2.yml ...
Model Options
General Model Options
Option |
Type |
Required |
Description |
---|---|---|---|
args |
array |
X |
The path to the file containing the model program that will be run by the driver for the model’s language and/or a list of arguments that should be passed as input to the model program or language executable (e.g. source code or configuration file for a domain specific language). |
inputs |
array |
X |
Zero or more channels carrying input to the model. A full description of channel entries and the options available for channels can be found here. |
name |
string |
X |
Name used for component in log messages. |
outputs |
array |
X |
Zero or more channels carrying output from the model. A full description of channel entries and the options available for channels can be found here. |
working_dir |
string |
X |
Working directory. If not provided, the current working directory is used. |
additional_dependencies |
object |
A mapping between languages and lists of packages in those languages that are required by the model. |
|
allow_threading |
boolean |
If True, comm connections will be set up so that the model-side comms can be used by more than one thread. Defaults to False. |
|
client_of |
array |
The names of one or more models that this model will call as a server. If there is more than one, this should be specified as a sequence collection (list). The corresponding channel(s) that should be passed to the yggdrasil API will be the name of the server model joined with the name of the client model with an underscore <server_model>_<client_model>. There will be one channel created for each server the model is a client of. Defaults to empty list. Use of client_of with function is not currently supported. |
|
contact_email |
string |
Email address that should be used to contact the maintainer of the model. This parameter is only used in the model repository. |
|
copies |
integer |
The number of copies of the model that should be created. Defaults to 1. |
|
dependencies |
array |
A list of packages required by the model that are written in the same language as the model. If the package requires dependencies outside the language of the model, use the additional_dependencies parameter to provide them. If you need a version of the package from a specific package manager, a mapping with ‘package’ and ‘package_manager’ fields can be provided instead of just the name of the package. |
|
description |
string |
Description of the model. This parameter is only used in the model repository or when providing the model as a service. |
|
driver |
string |
[DEPRECATED] Name of driver class that should be used. |
|
env |
object |
Dictionary of environment variables that should be set when the driver starts. Defaults to {}. |
|
function |
string |
If provided, an integrated model is created by wrapping the function named here. The function must be located within the source file listed in the first argument. If not provided, the model must contain its own calls to the yggdrasil interface. |
|
is_server |
If True, the model is assumed to be a server for one or more client models and
an instance of |
||
iter_function_over |
array |
Variable(s) that should be received or sent as an array, but iterated over. Defaults to an empty array and is ignored. |
|
language |
string |
The programming language that the model is written in. A list of available languages can be found here. (Options described here) |
|
logging_level |
string |
The level of logging messages that should be displayed by the model. Defaults to the logging level as determined by the configuration file and environment variables. |
|
outputs_in_inputs |
boolean |
If True, outputs from wrapped model functions are passed by pointer as inputs for modification and the return value will be a flag. If False, outputs are limited to return values. Defaults to the value of the class attribute outputs_in_inputs. |
|
overwrite |
boolean |
If True, any existing model products (compilation products, wrapper scripts, etc.) are removed prior to the run. If False, the products are not removed. Defaults to True. Setting this to False can improve the performance, particularly for models that take a long time to compile, but this should only be done once the model has been fully debugged to ensure that each run is tested on a clean copy of the model. The value of this keyword also determines whether or not products are removed after a run. |
|
preserve_cache |
boolean |
If True, model products will be kept following the run; otherwise all products will be cleaned up. Defaults to False. This keyword is superseded by overwrite. |
|
products |
array |
Paths to files created by the model that should be cleaned up when the model exits. Entries can be absolute paths or paths relative to the working directory. Defaults to []. |
|
repository_commit |
string |
Commit that should be checked out in the model repository specified by repository_url. If not provided, the most recent commit on the default branch will be used. |
|
repository_url |
string |
URL for the git repository containing the model source code. If provided, relative paths in the model YAML definition will be considered relative to the repository root directory. |
|
source_products |
array |
Files created by running the model that are source files. These files will be removed without checking their extension so users should avoid adding files to this list unless they are sure they should be deleted. Defaults to []. |
|
strace_flags |
array |
Flags to pass to strace (or dtrace). Defaults to []. |
|
timesync |
array |
If set, the model is assumed to call a send then receive of the state at each timestep for synchronization with other models that are also integrating in time. If a string is provided, it is assumed to be the name of the server that will handle timestep synchronization. If a boolean is provided, the name of the server will be assumed to be ‘timestep’. Defaults to False. |
|
valgrind_flags |
array |
Flags to pass to valgrind. Defaults to []. |
|
validation_command |
string |
Path to a validation command that can be used to verify that the model ran as expected. A non-zero return code is taken to indicate failure. |
|
with_debugger |
string |
Debugger tool that should be used to run models. This string should include the tool executable and any flags that should be passed to it. |
|
with_strace |
boolean |
If True, the command is run with strace (on Linux) or dtrace (on macOS). Defaults to False. |
|
with_valgrind |
boolean |
If True, the command is run with valgrind. Defaults to False. |
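To illustrate how is_server and client_of from the table above might fit together, here is a hedged sketch of a server/client pair. The model names and source paths are hypothetical. The channel name in the comment follows the <server_model>_<client_model> rule given in the client_of description.

```yaml
models:
  - name: server
    language: python
    args: ./src/server.py   # hypothetical path
    is_server: true
  - name: client
    language: python
    args: ./src/client.py   # hypothetical path
    client_of: server
    # Per the client_of description, the channel passed to the
    # yggdrasil API would be named server_client.
```

If the client called more than one server, client_of would instead be a sequence of server names, with one channel created per server.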
Available Languages
Language |
Description |
Aliases |
---|---|---|
R |
Model is written in R. |
[‘r’] |
c |
Model is written in C. |
|
c++ |
Model is written in C++. |
[‘cpp’, ‘cxx’] |
cmake |
Model is written in C/C++ and has a CMake build system. |
|
dummy |
The programming language that the model is written in. A list of available languages can be found here. |
|
executable |
[DEFAULT] Model is an executable. |
|
fortran |
Model is written in Fortran. |
|
julia |
Model is written in Julia. |
|
lpy |
Model is an LPy system. |
|
make |
Model is written in C/C++ and has a Makefile for compilation. |
|
matlab |
Model is written in Matlab. |
|
mpi |
Model is being run on another MPI process and this driver is used as a stand-in to monitor it on the root process. |
|
osr |
Model is an OSR model. |
|
python |
Model is written in Python. |
|
pytorch |
Model is a PyTorch model. |
|
sbml |
Model is an SBML model. |
|
timesync |
Model is dedicated to synchronizing timesteps between other models. |
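As a small sketch of how the aliases column can be read, the entry below selects the C++ driver through its cpp alias. According to the table it should be equivalent to writing language: c++. The model name and source path are hypothetical.

```yaml
models:
  name: modelC
  language: cpp              # alias for c++ (cxx is also listed)
  args: ./src/modelC.cpp     # hypothetical path
```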
Language Specific Model Options
Option |
Type |
Valid For ‘Language’ Of |
Description |
---|---|---|---|
additional_variables |
object |
[‘timesync’] |
|
aggregation |
[‘timesync’] |
||
args |
array |
[‘python’, ‘cxx’, ‘R’, ‘lpy’, ‘executable’, ‘r’, ‘fortran’, ‘c’, ‘mpi’, ‘pytorch’, ‘make’, ‘cmake’, ‘osr’, ‘dummy’, ‘c++’, ‘matlab’, ‘sbml’, ‘cpp’, ‘julia’, ‘timesync’] |
The path to the file containing the model program that will be run by the driver for the model’s language and/or a list of arguments that should be passed as input to the model program or language executable (e.g. source code or configuration file for a domain specific language). |
builddir |
string |
[‘make’, ‘cmake’] |
Directory where the build should be saved. Defaults to <sourcedir>/build. It can be relative to working_dir or absolute. |
buildfile |
string |
[‘make’, ‘cmake’] |
|
compiler |
string |
[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] |
Command or path to executable that should be used to compile the model. If not provided, the compiler will be determined based on configuration options for the language (if present) and the registered compilers that are available on the current operating system. |
compiler_flags |
array |
[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] |
Flags that should be passed to the compiler during compilation. If not provided, the compiler flags will be determined based on configuration options for the language (if present), the compiler defaults, and the default_compiler_flags class attribute. |
configuration |
string |
[‘cmake’] |
Build type/configuration that should be built. Defaults to ‘Release’. |
copy_xml_to_osr |
boolean |
[‘osr’] |
If True, the XML file(s) will be copied to the OSR repository InputFiles directory before running. This is necessary if the XML file(s) use any of the files located there since OSR always assumes the included file paths are relative. Defaults to False. |
disable_python_c_api |
boolean |
[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘osr’, ‘c++’] |
If True, the Python C API will be disabled. Defaults to False. |
driver |
string |
[‘python’, ‘cxx’, ‘R’, ‘lpy’, ‘executable’, ‘r’, ‘fortran’, ‘c’, ‘mpi’, ‘pytorch’, ‘make’, ‘cmake’, ‘osr’, ‘dummy’, ‘c++’, ‘matlab’, ‘sbml’, ‘cpp’, ‘julia’, ‘timesync’] |
[DEPRECATED] Name of driver class that should be used. |
env_compiler |
string |
[‘make’, ‘cmake’] |
Environment variable where the compiler executable should be stored for use within the Makefile. If not provided, this will be determined by the target language driver. |
env_compiler_flags |
string |
[‘make’, ‘cmake’] |
Environment variable where the compiler flags should be stored (including those required to compile against the yggdrasil interface). If not provided, this will be determined by the target language driver. |
env_linker |
string |
[‘make’, ‘cmake’] |
Environment variable where the linker executable should be stored for use within the Makefile. If not provided, this will be determined by the target language driver. |
env_linker_flags |
string |
[‘make’, ‘cmake’] |
Environment variable where the linker flags should be stored (including those required to link against the yggdrasil interface). If not provided, this will be determined by the target language driver. |
input_transform |
function |
[‘pytorch’] |
Transformation that should be applied to input to get it into the format expected by the model (including transformation to pytorch tensors as necessary). This function should return a tuple of arguments for the model. |
integrator |
string |
[‘sbml’] |
Name of integrator that should be used. Valid options include [‘cvode’, ‘gillespie’, ‘rk4’, ‘rk45’]. Defaults to ‘cvode’. |
integrator_settings |
object |
[‘sbml’] |
Settings for the integrator. Defaults to empty dict. |
interpolation |
[‘timesync’] |
||
interpreter |
string |
[‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’] |
Name or path of interpreter executable that should be used to run the model. If not provided, the interpreter will be determined based on configuration options for the language (if present) and the default_interpreter class attribute. |
interpreter_flags |
array |
[‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’] |
Flags that should be passed to the interpreter when running the model. If not provided, the flags are determined based on configuration options for the language (if present) and the default_interpreter_flags class attribute. |
linker |
string |
[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] |
Command or path to executable that should be used to link the model. If not provided, the linker will be determined based on configuration options for the language (if present) and the registered linkers that are available on the current operating system. |
linker_flags |
array |
[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] |
Flags that should be passed to the linker during compilation. If not provided, the linker flags will be determined based on configuration options for the language (if present), the linker defaults, and the default_linker_flags class attribute. |
makedir |
string |
[‘make’] |
Directory where make should be invoked from if it is not the same as the directory containing the makefile. Defaults to directory containing makefile if provided, otherwise working_dir. |
makefile |
string |
[‘make’] |
Path to make file either absolute, relative to makedir (if provided), or relative to working_dir. Defaults to Makefile. |
only_output_final_step |
boolean |
[‘sbml’] |
If True, only the final timestep is output. Defaults to False. |
output_transform |
function |
[‘pytorch’] |
Transformation that should be applied to model output to get it into a format that can be serialized by yggdrasil (i.e. not a pytorch Tensor or model specific type). |
reset |
boolean |
[‘sbml’] |
If True, the simulation will be reset to its initial values before each call (including the start time). Defaults to False. |
selections |
array |
[‘sbml’] |
Variables to include in the output. Defaults to None and the time/floating selections will be returned. |
skip_interpreter |
boolean |
[‘matlab’, ‘sbml’, ‘python’, ‘pytorch’, ‘R’, ‘lpy’, ‘julia’, ‘dummy’, ‘timesync’, ‘r’] |
If True, no interpreter will be added to the arguments. This should only be used for subclasses that will not be invoking the model via the command line. Defaults to False. |
skip_start_time |
boolean |
[‘sbml’] |
If True, the results for the initial time step will not be output. Defaults to False. This option is ignored if only_output_final_step is True. |
source_files |
array |
[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘c++’] |
Source files that should be compiled into an executable. Defaults to an empty list and the driver will search for a source file based on the model executable (the first model argument). |
sourcedir |
string |
[‘cmake’] |
Source directory to call cmake on. If not provided it is set to working_dir. This should be the directory containing the CMakeLists.txt file. It can be relative to working_dir or absolute. |
standard |
string |
[‘fortran’] |
Fortran standard that should be used. Defaults to ‘f2003’. |
start_time |
number |
[‘sbml’] |
Time that simulation should be started from. If ‘reset’ is True, the start time will always be the provided value, otherwise, the start time will be the end of the previous call after the first call. Defaults to 0.0. |
steps |
integer |
[‘sbml’] |
Number of steps that should be output. Defaults to None. |
sync_vars_in |
array |
[‘osr’] |
Variables that should be synchronized from other models. Defaults to []. |
sync_vars_out |
array |
[‘osr’] |
Variables that should be synchronized to other models. Defaults to []. |
synonyms |
object |
[‘timesync’] |
Mapping from model names to mappings from base variable names to information about one or more alternate variable names used by the named model that should be converted to the base variable. Values for providing information about alternate variables can either be strings (implies equivalence with the base variable in everything but name and units) or mappings with the keys: |
target |
string |
[‘make’, ‘cmake’] |
Make target that should be built to create the model executable. Defaults to None. |
target_compiler |
string |
[‘make’, ‘cmake’] |
Compilation tool that should be used to compile the target language. Defaults to None and will be set based on the selected language driver. |
target_compiler_flags |
array |
[‘make’, ‘cmake’] |
Compilation flags that should be passed to the target language compiler. Defaults to []. |
target_language |
string |
[‘make’, ‘cmake’] |
Language that the target is written in. Defaults to None and will be set based on the source files provided. |
target_linker |
string |
[‘make’, ‘cmake’] |
Compilation tool that should be used to link the target language. Defaults to None and will be set based on the selected language driver. |
target_linker_flags |
array |
[‘make’, ‘cmake’] |
Linking flags that should be passed to the target language linker. Defaults to []. |
update_interval |
object |
[‘osr’] |
Max simulation interval at which synchronization should occur (in days). Defaults to 1.0 if not provided. If the XML input file loads additional export modules that output at a shorter rate, the existing table of values will be extrapolated. |
use_symunit |
boolean |
[‘matlab’] |
If True, input/output variables with units will be represented in Matlab using symunit. Defaults to False. |
weights |
string |
[‘pytorch’] |
Path to file where model weights are saved. |
with_asan |
boolean |
[‘c’, ‘cxx’, ‘cpp’, ‘fortran’, ‘make’, ‘cmake’, ‘osr’, ‘c++’] |
If True, the model will be compiled and linked with the address sanitizer enabled (if there is one available for the selected compiler). |
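The language-specific options are added to a model entry alongside the general ones. A minimal sketch for an SBML model follows, using only options listed for ‘sbml’ in the table above. The model name and XML path are hypothetical, and it is an assumption that args points at the SBML file.

```yaml
models:
  name: sbml_model
  language: sbml
  args: ./src/model.xml    # hypothetical path to the SBML file
  integrator: rk4          # one of cvode, gillespie, rk4, rk45
  start_time: 0.0
  steps: 100
  skip_start_time: true    # do not output the initial time step
```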
Input/Output Options
General Input/Output Comm Options
Option |
Type |
Required |
Description |
---|---|---|---|
datatype |
schema |
X |
JSON schema (with expanded core types defined by yggdrasil) that constrains the type of data that should be sent/received by this object. Defaults to {‘type’: ‘bytes’}. Additional information on specifying datatypes can be found here. |
name |
string |
X |
Name used for component in log messages. |
address |
string |
Communication info. Defaults to None and address is taken from the environment variable. |
|
as_array |
boolean |
[DEPRECATED] If True and the datatype is table-like, tables are sent/received as columns rather than row by row. Defaults to False. |
|
commtype |
string |
Communication mechanism that should be used. (Options described here) |
|
default_file |
object |
Schema for file components. |
|
default_value |
any |
Value that should be returned in the event that a yaml does not pair the comm with another model comm or a file. |
|
dont_copy |
boolean |
If True, the comm will not be duplicated in the event a model is duplicated via the ‘copies’ parameter. Defaults to False except for in the case that a model is wrapped and the comm is inside the loop or that a model is an RPC input to a model server. |
|
driver |
string |
[DEPRECATED] Name of driver class that should be used. |
|
field_names |
array |
[DEPRECATED] Field names that should be used to label fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, field names are created based on the field order. |
|
field_units |
array |
[DEPRECATED] Field units that should be used to convert fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, all fields are assumed to be unitless. |
|
filter |
object |
Schema for filter components. |
|
for_service |
boolean |
If True, this comm bridges the gap to an integration running as a service, possibly on a remote machine. Defaults to False. |
|
format_str |
string |
String that should be used to format/parse messages. Defaults to None. |
|
is_default |
boolean |
If True, this comm was created to handle all input/output variables to/from a model. Defaults to False. This variable is used internally and should not be set explicitly in the YAML. |
|
length_map |
object |
Map from pointer variable names to the names of variables where their length will be stored. Defaults to {}. |
|
onexit |
string |
[DEPRECATED] Method of input/output driver to call when the connection closes. |
|
outside_loop |
boolean |
If True and the comm is an input/output to/from a model being wrapped, the receive/send calls for this comm will be outside the loop for the model. Defaults to False. |
|
serializer |
object |
Schema for serializer components. |
|
transform |
array |
One or more transformations that will be applied to messages that are sent/received. Ignored if not provided. |
|
vars |
array |
Names of variables to be sent/received by this comm. Defaults to []. |
|
working_dir |
string |
Working directory. If not provided, the current working directory is used. |
Available Comm Types
Commtype |
Description |
---|---|
buffer |
Communication mechanism that should be used. |
default |
[DEFAULT] Communication mechanism selected based on the current platform. |
ipc |
Interprocess communication (IPC) queue. |
mpi |
MPI communicator. |
rest |
RESTful API. |
rmq |
RabbitMQ connection. |
rmq_async |
Asynchronous RabbitMQ connection. |
value |
Constant value. |
zmq |
ZeroMQ socket. |
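As a sketch of how the comm options and comm types above might appear on a channel entry, the fragment below sets commtype and spells out the datatype that the table gives as the default. The model and channel names reuse those from the earlier examples. Choosing zmq here is arbitrary.

```yaml
models:
  name: modelA
  language: python
  args: ./src/gs_lesson4_modelA.py
  inputs:
    - name: in_channel_name1
      commtype: zmq     # ZeroMQ socket, from the comm types table
      datatype:
        type: bytes     # the documented default, written out
```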
File Options
General File Options
Option |
Type |
Required |
Description |
---|---|---|---|
name |
string |
X |
Name used for component in log messages. |
working_dir |
string |
X |
Working directory. If not provided, the current working directory is used. |
address |
string |
Communication info. Defaults to None and address is taken from the environment variable. |
|
append |
boolean |
If True and writing, the file is opened in append mode. If True and reading, the file is kept open even if the end of the file is reached to allow for another process to write to the file in append mode. Defaults to False. |
|
as_array |
boolean |
[DEPRECATED] If True and the datatype is table-like, tables are sent/received as columns rather than row by row. Defaults to False. |
|
count |
integer |
When reading a file, read the file this many times. Defaults to 0. |
|
driver |
string |
[DEPRECATED] Name of driver class that should be used. |
|
field_names |
array |
[DEPRECATED] Field names that should be used to label fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, field names are created based on the field order. |
|
field_units |
array |
[DEPRECATED] Field units that should be used to convert fields in sent/received tables. This keyword is only valid for table-like datatypes. If not provided, all fields are assumed to be unitless. |
|
filetype |
string |
The type of file that will be read from or written to. (Options described here) |
|
filter |
object |
Schema for filter components. |
|
for_service |
boolean |
If True, this comm bridges the gap to an integration running as a service, possibly on a remote machine. Defaults to False. |
|
format_str |
string |
String that should be used to format/parse messages. Defaults to None. |
|
in_temp |
boolean |
If True, the path will be considered relative to the platform temporary directory. Defaults to False. |
|
is_series |
boolean |
If True, input/output will be done to a series of files. If reading, each file will be processed until the end is reached. If writing, each output will be to a new file in the series. The address is assumed to contain a format for the index of the file. Defaults to False. |
|
length_map |
object |
Map from pointer variable names to the names of variables where their length will be stored. Defaults to {}. |
|
onexit |
string |
[DEPRECATED] Method of input/output driver to call when the connection closes. |
|
transform |
array |
One or more transformations that will be applied to messages that are sent/received. Ignored if not provided. |
|
vars |
array |
Names of variables to be sent/received by this comm. Defaults to []. |
|
wait_for_creation |
number |
Time (in seconds) to wait for the file to be created, if it doesn’t exist, before opening it. Defaults to 0 s, in which case an attempt is made to open the file immediately. |
Available File Types
Filetype |
Description |
---|---|
ascii |
This file is read/written as encoded text one line at a time. |
bam |
bam sequence I/O |
bcf |
bcf sequence I/O |
binary |
[DEFAULT] The entire file is read/written all at once as bytes. |
bmp |
bmp image I/O |
cabo |
The file is a CABO parameter file. |
cram |
cram sequence I/O |
eps |
eps image I/O |
excel |
The file is read/written as Excel. |
fasta |
fasta sequence I/O |
fastq |
fastq sequence I/O |
gif |
gif image I/O |
jpeg |
jpeg image I/O |
json |
The file contains a JSON serialized object. |
map |
The file contains a key/value mapping with one key/value pair per line and separated by some delimiter. |
mat |
The file is a Matlab .mat file containing one or more serialized Matlab variables. |
netcdf |
The file is read/written as netCDF. |
obj |
The file is in the Obj data format for 3D structures. |
pandas |
The file is a Pandas frame output as a table. |
pickle |
The file contains one or more pickled Python objects. |
ply |
The file is in the Ply data format for 3D structures. |
png |
png image I/O |
sam |
sam sequence I/O |
table |
The file is an ASCII table that will be read/written one row at a time. If
|
tiff |
tiff image I/O |
vcf |
vcf sequence I/O |
yaml |
The file contains a YAML serialized object. |
File Type Specific Options
Option |
Type |
Valid For ‘Filetype’ Of |
Description |
---|---|---|---|
args |
string |
[‘ply’, ‘mat’, ‘pickle’, ‘obj’, ‘map’, ‘ascii’, ‘binary’, ‘table’, ‘pandas’] |
[DEPRECATED] Arguments that should be provided to the driver. |
columns |
array |
[‘excel’] |
Names of columns to read/write. |
comment |
string |
[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’] |
One or more characters indicating a comment. Defaults to ‘# ‘. |
datatype |
schema |
[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’] |
JSON schema defining the type of object that the serializer will be used to serialize/deserialize. Defaults to default_datatype. |
default_flow_style |
boolean |
[‘yaml’] |
If True, nested collections will be serialized in the block style. If False, they will always be serialized in the flow style. See PyYAML Documentation. |
delimiter |
string |
[‘map’, ‘table’, ‘cabo’, ‘pandas’] |
Delimiter that should be used to separate name/value pairs in the map. Defaults to \t. |
driver |
string |
[‘netcdf’, ‘gif’, ‘mat’, ‘bcf’, ‘obj’, ‘sam’, ‘fastq’, ‘binary’, ‘png’, ‘ply’, ‘excel’, ‘jpeg’, ‘ascii’, ‘json’, ‘yaml’, ‘tiff’, ‘cabo’, ‘cram’, ‘pickle’, ‘map’, ‘eps’, ‘bmp’, ‘bam’, ‘vcf’, ‘table’, ‘pandas’, ‘fasta’] |
[DEPRECATED] Name of driver class that should be used. |
encoding |
string |
[‘yaml’] |
Encoding that should be used to serialize the object. Defaults to ‘utf-8’. |
endcol |
integer |
[‘excel’] |
Column to stop read at (non-inclusive). |
endrow |
integer |
[‘excel’] |
Row to stop read at (non-inclusive). |
flush_on_write |
boolean |
[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’] |
If true, the file will be flushed when written to. |
header |
object |
[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’] |
Header defining sequence identifiers. A header is required for writing SAM, BAM, and CRAM files. |
indent |
[‘string’, ‘int’] |
[‘yaml’, ‘json’] |
String or number of spaces that should be used to indent each level within the serialized structure. Defaults to ‘\t’. |
index |
string |
[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’] |
Path to file containing index if different from the standard naming convention for BAM and CRAM files. |
name |
string |
[‘netcdf’, ‘gif’, ‘mat’, ‘bcf’, ‘obj’, ‘sam’, ‘fastq’, ‘binary’, ‘png’, ‘ply’, ‘excel’, ‘jpeg’, ‘ascii’, ‘json’, ‘yaml’, ‘tiff’, ‘cabo’, ‘cram’, ‘pickle’, ‘map’, ‘eps’, ‘bmp’, ‘bam’, ‘vcf’, ‘table’, ‘pandas’, ‘fasta’] |
Name used for component in log messages. |
newline |
string |
[‘ply’, ‘cabo’, ‘mat’, ‘pickle’, ‘obj’, ‘yaml’, ‘map’, ‘ascii’, ‘json’, ‘table’, ‘pandas’] |
One or more characters indicating a newline. Defaults to ‘\n’. |
no_header |
boolean |
[‘pandas’] |
If True, headers will not be read or serialized from/to tables. Defaults to False. |
params |
object |
[‘gif’, ‘jpeg’, ‘eps’, ‘png’, ‘tiff’, ‘bmp’] |
Parameters that should be passed to the PIL.Image save/open command. |
piecemeal |
boolean |
[‘fastq’, ‘fasta’] |
If possible, read the file incrementally in multiple messages. This should be used for large files that cannot be loaded into memory. |
prune_duplicates |
boolean |
[‘obj’, ‘ply’] |
If True, serialized meshes in array format will be pruned of duplicates when being normalized into a Ply object. If False, duplicates will not be pruned. Defaults to True. |
read_attributes |
boolean |
[‘netcdf’] |
If True, the attributes are read in as well as the variables. Defaults to False. |
read_meth |
string |
[‘binary’] |
Method that should be used to read data from the file. Defaults to ‘read’. Ignored if direction is ‘send’. |
record_ids |
array |
[‘fastq’, ‘fasta’] |
IDs of records to read/write. Other records will be ignored. |
regions |
array |
[‘bam’, ‘cram’, ‘bcf’, ‘vcf’, ‘sam’] |
Region parameters (reference name, start, and end) defining the regions that should be read. If not provided, all regions will be read. |
serializer |
|
[‘binary’] |
Class with serialize and deserialize methods that should be used to process sent and received messages or a dictionary describing a serializer that obeys the serializer schema. |
sheet_template |
string |
[‘excel’] |
Format string that can be completed with % operator to generate names for each subsequent sheet when writing. |
sheets |
array |
[‘excel’] |
Name(s) of one or more sheets that should be read/written. If not provided during read, all sheets will be read. |
sort_keys |
boolean |
[‘json’] |
If True, the serialization of dictionaries will be in key sorted order. Defaults to True. |
startcol |
integer |
[‘excel’] |
Column to start read/write at. |
startrow |
integer |
[‘excel’] |
Row to start read/write at. |
str_as_bytes |
boolean |
[‘excel’, ‘pandas’] |
If true, strings in columns are read as bytes. |
use_astropy |
boolean |
[‘table’, ‘pandas’] |
If True, the astropy package will be used to serialize/deserialize table. Defaults to False. |
variables |
array |
[‘netcdf’] |
List of variables to read in. If not provided, all variables will be read. |
version |
integer |
[‘netcdf’] |
Version of netCDF format that should be used. Defaults to 1. Options are 1 (classic format) and 2 (64-bit offset format). |
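As a rough illustration of how the file options above might be combined, the sketch below defines one connection that reads from a spreadsheet and one that writes JSON. The file paths and channel names (`modelA_input`, `modelA_output`) are hypothetical, and placing the options inside the file entry mapping is an assumption based on the description of file entries rather than a confirmed layout.

```yaml
connections:
  # Read selected sheets from a spreadsheet (hypothetical path)
  - inputs:
      - name: ./Input/measurements.xlsx
        filetype: excel
        sheets: [Sheet1, Sheet2]   # only these sheets are read
        startrow: 1                # row to start reading at
    outputs: [modelA_input]        # hypothetical model input channel
  # Write model output as JSON (hypothetical path)
  - inputs: [modelA_output]        # hypothetical model output channel
    outputs:
      - name: ./Output/results.json
        filetype: json
        indent: 2                  # number of spaces per level
        sort_keys: true            # serialize dictionaries in key order
```

Each option only applies to the file types listed for it in the table, so `sheets` and `startrow` would be meaningful for `excel` files but not for `json` files.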
Connection Options¶
General Connection Options¶
Option | Type | Required | Description |
---|---|---|---|
inputs | array | X | One or more name(s) of model output channel(s) and/or new channel/file objects that the connection should receive messages from. A full description of file entries and the available options can be found here. |
outputs | array | X | One or more name(s) of model input channel(s) and/or new channel/file objects that the connection should send messages to. A full description of file entries and the available options can be found here. |
args | string | | [DEPRECATED] Arguments that should be provided to the driver. |
connection_type | string | | Connection between one or more comms/files and one or more comms/files. (Options described here) |
driver | string | | [DEPRECATED] Name of driver class that should be used. |
input_pattern | string | | The communication pattern that should be used to handle incoming messages when there is more than one input communicator present. Defaults to ‘cycle’. Options include: ‘cycle’: Receive from the next available input communicator. ‘gather’: Receive lists of messages with one element from each communicator, where a message is only returned when there is a message from each. |
onexit | string | | Class method that should be called when a model that the connection interacts with exits, but before the connection driver is shut down. Defaults to None. |
output_pattern | string | | The communication pattern that should be used to handle outgoing messages when there is more than one output communicator present. Defaults to ‘broadcast’. Options include: ‘cycle’: Rotate through output comms, sending one message to each. ‘broadcast’: Send the same message to each comm. ‘scatter’: Send part of the message (must be a list) to each comm. |
read_meth | string | | |
transform | array | | Function or string specifying function that should be used to translate messages from the input communicator before passing them to the output communicator. If a string, the format should be “<package.module>:<function>” so that <function> can be imported from <package>. Defaults to None and messages are passed directly. This can also be a list of functions/strings that will be called on the messages in the order they are provided. |
working_dir | string | | Working directory. If not provided, the current working directory is used. |
write_meth | string | | |
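The pattern and transform options can be easier to follow with a concrete entry. The following is a minimal sketch, assuming hypothetical channel names (`outputA`, `outputB`, `inputC`, and so on) and a hypothetical importable function `mypackage.mymodule:rescale`; none of these names come from the framework itself.

```yaml
connections:
  # Two model outputs feed one model input. With 'gather', a message is
  # only delivered once every input has one, as a list with one element
  # from each input.
  - inputs: [outputA, outputB]
    outputs: [inputC]
    input_pattern: gather
  # One model output feeds two model inputs. With 'cycle', successive
  # messages alternate between the outputs instead of being broadcast.
  - inputs: [outputC]
    outputs: [inputD, inputE]
    output_pattern: cycle
    # Hypothetical function applied to each message before it is sent on
    transform: "mypackage.mymodule:rescale"
```

Leaving out `input_pattern` and `output_pattern` would fall back to the defaults listed in the table (‘cycle’ and ‘broadcast’, respectively).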
Available Connection Types¶
Connection_Type | Description |
---|---|
connection | Connection between one or more comms/files and one or more comms/files. |
file_input | Connection between a file and a model. |
file_output | Connection between a model and a file. |
input | Connection between one or more comms/files and a model. |
output | Connection between a model and one or more comms/files. |
rpc_request | Connection between one or more comms/files and one or more comms/files. |
rpc_response | Connection between one or more comms/files and one or more comms/files. |
Additional Options¶
In addition to the options above, several comm (channel/file) options are also accepted at the level of the connection for convenience, rather than as part of the connection’s input/output values. These options include:
Key | Description |
---|---|
format_str | A C-style format string specifying how messages should be formatted/parsed from/to language-specific types (see C-Style Format Strings). |
field_names | A sequence collection of names for the fields present in the format string. |
field_units | A sequence collection of units for the fields present in the format string (see Units). |
as_array | True or False. If True and filetype is table, the table will be read in its entirety and passed as an array. |
filetype | Only valid for connections that direct messages from a file to a model input channel or from a model output channel to a file. Values indicate how messages should be read from the file. See this table for a list of available file types. |
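As a sketch of how these connection-level options might look together, the entry below writes a model's output to a table file. The channel name `outputA`, the output path, and the field names and units are hypothetical values chosen for illustration.

```yaml
connections:
  - inputs: [outputA]              # hypothetical model output channel
    outputs: [./Output/table.txt]  # hypothetical path
    filetype: table
    # One row per message: two floats separated by a tab. Inside a
    # double-quoted YAML string, \t and \n are parsed as tab and newline.
    format_str: "%f\t%f\n"
    field_names: [mass, length]
    field_units: [g, cm]
```

Setting the options here, rather than inside a file entry under `outputs`, keeps simple file connections to a single flat mapping.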
Driver Method¶
For backwards compatibility, yggdrasil also allows connections to be specified
using drivers. In this scheme, there is no connections section in the YAML file(s).
When specifying communication via drivers, each input/output entry for the models
should be a mapping collection with, at minimum, the following keys:
Key | Description |
---|---|
name | The name of the channel that will be provided by the model to the yggdrasil API. This can be any text, but should be unique. |
driver | The name of the input/output driver class that should be used. A list of available input/output drivers can be found here. |
args | For connections made to other models, this should be text that matches that of the other model’s corresponding driver. For connections made to files, this should be the path to the file, relative to the location of the YAML file. |
To make a connection between two models’ inputs and outputs, the values for their args keys should match.
Any additional keys in the input/output entry will be passed to the input/output driver. A full description of the available input/output drivers and potential arguments can be found here.
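For reference, a driver-style YAML could look roughly like the sketch below. The driver class names (`FileInputDriver`, `OutputDriver`, `InputDriver`, `FileOutputDriver`) are assumptions and should be checked against the list of available drivers; the model names, paths, and the shared `A_to_B` value are hypothetical.

```yaml
models:
  - name: modelA
    language: python
    args: ./src/modelA.py
    inputs:
      - name: inputA
        driver: FileInputDriver    # assumed driver name: read from a file
        args: ./Input/input.txt    # path relative to this YAML file
    outputs:
      - name: outputA
        driver: OutputDriver       # assumed driver name: send to a model
        args: A_to_B               # must match the args of modelB's input
  - name: modelB
    language: c
    args: ./src/modelB.c
    inputs:
      - name: inputB
        driver: InputDriver        # assumed driver name: receive from a model
        args: A_to_B               # matches modelA's output args
    outputs:
      - name: outputB
        driver: FileOutputDriver   # assumed driver name: write to a file
        args: ./output.txt
```

Note that no `connections` section appears: the link between `outputA` and `inputB` exists only because both entries share the same `args` value.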
In general, this method of specifying connections is not recommended.