Merge pull request #126 from galaxyproject/improved_docs
Improved docs
nuwang authored Mar 15, 2024
2 parents ddd4ddd + b913afd commit a99ec5a
Showing 7 changed files with 399 additions and 175 deletions.
53 changes: 18 additions & 35 deletions docs/index.rst
@@ -2,25 +2,23 @@

.. centered:: Dynamic rules for routing Galaxy entities to destinations

TotalPerspectiveVortex (TPV) provides an installable set of dynamic rules for the
`Galaxy application`_ that can route entities (Tools, Users, Roles) to appropriate
destinations based on a configurable yaml file. The aim of TPV is to build on and
unify previous efforts, such as `Dynamic Tool Destinations`_, the `Job Router`_ and
`Sorting Hat`_, into a configurable set of rules that can be extended arbitrarily
with custom Python logic.
TotalPerspectiveVortex (TPV) is a plugin for the `Galaxy application`_ that can route
entities (Tools, Users, Roles) to appropriate destinations with appropriate resource
allocations (cores, gpus, memory), based on a configurable yaml file. For example, it could
allocate 8 cores and 32GB of RAM to a bwa-mem job and route it to a Slurm cluster, while
allocating 2 cores and 4GB of RAM to an upload job and routing it to a local runner. These
rules can also be shared community-wide, imported at runtime by any Galaxy deployment, and
overridden locally when necessary.

How it works
------------
TPV provides a dynamic rule that can be plugged into Galaxy via ``job_conf.yml``.
The dynamic rule will also have an associated configuration file that maps entities
(tools, users, roles) to specific destinations through a flexible tagging system.
Destinations can have arbitrary scheduling tags defined, and each entity can express a preference
or aversion to specific scheduling tags. Based on this tagging, jobs are routed to the most appropriate
destination. In addition, admins can also plug in arbitrary Python-based rules for making
more complex decisions, as well as custom ranking functions for choosing between matching
destinations. For example, a ranking function could query influx metrics to determine
the least loaded destination, and route jobs there, providing a basic form of
"metascheduling" functionality.
TPV can be plugged into Galaxy via ``job_conf.yml``. TPV's configuration file specifies how entities
(tools, users, roles) should be allocated resources (cores, gpus, memory) and, in complex environments
with multiple job destinations, where the resulting jobs should be mapped (through a flexible
tagging system). Destinations can have arbitrary scheduling tags defined, and each entity can express a
preference or aversion to specific scheduling tags. This tagging affects how jobs are routed to
destinations. In addition, admins can plug in arbitrary Python-based rules for making more complex
decisions, as well as custom ranking functions for choosing between matching destinations.

Shared database
---------------
@@ -29,34 +27,19 @@ A shared database of TPV rules are maintained in: https://github.com/galaxyproje
These rules are based on typical settings used in the usegalaxy.* federation, which you can override
based on local resource availability.

Getting Started
---------------

1. ``pip install total-perspective-vortex`` into Galaxy's python virtual environment
2. Configure Galaxy to use TPV's dynamic destination rule
3. Create the TPV job mapping yaml file, indicating job routing preferences
4. Submit jobs as usual

Standalone Installation
-----------------------

If you wish to install TPV outside of Galaxy's virtualenv (e.g. to use the ``tpv lint`` command locally or in a CI/CD
pipeline), use the ``[cli]`` pip requirement specifier to make sure the necessary Galaxy dependency packages are also
installed. **This should not be used in the Galaxy virtualenv**:

.. code-block:: console

    $ pip install 'total-perspective-vortex[cli]'

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   topics/installation.rst
   topics/tpv_by_example.rst
   topics/advanced_topics.rst
   topics/concepts.rst
   topics/configure_galaxy.rst
   topics/inner_workings.rst
   topics/shell_commands.rst
   topics/migration_guide.rst
   topics/faq.rst

Indices and tables
==================
199 changes: 199 additions & 0 deletions docs/topics/advanced_topics.rst
@@ -0,0 +1,199 @@
###############
Advanced Topics
###############

Expressions
===========

Most TPV properties can be expressed as Python expressions. The rule of thumb is that all string expressions
are evaluated as Python f-strings, and all integer, float or boolean expressions are evaluated as Python code blocks.
For example, gpus, cores and mem are evaluated as Python code blocks, as they evaluate to integer/float values.
However, env and params are evaluated as f-strings, as they result in string values. This is to improve the readability
and syntactic simplicity of TPV config files.

At the point of evaluating these expressions, there is an evaluation context, which is a default set of variables
that are available to that expression. The following default variables are available to all expressions:

Default evaluation context
--------------------------
+----------+-----------------------------------------------------------------------+
| Variable | Description                                                           |
+==========+=======================================================================+
| app      | the Galaxy App object                                                 |
+----------+-----------------------------------------------------------------------+
| tool     | the Galaxy tool object                                                |
+----------+-----------------------------------------------------------------------+
| user     | the current Galaxy user object                                        |
+----------+-----------------------------------------------------------------------+
| job      | the Galaxy job object                                                 |
+----------+-----------------------------------------------------------------------+
| mapper   | the TPV mapper object, which can be used to access parsed TPV configs |
+----------+-----------------------------------------------------------------------+
| entity   | the TPV entity being currently evaluated. Can be a combined entity.   |
+----------+-----------------------------------------------------------------------+
| self     | an alias for the current TPV entity.                                  |
+----------+-----------------------------------------------------------------------+

Custom evaluation contexts
---------------------------
These are user-defined context values that can be set globally, or locally at the level of each
entity. Any defined context value is available as a regular variable at the time the entity is evaluated.


Special evaluation contexts
---------------------------
In addition to the defaults above, additional context variables are available at different steps.

*gpu, core and mem expressions* - these are evaluated in that order, and each can refer only to values evaluated before it.
For example, gpu expressions cannot refer to cores or mem, as they have not been evaluated yet. Core
expressions can be based on gpu values. Mem expressions can refer to both cores and gpus.

*env and param expressions* - env expressions can be based on gpus, cores or mem. Param expressions can additionally
refer to evaluated env expressions.

*rank functions* - these can refer to all prior expressions, and are additionally passed a `candidate_destinations`
array, which is a list of matching TPV destinations.
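
The ordering described above can be illustrated with a small toy evaluator. This is only a sketch of the idea, not TPV's implementation: the function name ``evaluate_in_order`` and the ``input_size`` context variable are assumptions made for the example.

```python
# Toy illustration of ordered evaluation; not TPV's actual code.
def evaluate_in_order(expressions, context):
    """Evaluate gpus, cores and mem in that order.

    Each evaluated value is added to the scope, so later expressions
    can refer to earlier ones, but not the other way around.
    """
    scope = dict(context)
    for field in ("gpus", "cores", "mem"):
        if field in expressions:
            scope[field] = eval(expressions[field], {}, scope)
    return {f: scope[f] for f in ("gpus", "cores", "mem") if f in scope}


resources = evaluate_in_order(
    {"gpus": "1", "cores": "4 if gpus else 2", "mem": "cores * 4 + gpus * 8"},
    {"input_size": 2},  # hypothetical extra context variable
)
print(resources)  # {'gpus': 1, 'cores': 4, 'mem': 24}
```

In this sketch, a gpus expression such as ``cores * 2`` raises a ``NameError``, because ``cores`` has not been evaluated at that point.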

Properties that do not support expressions
------------------------------------------

Some properties do not support expressions. These are primarily:

* max_accepted_cores, max_accepted_mem and max_accepted_gpus, which can only be defined on destinations. This is
  because when a combined entity is matched with a destination, concrete values are required.
* tags defined on entities

Evaluation by expression type
-----------------------------

The simple rule of thumb here is that all string expressions are evaluated as Python f-strings,
and all integer, float or boolean expressions are evaluated as Python code blocks. If evaluated as an
f-string, the expression must be a single line and must evaluate to a string. If evaluated as
a code block, the expression may span multiple lines of arbitrary Python code, but the last line must
be an expression that evaluates to the expected return type (a ``return`` statement cannot
be used).

+--------------------+---------------+----------------------+
| Field              | Evaluated As  | Expected type        |
+====================+===============+======================+
| gpus               | code block    | float                |
+--------------------+---------------+----------------------+
| cores              | code block    | float                |
+--------------------+---------------+----------------------+
| mem                | code block    | float                |
+--------------------+---------------+----------------------+
| env                | f-strings     | string               |
+--------------------+---------------+----------------------+
| params             | f-strings     | string               |
+--------------------+---------------+----------------------+
| min_gpus           | code block    | float                |
+--------------------+---------------+----------------------+
| min_cores          | code block    | float                |
+--------------------+---------------+----------------------+
| min_mem            | code block    | float                |
+--------------------+---------------+----------------------+
| max_gpus           | code block    | float                |
+--------------------+---------------+----------------------+
| max_cores          | code block    | float                |
+--------------------+---------------+----------------------+
| max_mem            | code block    | float                |
+--------------------+---------------+----------------------+
| rank               | code block    | list of destinations |
+--------------------+---------------+----------------------+
| context            | not evaluated | string               |
+--------------------+---------------+----------------------+
| scheduling tags    | not evaluated | string               |
+--------------------+---------------+----------------------+
| inherits           | not evaluated | string               |
+--------------------+---------------+----------------------+
| max_accepted_gpus  | not evaluated | float                |
+--------------------+---------------+----------------------+
| max_accepted_cores | not evaluated | float                |
+--------------------+---------------+----------------------+
| max_accepted_mem   | not evaluated | float                |
+--------------------+---------------+----------------------+
| if                 | code block    | bool                 |
+--------------------+---------------+----------------------+
| rules              | not evaluated | list of rules        |
+--------------------+---------------+----------------------+
| execute            | code block    | void                 |
+--------------------+---------------+----------------------+
| fail               | f-string      | string               |
+--------------------+---------------+----------------------+
| resubmit           | f-strings     | string               |
+--------------------+---------------+----------------------+
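
To make the two evaluation styles concrete, the following sketch shows one way a single-line f-string template and a multi-line code block (whose last line is an expression) could be evaluated in plain Python. It is an illustration under assumptions, not TPV's implementation: the helper names ``eval_fstring`` and ``eval_code_block`` are invented for this example.

```python
import ast


def eval_fstring(template, context):
    """Evaluate a single-line template as if it were a Python f-string."""
    return eval("f" + repr(template), {}, dict(context))


def eval_code_block(source, context):
    """Run a multi-line block and return the value of its last expression."""
    tree = ast.parse(source, mode="exec")
    if not tree.body or not isinstance(tree.body[-1], ast.Expr):
        raise ValueError("the last line must be an expression")
    # Split off the final expression so its value can be returned.
    last = ast.Expression(body=tree.body.pop().value)
    scope = dict(context)
    exec(compile(tree, "<block>", "exec"), scope)
    return eval(compile(last, "<block>", "eval"), scope)


context = {"cores": 4, "input_size": 12}  # hypothetical context values
mem = eval_code_block("factor = 4 if input_size > 10 else 2\ncores * factor", context)
env_value = eval_fstring("-Xmx{mem}G -threads {cores}", {**context, "mem": mem})
print(mem, env_value)  # 16 -Xmx16G -threads 4
```

Here the code block needs no ``return``: the value of its final line becomes the result, which is the convention the section describes.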


Scheduling
==========

TPV offers several mechanisms for controlling scheduling, all of which are optional.
In its simplest form, no scheduling constraints are defined at all, in which case
the entity is scheduled on the first available destination. Admins can use scheduling tags to exert additional control
over which destinations jobs can schedule on. Scheduling tags fall into one of four categories
(require, prefer, accept, reject), ranging from indicating a requirement
to indicating complete aversion.

+-----------+--------------------------------------------------------------------------------------------------------+
| Tag Type  | Description                                                                                            |
+===========+========================================================================================================+
| require   | required tags must match up for scheduling to occur. For example, if a tool is marked as requiring the |
|           | `high-mem` tag, only destinations that are tagged as requiring, preferring or accepting the            |
|           | `high-mem` tag would be considered for scheduling.                                                     |
+-----------+--------------------------------------------------------------------------------------------------------+
| prefer    | prefer tags are ranked higher than accept tags when scheduling decisions are made.                     |
+-----------+--------------------------------------------------------------------------------------------------------+
| accept    | accept tags can be used to indicate that an entity can match up or support another entity, even        |
|           | if not preferentially.                                                                                 |
+-----------+--------------------------------------------------------------------------------------------------------+
| reject    | reject tags cannot be present for scheduling to occur. For example, if a tool is marked as rejecting   |
|           | the `pulsar` tag, only destinations that do not have that tag are considered for scheduling. If two    |
|           | entities have the same reject tag, they still repel each other.                                        |
+-----------+--------------------------------------------------------------------------------------------------------+


Scheduling tag compatibility table
----------------------------------

+------------+---------+--------+--------+--------+------------+
| Tag Type   | Require | Prefer | Accept | Reject | Not Tagged |
+============+=========+========+========+========+============+
| Require    |         |        |        |        |            |
+------------+---------+--------+--------+--------+------------+
| Prefer     |         |        |        |        |            |
+------------+---------+--------+--------+--------+------------+
| Accept     |         |        |        |        |            |
+------------+---------+--------+--------+--------+------------+
| Reject     |         |        |        |        |            |
+------------+---------+--------+--------+--------+------------+
| Not Tagged |         |        |        |        |            |
+------------+---------+--------+--------+--------+------------+


Scheduling by tag match
------------------------
Scheduling tags can be used to model anything from compatibility with a destination to
permission to execute a tool. For example, a tool can be tagged as requiring the "restricted"
tag, and users can be tagged as rejecting the "restricted" tag by default. Then, only users
who are specifically marked as requiring, accepting, or preferring the "restricted" tag
can execute that tool. Of course, the destination must also be marked as not rejecting the
"restricted" tag.
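
The "restricted" example can be sketched as a pairwise compatibility check. This is a simplified model based only on the tag descriptions in this section (a required tag must be required, preferred or accepted by the other side; a rejected tag must not appear on the other side at all). The ``compatible`` function and the dictionary layout are assumptions for illustration and do not reflect how TPV combines and matches entities internally.

```python
POSITIVE = ("require", "prefer", "accept")
ALL_KINDS = POSITIVE + ("reject",)


def _tags(entity, kinds):
    """Collect every tag an entity declares under the given tag kinds."""
    return set().union(*(entity.get(kind, set()) for kind in kinds))


def _one_way(a, b):
    # Everything a requires must be required, preferred or accepted by b.
    if not a.get("require", set()) <= _tags(b, POSITIVE):
        return False
    # Nothing a rejects may be present on b, even as a reject tag.
    return not (a.get("reject", set()) & _tags(b, ALL_KINDS))


def compatible(a, b):
    """Return True if the two tag declarations can be matched."""
    return _one_way(a, b) and _one_way(b, a)


tool = {"require": {"restricted"}}
default_user = {"reject": {"restricted"}}
trusted_user = {"accept": {"restricted"}}

print(compatible(tool, default_user))  # False
print(compatible(tool, trusted_user))  # True
```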

Scheduling by rules
-------------------
Rules can be used to conditionally modify any entity requirement. Rules can be given an ID,
which can subsequently be used by an inheriting entity to override the rule. If no ID is
specified, a unique ID is generated, and the rule can no longer be overridden. Rules
are typically evaluated through an `if` clause, which specifies the logical condition under
which the rule matches. If the rule matches, cores, memory, scheduling tags etc. can be
specified to override inherited values. The special clause `fail` can be used to immediately
fail the job with an error message. The `execute` clause can be used to execute an arbitrary
code block on rule match.
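
As a rough illustration of the behaviour described above, the sketch below models rules as plain dictionaries, lets an inheriting entity override a parent rule by ID, and applies the overrides of each rule whose ``if`` condition matches. The dictionary layout, the ``merge_rules`` and ``apply_rules`` helpers, and the use of ``RuntimeError`` for ``fail`` are all assumptions made for this example, not TPV's actual data model.

```python
def merge_rules(parent_rules, child_rules):
    """Child rules replace parent rules with the same id; new ids are appended."""
    merged = {rule["id"]: rule for rule in parent_rules}
    merged.update({rule["id"]: rule for rule in child_rules})
    return list(merged.values())


def apply_rules(entity, rules, context):
    """Apply the overrides of every matching rule, in order."""
    result = dict(entity)
    for rule in rules:
        if eval(rule["if"], {}, dict(context)):
            if "fail" in rule:
                # A matching `fail` clause stops processing with an error message.
                raise RuntimeError(rule["fail"])
            result.update({k: v for k, v in rule.items() if k not in ("id", "if")})
    return result


parent = [
    {"id": "big_input", "if": "input_size > 10", "cores": 8},
    {"id": "too_big", "if": "input_size > 100", "fail": "Input too large"},
]
child = [{"id": "big_input", "if": "input_size > 10", "cores": 16}]

rules = merge_rules(parent, child)
print(apply_rules({"cores": 2, "mem": 8}, rules, {"input_size": 20}))
# {'cores': 16, 'mem': 8}
```

With ``input_size`` set to 200, the ``too_big`` rule matches and the sketch raises an error carrying the message "Input too large".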

Scheduling by custom ranking functions
--------------------------------------
The default rank function sorts destinations by scoring how well the tags match the job's requirements.
As this may often be too simplistic, the rank function can be overridden by specifying a custom
rank clause. The rank clause can contain an arbitrary code block, which can do the desired sorting,
for example by determining destination load by querying the job manager, influx statistics etc.
The final statement in the rank clause must be the list of sorted destinations.
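
A load-based ranking of the kind mentioned above might look like the following sketch. The ``Destination`` class is a stand-in for whatever objects appear in ``candidate_destinations``, and ``load_by_id`` is a hypothetical lookup that, in practice, would be filled by querying a job manager or a metrics store; neither is part of TPV's API.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    """Minimal stand-in for a TPV destination; only an id is modelled."""
    id: str


def rank_by_load(candidate_destinations, load_by_id):
    """Return destinations sorted from least to most loaded.

    Destinations with unknown load are placed last.
    """
    return sorted(
        candidate_destinations,
        key=lambda dest: load_by_id.get(dest.id, float("inf")),
    )


candidates = [Destination("slurm"), Destination("local"), Destination("pulsar")]
load_by_id = {"slurm": 0.9, "pulsar": 0.2}  # hypothetical load figures

ranked = rank_by_load(candidates, load_by_id)
print([dest.id for dest in ranked])  # ['pulsar', 'slurm', 'local']
```

In an actual rank clause, the sorted list would be the final statement of the code block.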