diff --git a/.github/ISSUE_TEMPLATE/petab-extensions.md b/.github/ISSUE_TEMPLATE/petab-extensions.md new file mode 100644 index 00000000..ebe3a47b --- /dev/null +++ b/.github/ISSUE_TEMPLATE/petab-extensions.md @@ -0,0 +1,27 @@ +--- + +name: PEtab Extension +about: Suggest a new extension for PEtab core +title: '' +labels: file format +assignees: '' + +--- + +**Name of the Extension** +Please make sure that the extension name matches the regular expression `^[a-zA-Z_][\w-]*$`. + +**Which problem would you like to address?** +A clear and concise description of which use case you want to address and, if applicable, why the current specifications do not fulfill your requirements. + +**Describe the solution you would like** +A clear and concise description of the changes you want to propose. Please describe any additional fields / files you would want to add, including allowed inputs and implications. + +**Describe why this should not be implemented by changes to PEtab core** +A clear and concise description in what way the proposed changes introduce features that are orthogonal to the PEtab core specification. + +**List the extension library that implements validation checks** +A link to the website or github repository that accompanies the proposed extension. + +**List the toolboxes that support the proposed standard** +A link to the website or github repository that contains the software that implements support for the standard. diff --git a/CHANGELOG.md b/CHANGELOG.md index 9e8a28ea..cb7597d3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,13 @@ available at https://github.com/PEtab-dev/libpetab-python/. * Update tutorial.rst (#512) * Update how-to-cite (Closes #432) (#509) +## 0.2 series + +### 0.2.0 + +* Specify how PEtab functionality can be expanded through extensions. 
+* YAML files are now required for the specification of PEtab problems. + ## 0.1 series ### 0.1.14 diff --git a/README.md b/README.md index 939ca9ee..0850c429 100644 --- a/README.md +++ b/README.md @@ -140,6 +140,8 @@ will have to: 1. Create a parameter table. +1. Create a YAML file that lists the model and all of the tables above. + If you are using Python, some handy functions of the [PEtab library](https://github.com/PEtab-dev/libpetab-python/) can help you with that. This includes also a PEtab validator called `petablint` which diff --git a/doc/_static/petab_schema.yaml b/doc/_static/petab_schema.yaml index bf012e57..78a5628d 100644 --- a/doc/_static/petab_schema.yaml +++ b/doc/_static/petab_schema.yaml @@ -6,8 +6,13 @@ description: PEtab parameter estimation problem config file schema properties: format_version: - type: integer - description: Version of the PEtab format (e.g. 1). + anyOf: + - type: string + # (corresponding to PEP 440). + pattern: ^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$ + - type: integer + + description: Version of the PEtab format parameter_file: oneOf: @@ -17,7 +22,6 @@ properties: File name (absolute or relative) or URL to PEtab parameter table containing parameters of all models listed in `problems`. A single table may be split into multiple files and described as an array here. - problems: type: array description: | @@ -31,7 +35,7 @@ properties: type: object description: | A set of PEtab model, condition, observable and measurement - files and optional visualization files. + files and optional visualization and time courses files. properties: sbml_files: @@ -74,11 +78,44 @@ properties: type: string description: PEtab visualization file name or URL. + timecourse_files: + type: array + description: List of PEtab time courses files + + items: + type: string + description: PEtab time courses file name or URL. 
+ required: - sbml_files - observable_files - measurement_files - - condition_files + + extensions: + type: object + description: | + PEtab extensions being used. + patternProperties: + "^[a-zA-Z][\\-\\w]*$": + + type: object + description: | + Information on a specific extension + properties: + version: + type: string + pattern: ^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$ + required: + type: boolean + description: | + Indicates whether the extension is required for the + mathematical interpretation of the problem. + required: + - version + - required + additionalProperties: true + + additionalProperties: false required: - format_version diff --git a/doc/development.rst b/doc/development.rst index 8c9f29c0..d2b79cbb 100644 --- a/doc/development.rst +++ b/doc/development.rst @@ -192,6 +192,56 @@ Upon a new release, the PEtab editors ensure that * the new version of the specifications is deposited at Zenodo * the new release is announced on the PEtab mailing list +PEtab Extensions +---------------- + +An elaborate, monolithic format would make it difficult to understand and +implement support for PEtab, leading to a steep learning curve and discouraging +support in new toolboxes. To address this issue, the PEtab format is modular and +permits modifications through extensions that complement the core standard. +This modular specification flattens the learning curve and provides toolbox +developers with more guidance on which features to implement to maximize +support for real-world applications. Moreover, such modular extensions +facilitate and promote the use of specialized tools for specific, non-parameter +estimation tasks such as visualization. + +Requirements for new extensions: + +* Specifications in PEtab extensions take precedence over PEtab core, i.e., they + can ease or refine format restrictions imposed by PEtab core. 
+* PEtab extensions should extend PEtab core with new orthogonal features or + tasks, i.e., they should not make trivial changes to PEtab core. +* PEtab extensions must be named according to ^[a-zA-Z][\w\-]*$ +* PEtab extensions must be versioned using semantic versioning. +* PEtab extensions required for interpretation of a problem specification must + be specified in the PEtab-YAML files +* There is at least one tool that supports the proposed extension +* The authors provide a library that provides test cases and implements + validation checks for the proposed format. + +Developers are free to develop any PEtab extension. To become an official +PEtab extension, it needs to go through the following process. + +#. The developers write a proposal describing the motivation and specification + of the extension, following the respective issue template provided in this + repository. +#. The proposal is submitted as an issue in this repository. +#. The technical specification and documentation of the extension is submitted + as a pull request in this repository that references the respective issue. + +The PEtab editors jointly decide whether an extension meets the requirements +described here. In case of a positive evaluation, they announce a poll for the +acceptance as official extension to the PEtab forum. All members of the PEtab +community are eligible to vote. If at least 50% of the votes are in favor, +the extension is accepted and the respective pull requests with specifications, +documentation and test cases are merged. There is no quorum number of votes +for acceptance. + +It is encouraged that extensions are informally discussed with the community +before initiating the process of becoming an official extension. Such +discussions can be conducted through the communication channels mentioned +above. 
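To make the naming and versioning requirements above concrete, here is a minimal Python sketch that checks a single entry of the ``extensions`` mapping. It assumes the name pattern given in the requirements list and the version pattern from ``petab_schema.yaml`` in this diff. The function ``check_extension`` and the example extension names are hypothetical and not part of any PEtab library.

```python
import re

# Name pattern from the requirements list above. re.ASCII restricts \w to
# [a-zA-Z0-9_]; whether non-ASCII names are intended is an assumption here.
NAME_PATTERN = re.compile(r"[a-zA-Z][\w\-]*", re.ASCII)

# Version pattern copied from petab_schema.yaml in this diff.
VERSION_PATTERN = re.compile(
    r"([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?"
)


def check_extension(name, info):
    """Return a list of human-readable problems for one extension entry."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid extension name: {name!r}")
    version = info.get("version")
    if not isinstance(version, str) or not VERSION_PATTERN.fullmatch(version):
        problems.append(f"invalid or missing version: {version!r}")
    if not isinstance(info.get("required"), bool):
        problems.append("'required' must be true or false")
    return problems


# Hypothetical entries: the first is well-formed, the second violates
# all three checks.
print(check_extension("my-extension", {"version": "0.1.0", "required": True}))
print(check_extension("1st_try", {"version": "1.0-beta", "required": "yes"}))
```

Returning a list of problems rather than raising on the first one lets a validator report every issue with an entry in a single pass.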
+ Versioning of the PEtab format ------------------------------ diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index 97795dfb..4dd8c085 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -50,20 +50,27 @@ and - A measurement file to fit the model to [TSV] -- A condition file specifying model inputs and condition-specific parameters +- (optional) A condition file specifying model inputs and condition-specific parameters [TSV] - An observable file specifying the observation model [TSV] -- A parameter file specifying optimization parameters and related information +- A parameter file specifying estimateable parameters and related information [TSV] +- A grouping file that lists all of the files and provides additional information + including employed extensions [YAML] + - (optional) A simulation file, which has the same format as the measurement file, but contains model simulations [TSV] - (optional) A visualization file, which contains specifications how the data and/or simulations should be plotted by the visualization routines [TSV] +- (optional) A timecourses file, which describes a sequence of different + experimental conditions that are applied to the model [TSV] + + .. figure:: gfx/petab_files.png :alt: Files constituting a PEtab problem @@ -79,7 +86,7 @@ defining the parameter estimation problem. Extensions of this format (e.g. additional columns in the measurement table) are possible and intended. However, while those columns may provide extra information for example for plotting, downstream analysis, or for more -efficient parameter estimation, they should not affect the optimization +efficient parameter estimation, they should not affect the estimation problem as such. 
**General remarks** @@ -151,10 +158,10 @@ Detailed field description - ``${speciesId}`` If a species ID is provided, it is interpreted as the initial - condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True` - for the respective species, as concentration otherwise) and will override the + condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True` + for the respective species, as concentration otherwise) and will override the initial condition given in the SBML model or given by a preequilibration - condition. If ``NaN`` is provided for a condition, the result of the + condition. If no value is provided for a condition, the result of the preequilibration (or initial condition from the SBML model, if no preequilibration is defined) is used. @@ -163,6 +170,56 @@ Detailed field description If a compartment ID is provided, it is interpreted as the initial compartment size. + - `expressions` + + Expressions containing more than a single parameter ID or numerical + value are allowed. Any model entity ID in the condition table will be interpreted as + the value of that model entity at the last time point before + changing to the condition represented by the current row (similar + to an SBML event with ``useValuesFromTriggerTime=True``). The first + condition of any timecourse may only refer to parameter IDs that + are listed in the parameter table, but not to any other model + entity (this is because there is no “last timepoint” before + changing to this first condition). 
For example + + - given a timecourse ``0:condition1;10:condition2`` and two constant + model parameters ``par1``, ``par2`` and the two conditions: + + - ``condition1``: {``par1=0.1``, ``par2=0.2``} + - ``condition2``: {``par1=par2``, ``par2=par1``} + + This is okay, since no circular dependencies exist: ``par1 = 0.2``, ``par2=0.1`` + + - given a ``timecourse 0:condition1`` and two model parameters + ``par1``, ``par2`` with only a single condition: + + - ``condition1``: {``par1=par2``, ``par2=par1``} + + This is not allowed, in the first condition of the timecourse ``par1``, ``par2`` + cannot be used in the right-hand side of the assignment + + - Given a condition: ``condition1``: {``par1=par3``, ``par2=2*par3``} + + This is allowed. + + Condition changes should be implemented to respect the dependency + graph between model components: + + - When a condition changes quantity ``A`` and ``B``, and ``B`` is dependent on + ``A``, the change in quantity A should be applied first such that the + new value for ``B`` is consistent with what is specified in the + condition. + + - For example, concentrations are generally dependent on volume + i.e. when a model compartment volume changes, the concentrations + of all species in that compartment change too, because mass is + usually conserved. In this case, if a condition change involves a + change in both a compartment volume and a species concentration, + then the compartment change should be applied first. Otherwise, + the species concentration after the condition is applied, will not + match the concentration specified by the user, because it would be + modified by the volume change. + Measurement table ----------------- @@ -173,13 +230,13 @@ model training or validation. 
Expected to have the following named columns in any (but preferably this) order: -+--------------+-------------------------------+-----------------------+-------------+--------------+ -| observableId | [preequilibrationConditionId] | simulationConditionId | measurement | time | -+==============+===============================+=======================+=============+==============+ -| observableId | [conditionId] | conditionId | NUMERIC | NUMERIC\|inf | -+--------------+-------------------------------+-----------------------+-------------+--------------+ -| ... | ... | ... | ... | ... | -+--------------+-------------------------------+-----------------------+-------------+--------------+ ++--------------+--------------+-------------+--------------+ +| observableId | timecourseId | measurement | time | ++==============+==============+=============+==============+ +| observableId | timecourseId | NUMERIC | NUMERIC\|inf | ++--------------+--------------+-------------+--------------+ +| ... | ... | ... | ... | ++--------------+--------------+-------------+--------------+ *(wrapped for readability)* @@ -212,17 +269,7 @@ Detailed field description - ``observableId`` [STRING, NOT NULL, REFERENCES(observables.observableID)] - Observable ID as defined in the observables table described below. - -- ``preequilibrationConditionId`` [STRING OR NULL, REFERENCES(conditionsTable.conditionID), OPTIONAL] - - The ``conditionId`` to be used for preequilibration. E.g. for drug - treatments, the model would be preequilibrated with the no-drug condition. - Empty for no preequilibration. - -- ``simulationConditionId`` [STRING, NOT NULL, REFERENCES(conditionsTable.conditionID)] - - ``conditionId`` as provided in the condition table, specifying the condition-specific parameters used for simulation. + Observable ID as defined in the observable table described below. 
- ``measurement`` [NUMERIC, NOT NULL] @@ -232,6 +279,12 @@ Detailed field description Time point of the measurement in the time unit specified in the SBML model, numeric value or ``inf`` (lower-case) for steady-state measurements. +- ``timecourseId`` [STRING, NOT NULL, REFERENCES(timecoursesTable.timecourseID)] + + Timecourse ID as defined in the time courses table described below. This column may + have ``NA`` values, which are interpreted as *use the model as is*. + This avoids the need for “dummy” conditions and timecourses. + - ``observableParameters`` [NUMERIC, STRING OR NULL, OPTIONAL] This field allows overriding or introducing condition-specific versions of @@ -248,7 +301,7 @@ Detailed field description Different lines for the same ``observableId`` may specify different parameters. This may be used to account for condition-specific or - batch-specific parameters. This will translate into an extended optimization + batch-specific parameters. This will translate into an extended estimation parameter vector. All placeholders defined in the observation model must be overwritten here. @@ -256,7 +309,7 @@ Detailed field description - ``noiseParameters`` [NUMERIC, STRING OR NULL, OPTIONAL] - The measurement standard deviation or ``NaN`` if the corresponding sigma is a + The measurement standard deviation or empty if the corresponding sigma is a model parameter. Numeric values or parameter names are allowed. Same rules apply as for @@ -277,8 +330,8 @@ Detailed field description ``datasetId``, which is helpful for plotting e.g. error bars. -Observables table ------------------ +Observable table +---------------- Parameter estimation requires linking experimental observations to the model of interest. Therefore, one needs to define observables (model outputs) and @@ -498,23 +551,36 @@ Detailed field description Scale of the parameter to be used during parameter estimation. 
+ ``lin`` + Use the parameter value, ``lowerBound``, ``upperBound``, and + ``nominalValue`` without transformation. + ``log`` + Take the natural logarithm of the parameter value, ``lowerBound``, + ``upperBound``, and ``nominalValue`` during parameter estimation. + ``log10`` + Take the logarithm to base 10 of the parameter value, ``lowerBound``, + ``upperBound``, and ``nominalValue`` during parameter estimation. + - ``lowerBound`` [NUMERIC] - Lower bound of the parameter used for optimization. + Lower bound of the parameter used for estimation. Optional, if ``estimate==0``. - Must be provided in linear space, independent of ``parameterScale``. + The provided value should be untransformed, as it will be transformed + according to ``parameterScale`` during parameter estimation. - ``upperBound`` [NUMERIC] - Upper bound of the parameter used for optimization. + Upper bound of the parameter used for estimation. Optional, if ``estimate==0``. - Must be provided in linear space, independent of ``parameterScale``. + The provided value should be untransformed, as it will be transformed + according to ``parameterScale`` during parameter estimation. - ``nominalValue`` [NUMERIC] Some parameter value to be used if the parameter is not subject to estimation (see ``estimate`` below). - Must be provided in linear space, independent of ``parameterScale``. + The provided value should be untransformed, as it will be transformed + according to ``parameterScale`` during parameter estimation. Optional, unless ``estimate==0``. - ``estimate`` [BOOL 0|1] @@ -524,7 +590,7 @@ Detailed field description - ``initializationPriorType`` [STRING, OPTIONAL] - Prior types used for sampling of initial points for optimization. Sampled + Prior types used for sampling of initial points for estimation. Sampled points are clipped to lie inside the parameter boundaries specified by ``lowerBound`` and ``upperBound``. Defaults to ``parameterScaleUniform``. 
@@ -542,7 +608,7 @@ Detailed field description - ``initializationPriorParameters`` [STRING, OPTIONAL] - Prior parameters used for sampling of initial points for optimization, + Prior parameters used for sampling of initial points for estimation, separated by a semicolon. Defaults to ``lowerBound;upperBound``. The parameters are expected to be in linear scale except for the ``parameterScale`` priors, where the prior parameters are expected to be @@ -562,12 +628,12 @@ Detailed field description - ``objectivePriorType`` [STRING, OPTIONAL] - Prior types used for the objective function during optimization or sampling. + Prior types used for the objective function during estimation. For possible values, see ``initializationPriorType``. - ``objectivePriorParameters`` [STRING, OPTIONAL] - Prior parameters used for the objective function during optimization. + Prior parameters used for the objective function during estimation. For more detailed documentation, see ``initializationPriorParameters``. @@ -686,8 +752,11 @@ Detailed field description Extensions ~~~~~~~~~~ -Additional columns, such as ``Color``, etc. may be specified. - +Additional columns, such as ``Color``, etc. may be specified. Extensions +that define operations on multiple PEtab problems need to employ a single +PEtab YAML file as entrypoint to the analysis. This PEtab file may leave all +fields specifying files empty and reference the other PEtab problems in the +extension specific fields. Examples ~~~~~~~~ @@ -704,7 +773,7 @@ To link the SBML model, measurement table, condition table, etc. in an unambiguous way, we use a `YAML `_ file. This file also allows specifying a PEtab version (as the format is not unlikely -to change in the future). +to change in the future) and employed PEtab extensions. Furthermore, this can be used to describe parameter estimation problems comprising multiple models (more details below). 
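As a rough illustration of the file listing described above, the following Python sketch checks one entry of ``problems`` for the keys that the schema change in this diff still marks as required (``sbml_files``, ``observable_files``, ``measurement_files``). The helper ``missing_problem_keys``, the file names, and the ``format_version`` value are hypothetical. The configuration is written as a plain dict to avoid assuming a particular YAML library.

```python
# Keys listed under `required` for each problem in petab_schema.yaml
# after this diff (condition_files is no longer among them).
REQUIRED_PROBLEM_KEYS = ("sbml_files", "observable_files", "measurement_files")


def missing_problem_keys(problem):
    """Return the required keys that are absent from one problem entry."""
    return [key for key in REQUIRED_PROBLEM_KEYS if key not in problem]


# Hypothetical content of a PEtab YAML file after loading it into a dict.
config = {
    "format_version": 1,
    "parameter_file": "parameters.tsv",
    "problems": [
        {
            "sbml_files": ["model.xml"],
            "observable_files": ["observables.tsv"],
            "measurement_files": ["measurements.tsv"],
            # condition_files omitted on purpose: optional after this change
        }
    ],
}

for index, problem in enumerate(config["problems"]):
    missing = missing_problem_keys(problem)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"problem {index}: {status}")
```

A full validator would check the file against the complete schema. This sketch only shows which keys each problem entry cannot omit.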
@@ -722,3 +791,422 @@ allows to specify multiple SBML models with corresponding condition and measurement tables, and one joint parameter table. This means that the parameter namespace is global. Therefore, parameters with the same ID in different models will be considered identical. + + +Timecourses table +----------------- + +The optional time courses table describes a sequence of different experimental +conditions (here: discrete changes) that are applied to the model. + +This is specified as a tab-separated value file in the following way: + ++---------------------+-----------------------------------------------------+ +| timecourseId        | timecourse                                          | ++=====================+=====================================================+ +| STRING              | STRING                                              | ++---------------------+-----------------------------------------------------+ +| timecourse_1        | 0:condition_1;10:condition_2;250:condition_3        | ++---------------------+-----------------------------------------------------+ +| patient_3           | -inf:condition_1;0:condition_2                      | ++---------------------+-----------------------------------------------------+ +| intervention_effect | -20:no_lockdown;20:mild_lockdown;40:severe_lockdown | ++---------------------+-----------------------------------------------------+ + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The time courses table has two mandatory columns ``timecourseId`` and +``timecourse``: + +- ``timecourseId`` [STRING, NOT NULL] + + Identifier of the timecourse. The usual PEtab identifier requirements apply. + +- ``timecourse`` [STRING, NOT NULL] + + A semicolon-separated list of different phases of the experiment along with + their starting time. A value in the ``timecourse`` column takes the format + ``[TIMEPOINT:CONDITION_ID;...]``. + + ``TIMEPOINT`` can be: + + - ``-inf``: Marking the following condition as pre-equilibration + condition. 
(Despite ``-inf``, the pre-equilibration starts at ``t=0`` and + simulation time is reset to ``TIMEPOINT`` of the following condition + afterwards). + + - ``float``: The timepoint at which to switch to the following + condition. The start time of the first non-preequilibration + condition is ``t_0``. If ``t_0`` is non-zero, then simulators are expected + to simulate from this non-zero time, not zero. + + - ``float0:float1``: indicates repetition of a period from ``time=float0``, + every ``float1`` time units, until the next period. + +``CONDITION_ID``: + + References condition IDs from the conditions table that specify which + changes to apply at ``TIMEPOINT``. + + Note: The time interval in which a condition is applied includes the + respective starting timepoint, but excludes the starting timepoint of + the following condition. This means that for a timecourse + ``[time_A:condition_A; time_B:condition_B]``, ``condition_A`` is active + during the interval ``[time_A, time_B)``. This implies that any event + assignment that triggers at ``time_B`` will occur *after* ``condition_B`` was + applied and for any measurements at ``time_B``, the observables will be + evaluated *after* ``condition_B`` was applied. + + +Math expressions syntax +----------------------- + +This section describes the syntax of math expressions used in PEtab files, such +as the observable formulas. + +Supported symbols, literals, and operations are described in the following. Whitespace is ignored in math expressions. 
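Returning briefly to the ``timecourse`` column format from the timecourses table section, the following Python sketch shows one way a tool might split a value into ``(TIMEPOINT, CONDITION_ID)`` pairs and look up the active condition under the half-open interval rule ``[time_A, time_B)``. The names ``parse_timecourse`` and ``active_condition`` are hypothetical. The sketch assumes entries are listed in increasing time order and does not handle the ``float0:float1`` repetition form.

```python
def parse_timecourse(spec):
    """Split 'T1:cond1;T2:cond2' into a list of (start_time, condition_id)."""
    periods = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split at the last colon: left is the timepoint, right the condition.
        time_str, sep, condition_id = entry.rpartition(":")
        if not sep:
            raise ValueError(f"missing ':' in timecourse entry {entry!r}")
        if ":" in time_str:
            # 'float0:float1' (periodic repetition) is out of scope here.
            raise NotImplementedError(f"periodic entry not supported: {entry!r}")
        # float() accepts '-inf', which marks a pre-equilibration condition.
        periods.append((float(time_str), condition_id.strip()))
    return periods


def active_condition(periods, t):
    """Return the condition active at time t, or None before the first one.

    Each condition is active on [its start, start of the next condition),
    so at a switching time the *new* condition is already in effect.
    """
    active = None
    for start, condition_id in periods:
        if start <= t:
            active = condition_id
        else:
            break
    return active


periods = parse_timecourse("0:condition_1;10:condition_2;250:condition_3")
print(active_condition(periods, 9.99))  # still within the first period
print(active_condition(periods, 10))    # the switch time belongs to the new period
```

The lookup at ``t = 10`` mirrors the note in that section: measurements taken exactly at a switching time are evaluated after the new condition has been applied.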
+ + +Symbols +~~~~~~~ + +* The supported identifiers are: + + * parameter IDs from the parameter table + * model entity IDs that are globally unique and have a clear interpretation + in the math expression context + * observable IDs from the observable table + * PEtab placeholder IDs in the observable and noise formulas + * PEtab entity IDs in the mapping table + * ``time`` for the model time + * PEtab function names listed below + + Identifiers are not supported if they do not match the PEtab identifier + format. PEtab expressions may have further context-specific restrictions on + supported identifiers. + +* The functions defined in PEtab are tabulated below. Other functions, + including those defined in the model, remain undefined in PEtab expressions. + +* Special symbols (such as :math:`e` and :math:`\pi`) are not supported, and + neither is NaN (not-a-number). + +Model time +++++++++++ + +The model time is represented by the symbol ``time``, which is the current +simulated time, not the current duration of simulated time; if the simulation +starts at :math:`t_0 \neq 0`, then ``time`` is *not* the time since +:math:`t_0`. + + +Literals +~~~~~~~~ + +Numbers ++++++++ + +All numbers, including integers, are treated as floating point numbers of +undefined precision (although no less than double precision should be used). +Only decimal notation is supported. Scientific notation +is supported, with the exponent indicated by ``e`` or ``E``. The decimal +separator is indicated by ``.``. +Examples of valid numbers are: ``1``, ``1.0``, ``-1.0``, ``1.0e-3``, ``1.0e3``, +``1e+3``. The general syntax in PCRE2 regex is ``\d*(\.\d+)?([eE][-+]?\d+)?``. +``inf`` and ``-inf`` are supported as positive and negative infinity. + +Booleans +++++++++ + +Boolean literals are ``true`` and ``false``. + + +Operations +~~~~~~~~~~ + +Operators ++++++++++ + +The supported operators are: + +.. list-table:: Supported operators in PEtab math expressions. 
+ :header-rows: 1 + + * - Operator + - Precedence + - Interpretation + - Associativity + - Arguments + - Evaluates to + * - ``f(arg1[, arg2, ...])`` + - 1 + - call to function `f` with arguments `arg1`, `arg2`, ... + - left-to-right + - any + - input-dependent + * - | ``()`` + | + - | 1 + | + - | parentheses for grouping + | acts like identity + - | + | + - | any single expression + | + - | argument + | + * - | ``^`` + | + - | 2 + | + - | exponentiation + | (shorthand for pow) + - | right-to-left + | + - | float, float + | + - | float + | + * - | ``+`` + | ``-`` + - | 3 + - | unary plus + | unary minus + - | right-to-left + - | float + - | float + * - ``!`` + - 3 + - not + - + - bool + - bool + * - | ``*`` + | ``/`` + - | 4 + - | multiplication + | division + - | left-to-right + - | float, float + - | float + * - | ``+`` + | ``-`` + - | 5 + - | binary plus, addition + | binary minus, subtraction + - | left-to-right + - | float, float + - | float + * - | ``<`` + | ``<=`` + | ``>`` + | ``>=`` + - | 6 + - | less than + | less than or equal to + | greater than + | greater than or equal to + - | left-to-right + - | float, float + - | bool + * - | ``==`` + | ``!=`` + - | 6 + - | is equal to + | is not equal to + - | left-to-right + - | (float, float) or (bool, bool) + - | bool + * - | ``&&`` + | ``||`` + - | 7 + - | logical `and` + | logical `or` + - | left-to-right + - | bool, bool + - | bool + * - ``,`` + - 8 + - function argument separator + - left-to-right + - any + - + +Note that operator precedence might be unexpected, compared to other programming +languages. Use parentheses to enforce the desired order of operations. + +Operators must be specified; there are no implicit operators. +For example, ``a b`` is invalid, unlike ``a * b``. + +Functions ++++++++++ + +The following functions are supported: + +.. + START TABLE Supported functions (GENERATED, DO NOT EDIT, INSTEAD EDIT IN PEtab/doc/src) +.. 
list-table:: Supported functions + :header-rows: 1 + + * - | Function + - | Comment + - | Argument types + - | Evaluates to + * - ``pow(a, b)`` + - power function `b`-th power of `a` + - float, float + - float + * - ``exp(x)`` + - | exponential function pow(e, x) + | (`e` itself not a supported symbol, + | but ``exp(1)`` can be used instead) + - float + - float + * - ``sqrt(x)`` + - | square root of ``x`` + | ``pow(x, 0.5)`` + - float + - float + * - | ``log(a, b)`` + | ``log(x)`` + | ``ln(x)`` + | ``log2(x)`` + | ``log10(x)`` + - | logarithm of ``a`` with base ``b`` + | ``log(x, e)`` + | ``log(x, e)`` + | ``log(x, 2)`` + | ``log(x, 10)`` + | (``log(0)`` is defined as ``-inf``) + | (NOTE: ``log`` without explicit + | base is ``ln``, not ``log10``) + - float[, float] + - float + * - | ``sin`` + | ``cos`` + | ``tan`` + | ``cot`` + | ``sec`` + | ``csc`` + - trigonometric functions + - float + - float + * - | ``arcsin`` + | ``arccos`` + | ``arctan`` + | ``arccot`` + | ``arcsec`` + | ``arccsc`` + - inverse trigonometric functions + - float + - float + * - | ``sinh`` + | ``cosh`` + | ``tanh`` + | ``coth`` + | ``sech`` + | ``csch`` + - hyperbolic functions + - float + - float + * - | ``arcsinh`` + | ``arccosh`` + | ``arctanh`` + | ``arccoth`` + | ``arcsech`` + | ``arccsch`` + - inverse hyperbolic functions + - float + - float + * - | ``piecewise(`` + | ``true_value_1,`` + | ``condition_1,`` + | ``[true_value_2,`` + | ``condition_2,]`` + | ``[...]`` + | ``[true_value_n,`` + | ``condition_n,]`` + | ``otherwise`` + | ``)`` + - | The function value is + | the ``true_value*`` for the + | first ``true`` ``condition*`` + | or ``otherwise`` if all + | conditions are ``false``. 
+ - | ``*value*``: all float or all bool + | ``condition*``: all bool + - float + * - ``abs(x)`` + - | absolute value + | ``piecewise(x, x>=0, -x)`` + - float + - float + * - ``sign(x)`` + - | sign of ``x`` + | ``piecewise(1, x > 0, -1, x < 0, 0)`` + - float + - float + * - | ``min(a, b)`` + | ``max(a, b)`` + - | minimum / maximum of {``a``, ``b``} + | ``piecewise(a, a<=b, b)`` + | ``piecewise(a, a>=b, b)`` + - float, float + - float + +.. + END TABLE Supported functions + + +Boolean <-> float conversion +++++++++++++++++++++++++++++ + +Boolean and float values are implicitly convertible. The following rules apply: + +bool -> float: ``true`` is converted to ``1.0``, ``false`` is converted to +``0.0``. + +float -> bool: ``0.0`` is converted to ``false``, all other values are +converted to ``true``. + +Operands and function arguments are implicitly converted as needed. If there is +no signature compatible with the given types, Boolean +values are promoted to float. If there is still no compatible signature, +float values are demoted to boolean values. For example, in ``1 + true``, +``true`` is promoted to ``1.0`` and the expression is interpreted as +``1.0 + 1.0 = 2.0``, whereas in ``1 && true``, ``1`` is demoted to ``true`` and +the expression is interpreted as ``true && true = true``. + + +Identifiers +----------- + +* All identifiers in PEtab may only contain upper and lower case letters, + digits and underscores, and must not start with a digit. In PCRE2 regex, they + must match ``[a-zA-Z_][a-zA-Z_\d]*``. + +* Identifiers are case-sensitive. + +* Identifiers must not be a reserved keyword (see below). + +* Identifiers must be globally unique within the PEtab problem. + PEtab math function names must not be used as identifiers for other model + entities. PEtab does not put any further restrictions on the use of + identifiers within the model, which means modelers could potentially + use model-format--specific (e.g. SBML) function names as identifiers. 
+ However, this is strongly discouraged. + +Reserved keywords +~~~~~~~~~~~~~~~~~ + +The following keywords, `case-insensitive`, are reserved and must not be used +as identifiers: + +* ``true``, ``false``: Boolean literals, used in PEtab expressions. +* ``inf``: Infinity, used in PEtab expressions and post-equilibration + measurements +* ``time``: Model time, used in PEtab expressions. +* ``nan``: Undefined in PEtab, but reserved to avoid implementation issues. diff --git a/doc/gfx/petab_scope_and_files.pdf b/doc/gfx/petab_scope_and_files.pdf index 223c53b3..e0ca20f4 100644 Binary files a/doc/gfx/petab_scope_and_files.pdf and b/doc/gfx/petab_scope_and_files.pdf differ diff --git a/doc/gfx/petab_scope_and_files.png b/doc/gfx/petab_scope_and_files.png index e22e3e42..4bad5733 100644 Binary files a/doc/gfx/petab_scope_and_files.png and b/doc/gfx/petab_scope_and_files.png differ diff --git a/doc/gfx/petab_scope_and_files.svg b/doc/gfx/petab_scope_and_files.svg index df960999..de0945dd 100644 --- a/doc/gfx/petab_scope_and_files.svg +++ b/doc/gfx/petab_scope_and_files.svg @@ -1,23 +1,23 @@ + inkscape:export-ydpi="150" + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" + xmlns:xlink="http://www.w3.org/1999/xlink" + xmlns="http://www.w3.org/2000/svg" + xmlns:svg="http://www.w3.org/2000/svg" + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" + xmlns:cc="http://creativecommons.org/ns#" + xmlns:dc="http://purl.org/dc/elements/1.1/"> @@ -38,9 +38,7 @@ + type="text/css"> *{stroke-linecap:butt;stroke-linejoin:round;} @@ -76,23 +74,26 @@ borderopacity="1.0" inkscape:pageopacity="0.0" inkscape:pageshadow="2" - inkscape:zoom="2.8284272" - inkscape:cx="278.80053" - inkscape:cy="388.82208" + inkscape:zoom="1" + inkscape:cx="-9.9999997" + inkscape:cy="183.49999" inkscape:document-units="mm" - inkscape:current-layer="g17790" + inkscape:current-layer="g1314" showgrid="false" 
inkscape:window-width="1920" - inkscape:window-height="974" - inkscape:window-x="0" - inkscape:window-y="0" + inkscape:window-height="1017" + inkscape:window-x="-8" + inkscape:window-y="-8" inkscape:window-maximized="1" inkscape:snap-global="true" fit-margin-top="0" fit-margin-left="0" fit-margin-right="0" fit-margin-bottom="0" - inkscape:document-rotation="0" /> + inkscape:document-rotation="0" + inkscape:showpageshadow="2" + inkscape:pagecheckerboard="0" + inkscape:deskcolor="#d1d1d1" /> @@ -853,14 +854,14 @@ transform="matrix(0.4999368,0,0,0.49990604,-57.129388,202.66202)" style="stroke-linecap:butt;stroke-linejoin:round"> + style="opacity:1;fill:#e6e6e6;fill-opacity:1;fill-rule:nonzero;stroke:#b3b3b3;stroke-width:0.79014;stroke-linecap:round;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" /> Measurement table + style="opacity:1;fill:#800000;fill-opacity:0.12549;fill-rule:nonzero;stroke:#800000;stroke-width:0.683813;stroke-linecap:round;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" /> + d="m 56.476194,188.71858 c 0,0 89.971406,0 107.320826,0" + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.240694px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" /> observableId simulationConditionId time measurementobservableId timecourseId time measurementObservable1 Condition1 1.0 2.0Observable1 Timecourse1 1.0 2.0Observable2 Condition2 1.0 3.0Observable2 Timecourse2 1.0 3.0... + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" /> + ... + + + Timecourses table + + + + timecourseId timecourseTimecourse1 0:Condition1 Timecourse2 0:Condition2 ... 
+ + + width="35.566269" + height="39.630146" + x="198.76614" + y="-10.82364" + rx="2.8864582" + ry="2.6441293" /> =0, -x)`` float float +``sign(x)`` sign of ``x``;``piecewise(1, x>=0, -1)`` float float +``min(a, b)``;``max(a, b)`` minimum / maximum of {``a``, ``b``};``piecewise(a, a<=b, b)``;``piecewise(a, a>=b, b)`` float, float float diff --git a/doc/src/update_tables.py b/doc/src/update_tables.py new file mode 100755 index 00000000..bbc1935d --- /dev/null +++ b/doc/src/update_tables.py @@ -0,0 +1,93 @@ +#!/usr/bin/env python3 + +import pandas as pd +from pathlib import Path + +doc_dir = Path(__file__).parent.parent +table_dir = Path(__file__).parent + +MULTILINE_DELIMITER = ";" +tables = { + "Supported functions": { + "target": doc_dir / "documentation_data_format.rst", + "options": { + "header-rows": "1", + # "widths": "20 10 10 5", + }, + }, +} + + +def df_to_list_table(df, options, name): + columns = df.columns + table = f".. list-table:: {name}\n" + for option_id, option_value in options.items(): + table += f" :{option_id}: {option_value}\n" + table += "\n" + + first = True + for column in columns: + if first: + table += " * " + first = False + else: + table += " " + table += f"- | {column}\n" + + for _, row in df.iterrows(): + first = True + for column in columns: + cell = row[column] + if first: + table += " * " + first = False + else: + table += " " + table += "- " + if MULTILINE_DELIMITER in cell: + first_line = True + for line in cell.split(MULTILINE_DELIMITER): + if first_line: + table += "| " + first_line = False + else: + table += " | " + table += line + table += "\n" + else: + table += cell + table += "\n" + + return table + + +def replace_text(filename, text, start, end): + with open(filename, "r") as f: + full_text0 = f.read() + before_start = full_text0.split(start)[0] + after_end = full_text0.split(end)[1] + full_text = ( + before_start + + start + + text + + end + + after_end + ) + with open(filename, "w") as f: + f.write(full_text) + + 
+DISCLAIMER = "(GENERATED, DO NOT EDIT, INSTEAD EDIT IN PEtab/doc/src)" + + +for table_id, table_data in tables.items(): + target_file = table_data["target"] + options = table_data["options"] + df = pd.read_csv(table_dir / f"{table_id}.tsv", sep="\t") + table = df_to_list_table(df, options=options, name=table_id) + replace_text( + filename=target_file, + text=table, + start=f"\n..\n START TABLE {table_id} {DISCLAIMER}\n", + end=f"\n..\n END TABLE {table_id}\n", + ) diff --git a/doc/tutorial.rst b/doc/tutorial.rst index b95908b5..983ac3a1 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -20,8 +20,8 @@ For more details, we refer to the original publication. A PEtab problem consists of 1) an SBML model of a biological system, 2) condition, observable and measurement definitions, and 3) the -specification of the parameters. We will show how to generate the -respective files in the following. +specification of the parameters and 4) a configuration file that lists all of +these files. We will show how to generate the respective files in the following. 1. The model ++++++++++++ @@ -120,7 +120,7 @@ overridden by these condition-specific values. Here, we define the Epo concentration, but additional columns could be used to e.g. set different initial concentrations of STAT5A/B. In addition to numeric values, also parameter identifiers can be used here to introduce -condition specific optimization parameters. +condition-specific estimable parameters. 2.2 Specifying the observation model ------------------------------------ @@ -130,7 +130,7 @@ functions. Additionally, a noise model can be introduced to account for the measurement errors. In PEtab, this can be encoded in the observable file: -.. list-table:: Observables table ``observables.tsv``. +.. list-table:: Observable table ``observables.tsv``. :header-rows: 1 * - observableId @@ -146,7 +146,7 @@ file: - Rel. STAT5A abundance [%] - ... -.. list-table:: Observables table ``observables.tsv`` (continued). +.. 
list-table:: Observable table ``observables.tsv`` (continued). :header-rows: 1 * - ... @@ -162,7 +162,7 @@ file: - 100*(STAT5A + pApB + 2*pApA) / (2 \* pApB + 2\* pApA + STAT5A + STAT5B + 2*pBpB) - ... -.. list-table:: Observables table ``observables.tsv`` (continued). +.. list-table:: Observable table ``observables.tsv`` (continued). :header-rows: 1 * - ... @@ -235,8 +235,8 @@ PEtab measurement file: brevity, only the first and last time point of the example are shown here (the omitted measurements are indicated by “...” in the example). -* *noiseParameters* relates to the *noiseParameters* in the observables - file. In our example, the measurement noise is unknown. Therefore we +* *noiseParameters* relates to the *noiseParameters* in the observable table. + In our example, the measurement noise is unknown. Therefore we define parameters here which have to be estimated (see parameters sheet below). If the noise is known, e.g. from multiple replicates, numeric values can be used in this column. @@ -273,17 +273,17 @@ The parameters file for this is given by: observables (*sd_{observableId}*) are estimated. * *parameterScale* is the scale on which parameters are estimated. Often, - a logarithmic scale improves optimization. Alternatively, a linear scale + a logarithmic scale improves estimation. Alternatively, a linear scale can be used, e.g. when parameters can be negative. * *lowerBound* and *upperBound* define the bounds for the parameters used - during optimization. These are usually biologically plausible ranges. + during estimation. These are usually biologically plausible ranges. * *nominalValue* are known values used for simulation. The entry can be - left empty, if a value is unknown and subject to optimization. + left empty, if a value is unknown and requires estimation. -* *estimate* defines whether the parameter is subject to optimization (1) - or if it is fixed (0) to the value in the nominalValue column. 
+* *estimate* defines whether the parameter will be estimated (1) + or be fixed (0) to the value in the nominalValue column. 4. Visualization file +++++++++++++++++++++ @@ -324,11 +324,10 @@ https://petab.readthedocs.io/en/latest/documentation_data_format.html#visualizat 5. YAML file ++++++++++++ -To group the previously mentioned PEtab files, a YAML file can be used, -defining which files constitute a PEtab problem. While being optional, -this makes it easier to import a PEtab problem into tools, and allows -reusing files for different PEtab problems. This file has the following -format (``Boehm_JProteomeRes2014.yaml``): +To group the previously mentioned PEtab files, a YAML file must be used, +defining which files constitute a PEtab problem. This makes it easier to import +a PEtab problem into tools, and allows reusing files for different PEtab +problems. This file has the following format (``Boehm_JProteomeRes2014.yaml``): .. code-block:: yaml