Add ECMA definition of symmetry operation regexp and associated tests
ml-evs committed Dec 23, 2023
1 parent df152e9 commit 9c05cca
Showing 5 changed files with 3,935 additions and 19 deletions.
3 changes: 3 additions & 0 deletions GNUmakefile
@@ -35,6 +35,9 @@
# - tests/generated/identifiers.ere
# - tests/generated/numbers.ere
# - tests/generated/strings.ere
# - tests/generated/symops.pcre
# - tests/generated/symop_definitions.pcre
# - tests/generated/symops.ecma
#
#
# Targets for testing / auditing the specification
47 changes: 29 additions & 18 deletions optimade.rst
@@ -3748,34 +3748,45 @@ The Symmetry Operation String Regular Expressions
-------------------------------------------------

Symmetry operation strings that comprise the :property:`space_group_symmetry_operation_xyz` property MUST conform to the following regular expressions.
The regular expressions are recorded in the Perl Compatible Regular Expression (PCRE) syntax, with `Perl extensions <https://perldoc.perl.org/perlre>`__ used for readability.
The :val:`symop_definitions` section defines several variables in Perl syntax that capture common parts of the regular expressions (REs) and need to be interpolated into the final REs used for matching.
The :val:`symops` section contains the REs themselves.
The whitespace characters in these definitions are not significant; if used in Perl programs, these expressions MUST be processed with the :code:`/x` RE option.
A working example of these REs in action can be found in the :code:`tests/cases/pcre_symops_001.sh` and other test cases.
The regular expressions are recorded below in two forms:
- Perl Compatible Regular Expression (PCRE) syntax, with `Perl extensions <https://perldoc.perl.org/perlre>`__ used for readability and expressivity.
The :val:`symop_definitions` section defines several variables in Perl syntax that capture common parts of the regular expressions (REs) and need to be interpolated into the final REs used for matching.
The :val:`symops` section contains the REs themselves.
The whitespace characters in these definitions are not significant; if used in Perl programs, these expressions MUST be processed with the :code:`/x` RE option.
A working example of these REs in action can be found in the :code:`tests/cases/pcre_symops_001.sh` and other test cases.

.. code:: PCRE
.. code:: PCRE
#BEGIN PCRE symop_definitions
#BEGIN PCRE symop_definitions
$translations = '1\/2|[12]\/3|[1-3]\/4|[1-5]\/6';
$translations = '1\/2|[12]\/3|[1-3]\/4|[1-5]\/6';
$symop_translation_appended = "[-+]? [xyz] ([-+][xyz])? ([-+] ($translations) )?";
$symop_translation_prepended = "[-+]? ($translations) ([-+] [xyz] ([-+][xyz])? )?";
$symop_translation_appended = "[-+]? [xyz] ([-+][xyz])? ([-+] ($translations) )?";
$symop_translation_prepended = "[-+]? ($translations) ([-+] [xyz] ([-+][xyz])? )?";
$symop_re = "($symop_translation_appended|$symop_translation_prepended)";
$symop_re = "($symop_translation_appended|$symop_translation_prepended)";
#END PCRE symop_definitions
#END PCRE symop_definitions
.. code:: PCRE
.. code:: PCRE
#BEGIN PCRE symops
#BEGIN PCRE symops
^ # From the beginning of the string...
($symop_re)(,$symop_re){2}
$ # ... match to the very end of the string
^ # From the beginning of the string...
($symop_re)(,$symop_re){2}
$ # ... match to the very end of the string
#END PCRE symops
#END PCRE symops
- The regular expression is also provided in a simplified explicit form compatible with the subset of the `ECMA 262 dialect <https://ecma-international.org/publications-and-standards/standards/ecma-262/>`_ syntax supported by the `JSON Schema field "pattern" <https://json-schema.org/draft/2020-12/json-schema-validation#name-pattern>`_:

.. code:: ECMA
#BEGIN ECMA symops
^([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?)$
#END ECMA symops
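
The explicit ECMA form above is the PCRE definitions with the shared variables interpolated by hand. As a rough illustration only (this code is not part of the commit, and the names ``TRANSLATIONS``, ``APPENDED``, ``PREPENDED``, ``COMPONENT``, ``SYMOP_PATTERN`` and ``is_valid_symop`` are hypothetical), the same pattern can be assembled and applied in Python. The sketch assumes that Python's ``re`` module handles this particular pattern the same way an ECMA 262 engine would, which holds here because it uses only character classes, groups, alternation and ``?``:

```python
import re

# Translation fractions permitted by the specification's regular expression.
TRANSLATIONS = r"1/2|[12]/3|[1-3]/4|[1-5]/6"

# One component: coordinate terms followed by an optional translation ...
APPENDED = rf"[-+]?[xyz]([-+][xyz])?([-+]({TRANSLATIONS}))?"
# ... or a translation followed by optional coordinate terms.
PREPENDED = rf"[-+]?({TRANSLATIONS})([-+][xyz]([-+][xyz])?)?"
COMPONENT = rf"({APPENDED}|{PREPENDED})"

# Exactly three comma-separated components, anchored at both ends.
SYMOP_PATTERN = re.compile(rf"^{COMPONENT},{COMPONENT},{COMPONENT}$")


def is_valid_symop(text: str) -> bool:
    """Return True if ``text`` matches the symmetry operation pattern.

    ``fullmatch`` is used because Python's ``$`` also matches just before a
    trailing newline, which a non-multiline ECMA 262 ``$`` does not.
    """
    return SYMOP_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["x,y,z", "x+1/2,y+1/2,z", "1/2+x,y,z", "x,y", "x,y,z+1/5"]:
        print(candidate, is_valid_symop(candidate))
```

Building the pattern from named parts keeps the three repeated components in sync; the assembled string is intended to be character-for-character the same as the single-line ECMA expression.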
OPTIMADE JSON lines partial data format
---------------------------------------
23 changes: 23 additions & 0 deletions tests/cases/ecma_symops_001.sh
@@ -0,0 +1,23 @@
#! /bin/sh

# Test case: test if a provided ECMA regular expression correctly
# recognises symmetry operation strings.

#BEGIN DEPEND

INPUT_GRAMMAR=tests/generated/symops.ecma

#END DEPEND


/usr/bin/env python << EOF
import re
import sys
with open("${INPUT_GRAMMAR}") as f:
    expression = [line.strip() for line in f.readlines() if line.strip() and not line.strip().startswith("#")][0]
with open("tests/inputs/symops.lst") as cases:
    for case in cases:
        if re.match(expression, case):
            print(case, end="")
EOF
8 changes: 7 additions & 1 deletion tests/makefiles/Makelocal-grammars
@@ -20,7 +20,7 @@ EBNF_FILES ?= ${GRAMMARS:%=${GRAMMAR_DIR}/%.ebnf}
GRAMMAR_FILES ?= ${EBNF_FILES:%.ebnf=%.g}

REGEXPS = $(sort $(shell awk '/^ *${RE_START_STRING}/{print $$3}' ${RST_FILES} | tr -d "\r"))
REGEXP_FILES = ${REGEXPS:%=${GRAMMAR_DIR}/%.ere} ${REGEXPS:%=${GRAMMAR_DIR}/%.pcre}
REGEXP_FILES = ${REGEXPS:%=${GRAMMAR_DIR}/%.ere} ${REGEXPS:%=${GRAMMAR_DIR}/%.pcre} ${REGEXPS:%=${GRAMMAR_DIR}/%.ecma}

GRAMMAR_DEPENDENCIES = .grammars.d

@@ -47,6 +47,8 @@ ${GRAMMAR_DEPENDENCIES}: ${RST_FILES}
		$^ | tr -d "\r" >> $@
	awk '/^ *${RE_START_STRING} PCRE/{print "${GRAMMAR_DIR}/"$$3".pcre:", FILENAME}' \
		$^ | tr -d "\r" >> $@
	awk '/^ *${RE_START_STRING} ECMA/{print "${GRAMMAR_DIR}/"$$3".ecma:", FILENAME}' \
		$^ | tr -d "\r" >> $@

${GRAMMAR_DIR}/%.ebnf:
	awk '/^ *${GRAMMAR_START_STRING} $*/,/^ *${GRAMMAR_END_STRING} $*/' $< \
@@ -60,6 +62,10 @@ ${GRAMMAR_DIR}/%.pcre:
	awk '/^ *${RE_START_STRING} PCRE $*/,/^ *${RE_END_STRING} PCRE $*/' $< \
	| sed 's/^ //' | tr -d "\r" > $@

${GRAMMAR_DIR}/%.ecma:
	awk '/^ *${RE_START_STRING} ECMA $*/,/^ *${RE_END_STRING} ECMA $*/' $< \
	| sed 's/^ //' | tr -d "\r" > $@
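
For readers less familiar with ``awk`` range patterns, the extraction rule above can be approximated in Python as follows. This is only a sketch and not part of the commit. It assumes the marker strings are ``#BEGIN`` and ``#END``, inferred from the markers in ``optimade.rst``, since the values of ``RE_START_STRING`` and ``RE_END_STRING`` are not shown in this diff. The function names ``extract_block`` and ``first_expression`` are hypothetical. Unlike the ``sed`` step, which strips a fixed prefix, the sketch strips all leading spaces, and it stops at the first end marker rather than scanning for further ranges:

```python
import re


def extract_block(lines, dialect, name, begin="#BEGIN", end="#END"):
    """Return the lines from the BEGIN marker through the END marker, inclusive."""
    start = re.compile(rf"^ *{re.escape(begin)} {re.escape(dialect)} {re.escape(name)}")
    stop = re.compile(rf"^ *{re.escape(end)} {re.escape(dialect)} {re.escape(name)}")
    block = []
    inside = False
    for line in lines:
        if not inside and start.match(line):
            inside = True
        if inside:
            # Drop carriage returns and the line ending, then leading spaces.
            block.append(line.replace("\r", "").rstrip("\n").lstrip(" "))
            if stop.match(line):
                break
    return block


def first_expression(block):
    """Pick the first non-empty line that is not a '#' marker or comment."""
    return next(line for line in block if line and not line.startswith("#"))


if __name__ == "__main__":
    with open("optimade.rst") as f:
        print(first_expression(extract_block(f, "ECMA", "symops")))
```

The second helper mirrors how ``tests/cases/ecma_symops_001.sh`` selects the expression from the generated file: it skips the marker lines and takes the first remaining non-empty line.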

.PHONY: tools

${GRAMMAR_DIR}/%.g: ${GRAMMAR_DIR}/%.ebnf | tools