The assemblies yaml specifies which biological assemblies are present. A typical assemblies yaml defines a single assembly, although for large crystal screens it may well define several.
```yaml
# assemblies.yaml
"0":  # The name of the assembly: must be unique
  reference: 5rgs  # The dataset that is the template for the assembly
  biomol: A  # The names of the chains in the "abstract" biomolecule. This is needed because the reference dataset may
             # contain only one of the chains, and the others are then generated by some symmetry operation, as
             # defined below
  chains: A  # The chains and associated symmetry operations used to generate the biomolecule. In this case the
             # symmetry operation (the identity, (x,y,z)) can be omitted
```
We could also define a dimer:
```yaml
# assemblies.yaml
"1":  # The name of the assembly: must be unique
  reference: Mpro-IBM0045  # The dataset that is the template for the assembly
  biomol: A,B  # The names of the chains in the "abstract" biomolecule. This is needed because the reference dataset may
               # contain only one of the chains, and the others are then generated by some symmetry operation, as
               # defined below
  chains: A,A(-x,y,-z)  # The chains and associated symmetry operations used to generate the biomolecule. Here the B
                        # chain of the biomolecule is generated from chain A by a twofold rotation about the b axis,
                        # which negates the x and z fractional coordinates
```
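To make the pairing between `biomol` and `chains` concrete, here is a minimal Python sketch that splits a `chains` string into (chain, operation) generators, filling in the implied identity `x,y,z`, and pairs each generator with the biomolecule chain at the same position. The names `parse_generators` and `biomolecule_recipe` are hypothetical illustrations, not the alignment code's API, and the sketch assumes operations never contain nested parentheses.

```python
import re

IDENTITY = "x,y,z"

# A generator is a chain name, optionally followed by a symmetry operation in parentheses.
_GENERATOR = re.compile(r"\s*([A-Za-z0-9]+)\s*(?:\(([^()]*)\))?\s*")


def parse_generators(spec: str) -> list[tuple[str, str]]:
    """Split e.g. 'A,A(-x,y,-z)' into [('A', 'x,y,z'), ('A', '-x,y,-z')]."""
    generators = []
    # Split only on commas that are not inside parentheses.
    for part in re.split(r",(?![^()]*\))", spec):
        match = _GENERATOR.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse generator {part!r}")
        chain, op = match.group(1), match.group(2)
        # An omitted operation means the identity.
        generators.append((chain, IDENTITY if op is None else op.replace(" ", "")))
    return generators


def biomolecule_recipe(biomol: str, chains: str) -> dict[str, tuple[str, str]]:
    """Map each chain of the abstract biomolecule to the (source chain, operation) that builds it."""
    names = [name.strip() for name in biomol.split(",")]
    generators = parse_generators(chains)
    if len(names) != len(generators):
        raise ValueError(f"{len(names)} biomol chains but {len(generators)} generators")
    # Generators correspond to biomol chains by position.
    return dict(zip(names, generators))


print(biomolecule_recipe("A", "A"))
# {'A': ('A', 'x,y,z')}
print(biomolecule_recipe("A,B", "A,A(-x,y,-z)"))
# {'A': ('A', 'x,y,z'), 'B': ('A', '-x,y,-z')}
```

The count check reflects the positional correspondence: a `chains` entry with more or fewer generators than `biomol` has chains cannot describe that biomolecule.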
The xtalform yaml specifies the exact crystalforms being worked with.
Two datasets are said to belong to the same crystalform if they have:
- similar unit cell parameters (a, b, c, alpha, beta, gamma)
- the same space group
The alignment code assigns each crystal to the user-entered xtalform that has the same space group and the closest unit cell parameters.
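The assignment rule can be sketched in Python as a filter on space group followed by a nearest-neighbour search over the unit cell parameters. The distance used here (the sum of relative differences of the six parameters) is an assumption for illustration; the actual alignment code may weigh lengths and angles differently. `CrystalForm` and `assign_xtalform` are hypothetical names, and the cell values are made-up illustrative numbers, not taken from real structures.

```python
from dataclasses import dataclass
from typing import Optional

# a, b, c in Angstrom; alpha, beta, gamma in degrees
Cell = tuple[float, float, float, float, float, float]


@dataclass(frozen=True)
class CrystalForm:
    name: str
    spacegroup: str
    cell: Cell


def cell_distance(cell: Cell, reference: Cell) -> float:
    """Assumed metric: sum of relative differences of the six cell parameters."""
    return sum(abs(p - q) / q for p, q in zip(cell, reference))


def assign_xtalform(spacegroup: str, cell: Cell, xtalforms: list[CrystalForm]) -> Optional[CrystalForm]:
    """Return the xtalform with the same space group and the closest cell, or None if none match."""
    candidates = [xf for xf in xtalforms if xf.spacegroup == spacegroup]
    if not candidates:
        return None
    return min(candidates, key=lambda xf: cell_distance(cell, xf.cell))


# Illustrative values only.
xtalforms = [
    CrystalForm("0", "C2", (112.0, 53.0, 44.5, 90.0, 103.0, 90.0)),
    CrystalForm("1", "P212121", (68.0, 100.0, 104.0, 90.0, 90.0, 90.0)),
    CrystalForm("2", "C2", (99.0, 81.0, 51.0, 90.0, 115.0, 90.0)),
]
print(assign_xtalform("C2", (112.5, 52.8, 44.6, 90.0, 102.8, 90.0), xtalforms).name)  # 0
print(assign_xtalform("P21", (55.0, 99.0, 59.0, 90.0, 108.0, 90.0), xtalforms))  # None
```

Filtering on space group before comparing cells means two xtalforms with similar cells but different space groups can never be confused.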
The xtalform yaml also specifies which biological assemblies are present in a crystalform.
The simplest xtalform has a single chain in the biological assembly:
```yaml
# crystalforms.yaml
"0":  # The name of the xtalform: must be unique
  "reference": "5rgs"  # The name of the reference dataset that will be used to get this crystalform's unit cell
  "assemblies":  # The biological assemblies present in this crystalform, and the operations which relate them
                 # to the reference dataset
    "0":  # The name of the assembly within this xtalform: must be unique within this xtalform
      "assembly": "0"  # The name of the assembly in assemblies.yaml that this assembly matches
      "chains": A(x,y,z)  # The name of the chain in datasets from this crystalform and the symmetry operation
                          # which generates the corresponding (by index) chain in the reference assembly
```
A slightly more complicated example might feature a biological assembly with two chains (a dimer), only one of which is given in the PDB file; the other is generated by a crystallographic symmetry operation.
```yaml
# crystalforms.yaml
"0":
  "reference":
  "assemblies":
    "0":
      "assembly": "1"  # Now the assembly is the dimer, rather than the monomer!
      "chains": A,A(-x,y,-z)  # Now there is a second generator: this creates the second chain, B,
                              # by applying a symmetry operation to chain A. Notice the identity operation
                              # can be omitted!
```
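A symmetry operation such as `-x,y,-z` is a rule for mapping fractional coordinates. The Python sketch below parses operations written in this style (terms in `x`, `y`, `z` plus optional translations such as `x+1/2`) and applies them to a point. It assumes fractional coordinates throughout: converting the Cartesian coordinates of a PDB file to and from fractional ones using the unit cell is omitted, and the function names are hypothetical rather than the alignment code's own. Applying `-x,y,-z` twice returns the original point, and the determinant of its rotation part is +1, so it is a twofold rotation about the b axis rather than a reflection.

```python
import re
from fractions import Fraction

# One signed term: an axis letter, a decimal, an integer or a fraction such as 1/2.
_TERM = re.compile(r"([+-]?)([xyz]|\d*\.\d+|\d+(?:/\d+)?)")


def parse_component(expr: str) -> tuple[tuple[int, int, int], Fraction]:
    """Parse one component, e.g. '-x' or 'x+1/2', into (x, y, z coefficients, translation)."""
    expr = expr.replace(" ", "").lower()
    if not expr:
        raise ValueError("empty symmetry operation component")
    coeffs = {"x": 0, "y": 0, "z": 0}
    translation = Fraction(0)
    pos = 0
    while pos < len(expr):
        match = _TERM.match(expr, pos)
        # Every term after the first must carry an explicit sign.
        if match is None or (pos > 0 and not match.group(1)):
            raise ValueError(f"cannot parse {expr!r} at position {pos}")
        sign = -1 if match.group(1) == "-" else 1
        token = match.group(2)
        if token in coeffs:
            coeffs[token] += sign
        else:
            translation += sign * Fraction(token)
        pos = match.end()
    return (coeffs["x"], coeffs["y"], coeffs["z"]), translation


def parse_symop(op: str):
    components = op.split(",")
    if len(components) != 3:
        raise ValueError(f"expected three components in {op!r}")
    return [parse_component(c) for c in components]


def apply_symop(op: str, point):
    """Apply a symmetry operation to a point given in fractional coordinates."""
    x, y, z = point
    return tuple(cx * x + cy * y + cz * z + t for (cx, cy, cz), t in parse_symop(op))


def rotation_determinant(op: str) -> int:
    """Determinant of the rotation part: +1 for a proper rotation, -1 for an improper one."""
    m = [coeffs for coeffs, _ in parse_symop(op)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


p = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
q = apply_symop("-x,y,-z", p)
print(q)                                # (Fraction(-1, 10), Fraction(1, 5), Fraction(-3, 10))
print(apply_symop("-x,y,-z", q) == p)   # True
print(rotation_determinant("-x,y,-z"))  # 1
```

Using `Fraction` keeps translations such as 1/2 or 1/3 exact, so comparisons like the round trip above do not suffer from floating-point error.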
The next level of complexity is a set of datasets spanning multiple crystalforms. Here we add a second crystalform containing the same dimer, although this time both chains of the assembly are given in the file, so there is no need for a non-identity symmetry operation.
```yaml
# crystalforms.yaml
"0":
  "reference": 5rgs
  "assemblies":
    "0":
      "assembly": "1"
      "chains": A,A(-x,y,-z)
"1":
  "reference": Mpro-J0162
  "assemblies":
    "0":
      "assembly": "1"
      "chains": A,B  # Notice this time the B chain of the dimer is generated by the identity operation applied to
                     # the given B chain, rather than a symmetry operation duplicating the A chain
```
The most complicated systems may feature multiple assemblies in the same crystalform: for example, Mpro has a crystalform in which two dimers are present.
```yaml
# crystalforms.yaml
...
"3":
  "reference": 8dz9
  "assemblies":
    "0":
      "assembly": "1"
      "chains": "A,B"
    "1":
      "assembly": "1"
      "chains": C,D  # Now chains C and D in datasets from this crystalform are matched to the dimer's A and B chains
```