Mol2, SDF and XYZ File Parsers #418

Open
wants to merge 39 commits into
base: main
fb47b92
XYZFile working with single model, some change with regards
entropybit Aug 24, 2022
b0b811d
XYZFile get_structure so far working (yet tests are not passing ...)
entropybit Aug 25, 2022
a1e0699
xyz file working with reading multiple models
entropybit Aug 26, 2022
68ca749
XYZFile finally fully working with tests
entropybit Aug 26, 2022
9767d04
XYZFile cleaning up for production code ...
entropybit Aug 27, 2022
977616a
Added some small molecule files for testing, specifically
entropybit Aug 27, 2022
34c9d4c
Finally fucking (somewhat) finished the MOL2File class.
entropybit Aug 29, 2022
9d61326
mol2 tests working again
entropybit Aug 30, 2022
6103a1a
Mol2File test expanded, charges and sybyl_atom type reading
entropybit Aug 30, 2022
7292457
forgot to commit changes on MOL2File before only pushed
entropybit Aug 30, 2022
36e0645
Modified test_mol so that only pdbx residues are tested
entropybit Aug 30, 2022
fc81ec9
all file formats somewhat working, at least reading works for
entropybit Aug 31, 2022
6f3b551
reading and writing of mol2 file now also works with atom_names
entropybit Sep 1, 2022
a8db20a
Mol test now passing without skipped for amino acids.
entropybit Sep 2, 2022
e591fef
Small change in XYZFile and also added all docstrings for
entropybit Sep 2, 2022
88fd7e3
XYZFile and MOL2File ready.
entropybit Sep 2, 2022
9128b08
SD File read/write with AtomArray and AtomArrayStack,
entropybit Sep 2, 2022
fb530c9
removed small previous change in test_mol
entropybit Sep 2, 2022
1971241
Merge branch 'master' into master
entropybit Sep 2, 2022
18479a8
removing get_header and set_header from test as apprently
entropybit Sep 2, 2022
4ddd8b1
Forgot to implement functionality in load_structure and
entropybit Sep 2, 2022
dd19716
Update src/biotite/structure/io/ctab.py
entropybit Sep 4, 2022
040f9c3
Update src/biotite/structure/io/ctab.py
entropybit Sep 4, 2022
d979cec
Update src/biotite/structure/io/general.py
entropybit Sep 4, 2022
ada8e76
Update src/biotite/structure/io/general.py
entropybit Sep 4, 2022
d193ae7
Update src/biotite/structure/io/general.py
entropybit Sep 4, 2022
882a5bf
Update src/biotite/structure/io/mol2/file.py
entropybit Sep 4, 2022
92a5aa7
Bega pep8 checking + reformatting in file.py as well as
entropybit Sep 4, 2022
fedc28a
Merge branch 'master' of github.com:entropybit/biotite
entropybit Sep 4, 2022
8e64067
PEP8 checking on file.py in xyz, also added myself
entropybit Sep 12, 2022
a34a344
Added test for get_model_count in XYZFiles, found
entropybit Sep 12, 2022
a4e63a1
Forgot contrib as well as changes to xyz file.
entropybit Sep 12, 2022
dd54098
Retrieving file name from absolute path now done in a way
entropybit Sep 12, 2022
501b07d
Typo in retrieve_file_name_from_path, hopefully working now
entropybit Sep 12, 2022
a0e05d7
MOL2File all files have conformity to PEP8 now.
entropybit Sep 12, 2022
473f50b
Changes in test_mol2 (now also covers using model parameter
entropybit Sep 13, 2022
8725b40
Changes in sybyl_atom_type heuristic
entropybit Sep 13, 2022
9ee1b20
Changed warning when MOLFile reads timestamp entry and
entropybit Sep 13, 2022
13c5810
Date parsing now works in header test.
entropybit Sep 14, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -55,3 +55,4 @@ biotite.egg-info

 # Ignore fuse_hidden files on Linux systems
 *.fuse_hidden*
+
1 change: 1 addition & 0 deletions CONTRIB.rst
@@ -9,3 +9,4 @@ CONTRIBUTORS
 - Thomas Nevolianis <https://github.com/thomasnevolianis>
 - Maximilian Greil <https://github.com/MaxGreil>
 - Claude J. Rogers <https://github.com/claudejrogers>
+- Benjamin E. Mayer <https://github.com/entropybit>
2 changes: 2 additions & 0 deletions src/biotite/structure/io/ctab.py
@@ -61,6 +61,8 @@ def read_structure_from_ctab(ctab_lines):
     .. footbibliography::
     """
     n_atoms, n_bonds = _get_counts(ctab_lines[0])
+
+
     atom_lines = ctab_lines[1 : 1 + n_atoms]
     bond_lines = ctab_lines[1 + n_atoms : 1 + n_atoms + n_bonds]

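The hunk above slices the connection table into atom lines and bond lines using the two numbers returned by `_get_counts`. A minimal sketch of that slicing follows, assuming the V2000 layout in which the first two 3-character fields of the counts line hold the atom and bond counts. `parse_counts` and `split_ctab` are hypothetical names for illustration and are not part of Biotite.

```python
def parse_counts(counts_line):
    """Return (n_atoms, n_bonds) from a V2000 counts line (assumed layout)."""
    # Columns 0-2 hold the number of atoms, columns 3-5 the number of bonds
    return int(counts_line[0:3]), int(counts_line[3:6])


def split_ctab(ctab_lines):
    """Split connection table lines into atom lines and bond lines."""
    n_atoms, n_bonds = parse_counts(ctab_lines[0])
    atom_lines = ctab_lines[1 : 1 + n_atoms]
    bond_lines = ctab_lines[1 + n_atoms : 1 + n_atoms + n_bonds]
    return atom_lines, bond_lines


# Hand-written example: two atoms joined by one single bond
example = [
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
]
atom_lines, bond_lines = split_ctab(example)
```

Because the slices are computed from the counts line alone, any trailing property lines (such as `M  END`) fall outside both slices and are ignored.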
43 changes: 40 additions & 3 deletions src/biotite/structure/io/general.py
@@ -78,7 +78,8 @@ def load_structure(file_path, template=None, **kwargs):
             # Stack containing only one model -> return as atom array
             return array[0]
         else:
-            return array
+            return array
+
     elif suffix == ".cif" or suffix == ".pdbx":
         from .pdbx import PDBxFile, get_structure
         file = PDBxFile.read(file_path)
@@ -115,12 +116,33 @@ def load_structure(file_path, template=None, **kwargs):
             return array[0]
         else:
             return array
-    elif suffix == ".mol" or suffix == ".sdf":
+    elif suffix == ".mol":
         from .mol import MOLFile
         file = MOLFile.read(file_path)
         array = file.get_structure(**kwargs)
         # MOL files only contain a single model
         return array
+    elif suffix == ".sdf":
+        from .sdf import SDFile
+        file = SDFile.read(file_path)
+        array = file.get_structure(**kwargs)
+        # SDFile automatically detects whether to return
+        # an AtomArray or an AtomArrayStack
+        return array
+    elif suffix == ".xyz":
+        from .xyz import XYZFile
+        file = XYZFile.read(file_path)
+        array = file.get_structure(**kwargs)
+        # XYZFile automatically detects whether to return
+        # an AtomArray or an AtomArrayStack
+        return array
+    elif suffix == ".mol2":
+        from .mol2 import MOL2File
+        file = MOL2File.read(file_path)
+        array = file.get_structure(**kwargs)
+        # MOL2File automatically detects whether to return
+        # an AtomArray or an AtomArrayStack
+        return array
     elif suffix in [".trr", ".xtc", ".tng", ".dcd", ".netcdf"]:
         if template is None:
             raise TypeError("Template must be specified for trajectory files")
@@ -204,11 +226,26 @@ def save_structure(file_path, array, **kwargs):
         file = NpzFile()
         file.set_structure(array, **kwargs)
         file.write(file_path)
-    elif suffix == ".mol" or suffix == ".sdf":
+    elif suffix == ".mol":
         from .mol import MOLFile
         file = MOLFile()
         file.set_structure(array, **kwargs)
         file.write(file_path)
+    elif suffix == ".sdf":
+        from .sdf import SDFile
+        file = SDFile()
+        file.set_structure(array, **kwargs)
+        file.write(file_path)
+    elif suffix == ".xyz":
+        from .xyz import XYZFile
+        file = XYZFile()
+        file.set_structure(array, **kwargs)
+        file.write(file_path)
+    elif suffix == ".mol2":
+        from .mol2 import MOL2File
+        file = MOL2File()
+        file.set_structure(array, **kwargs)
+        file.write(file_path)
     elif suffix in [".trr", ".xtc", ".tng", ".dcd", ".netcdf"]:
         from .trr import TRRFile
         from .xtc import XTCFile
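Both hunks in `general.py` extend an `elif` chain that picks a file class by suffix, and they split the former shared `.mol`/`.sdf` branch into two. The same dispatch can be sketched as a lookup table. This is only an illustration of the pattern, not a proposal from the PR: `READERS`, `_make_reader` and `load_small_molecule` are hypothetical names, and the readers are stand-ins that return a label instead of parsing anything.

```python
from pathlib import Path


def _make_reader(label):
    # Hypothetical stand-in for a real parser such as MOLFile.read()
    def reader(file_path):
        return (label, str(file_path))
    return reader


# One entry per suffix; ".mol" and ".sdf" no longer share a reader
READERS = {
    ".mol": _make_reader("mol"),
    ".sdf": _make_reader("sdf"),
    ".xyz": _make_reader("xyz"),
    ".mol2": _make_reader("mol2"),
}


def load_small_molecule(file_path):
    """Pick a reader based on the file suffix and call it."""
    suffix = Path(file_path).suffix
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unknown file format '{suffix}'") from None
    return reader(file_path)


print(load_small_molecule("ligand.mol2"))
```

A table like this keeps the read and write paths in sync more easily than two parallel `elif` chains, at the cost of losing the lazy per-branch imports used in the diff.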
69 changes: 50 additions & 19 deletions src/biotite/structure/io/mol/file.py
@@ -18,7 +18,7 @@

 # Number of header lines
 N_HEADER = 3
-DATE_FORMAT = "%d%m%y%H%M"
+DATE_FORMATS = ["%d%m%y%H%M", "%m%d%y%H%M"]


 class MOLFile(TextFile):
@@ -44,7 +44,9 @@ class MOLFile(TextFile):
     --------

     >>> from os.path import join
-    >>> mol_file = MOLFile.read(join(path_to_structures, "molecules", "TYR.sdf"))
+    >>> mol_file = MOLFile.read(
+    ...     join(path_to_structures, "molecules", "TYR.sdf")
+    ... )
     >>> atom_array = mol_file.get_structure()
     >>> print(atom_array)
     0 N 1.320 0.952 1.428
@@ -91,7 +93,8 @@ def get_header(self):
         program : str
             The program name.
         time : datetime
-            The time of file creation.
+            The time of file creation. ``None`` if the corresponding entry
+            in the MOL file cannot be parsed as a datetime string.
         dimensions : str
             Dimensional codes.
         scaling_factors : str
@@ -103,19 +106,40 @@ def get_header(self):
         comments : str
             Additional comments.
         """
-        mol_name        = self.lines[0].strip()
-        initials        = self.lines[1][0:2].strip()
-        program         = self.lines[1][2:10].strip()
-        time            = datetime.datetime.strptime(self.lines[1][10:20],
-                                                     DATE_FORMAT)
-        dimensions      = self.lines[1][20:22].strip()
+        mol_name = self.lines[0].strip()
+        initials = self.lines[1][0:2].strip()
+        program = self.lines[1][2:10].strip()
+        # Sometimes the string cannot be interpreted as a datetime;
+        # in those cases, warn the user instead of failing
+        time = None
+        if len(self.lines[1][10:20]) > 1:
+            time_parsing_successful = False
+            msg_last = ""
+            for format_i in DATE_FORMATS:
+                try:
+                    time = datetime.datetime.strptime(
+                        self.lines[1][10:20],
+                        format_i
+                    )
+                    time_parsing_successful = True
+                    break
+                except ValueError:
+                    msg_last = self.lines[1][10:20].strip()[:len(format_i)]
+                    msg_last += " could not be interpreted as datetime"
+
+            if not time_parsing_successful:
+                warn(UserWarning(msg_last))
+                time = self.lines[1][10:20]
+
+        dimensions = self.lines[1][20:22].strip()
         scaling_factors = self.lines[1][22:34].strip()
-        energy          = self.lines[1][34:46].strip()
+        energy = self.lines[1][34:46].strip()
         registry_number = self.lines[1][46:52].strip()
-        comments        = self.lines[2].strip()
-        return mol_name, initials, program, time, dimensions, \
-               scaling_factors, energy, registry_number, comments
-
+        comments = self.lines[2].strip()
+        return (
+            mol_name, initials, program, time, dimensions,
+            scaling_factors, energy, registry_number, comments
+        )

     def set_header(self, mol_name, initials="", program="", time=None,
                    dimensions="", scaling_factors="", energy="",
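The `get_header` hunk tries each entry of `DATE_FORMATS` in turn and warns when none of them matches. The sketch below isolates that fallback in a hypothetical helper, `parse_timestamp`, which is not part of the PR. It follows the updated docstring and returns `None` on failure, whereas the hunk as shown appears to keep the raw string in that case.

```python
import datetime
import warnings

# Same two layouts as DATE_FORMATS in the diff: day-first, then month-first
DATE_FORMATS = ["%d%m%y%H%M", "%m%d%y%H%M"]


def parse_timestamp(field):
    """Parse a 10-character MOL header timestamp, or return None."""
    field = field.strip()
    if not field:
        # An empty field is treated as "no timestamp", without a warning
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.datetime.strptime(field, fmt)
        except ValueError:
            # Try the next layout
            continue
    warnings.warn(f"'{field}' could not be interpreted as datetime")
    return None


day_first = parse_timestamp("2408221530")
month_first = parse_timestamp("0824221530")
```

Both calls yield 24 August 2022, 15:30. Note that trying formats in order cannot resolve ambiguous values: a field such as `0102221530` is valid day-first, so it is read as 1 February and the month-first layout is never tried.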
@@ -132,7 +156,7 @@ def set_header(self, mol_name, initials="", program="", time=None,
         program : str, optional
             The program name. Maximum length is 8.
         time : datetime or date, optional
-            The time of file creation.
+            The time of file creation, if none uses current time.

Review comment (Member), suggested change:
-            The time of file creation, if none uses current time.
+            The time of file creation.
+            By default, current time is used.

         dimensions : str, optional
             Dimensional codes. Maximum length is 2.
         scaling_factors : str, optional
@@ -144,9 +168,18 @@ def set_header(self, mol_name, initials="", program="", time=None,
         comments : str, optional
             Additional comments.
         """
-        if time is None:
+        time_str = ""
+        if time is not None and type(time) is datetime.datetime:
+            for format_i in DATE_FORMATS:
+                try:
+                    time_str = time.strftime(format_i)
+                    break
+                except ValueError:
+                    time_str = time
+        # only fill with local time if nothing was provided via time
+        if len(time_str) == 0:
             time = datetime.datetime.now()
-        time_str = time.strftime(DATE_FORMAT)
+            time_str = time.strftime(DATE_FORMATS[0])

         self.lines[0] = str(mol_name)
         self.lines[1] = (
@@ -160,7 +193,6 @@ def set_header(self, mol_name, initials="", program="", time=None,
         )
         self.lines[2] = str(comments)

-
     def get_structure(self):
         """
         Get an :class:`AtomArray` from the MOL file.
@@ -178,7 +210,6 @@ def get_structure(self):
             raise InvalidFileError("File does not contain structure data")
         return read_structure_from_ctab(ctab_lines)

-
     def set_structure(self, atoms, default_bond_type=BondType.ANY):
         """
         Set the :class:`AtomArray` for the file.
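The slices in `get_header` imply a fixed-width layout for the second header line: 2, 8, 10, 2, 12, 12 and 6 characters, 52 in total. The round trip can be sketched as below. `HEADER_FIELDS`, `format_header_line` and `parse_header_line` are hypothetical names, the right-justified padding is an assumption, and only the field widths and the first date format come from the diff.

```python
import datetime

# (field name, width) pairs, taken from the slice boundaries in get_header
HEADER_FIELDS = [
    ("initials", 2),
    ("program", 8),
    ("time", 10),
    ("dimensions", 2),
    ("scaling_factors", 12),
    ("energy", 12),
    ("registry_number", 6),
]
DATE_FORMAT = "%d%m%y%H%M"


def format_header_line(time=None, **fields):
    """Build the second header line from fixed-width fields."""
    if time is None:
        # Mirror the documented default: fall back to the current time
        time = datetime.datetime.now()
    values = dict(fields, time=time.strftime(DATE_FORMAT))
    parts = []
    for name, width in HEADER_FIELDS:
        value = str(values.get(name, ""))
        if len(value) > width:
            raise ValueError(f"'{name}' must be at most {width} characters")
        # Right-justified padding is an assumption of this sketch
        parts.append(value.rjust(width))
    return "".join(parts)


def parse_header_line(line):
    """Split the second header line back into its stripped fields."""
    fields = {}
    start = 0
    for name, width in HEADER_FIELDS:
        fields[name] = line[start:start + width].strip()
        start += width
    return fields


line = format_header_line(
    time=datetime.datetime(2022, 9, 14, 12, 0),
    initials="BM", program="biotite",
)
header = parse_header_line(line)
```

Keeping the widths in a single table means the writer and the reader cannot drift apart, which is the kind of mismatch a header round-trip test is meant to catch.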
17 changes: 17 additions & 0 deletions src/biotite/structure/io/mol2/__init__.py
@@ -0,0 +1,17 @@
+# This source code is part of the Biotite package and is distributed
+# under the 3-Clause BSD License. Please see 'LICENSE.rst' for further
+# information.
+
+"""
+The MOL format is used to depict atom positions and bonds for small

Review comment (Member), suggested change:
-The MOL format is used to depict atom positions and bonds for small
+The MOL2 format is used to depict atom positions and bonds for small

+molecules.
+This subpackage is used for reading and writing an :class:`AtomArray` or
+an :class:`AtomArrayStack` for a file containing multiple models
+in this format.
+"""
+
+__name__ = "biotite.structure.io.mol2"
+__author__ = "Benjamin E. Mayer"
+
+from .file import *
+from .convert import *
113 changes: 113 additions & 0 deletions src/biotite/structure/io/mol2/convert.py
@@ -0,0 +1,113 @@
+# This source code is part of the Biotite package and is distributed
+# under the 3-Clause BSD License. Please see 'LICENSE.rst' for further
+# information.
+
+__name__ = "biotite.structure.io.mol2"
+__author__ = "Benjamin E. Mayer"
+__all__ = [
+    "get_structure", "set_structure",
+    "get_charges", "set_charges",
+    "get_model_count"
+]


+def get_structure(mol2_file, model=None):
+    """
+    Get an :class:`AtomArray` from the MOL2 File.
+
+    This function is a thin wrapper around
+    :meth:`MOL2File.get_structure()`.
+
+    Parameters
+    ----------
+    mol2_file : MOL2File
+        The MOL2File.
+
+    Returns
+    -------
+    array : AtomArray, AtomArrayStack
+        An AtomArray or AtomArrayStack containing the structure or
+        structures, depending on whether the file has one or more models.
+        If something other than `NO_CHARGE` is set in the charge_type field
+        of the corresponding MOL2 file, the AtomArray or AtomArrayStack
+        will contain the charge field.
+    """
+    return mol2_file.get_structure(model)


+def set_structure(mol2_file, atoms):
+    """
+    Set the :class:`AtomArray` for the MOL2 File.
+
+    This function is a thin wrapper around
+    :meth:`MOL2File.set_structure(atoms)`.
+
+    Parameters
+    ----------
+    mol2_file : MOL2File
+        The MOL2 file.
+    atoms : AtomArray
+        The array to be saved into this file.
+        Must have an associated :class:`BondList`.
+        If the charge field is set, it is stored in the corresponding
+        MOL2 charge column.
+    """
+    mol2_file.set_structure(atoms)


+def get_charges(mol2_file):
+    """
+    Get an ndarray containing the partial charges from the MOL2File.
+
+    This function is a thin wrapper around
+    :meth:`MOL2File.get_charges()`.
+
+    Parameters
+    ----------
+    mol2_file : MOL2File
+        The MOL2 file.
+
+    Returns
+    -------
+    array : AtomArray
+        This :class:`AtomArray` contains the optional ``charge``
+        annotation and has an associated :class:`BondList`.
+        All other annotation categories, except ``element`` are
+        empty.

Review comment (Member) on lines +72 to +76:
Since the function is called get_charges(), I think it should only return an ndarray containing the partial charges instead of an AtomArray

+    """
+    return mol2_file.get_charges()


+def set_charges(mol2_file, charges):
+    """
+    Set the partial charges in the MOL2File to an ndarray
+    specified as parameter here.
+
+    This function is a thin wrapper around
+    :meth:`MOL2File.set_charges(charges)`.
+
+    Parameters
+    ----------
+    mol2_file : MOL2File
+        The MOL2File.
+    charges : ndarray
+        An ndarray containing data with `float` type to be written as
+        partial charges.
+
+    """
+    return mol2_file.set_charges(charges)


+def get_model_count(mol2_file):
+    """
+    Get the number of models contained in the MOL2 file.
+
+    This function is a thin wrapper around
+    :meth:`MOL2File.get_model_count()`.
+
+    Returns
+    -------
+    model_count : int
+        The number of models.
+    """
+    return mol2_file.get_model_count()
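The wrappers above delegate to `MOL2File`, whose implementation is not shown in this part of the diff. As a rough illustration of what `get_charges()` and `get_model_count()` could compute, here is a sketch that works on raw MOL2 text. It assumes that each model starts with an `@<TRIPOS>MOLECULE` record and that the partial charge is the ninth whitespace-separated column of an `@<TRIPOS>ATOM` line. `read_partial_charges` and `count_models` are hypothetical helpers, and charges are returned as a plain list of floats here rather than an ndarray.

```python
MOL2_TEXT = """\
@<TRIPOS>MOLECULE
water
3 2 1
SMALL
USER_CHARGES

@<TRIPOS>ATOM
1 O 0.000 0.000 0.000 O.3 1 HOH -0.834
2 H1 0.957 0.000 0.000 H 1 HOH 0.417
3 H2 -0.240 0.927 0.000 H 1 HOH 0.417
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
"""


def read_partial_charges(mol2_text):
    """Return the partial charges of the first ATOM block as floats."""
    charges = []
    in_atom_block = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            if in_atom_block:
                # The next record type indicator ends the ATOM block
                break
            in_atom_block = line.strip() == "@<TRIPOS>ATOM"
            continue
        if in_atom_block and line.strip():
            columns = line.split()
            # Assumed column order:
            # atom_id name x y z atom_type subst_id subst_name charge
            if len(columns) >= 9:
                charges.append(float(columns[8]))
    return charges


def count_models(mol2_text):
    """Count models, assuming one MOLECULE record per model."""
    return sum(
        1 for line in mol2_text.splitlines()
        if line.strip() == "@<TRIPOS>MOLECULE"
    )


charges = read_partial_charges(MOL2_TEXT)
n_models = count_models(MOL2_TEXT)
```

Returning only the charge values, as the reviewer suggests for `get_charges()`, keeps the function name and its return type aligned. Converting the list with `numpy.array(charges)` would give the ndarray mentioned in the docstring.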