Csse pyd2 warnings (#354)
* further equalize return btwn v1 and v2

* molecule.extras default

* avoid pubchem error

* add docs, fix warnings

* ok move back a few

* suppress warnings
loriab authored Oct 28, 2024
1 parent 4f8a691 commit b1199ed
Showing 14 changed files with 310 additions and 88 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/CI.yaml
@@ -39,7 +39,7 @@ jobs:
if: matrix.python-version == '3.9'
run: poetry install --no-interaction --no-ansi --extras test
- name: Run tests
run: poetry run pytest -rws -v --cov=qcelemental --color=yes --cov-report=xml
run: poetry run pytest -rws -v --cov=qcelemental --color=yes --cov-report=xml #-k "not pubchem_multiout_g"
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3 # NEEDS UPDATE TO v3 https://github.com/codecov/codecov-action
- name: QCSchema Examples Deploy
3 changes: 3 additions & 0 deletions docs/changelog.rst
@@ -27,6 +27,7 @@ Breaking Changes
++++++++++++++++
* The very old model names `ResultInput`, `Result`, `ResultProperties`, `Optimization` deprecated in 2019 are now only available through `qcelemental.models.v1`
* ``models.v2`` do not support AutoDoc. The AutoDoc routines have been left at pydantic v1 syntax. Use autodoc-pydantic for Sphinx instead.
* Unlike Levi's pyd v2, this doesn't forward define dict, copy, json to v2 models. Instead it backwards-defines model_dump, model_dump_json, model_copy to v1. This will impede upgrading but be cleaner in the long run. See commented-out functions to temporarily restore this functionality. v2.Molecule retains its dict for now

New Features
++++++++++++
@@ -35,6 +36,8 @@ New Features

Enhancements
++++++++++++
* Fix a lot of warnings originating in this project.
* `Molecule.extras` now defaults to `{}` rather than None in both v1 and v2. Input None converts to {} upon instantiation.
* ``v2.FailedOperation`` field `id` is becoming `Optional[str]` instead of plain `str` so that the default validates.
* v1.ProtoModel learned `model_copy`, `model_dump`, `model_dump_json` methods (all w/o warnings) so downstream can unify on newer syntax. Levi's work alternately/additionally taught v2 `copy`, `dict`, `json` (all w/warning) but dict has an alternate use in Pydantic v2.
* ``AtomicInput``, ``AtomicResult``, ``OptimizationInput``, ``OptimizationResult``, ``TorsionDriveInput``, ``TorsionDriveResult``, and ``FailedOperation`` (both versions) learned a ``.convert_v(ver)`` function that returns self or the other version.
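
The ``Molecule.extras`` normalization listed above can be sketched without Pydantic. The class name ``MoleculeSketch`` is hypothetical; it only mirrors the ``__init__`` check shown in this commit and is not the QCElemental implementation.

```python
from typing import Any, Dict


class MoleculeSketch:
    """Hypothetical stand-in for Molecule; only the `extras` handling is modeled."""

    def __init__(self, **kwargs: Any) -> None:
        # A missing key and an explicit None both re-default to an empty dict,
        # mirroring the condition added to Molecule.__init__ in this commit.
        if "extras" not in kwargs or kwargs["extras"] is None:
            kwargs["extras"] = {}
        self.extras: Dict[str, Any] = kwargs["extras"]


default_mol = MoleculeSketch()
none_mol = MoleculeSketch(extras=None)
tagged_mol = MoleculeSketch(extras={"note": "scratch"})
```

Because the empty dict is created inside ``__init__``, each instance in this sketch gets its own ``extras`` object rather than sharing one mutable default.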
154 changes: 143 additions & 11 deletions docs/models.rst
@@ -7,7 +7,7 @@ as their base to provide serialization, validation, and manipulation.


Basics
--------
------

Model creation occurs with a ``kwargs`` constructor as shown by equivalent operations below:

@@ -16,11 +16,27 @@ Model creation occurs with a ``kwargs`` constructor as shown by equivalent opera
>>> mol = qcel.models.Molecule(symbols=["He"], geometry=[0, 0, 0])
>>> mol = qcel.models.Molecule(**{"symbols":["He"], "geometry": [0, 0, 0]})
A list of all available fields can be found by querying the ``fields`` attribute:
Certain models (Molecule in particular) have additional convenience instantiation functions, like
the below for hydroxide ion:

.. code-block:: python
>>> mol.fields.keys()
>>> mol = qcel.models.Molecule.from_data("""
-1 1
O 0 0 0
H 0 0 1.2
""")
A list of all available fields can be found by querying for fields:

.. code-block:: python
# QCSchema v1 / Pydantic v1
>>> mol.__fields__.keys()
dict_keys(['symbols', 'geometry', ..., 'id', 'extras'])
# QCSchema v2 / Pydantic v2
>>> mol.model_fields.keys()
dict_keys(['symbols', 'geometry', ..., 'id', 'extras'])
These attributes can be accessed as shown:
@@ -37,11 +53,13 @@ Note that these models are typically immutable:
>>> mol.symbols = ["Ne"]
TypeError: "Molecule" is immutable and does not support item assignment
To update or alter a model the ``copy`` command can be used with the ``update`` kwargs:
To update or alter a model the ``model_copy`` command can be used with the ``update`` kwargs.
Note that ``model_copy`` is Pydantic v2 syntax, but it will work on QCSchema v1 and v2 models.
The older Pydantic v1 syntax, ``copy``, will only work on QCSchema v1 models.

.. code-block:: python
>>> mol.copy(update={"symbols": ["Ne"]})
>>> mol.model_copy(update={"symbols": ["Ne"]})
< Geometry (in Angstrom), charge = 0.0, multiplicity = 1:
Center X Y Z
@@ -53,26 +71,30 @@ To update or alter a model the ``copy`` command can be used with the ``update``
Serialization
-------------

All models can be serialized back to their dictionary counterparts through the ``dict`` function:
All models can be serialized back to their dictionary counterparts through the ``model_dump`` function.
Note that ``model_dump`` is Pydantic v2 syntax, but it will work on QCSchema v1 and v2 models.
The older Pydantic v1 syntax, ``dict``, will only work on QCSchema v1 models. It has a different effect on v2 models.

.. code-block:: python
>>> mol.dict()
>>> mol.model_dump()
{'symbols': ['He'], 'geometry': array([[0., 0., 0.]])}
JSON representations are supported out of the box for all models:
Note that ``model_dump_json`` is Pydantic v2 syntax, but it will work on QCSchema v1 and v2 models.
The older Pydantic v1 syntax, ``json``, will only work on QCSchema v1 models.

.. code-block:: python
>>> mol.json()
>>> mol.model_dump_json()
'{"symbols": ["He"], "geometry": [0.0, 0.0, 0.0]}'
Raw JSON can also be parsed back into a model:

.. code-block:: python
>>> mol.parse_raw(mol.json())
>>> mol.parse_raw(mol.model_dump_json())
< Geometry (in Angstrom), charge = 0.0, multiplicity = 1:
Center X Y Z
@@ -82,10 +104,120 @@ Raw JSON can also be parsed back into a model:
>
The standard ``dict`` operation returns all internal representations which may be classes or other complex structures.
To return a JSON-like dictionary the ``dict`` function can be used:
To return a JSON-like dictionary the ``model_dump`` function can be used:

.. code-block:: python
>>> mol.dict(encoding='json')
>>> mol.model_dump(encoding='json')
{'symbols': ['He'], 'geometry': [0.0, 0.0, 0.0]}
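
The approach described in this section, where older models learn the newer method names so downstream code can standardize on ``model_dump``, ``model_dump_json``, and ``model_copy``, can be sketched without Pydantic. ``OldStyleModel`` is a hypothetical class for illustration only; the real methods live on ``ProtoModel`` and accept more options than shown here.

```python
import json
from typing import Any, Dict, Optional


class OldStyleModel:
    """Hypothetical base exposing Pydantic-v1-style names (dict, json, copy)."""

    def __init__(self, **fields: Any) -> None:
        self._fields = {**fields}

    # Older names.
    def dict(self) -> Dict[str, Any]:
        return {**self._fields}

    def json(self) -> str:
        return json.dumps(self._fields)

    def copy(self, update: Optional[Dict[str, Any]] = None) -> "OldStyleModel":
        return type(self)(**{**self._fields, **(update or {})})

    # Newer names, defined as thin forwards that emit no warnings.
    def model_dump(self) -> Dict[str, Any]:
        return self.dict()

    def model_dump_json(self) -> str:
        return self.json()

    def model_copy(self, update: Optional[Dict[str, Any]] = None) -> "OldStyleModel":
        return self.copy(update=update)


helium = OldStyleModel(symbols=["He"], geometry=[0.0, 0.0, 0.0])
neon = helium.model_copy(update={"symbols": ["Ne"]})
```

Calling code written against the newer names then runs unchanged on either generation of model, which is the stated motivation for backward-defining them.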
QCSchema v2
-----------

Starting with QCElemental v0.50.0, a new "v2" version of QCSchema is accessible. In particular:

* QCSchema v2 is written in Pydantic v2 syntax. (Note that a model with submodels may not mix Pydantic v1 and v2 models.)
* Major QCSchema v2 models have field ``schema_version=2``. Note that Molecule has long had ``schema_version=2``, but this belongs to QCSchema v1. The QCSchema v2 Molecule has ``schema_version=3``.
* QCSchema v2 has certain field rearrangements that make procedure models more composable. They also make v1 and v2 distinguishable in dictionary form.
* QCSchema v2 does not include new features. It is purely a technical upgrade.

Also see https://github.com/MolSSI/QCElemental/issues/323 for details and progress. The changelog contains details.

The anticipated timeline is:

* v0.50 — QCSchema v2 available. QCSchema v1 unchanged (files moved but imports will work w/o change). There will be beta releases.
* v0.70 — QCSchema v2 will become the default. QCSchema v1 will remain available, but it will require specific import paths (available as soon as v0.50).
* v1.0 — QCSchema v2 unchanged. QCSchema v1 dropped. Earliest 1 Jan 2026.

Both QCSchema v1 and v2 will be available for quite a while to allow downstream projects time to adjust.

To make sure you're using QCSchema v1:

.. code-block:: python
# replace
>>> from qcelemental.models import AtomicResult, OptimizationInput
# by
>>> from qcelemental.models.v1 import AtomicResult, OptimizationInput
To try out QCSchema v2:

.. code-block:: python
# replace
>>> from qcelemental.models import AtomicResult, OptimizationInput
# by
>>> from qcelemental.models.v2 import AtomicResult, OptimizationInput
To figure out what model you're working with, you can look at its Pydantic base or its QCElemental base:

.. code-block:: python
# make molecules
>>> mol1 = qcel.models.v1.Molecule(symbols=["O", "H"], molecular_charge=-1, geometry=[0, 0, 0, 0, 0, 1.2])
>>> mol2 = qcel.models.v2.Molecule(symbols=["O", "H"], molecular_charge=-1, geometry=[0, 0, 0, 0, 0, 1.2])
>>> print(mol1, mol2)
Molecule(name='HO', formula='HO', hash='6b7a42f') Molecule(name='HO', formula='HO', hash='6b7a42f')
# query v1 molecule
>>> isinstance(mol1, pydantic.v1.BaseModel)
True
>>> isinstance(mol1, pydantic.BaseModel)
False
>>> isinstance(mol1, qcel.models.v1.ProtoModel)
True
>>> isinstance(mol1, qcel.models.v2.ProtoModel)
False
# query v2 molecule
>>> isinstance(mol2, pydantic.v1.BaseModel)
False
>>> isinstance(mol2, pydantic.BaseModel)
True
>>> isinstance(mol2, qcel.models.v1.ProtoModel)
False
>>> isinstance(mol2, qcel.models.v2.ProtoModel)
True
Most high-level models (e.g., ``AtomicInput``, not ``Provenance``) have a ``convert_v`` function to convert between QCSchema versions. It returns the input object if called with the current version.

.. code-block:: python
>>> inp1 = qcel.models.v1.AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule=mol1)
>>> print(inp1)
AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule_hash='6b7a42f')
>>> inp1.schema_version
1
>>> inp2 = qcel.models.v2.AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule=mol2)
>>> print(inp2)
AtomicInput(driver='energy', model={'method': 'pbe', 'basis': 'pvdz'}, molecule_hash='6b7a42f')
>>> inp2.schema_version
2
# now convert
>>> inp1_now2 = inp1.convert_v(2)
>>> print(inp1_now2.schema_version)
2
>>> inp2_now1 = inp2.convert_v(1)
>>> print(inp2_now1.schema_version)
1
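
The ``convert_v`` contract shown above (return the same object when the requested version matches, otherwise build the other version) can be sketched with plain dataclasses. ``InputV1`` and ``InputV2`` are hypothetical names; the real conversion also rearranges fields, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputV1:
    """Hypothetical v1-style input; only driver and schema_version are modeled."""

    driver: str
    schema_version: int = 1

    def convert_v(self, version: int):
        if version == 1:
            return self  # already the requested version: hand back the same object
        if version == 2:
            return InputV2(driver=self.driver)
        raise ValueError(f"unknown schema version: {version}")


@dataclass(frozen=True)
class InputV2:
    """Hypothetical v2-style input."""

    driver: str
    schema_version: int = 2

    def convert_v(self, version: int):
        if version == 2:
            return self
        if version == 1:
            return InputV1(driver=self.driver)
        raise ValueError(f"unknown schema version: {version}")


inp_v1 = InputV1(driver="energy")
inp_v1_as_v2 = inp_v1.convert_v(2)
```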
Error messages aren't necessarily helpful in the upgrade process.

.. code-block:: python
# This usually means you're calling Pydantic v1 functions (dict, json, copy) on a Pydantic v2 model.
# There are dict and copy functions commented out in qcelemental/models/v2/basemodels.py that you
# can uncomment and use temporarily to ease the upgrade, but the preferred route is to switch to
# model_dump, model_dump_json, model_copy that work on QCSchema v1 and v2 models.
>>> TypeError: ProtoModel.serialize() got an unexpected keyword argument 'by_alias'
# This usually means you're mixing a v1 model into a v2 model. Check all the imports from
# qcelemental.models for version specificity. If the import can't be updated, run `convert_v`
# on the model.
>>> pydantic_core._pydantic_core.ValidationError: 1 validation error for AtomicInput
>>> molecule
>>> Input should be a valid dictionary or instance of Molecule [type=model_type, input_value=Molecule(name='HO', formula='HO', hash='6b7a42f'), input_type=Molecule]
>>> For further information visit https://errors.pydantic.dev/2.5/v/model_type
8 changes: 5 additions & 3 deletions qcelemental/models/v1/molecule.py
@@ -290,7 +290,7 @@ class Molecule(ProtoModel):
"never need to be manually set.",
)
extras: Dict[str, Any] = Field( # type: ignore
None,
{},
description="Additional information to bundle with the molecule. Use for schema development and scratch space.",
)

@@ -350,7 +350,7 @@ def __init__(self, orient: bool = False, validate: Optional[bool] = None, **kwar
kwargs = {**kwargs, **schema} # Allow any extra fields
validate = True

if "extras" not in kwargs:
if "extras" not in kwargs or kwargs["extras"] is None: # latter re-defaults to empty dict
kwargs["extras"] = {}
super().__init__(**kwargs)

@@ -552,10 +552,12 @@ def __eq__(self, other):
by scientific terms, and not programming terms, so it's less rigorous than
a programmatic equality or a memory equivalent `is`.
"""
import qcelemental

if isinstance(other, dict):
other = Molecule(orient=False, **other)
elif isinstance(other, Molecule):
elif isinstance(other, (qcelemental.models.v2.Molecule, Molecule)):
# allow v2 on grounds of "scientific, not programming terms"
pass
else:
raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))
9 changes: 6 additions & 3 deletions qcelemental/models/v2/basemodels.py
@@ -133,9 +133,11 @@ def parse_file(cls, path: Union[str, Path], *, encoding: Optional[str] = None) -

return cls.parse_raw(path.read_bytes(), encoding=encoding)

def dict(self, **kwargs) -> Dict[str, Any]:
warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
return self.model_dump(**kwargs)
# UNCOMMENT IF NEEDED FOR UPGRADE
# defining this is maybe bad idea as dict(v2) does non-recursive dictionary, whereas model_dump does nested
# def dict(self, **kwargs) -> Dict[str, Any]:
# warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
# return self.model_dump(**kwargs)
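
The commented-out shim above follows a common deprecation pattern: keep the old name, warn, and forward to the new one. A minimal sketch of that pattern and of how a caller can observe the warning follows; ``NewStyleModel`` is a hypothetical class, not the v2 ``ProtoModel``.

```python
import warnings
from typing import Any, Dict


class NewStyleModel:
    """Hypothetical model whose primary serializer is model_dump."""

    def __init__(self, **fields: Any) -> None:
        self._fields = {**fields}

    def model_dump(self) -> Dict[str, Any]:
        return {**self._fields}

    def dict(self) -> Dict[str, Any]:
        # Old name kept only as a forward; stacklevel=2 points the warning at the caller.
        warnings.warn(
            "The `dict` method is deprecated; use `model_dump` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.model_dump()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the DeprecationWarning is not filtered out
    dumped = NewStyleModel(driver="energy").dict()
```

As the comment in the diff notes, the shim is left disabled because ``dict`` has a different meaning on Pydantic v2 models, so enabling it is a temporary upgrade aid rather than a recommended end state.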

@model_serializer(mode="wrap")
def _serialize_model(self, handler) -> Dict[str, Any]:
@@ -235,6 +237,7 @@ def serialize(

return serialize(data, encoding=encoding)

# UNCOMMENT IF NEEDED FOR UPGRADE REDO!!!
def json(self, **kwargs):
# Alias JSON here from BaseModel to reflect dict changes
warnings.warn("The `json` method is deprecated; use `model_dump_json` instead.", DeprecationWarning)
10 changes: 7 additions & 3 deletions qcelemental/models/v2/molecule.py
@@ -334,7 +334,7 @@ class Molecule(ProtoModel):
"never need to be manually set.",
)
extras: Dict[str, Any] = Field( # type: ignore
None,
{},
description="Additional information to bundle with the molecule. Use for schema development and scratch space.",
)

@@ -382,7 +382,7 @@ def __init__(self, orient: bool = False, validate: Optional[bool] = None, **kwar
kwargs = {**kwargs, **schema} # Allow any extra fields
validate = True

if "extras" not in kwargs:
if "extras" not in kwargs or kwargs["extras"] is None: # latter re-defaults to empty dict
kwargs["extras"] = {}
super().__init__(**kwargs)

@@ -588,19 +588,23 @@ def __eq__(self, other):
by scientific terms, and not programming terms, so it's less rigorous than
a programmatic equality or a memory equivalent `is`.
"""
import qcelemental

if isinstance(other, dict):
other = Molecule(orient=False, **other)
elif isinstance(other, Molecule):
elif isinstance(other, (Molecule, qcelemental.models.v1.Molecule)):
# allow v1 on grounds of "scientific, not programming terms"
pass
else:
raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))

return self.get_hash() == other.get_hash()
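
The cross-version equality above rests on comparing content hashes rather than types. A minimal sketch of that idea follows, assuming a simplified hash over symbols and rounded coordinates; ``MolV1Sketch``, ``MolV2Sketch``, and ``_content_hash`` are hypothetical, and the real ``get_hash`` covers more fields.

```python
import hashlib
import json
from typing import List


def _content_hash(symbols: List[str], geometry: List[float]) -> str:
    # Hypothetical hash: serialize the scientific content deterministically, then digest it.
    payload = json.dumps(
        {"symbols": symbols, "geometry": [round(x, 8) for x in geometry]},
        sort_keys=True,
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class MolV1Sketch:
    def __init__(self, symbols, geometry):
        self.symbols = list(symbols)
        self.geometry = [float(x) for x in geometry]

    def get_hash(self) -> str:
        return _content_hash(self.symbols, self.geometry)

    def __eq__(self, other):
        # Accept either version: equality is judged on content, not on class.
        if not isinstance(other, (MolV1Sketch, MolV2Sketch)):
            raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))
        return self.get_hash() == other.get_hash()


class MolV2Sketch:
    def __init__(self, symbols, geometry):
        self.symbols = list(symbols)
        self.geometry = [float(x) for x in geometry]

    def get_hash(self) -> str:
        return _content_hash(self.symbols, self.geometry)

    def __eq__(self, other):
        if not isinstance(other, (MolV2Sketch, MolV1Sketch)):
            raise TypeError("Comparison molecule not understood of type '{}'.".format(type(other)))
        return self.get_hash() == other.get_hash()


hydroxide_v1 = MolV1Sketch(["O", "H"], [0, 0, 0, 0, 0, 1.2])
hydroxide_v2 = MolV2Sketch(["O", "H"], [0, 0, 0, 0, 0, 1.2])
stretched_v2 = MolV2Sketch(["O", "H"], [0, 0, 0, 0, 0, 1.3])
```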

# UNCOMMENT IF NEEDED FOR UPGRADE REDO??
def dict(self, **kwargs):
warnings.warn("The `dict` method is deprecated; use `model_dump` instead.", DeprecationWarning)
return self.model_dump(**kwargs)
# TODO maybe bad idea as dict(v2) does non-recursive dictionary, whereas model_dump does nested

@model_serializer(mode="wrap")
def _serialize_molecule(self, handler) -> Dict[str, Any]:
5 changes: 4 additions & 1 deletion qcelemental/models/v2/results.py
@@ -773,7 +773,10 @@ def _version_stamp(cls, v):
@field_validator("return_result")
@classmethod
def _validate_return_result(cls, v, info):
if info.data["driver"] == "gradient":
if info.data["driver"] == "energy":
if isinstance(v, np.ndarray) and v.size == 1:
v = v.item(0)
elif info.data["driver"] == "gradient":
v = np.asarray(v).reshape(-1, 3)
elif info.data["driver"] == "hessian":
v = np.asarray(v)
12 changes: 8 additions & 4 deletions qcelemental/tests/addons.py
@@ -11,10 +11,13 @@

def internet_connection():
try:
socket.create_connection(("www.google.com", 80))
return True
scc = socket.create_connection(("www.google.com", 80))
except OSError:
# create_connection failed, so there is no socket to close
return False
else:
scc.close()
return True


using_web = pytest.mark.skipif(internet_connection() is False, reason="Could not connect to the internet")
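
A connectivity probe of this kind has one subtlety: if ``socket.create_connection`` raises, no socket object was bound, so only the success path has anything to close. A minimal sketch follows, with a hypothetical ``can_connect`` helper that takes the host, port, and timeout as parameters so it can be exercised without internet access.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # The call raised before returning a socket, so there is nothing to close here.
        return False
    else:
        conn.close()  # close on success to avoid an unclosed-socket ResourceWarning
        return True
```

Closing the socket on the success path is what silences the resource warning that an unreferenced open connection would otherwise produce under pytest.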
@@ -62,15 +65,16 @@ def xfail_on_pubchem_busy():


def drop_qcsk(instance, tnm: str, schema_name: str = None):
is_model = isinstance(instance, (qcelemental.models.v1.ProtoModel, qcelemental.models.v2.ProtoModel))
# order matters for isinstance. a __fields__ warning is thrown if v1 before v2.
is_model = isinstance(instance, (qcelemental.models.v2.ProtoModel, qcelemental.models.v1.ProtoModel))
if is_model and schema_name is None:
schema_name = type(instance).__name__
drop = (_data_path / schema_name / tnm).with_suffix(".json")

with open(drop, "w") as fp:
if is_model:
# fp.write(instance.json(exclude_unset=True, exclude_none=True)) # works but file is one-line
instance = json.loads(instance.json(exclude_unset=True, exclude_none=True))
instance = json.loads(instance.model_dump_json(exclude_unset=True, exclude_none=True))
elif isinstance(instance, dict):
pass
else: