Updated the datasets docs #4400

Merged
merged 12 commits into from
Aug 1, 2023
5 changes: 2 additions & 3 deletions doc/code/qml_data.rst
@@ -3,6 +3,5 @@ qml.data

.. currentmodule:: pennylane.data

.. automodapi:: pennylane.data
:no-heading:
:no-inherited-members:
.. automodule:: pennylane.data

50 changes: 22 additions & 28 deletions doc/introduction/data.rst
@@ -31,20 +31,22 @@ The :func:`~pennylane.data.load` function returns a ``list`` with the desired data

>>> H2datasets = qml.data.load("qchem", molname="H2", basis="STO-3G", bondlength=1.1)
>>> print(H2datasets)
[<Dataset = description: qchem/H2/STO-3G/1.1, attributes: ['molecule', 'hamiltonian', ...]>]
[<Dataset = molname: H2, basis: STO-3G, bondlength: 1.1, attributes: ['basis', 'basis_rot_groupings', ...]>]
>>> H2data = H2datasets[0]
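
Once loaded, the dataset's attributes behave like ordinary PennyLane objects. As a minimal
sketch (the ``hamiltonian`` attribute is assumed to be among the downloaded attributes; exact
values depend on the dataset contents):

.. code-block:: python

    import numpy as np

    # the Hamiltonian stored in the dataset is an ordinary PennyLane observable
    H = H2data.hamiltonian

    # for a molecule this small, diagonalising the matrix recovers the ground-state energy
    ground_energy = np.linalg.eigvalsh(qml.matrix(H)).min()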

We can load datasets for multiple parameter values by providing a list of values instead of a single value.
To load all possible values, use the special value :const:`~pennylane.data.FULL` or the string 'full':
To load all possible values, use the special value :const:`~pennylane.data.FULL` or the string ``"full"``:

>>> H2datasets = qml.data.load("qchem", molname="H2", basis="full", bondlength=[0.5, 1.1])
>>> print(H2datasets)
[<Dataset = description: qchem/H2/6-31G/0.5, attributes: ['molecule', 'hamiltonian', ...]>,
<Dataset = description: qchem/H2/6-31G/1.1, attributes: ['molecule', 'hamiltonian', ...]>,
<Dataset = description: qchem/H2/STO-3G/0.5, attributes: ['molecule', 'hamiltonian', ...]>,
<Dataset = description: qchem/H2/STO-3G/1.1, attributes: ['molecule', 'hamiltonian', ...]>]

When we only want to download portions of a large dataset, we can specify the desired properties (referred to as `attributes`).
[<Dataset = molname: H2, basis: STO-3G, bondlength: 0.5, attributes: ['basis', 'basis_rot_groupings', ...]>,
<Dataset = molname: H2, basis: STO-3G, bondlength: 1.1, attributes: ['basis', 'basis_rot_groupings', ...]>,
<Dataset = molname: H2, basis: CC-PVDZ, bondlength: 0.5, attributes: ['basis', 'basis_rot_groupings', ...]>,
<Dataset = molname: H2, basis: CC-PVDZ, bondlength: 1.1, attributes: ['basis', 'basis_rot_groupings', ...]>,
<Dataset = molname: H2, basis: 6-31G, bondlength: 0.5, attributes: ['basis', 'basis_rot_groupings', ...]>,
<Dataset = molname: H2, basis: 6-31G, bondlength: 1.1, attributes: ['basis', 'basis_rot_groupings', ...]>]
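
The returned value is an ordinary Python list, so the datasets can be filtered or iterated over
directly. A minimal sketch (``basis`` is one of the attributes listed above):

.. code-block:: python

    # keep only the STO-3G datasets from the full list
    sto3g_datasets = [dataset for dataset in H2datasets if dataset.basis == "STO-3G"]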

When we only want to download portions of a large dataset, we can specify the desired properties (referred to as 'attributes').
For example, we can download or load only the molecule and energy of a dataset as follows:

>>> part = qml.data.load("qchem", molname="H2", basis="STO-3G", bondlength=1.1,
@@ -57,16 +59,20 @@ For example, we can download or load only the molecule and energy of a dataset a
To determine what attributes are available for a type of dataset, we can use the function :func:`~pennylane.data.list_attributes`:

>>> qml.data.list_attributes(data_name="qchem")
["molecule",
"hamiltonian",
"sparse_hamiltonian",
...
"tapered_hamiltonian",
"full"]
['molname',
'basis',
'bondlength',
...
'vqe_params',
'vqe_energy']

.. note::

"full" is the default value for ``attributes``, and it means that all available attributes for the Dataset will be downloaded.
The default values for attributes are as follows:

- Molecules: ``basis`` is the smallest available basis, usually ``"STO-3G"``, and ``bondlength`` is the optimal bondlength for the molecule or an alternative if the optimal is not known.

- Spin systems: ``periodicity`` is ``"open"``, ``lattice`` is ``"chain"``, and ``layout`` is ``1x4`` for ``chain`` systems and ``2x2`` for ``rectangular`` systems.
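
As a hedged sketch (the spin-system parameter names such as ``sysname`` are assumed here,
not taken from this page), these spin-system defaults correspond to requesting them explicitly:

.. code-block:: python

    spin_datasets = qml.data.load(
        "qspin",
        sysname="Ising",
        periodicity="open",
        lattice="chain",
        layout="1x4",
    )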

Using Datasets in PennyLane
---------------------------
@@ -151,19 +157,6 @@ array([-1.5, -0.5, 0.5, 1.5])
Quantum Datasets Functions and Classes
--------------------------------------

Classes
^^^^^^^

.. autosummary::
:nosignatures:

~pennylane.data.Dataset

:html:`</div>`

Functions
^^^^^^^^^

:html:`<div class="summary-table">`

.. autosummary::
@@ -173,5 +166,6 @@ Functions
~pennylane.data.list_attributes
~pennylane.data.load
~pennylane.data.load_interactive
~pennylane.data.Dataset

:html:`</div>`
126 changes: 85 additions & 41 deletions pennylane/data/__init__.py
Original file line number Diff line number Diff line change
@@ -11,24 +11,57 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The data subpackage provides functionality to access, store and manipulate quantum datasets.
"""The data subpackage provides functionality to access, store and manipulate `quantum datasets <https://pennylane.ai/qml/datasets.html>`_.

.. note::

For more details on using datasets, please see the
:doc:`quantum datasets quickstart guide </introduction/data>`.

Overview
--------

Datasets are generally stored and accessed using the :class:`~pennylane.data.Dataset` class.
Pre-computed datasets are available for download and can be accessed using the :func:`~pennylane.data.load` or
:func:`~pennylane.data.load_interactive` functions.
Additionally, users can easily create, write to disk, and read custom datasets using functions within the
:class:`~pennylane.data.Dataset` class.
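
As a minimal sketch, a pre-computed dataset is downloaded with :func:`~.load`, while a custom
dataset is created in memory by instantiating :class:`~.Dataset` directly:

.. code-block:: python

    # download a pre-computed qchem dataset
    downloaded = qml.data.load("qchem", molname="H2", basis="STO-3G", bondlength=1.1)[0]

    # create a custom in-memory dataset with arbitrary attributes
    custom = qml.data.Dataset(energy=1.23, note="a custom dataset created in memory")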

.. currentmodule:: pennylane.data
.. autosummary::
:toctree: api
:toctree: api

Description
-----------
attribute
field
Dataset
DatasetNotWriteableError
load
load_interactive
list_attributes
list_datasets

In addition, various dataset types are provided:

.. autosummary::
:toctree: api

AttributeInfo
DatasetAttribute
DatasetArray
DatasetScalar
DatasetString
DatasetList
DatasetDict
DatasetOperator
DatasetNone
DatasetMolecule
DatasetSparseArray
DatasetJSON
DatasetTuple

Datasets
~~~~~~~~
The :class:`Dataset` class provides a portable storage format for information describing a physical
--------

The :class:`~.Dataset` class provides a portable storage format for information describing a physical
system and its evolution. For example, a dataset for an arbitrary quantum system could have
a Hamiltonian, its ground state, and an efficient state-preparation circuit for that state. Datasets
can contain a range of object types, including:
@@ -41,10 +74,13 @@
- ``dict`` of any supported type, as long as the keys are strings


For more details on using datasets, please see the
:doc:`quantum datasets quickstart guide </introduction/data>`.

Creating a Dataset
~~~~~~~~~~~~~~~~~~
------------------

To create a new dataset in-memory, initialize a new ``Dataset`` with the desired attributes:
To create a new dataset in memory, initialize a new :class:`~.Dataset` with the desired attributes:

>>> hamiltonian = qml.Hamiltonian([1., 1.], [qml.PauliZ(wires=0), qml.PauliZ(wires=1)])
>>> eigvals, eigvecs = np.linalg.eigh(qml.matrix(hamiltonian))
@@ -53,7 +89,8 @@
... eigen = {"eigvals": eigvals, "eigvecs": eigvecs}
... )
>>> dataset.hamiltonian
<Hamiltonian: terms=2, wires=[0, 1]>
(1.0) [Z0]
+ (1.0) [Z1]
>>> dataset.eigen
{'eigvals': array([-2., 0., 0., 2.]),
'eigvecs': array([[0.+0.j, 0.+0.j, 0.+0.j, 1.+0.j],
@@ -63,69 +100,71 @@

Attributes can also be assigned to the instance after creation:

>>> dataset.ground_state = np.transpose(eigvecs)[np.argmin(eigvals)]
>>> dataset.ground_state
array([0.+0.j, 0.+0.j, 0.+0.j, 1.+0.j])
>>> dataset.ground_state = np.transpose(eigvecs)[np.argmin(eigvals)]
>>> dataset.ground_state
array([0.+0.j, 0.+0.j, 0.+0.j, 1.+0.j])


Reading and Writing Datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
----------------------------

Datasets can be saved to disk for later use. Datasets use the HDF5 format for serialization,
which uses the '.h5' file extension.

To save a dataset, use the :meth:`Dataset.write()` method:

>>> my_dataset = Dataset(...)
>>> my_dataset.write("~/datasets/my_dataset.h5")
>>> my_dataset = Dataset(...)
>>> my_dataset.write("~/datasets/my_dataset.h5")

To open a dataset from a file, use the :meth:`Dataset.open()` class method:

>>> my_dataset = Dataset.open("~/datasets/my_dataset.h5", mode="r")
>>> my_dataset = Dataset.open("~/datasets/my_dataset.h5", mode="r")

The `mode` argument follow the standard library convention - 'r' for reading, 'w-' and `w` for create and overwrite,
and 'a' for editing. ``open()`` can be used to create a new dataset directly on disk:
The ``mode`` argument follows the standard library convention --- ``r`` for
reading, ``w-`` and ``w`` for creating and overwriting, and ``a`` for editing.
``open()`` can be used to create a new dataset directly on disk:

>>> new_dataset = Dataset.open("~/datasets/new_datasets.h5", mode="w")
>>> new_dataset = Dataset.open("~/datasets/new_datasets.h5", mode="w")

By default, any changes made to an opened dataset will be committed directly to the file, which will fail
if the file is opened read-only. The `"copy"` mode can be used to load the dataset into memory and detach
if the file is opened read-only. The ``"copy"`` mode can be used to load the dataset into memory and detach
it from the file:

>>> my_dataset = Dataset.open("~/dataset/my_dataset/h5", mode="copy")
>>> my_dataset.new_attribute = "abc"
>>> my_dataset = Dataset.open("~/datasets/my_dataset.h5", mode="copy")
>>> my_dataset.new_attribute = "abc"
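
Because a dataset opened in ``"copy"`` mode is detached from the file, changes made to it are not
written back automatically; they can be persisted with the ``write()`` method shown above
(a minimal sketch using a hypothetical output path):

.. code-block:: python

    # save the modified in-memory copy under a new path
    my_dataset.write("~/datasets/my_dataset_updated.h5")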


Attribute Metadata
~~~~~~~~~~~~~~~~~~
------------------

Dataset attributes can also contain additional metadata, such as docstrings. The :func:`qml.data.attribute`
Dataset attributes can also contain additional metadata, such as docstrings. The :func:`~.data.attribute`
function can be used to attach metadata on assignment or initialization.

>>> hamiltonian = qml.Hamiltonian([1., 1.], [qml.PauliZ(wires=0), qml.PauliZ(wires=1)])
>>> eigvals, eigvecs = np.linalg.eigh(qml.matrix(hamiltonian))
>>> dataset = qml.data.Dataset(hamiltonian = qml.data.attribute(
hamiltonian,
doc="The hamiltonian of the system"))
>>> dataset.eigen = qml.data.attribute(
{"eigvals": eigvals, "eigvecs": eigvecs},
doc="Eigenvalues and eigenvectors of the hamiltonain")
>>> hamiltonian = qml.Hamiltonian([1., 1.], [qml.PauliZ(wires=0), qml.PauliZ(wires=1)])
>>> eigvals, eigvecs = np.linalg.eigh(qml.matrix(hamiltonian))
>>> dataset = qml.data.Dataset(hamiltonian = qml.data.attribute(
... hamiltonian,
... doc="The hamiltonian of the system"))
>>> dataset.eigen = qml.data.attribute(
... {"eigvals": eigvals, "eigvecs": eigvecs},
... doc="Eigenvalues and eigenvectors of the hamiltonian")

This metadata can then be accessed using the :meth:`Dataset.attr_info` mapping:

>>> dataset.attr_info["eigen"]["doc"]
'The hamiltonian of the system'
>>> dataset.attr_info["eigen"]["doc"]
'Eigenvalues and eigenvectors of the hamiltonian'


Declarative API
~~~~~~~~~~~~~~~
---------------

When creating datasets to model a physical system, it is common to collect the same data for
a system under different conditions or assumptions. For example, a collection of datasets describing
a quantum oscillator could contain the first 1000 energy levels for different masses and force constants.

The datasets declarative API allows us to create subclasses of ``Dataset`` that define the required attributes,
or 'fields', and their associated type and documentation:
The datasets declarative API allows us to create subclasses
of :class:`Dataset` that define the required attributes, or 'fields', and
their associated type and documentation:

.. code-block:: python

@@ -144,9 +183,14 @@ class QuantumOscillator(qml.data.Dataset, data_name="quantum_oscillator", identi
When a ``QuantumOscillator`` dataset is created, its attributes will have the documentation from the field
definition:

>>> dataset = QuantumOscillator(mass=1, force_constant=0.5, hamiltonian=..., energy_levels=...)
>>> dataset.attr_info["mass"]["doc"]
'The mass of the particle'
>>> dataset = QuantumOscillator(
... mass=1,
... force_constant=0.5,
... hamiltonian=qml.PauliX(0),
... energy_levels=np.array([0.1, 0.2])
... )
>>> dataset.attr_info["mass"]["doc"]
'The mass of the particle'
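
A dataset created from such a subclass behaves like any other :class:`Dataset`, so it can, for
example, be written to disk as described above (a minimal sketch with a hypothetical path):

.. code-block:: python

    dataset.write("~/datasets/quantum_oscillator.h5")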

"""

31 changes: 29 additions & 2 deletions pennylane/data/base/attribute.py
@@ -416,8 +416,35 @@ def __init_subclass__( # pylint: disable=arguments-differ
def attribute(
val: T, doc: Optional[str] = None, **kwargs: Any
) -> DatasetAttribute[HDF5Any, T, Any]:
"""Returns ``DatasetAttribute`` class matching ``val``, with other arguments passed
to the ``AttributeInfo`` class."""
"""Creates a dataset attribute that contains both a value and associated metadata.

Args:
val (any): the dataset attribute value
doc (str): the docstring that describes the attribute
**kwargs: additional keyword arguments, which are stored as metadata
describing the attribute

Returns:
DatasetAttribute: an attribute object

.. seealso:: :class:`~.Dataset`

**Example**

>>> hamiltonian = qml.Hamiltonian([1., 1.], [qml.PauliZ(wires=0), qml.PauliZ(wires=1)])
>>> eigvals, eigvecs = np.linalg.eigh(qml.matrix(hamiltonian))
>>> dataset = qml.data.Dataset(hamiltonian = qml.data.attribute(
... hamiltonian,
... doc="The hamiltonian of the system"))
>>> dataset.eigen = qml.data.attribute(
... {"eigvals": eigvals, "eigvecs": eigvecs},
... doc="Eigenvalues and eigenvectors of the hamiltonian")

This metadata can then be accessed using the :meth:`~.Dataset.attr_info` mapping:

>>> dataset.attr_info["eigen"]["doc"]
'Eigenvalues and eigenvectors of the hamiltonian'
"""
return match_obj_type(val)(val, AttributeInfo(doc=doc, py_type=type(val), **kwargs))


37 changes: 37 additions & 0 deletions pennylane/data/base/dataset.py
@@ -77,6 +77,43 @@ def field( # pylint: disable=too-many-arguments, unused-argument
py_type: Type annotation or string describing this object's type. If not
provided, the annotation on the class will be used
kwargs: Extra arguments to ``AttributeInfo``

Returns:
Field: the field specifier for the declared dataset attribute

.. seealso:: :class:`~.Dataset`, :func:`~.data.attribute`

**Example**

The datasets declarative API allows us to create subclasses
of :class:`Dataset` that define the required attributes, or 'fields', and
their associated type and documentation:

.. code-block:: python

class QuantumOscillator(qml.data.Dataset, data_name="quantum_oscillator", identifiers=["mass", "force_constant"]):
\"""Dataset describing a quantum oscillator.\"""

mass: float = qml.data.field(doc = "The mass of the particle")
force_constant: float = qml.data.field(doc = "The force constant of the oscillator")
hamiltonian: qml.Hamiltonian = qml.data.field(doc = "The hamiltonian of the particle")
energy_levels: np.ndarray = qml.data.field(doc = "The first 1000 energy levels of the system")

The ``data_name`` keyword argument specifies a category or descriptive name for the dataset type, and the ``identifiers``
keyword argument specifies fields that function as parameters, i.e., they determine the behaviour
of the system.

When a ``QuantumOscillator`` dataset is created, its attributes will have the documentation from the field
definition:

>>> dataset = QuantumOscillator(
... mass=1,
... force_constant=0.5,
... hamiltonian=qml.PauliX(0),
... energy_levels=np.array([0.1, 0.2])
... )
>>> dataset.attr_info["mass"]["doc"]
'The mass of the particle'
"""

return Field(