update docs with web-poet's new MVP version and POP definition
BurnzZ committed Mar 2, 2022
1 parent 0c94cf6 commit 1f52f3b
Showing 2 changed files with 31 additions and 49 deletions.
44 changes: 12 additions & 32 deletions docs/overrides.rst
@@ -155,50 +155,31 @@ For example:
consume=["external_package_A", "another_ext_package.lib"]
)
# Or, you could even extract the rules on a specific subpackage or module.
SCRAPY_POET_OVERRIDES = default_registry.get_overrides(
filters=["external_page_objects_package", "another_page_object_package.module_1"]
)
The ``get_overrides()`` method of the ``default_registry`` above returns
``List[OverrideRule]`` that were declared using `web-poet`_'s ``@handle_urls()``
annotation. This is much more convenient than manually defining all of the
``OverrideRule`` instances. Take note that since ``SCRAPY_POET_OVERRIDES`` is structured as
``List[OverrideRule]``, you can easily modify it later on if needed.

.. note::
.. tip::

For more info and advanced features of `web-poet`_'s ``@handle_urls``
and its registry, kindly read the `web-poet <https://web-poet.readthedocs.io>`_
documentation regarding Overrides.
If you're using External Packages which conform to the **POP**
standards as described in **web-poet's** `Page Object Projects (POP)
<https://web-poet.readthedocs.io/en/stable/intro/pop.html>`_ section,
then retrieving the rules should be as easy as:

In case the external packages you're using do not use `web-poet`_'s
``default_registry``, you can find and collect custom registries via `web-poet`_'s
``registry_pool``:

.. code-block:: python
.. code-block:: python
from web_poet import registry_pool, consume_modules
import external_package_A, another_ext_package
# Ensures that the external dependencies are properly imported so that the
# Registry and its accompanying rules can be discovered.
consume_modules("external_package_A", "another_ext_package_B.lib")
SCRAPY_POET_OVERRIDES = external_package_A.RULES + another_ext_package.RULES
print(registry_pool)
# {
# 'default': <web_poet.overrides.PageObjectRegistry object at 0x7f47d654d8b0>,
# 'custom_reg': <external_package_A.PageObjectRegistry object at 0x7f47d654382a>,
# 'another_custom_reg': <another_ext_package_B.lib.PageObjectRegistry object at 0xd93746549dea>,
# }
.. note::

SCRAPY_POET_OVERRIDES = [
rule
for _, registry in registry_pool.items()
for rule in registry.get_overrides()
]
For more info and advanced features of `web-poet`_'s ``@handle_urls``
and its registry, kindly read the `web-poet <https://web-poet.readthedocs.io>`_
documentation regarding Overrides.

# Converting it to a set also ensures that there are no duplicate OverrideRules.
SCRAPY_POET_OVERRIDES = set(SCRAPY_POET_OVERRIDES)
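Converting the rules to a ``set``, as in the removed snippet above, drops duplicates but also loses the list ordering. A minimal sketch of an order-preserving alternative follows. The ``Rule`` dataclass, the ``merge_rules`` helper, and the two rule lists are hypothetical stand-ins for illustration only; they are not part of web-poet or scrapy-poet, and whether web-poet's own ``OverrideRule`` is hashable is not confirmed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for an override rule (not web-poet's class)."""

    pattern: str
    use: str
    instead_of: str


def merge_rules(*rule_lists: List[Rule]) -> List[Rule]:
    """Concatenate rule lists, dropping duplicates but keeping first-seen order."""
    merged = [rule for rules in rule_lists for rule in rules]
    # A dict keeps insertion order (Python 3.7+), unlike a set.
    return list(dict.fromkeys(merged))


# Hypothetical rule lists, shaped like what two packages might expose.
PACKAGE_A_RULES = [
    Rule("example.com", "ExampleProductPage", "ProductPage"),
    Rule("example.org", "OrgProductPage", "ProductPage"),
]
PACKAGE_B_RULES = [
    Rule("example.org", "OrgProductPage", "ProductPage"),  # duplicate of A's rule
    Rule("example.net", "NetProductPage", "ProductPage"),
]

SCRAPY_POET_OVERRIDES = merge_rules(PACKAGE_A_RULES, PACKAGE_B_RULES)
```

The result stays a plain list, so it can still be modified later like any other ``SCRAPY_POET_OVERRIDES`` value.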

Overrides registry
==================
@@ -217,4 +198,3 @@ must be a subclass of ``scrapy_poet.overrides.OverridesRegistryBase``
and must implement the method ``overrides_for``. Like other Scrapy components,
it can be initialized from the ``from_crawler`` class method if implemented.
This can be handy for accessing settings, stats, request meta, etc.
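The shape of such a custom registry can be sketched roughly as below. To keep the example self-contained, ``RegistryBase``, ``FakeRequest``, ``FakeCrawler``, the ``MY_DOMAIN_OVERRIDES`` setting, and the page classes are all hypothetical stand-ins; a real implementation would subclass ``scrapy_poet.overrides.OverridesRegistryBase`` and receive Scrapy's own request and crawler objects.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Mapping
from urllib.parse import urlparse


class RegistryBase(ABC):
    """Hypothetical stand-in for scrapy_poet.overrides.OverridesRegistryBase."""

    @abstractmethod
    def overrides_for(self, request) -> Mapping[Callable, Callable]:
        """Return the {original: replacement} mapping that applies to a request."""


class FakeRequest:
    """Stand-in for a Scrapy request; only the URL is used here."""

    def __init__(self, url: str) -> None:
        self.url = url


class FakeCrawler:
    """Stand-in for a Scrapy crawler; settings are a plain dict here."""

    def __init__(self, settings: dict) -> None:
        self.settings = settings


class ProductPage:
    """Hypothetical generic page object."""


class ExampleProductPage:
    """Hypothetical site-specific replacement for ProductPage."""


class DomainRegistry(RegistryBase):
    """Picks overrides by the request's hostname."""

    def __init__(self, rules: Dict[str, Mapping[Callable, Callable]]) -> None:
        self.rules = rules

    @classmethod
    def from_crawler(cls, crawler) -> "DomainRegistry":
        # MY_DOMAIN_OVERRIDES is a made-up setting name for this sketch.
        return cls(crawler.settings.get("MY_DOMAIN_OVERRIDES", {}))

    def overrides_for(self, request) -> Mapping[Callable, Callable]:
        host = urlparse(request.url).hostname or ""
        return self.rules.get(host, {})


crawler = FakeCrawler(
    {"MY_DOMAIN_OVERRIDES": {"example.com": {ProductPage: ExampleProductPage}}}
)
registry = DomainRegistry.from_crawler(crawler)
overrides = registry.overrides_for(FakeRequest("https://example.com/p/1"))
```

Building the registry in ``from_crawler`` is what gives it access to the crawler's settings, which is the use case the paragraph above mentions.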

36 changes: 19 additions & 17 deletions scrapy_poet/overrides.py
@@ -57,21 +57,9 @@ class OverridesRegistry(OverridesRegistryBase):
Now, if you've used ``web-poet``'s built-in functionality to directly create
the override rules in the Page Object via the ``@handle_urls`` annotation,
you can quickly import them via:
.. code-block:: python
from web_poet import default_registry
SCRAPY_POET_OVERRIDES = default_registry.get_overrides(filters="my_page_objects_module")
It finds all the rules annotated using ``web-poet``'s ``@handle_urls``
decorator inside the ``my_page_objects_module`` module and all of its
submodules.
However, in most cases, you'll simply want to retrieve all of
the override rules that were ever declared on a given registry. Just make
sure to call ``consume_modules()`` beforehand:
you can quickly import them via the code below. It finds all the
rules annotated using ``web-poet``'s ``@handle_urls`` decorator that were
registered into ``web_poet.default_registry``.
.. code-block:: python
@@ -88,8 +76,22 @@ class OverridesRegistry(OverridesRegistryBase):
consume=["external_package_A.po", "another_ext_package.lib"]
)
More info on this at `web-poet <https://web-poet.readthedocs.io>`_.
"""
Make sure to call ``consume_modules()`` beforehand. More info on this at
`web-poet <https://web-poet.readthedocs.io>`_.
.. tip::
If you're using External Packages which conform to the **POP**
standards as described in **web-poet's** `Page Object Projects (POP)
<https://web-poet.readthedocs.io/en/stable/intro/pop.html>`_ section,
then retrieving the rules should be as easy as:
.. code-block:: python
import external_package_A, another_ext_package
SCRAPY_POET_OVERRIDES = external_package_A.RULES + another_ext_package.RULES
"""

@classmethod
def from_crawler(cls, crawler: Crawler) -> Crawler: