create consume_modules() to properly load annotations in get_overrides()
BurnzZ committed Jan 6, 2022
1 parent 10dff5b commit daa3ff9
Showing 6 changed files with 131 additions and 7 deletions.
36 changes: 33 additions & 3 deletions docs/intro/overrides.rst
@@ -161,6 +161,31 @@ to see the other functionalities.
using only the ``default_registry``. There's no need to declare multiple
:class:`~.PageObjectRegistry` instances and use multiple annotations.

.. warning::

    :meth:`~.PageObjectRegistry.get_overrides` relies on the fact that all essential
    packages/modules that contain the :meth:`~.PageObjectRegistry.handle_urls`
    annotations are properly loaded.

    Thus, when using Page Objects from an external package, you need to import
    that package's modules first so that all of their
    :meth:`~.PageObjectRegistry.handle_urls` annotations are loaded.

    This can be done via the :func:`~.web_poet.overrides.consume_modules` function.
    Here's an example:

    .. code-block:: python

        from web_poet import default_registry, consume_modules

        consume_modules("external_package_A.po", "another_ext_package.lib")
        rules = default_registry.get_overrides()

    **NOTE**: :func:`~.web_poet.overrides.consume_modules` must be called before
    :meth:`~.PageObjectRegistry.get_overrides` for the imports to properly load.
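
The reason import order matters can be illustrated without web_poet: a class decorator only runs when the module that contains it is executed, that is, imported. The sketch below is a self-contained illustration under stated assumptions. The package name ``demo_registry_pkg``, the ``RULES`` list, and this toy ``handle_urls`` are hypothetical stand-ins and are not web_poet's implementation.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. Every name below is hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_registry_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "registry.py").write_text(
    "RULES = []\n"
    "\n"
    "def handle_urls(pattern):\n"
    "    def decorator(cls):\n"
    "        RULES.append(pattern)\n"
    "        return cls\n"
    "    return decorator\n"
)
(pkg / "po.py").write_text(
    "from demo_registry_pkg.registry import handle_urls\n"
    "\n"
    "@handle_urls('example.com')\n"
    "class ExamplePage:\n"
    "    pass\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

registry = importlib.import_module("demo_registry_pkg.registry")
rules_before = list(registry.RULES)  # po.py has not been imported yet

importlib.import_module("demo_registry_pkg.po")  # executing po.py runs the decorator
rules_after = list(registry.RULES)

print(rules_before, rules_after)
```

The toy registry stays empty until ``po.py`` is imported, which mirrors why the rules of a package that was never imported are missing from the result.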


A handy CLI tool is also available at your disposal to quickly see the available
Override rules in a given module in your project. For example, invoking something
like ``web_poet my_project.page_objects`` would produce the following:
@@ -226,7 +251,7 @@ Then we could easily retrieve all Page Objects per subpackage or module like thi

.. code-block:: python
from web_poet import default_registry
from web_poet import default_registry, consume_modules
# We can do it per website.
rules = default_registry.get_overrides_from("my_page_obj_project.cool_gadget_site")
@@ -236,11 +261,16 @@ Then we could easily retrieve all Page Objects per subpackage or module like thi
rules = default_registry.get_overrides_from("my_page_obj_project.cool_gadget_site.us")
rules = default_registry.get_overrides_from("my_page_obj_project.cool_gadget_site.fr")
# or even drill down further to the specific module.
# Or even drill down further to the specific module.
rules = default_registry.get_overrides_from("my_page_obj_project.cool_gadget_site.us.products")
rules = default_registry.get_overrides_from("my_page_obj_project.cool_gadget_site.us.product_listings")
# Or simply all of Override rules ever declared.
# Or simply all of the Override rules ever declared.
rules = default_registry.get_overrides()
# Lastly, you'd need to properly load external packages/modules for the
# @handle_urls annotation to be correctly read.
consume_modules("external_package_A.po", "another_ext_package.lib")
rules = default_registry.get_overrides()
Multiple Registry Approach
23 changes: 22 additions & 1 deletion tests/test_overrides.py
@@ -9,6 +9,7 @@
PONestedModule,
PONestedModuleOverridenSecondary,
)
from web_poet import consume_modules
from web_poet.overrides import PageObjectRegistry, default_registry


@@ -17,9 +18,19 @@

def test_list_page_objects_all():
    rules = default_registry.get_overrides()

    page_objects = {po.use for po in rules}

    # Note that the 'tests_extra.po_lib_sub_not_imported.POLibSubNotImported'
    # Page Object is not included here since it was never imported anywhere in
    # our test package. It would only be included if we run any of the
    # following. (Note that they should run before `get_overrides` is called.)
    #   - from tests_extra import po_lib_sub_not_imported
    #   - import tests_extra.po_lib_sub_not_imported
    #   - web_poet.consume_modules("tests_extra")
    # Merely having `import tests_extra` won't work since the subpackages and
    # modules need to be traversed and imported as well.
    assert all(["po_lib_sub_not_imported" not in po.__module__ for po in page_objects])

    # Ensure that ALL Override Rules are returned as long as the given
    # registry's @handle_urls annotation was used.
    assert page_objects == POS.union({POLibSub})
@@ -29,6 +40,16 @@ def test_list_page_objects_all():
assert rule.meta == rule.use.expected_meta, rule.use


def test_list_page_objects_all_consume_modules():
    """A test similar to the one above but calls ``consume_modules()`` to properly
    load the @handle_urls annotations from other modules/packages.
    """
    consume_modules("tests_extra")
    rules = default_registry.get_overrides()
    page_objects = {po.use for po in rules}
    assert any(["po_lib_sub_not_imported" in po.__module__ for po in page_objects])


def test_list_page_objects_from_pkg():
    """Tests that metadata is extracted properly from the po_lib package"""
    rules = default_registry.get_overrides_from("tests.po_lib")
5 changes: 5 additions & 0 deletions tests_extra/__init__.py
@@ -0,0 +1,5 @@
"""
This test package was created separately to see the behavior of retrieving the
Override rules declared on a registry where @handle_urls is defined on another
package.
"""
28 changes: 28 additions & 0 deletions tests_extra/po_lib_sub_not_imported/__init__.py
@@ -0,0 +1,28 @@
"""
This package is quite similar to tests/po_lib_sub in terms of code contents.
What we're ultimately trying to test here is to see if the `default_registry`
captures the rules annotated in this module if it was not imported.
"""
from typing import Dict, Any, Callable

from url_matcher import Patterns

from web_poet import handle_urls


class POBase:
    expected_overrides: Callable
    expected_patterns: Patterns
    expected_meta: Dict[str, Any]


class POLibSubOverridenNotImported:
    ...


@handle_urls("sub_example_not_imported.com", POLibSubOverridenNotImported)
class POLibSubNotImported(POBase):
    expected_overrides = POLibSubOverridenNotImported
    expected_patterns = Patterns(["sub_example_not_imported.com"])
    expected_meta = {}  # type: ignore
2 changes: 1 addition & 1 deletion web_poet/__init__.py
@@ -1,3 +1,3 @@
from .pages import WebPage, ItemPage, ItemWebPage, Injectable
from .page_inputs import ResponseData
from .overrides import handle_urls, PageObjectRegistry, default_registry
from .overrides import handle_urls, PageObjectRegistry, default_registry, consume_modules
44 changes: 42 additions & 2 deletions web_poet/overrides.py
@@ -2,7 +2,9 @@
import importlib.util
import warnings
import pkgutil
from collections import deque
from dataclasses import dataclass, field
from types import ModuleType
from typing import Iterable, Union, List, Callable, Dict, Any

from url_matcher import Patterns
@@ -164,7 +166,7 @@ def get_overrides_from(self, module: str) -> List[OverrideRule]:
"""
rules: Dict[Callable, OverrideRule] = {}

for mod in walk_modules(module):
for mod in walk_module(module):
# Dict ensures that no duplicates are collected and returned.
rules.update(self._filter_from_module(mod.__name__))

@@ -191,7 +193,7 @@ def _filter_from_module(self, module: str) -> Dict[Callable, OverrideRule]:
handle_urls = default_registry.handle_urls


def walk_modules(module: str) -> Iterable:
def walk_module(module: str) -> Iterable:
"""Return all modules from a module recursively.
Note that this will import all the modules and submodules. It returns the
@@ -212,3 +214,41 @@ def onerror(err):
):
mod = importlib.import_module(info.name)
yield mod
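
The recursive walk that this hunk only shows in part can be approximated as follows. This is a hedged sketch built on the standard library's ``pkgutil.walk_packages`` and is not necessarily identical to the elided body of ``walk_module``. The function name ``walk_module_sketch`` and the throwaway ``demo_walk_pkg`` package are hypothetical.

```python
import importlib
import importlib.util
import pkgutil
import sys
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Iterator


def walk_module_sketch(name: str) -> Iterator[ModuleType]:
    """Yield the named module, then every submodule found beneath it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"Module {name} not found")
    yield importlib.import_module(spec.name)
    # Only packages have search locations; plain modules have no children.
    if spec.submodule_search_locations:
        for info in pkgutil.walk_packages(
            spec.submodule_search_locations, prefix=f"{spec.name}."
        ):
            yield importlib.import_module(info.name)


# Hypothetical throwaway package: demo_walk_pkg/{a.py, sub/b.py}
root = Path(tempfile.mkdtemp())
(root / "demo_walk_pkg" / "sub").mkdir(parents=True)
for rel in ("__init__.py", "a.py", "sub/__init__.py", "sub/b.py"):
    (root / "demo_walk_pkg" / rel).write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

names = sorted(mod.__name__ for mod in walk_module_sketch("demo_walk_pkg"))
print(names)
```

Because the function is a generator, nothing is imported until it is iterated, which is why a separate step that drains it is needed.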


def consume_modules(*modules: str) -> None:
    """A quick wrapper for :func:`~.walk_module` to efficiently consume the
    generator and recursively load all packages/modules.

    This function must be run before calling :meth:`~.PageObjectRegistry.get_overrides`
    from the :class:`~.PageObjectRegistry`. It ensures that the ``@handle_urls``
    annotations are properly acknowledged for modules/packages that are not
    otherwise imported.

    Let's take a look at an example:

    .. code-block:: python

        # my_page_obj_project/load_rules.py

        from web_poet import default_registry, consume_modules

        consume_modules("other_external_pkg.po", "another_pkg.lib")
        rules = default_registry.get_overrides()

    For this case, the Override rules are coming from:

        - ``my_page_obj_project`` `(since it's the same module as the file above)`
        - ``other_external_pkg.po``
        - ``another_pkg.lib``

    So if the ``default_registry`` had other ``@handle_urls`` annotations outside
    of the packages/modules listed above, then the corresponding Override rules
    won't be returned.
    """

    for module in modules:
        gen = walk_module(module)

        # Inspired by itertools recipe: https://docs.python.org/3/library/itertools.html
        # Using a deque() results in a tiny performance improvement over list().
        deque(gen, maxlen=0)
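
The ``deque(gen, maxlen=0)`` idiom can be seen in isolation in the small sketch below. It drains a generator for its side effects while keeping none of the yielded values. The ``tracked`` helper and its ``seen`` list are hypothetical names used only for illustration.

```python
from collections import deque


def tracked(items, seen):
    """Yield items lazily, recording each one as a visible side effect."""
    for item in items:
        seen.append(item)
        yield item


seen = []
gen = tracked(["a", "b", "c"], seen)
seen_before = list(seen)  # nothing has run yet: generators are lazy

leftover = deque(gen, maxlen=0)  # drains the generator, stores nothing
print(seen_before, seen, len(leftover))
```

In ``consume_modules`` the side effect is the import of each module rather than an append to a list, but the draining step works the same way.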
