_check_toc_parents should consider only the descendants of root_doc and n… #13038

Open
wants to merge 2 commits into
base: master

khanxmetu
Contributor

@khanxmetu khanxmetu commented Oct 18, 2024

…ot the whole toctree_includes graph

Purpose

  • Skips the "document is referenced in multiple toctrees" warning/log when the toctrees in question are not descendants of the root_doc toctree.
  • The warning/log output now aligns with the current implementation of global_toctree_for_doc and _get_toctree_ancestors (i.e., root_doc is hardcoded as the root ancestor of every document).
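
As a rough illustration of the first bullet, here is a minimal sketch (not the PR's actual code). The function names and the sample project are hypothetical; only the shape of toctree_includes, a mapping from a document to the documents its toctrees include, is taken from the discussion.

```python
def toc_parents_whole_graph(
    toctree_includes: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Collect parents from every toctree, reachable from root_doc or not."""
    toc_parents: dict[str, list[str]] = {}
    for parent, children in toctree_includes.items():
        for child in children:
            toc_parents.setdefault(child, []).append(parent)
    return toc_parents


def toc_parents_from_root(
    toctree_includes: dict[str, list[str]], root_doc: str
) -> dict[str, list[str]]:
    """Collect parents only along toctrees reachable from root_doc."""
    toc_parents: dict[str, list[str]] = {}

    def visit(node: str) -> None:
        for child in toctree_includes.get(node, []):
            toc_parents.setdefault(child, []).append(node)
            if len(toc_parents[child]) == 1:  # first visit: descend
                visit(child)

    visit(root_doc)
    return toc_parents


def multi_parent_docs(toc_parents: dict[str, list[str]]) -> list[str]:
    """Documents that would trigger the 'multiple toctrees' message."""
    return sorted(doc for doc, parents in toc_parents.items() if len(parents) > 1)


# Hypothetical project: 'orphan' is not included by any toctree under 'index'.
toctree_includes = {
    'index': ['guide', 'api'],
    'orphan': ['api'],
}

print(multi_parent_docs(toc_parents_whole_graph(toctree_includes)))
print(multi_parent_docs(toc_parents_from_root(toctree_includes, 'index')))
```

In this sketch, the whole-graph variant reports 'api' because 'orphan' also includes it, while the root-only variant reports nothing, since 'orphan' is never reached from 'index'.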

Relates

@khanxmetu khanxmetu changed the title _check_toc_parents considers only the descendants of root_doc and n… _check_toc_parents should consider only the descendants of root_doc and n… Oct 18, 2024
@khanxmetu khanxmetu force-pushed the discussioncomment-10909269 branch 2 times, most recently from 07080c5 to f371bbb Compare October 18, 2024 15:36
Comment on lines 827 to 837
    toc_parents: dict[str, list[str]] = {}
    for parent, children in toctree_includes.items():
        for child in children:
            toc_parents.setdefault(child, []).append(parent)

    def _find_toc_parents_dfs(node: str) -> None:
        for child in toctree_includes.get(node, []):
            toc_parents.setdefault(child, []).append(node)
            is_child_already_visited = len(toc_parents[child]) > 1
            if not is_child_already_visited:
                _find_toc_parents_dfs(child)

    _find_toc_parents_dfs(root_doc)
    for doc, parents in sorted(toc_parents.items()):
Contributor

A personal opinion: the code might be easier to understand if the iteration source of the subsequent for loop -- toc_parents -- were assigned from the return value of the _find_toc_parents_dfs function.

Explaining why: to me, function calls that have side-effects that affect outer-scoped variables are slightly hard to follow.

I think that another potential benefit could be that it'd be easier to write test coverage for the helper function (although I admit that it's a small one, and that perhaps the enclosing function is a better candidate for testing here).

Contributor Author

@khanxmetu khanxmetu Oct 19, 2024

I didn't worry too much about side effects, since the code is simpler this way.

Here is the DFS without side effects on outer-scoped variables:

    def _find_toc_parents_dfs(
        node: str, toc_parents: dict[str, list[str]] | None = None
    ) -> dict[str, list[str]]:
        # Create a fresh dict per top-level call; a mutable default ({}) would
        # be shared across calls.
        if toc_parents is None:
            toc_parents = {}
        for child in toctree_includes.get(node, []):
            already_visited = child in toc_parents
            toc_parents.setdefault(child, []).append(node)
            if already_visited:
                continue
            _find_toc_parents_dfs(child, toc_parents)
        return toc_parents

Personally, I find it slightly more complicated than needed, because toc_parents is both propagated down the tree as a parameter and returned. Note that the return value is used only by the external caller of the helper function and nowhere else.
Anyway, I'm fine with this implementation too, if you prefer it.

Edit: There is another possible DFS implementation that does not take the toc_parents dict as a parameter and relies only on return values. However, I believe it would require combining the dicts returned from each subtree at every node, which would be expensive.
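
For comparison, one way to avoid both the outer-scope mutation and the threaded accumulator is an iterative traversal with an explicit stack. This is a different technique from the recursive helper discussed above, shown only as a sketch: the function name is hypothetical, toctree_includes and root_doc are passed in explicitly, and a separate visited set is an added assumption so that each document is expanded exactly once.

```python
def find_toc_parents_iterative(
    toctree_includes: dict[str, list[str]], root_doc: str
) -> dict[str, list[str]]:
    """Map each document reachable from root_doc to the parents that include it."""
    toc_parents: dict[str, list[str]] = {}
    visited = {root_doc}
    stack = [root_doc]
    while stack:
        node = stack.pop()
        for child in toctree_includes.get(node, []):
            # Record the edge every time, but expand each child only once.
            toc_parents.setdefault(child, []).append(node)
            if child not in visited:
                visited.add(child)
                stack.append(child)
    return toc_parents


# Hypothetical sample: 'c' is included by both 'a' and 'b'; 'orphan' is
# not reachable from 'index' and is therefore ignored.
sample_includes = {
    'index': ['a', 'b'],
    'a': ['c'],
    'b': ['c'],
    'orphan': ['c'],
}
result = find_toc_parents_iterative(sample_includes, 'index')
print({doc: sorted(parents) for doc, parents in result.items()})
```

The state stays local to the function and the result is returned once, so nothing needs to be merged per subtree. The order in which parents are collected still depends on the traversal, which is why the example sorts them before printing.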

    for doc, parents in sorted(toc_parents.items()):
        if len(parents) > 1:
            logger.info(
                __(
                    'document is referenced in multiple toctrees: %s, selecting: %s <- %s'
                ),
                parents,
                sorted(parents),
Contributor

Was the sorting added here for debugging/investigation purposes? (and should we include it with these changes?)

Contributor Author

@khanxmetu khanxmetu Oct 19, 2024

The helper function uses preorder traversal, which does not guarantee that parents are collected in sorted order, as they were before. Sorting is kept so that the logged output stays consistent regardless of the helper function's traversal order. This also makes the corresponding tests easier to write, since they need not dry-run the traversal or depend on the helper function's implementation. Further, it makes it easier for the user to spot the pattern that the lexicographically greatest parent is being selected.
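
The ordering point can be seen on a small graph. The sketch below is illustrative only: the sample graph and function names are hypothetical, and max() is used here merely to show what "lexicographically greatest" means, not to claim how the selection is implemented.

```python
def parents_node_wise(toctree_includes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Previous style: iterate over every entry of the mapping in order."""
    toc_parents: dict[str, list[str]] = {}
    for parent, children in toctree_includes.items():
        for child in children:
            toc_parents.setdefault(child, []).append(parent)
    return toc_parents


def parents_preorder(
    toctree_includes: dict[str, list[str]], root_doc: str
) -> dict[str, list[str]]:
    """Depth-first (preorder) style, starting at root_doc."""
    toc_parents: dict[str, list[str]] = {}

    def visit(node: str) -> None:
        for child in toctree_includes.get(node, []):
            toc_parents.setdefault(child, []).append(node)
            if len(toc_parents[child]) == 1:  # first visit: descend
                visit(child)

    visit(root_doc)
    return toc_parents


# Hypothetical graph with sorted keys and sorted child lists.
# 'd' is reached first through the deep path index -> a -> c -> d,
# and only later through index -> b -> d.
includes = {
    'a': ['c'],
    'b': ['d'],
    'c': ['d'],
    'index': ['a', 'b'],
}

node_wise = parents_node_wise(includes)['d']
preorder = parents_preorder(includes, 'index')['d']
print(node_wise)         # ['b', 'c']
print(preorder)          # ['c', 'b']
print(sorted(preorder))  # ['b', 'c']
print(max(preorder))     # c
```

Sorting after the traversal yields the same list in both cases, so the logged output does not depend on which traversal collected the parents.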

Contributor

My initial sense here is that I'm not too keen on the practice of modifying application code in order to make test expectations easier to write.

I do understand that it helps in this case, but I think that unit test coverage of different tree/graph structures would be more robust over time.

(apologies for taking a while to add further review commentary)

Contributor Author

@khanxmetu khanxmetu Oct 22, 2024

> My initial sense here is that I'm not too keen on the practice of modifying application code

I do not see how the application code is modified, since the function _check_toc_parents() produces no side effects other than its output. If anything, I think applying sorted does quite the opposite.
To clarify again: the benefits of sorted parents mentioned in my previous comment are not introduced by this PR. The parents were already implicitly sorted before, due to the node-wise traversal and the sorted nature of the values in toctree_includes. Since the traversal is now depth-first (preorder), which does not inherently guarantee that parents are collected in sorted order, sorted is applied after the traversal to stay consistent with the previous behavior. You could argue that the nested helper function itself should guarantee the order of parents and that the outer body of _check_toc_parents() should not be modified. However, given the recursive nature of the helper function, I feel the current approach is much simpler.

> I do understand that it helps in this case, but I think that unit test coverage of different tree/graph structures would be more robust over time.

I don't have a strong opinion on writing unit tests for helper functions. In my opinion, we should test functionality rather than a function's implementation, which in this case means the tests should care only about consistent warning/logging output and not about whichever traversal method is used internally to achieve it. For example, the helper, which previously used node-wise traversal and now uses depth-first (preorder) traversal in this PR, should ideally NOT break the existing tests; having a deterministic order of parents regardless of the traversal algorithm helps achieve that.

I’d like to know more about what you think of this. If you still believe that we shouldn’t guarantee sorted order of parents anymore, I’d happily remove it.
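
A behavior-level check along these lines might look like the sketch below. It is an assumption-laden illustration, not test code from the PR: collect_dfs, collect_bfs, and report are hypothetical names, and report returns tuples instead of logging so that the result can be asserted directly.

```python
from collections import deque
from collections.abc import Callable

Includes = dict[str, list[str]]
Parents = dict[str, list[str]]


def collect_dfs(includes: Includes, root: str) -> Parents:
    """Depth-first collection of parents reachable from root."""
    parents: Parents = {}
    seen = {root}

    def visit(node: str) -> None:
        for child in includes.get(node, []):
            parents.setdefault(child, []).append(node)
            if child not in seen:
                seen.add(child)
                visit(child)

    visit(root)
    return parents


def collect_bfs(includes: Includes, root: str) -> Parents:
    """Breadth-first collection of parents reachable from root."""
    parents: Parents = {}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in includes.get(node, []):
            parents.setdefault(child, []).append(node)
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return parents


def report(
    includes: Includes, root: str, collect: Callable[[Includes, str], Parents]
) -> list[tuple[str, list[str], str]]:
    """(doc, sorted parents, greatest parent) for each multi-parent document."""
    found = collect(includes, root)
    return [
        (doc, sorted(ps), max(ps))
        for doc, ps in sorted(found.items())
        if len(ps) > 1
    ]


graph = {'a': ['c'], 'b': ['d'], 'c': ['d'], 'index': ['a', 'b']}
# The expectation is stated once and holds for either traversal.
assert report(graph, 'index', collect_dfs) == report(graph, 'index', collect_bfs)
print(report(graph, 'index', collect_dfs))
```

Because the parents are sorted before they are reported, swapping the collector does not change the expected value, which is the property the comment above argues the tests should rely on.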

Co-authored-by: James Addison <[email protected]>