Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#64] Refactor markdown scanner #238

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

YuriRomanowski
Copy link
Contributor

@YuriRomanowski YuriRomanowski commented Dec 14, 2022

Description

Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations.

Solution: Refactor: improve handling annotations, remove IMSAll state, rename functions, isolate processing annotations for different types of Nodes, sharing common behaviour with factored out functions.

Related issue(s)

Needed to fix #64

✅ Checklist for your Pull Request

Ideally a PR has all of the checkmarks set.

If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).

Related changes (conditional)

  • Tests

    • If I added new functionality, I added tests covering it.
    • If I fixed a bug, I added a regression test to prevent the bug from
      silently reappearing again.
  • Documentation

    • I checked whether I should update the docs and did so if necessary:
  • Public contracts

    • Any modifications of public contracts comply with the Evolution
      of Public Contracts
      policy.
    • I added an entry to the changelog if my changes are visible to the users
      and
    • provided a migration guide for breaking changes if possible

Stylistic guide (mandatory)

✓ Release Checklist

  • I updated the version number in package.yaml.
  • I updated the changelog and moved everything
    under the "Unreleased" section to a new section for this release version.
  • (After merging) I edited the auto-release.
    • Change the tag and title using the format vX.Y.Z.
    • Write a summary of all user-facing changes.
    • Deselect the "This is a pre-release" checkbox at the bottom.
  • (After merging) I updated xrefcheck-action.
  • (After merging) I uploaded the package to hackage.

Problem: Current implementation of the markdown scanner is hard
to extend, so we need to refactor it to add support for new annotations.

Solution: Refactor; improve handling annotations, remove IMSAll state
as it's not required, rename functions.
Problem: Current implementation of the markdown scanner is hard
to extend, so we need to refactor it to add support for new annotations.

Solution: Refactor; isolate processing annotations for different
types of nodes.
@YuriRomanowski YuriRomanowski changed the title Yuri romanowski/#64 refactor markdown scanner [#64] Refactor markdown scanner Dec 14, 2022
Copy link
Contributor

@aeqz aeqz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this refactor. It will allow us to create new annotations more easily. I have added just a few comments.

Also, maybe there will be the need for another small refactor if at some point we increase the ScannerState with stateful annotations that do not correspond to ignore, but it can be done at that moment.

getPosition :: Node -> Maybe PosInfo
getPosition node@(Node pos _ _) = do
getPosition :: C.Node -> Maybe PosInfo
getPosition node@(C.Node pos _ _) = do
annLength <- length . T.strip <$> getHTMLText node
PosInfo sl sc _ _ <- pos
pure $ PosInfo sl sc sl (sc + annLength - 1)

-- | Extract `IgnoreMode` if current node is xrefcheck annotation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this comment should be updated:

IgnoreMode -> GetAnnotation

isIgnoreFile :: Node -> Bool
isIgnoreFile = (ValidMode IMAll ==) . getIgnoreMode
isIgnoreFile :: C.Node -> Bool
isIgnoreFile = (Just (IgnoreAnnotation IMAll) ==) . getAnnotation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can have a function like

isGlobalAnnotation :: GetAnnotation -> Bool

and use it here. Also, if this function has no wildcard pattern matchings, the compiler will ask us if we want any new annotation that we add in the future to be global or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there can be only one type of annotation, which is IgnoreFile. Here similar function is introduced, called isHeaderNode.

NodeType ->
[ScannerM C.Node] ->
ScannerM C.Node
handleLink ign pos ty subs = do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now all the ign arguments can renamed to something like ann. Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, they always correspond to Ignore.

<!-- ignore my-previous-comment -->

ScannerM C.Node
traverseNodeWithLinkExpected ignoreLinkState modePos pos ty subs = do
when (ignoreLinkState == ExpectingLinkInSubnodes) $
ssIgnore . _Just . ignoreMode .= IMSLink ParentExpectsLink
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space after .=?

-> GetAnnotation
-> ScannerM C.Node
handleAnnotation pos nodeType = \case
IgnoreAnnotation mode -> do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space after mode?

@aeqz aeqz self-requested a review December 15, 2022 09:10
Nothing -> case e of
Nothing -> Node pos ty <$> sequence subs
Just (Ignore mode modePos) ->
case (mode, ty) of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra indentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Where is it? It seems it's not present in the final version

@@ -18,7 +18,7 @@
➥ In file check-scan-errors.md
scan error at src:21:1-50:

Unrecognised option "unrecognised-annotation" perhaps you meant <"ignore link"|"ignore paragraph"|"ignore all">
Unrecognised option "ignore unrecognised-annotation" perhaps you meant <"ignore link"|"ignore paragraph"|"ignore all">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change.

Mm, however, in the commit description you explicitly tell that you refactor things, which assumes no changes in the application logic.

Please leave a note in the commit descrption that your refactoring led to this minor change.

Copy link
Contributor Author

@YuriRomanowski YuriRomanowski Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it, I'll change the description later when prettying commit history.

handleLink ign pos ty subs = do
let traverseChildren = C.Node pos ty <$> sequence subs
-- It can be checked that it's correct for all the cases
ssIgnore .= Nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I'd agree with @aeqz on the maintainability issue.

Currently, the comment at line above is correct, but if someday someone adds some IMS123Paragraphs constructor, this statement will turn false but the compiler won't help in noticing that. And good luck to that guy figuring out where the state resets.

Also, AFAIU in the case of ign = IMSParagraph you rely on the behaviour of handleParagraph to strip the entire Paragraph subtree. This is not directly an issue, but this is a dependency between the code components that with this PR become coupled slightly less tightly (after being extracted to separate subfunctions), this increases the probability of bugs, and I'm a bit worried.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, AFAIU in the case of ign = IMSParagraph you rely on the behaviour of handleParagraph to strip the entire Paragraph subtree.

It seems that this code is fully made up of such implicit dependencies, because the main reason is how we use annotations. We put an annotation somewhere in the text, and then the further behavior depends on the next nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of IMSParagraph, at least, we could try to look at the next node (not very easy to do) and perform some actions immediately, so, if the next is paragraph, ignore it, and if the next is something else, emit an error. But in case of link annotations we have to handle all this implicit stuff (and that's sad).

isSimpleComment node = do
let isComment = isJust $ getCommentContent node
isNotXrefcheckAnnotation = isNothing $ getXrefcheckContent node
isComment && isNotXrefcheckAnnotation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good renaming and use of span 👍

I'm not sure though why providing mere isComment was bad 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we want to split header nodes and other contents of the file. So if for example ignore paragraph goes right after header nodes, we will stop immediately. And this annotations is also a comment, because isJust . getCommentContent is true for it. This way we exclude local annotations from the consideration.

Hm, but also this function will stop at an invalid annotation. Maybe it's better to allow it too, and just skip it reporting an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, no, I think it's better to stop at incorrect annotations 🤔

PARAGRAPH -> handleParagraph ign pos ty subs
LINK {} -> handleLink ign pos ty subs
IMAGE {} -> handleLink ign pos ty subs
_ -> handleOther ign pos ty subs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmhm, looking at the result of this change, I feel quite skeptical in fact.

When imagining the implementation, it seems simpler to keep an annotation in mind and think how it affects markdown nodes, rather than mentally sorting through all types of nodes (hard) and for each think how it is affected by each of our annotations (keeping all annotations in mind can become hard in the future). And this code goes against this model of thinking.

A related thing: code locality issues, what happens in handleParagraph seems to affect what happens in handleLink, this makes reasoning about the code correctness harder.

Perhaps you applied this refactoring to make the code shorter? We really spared several lines (and on one such place I left a comment because I think it is arguable), but overall the code seems to take more space now even if we don't take the function signatures into account.

I mostly agree with other minor changes in this commit, but the core part of this commit I find suspicious.

Could you tell which benefits, in your opinion, this rewrite gives?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it's easier to reason about code behavior when a node type is known, because our types for ignore modes are, IMHO, rather confusing and hard to understand. And, in contrast, node type is something very clear and easy to work with, so I preferred it.

Things only gets worse if there are some different types of annotation (the primary reason of the whole this refactor). We'll have to handle a lot of different cases for all types of annotations and nodes.

Ignore IMSParagraph prevPos ->
lift . tell . makeError prevPos fp . ParagraphErr $ prettyType nodeType
Ignore (IMSLink _) prevPos ->
lift $ tell $ makeError prevPos fp LinkErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, the commit says that you apply a refactoring, but here in fact you add a behaviour that was not present before.

I more than agree that this check is good to add, but let's extract it to a separate commit.

Copy link
Contributor Author

@YuriRomanowski YuriRomanowski Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact the behavior hasn't changed here, because in the original version some of these checks were present in other cases on the top level of remove function, and currently all of them are here.

@YuriRomanowski
Copy link
Contributor Author

@Martoon-00 There are a lot of troubles in the Scanners/Markdown.hs module that are inherited from the past, I didn't fix them fully, just refactor slightly such that adding new annotations ceased to be really impossible. But we can try to further improve it in some extent, sure.

@Martoon-00
Copy link
Member

Related ticket: #276.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Try to implement copy-paste protection checks
3 participants