Try to implement copy-paste protection checks #64
Comments
If possible, we should also check reference-style links ( However, I am not sure how feasible that is; AFAIR such links are automatically inlined by our markdown parser.
Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations. Solution: Refactor; improve handling annotations, remove IMSAll state as it's not required, rename functions.
Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations. Solution: Refactor; isolate processing annotations for different types of nodes.
Do we want to check only links within one list? How about checking all the links within a given file?
Problem: Currently xrefcheck is not able to detect possibly bad copy-pastes, where several links refer to the same file but, judging by the link names, one of those links should refer to another file. Solution: Implement the check, add support for related annotations for `.md` files, and add corresponding settings to the config.
That's a good question. On the one hand, this increases the probability of getting a false positive. On the other, checking through the entire file may be more useful and will be a more transparent behaviour for the user. Let's really go with checking across the entire file. Over time we will collect some statistics on how this check works on real-life repositories and will revise the behaviour then.
Remove extra parameters in md scanner
Problem: Currently xrefcheck is not able to detect possibly bad copy-pastes, where several links refer to the same file but, judging by the link names, one of those links should refer to another file. Solution: Implement the check and add corresponding settings to the config.
Problem: Currently xrefcheck is able to detect possibly bad copy-pastes, but there is no way to disable those checks locally for a file/paragraph/link. Solution: Add support for related annotations for `.md` files.
Review: fix config, README, CHANGES
Clarification and motivation
Imagine the following list of links:
[file](files/file.out)
[file2](files/file2.out)
[file3](files/file3.out)
[Another file](files/another-file.out)
[And a file once again](files/and-a-file-once-again.out)
It is easy to make a mistake here during copy-pasting so that text is updated and the link is not. I think we can use some heuristics to spot such mistakes (but avoid false positives at all costs):
If there are links `[T1](L1)` and `[T2](L1)` in a file, and `T1` is a substring of `L1` modulo casing and all the non-alphanumeric characters, while `T2` is not a substring of `L1` modulo the same things, then report an error at the `[T2](L1)` position, mentioning that it could be a bad copy-paste of `[T1](L1)`. And a similar check for `[T1](L2)`.
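The heuristic above can be sketched roughly as follows. This is only an illustration in Python, not xrefcheck's actual implementation: the names `Link`, `normalize`, `extract_links` and `find_suspicious_links` are hypothetical, links are extracted with a simple regular expression instead of a real markdown parser, and only the "same target, different texts" case is covered (the symmetric `[T1](L2)` check is left out). Links whose text has no alphanumeric characters are skipped, on the assumption that this helps avoid false positives.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    text: str    # link text, e.g. "file2"
    target: str  # link target, e.g. "files/file2.out"
    line: int    # 1-based line number within the scanned file


# Inline links only: [text](target). Reference-style links are not handled.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def normalize(s: str) -> str:
    """Lower-case the string and drop every non-alphanumeric character."""
    return "".join(ch for ch in s.lower() if ch.isalnum())


def extract_links(markdown: str) -> list[Link]:
    """Collect all inline links together with their line numbers."""
    links = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for m in INLINE_LINK.finditer(line):
            links.append(Link(m.group(1), m.group(2), lineno))
    return links


def find_suspicious_links(markdown: str) -> list[tuple[Link, Link]]:
    """Return (suspect, original) pairs that share a target across the file.

    A link is a suspect when its normalized text is not a substring of the
    normalized target, while another link to the same target does match.
    """
    by_target = defaultdict(list)
    for link in extract_links(markdown):
        if normalize(link.text):  # skip texts with no alphanumerics
            by_target[link.target].append(link)

    reports = []
    for target, links in by_target.items():
        norm_target = normalize(target)
        matching = [l for l in links if normalize(l.text) in norm_target]
        if not matching:
            continue  # no "good" link to compare against, so stay silent
        for l in links:
            if normalize(l.text) not in norm_target:
                reports.append((l, matching[0]))
    return reports
```

For instance, if `[file3](files/file2.out)` appears in the same file as `[file2](files/file2.out)`, the sketch reports the `file3` link as a possible bad copy-paste of the `file2` link. A target that no link text matches produces no report at all, which reflects the goal of avoiding false positives.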
Acceptance criteria
<!-- xrefcheck: no duplication check in {file/paragraph/link} -->