[#64] Implement copy/paste protection checks
Problem: Currently xrefcheck is able to detect possibly bad
copy-pastes, but there is no way to disable those checks
locally for a file/paragraph/link.

Solution: Add support for related annotations for `.md` files.
YuriRomanowski committed Dec 16, 2022
1 parent 44f21e5 commit 8c12e81
Showing 20 changed files with 557 additions and 118 deletions.
4 changes: 3 additions & 1 deletion CHANGES.md
@@ -36,11 +36,13 @@ Unreleased
   + Now we call references to anchors in current file (e.g. `[a](#b)`) as
     `file-local` references instead of calling them `current file` (which was ambiguous).
 * [#233](https://github.com/serokell/xrefcheck/pull/233)
-  + Now xrefxcheck does not follow redirect links by default. It fails for permanent
+  + Now xrefcheck does not follow redirect links by default. It fails for permanent
     redirect responses (i.e. 301 and 308) and passes for temporary ones (i.e. 302, 303, 307).
 * [#231](https://github.com/serokell/xrefcheck/pull/231)
   + Anchor analysis takes now into account the appropriate case-sensitivity depending on
     the configured Markdown flavour.
+* [240](https://github.com/serokell/xrefcheck/pull/240)
+  + Now xrefcheck is able to detect possible copy-pastes relying on links and their names.

 0.2.2
 ==========
16 changes: 16 additions & 0 deletions README.md
@@ -45,6 +45,7 @@ Comparing to alternative solutions, this tool tries to achieve the following poi
 * Supports external links (`http`, `https`, `ftp` and `ftps`).
 * Detects broken and ambiguous anchors in local links.
 * Integration with GitHub Actions.
+* Detects possible bad copy-pastes of links.

 ## Dependencies [↑](#xrefcheck)

@@ -148,6 +149,21 @@ There are several ways to fix this:
     * By default, `xrefcheck` will ignore links to localhost.
     * This behavior can be disabled by removing the corresponding entry from the `ignoreExternalRefsTo` list in the config file.

+1. How do I disable copy-paste check for specific links?
+    * Add a `<!-- xrefcheck: no duplication check in link -->` annotation before the link:
+      ```md
+      <!-- xrefcheck: no duplication check in link -->
+      Links with bad copypaste:
+      [good link](https://good.link.uri/).
+      [copypasted link](https://good.link.uri/).
+      ```
+      ```md
+      A [good link](https://good.link.uri/)
+      followed by an <!-- xrefcheck: no duplication check in link --> [copypasted intentionally](https://good.link.uri/).
+      ```
+    * You can use a `<!-- xrefcheck: no duplication check in paragraph -->` annotation to disable copy-paste check in a paragraph.
+    * You can use a `<!-- xrefcheck: no duplication check in file -->` annotation to disable copy-paste check within an entire file.
+
 ## Further work [↑](#xrefcheck)

 - [ ] Support link detection in different languages, not only Markdown.
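
The three annotations added to the README FAQ differ only in their scope (link, paragraph, or file). As a rough illustration, the text of such an HTML comment could be mapped to a scope as sketched below. `CpcScope`, `trim` and `parseCpcAnnotation` are hypothetical names made up for this example; they are not taken from xrefcheck's source, and the real scanner works on a parsed Markdown document rather than on raw strings.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, stripPrefix)

-- | Scope of a "no duplication check" annotation (hypothetical type).
data CpcScope = CpcLink | CpcParagraph | CpcFile
  deriving (Show, Eq)

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Recognise a comment such as
-- "<!-- xrefcheck: no duplication check in link -->".
-- Returns 'Nothing' for anything that is not one of the three annotations.
parseCpcAnnotation :: String -> Maybe CpcScope
parseCpcAnnotation raw = do
  inner  <- stripPrefix "<!--" (trim raw) >>= stripSuffix "-->"
  option <- stripPrefix "xrefcheck:" (trim inner)
  case words option of
    ["no", "duplication", "check", "in", "link"]      -> Just CpcLink
    ["no", "duplication", "check", "in", "paragraph"] -> Just CpcParagraph
    ["no", "duplication", "check", "in", "file"]      -> Just CpcFile
    _                                                 -> Nothing
  where
    -- Remove a suffix by stripping its reverse from the reversed string.
    stripSuffix suf s = reverse <$> stripPrefix (reverse suf) (reverse s)
```

Other `xrefcheck:` comments, such as `ignore link`, fall through to `Nothing` here; a fuller version would report them separately instead of discarding them.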
25 changes: 15 additions & 10 deletions src/Xrefcheck/Core.hs
@@ -12,6 +12,7 @@ module Xrefcheck.Core where
 import Universum

 import Control.Lens (makeLenses)
+import Control.Lens.Combinators (makeLensesWith)
 import Data.Aeson (FromJSON (..), withText)
 import Data.Char (isAlphaNum)
 import Data.Char qualified as C
Expand Down Expand Up @@ -70,14 +71,17 @@ instance Given ColorMode => Buildable Position where

 -- | Full info about a reference.
 data Reference = Reference
-  { rName   :: Text
+  { rName           :: Text
     -- ^ Text displayed as reference.
-  , rLink   :: Text
+  , rLink           :: Text
     -- ^ File or site reference points to.
-  , rAnchor :: Maybe Text
+  , rAnchor         :: Maybe Text
     -- ^ Section or custom anchor tag.
-  , rPos    :: Position
+  , rPos            :: Position
+  , rCheckCopyPaste :: Bool
+    -- ^ Whether to check bad copy/paste for this link
   } deriving stock (Show, Generic, Eq, Ord)
+makeLensesWith postfixFields ''Reference

 -- | Context of anchor.
 data AnchorType
@@ -102,9 +106,9 @@ data FileInfoDiff = FileInfoDiff
   }
 makeLenses ''FileInfoDiff

-diffToFileInfo :: FileInfoDiff -> FileInfo
-diffToFileInfo (FileInfoDiff refs anchors) =
-    FileInfo (DList.toList refs) (DList.toList anchors)
+diffToFileInfo :: Bool -> FileInfoDiff -> FileInfo
+diffToFileInfo ignoreCpcInFile (FileInfoDiff refs anchors) =
+    FileInfo (DList.toList refs) (DList.toList anchors) ignoreCpcInFile

 instance Semigroup FileInfoDiff where
   FileInfoDiff a b <> FileInfoDiff c d = FileInfoDiff (a <> c) (b <> d)
@@ -114,13 +118,14 @@ instance Monoid FileInfoDiff where

 -- | All information regarding a single file we care about.
 data FileInfo = FileInfo
-  { _fiReferences :: [Reference]
-  , _fiAnchors    :: [Anchor]
+  { _fiReferences     :: [Reference]
+  , _fiAnchors        :: [Anchor]
+  , _fiCopyPasteCheck :: Bool
   } deriving stock (Show, Generic)
 makeLenses ''FileInfo

 instance Default FileInfo where
-  def = diffToFileInfo mempty
+  def = diffToFileInfo True mempty

 data ScanPolicy
   = OnlyTracked
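
The two new flags, `rCheckCopyPaste` on a reference and `_fiCopyPasteCheck` on a file, suggest that a reference takes part in the copy-paste check only when neither its own annotation nor a file-level annotation has switched the check off. The sketch below illustrates that filtering with simplified stand-in records. It assumes that `True` means "check enabled", which is what the field names and the default `diffToFileInfo True mempty` point to, even though the parameter is called `ignoreCpcInFile`. `refsToCheck` is a hypothetical helper; how xrefcheck actually consumes these flags is not visible in this part of the diff.

```haskell
-- Simplified stand-ins for the records in src/Xrefcheck/Core.hs
-- (String instead of Text, no positions or anchors, no lenses).
data Reference = Reference
  { rName           :: String
  , rLink           :: String
  , rCheckCopyPaste :: Bool   -- False when disabled for this link (assumed polarity)
  } deriving (Show, Eq)

data FileInfo = FileInfo
  { fiReferences     :: [Reference]
  , fiCopyPasteCheck :: Bool  -- False when disabled for the whole file (assumed polarity)
  } deriving (Show, Eq)

-- | References that should still be examined for bad copy-pastes
-- (hypothetical helper). A file-level opt-out excludes every reference;
-- otherwise each reference decides for itself.
refsToCheck :: FileInfo -> [Reference]
refsToCheck fi
  | fiCopyPasteCheck fi = filter rCheckCopyPaste (fiReferences fi)
  | otherwise           = []
```

Keeping one flag per reference lets the link and paragraph annotations share a single representation: a paragraph annotation can simply clear the flag on every reference found inside that paragraph.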
13 changes: 11 additions & 2 deletions src/Xrefcheck/Scan.hs
@@ -117,18 +117,27 @@ data ScanErrorDescription
   = LinkErr
   | FileErr
   | ParagraphErr Text
+  | LinkErrCpc
+  | FileErrCpc
+  | ParagraphErrCpc Text
   | UnrecognisedErr Text
   deriving stock (Show, Eq)

 instance Buildable ScanErrorDescription where
   build = \case
     LinkErr -> [int||Expected a LINK after "ignore link" annotation|]
+    LinkErrCpc -> [int||Expected a LINK after "no duplication check in link" annotation|]
     FileErr -> [int||Annotation "ignore all" must be at the top of \
       markdown or right after comments at the top|]
+    FileErrCpc -> [int||Annotation "no duplication check in file" must be at the top of \
+      markdown or right after comments at the top|]
     ParagraphErr txt -> [int||Expected a PARAGRAPH after \
       "ignore paragraph" annotation, but found #{txt}|]
-    UnrecognisedErr txt -> [int||Unrecognised option "#{txt}" perhaps you meant \
-      <"ignore link"|"ignore paragraph"|"ignore all">|]
+    ParagraphErrCpc txt -> [int||Expected a PARAGRAPH after \
+      "no duplication check in paragraph" annotation, but found #{txt}|]
+    UnrecognisedErr txt -> [int||Unrecognised option "#{txt}", perhaps you meant
+      <"ignore link"|"ignore paragraph"|"ignore all">
+      or "no duplication check in <link|paragraph|file>"?|]

 specificFormatsSupport :: [([Extension], ScanAction)] -> FormatsSupport
 specificFormatsSupport formats = \ext -> M.lookup ext formatsMap
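
The new `LinkErrCpc` and `ParagraphErrCpc` constructors mirror the existing `LinkErr` and `ParagraphErr`: each is reported when an annotation is not followed by the kind of element it applies to. A minimal sketch of such a check follows, using a toy node type in place of the real Markdown AST. `Node`, `Annotation`, `nodeKind` and `expectAfter` are hypothetical names for illustration only, and the text xrefcheck puts after "but found" may differ from what `nodeKind` returns.

```haskell
-- Toy stand-in for a Markdown node; the real scanner walks a Markdown AST.
data Node
  = Link String String   -- link text and target
  | Paragraph [Node]
  | Heading String
  deriving (Show, Eq)

-- Only the two annotations that must be followed by a specific element.
data Annotation = NoCpcInLink | NoCpcInParagraph
  deriving (Show, Eq)

-- Subset of the error type from src/Xrefcheck/Scan.hs, with String for Text.
data ScanErrorDescription
  = LinkErrCpc
  | ParagraphErrCpc String
  deriving (Show, Eq)

-- | Name of a node kind for the "but found ..." part of the message
-- (hypothetical wording).
nodeKind :: Node -> String
nodeKind (Link _ _)    = "LINK"
nodeKind (Paragraph _) = "PARAGRAPH"
nodeKind (Heading _)   = "HEADING"

-- | Accept the node only if it is the kind of element the annotation
-- applies to; otherwise return the matching scan error.
expectAfter :: Annotation -> Node -> Either ScanErrorDescription Node
expectAfter NoCpcInLink      n@(Link _ _)    = Right n
expectAfter NoCpcInLink      _               = Left LinkErrCpc
expectAfter NoCpcInParagraph n@(Paragraph _) = Right n
expectAfter NoCpcInParagraph n               = Left (ParagraphErrCpc (nodeKind n))
```

For example, `expectAfter NoCpcInParagraph (Heading "Intro")` evaluates to `Left (ParagraphErrCpc "HEADING")`, which corresponds to the "Expected a PARAGRAPH after ... but found" message in the diff.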