Merge pull request #231 from serokell/aeqz/#211-case_insensitive_anchors
[#211] Case insensitive anchors
aeqz authored Dec 13, 2022
2 parents 2b9bf25 + dd52970 commit 50e4e3b
Showing 13 changed files with 170 additions and 21 deletions.
3 changes: 3 additions & 0 deletions CHANGES.md
@@ -38,6 +38,9 @@ Unreleased
* [#233](https://github.com/serokell/xrefcheck/pull/233)
+ Now xrefcheck does not follow redirect links by default. It fails for permanent
redirect responses (i.e. 301 and 308) and passes for temporary ones (i.e. 302, 303, 307).
* [#231](https://github.com/serokell/xrefcheck/pull/231)
+ Anchor analysis takes now into account the appropriate case-sensitivity depending on
the configured Markdown flavour.

0.2.2
==========
5 changes: 3 additions & 2 deletions README.md
@@ -99,6 +99,7 @@ Run `stack install` to build everything and install the executable.
If you wish to use `cabal`, you need to run [`stack2cabal`](https://hackage.haskell.org/package/stack2cabal) first!

### Run on Windows [](#xrefcheck)

On Windows, executable requires some dynamic libraries (DLLs).
They are shipped together with executable in [releases page](https://github.com/serokell/xrefcheck/releases).
If you have built executable from source using `stack install`,
@@ -135,7 +136,7 @@ There are several ways to fix this:
* If you wish to ignore all http/ftp links, you can use `--mode local-only`.

1. How does `xrefcheck` handle links that require authentication?
* It's common for projects to contains links to protected resources.
* It's common for projects to contain links to protected resources.
By default, when `xrefcheck` attempts to verify a link and is faced with a `403 Forbidden` or a `401 Unauthorized`, it assumes the link is valid.
* This behavior can be disabled by setting `ignoreAuthFailures: false` in the config file.

@@ -160,7 +161,7 @@ There are several ways to fix this:
Its features include duplicated URLs detection, specifying allowed HTTP error codes and reporting generation.
At the moment of writing, it scans only external references and checking anchors is not possible.
* [remark-validate-links](https://github.com/remarkjs/remark-validate-links) and [remark-lint-no-dead-urls](https://github.com/davidtheclark/remark-lint-no-dead-urls) - highly configurable JavaScript solution for checking local and external links respectively.
It is able to check multiple repositores at once if they are gathered in one folder.
It is able to check multiple repositories at once if they are gathered in one folder.
Doesn't handle "429 Too Many Requests", so false positives are likely when you have many links to the same domain.
* [markdown-link-check](https://github.com/tcort/markdown-link-check) - another checker written in JavaScript, scans one specific file at a time.
Supports `mailto:` link resolution.
5 changes: 5 additions & 0 deletions package.yaml
@@ -66,6 +66,11 @@ ghc-options:
- -Wno-prepositive-qualified-module
- -Wno-monomorphism-restriction

# This option avoids a warning on case-insensitive systems:
# https://github.com/haskell/cabal/issues/4739
# https://github.com/commercialhaskell/stack/issues/3918
- -optP-Wno-nonportable-include-path

dependencies:
- base >=4.14.3.0 && <5

16 changes: 8 additions & 8 deletions src/Xrefcheck/Core.hs
@@ -15,9 +15,9 @@ import Control.Lens (makeLenses)
import Data.Aeson (FromJSON (..), withText)
import Data.Char (isAlphaNum)
import Data.Char qualified as C
import Data.Default (Default (..))
import Data.DList (DList)
import Data.DList qualified as DList
import Data.Default (Default (..))
import Data.List qualified as L
import Data.Reflection (Given)
import Data.Text qualified as T
@@ -40,15 +40,15 @@ import Xrefcheck.Util
data Flavor
= GitHub
| GitLab
deriving stock (Show)
deriving stock (Show, Enum, Bounded)

allFlavors :: [Flavor]
allFlavors = [GitHub, GitLab]
where
_exhaustivenessCheck = \case
GitHub -> ()
GitLab -> ()
-- if you update this, also update the list above
allFlavors = [minBound .. maxBound]

-- | Whether anchors are case-sensitive for a given Markdown flavour or not.
caseInsensitiveAnchors :: Flavor -> Bool
caseInsensitiveAnchors GitHub = True
caseInsensitiveAnchors GitLab = False

instance FromJSON Flavor where
parseJSON = withText "flavor" $ \txt ->
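
The `Core.hs` change swaps a hand-written flavor list for one derived from `Enum` and `Bounded`. A minimal stand-alone sketch of that idea follows; it copies the definitions from the hunk above, while the `main` function is only an illustration and not part of the commit:

```haskell
{-# LANGUAGE DerivingStrategies #-}

-- Markdown flavors, as in the diff above.
data Flavor
  = GitHub
  | GitLab
  deriving stock (Show, Enum, Bounded)

-- Every constructor, in declaration order; a newly added constructor
-- is picked up without touching this definition.
allFlavors :: [Flavor]
allFlavors = [minBound .. maxBound]

-- Whether anchors are compared case-insensitively for a flavor.
caseInsensitiveAnchors :: Flavor -> Bool
caseInsensitiveAnchors GitHub = True
caseInsensitiveAnchors GitLab = False

-- Illustration only: print the setting for each flavor.
main :: IO ()
main = mapM_ describe allFlavors
  where
    describe f = putStrLn (show f <> ": " <> show (caseInsensitiveAnchors f))
```

With the derived instances, the manual `_exhaustivenessCheck` helper is no longer needed to keep the list in sync with the type.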
12 changes: 6 additions & 6 deletions src/Xrefcheck/Progress.hs
@@ -137,18 +137,18 @@ showProgress name width col posixTime Progress{..} = mconcat
, status
]
where
-- | Each of the following values represents the number of the progress bar cells
-- Each of the following values represents the number of the progress bar cells
-- corresponding to the respective "class" of processed references: the valid ones,
-- the ones containing an unfixable error (a.k.a. the invalid ones), and the ones
-- containing a fixable error.
--
-- The current overall number of processed errors.
done = floor $ (pCurrent % pTotal) * fromIntegral @Int @(Ratio Int) width

-- | The current number of the invalid references.
-- The current number of the invalid references.
errsU = ceiling $ (pErrorsUnfixable % pTotal) * fromIntegral @Int @(Ratio Int) width

-- | The current number of (fixable) errors that may be eliminated during further
-- The current number of (fixable) errors that may be eliminated during further
-- verification.
-- Notice!
-- 1. Both this and the previous values use @ceiling@ as the rounding function.
@@ -160,13 +160,13 @@ showProgress name width col posixTime Progress{..} = mconcat
errsF = min (width - errsU) . ceiling $ (pErrorsFixable % pTotal) *
fromIntegral @Int @(Ratio Int) width

-- | The number of valid references.
-- The number of valid references.
-- The value is bounded from below by 0 to ensure the number never gets negative.
-- This situation is plausible due to the different rounding functions used for each value:
-- @floor@ for the minuend @done@, @ceiling@ for the two subtrahends @errsU@ & @errsF@.
successful = max 0 $ done - errsU - errsF

-- | The remaining number of references to be verified.
-- The remaining number of references to be verified.
remaining = width - successful - errsU - errsF

bar
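
The comments in this hunk describe how the progress bar cells are split using `floor` and `ceiling`. The arithmetic can be tried in isolation with the sketch below. The function `barCells` and its argument names are hypothetical, introduced here only for illustration; the formulas mirror the ones in the hunk, and the total is assumed to be non-zero:

```haskell
import Data.Ratio (Ratio, (%))

-- Hypothetical stand-alone version of the cell arithmetic.
-- Returns (successful, unfixable, fixable, remaining) cell counts.
-- Assumes total > 0, since (%) fails on a zero denominator.
barCells :: Int -> Int -> Int -> Int -> Int -> (Int, Int, Int, Int)
barCells width total current errorsUnfixable errorsFixable =
  (successful, errsU, errsF, remaining)
  where
    -- Share of the bar width taken by n out of total items.
    scale :: Int -> Ratio Int
    scale n = (n % total) * fromIntegral width

    -- Processed references are rounded down.
    done = floor (scale current)
    -- Both error classes are rounded up so a single error stays visible.
    errsU = ceiling (scale errorsUnfixable)
    errsF = min (width - errsU) (ceiling (scale errorsFixable))
    -- Mixed rounding can push the difference below zero, hence the clamp.
    successful = max 0 (done - errsU - errsF)
    remaining = width - successful - errsU - errsF

main :: IO ()
main = print (barCells 10 7 2 1 1) -- prints (0,2,2,6)
```

In this example `done` is 2 while each error class takes 2 cells, so the unclamped difference would be -2; the `max 0` keeps the count of valid cells at zero.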
@@ -237,7 +237,7 @@ putTextRewrite (Rewrite RewriteCtx{..}) msg = do
atomicModifyIORef' rMaxPrintedSize $ \maxPrinted ->
(max maxPrinted (length msg), ())
where
-- | The maximum possible difference between two progress text representations,
-- The maximum possible difference between two progress text representations,
-- including the timer & the status, is 9 characters. This is a temporary
-- solution to the problem of re-printing a smaller string on top of another
-- that'll leave some of the trailing characters in the original string
20 changes: 15 additions & 5 deletions src/Xrefcheck/Verify.hs
@@ -69,11 +69,13 @@ import URI.ByteString qualified as URIBS
import Control.Exception.Safe (handleAsync, handleJust)
import Data.Bits (toIntegralSized)
import Data.List (lookup)
import Data.Text (toCaseFold)
import Xrefcheck.Config
import Xrefcheck.Core
import Xrefcheck.Orphans ()
import Xrefcheck.Progress
import Xrefcheck.Scan
import Xrefcheck.Scanners.Markdown (MarkdownConfig (mcFlavor))
import Xrefcheck.System
import Xrefcheck.Util

@@ -596,12 +598,17 @@ verifyReference
checkDeduplicatedAnchorReference file fileAnchors anchor
checkAnchorExists fileAnchors anchor

anchorNameEq =
if caseInsensitiveAnchors . mcFlavor . scMarkdown $ cScanners
then (==) `on` toCaseFold
else (==)

-- Detect a case when original file contains two identical anchors, github
-- has added a suffix to the duplicate, and now the original is referenced -
-- such links are pretty fragile and we discourage their use despite
-- they are in fact unambiguous.
checkAnchorReferenceAmbiguity file fileAnchors anchor = do
let similarAnchors = filter ((== anchor) . aName) fileAnchors
let similarAnchors = filter (anchorNameEq anchor . aName) fileAnchors
when (length similarAnchors > 1) $
throwError $ AmbiguousAnchorRef file anchor (Exts.fromList similarAnchors)

@@ -612,13 +619,16 @@ verifyReference
checkAnchorReferenceAmbiguity file fileAnchors origAnchor

checkAnchorExists givenAnchors anchor =
case find ((== anchor) . aName) givenAnchors of
case find (anchorNameEq anchor . aName) givenAnchors of
Just _ -> pass
Nothing ->
let isSimilar = (>= scAnchorSimilarityThreshold cScanners)
similarAnchors =
filter (isSimilar . realToFrac . damerauLevenshteinNorm anchor . aName)
givenAnchors
distance = damerauLevenshteinNorm `on` toCaseFold
similarAnchors = flip filter givenAnchors
$ isSimilar
. realToFrac
. distance anchor
. aName
in throwError $ AnchorDoesNotExist anchor similarAnchors

-- | Parse URI according to RFC 3986 extended by allowing non-encoded
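
The `Verify.hs` change selects the anchor comparison based on the configured flavor. A minimal sketch of that selection is shown below, assuming the flavor lookup is replaced by a plain `Bool` argument (an assumption made so the example is self-contained; the real code reads the flag from the scanner config). `Data.Text.toCaseFold` and `Data.Function.on` are the same functions the diff uses:

```haskell
import Data.Function (on)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-alone version: the Bool stands in for
-- `caseInsensitiveAnchors . mcFlavor . scMarkdown $ cScanners`.
anchorNameEq :: Bool -> Text -> Text -> Bool
anchorNameEq caseInsensitive
  | caseInsensitive = (==) `on` T.toCaseFold -- compare case-folded names
  | otherwise = (==)                          -- compare names exactly

main :: IO ()
main = do
  let anchors = map T.pack ["some-header", "UPPERCASE-NAME"]
      ref = T.pack "uppercase-name"
  -- GitHub-like behaviour: the lowered reference matches.
  print (any (anchorNameEq True ref) anchors)  -- prints True
  -- GitLab-like behaviour: the lowered reference does not match.
  print (any (anchorNameEq False ref) anchors) -- prints False
```

This matches the golden tests in this commit: the `#uppercase-name` reference is reported as invalid only under the GitLab config.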
24 changes: 24 additions & 0 deletions tests/golden/check-case-sensitivity/a.md
@@ -0,0 +1,24 @@
<!--
- SPDX-FileCopyrightText: 2022 Serokell <https://serokell.io>
-
- SPDX-License-Identifier: MPL-2.0
-->
# Some header

Some text

# Another header

# <a name="Another-header">Custom header</>

# <a name="UPPERCASE-NAME">Custom header</>

[Mixing case reference](#SomE-HEADer)

[Mixing case reference](#SomE-HEADr)

[Reference as it is](#UPPERCASE-NAME)

[Reference lowered](#uppercase-name)

[Maybe ambiguous reference](#another-header)
23 changes: 23 additions & 0 deletions tests/golden/check-case-sensitivity/check-case-sensitivity.bats
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#!/usr/bin/env bats

# SPDX-FileCopyrightText: 2022 Serokell <https://serokell.io>
#
# SPDX-License-Identifier: MPL-2.0

load '../helpers/bats-support/load'
load '../helpers/bats-assert/load'
load '../helpers/bats-file/load'
load '../helpers'


@test "GitHub anchors: check, ambiguous and similar detection is case-insensitive" {
to_temp xrefcheck -c config-github.yaml

assert_diff expected1.gold
}

@test "GitLab anchors: check and ambiguous detection is case-sensitive, but similar detection is not" {
to_temp xrefcheck -c config-gitlab.yaml

assert_diff expected2.gold
}
7 changes: 7 additions & 0 deletions tests/golden/check-case-sensitivity/config-github.yaml
@@ -0,0 +1,7 @@
# SPDX-FileCopyrightText: 2022 Serokell <https://serokell.io>
#
# SPDX-License-Identifier: Unlicense

scanners:
markdown:
flavor: GitHub
7 changes: 7 additions & 0 deletions tests/golden/check-case-sensitivity/config-gitlab.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# SPDX-FileCopyrightText: 2022 Serokell <https://serokell.io>
#
# SPDX-License-Identifier: Unlicense

scanners:
markdown:
flavor: GitLab
30 changes: 30 additions & 0 deletions tests/golden/check-case-sensitivity/expected1.gold
@@ -0,0 +1,30 @@
=== Invalid references found ===

➥ In file a.md
bad reference (file-local) at src:18:1-36:
- text: "Mixing case reference"
- link: -
- anchor: SomE-HEADr

Anchor 'SomE-HEADr' is not present, did you mean:
- some-header (header I) at src:6:1-13
- another-header (header I) at src:10:1-16
- custom-header (header I) at src:12:1-43
- Another-header (hand made) at src:12:3-27
- custom-header (header I) at src:14:1-43

➥ In file a.md
bad reference (file-local) at src:24:1-44:
- text: "Maybe ambiguous reference"
- link: -
- anchor: another-header

Ambiguous reference to anchor 'another-header'
In file a.md
It could refer to either:
- another-header (header I) at src:10:1-16
- Another-header (hand made) at src:12:3-27
Use of ambiguous anchors is discouraged because the target
can change silently while the document containing it evolves.

Invalid references dumped, 2 in total.
38 changes: 38 additions & 0 deletions tests/golden/check-case-sensitivity/expected2.gold
@@ -0,0 +1,38 @@
=== Invalid references found ===

➥ In file a.md
bad reference (file-local) at src:16:1-37:
- text: "Mixing case reference"
- link: -
- anchor: SomE-HEADer

Anchor 'SomE-HEADer' is not present, did you mean:
- some-header (header I) at src:6:1-13
- another-header (header I) at src:10:1-16
- custom-header (header I) at src:12:1-43
- Another-header (hand made) at src:12:3-27
- custom-header (header I) at src:14:1-43

➥ In file a.md
bad reference (file-local) at src:18:1-36:
- text: "Mixing case reference"
- link: -
- anchor: SomE-HEADr

Anchor 'SomE-HEADr' is not present, did you mean:
- some-header (header I) at src:6:1-13
- another-header (header I) at src:10:1-16
- custom-header (header I) at src:12:1-43
- Another-header (hand made) at src:12:3-27
- custom-header (header I) at src:14:1-43

➥ In file a.md
bad reference (file-local) at src:22:1-36:
- text: "Reference lowered"
- link: -
- anchor: uppercase-name

Anchor 'uppercase-name' is not present, did you mean:
- UPPERCASE-NAME (hand made) at src:14:3-27

Invalid references dumped, 3 in total.
1 change: 1 addition & 0 deletions tests/golden/check-git/check-git.bats
@@ -12,6 +12,7 @@ load '../helpers'
@test "Git: not a repo" {
cd $TEST_TEMP_DIR

export LANG=en_US
run xrefcheck

assert_output --partial "fatal: not a git repository"