Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect gitignore #1500

Merged
merged 11 commits into from
Sep 22, 2024
Merged

Conversation

thomas-zahner
Copy link
Member

@thomas-zahner thomas-zahner commented Sep 15, 2024

Closes #1331

With this PR lychee skips files that are ignored (by git or .ignore) and hidden by default. This behaviour can be disabled with the --no-ignore and --hidden flags respectively. This is done by replacing jwalk with ignore.

Files are considered ignored as defined by standard_filters with the exception of hidden which is handled with the separate flag or config option.

@thomas-zahner
Copy link
Member Author

thomas-zahner commented Sep 15, 2024

@mre I've got three questions:

  1. The test_readme_usage_up_to_date test was annoying me a bit sometimes because all my editors trim empty lines when saving README.md. I've updated the test so that lines containing only whitespace are trimmed, so that empty lines are expected. Is this okay?
  2. For now --gitignore is required so that ignored files are skipped. Would it be okay if we switched it around and update the flag to --no-gitignore? This would be a more sensible default in my opinion but also a breaking change.
  3. Currently there is no way to include hidden files. (hardcoded previously as skip_hidden(true) now hidden(true)) Should I add an option (lychee-lib) and flag (lychee-bin) to include hidden files?

@mre
Copy link
Member

mre commented Sep 15, 2024

Very nice!

  1. The test_readme_usage_up_to_date test was annoying me a bit sometimes because all my editors trim empty lines when saving README.md. I've updated the test so that lines containing only whitespace are trimmed, so that empty lines are expected. Is this okay?

Yes, same. Sounds very reasonable to me. Thanks!

  1. For now --gitignore is required so that ignored files are skipped. Would it be okay if we switched it around and update the flag to --no-gitignore? This would be a more sensible default in my opinion but also a breaking change.

Absolutely. I'm all for it. Please go forward and change it if you like. Maybe we could use the --include prefix to align it with the other options. Maybe --include-gitignore or --include-hidden?

  1. Currently there is no way to include hidden files. (hardcoded previously as skip_hidden(true) now hidden(true)) Should I add an option (lychee-lib) and flag (lychee-bin) to include hidden files?

Yes, that sounds sensible to me. 👍

@thomas-zahner
Copy link
Member Author

thomas-zahner commented Sep 20, 2024

@mre Thanks for the answers. The PR is now ready for review.

Just one last thing; clippy complains about too many booleans in the Collector struct. Should I extract the three skip booleans into a separate struct? How would you name that struct?

pub struct Collector {
    basic_auth_extractor: Option<BasicAuthExtractor>,
    skip_missing_inputs: bool,
    skip_ignored: bool,
    skip_hidden: bool,
    include_verbatim: bool,
    use_html5ever: bool,
    base: Option<Base>,
}

@mre
Copy link
Member

mre commented Sep 20, 2024

Awesome. I don't like that lint too much. Maybe we can ignore it for now.

@mre
Copy link
Member

mre commented Sep 20, 2024

Coincidentally, it's the most ignored lint: rust-lang/rust-clippy#5418
(Except for cognitive_complexity, which got removed.)

@thomas-zahner
Copy link
Member Author

Yeah I also felt like this is very picky and currently there are 292 results which is quite a lot in comparison to other lints. Began extracting the three bools but it didn't feel like a benefit, only like more code. So I also prefer disabling the lint here.

@mre
Copy link
Member

mre commented Sep 20, 2024

I don't know why the publish-check is failing. It seems like that's unrelated and not a blocker.

For consistency, I'd propose to rename the options:

  • --no-ignore--include-ignored or --include-gitignore
  • --hidden--include-hidden

Here's the other options so far:

❯❯❯ lychee --help | rg 'exclude|include'                                                                                                                                                         
      --include <INCLUDE>
          URLs to check (supports regex). Has preference over all excludes
      --exclude <EXCLUDE>
      --exclude-file <EXCLUDE_FILE>
          Deprecated; use `--exclude-path` instead
      --exclude-path <EXCLUDE_PATH>
  -E, --exclude-all-private
          Equivalent to `--exclude-private --exclude-link-local --exclude-loopback`
      --exclude-private
      --exclude-link-local
      --exclude-loopback
      --exclude-mail
          Exclude all mail addresses from checking (deprecated; excluded by default)
      --include-mail
      --include-fragments
      --include-verbatim

I'm assuming you probably modeled the existing flags after ripgrep, which is a fine choice. Looking through the list of existing options above, I realize that they all have to do with links, not inputs (with the exception of --exclude-path). So I have no strong opinion on this. We could merge as-is, but I'd like to document our thought process here, so happy for any feedback.

@thomas-zahner
Copy link
Member Author

@mre Yes exactly, I used the same names as ripgrep does as I find them very good.

--include-gitignore too me sounds like I'd include the .gitignore files themself for checking. But that's not the case. As you have mentioned all flags with include or exclude have to do with individual links not files (with one exception). So I prefer the shorter and clearer flags that align with ripgrep.

@thomas-zahner thomas-zahner merged commit 99148af into lycheeverse:master Sep 22, 2024
6 of 7 checks passed
@mre
Copy link
Member

mre commented Sep 22, 2024

Ah, that makes sense. Thanks for the clarification and the awesome PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read .gitignore
2 participants