From v2.0.1 anchors to .pdfs are not returned in page_links #231
-
Hi, I love the library, thanks.

If I take this code sample and run it against the page below, anchors to the .pdf files are not returned as of v2.0.1. I have tried with subscriptions, CSS selectors, etc. enabled. How do I do this post v2? Thanks

use spider::{tokio, CaseInsensitiveString};
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Crawl the local test site served on port 8900.
    let target = "http://localhost:8900/";
    let mut website: Website = Website::new(target)
        .with_full_resources(true)
        .build()
        .unwrap();
    website.crawl().await;
    // Print every link the crawl collected.
    for link in website.get_links().iter() {
        println!("{}", link)
    }
}

The page it runs against:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Index Page</title>
</head>
<body>
    <h1>Welcome to the Nginx Dockerized Site</h1>
    <ul>
        <li><a href="/page1.html">Page 1</a></li>
        <li><a href="/page2.html">Page 2</a></li>
        <li><a href="/page3.html">Page 3</a></li>
        <li><a href="/page4.html">Page 4</a></li>
        <li><a href="/page5.html">Page 5</a></li>
        <li><a href="/pdfs/sample3.pdf">PDF</a></li>
        <li><a href="/pdfs/sample3.xls">xls</a></li>
        <li><a href="/pdfs/sample3.htm">htm</a></li>
        <li><a href="/pdfs/sample3.pdf">htmpdf</a></li>
    </ul>
</body>
</html>
Output (v1.99.37 vs. v2.0.1)
Replies: 2 comments 2 replies
-
Hello, is there a reason you are using v2.13.20? Can you try upgrading to the latest version to see if the issue persists?
-
Fixed via v2.13.21.
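For anyone re-checking this after upgrading, it may help to have the list of hrefs in the sample page to compare against what `get_links()` prints. The following is only a rough, stdlib-only sketch: `extract_hrefs` and `absolute` are hypothetical helpers written for this thread, not part of spider, and the naive string scan is not how the crate parses HTML.

```rust
use std::collections::BTreeSet;

/// Collect the distinct values of every `href="..."` attribute in `html`.
/// Naive string scan for illustration only; this is not an HTML parser.
fn extract_hrefs(html: &str) -> BTreeSet<String> {
    let needle = "href=\"";
    let mut found = BTreeSet::new();
    let mut rest = html;
    while let Some(start) = rest.find(needle) {
        let after = &rest[start + needle.len()..];
        match after.find('"') {
            Some(end) => {
                found.insert(after[..end].to_string());
                rest = &after[end + 1..];
            }
            // Unterminated attribute: stop scanning.
            None => break,
        }
    }
    found
}

/// Join a root-relative href onto the crawl target (hypothetical helper).
fn absolute(base: &str, href: &str) -> String {
    format!("{}{}", base.trim_end_matches('/'), href)
}

fn main() {
    // The list items from the sample index page in the original post.
    let html = r#"
        <li><a href="/page1.html">Page 1</a></li>
        <li><a href="/page2.html">Page 2</a></li>
        <li><a href="/page3.html">Page 3</a></li>
        <li><a href="/page4.html">Page 4</a></li>
        <li><a href="/page5.html">Page 5</a></li>
        <li><a href="/pdfs/sample3.pdf">PDF</a></li>
        <li><a href="/pdfs/sample3.xls">xls</a></li>
        <li><a href="/pdfs/sample3.htm">htm</a></li>
        <li><a href="/pdfs/sample3.pdf">htmpdf</a></li>
    "#;
    let base = "http://localhost:8900/";
    for href in extract_hrefs(html) {
        println!("{}", absolute(base, &href));
    }
}
```

The sample page has nine anchors but only eight distinct hrefs, since `/pdfs/sample3.pdf` is linked twice. If the crawler output is missing the `.pdf` or `.xls` entries from this list, the regression is likely still present in the version being used.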