Node.js framework that scrapes and crawls webpages.
Using npm:

```sh
npm install yukikaki
```
You can import Yukikaki using require:

```js
const Yukikaki = require("yukikaki");

const yukikaki = new Yukikaki();
```
Or with import:

```js
import Yukikaki from "yukikaki";

const yukikaki = new Yukikaki();
```
Bool
Optional. Default is true. If false, starts crawling in headful mode.
.scrape(options)

Scrapes data from webpages according to options, and runs options.func on every webpage it crawls. You can use the properties below, or add your own properties for options.func to use; a full example follows the option list.
options.url
String
The URL to start crawling from.
options.func
Function
Runs on every webpage .scrape() crawls. .scrape() passes the following values to options.func:
options
The options object for the current page. You can change its properties inside options.func, except for options.func and options.url.
Note: options.i will be decremented based on how many links or sources away the current page is from the starting page.
res
<HTTPResponse>
Puppeteer response from the current page.
page
<Page>
The Puppeteer Page object for the current page.
options.i
Int
Optional. Default is 1. Determines when to stop crawling trees of links and sources. If options.i > 1, options.hrefs will automatically be set to true.
Bool
Optional. Default is true. Scrape sources of the current page.
options.hrefs
Bool
Optional. If true, scrape links, links of links, and so on, stemming from the current page, stopping once options.i is depleted. Will automatically be set to true if options.i > 1.
Bool
Optional. If true, only scrape pages that robots.txt allows.
String
Optional. The user agent to use when checking robots.txt.
Bool
Optional. Default is true. Crawl pages that robots.txt neither explicitly allows nor disallows.
Bool
Optional. Default is true. Crawl a page's links and sources even if the page itself is disallowed by robots.txt.
Copyright (c) Moogamouth 2022