A website crawler library for Node.js.
A concise summary of the library's documentation.
const craw = require('craw');

async function start () {
    // Crawl the page and print the collected data as plain JSON.
    const result = await craw("https://2lstudios.dev/");
    console.log(result.toJSON());
}

start();
Get the content of the website: its headings, paragraphs and all of the text in general.
Output:
{
    text: "....", // String
    h1: [], // Array
    h2: [], // Array
    h3: [], // Array
    h4: [], // Array
    h5: [], // Array
    h6: [], // Array
    words: [] // Array
}
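As a rough illustration of working with an object of this shape, the sketch below counts word occurrences and headings per level. The sample data, the helper names `wordFrequency` and `headingCounts`, and the assumption that `words` is an array of strings are all hypothetical; none of them are part of the library.

```javascript
// Hypothetical sample shaped like the output above; not real crawl data.
const content = {
  text: "Fast crawler. Simple crawler.",
  h1: ["Fast crawler"],
  h2: ["Install", "Usage"],
  h3: [],
  h4: [],
  h5: [],
  h6: [],
  words: ["Fast", "crawler", "Simple", "crawler"] // assumed: array of strings
};

// Count how often each word occurs, ignoring case.
// A Map avoids clashes with built-in object keys such as "constructor".
function wordFrequency(words) {
  const counts = new Map();
  for (const word of words) {
    const key = word.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Count the headings found at each level, h1 to h6.
function headingCounts(content) {
  const counts = {};
  for (const level of ["h1", "h2", "h3", "h4", "h5", "h6"]) {
    counts[level] = (content[level] || []).length;
  }
  return counts;
}

console.log(wordFrequency(content.words).get("crawler")); // 2
console.log(headingCounts(content));
```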
Get a list of iframes from the website.
Output:
[...] // Array
Get a list of imports from the website (such as CSS, favicon and JS).
Output:
{
    scripts: [ // Array
        {
            integrity: "...", // String
            src: "...", // String
            async: ... // Boolean
        }
    ],
    styles: [ // Array
        {
            integrity: "...", // String
            href: "...", // String
            rel: "..." // String
        }
    ],
    favicon: {
        type: "...", // String
        href: "..." // String
    }
}
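One possible use of this structure is an audit for scripts loaded without an `integrity` value. The following is a minimal sketch under stated assumptions: the sample data, its placeholder hash and the helper name `scriptsWithoutIntegrity` are hypothetical, and it assumes a missing attribute shows up as an empty string, `null` or `undefined`, which the documentation above does not specify.

```javascript
// Hypothetical sample shaped like the output above; URLs and hash are placeholders.
const imports = {
  scripts: [
    { integrity: "sha384-placeholder", src: "https://cdn.example.com/a.js", async: true },
    { integrity: "", src: "https://cdn.example.com/b.js", async: false }
  ],
  styles: [
    { integrity: "", href: "https://cdn.example.com/main.css", rel: "stylesheet" }
  ],
  favicon: { type: "image/png", href: "/favicon.png" }
};

// Return the src of every script whose integrity value is missing or empty.
// Assumption: an absent attribute is reported as "", null or undefined.
function scriptsWithoutIntegrity(imports) {
  return (imports.scripts || [])
    .filter((script) => !script.integrity)
    .map((script) => script.src);
}

console.log(scriptsWithoutIntegrity(imports));
```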
Get a list of hyperlinks from the website.
Output:
[ // Array
    {
        url: "...", // String
        anchor: "...", // String
        rel: [ ... ] // Array of Strings
    }
]
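A list of this shape can be filtered with plain array methods. The sketch below separates external links from internal ones using Node's built-in `URL` class and picks out links marked `nofollow`. The sample links and the helper names `externalLinks` and `nofollowLinks` are hypothetical, and the code assumes `url` may be either absolute or relative to the crawled page.

```javascript
// Hypothetical sample shaped like the output above; not real crawl data.
const links = [
  { url: "/about", anchor: "About", rel: [] },
  { url: "https://example.com/partner", anchor: "Partner", rel: ["nofollow", "noopener"] },
  { url: "https://2lstudios.dev/projects", anchor: "Projects", rel: [] }
];

// Keep only links that point to a different host than the crawled page.
// Relative URLs are resolved against baseUrl; unparsable ones are skipped.
function externalLinks(links, baseUrl) {
  const baseHost = new URL(baseUrl).hostname;
  return links.filter((link) => {
    try {
      const host = new URL(link.url, baseUrl).hostname;
      return host !== "" && host !== baseHost;
    } catch (error) {
      return false;
    }
  });
}

// Keep only links whose rel list contains "nofollow".
function nofollowLinks(links) {
  return links.filter((link) => (link.rel || []).includes("nofollow"));
}

console.log(externalLinks(links, "https://2lstudios.dev/").map((link) => link.url));
console.log(nofollowLinks(links).map((link) => link.anchor));
```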
Get a list of multimedia elements from the website (such as images, audio and video).
Output:
{
    audios: [ // Array
        {
            src: "...", // String
            type: "..." // String
        }
    ],
    images: [ // Array
        {
            src: "...", // String
            alt: "...", // String
            loading: "..." // String
        }
    ],
    videos: [ ... ] // Array of strings
}
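The `images` entries lend themselves to a simple accessibility check. This sketch lists images whose `alt` text is missing or blank; the sample data and the helper name `imagesMissingAlt` are hypothetical, and it assumes an absent `alt` attribute is reported as an empty string, `null` or `undefined`.

```javascript
// Hypothetical sample shaped like the output above; not real crawl data.
const media = {
  audios: [{ src: "/intro.mp3", type: "audio/mpeg" }],
  images: [
    { src: "/logo.png", alt: "Company logo", loading: "eager" },
    { src: "/banner.jpg", alt: "", loading: "lazy" },
    { src: "/team.jpg", alt: "   ", loading: "lazy" }
  ],
  videos: ["/demo.mp4"]
};

// Return the src of every image whose alt text is missing or only whitespace.
function imagesMissingAlt(media) {
  return (media.images || [])
    .filter((image) => !image.alt || image.alt.trim() === "")
    .map((image) => image.src);
}

console.log(imagesMissingAlt(media));
```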
Get a list of metadata tags from the website.
Output:
{
    author: "...", // String
    viewport: "...", // String
    robots: "...", // String
    description: "...", // String
    keywords: [], // Array of strings
    image: "...", // String (Favicon)
    charset: "...", // String
    // ... any other metadata tag, such as Open Graph (OG) or Twitter tags ...
}
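A metadata object like this can be checked against a list of tags a page is expected to define. The sketch below reports which expected tags are absent or empty. The sample values, the helper name `missingMeta` and the choice of expected tags are illustrative assumptions only.

```javascript
// Hypothetical sample shaped like the output above; not real crawl data.
const meta = {
  author: "",
  viewport: "width=device-width, initial-scale=1",
  description: "A sample page",
  keywords: [],
  charset: "utf-8"
};

// Return the names of expected tags that are absent, blank or an empty array.
function missingMeta(meta, expected) {
  return expected.filter((name) => {
    const value = meta[name];
    if (Array.isArray(value)) {
      return value.length === 0;
    }
    return value === undefined || value === null || String(value).trim() === "";
  });
}

console.log(missingMeta(meta, ["description", "keywords", "robots", "viewport"]));
```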
Get the title of the website.
Output:
"..." // String
Run all of the functions and combine the results of each one into a single object.