chore(release): 2.0.5 [skip ci]
## [2.0.5](v2.0.4...v2.0.5) (2021-11-15)

### Bug Fixes

* export utilities and add managed jsdom example ([132038b](132038b))
semantic-release-bot committed Nov 15, 2021
1 parent 1100792 commit 8e3eca6
Showing 13 changed files with 129 additions and 38 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
## [2.0.5](https://github.com/armand1m/papercut/compare/v2.0.4...v2.0.5) (2021-11-15)


### Bug Fixes

* export utilities and add managed jsdom example ([132038b](https://github.com/armand1m/papercut/commit/132038bd46bf6386b168967925f0cadf8a906241))

## [2.0.4](https://github.com/armand1m/papercut/compare/v2.0.3...v2.0.4) (2021-11-15)


75 changes: 74 additions & 1 deletion README.md
@@ -97,7 +97,10 @@ const main = async () => {
   baseUrl: "https://news.ycombinator.com/",
   target: ".athing",
   selectors: {
-    rank: ({ text }) => text('.rank'),
+    rank: (utils) => {
+      const value = utils.text('.rank').replace(/^\D+/g, '');
+      return Number(value);
+    },
     name: ({ text }) => text('.titlelink'),
     url: ({ href }) => href('.titlelink'),
     score: ({ element }) => {
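
The updated `rank` selector in the hunk above returns a number instead of the raw cell text. As a rough standalone sketch of that conversion, assuming the rank cell holds text such as `1.` (the `parseRank` helper name is hypothetical and not part of papercut):

```typescript
// Hypothetical helper (not part of papercut) mirroring the body of the
// updated `rank` selector: strip leading non-digits, then convert.
function parseRank(raw: string): number {
  const value = raw.replace(/^\D+/g, '');
  return Number(value);
}

console.log(parseRank('1.'));   // 1
console.log(parseRank('#12.')); // 12
console.log(parseRank(''));     // 0
```

Note that `Number('')` evaluates to `0`, so with this approach a missing rank would come back as `0` rather than `NaN`; callers that need to tell the two apart may want an explicit empty-string check.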
@@ -229,6 +232,76 @@ Then run it using `node` or `ts-node`:
npx ts-node ./paginated-scraper.ts
```

#### Managed JSDOM

If you want to use your own JSDOM and Pino instances and configure them however you prefer, use the `scrape` function instead.

The example below uses the exposed `createWindow` and `fetchPage` utilities for convenience. You can also use the JSDOM constructor directly, along with any other strategy for fetching the page HTML.

```ts file=./examples/typescript/src/managed-jsdom/scraper.ts
import pino from 'pino'
import { scrape, fetchPage, createWindow } from '@armand1m/papercut';

const main = async () => {
  const logger = pino({
    name: 'Hacker News',
    enabled: false
  });

  const rawHTML = await fetchPage('https://news.ycombinator.com/')
  const window = createWindow(rawHTML);

  const results = await scrape({
    strict: true,
    logger,
    document: window.document,
    target: ".athing",
    selectors: {
      rank: (utils) => {
        const value = utils.text('.rank').replace(/^\D+/g, '');
        return Number(value);
      },
      name: ({ text }) => text('.titlelink'),
      url: ({ href }) => href('.titlelink'),
      score: ({ element }) => {
        return element.nextElementSibling?.querySelector('.score')
          ?.textContent;
      },
      createdBy: ({ element }) => {
        return element.nextElementSibling?.querySelector('.hnuser')
          ?.textContent;
      },
      createdAt: ({ element }) => {
        return element.nextElementSibling
          ?.querySelector('.age')
          ?.getAttribute('title');
      },
    },
    options: {
      log: false,
      cache: true,
      concurrency: {
        page: 2,
        node: 2,
        selector: 2
      }
    }
  });

  window.close();

  console.log(JSON.stringify(results, null, 2));
};

main();
```

Then run it using `node` or `ts-node`:

```sh
npx ts-node ./managed-jsdom.ts
```
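
The README text above mentions that the JSDOM constructor can be used directly instead of `createWindow`. A minimal sketch of what that might look like, assuming the `jsdom` package is installed; the inline HTML is a simplified stand-in for a fetched page (an assumption, not the real Hacker News markup), and the lookup mirrors the `score` selector from the example:

```typescript
import { JSDOM } from 'jsdom';

// Inline HTML standing in for a fetched page. The structure (an `.athing`
// row followed by a sibling row holding `.score`) is an assumption based
// on the selectors used in the example above.
const rawHTML = `
  <table>
    <tr class="athing"><td><span class="rank">1.</span></td></tr>
    <tr><td><span class="score">42 points</span></td></tr>
  </table>
`;

// Build the DOM directly; `dom.window.document` is the kind of value the
// example above passes as `document`.
const dom = new JSDOM(rawHTML);
const doc = dom.window.document;

// Same sibling lookup the `score` selector performs.
const row = doc.querySelector('.athing');
const score = row?.nextElementSibling?.querySelector('.score')?.textContent;

// Release the window's resources once done.
dom.window.close();

console.log(score);
```

The optional chaining keeps the lookup from throwing when a row has no following sibling or no `.score` element; in that case `score` is `undefined`.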

## API Reference

[Click here to open the API reference.](https://armand1m.github.io/papercut)
20 changes: 10 additions & 10 deletions docs/assets/highlight.css
@@ -13,16 +13,16 @@
 --dark-hl-5: #569CD6;
 --light-hl-6: #0070C1;
 --dark-hl-6: #4FC1FF;
---light-hl-7: #267F99;
---dark-hl-7: #4EC9B0;
---light-hl-8: #098658;
---dark-hl-8: #B5CEA8;
---light-hl-9: #811F3F;
---dark-hl-9: #D16969;
---light-hl-10: #000000;
---dark-hl-10: #D7BA7D;
---light-hl-11: #EE0000;
---dark-hl-11: #DCDCAA;
+--light-hl-7: #811F3F;
+--dark-hl-7: #D16969;
+--light-hl-8: #EE0000;
+--dark-hl-8: #DCDCAA;
+--light-hl-9: #000000;
+--dark-hl-9: #D7BA7D;
+--light-hl-10: #267F99;
+--dark-hl-10: #4EC9B0;
+--light-hl-11: #098658;
+--dark-hl-11: #B5CEA8;
 --light-hl-12: #008000;
 --dark-hl-12: #6A9955;
 --light-code-background: #FFFFFF;
2 changes: 1 addition & 1 deletion docs/assets/search.js

Large diffs are not rendered by default.

19 changes: 15 additions & 4 deletions docs/index.html

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/interfaces/CreateRunnerProps.html
@@ -1,6 +1,6 @@
<!DOCTYPE html><html class="default"><head><meta charSet="utf-8"/><meta http-equiv="x-ua-compatible" content="IE=edge"/><title>CreateRunnerProps | @armand1m/papercut</title><meta name="description" content="Documentation for @armand1m/papercut"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="stylesheet" href="../assets/style.css"/><link rel="stylesheet" href="../assets/highlight.css"/><script async src="../assets/search.js" id="search-script"></script></head><body><script>document.body.classList.add(localStorage.getItem("tsd-theme") || "os")</script><header><div class="tsd-page-toolbar"><div class="container"><div class="table-wrap"><div class="table-cell" id="tsd-search" data-base=".."><div class="field"><label for="tsd-search-field" class="tsd-widget search no-caption">Search</label><input type="text" id="tsd-search-field"/></div><ul class="results"><li class="state loading">Preparing search index...</li><li class="state failure">The search index is not available</li></ul><a href="../index.html" class="title">@armand1m/papercut</a></div><div class="table-cell" id="tsd-widgets"><div id="tsd-filter"><a href="#" class="tsd-widget options no-caption" data-toggle="options">Options</a><div class="tsd-filter-group"><div class="tsd-select" id="tsd-filter-visibility"><span class="tsd-select-label">All</span><ul class="tsd-select-list"><li data-value="public">Public</li><li data-value="protected">Public/Protected</li><li data-value="private" class="selected">All</li></ul></div> <input type="checkbox" id="tsd-filter-inherited" checked/><label class="tsd-widget" for="tsd-filter-inherited">Inherited</label><input type="checkbox" id="tsd-filter-externals" checked/><label class="tsd-widget" for="tsd-filter-externals">Externals</label></div></div><a href="#" class="tsd-widget menu no-caption" data-toggle="menu">Menu</a></div></div></div></div><div class="tsd-page-title"><div class="container"><ul class="tsd-breadcrumb"><li><a 
href="../modules.html">@armand1m/papercut</a></li><li><a href="CreateRunnerProps.html">CreateRunnerProps</a></li></ul><h1>Interface CreateRunnerProps</h1></div></div></header><div class="container container-main"><div class="row"><div class="col-8 col-content"><section class="tsd-panel tsd-hierarchy"><h3>Hierarchy</h3><ul class="tsd-hierarchy"><li><span class="target">CreateRunnerProps</span></li></ul></section><section class="tsd-panel-group tsd-index-group"><h2>Index</h2><section class="tsd-panel tsd-index-panel"><div class="tsd-index-content"><section class="tsd-index-section "><h3>Properties</h3><ul class="tsd-index-list"><li class="tsd-kind-property tsd-parent-kind-interface"><a href="CreateRunnerProps.html#logger" class="tsd-kind-icon">logger</a></li><li class="tsd-kind-property tsd-parent-kind-interface"><a href="CreateRunnerProps.html#options" class="tsd-kind-icon">options</a></li></ul></section></div></section></section><section class="tsd-panel-group tsd-member-group "><h2>Properties</h2><section class="tsd-panel tsd-member tsd-kind-property tsd-parent-kind-interface"><a id="logger" class="tsd-anchor"></a><h3>logger</h3><div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Logger</span></div><aside class="tsd-sources"><ul><li>Defined in <a href="https://github.com/armand1m/papercut/blob/32146cd/src/scraper/createRunner.ts#L68">scraper/createRunner.ts:68</a></li></ul></aside><div class="tsd-comment tsd-typography"><div class="lead">
<!DOCTYPE html><html class="default"><head><meta charSet="utf-8"/><meta http-equiv="x-ua-compatible" content="IE=edge"/><title>CreateRunnerProps | @armand1m/papercut</title><meta name="description" content="Documentation for @armand1m/papercut"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="stylesheet" href="../assets/style.css"/><link rel="stylesheet" href="../assets/highlight.css"/><script async src="../assets/search.js" id="search-script"></script></head><body><script>document.body.classList.add(localStorage.getItem("tsd-theme") || "os")</script><header><div class="tsd-page-toolbar"><div class="container"><div class="table-wrap"><div class="table-cell" id="tsd-search" data-base=".."><div class="field"><label for="tsd-search-field" class="tsd-widget search no-caption">Search</label><input type="text" id="tsd-search-field"/></div><ul class="results"><li class="state loading">Preparing search index...</li><li class="state failure">The search index is not available</li></ul><a href="../index.html" class="title">@armand1m/papercut</a></div><div class="table-cell" id="tsd-widgets"><div id="tsd-filter"><a href="#" class="tsd-widget options no-caption" data-toggle="options">Options</a><div class="tsd-filter-group"><div class="tsd-select" id="tsd-filter-visibility"><span class="tsd-select-label">All</span><ul class="tsd-select-list"><li data-value="public">Public</li><li data-value="protected">Public/Protected</li><li data-value="private" class="selected">All</li></ul></div> <input type="checkbox" id="tsd-filter-inherited" checked/><label class="tsd-widget" for="tsd-filter-inherited">Inherited</label><input type="checkbox" id="tsd-filter-externals" checked/><label class="tsd-widget" for="tsd-filter-externals">Externals</label></div></div><a href="#" class="tsd-widget menu no-caption" data-toggle="menu">Menu</a></div></div></div></div><div class="tsd-page-title"><div class="container"><ul class="tsd-breadcrumb"><li><a 
href="../modules.html">@armand1m/papercut</a></li><li><a href="CreateRunnerProps.html">CreateRunnerProps</a></li></ul><h1>Interface CreateRunnerProps</h1></div></div></header><div class="container container-main"><div class="row"><div class="col-8 col-content"><section class="tsd-panel tsd-hierarchy"><h3>Hierarchy</h3><ul class="tsd-hierarchy"><li><span class="target">CreateRunnerProps</span></li></ul></section><section class="tsd-panel-group tsd-index-group"><h2>Index</h2><section class="tsd-panel tsd-index-panel"><div class="tsd-index-content"><section class="tsd-index-section "><h3>Properties</h3><ul class="tsd-index-list"><li class="tsd-kind-property tsd-parent-kind-interface"><a href="CreateRunnerProps.html#logger" class="tsd-kind-icon">logger</a></li><li class="tsd-kind-property tsd-parent-kind-interface"><a href="CreateRunnerProps.html#options" class="tsd-kind-icon">options</a></li></ul></section></div></section></section><section class="tsd-panel-group tsd-member-group "><h2>Properties</h2><section class="tsd-panel tsd-member tsd-kind-property tsd-parent-kind-interface"><a id="logger" class="tsd-anchor"></a><h3>logger</h3><div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Logger</span></div><aside class="tsd-sources"><ul><li>Defined in <a href="https://github.com/armand1m/papercut/blob/1100792/src/scraper/createRunner.ts#L68">scraper/createRunner.ts:68</a></li></ul></aside><div class="tsd-comment tsd-typography"><div class="lead">
<p>A pino.Logger instance.</p>
</div></div></section><section class="tsd-panel tsd-member tsd-kind-property tsd-parent-kind-interface"><a id="options" class="tsd-anchor"></a><h3>options</h3><div class="tsd-signature tsd-kind-icon">options<span class="tsd-signature-symbol">:</span> <a href="ScraperOptions.html" class="tsd-signature-type" data-tsd-kind="Interface">ScraperOptions</a></div><aside class="tsd-sources"><ul><li>Defined in <a href="https://github.com/armand1m/papercut/blob/32146cd/src/scraper/createRunner.ts#L73">scraper/createRunner.ts:73</a></li></ul></aside><div class="tsd-comment tsd-typography"><div class="lead">
</div></div></section><section class="tsd-panel tsd-member tsd-kind-property tsd-parent-kind-interface"><a id="options" class="tsd-anchor"></a><h3>options</h3><div class="tsd-signature tsd-kind-icon">options<span class="tsd-signature-symbol">:</span> <a href="ScraperOptions.html" class="tsd-signature-type" data-tsd-kind="Interface">ScraperOptions</a></div><aside class="tsd-sources"><ul><li>Defined in <a href="https://github.com/armand1m/papercut/blob/1100792/src/scraper/createRunner.ts#L73">scraper/createRunner.ts:73</a></li></ul></aside><div class="tsd-comment tsd-typography"><div class="lead">
<p>The scraper options.
Use this to tweak log, cache and concurrency settings.</p>
</div></div></section></section></div><div class="col-4 col-menu menu-sticky-wrap menu-highlight"><nav class="tsd-navigation primary"><ul><li class=""><a href="../modules.html">Exports</a></li></ul></nav><nav class="tsd-navigation secondary menu-sticky"><ul><li class="current tsd-kind-interface"><a href="CreateRunnerProps.html" class="tsd-kind-icon">Create<wbr/>Runner<wbr/>Props</a><ul><li class="tsd-kind-property tsd-parent-kind-interface"><a href="CreateRunnerProps.html#logger" class="tsd-kind-icon">logger</a></li><li class="tsd-kind-property tsd-parent-kind-interface"><a href="CreateRunnerProps.html#options" class="tsd-kind-icon">options</a></li></ul></li></ul></nav></div></div></div><footer class="with-border-bottom"><div class="container"><h2>Legend</h2><div class="tsd-legend-group"><ul class="tsd-legend"><li class="tsd-kind-property tsd-parent-kind-interface"><span class="tsd-kind-icon">Property</span></li></ul></div><h2>Settings</h2><p>Theme <select id="theme"><option value="os">OS</option><option value="light">Light</option><option value="dark">Dark</option></select></p></div></footer><div class="container tsd-generator"><p>Generated using <a href="https://typedoc.org/" target="_blank">TypeDoc</a></p></div><div class="overlay"></div><script src="../assets/main.js"></script></body></html>