Merge github.com:seymores/boilerpipe-clj into merge/get-image-by-seymores

ref: cgag#9

* github.com:seymores/boilerpipe-clj:
  Add missing doc
  Added get-images

Signed-off-by: Avelino <[email protected]>
avelino committed Feb 10, 2021
2 parents 06cb797 + e5406eb commit a40a228
Showing 3 changed files with 12 additions and 3 deletions.
2 changes: 1 addition & 1 deletion project.clj
@@ -1,4 +1,4 @@
-(defproject run.avelino/boilerpipe-clj "0.3.1"
+(defproject run.avelino/boilerpipe-clj "0.3.2"
   :description "A simple wrapper around the Boilerpipe library for extracting text from html articles/pages"
   :url "https://avelino.run"
   :license {:name "Apache License, Version 2.0"
9 changes: 8 additions & 1 deletion src/boilerpipe_clj/core.clj
@@ -1,5 +1,6 @@
 (ns boilerpipe-clj.core
-  (:require [boilerpipe-clj.extractors :as ext])
+  (:require [boilerpipe-clj.extractors :as ext]
+            [clojure.java.io :refer [as-url]])
   (:import (de.l3s.boilerpipe.extractors ExtractorBase)))

 (defn get-text
@@ -11,3 +12,9 @@
    (get-text source ext/article-extractor))
   ([^String source ^ExtractorBase extractor]
    (.getText extractor source)))
+
+(defn get-images
+  "Takes the URL of the page and return list of Image"
+  [^String url]
+  (.process ext/image-extractor (as-url url) ext/default-extractor))

4 changes: 3 additions & 1 deletion src/boilerpipe_clj/extractors.clj
@@ -2,9 +2,11 @@
   (:import (de.l3s.boilerpipe.extractors
             ArticleExtractor
             ArticleSentencesExtractor
-            DefaultExtractor)))
+            DefaultExtractor)
+           (de.l3s.boilerpipe.sax ImageExtractor)))

 (defonce article-extractor (ArticleExtractor/getInstance))
 (defonce default-extractor (DefaultExtractor/getInstance))
+(defonce image-extractor (ImageExtractor/INSTANCE))
 (defonce article-sentence-extractor
   (ArticleSentencesExtractor/getInstance))
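
Usage note: the new `get-images` fetches the page at the given URL and returns a Java list of Boilerpipe `Image` objects. Below is a minimal sketch of how a caller might turn that list into plain Clojure data. The namespace `example.images`, the helpers `image->map` and `page-images`, and the example URL are hypothetical and not part of this commit. The sketch assumes `run.avelino/boilerpipe-clj "0.3.2"` is on the classpath and that `de.l3s.boilerpipe.document.Image` exposes the `getSrc`, `getWidth`, `getHeight` and `getAlt` getters, as in upstream Boilerpipe.

```clojure
(ns example.images
  ;; Hypothetical namespace; assumes boilerpipe-clj 0.3.2 is a dependency.
  (:require [boilerpipe-clj.core :as bp])
  (:import (de.l3s.boilerpipe.document Image)))

(defn image->map
  "Convert a Boilerpipe Image into a plain Clojure map (hypothetical helper)."
  [^Image img]
  {:src    (.getSrc img)
   :width  (.getWidth img)   ; attribute value as found in the HTML, may be nil
   :height (.getHeight img)  ; attribute value as found in the HTML, may be nil
   :alt    (.getAlt img)})

(defn page-images
  "Fetch the page at url and return its content images as a vector of maps.
  Performs a network request through bp/get-images."
  [url]
  (mapv image->map (bp/get-images url)))

(comment
  ;; Placeholder URL; substitute a real article page.
  (page-images "https://example.com/some-article"))
```

Because `get-images` goes through `as-url` and downloads the page itself, it takes a URL string rather than an HTML string, unlike `get-text`.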
