Commit
Merge pull request #607 from dipu-bd/dev
Version 2.23.2
dipu-bd authored Sep 20, 2020
2 parents 3999332 + 3261a42 commit 5ce52ee
Showing 11 changed files with 910 additions and 22 deletions.
53 changes: 32 additions & 21 deletions README.md
@@ -19,26 +19,28 @@ An app to download novels from online sources and generate e-books.
## Table of contents

- [Installation](#a-installation)
- [⏬ Standalone Bundle (Windows, Linux)](#a1-standalone-bundle-windows-linux)
- [📦 PIP (Windows, Mac, and Linux)](#a2-pip-windows-mac-and-linux)
- [📱 Termux (Android)](#a3-termux-android)
- [Chatbots](#a4-chatbots)
- [Discord](#a41-discord)
- [Telegram](#a42-telegram)
- [Run from source](#a5-run-from-source)
- [Heroku Deployment](#a6-heroku-deployment)
- [General Usage](#b-general-usage)
- [Available options](#b1-available-options)
- [Example usage](#b2-example-usage)
- [Running the bot](#b3-running-the-bot)
- [Development](#c-development)
- [Adding new source](#c1-adding-new-source)
- [Adding new Bot](#c2-adding-new-bot)
- [Supported sources](#c3-supported-sources)
- [Rejected sources](#c4-rejected-sources)
- [Supported output formats](#c5-supported-output-formats)
- [Supported bots](#c6-supported-bots)
- [Lightnovel Crawler ![pip package](https://pypi.org/project/lightnovel-crawler) [![download win](https://img.shields.io/badge/%E2%A7%AA-lncrawl.exe-red)](https://rebrand.ly/lncrawl) [![download linux](<https://img.shields.io/badge/%E2%A7%AD-lncrawl%20(linux)-brown>)](https://rebrand.ly/lncrawl-linux)](#lightnovel-crawler-img-srchttpsimgshieldsiobadgef09f93a6-pip-blue-altpip-package-img-srchttpsimgshieldsiobadgee2a7aa-lncrawlexe-red-altdownload-win-img-srchttpsimgshieldsiobadgee2a7ad-lncrawl20linux-brown-altdownload-linux)
- [Table of contents](#table-of-contents)
- [(A) Installation](#a-installation)
- [A1. Standalone Bundle (Windows, Linux)](#a1-standalone-bundle-windows-linux)
- [A2. PIP (Windows, Mac, and Linux)](#a2-pip-windows-mac-and-linux)
- [A3. Termux (Android)](#a3-termux-android)
- [A4. Chatbots](#a4-chatbots)
- [A4.1 Discord](#a41-discord)
- [A4.2 Telegram](#a42-telegram)
- [A5. Run from source](#a5-run-from-source)
- [A6. Heroku Deployment](#a6-heroku-deployment)
- [(B) General Usage](#b-general-usage)
- [B1. Available options](#b1-available-options)
- [B2. Example Usage](#b2-example-usage)
- [B3. Running the bot](#b3-running-the-bot)
- [(C) Development](#c-development)
- [C1. Adding new source](#c1-adding-new-source)
- [C2. Adding new Bot](#c2-adding-new-bot)
- [C3. Supported sources](#c3-supported-sources)
- [C4. Rejected sources](#c4-rejected-sources)
- [C5. Supported output formats](#c5-supported-output-formats)
- [C6. Supported bots](#c6-supported-bots)

<a href="https://github.com/dipu-bd/lightnovel-crawler"><img src="res/lncrawl-icon.png" width="128px" align="right"/></a>

@@ -52,7 +54,7 @@ Without it, you will only get output in epub, text, and web formats.

### A1. Standalone Bundle (Windows, Linux)

**Windows**: [lightnovel-crawler v2.23.1 ~ 23MB](https://rebrand.ly/lncrawl)
**Windows**: [lightnovel-crawler v2.23.2 ~ 23MB](https://rebrand.ly/lncrawl)

> In Windows 8, 10, or later versions, it might say that `lncrawl.exe` is not safe to download or execute. You should bypass/ignore this security check to run the program.
@@ -301,6 +303,8 @@ You are very welcome to contribute to this project. You can:
| https://4scanlation.xyz | | |
| https://9kqw.com | ✔ | |
| https://anythingnovel.com | | |
| https://arangscans.com | | |
| https://asadatranslations.com | ✔ | |
| https://automtl.wordpress.com | | |
| https://babelnovel.com | ✔ | |
| https://bestlightnovel.com | ✔ | |
@@ -342,6 +346,7 @@ You are very welcome to contribute to this project. You can:
| https://m.wuxiaworld.co | ✔ | |
| https://mangatoon.mobi | | |
| https://meionovel.id | ✔ | |
| https://moonstonetranslation.com | | |
| https://myoniyonitranslations.com | | |
| https://mysticalmerries.com | ✔ | |
| https://novel27.com | ✔ | |
@@ -354,6 +359,7 @@ You are very welcome to contribute to this project. You can:
| https://pery.info/ | ✔ | |
| https://ranobelib.me | | |
| https://readwebnovels.net | ✔ | |
| https://reincarnationpalace.com | | |
| https://rewayat.club | | |
| https://shalvationtranslations.wordpress.com | | |
| https://tomotranslations.com | | |
@@ -366,14 +372,17 @@ You are very welcome to contribute to this project. You can:
| https://webnovelonline.com | | |
| https://woopread.com | ✔ | |
| https://wordexcerpt.com | ✔ | |
| https://writerupdates.com | | |
| https://wuxiaworld.io | ✔ | |
| https://wuxiaworld.live | ✔ | |
| https://wuxiaworld.online | ✔ | |
| https://wuxiaworld.site | | |
| https://www.aixdzs.com | | |
| https://www.asianhobbyist.com | | |
| https://www.centinni.com | ✔ | |
| https://www.daocaorenshuwu.com | | |
| https://www.fuyuneko.org | | |
| https://www.f-w-o.com | ✔ | |
| https://www.idqidian.us | | |
| https://www.lightnovelworld.com | ✔ | |
| https://www.machine-translation.org | ✔ | |
@@ -382,6 +391,7 @@ You are very welcome to contribute to this project. You can:
| https://www.novelall.com | ✔ | |
| https://www.novelcool.com | | |
| https://www.novelhall.com | | |
| https://www.novelhunters.com | ✔ | |
| https://www.novelringan.com | | |
| https://www.novelspread.com | | |
| https://www.novelupdates.cc | | |
@@ -399,6 +409,7 @@ You are very welcome to contribute to this project. You can:
| https://www.virlyce.com | | |
| https://www.wattpad.com | | |
| https://www.webnovel.com | ✔ | |
| https://www.webnovelover.com | ✔ | |
| https://www.worldnovel.online | ✔ | |
| https://www.wuxialeague.com | | |
| https://www.wuxiaworld.co | ✔ | |
2 changes: 1 addition & 1 deletion lncrawl/VERSION
@@ -1 +1 @@
2.23.1
2.23.2
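
The `lncrawl/VERSION` file holds the release number as a single plain-text line, bumped here from 2.23.1 to 2.23.2. As a minimal sketch, a package can load such a file at import time like this (a hypothetical helper, not necessarily how lightnovel-crawler itself reads it):

```python
# Hypothetical sketch: load a plain-text VERSION file shipped with a package.
from pathlib import Path


def read_version() -> str:
    # VERSION sits next to this module and contains one line, e.g. "2.23.2".
    version_file = Path(__file__).parent / 'VERSION'
    return version_file.read_text(encoding='utf-8').strip()
```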
86 changes: 86 additions & 0 deletions lncrawl/sources/arangscans.py
@@ -0,0 +1,86 @@
# -*- coding: utf-8 -*-
import logging

from ..utils.crawler import Crawler

logger = logging.getLogger('ARANG_SCANS')
search_url = 'https://arangscans.com/?s=%s&post_type=wp-manga'


class ArangScans(Crawler):
base_url = 'https://arangscans.com/'

    # FIXME: Search doesn't seem to work; no results show up when running `lncrawl -q "Rooftop Sword Master" --sources`.
# def search_novel(self, query):
# query = query.lower().replace(' ', '+')
# soup = self.get_soup(search_url % query)

# results = []
# for tab in soup.select('.c-tabs-item__content'):
# a = tab.select_one('.post-title h3 a')
# latest = tab.select_one('.latest-chap .chapter a').text
# votes = tab.select_one('.rating .total_votes').text
# results.append({
# 'title': a.text.strip(),
# 'url': self.absolute_url(a['href']),
# 'info': '%s | Rating: %s' % (latest, votes),
# })
# # end for

# return results
# # end def

def read_novel_info(self):
        '''Get novel title, author, cover, etc.'''
logger.debug('Visiting %s', self.novel_url)
soup = self.get_soup(self.novel_url)

possible_title = soup.select_one('.post-title h1')
for span in possible_title.select('span'):
span.extract()
# end for
self.novel_title = possible_title.text.strip()
logger.info('Novel title: %s', self.novel_title)

self.novel_cover = self.absolute_url(
soup.select_one('.summary_image a img')['src'])
logger.info('Novel cover: %s', self.novel_cover)

self.novel_author = ' '.join([
a.text.strip()
for a in soup.select('.author-content a[href*="manga-author"]')
])
logger.info('%s', self.novel_author)

volumes = set()
chapters = soup.select('ul.main li.wp-manga-chapter a')
for a in reversed(chapters):
chap_id = len(self.chapters) + 1
vol_id = (chap_id - 1) // 100 + 1
volumes.add(vol_id)
self.chapters.append({
'id': chap_id,
'volume': vol_id,
'url': self.absolute_url(a['href']),
'title': a.text.strip() or ('Chapter %d' % chap_id),
})
# end for

self.volumes = [{'id': x} for x in volumes]
# end def

def download_chapter_body(self, chapter):
        '''Download the body of a single chapter and return it as clean HTML.'''
logger.info('Downloading %s', chapter['url'])
soup = self.get_soup(chapter['url'])

contents = soup.select_one('div.text-left')
for bad in contents.select('h3, .code-block, script, .adsbygoogle'):
bad.decompose()

body = self.extract_contents(contents)
return '<p>' + '</p><p>'.join(body) + '</p>'
# end def
# end class
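
Both new crawlers assign chapters to synthetic volumes of 100 chapters each via `(chap_id - 1) // 100 + 1`. A quick worked sketch of that numbering:

```python
# Worked sketch of the volume-numbering scheme used by the crawlers above:
# chapters 1-100 map to volume 1, 101-200 to volume 2, and so on.
def volume_of(chap_id: int) -> int:
    return (chap_id - 1) // 100 + 1


assert volume_of(1) == 1
assert volume_of(100) == 1
assert volume_of(101) == 2
assert volume_of(250) == 3
```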
102 changes: 102 additions & 0 deletions lncrawl/sources/asadatrans.py
@@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
import logging

from ..utils.crawler import Crawler

logger = logging.getLogger(__name__)
search_url = 'https://asadatranslations.com/?s=%s&post_type=wp-manga&author=&artist=&release='


class AsadaTranslations(Crawler):
base_url = 'https://asadatranslations.com/'

def search_novel(self, query):
query = query.lower().replace(' ', '+')
soup = self.get_soup(search_url % query)

results = []
for tab in soup.select('.c-tabs-item__content'):
a = tab.select_one('.post-title h3 a')
latest = tab.select_one('.latest-chap .chapter a').text
votes = tab.select_one('.rating .total_votes').text
results.append({
'title': a.text.strip(),
'url': self.absolute_url(a['href']),
'info': '%s | Rating: %s' % (latest, votes),
})
# end for

return results
# end def

def read_novel_info(self):
        '''Get novel title, author, cover, etc.'''
logger.debug('Visiting %s', self.novel_url)
soup = self.get_soup(self.novel_url)

possible_title = soup.select_one('.post-title h1')
for span in possible_title.select('span'):
span.extract()
# end for
self.novel_title = possible_title.text.strip()
logger.info('Novel title: %s', self.novel_title)

# NOTE: Site doesn't have book covers.
# self.novel_cover = self.absolute_url(
# soup.select_one('.summary_image a img')['src'])
# logger.info('Novel cover: %s', self.novel_cover)

self.novel_author = ' '.join([
a.text.strip()
for a in soup.select('.author-content a[href*="manga-author"]')
])
logger.info('%s', self.novel_author)

volumes = set()
chapters = soup.select('ul.main li.wp-manga-chapter a')
for a in reversed(chapters):
chap_id = len(self.chapters) + 1
vol_id = (chap_id - 1) // 100 + 1
volumes.add(vol_id)
self.chapters.append({
'id': chap_id,
'volume': vol_id,
'url': self.absolute_url(a['href']),
'title': a.text.strip() or ('Chapter %d' % chap_id),
})
# end for

self.volumes = [{'id': x} for x in volumes]
# end def

def download_chapter_body(self, chapter):
        '''Download the body of a single chapter and return it as clean HTML.'''
logger.info('Downloading %s', chapter['url'])
soup = self.get_soup(chapter['url'])

contents = soup.select_one('div.text-left')
for bad in contents.select('h3, .code-block, script, .adsbygoogle, .sharedaddy'):
bad.decompose()

        # Regex patterns for lines to strip from the chapter text:
        # translator/editor credits and NovelUpdates (NU) promo lines.
        self.blacklist_patterns = [
            r'^Translator:',
            r'^Qii',
            r'^Editor:',
            r'^Maralynx',
            r'^Translator and Editor Notes:',
            r'^Support this novel on',
            r'^NU',
            r'^by submitting reviews and ratings or by adding it to your reading list.',
        ]

        # Drop promotional paragraphs (Discord invites, update notices).
        # p.text is tag-stripped, so match against plain text only.
        for p in contents.select('p'):
            for bad in ('Join our', 'discord', 'to get latest updates and progress about the translations'):
                if bad in p.text:
                    p.decompose()
                    break

body = self.extract_contents(contents)
return '<p>' + '</p><p>'.join(body) + '</p>'
# end def
# end class
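
The `blacklist_patterns` list set in `download_chapter_body` holds regular expressions that the base `Crawler` presumably applies while extracting chapter text. A standalone sketch of that kind of line filtering, assuming per-paragraph matching (the real base class may behave differently):

```python
# Standalone sketch of blacklist filtering (assumed semantics; the base
# Crawler in lightnovel-crawler may apply these patterns differently).
import re

blacklist_patterns = [
    r'^Translator:',
    r'^Editor:',
    r'^Support this novel on',
]


def keep_paragraph(text: str) -> bool:
    # Keep a paragraph only if no blacklist pattern matches it.
    return not any(re.search(p, text) for p in blacklist_patterns)


paragraphs = ['Translator: Qii', 'The hero drew his sword.']
print([p for p in paragraphs if keep_paragraph(p)])
# ['The hero drew his sword.']
```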
