Commit
Merge pull request #607 from dipu-bd/dev
Version 2.23.2
dipu-bd authored Sep 20, 2020
2 parents 3999332 + 3261a42 commit 5ce52ee
Showing 11 changed files with 910 additions and 22 deletions.
53 changes: 32 additions & 21 deletions README.md
@@ -19,26 +19,28 @@ An app to download novels from online sources and generate e-books.
## Table of contents

- [Installation](#a-installation)
- [⏬ Standalone Bundle (Windows, Linux)](#a1-standalone-bundle-windows-linux)
- [📦 PIP (Windows, Mac, and Linux)](#a2-pip-windows-mac-and-linux)
- [📱 Termux (Android)](#a3-termux-android)
- [Chatbots](#a4-chatbots)
- [Discord](#a41-discord)
- [Telegram](#a42-telegram)
- [Run from source](#a5-run-from-source)
- [Heroku Deployment](#a6-heroku-deployment)
- [General Usage](#b-general-usage)
- [Available options](#b1-available-options)
- [Example usage](#b2-example-usage)
- [Running the bot](#b3-running-the-bot)
- [Development](#c-development)
- [Adding new source](#c1-adding-new-source)
- [Adding new Bot](#c2-adding-new-bot)
- [Supported sources](#c3-supported-sources)
- [Rejected sources](#c4-rejected-sources)
- [Supported output formats](#c5-supported-output-formats)
- [Supported bots](#c6-supported-bots)
- [Lightnovel Crawler ![pip package](https://pypi.org/project/lightnovel-crawler) [![download win](https://img.shields.io/badge/%E2%A7%AA-lncrawl.exe-red)](https://rebrand.ly/lncrawl) [![download linux](<https://img.shields.io/badge/%E2%A7%AD-lncrawl%20(linux)-brown>)](https://rebrand.ly/lncrawl-linux)](#lightnovel-crawler-img-srchttpsimgshieldsiobadgef09f93a6-pip-blue-altpip-package-img-srchttpsimgshieldsiobadgee2a7aa-lncrawlexe-red-altdownload-win-img-srchttpsimgshieldsiobadgee2a7ad-lncrawl20linux-brown-altdownload-linux)
- [Table of contents](#table-of-contents)
- [(A) Installation](#a-installation)
- [A1. Standalone Bundle (Windows, Linux)](#a1-standalone-bundle-windows-linux)
- [A2. PIP (Windows, Mac, and Linux)](#a2-pip-windows-mac-and-linux)
- [A3. Termux (Android)](#a3-termux-android)
- [A4. Chatbots](#a4-chatbots)
- [A4.1 Discord](#a41-discord)
- [A4.2 Telegram](#a42-telegram)
- [A5. Run from source](#a5-run-from-source)
- [A6. Heroku Deployment](#a6-heroku-deployment)
- [(B) General Usage](#b-general-usage)
- [B1. Available options](#b1-available-options)
- [B2. Example Usage](#b2-example-usage)
- [B3. Running the bot](#b3-running-the-bot)
- [(C) Development](#c-development)
- [C1. Adding new source](#c1-adding-new-source)
- [C2. Adding new Bot](#c2-adding-new-bot)
- [C3. Supported sources](#c3-supported-sources)
- [C4. Rejected sources](#c4-rejected-sources)
- [C5. Supported output formats](#c5-supported-output-formats)
- [C6. Supported bots](#c6-supported-bots)

<a href="https://github.com/dipu-bd/lightnovel-crawler"><img src="res/lncrawl-icon.png" width="128px" align="right"/></a>

@@ -52,7 +54,7 @@ Without it, you will only get output in epub, text, and web formats.

### A1. Standalone Bundle (Windows, Linux)

**Windows**: [lightnovel-crawler v2.23.1 ~ 23MB](https://rebrand.ly/lncrawl)
**Windows**: [lightnovel-crawler v2.23.2 ~ 23MB](https://rebrand.ly/lncrawl)

> In Windows 8, 10, or later versions, it might say that `lncrawl.exe` is not safe to download or execute. You should bypass/ignore this security check to run the program.
@@ -301,6 +303,8 @@ You are very welcome to contribute to this project. You can:
| https://4scanlation.xyz | | |
| https://9kqw.com | ✔ | |
| https://anythingnovel.com | | |
| https://arangscans.com | | |
| https://asadatranslations.com | ✔ | |
| https://automtl.wordpress.com | | |
| https://babelnovel.com | ✔ | |
| https://bestlightnovel.com | ✔ | |
@@ -342,6 +346,7 @@ You are very welcome to contribute to this project. You can:
| https://m.wuxiaworld.co | ✔ | |
| https://mangatoon.mobi | | |
| https://meionovel.id | ✔ | |
| https://moonstonetranslation.com | | |
| https://myoniyonitranslations.com | | |
| https://mysticalmerries.com | ✔ | |
| https://novel27.com | ✔ | |
@@ -354,6 +359,7 @@ You are very welcome to contribute to this project. You can:
| https://pery.info/ | ✔ | |
| https://ranobelib.me | | |
| https://readwebnovels.net | ✔ | |
| https://reincarnationpalace.com | | |
| https://rewayat.club | | |
| https://shalvationtranslations.wordpress.com | | |
| https://tomotranslations.com | | |
@@ -366,14 +372,17 @@ You are very welcome to contribute to this project. You can:
| https://webnovelonline.com | | |
| https://woopread.com | ✔ | |
| https://wordexcerpt.com | ✔ | |
| https://writerupdates.com | | |
| https://wuxiaworld.io | ✔ | |
| https://wuxiaworld.live | ✔ | |
| https://wuxiaworld.online | ✔ | |
| https://wuxiaworld.site | | |
| https://www.aixdzs.com | | |
| https://www.asianhobbyist.com | | |
| https://www.centinni.com | ✔ | |
| https://www.daocaorenshuwu.com | | |
| https://www.fuyuneko.org | | |
| https://www.f-w-o.com | ✔ | |
| https://www.idqidian.us | | |
| https://www.lightnovelworld.com | ✔ | |
| https://www.machine-translation.org | ✔ | |
@@ -382,6 +391,7 @@ You are very welcome to contribute to this project. You can:
| https://www.novelall.com | ✔ | |
| https://www.novelcool.com | | |
| https://www.novelhall.com | | |
| https://www.novelhunters.com | ✔ | |
| https://www.novelringan.com | | |
| https://www.novelspread.com | | |
| https://www.novelupdates.cc | | |
@@ -399,6 +409,7 @@ You are very welcome to contribute to this project. You can:
| https://www.virlyce.com | | |
| https://www.wattpad.com | | |
| https://www.webnovel.com | ✔ | |
| https://www.webnovelover.com | ✔ | |
| https://www.worldnovel.online | ✔ | |
| https://www.wuxialeague.com | | |
| https://www.wuxiaworld.co | ✔ | |
2 changes: 1 addition & 1 deletion lncrawl/VERSION
@@ -1 +1 @@
2.23.1
2.23.2
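
The `lncrawl/VERSION` file holds the release number as a single plain-text line, bumped here from 2.23.1 to 2.23.2. As a minimal sketch, a package can load such a file at import time like this (a hypothetical helper, not necessarily how lightnovel-crawler itself reads it):

```python
# Hypothetical sketch: load a plain-text VERSION file shipped with a package.
from pathlib import Path


def read_version() -> str:
    # VERSION sits next to this module and contains one line, e.g. "2.23.2".
    version_file = Path(__file__).parent / 'VERSION'
    return version_file.read_text(encoding='utf-8').strip()
```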
86 changes: 86 additions & 0 deletions lncrawl/sources/arangscans.py
@@ -0,0 +1,86 @@
# -*- coding: utf-8 -*-
import logging

from ..utils.crawler import Crawler

logger = logging.getLogger('ARANG_SCANS')
search_url = 'https://arangscans.com/?s=%s&post_type=wp-manga'


class ArangScans(Crawler):
base_url = 'https://arangscans.com/'

    # FIXME: Search doesn't seem to work; no results show up when running `lncrawl -q "Rooftop Sword Master" --sources`.
# def search_novel(self, query):
# query = query.lower().replace(' ', '+')
# soup = self.get_soup(search_url % query)

# results = []
# for tab in soup.select('.c-tabs-item__content'):
# a = tab.select_one('.post-title h3 a')
# latest = tab.select_one('.latest-chap .chapter a').text
# votes = tab.select_one('.rating .total_votes').text
# results.append({
# 'title': a.text.strip(),
# 'url': self.absolute_url(a['href']),
# 'info': '%s | Rating: %s' % (latest, votes),
# })
# # end for

# return results
# # end def

def read_novel_info(self):
        '''Get novel title, author, cover, etc.'''
logger.debug('Visiting %s', self.novel_url)
soup = self.get_soup(self.novel_url)

possible_title = soup.select_one('.post-title h1')
for span in possible_title.select('span'):
span.extract()
# end for
self.novel_title = possible_title.text.strip()
logger.info('Novel title: %s', self.novel_title)

self.novel_cover = self.absolute_url(
soup.select_one('.summary_image a img')['src'])
logger.info('Novel cover: %s', self.novel_cover)

self.novel_author = ' '.join([
a.text.strip()
for a in soup.select('.author-content a[href*="manga-author"]')
])
logger.info('%s', self.novel_author)

volumes = set()
chapters = soup.select('ul.main li.wp-manga-chapter a')
for a in reversed(chapters):
chap_id = len(self.chapters) + 1
vol_id = (chap_id - 1) // 100 + 1
volumes.add(vol_id)
self.chapters.append({
'id': chap_id,
'volume': vol_id,
'url': self.absolute_url(a['href']),
'title': a.text.strip() or ('Chapter %d' % chap_id),
})
# end for

self.volumes = [{'id': x} for x in volumes]
# end def

def download_chapter_body(self, chapter):
        '''Download the body of a single chapter and return it as clean HTML.'''
logger.info('Downloading %s', chapter['url'])
soup = self.get_soup(chapter['url'])

contents = soup.select_one('div.text-left')
for bad in contents.select('h3, .code-block, script, .adsbygoogle'):
bad.decompose()

body = self.extract_contents(contents)
return '<p>' + '</p><p>'.join(body) + '</p>'
# end def
# end class
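
Both new crawlers assign chapters to synthetic volumes of 100 chapters each via `(chap_id - 1) // 100 + 1`. A quick worked sketch of that numbering:

```python
# Worked sketch of the volume-numbering scheme used by the crawlers above:
# chapters 1-100 map to volume 1, 101-200 to volume 2, and so on.
def volume_of(chap_id: int) -> int:
    return (chap_id - 1) // 100 + 1


assert volume_of(1) == 1
assert volume_of(100) == 1
assert volume_of(101) == 2
assert volume_of(250) == 3
```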
102 changes: 102 additions & 0 deletions lncrawl/sources/asadatrans.py
@@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
import logging

from ..utils.crawler import Crawler

logger = logging.getLogger(__name__)
search_url = 'https://asadatranslations.com/?s=%s&post_type=wp-manga&author=&artist=&release='


class AsadaTranslations(Crawler):
base_url = 'https://asadatranslations.com/'

def search_novel(self, query):
query = query.lower().replace(' ', '+')
soup = self.get_soup(search_url % query)

results = []
for tab in soup.select('.c-tabs-item__content'):
a = tab.select_one('.post-title h3 a')
latest = tab.select_one('.latest-chap .chapter a').text
votes = tab.select_one('.rating .total_votes').text
results.append({
'title': a.text.strip(),
'url': self.absolute_url(a['href']),
'info': '%s | Rating: %s' % (latest, votes),
})
# end for

return results
# end def

def read_novel_info(self):
        '''Get novel title, author, cover, etc.'''
logger.debug('Visiting %s', self.novel_url)
soup = self.get_soup(self.novel_url)

possible_title = soup.select_one('.post-title h1')
for span in possible_title.select('span'):
span.extract()
# end for
self.novel_title = possible_title.text.strip()
logger.info('Novel title: %s', self.novel_title)

# NOTE: Site doesn't have book covers.
# self.novel_cover = self.absolute_url(
# soup.select_one('.summary_image a img')['src'])
# logger.info('Novel cover: %s', self.novel_cover)

self.novel_author = ' '.join([
a.text.strip()
for a in soup.select('.author-content a[href*="manga-author"]')
])
logger.info('%s', self.novel_author)

volumes = set()
chapters = soup.select('ul.main li.wp-manga-chapter a')
for a in reversed(chapters):
chap_id = len(self.chapters) + 1
vol_id = (chap_id - 1) // 100 + 1
volumes.add(vol_id)
self.chapters.append({
'id': chap_id,
'volume': vol_id,
'url': self.absolute_url(a['href']),
'title': a.text.strip() or ('Chapter %d' % chap_id),
})
# end for

self.volumes = [{'id': x} for x in volumes]
# end def

def download_chapter_body(self, chapter):
        '''Download the body of a single chapter and return it as clean HTML.'''
logger.info('Downloading %s', chapter['url'])
soup = self.get_soup(chapter['url'])

contents = soup.select_one('div.text-left')
for bad in contents.select('h3, .code-block, script, .adsbygoogle, .sharedaddy'):
bad.decompose()

        # Regex patterns for lines to strip from the chapter text:
        # translator/editor credits and NovelUpdates (NU) promo lines.
        self.blacklist_patterns = [
            r'^Translator:',
            r'^Qii',
            r'^Editor:',
            r'^Maralynx',
            r'^Translator and Editor Notes:',
            r'^Support this novel on',
            r'^NU',
            r'^by submitting reviews and ratings or by adding it to your reading list.',
        ]

        # Drop promotional paragraphs (Discord invites, update notices).
        # p.text is tag-stripped, so match against plain text only.
        for p in contents.select('p'):
            for bad in ('Join our', 'discord', 'to get latest updates and progress about the translations'):
                if bad in p.text:
                    p.decompose()
                    break

body = self.extract_contents(contents)
return '<p>' + '</p><p>'.join(body) + '</p>'
# end def
# end class
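
The `blacklist_patterns` list set in `download_chapter_body` holds regular expressions that the base `Crawler` presumably applies while extracting chapter text. A standalone sketch of that kind of line filtering, assuming per-paragraph matching (the real base class may behave differently):

```python
# Standalone sketch of blacklist filtering (assumed semantics; the base
# Crawler in lightnovel-crawler may apply these patterns differently).
import re

blacklist_patterns = [
    r'^Translator:',
    r'^Editor:',
    r'^Support this novel on',
]


def keep_paragraph(text: str) -> bool:
    # Keep a paragraph only if no blacklist pattern matches it.
    return not any(re.search(p, text) for p in blacklist_patterns)


paragraphs = ['Translator: Qii', 'The hero drew his sword.']
print([p for p in paragraphs if keep_paragraph(p)])
# ['The hero drew his sword.']
```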
