Merge pull request #690 from dipu-bd/dev
Version 2.24.0
dipu-bd authored Dec 13, 2020
2 parents d17b462 + 342d90d commit 3d2fb41
Showing 19 changed files with 94 additions and 76 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pythonpackage.yml
@@ -14,7 +14,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.5, 3.6, 3.7, 3.8]
python-version: [3.5, 3.6, 3.7, 3.8, 3.9]

steps:
- uses: actions/checkout@v2
40 changes: 20 additions & 20 deletions README.md
@@ -90,6 +90,9 @@ $ lncrawl
### A3. Termux (Android)

> Mobile platforms are unpredictable. It is not guaranteed that the app will run on all devices.
> It is recommended to use the bots on either Discord or Telegram if you are on mobile.
📱 Using Termux, you can run this app in your android phones too. Follow this instructions:

- Install [Termux](https://play.google.com/store/apps/details?id=com.termux) from playstore.
@@ -106,9 +109,6 @@ $ lncrawl
- You navigate up using <kbd>Volume UP</kbd> + <kbd>W</kbd> and down using <kbd>Volume UP</kbd> + <kbd>S</kbd>.
- Run `pip install -U lightnovel-crawler` again to install the latest updates.

> Mobile platforms are unpredictable. It is not guaranteed that the app will run on all devices.
> It is recommended to use the bots on either Discord or Telegram if you are on mobile.
### A4. Chatbots

#### A4.1 Discord
@@ -309,7 +309,6 @@ You are very welcome to contribute in this project. You can:
| https://bestoflightnovels.com | ✔ | |
| https://book.qidian.com | | |
| https://boxnovel.com | ✔ | |
| https://chrysanthemumgarden.com | | |
| https://creativenovels.com | | |
| https://crescentmoon.blog | | |
| https://darktranslation.com | | |
@@ -420,22 +419,23 @@ You are very welcome to contribute in this project. You can:
### C4. Rejected sources
| Rejected Sources | Reason |
| ----------------------------- | ----------------------------------- |
| http://fullnovel.live | `403 - Forbidden: Access is denied` |
| http://moonbunnycafe.com | `Does not follow uniform format` |
| https://anythingnovel.com | `Site broken` |
| https://indomtl.com | `Does not like to be crawled` |
| https://lnindo.org | `Does not like to be crawled` |
| https://mtled-novels.com | `Domain is expired` |
| https://www.flying-lines.com | `Obfuscated content` |
| https://www.jieruihao.cn | `Unavailable` |
| https://www.noveluniverse.com | `Site is down` |
| https://www.novelupdates.com | `Does not host any novels` |
| https://www.novelv.com | `Site is down` |
| https://www.rebirth.online | `Site moved` |
| http://gravitytales.com | `Redirects to webnovel.com` |
| https://novelplanet.com | `Site is closed` |
| Rejected Sources | Reason |
| ------------------------------- | -------------------------------------------------------------------------------------------------- |
| http://fullnovel.live | `403 - Forbidden: Access is denied` |
| http://gravitytales.com | `Redirects to webnovel.com` |
| http://moonbunnycafe.com | `Does not follow uniform format` |
| https://anythingnovel.com | `Site broken` |
| https://chrysanthemumgarden.com | `Removed on request of the owner` [#649](https://github.com/dipu-bd/lightnovel-crawler/issues/649) |
| https://indomtl.com | `Does not like to be crawled` |
| https://lnindo.org | `Does not like to be crawled` |
| https://mtled-novels.com | `Domain is expired` |
| https://novelplanet.com | `Site is closed` |
| https://www.flying-lines.com | `Obfuscated content` |
| https://www.jieruihao.cn | `Unavailable` |
| https://www.noveluniverse.com | `Site is down` |
| https://www.novelupdates.com | `Does not host any novels` |
| https://www.novelv.com | `Site is down` |
| https://www.rebirth.online | `Site moved` |
### C5. Supported output formats
2 changes: 1 addition & 1 deletion lncrawl/VERSION
@@ -1 +1 @@
2.23.3
2.24.0
5 changes: 2 additions & 3 deletions lncrawl/bots/console/get_crawler.py
@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-
import re

from PyInquirer import prompt
from questionary import prompt

from ...core import display
from ...core.arguments import get_args
@@ -34,8 +34,7 @@ def get_novel_url(self):
'type': 'input',
'name': 'novel',
'message': 'Enter novel page url or query novel:',
'validate': lambda val: 'Input should not be empty'
if len(val) == 0 else True,
'validate': lambda a: True if a else 'Input should not be empty',
},
])
return answer['novel'].strip()
10 changes: 4 additions & 6 deletions lncrawl/bots/console/login_info.py
@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
from PyInquirer import prompt
from questionary import prompt
from ...core.arguments import get_args


@@ -29,16 +29,14 @@ def get_login_info(self):
{
'type': 'input',
'name': 'email',
'message': 'Username/Email:',
'validate': lambda val: True if len(val)
else 'Email address should be not be empty'
'message': 'User/Email:',
'validate': lambda a: True if a else 'User/Email should be not be empty'
},
{
'type': 'password',
'name': 'password',
'message': 'Password:',
'validate': lambda val: True if len(val)
else 'Password should be not be empty'
'validate': lambda a: True if a else 'Password should be not be empty'
},
])
return answer['email'], answer['password']
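
Review note: the `validate` lambdas touched in this commit all follow the same contract, returning `True` for acceptable input and an error message otherwise. A minimal sketch of that contract as a reusable helper is shown here. `not_empty` is a hypothetical name that does not exist in the repository, and the snippet deliberately avoids importing questionary so it runs on its own.

```python
def not_empty(message):
    """Build a validator that returns True for non-empty input, else the message."""
    def validate(value):
        # An empty string (or empty list) is falsy, so it yields the message.
        return True if value else message
    return validate


# Hypothetical usage: the same helper could back several prompt questions.
validate_password = not_empty('Password should not be empty')
print(validate_password('hunter2'))
print(validate_password(''))
```

A helper like this would also keep the error wording in one place, which might make slips such as "should be not be empty" easier to spot.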
2 changes: 1 addition & 1 deletion lncrawl/bots/console/output_style.py
@@ -2,7 +2,7 @@
import os
import shutil

from PyInquirer import prompt
from questionary import prompt

from ...binders import available_formats
from ...core.arguments import get_args
33 changes: 20 additions & 13 deletions lncrawl/bots/console/range_selection.py
@@ -1,5 +1,7 @@
# -*- coding: utf-8 -*-
from PyInquirer import prompt
from prompt_toolkit.styles.style import Style
from questionary import prompt
from questionary.prompts.common import Choice

from ...core.arguments import get_args

@@ -73,18 +75,25 @@ def validator(val):
# end def
answer = prompt([
{
'type': 'input',
'type': 'autocomplete',
'name': 'start_url',
'message': 'Enter start url:',
'choices': [chap['url'] for chap in self.app.crawler.chapters],
'validate': validator,
},
{
'type': 'input',
'type': 'autocomplete',
'name': 'stop_url',
'message': 'Enter final url:',
'choices': [chap['url'] for chap in self.app.crawler.chapters],
'validate': validator,
},
])
], style=Style([
("selected", "fg:#000000 bold"),
# ("highlighted", "fg:#000000 bold"),
("answer", "fg:#f44336 bold"),
("text", ""),
]))
start_url = answer['start_url']
stop_url = answer['stop_url']
# end if
@@ -163,15 +172,14 @@ def get_range_from_volumes(self, times=0):
'name': 'volumes',
'message': 'Choose volumes to download:',
'choices': [
{
'name': '%d - %s (Chapter %d-%d) [%d chapters]' % (
Choice(
'%d - %s (Chapter %d-%d) [%d chapters]' % (
vol['id'], vol['title'], vol['start_chapter'],
vol['final_chapter'], vol['chapter_count'])
}
vol['final_chapter'], vol['chapter_count']),
)
for vol in self.app.crawler.volumes
],
'validate': lambda ans: True if len(ans) > 0
else 'You must choose at least one volume.'
'validate': lambda a: True if a else (False, "Select at least one item")
}
])
selected = [int(val.split(' ')[0]) for val in answer['volumes']]
@@ -205,11 +213,10 @@ def get_range_from_chapters(self, times=0):
'name': 'chapters',
'message': 'Choose chapters to download:',
'choices': [
{'name': '%d - %s' % (chap['id'], chap['title'])}
Choice('%d - %s' % (chap['id'], chap['title']))
for chap in self.app.crawler.chapters
],
'validate': lambda ans: True if len(ans) > 0
else 'You must choose at least one chapter.',
'validate': lambda a: True if a else (False, 'Select at least one chapter.'),
}
])
selected = [
2 changes: 1 addition & 1 deletion lncrawl/bots/console/start.py
@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-
from urllib.parse import urlparse

from PyInquirer import prompt
from questionary import prompt

from ...assets.icons import Icons
from ...core import display
2 changes: 2 additions & 0 deletions lncrawl/sources/__init__.py
@@ -20,6 +20,7 @@
from ..utils.crawler import Crawler

rejected_sources = {
'https://chrysanthemumgarden.com/': 'Removed on request of the owner (Issue #649)',
'https://novelplanet.com/': 'Site is closed',
'http://gravitytales.com/': 'Redirects to webnovel.com',
'http://fullnovel.live/': "403 - Forbidden: Access is denied",
@@ -35,6 +36,7 @@
'https://www.novelv.com/': "Site is down",
'https://www.rebirth.online/': 'Site moved',
'https://mtled-novels.com/': 'Domain is expired',
'https://4scanlation.com/': 'Site is down'
}

# this list will be auto-generated
2 changes: 1 addition & 1 deletion lncrawl/sources/fu_kemao.py
@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-
import logging
import re
from base64 import decodestring as b64decode
from base64 import b64decode
from concurrent import futures
from urllib.parse import quote_plus
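
Review note: `base64.decodestring` was a long-deprecated alias that Python 3.9 removed, which is presumably why this import changes in the same commit that adds 3.9 to the test matrix. A short sketch of the replacement call follows; the payload is made-up sample data, not anything taken from the crawler.

```python
from base64 import b64decode, b64encode

payload = b'chapter body'      # hypothetical sample data
encoded = b64encode(payload)   # bytes containing only base64 characters
decoded = b64decode(encoded)   # round-trips back to the original bytes

print(decoded == payload)
```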

5 changes: 3 additions & 2 deletions lncrawl/sources/lightnovelonline.py
@@ -108,7 +108,8 @@ def download_chapter_body(self, chapter):
'''Download body of a single chapter and return as clean html format.'''
logger.info('Downloading %s', chapter['url'])
soup = self.get_soup(chapter['url'])
body = soup.select('#chapter-content p')
return ''.join([str(p) for p in body if p.text.strip()])
body = soup.select_one('#chapter-content')
return str(body)
# return ''.join([str(p) for p in body if p.text.strip()])
# end def
# end class
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@
logger = logging.getLogger(__name__)

novel_search_url = 'https://www.lightnovelworld.com/search?title=%s'
chapter_list_url = 'https://www.lightnovelworld.com/novelchapters/%s?pageNo=%d&chorder=asc'
chapter_list_url = 'https://www.lightnovelworld.com/novel/%s?pageNo=%d&tab=chapters&chorder=asc'


class LightNovelOnline(Crawler):
@@ -62,8 +62,11 @@ def read_novel_info(self):
'input[name="__RequestVerificationToken"]')['value']
logger.info('Verification token: %s', self.verificationToken)

last_page = soup.select_one('.PagedList-skipToLast a')['href']
page_count = int(re.findall(r'pageNo=(\d+)', last_page)[0])
try:
last_page = soup.select_one('.PagedList-skipToLast a')['href']
page_count = int(re.findall(r'pageNo=(\d+)', last_page)[0])
except:
page_count = 0
logger.info('Total pages: %d', page_count)

logger.info('Getting chapters...')
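
Review note: the bare `except:` in this hunk also swallows `KeyboardInterrupt` and any unrelated error. One narrower option is sketched here, assuming the only cases that need handling are a missing "skip to last" link and an `href` without a `pageNo` parameter. `parse_page_count` is a hypothetical helper, not part of the crawler.

```python
import re


def parse_page_count(href):
    """Return the pageNo value found in href, or 0 when it is absent."""
    # `href or ''` covers the case where no pagination link was found.
    match = re.search(r'pageNo=(\d+)', href or '')
    return int(match.group(1)) if match else 0


print(parse_page_count('/novel/example?pageNo=12&tab=chapters'))
print(parse_page_count(None))
```

In `read_novel_info` the `href` would come from `soup.select_one('.PagedList-skipToLast a')`, with a check for `None` before indexing into the tag.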
22 changes: 13 additions & 9 deletions lncrawl/sources/wattpad.py
@@ -8,10 +8,17 @@


class WattpadCrawler(Crawler):
base_url = 'https://www.wattpad.com/'
base_url = [
'https://www.wattpad.com/',
'https://my.w.tt/',
]

def initialize(self):
self.home_url = self.base_url[0]

def read_novel_info(self):
'''Get novel title, autor, cover etc'''

logger.debug('Visiting %s', self.novel_url)
soup = self.get_soup(self.novel_url)

@@ -31,23 +38,20 @@ def read_novel_info(self):
chapters = soup.select('ul.table-of-contents a')
# chapters.reverse()

vols = set([])
for a in chapters:
chap_id = len(self.chapters) + 1
vol_id = chap_id//100 + 1
if len(self.chapters) % 100 == 0:
vol_title = 'Volume ' + str(vol_id)
self.volumes.append({
'id': vol_id,
'title': vol_title,
})
# end if
vol_id = len(self.chapters) // 100 + 1
vols.add(vol_id)
self.chapters.append({
'id': chap_id,
'volume': vol_id,
'url': self.absolute_url(a['href']),
'title': a.text.strip() or ('Chapter %d' % chap_id),
})
# end for

self.volumes = [{'id': i} for i in vols]
# end def

def download_chapter_body(self, chapter):
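
Review note: the rewritten loop derives the volume from the number of chapters already collected, so chapters 1 to 100 land in volume 1 and chapter 101 starts volume 2. The standalone sketch below mirrors that arithmetic outside the crawler. `group_chapters` and `per_volume` are hypothetical names, and sorting the volume ids is an addition of this sketch rather than something the commit does.

```python
def group_chapters(titles, per_volume=100):
    """Assign sequential chapter ids and fixed-size volume ids to titles."""
    chapters = []
    vols = set()
    for title in titles:
        chap_id = len(chapters) + 1
        # Volume is based on how many chapters came before this one.
        vol_id = len(chapters) // per_volume + 1
        vols.add(vol_id)
        chapters.append({
            'id': chap_id,
            'volume': vol_id,
            'title': title.strip() or ('Chapter %d' % chap_id),
        })
    # Sorted here for a stable order; a plain set has no guaranteed order.
    volumes = [{'id': i} for i in sorted(vols)]
    return volumes, chapters


volumes, chapters = group_chapters(['Part %d' % n for n in range(1, 251)])
print(len(volumes), chapters[99]['volume'], chapters[100]['volume'])
```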
2 changes: 1 addition & 1 deletion lncrawl/sources/worldnovelonline.py
@@ -141,7 +141,7 @@ def download_chapter_body(self, chapter):
'.post-content'
'.cha-words',
'.cha-content',
'.chapter-fill',
# '.chapter-fill',
'.entry-content.cl',
'#content',
]))
22 changes: 11 additions & 11 deletions requirements.txt
@@ -1,21 +1,21 @@
# App requirements
ascii==3.6
win_unicode_console==0.5
python-dotenv==0.13.0
beautifulsoup4==4.9.1
requests==2.23.0
python-slugify==4.0.0
cssutils==1.0.2
PyInquirer==1.0.3
colorama==0.4.1
python-dotenv==0.15.0,<1.0.0
beautifulsoup4>=4.9.0,<5.0.0
requests>=2.20.0,<3.0.0
python-slugify>=4.0.0,<5.0.0
cssutils>=1.0.0,<1.1.0
colorama>=0.4.0<0.5.0
progress==1.5
Js2Py==0.70
EbookLib==0.17.1
pillow==7.2.0
EbookLib>=0.17.0,<1.0.0
pillow>=6.0.0
cloudscraper>=1.2.48
lxml==4.5.1
lxml>=4.0.0,<5.0.0
questionary>=1.6.0

# Bot requirements
discord.py>=1.5.0
python-telegram-bot>=12.8
pydrive==1.3.1
pydrive>=1.2.0
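
Review note: `colorama>=0.4.0<0.5.0` appears to be missing the comma between its two version clauses, and in `python-dotenv==0.15.0,<1.0.0` the upper bound adds nothing to an exact pin. A rough lint for the first kind of slip is sketched below. It uses a simplified pattern that assumes plain `name` plus comma-separated clauses and ignores extras, environment markers and URLs, so it is not a full requirement parser. `looks_valid` is a hypothetical helper.

```python
import re

# One version clause: a comparison operator followed by a version string.
CLAUSE = r'(?:===|==|~=|!=|<=|>=|<|>)\s*[A-Za-z0-9.*+!_-]+'

# A package name optionally followed by comma-separated clauses.
REQUIREMENT = re.compile(
    r'^[A-Za-z0-9][A-Za-z0-9._-]*\s*(?:%s(?:\s*,\s*%s)*)?$' % (CLAUSE, CLAUSE)
)


def looks_valid(line):
    """Return True when the line matches the simplified requirement shape."""
    return bool(REQUIREMENT.match(line.strip()))


for line in ['colorama>=0.4.0<0.5.0', 'colorama>=0.4.0,<0.5.0']:
    print(line, looks_valid(line))
```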
2 changes: 1 addition & 1 deletion scripts/build.sh
@@ -10,7 +10,7 @@ rm -rf venv build dist *.egg-info
$PY -m venv venv
. venv/bin/activate

$PIP install -U pip==20.0.2
$PIP install -U pip wheel setuptools
$PIP install -r requirements.txt
$PIP install -r dev-requirements.txt

3 changes: 3 additions & 0 deletions scripts/start.sh
@@ -31,3 +31,6 @@ do
echo "Stopped shard $((i-1)) of $shards shards." &
done
wait

echo "Force stop remaining instances..."
/bin/bash scripts/stop.sh
1 change: 1 addition & 0 deletions setup.cfg
@@ -35,6 +35,7 @@ classifiers =
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Topic :: Games/Entertainment
Topic :: Internet :: WWW/HTTP
Topic :: Multimedia :: Graphics