book_crawler/
    scrapy.cfg          <-- Configuration file (DO NOT TOUCH!)
    book_crawler/
        __init__.py     <-- Empty file that marks this folder as a Python package
        items.py        <-- Model of the item to scrape
        middlewares.py  <-- Scrapy processing hooks (DO NOT TOUCH)
        pipelines.py    <-- What to do with each scraped item
        settings.py     <-- Project settings file
        spiders/        <-- Directory of our spiders (empty for now)
            __init__.py
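items.py is where the shape of a scraped record is declared. The classic form subclasses scrapy.Item with one scrapy.Field() per attribute; Scrapy 2.2 and later also accept plain dataclass objects as items. A minimal sketch of the dataclass form, assuming hypothetical fields (title, price, url) for a book listing:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BookItem:
    """One book from a listing page (hypothetical fields)."""
    title: str
    price: str                 # raw text as scraped, e.g. "£10.00"
    url: Optional[str] = None  # detail-page link, if the spider records it


if __name__ == "__main__":
    item = BookItem(title="Example Book", price="£10.00")
    print(asdict(item))  # {'title': 'Example Book', 'price': '£10.00', 'url': None}
```

A spider can then yield BookItem(...) instead of a bare dict, which catches misspelled field names early.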
A Python environment = a Python interpreter + the packages installed for it
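Both halves of that equation can be inspected from inside Python itself. A small stdlib-only sketch (the function name describe_environment is hypothetical) that reports which interpreter is running, whether it lives in a venv, and which distributions are installed for it:

```python
import sys
from importlib import metadata


def describe_environment():
    """Return (interpreter path, running inside a venv?, sorted installed package names)."""
    interpreter = sys.executable
    # In a venv, sys.prefix points at the venv; sys.base_prefix points at the base install.
    in_venv = sys.prefix != sys.base_prefix
    names = {dist.metadata.get("Name") for dist in metadata.distributions()}
    packages = sorted((name for name in names if name), key=str.lower)
    return interpreter, in_venv, packages


if __name__ == "__main__":
    interpreter, in_venv, packages = describe_environment()
    print("interpreter:", interpreter)
    print("inside a venv:", in_venv)
    print(len(packages), "packages installed, e.g.", packages[:5])
```

Running it with two different interpreters (say, the system one and the one inside a venv) shows two different package sets, which is the whole point of an environment.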
pipenv explained
Setup (I tried pipenv, but its performance was too poor, so I gave it up and went back to venv)
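The usual route is python -m venv .venv, then activating it and running pip install scrapy. The same thing can be done from Python through the stdlib venv module; a minimal sketch, where the helper names create_env and env_python are hypothetical:

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir, like `python -m venv env_dir`."""
    builder = venv.EnvBuilder(
        with_pip=with_pip,                 # bootstrap pip into the env via ensurepip
        symlinks=sys.platform != "win32",  # the venv CLI also symlinks by default off Windows
    )
    builder.create(env_dir)
    # Every venv records its base interpreter in pyvenv.cfg.
    return Path(env_dir) / "pyvenv.cfg"


def env_python(env_dir):
    """Path of the interpreter that lives inside the venv."""
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


if __name__ == "__main__":
    # Throwaway env in a temp dir; with_pip=False keeps the demo fast.
    target = Path(tempfile.mkdtemp()) / ".venv"
    print(create_env(target, with_pip=False).read_text())
    print("interpreter:", env_python(target))
```

Packages installed with the env's own interpreter (for example env_python(target) -m pip install scrapy, when pip was bootstrapped) stay inside that env.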
Create a new project
scrapy startproject book_crawler
Create a new spider
scrapy genspider fiction books.toscrape.com
Run
scrapy crawl fiction
Save the results
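The quickest way to save is a feed export on the command line, e.g. scrapy crawl fiction -o fiction.json. When more control is needed, an item pipeline in pipelines.py can write the file itself. A minimal sketch of a JSON Lines writer; the class name and the default file name items.jsonl are hypothetical, and it assumes items are dicts, dict-like scrapy.Item objects, or dataclasses:

```python
import dataclasses
import json
import tempfile
from pathlib import Path


class JsonLinesWriterPipeline:
    """Item pipeline that writes every scraped item as one JSON line."""

    def __init__(self, path="items.jsonl"):
        # Hypothetical output path; Scrapy itself calls the no-argument constructor.
        self.path = Path(path)
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = self.path.open("w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes; closing flushes the file.
        self.file.close()

    def process_item(self, item, spider):
        # Dataclass items need asdict(); dicts and scrapy.Item work with dict().
        data = dataclasses.asdict(item) if dataclasses.is_dataclass(item) else dict(item)
        self.file.write(json.dumps(data, ensure_ascii=False) + "\n")
        return item  # hand the item on to the next pipeline


if __name__ == "__main__":
    # Drive the hooks by hand, the way Scrapy would, writing into a temp dir.
    out = Path(tempfile.mkdtemp()) / "fiction.jsonl"
    pipeline = JsonLinesWriterPipeline(out)
    pipeline.open_spider(spider=None)
    pipeline.process_item({"title": "Example Book", "price": "£10.00"}, spider=None)
    pipeline.close_spider(spider=None)
    print(out.read_text(encoding="utf-8"))
```

To enable it, register it in settings.py, e.g. ITEM_PIPELINES = {"book_crawler.pipelines.JsonLinesWriterPipeline": 300}; lower numbers run earlier.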
Debug interactively in the console (Scrapy shell)
$ scrapy shell 'http://books.toscrape.com/'
>>> response.css(...)
>>> response.xpath(...)
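extract() is similar to querySelectorAll: it returns a list of all matches (as strings).
extract_first() is similar to querySelector: it returns only the first match (as a string), or None if nothing matches.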