Add features to the classical-poetry scraper #33

Open
wants to merge 1 commit into
base: master
17 changes: 15 additions & 2 deletions spiders/spider_gushiwen.py
@@ -17,6 +17,7 @@
import requests
import re
import time
from docx import Document

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
@@ -94,11 +95,23 @@ def spider():

        time.sleep(1)

    # 2. Display the data
    # 2. Display the data and save the scraped poems to a local file
    keys_to_print = ['title', 'content']
    doc = Document()
    for poem in poems:
        print(poem)
        print("==" * 40)

        for i in poem:
            for key in keys_to_print:
                value = i.get(key)
                if value:
                    paragraph = doc.add_paragraph()
                    if key == 'title':
                        paragraph.add_run(f'《{value}》')
                    elif key == 'content':
                        paragraph.add_run(f'{value}')

    doc.save('D:/output.docx')  # where to save the file
    print('Congratulations! Scraping complete!')
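
Review note: the export step could be pulled out of `spider()` so it can be checked without running the crawl. The following is only a sketch. It assumes, based on the nested loop in this diff, that `poems` is a list of pages and each page is a list of dicts with `title` and `content` keys. The helper names `iter_poem_lines` and `save_poems` and the relative default path are hypothetical and not part of this PR.

```python
from pathlib import Path


def iter_poem_lines(pages, keys=("title", "content")):
    """Yield one formatted line per non-empty field, page by page."""
    for page in pages:
        for poem in page:
            for key in keys:
                value = poem.get(key)
                if not value:
                    continue  # skip missing or empty fields, as the diff does
                # Titles get Chinese book-title marks; other fields pass through.
                yield f"《{value}》" if key == "title" else str(value)


def save_poems(pages, path=Path("output.docx")):
    """Write the formatted lines to a .docx file, one paragraph per line."""
    # python-docx is imported lazily so iter_poem_lines has no third-party dependency.
    from docx import Document

    doc = Document()
    for line in iter_poem_lines(pages):
        doc.add_paragraph(line)
    doc.save(str(path))
    return path
```

With this split, `spider()` would only need to call `save_poems(poems)` after the display loop. A relative default path is used here because `D:/output.docx` only works on a Windows machine that has a `D:` drive; whether a fixed or configurable location is preferable is up to the maintainers.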

