How can I crawl continuously? #2

Open
ZQbd opened this issue Aug 27, 2019 · 1 comment

Comments

@ZQbd

ZQbd commented Aug 27, 2019

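I tested crawling TapTap data. Once the program starts, it finishes crawling within a few minutes. TapTap later publishes new data, but the crawler never picks it up; the only way to get the new data is to restart the program. Is there a way to crawl continuously?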

@stanleylsx
Owner

Record the MD5 of every crawled URL in Redis, and check new URLs against those records each time you restart the crawl.
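
A minimal sketch of that idea, not the project's actual code. It assumes the fingerprints live in a single Redis set under a hypothetical key `crawled:url_md5`, and that the client exposes a redis-py style `sadd`, which returns how many members were newly added. The helper names and the `InMemorySet` stand-in are illustrative only, so the example runs without a Redis server.

```python
import hashlib


def url_fingerprint(url: str) -> str:
    """Return the MD5 hex digest of a URL, used as its dedup key."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def is_new_url(client, url: str, key: str = "crawled:url_md5") -> bool:
    """Record the URL's MD5 in a set and report whether it was unseen.

    `client` is anything with a redis-py style `sadd(key, member)` that
    returns the number of members actually added (1 = new, 0 = duplicate).
    The key name is an assumption made for this sketch.
    """
    return client.sadd(key, url_fingerprint(url)) == 1


def filter_new(client, urls):
    """Keep only the URLs whose MD5 was not recorded by an earlier run."""
    return [u for u in urls if is_new_url(client, u)]


class InMemorySet:
    """Hypothetical stand-in for a Redis client, for local experiments."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1


if __name__ == "__main__":
    store = InMemorySet()
    # First run: every URL is new, so all of them are crawled.
    print(filter_new(store, ["https://example.com/a", "https://example.com/b"]))
    # Later run: only the URL that was not seen before is kept.
    print(filter_new(store, ["https://example.com/b", "https://example.com/c"]))
```

With a real server, a `redis.Redis(host="localhost", port=6379)` client could presumably be passed in place of `InMemorySet`. Because the set persists in Redis, each restart (or a scheduled re-run) would then fetch only URLs that were not recorded before.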
