WeiboTestCrawling

Crawls post details from Sina Weibo through keyword searching.

Prerequisites Installation

  1. xlrd
  2. xlwt
  3. xlutils
  4. selenium

    $ pip install xlwt
    $ pip install xlrd
    $ pip install xlutils
    $ pip install selenium
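Before running the crawler, it can help to confirm that the four packages are importable. The snippet below is a minimal sketch, not part of the project; `REQUIRED` and `missing_modules` are hypothetical names introduced here for illustration.

```python
import importlib.util

# Module names taken from the prerequisites list above.
REQUIRED = ["xlrd", "xlwt", "xlutils", "selenium"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites are installed.")
```

Any name it reports as missing can be installed with the matching `pip install` command.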

Procedure

1. Set the username and password of your Sina Weibo account on lines 177-178

    username = "[email protected]" # your Weibo login name
    password = "#########" # your password

2. Set the file path for the crawled data output on line 180

    book_name_xls = "weibo_test.xls" # path where the Excel file should be stored; the file is created automatically if it does not exist

Note: The output is an .xls Excel file. If the file does not exist, a new .xls file is created. If the file exists, the data is written into the sheet whose name matches the current search keyword; if no such sheet exists, a new sheet is created.
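The rule in the note can be sketched as a small decision function. This is an illustration only and does not touch any file: `plan_output` and its return values are hypothetical names, and the caller is assumed to pass the list of sheet names already in the workbook, or `None` when the .xls file does not exist.

```python
def plan_output(existing_sheets, keyword):
    """Decide where the results for one keyword should go.

    existing_sheets: list of sheet names in the workbook, or None if the
    .xls file does not exist yet (an assumption of this sketch).
    """
    if existing_sheets is None:
        # No file on disk: a new .xls file has to be created.
        return "create_file"
    if keyword in existing_sheets:
        # A sheet named after the keyword already exists: reuse it.
        return "write_to_existing_sheet"
    # The file exists but has no sheet for this keyword: add one.
    return "add_sheet"
```

For example, `plan_output(["can", "high"], "worry")` selects the "add a new sheet" branch because the file exists but has no sheet named `worry`.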

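3. Set the search keywords on line 183

    keywords = ["can", "high", "worry", "case"] # enter the keywords you want; if a keyword has a super topic (超话), wrap it in #...#; if that returns too few results, leave the # out

keywords is a list, so several keywords can be supplied at once.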
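The two keyword forms mentioned in the comment (wrapped in `#` for a super topic, or plain) can be produced with a pair of helpers. This is a minimal sketch; `topic_form` and `plain_form` are hypothetical names and are not functions of the crawler script.

```python
def topic_form(keyword):
    """Wrap a keyword in # marks, for example 'case' becomes '#case#'."""
    return "#" + keyword.strip("#") + "#"


def plain_form(keyword):
    """Remove surrounding # marks, for use when the topic form returns few results."""
    return keyword.strip("#")


keywords = ["can", "high", "worry", "case"]
# Topic variants of the same list, ready to paste into the script.
topic_keywords = [topic_form(k) for k in keywords]
```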

4. Run testCrawlMyself.py

Issues

1. Elements cannot be found on the webpage

Sometimes the Selenium WebDriver cannot find or select the button control, which forces the crawl to stop. Exception handling is not implemented yet.
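One possible way to keep the crawl from stopping is to retry the failing lookup a few times and skip it if it never succeeds. The helper below is a generic sketch under that assumption, not the project's code; `retry` and its parameters are hypothetical names, and it deliberately does not import Selenium so it can be tried on its own.

```python
import time


def retry(action, exceptions, attempts=3, delay=1.0):
    """Call action() until it succeeds or the attempts run out.

    Returns the result of action(), or None if every attempt raised one of
    the given exception types. Other exception types are not caught.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            # Wait before the next try, but not after the last one.
            if attempt < attempts:
                time.sleep(delay)
    return None
```

With Selenium, `exceptions` could be `selenium.common.exceptions.NoSuchElementException` and `action` a function that calls `driver.find_element(...)`; a `None` result would then mean "skip this element and continue".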

2. Comments cannot be crawled

Comments on posts are not crawled yet.

  • Author: TKT, for FYP testing purposes
  • Resources & References: To be updated