Crawler: crawls the data. crawl_app_news.py can be used as a template.
Tokenize data: uses the Baidu API to get tokenization and part-of-speech (POS) tags.
JSON to cols: converts the JSON into columns.
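
The "json to cols" step could look roughly like the sketch below. This is only an illustration under assumptions: the function names `json_to_cols` and `rows_to_tsv`, and the record keys `text`, `items`, `word`, and `pos`, are hypothetical and are not taken from the project or from the Baidu API response format. Adjust the keys to match the JSON actually returned by the tokenizer.

```python
import csv
import io
import json


def json_to_cols(raw_json):
    """Flatten one tokenization record into (text, word, pos) rows.

    The keys "text", "items", "word" and "pos" are assumed for this sketch.
    """
    record = json.loads(raw_json)
    text = record.get("text", "")
    rows = []
    for item in record.get("items", []):
        rows.append((text, item.get("word", ""), item.get("pos", "")))
    return rows


def rows_to_tsv(rows):
    """Render the rows as tab-separated columns with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["text", "word", "pos"])
    writer.writerows(rows)
    return buf.getvalue()


# A made-up record, used only to exercise the functions above.
sample = json.dumps({
    "text": "hello world",
    "items": [
        {"word": "hello", "pos": "n"},
        {"word": "world", "pos": "n"},
    ],
})

print(rows_to_tsv(json_to_cols(sample)))
```

Writing one row per token keeps each word next to its POS tag, so the output can be loaded directly into a spreadsheet or a DataFrame.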