This repository has been archived by the owner on Dec 24, 2023. It is now read-only.

COURSE COMPLETED. ARCHIVED.


NLP

Project files for the NLP project of Fundamentals of Data Science, Spring 2022, NJU.

This project is published under the GPL v3 license.

WARNING! Please REMOVE the files in the "output" directory before committing, or the repository will exceed GitHub's capacity limit.

Project Author

CybCom & Zhou

Preparation

ML&DL

Coursera: Machine Learning, for the basics https://www.coursera.org/learn/machine-learning

National Taiwan University: Hung-yi Lee's Machine Learning course, for BERT https://speech.ee.ntu.edu.tw/~hylee/ml/2021-spring.php

CS224n, for Natural Language Processing, including word2vec http://web.stanford.edu/class/cs224n/index.html

Web Crawler

https://www.zhihu.com/question/20899988

http://c.biancheng.net/python_spider/what-is-spider.html

https://zhuanlan.zhihu.com/p/73742321

Structure

Data Source

The provided sheet, used for training.

A web crawler run over a cluster of government websites.
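A rough idea of how such a crawler could be organized is sketched below. This is only an illustration, not the project's actual crawler: the `crawl` function, the `fetch` callback, the `allowed_suffix` parameter, and the page limit are all hypothetical names and choices. Passing `fetch` in as a function keeps the sketch independent of any particular HTTP library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, allowed_suffix, max_pages=50):
    """Breadth-first crawl restricted to hosts ending in allowed_suffix.

    fetch          -- hypothetical callback: URL -> HTML string, or None on failure
    allowed_suffix -- hypothetical host filter, e.g. ".example.gov"
    max_pages      -- arbitrary safety limit on the number of stored pages
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            host = urlparse(link).hostname or ""
            if host.endswith(allowed_suffix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In practice `fetch` would wrap a real HTTP request with error handling and polite rate limiting; for testing, a dictionary's `get` method over canned pages is enough.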

Data Process

Preliminary filtering using logical checks and string similarity.
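The filtering step might look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the `prefilter` function, the length check standing in for the logical judgment, the keyword list, and the 0.6 threshold are all hypothetical. The similarity measure here is the standard library's `difflib.SequenceMatcher`, which may differ from whatever the project used.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def prefilter(records, keywords, threshold=0.6):
    """Keep records that pass a logical check and resemble some keyword.

    records   -- iterable of candidate strings (hypothetical input)
    keywords  -- reference strings to compare against (hypothetical)
    threshold -- minimum similarity ratio; 0.6 is an arbitrary choice
    """
    kept = []
    for text in records:
        text = text.strip()
        # Logical judgment (assumed rule): drop empty or very short strings.
        if len(text) < 2:
            continue
        # String similarity: keep the record if it is close to any keyword.
        if any(similarity(text, kw) >= threshold for kw in keywords):
            kept.append(text)
    return kept
```

Records that survive this cheap pass would then go on to the heavier classifier, so the threshold trades recall against the amount of work left for the second stage.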

Use word2vec with a CNN for the second-stage classification.
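To make the idea concrete, here is a toy forward pass of a text CNN over word vectors, written in plain Python. Everything in it is an assumption for illustration: the 2-dimensional `EMBEDDINGS` stand in for trained word2vec vectors, and `KERNELS` and `WEIGHTS` are set by hand rather than learned. The README does not describe the actual architecture, and a real model would be trained with a deep learning framework.

```python
import math

# Toy 2-dimensional "word vectors" (hypothetical); a real pipeline would
# load embeddings trained with word2vec instead.
EMBEDDINGS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}
UNKNOWN = [0.0, 0.0]

# Hand-set convolution kernels of width 2 (hypothetical, not learned):
# the first responds to "good movie", the second to "bad movie".
KERNELS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, 1.0]],
]
WEIGHTS = [2.0, -2.0]  # output-layer weights, also hand-set


def embed(tokens):
    """Map each token to its vector, using a zero vector for unknown words."""
    return [EMBEDDINGS.get(t, UNKNOWN) for t in tokens]


def conv1d_max(matrix, kernel, bias=0.0):
    """Slide a kernel over the token axis, apply ReLU, then max-pool over time."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        total = bias
        for row, krow in zip(window, kernel):
            total += sum(x * w for x, w in zip(row, krow))
        best = max(best, total)
    return best


def classify(tokens, kernels, weights, bias=0.0):
    """Return sigmoid(weights . pooled features + bias) as a probability."""
    matrix = embed(tokens)
    features = [conv1d_max(matrix, k) for k in kernels]
    score = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))
```

With these hand-set parameters, `classify(["good", "movie"], KERNELS, WEIGHTS)` lands above 0.5 and `classify(["bad", "movie"], KERNELS, WEIGHTS)` below it; training would replace the hand-set numbers with learned ones.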