- Python 2.7 with the following packages: nltk, numpy, scikit-learn, gensim, html2text, beautifulsoup, Levenshtein, mysql.
- crfsuite
- MySQL database
A text file needs to be converted into a CoNLL file, a format widely used in Natural Language Processing (NLP).
We provide a tool written in Python (texttoconll.py) to convert a text file into a CoNLL file:
python texttoconll.py input.txt output.conll
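For readers unfamiliar with the format, the following is a minimal sketch of what such a conversion can look like. It is not the code of texttoconll.py. The tokenizer regex, the sentence-splitting rule, the tab-separated two-column layout, and the default label `O` are all assumptions made for illustration.

```python
import re

# Hypothetical tokenizer: keeps dotted names such as numpy.array together,
# and emits every other non-space character as its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*|[^\sA-Za-z0-9_]")
SENTENCE_END = {".", "!", "?"}


def text_to_conll(text, default_label="O"):
    """Return CoNLL-style text: one 'token<TAB>label' per line,
    with a blank line between sentences (assumed layout)."""
    lines = []
    for token in TOKEN_RE.findall(text):
        lines.append("%s\t%s" % (token, default_label))
        if token in SENTENCE_END:
            lines.append("")  # blank line closes the sentence
    # Drop a trailing blank line so the output ends with exactly one newline.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    print(text_to_conll("Use numpy.array here. It works!"))
```

Every token starts with the placeholder label `O`; real labels would come from manual annotation or from the trained tagger.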
- The folder apidoc: API Inventory
- The folder data: the results of word representation using Brown clusters and word embeddings.
- Experimental data for EMSE paper
- the folder api_recog contains the following files:
- train_all.conll: training data with manual labels
- test_*.all: testing data
- to train and test the API recognition model, run the following steps:
- extract features and convert a CoNLL file into an input file for crfsuite
python enner.py bc-ce < api_recog/train_all.conll > api_recog/train_all.data
python enner.py bc-ce < api_recog/test_all.conll > api_recog/test_all.data
- learn a model using crfsuite
crfsuite learn -m model api_recog/train_all.data
- use the trained model to tag the test data
crfsuite tag -m model -qt api_recog/test_all.data
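To make the first step above more concrete, here is a minimal sketch of turning CoNLL lines into the tab-separated item format that crfsuite reads: one item per line, with the label first and the attributes after it, and a blank line ending each sequence. It does not reproduce enner.py or its `bc-ce` feature set. The features shown (current, previous, and next token, plus a `has_dot` flag), the `__COLON__` escaping, and the `B-API` label in the example are assumptions for illustration only.

```python
def read_conll(lines):
    """Yield sentences as lists of (token, label) pairs.
    Assumes the token is the first column and the label the last one."""
    sentence = []
    for line in lines:
        fields = line.split()
        if not fields:  # a blank line separates sentences
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append((fields[0], fields[-1]))
    if sentence:
        yield sentence


def escape(value):
    # ':' separates an attribute name from its weight in crfsuite data files.
    return value.replace(":", "__COLON__")


def token_features(sentence, i):
    """Hypothetical toy features for the token at position i."""
    token = sentence[i][0]
    feats = ["w[0]=" + escape(token)]
    if "." in token:
        feats.append("has_dot")
    if i == 0:
        feats.append("__BOS__")
    else:
        feats.append("w[-1]=" + escape(sentence[i - 1][0]))
    if i == len(sentence) - 1:
        feats.append("__EOS__")
    else:
        feats.append("w[1]=" + escape(sentence[i + 1][0]))
    return feats


def conll_to_crfsuite(lines):
    """Return crfsuite training text: 'label<TAB>attr<TAB>attr...' per token."""
    out = []
    for sentence in read_conll(lines):
        for i, (_, label) in enumerate(sentence):
            out.append("\t".join([label] + token_features(sentence, i)))
        out.append("")  # blank line ends the sequence
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    print(conll_to_crfsuite(["pandas.merge\tB-API\n", "works\tO\n"]))
```

Text produced this way has the same shape as the `.data` files passed to `crfsuite learn` and `crfsuite tag`, although the actual features in this repository come from enner.py.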
Given a post from Stack Overflow, we first crawl the content of the whole post web page, including the question, answers, comments, and tags. We then use our API recognition tool to identify API entities in the crawled text (excluding code fragments). Finally, we link the identified APIs to the API documentation.
See the corresponding Python file apilink.py, which you can run using the following command:
python apilink.py post_id output_file
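The step of excluding code fragments from the crawled page can be sketched as follows. This is not the code of apilink.py. It is written for Python 3 using only the standard library, and it assumes that "code fragment" means block-level snippets inside `<pre>` tags, so inline `<code>` text is kept. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextWithoutCode(HTMLParser):
    """Collect page text while skipping everything inside the given tags."""

    def __init__(self, skipped_tags=("pre",)):
        HTMLParser.__init__(self)
        self._skipped_tags = set(skipped_tags)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._skipped_tags:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._skipped_tags and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_code_blocks(html, skipped_tags=("pre",)):
    """Return the visible text of an HTML fragment without its code blocks."""
    parser = TextWithoutCode(skipped_tags)
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    post = ("<p>Use <code>df.apply</code> here.</p>"
            "<pre><code>import pandas as pd\n</code></pre><p>Done</p>")
    print(strip_code_blocks(post))
```

Passing `skipped_tags=("pre", "code")` would also drop inline code, at the cost of losing API mentions that authors format as code.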
- the folder mysql contains two sql files, which hold the API documents that we crawled from the internet. They cover the libraries matplotlib, numpy, and pandas. Set up the database with the following steps:
- create a database schema link_api
- import the two sql files into the database
- set your database username and password in the Python file apilink.py
- experimental data:
- experiment.xlsx in the folder api_link (60, 30, 30 records for Pandas, Numpy, and Matplotlib, respectively)
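The database setup steps for the mysql folder listed above can be sketched as a small Python 3 helper that prints the `mysql` client commands to run. The schema name link_api comes from this README. The user name and the sql file names in the example are placeholders, since the actual file names are not listed here, and the `IF NOT EXISTS` clause is an added convenience rather than a requirement.

```python
import shlex


def setup_commands(user, sql_files, schema="link_api"):
    """Return shell commands that create the schema and import each sql file.
    Each command prompts for the password because of the -p flag."""
    create_sql = "CREATE DATABASE IF NOT EXISTS %s" % schema
    commands = [
        "mysql -u %s -p -e %s" % (shlex.quote(user), shlex.quote(create_sql))
    ]
    for path in sql_files:
        # Feed each dump to the client with the schema as default database.
        commands.append(
            "mysql -u %s -p %s < %s"
            % (shlex.quote(user), shlex.quote(schema), shlex.quote(path))
        )
    return commands


if __name__ == "__main__":
    # Placeholder file names; replace them with the two files in the mysql folder.
    for command in setup_commands("root", ["mysql/first.sql", "mysql/second.sql"]):
        print(command)
```

The helper only builds the command strings, so they can be reviewed before being run in a shell.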