Pre-requisites

Python 2.7 with the following packages: nltk, numpy, scikit-learn, gensim, html2text, beautifulsoup, Levenshtein, mysql.
crfsuite
MySQL database

API Recognition

Preliminaries

A text file needs to be converted into a CoNLL file, a format widely used in Natural Language Processing (NLP).

We provide a tool written in Python (texttoconll.py) to convert a text file into a CoNLL file:

python texttoconll.py input.txt output.conll
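To make the conversion concrete, here is a minimal sketch of what a text-to-CoNLL step can look like. It is not the code in texttoconll.py: the regex tokenizer, the two-column token/label layout, and the placeholder label "O" are all assumptions made for illustration.

```python
import re

# Hypothetical tokenizer (an assumption, not the repository's tokenizer):
# keeps dotted names such as numpy.array together and splits off punctuation.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*|[^\sA-Za-z0-9_]")


def text_to_conll(text, default_label="O"):
    """Turn plain text (one sentence per line) into CoNLL-style lines.

    Each token goes on its own line as "token<TAB>label", and a blank
    line marks the end of a sentence. The column layout is assumed.
    """
    lines = []
    for sentence in text.splitlines():
        tokens = TOKEN_RE.findall(sentence)
        if not tokens:
            continue  # skip empty input lines
        for token in tokens:
            lines.append("%s\t%s" % (token, default_label))
        lines.append("")  # sentence boundary
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_to_conll("Use numpy.array here."))
```

The real tool may tokenize differently and may emit additional columns, so check its output before relying on this layout.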

Data

The folder apidoc: API inventory
The folder data: the results of word representation using Brown clustering and word embeddings.
Experimental data for EMSE paper
- the folder api_recog contains the following files:
  - train_all.conll: manually labeled training data
  - test_*.all: testing data

Usage

Extract features and convert a CoNLL file into an input file for crfsuite:

python enner.py bc-ce < api_recog/train_all.conll > api_recog/train_all.data

python enner.py bc-ce < api_recog/test_all.conll > api_recog/test_all.data

Learn a model using crfsuite:

crfsuite learn -m model api_recog/train_all.data

Use the trained model to tag the test data:

crfsuite tag -m model -qt api_recog/test_all.data
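The feature-extraction step above produces the plain-text input that crfsuite reads: one token per line, with the label first and tab-separated attributes after it, and a blank line between sequences. The sketch below shows that shape with a few simple surface features. It is only an illustration: the bc-ce feature set in enner.py (Brown clusters and embedding clusters) is not reproduced here, and the feature names, the `B-API` label, and the two-column CoNLL input are assumptions.

```python
def escape(feature):
    # crfsuite reads ':' as the separator between an attribute name and its
    # weight, so this sketch replaces it inside feature strings.
    return feature.replace(":", "__COLON__")


def token_features(tokens, i):
    """Return simple, hypothetical surface features for tokens[i]."""
    word = tokens[i]
    feats = [
        "w[0]=" + word.lower(),
        "iscap=%s" % word[:1].isupper(),
        "hasdot=%s" % ("." in word),
    ]
    # Neighbouring words, or boundary markers at the sequence edges.
    feats.append("w[-1]=" + tokens[i - 1].lower() if i > 0 else "__BOS__")
    feats.append("w[1]=" + tokens[i + 1].lower() if i < len(tokens) - 1 else "__EOS__")
    return [escape(f) for f in feats]


def conll_to_crfsuite(conll_text):
    """Convert "token<TAB>label" lines into crfsuite's "label<TAB>attrs" lines."""
    out, sentence = [], []
    # The extra "" guarantees the last sentence is flushed.
    for line in conll_text.splitlines() + [""]:
        if line.strip():
            columns = line.split("\t")
            sentence.append((columns[0], columns[-1]))
        elif sentence:
            tokens = [tok for tok, _ in sentence]
            for i, (_, label) in enumerate(sentence):
                out.append("\t".join([label] + token_features(tokens, i)))
            out.append("")  # blank line ends the sequence
            sentence = []
    return "\n".join(out)


if __name__ == "__main__":
    print(conll_to_crfsuite("pandas.merge\tB-API\nworks\tO\n"))
```

Output in this shape can be passed to `crfsuite learn` and `crfsuite tag` in the same way as the .data files above.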

API Linking

Usage

Given a post from Stack Overflow, we first crawl the content of the whole post web page, including the question, answers, comments, and tags. We then use our API recognition tool to identify API entities in the crawled text (excluding code fragments). Finally, we link the identified APIs to the API documentation.

See the corresponding Python file apilink.py. You can run it using the following command:

python apilink.py post_id output_file
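As a rough illustration of the final linking step, the sketch below looks up candidate documentation entries for a recognized mention by matching the trailing components of fully qualified API names. This is not the method implemented in apilink.py, which reads its API documents from the MySQL database and whose disambiguation logic is not described here. The function name and the in-memory candidate list are hypothetical.

```python
def link_mention(mention, api_names):
    """Return the fully qualified names whose trailing parts match the mention.

    Hypothetical helper: "DataFrame.merge()" matches "pandas.DataFrame.merge"
    because the last two dotted components are equal (case-insensitive).
    """
    parts = mention.lower().rstrip("()").split(".")
    matches = []
    for name in api_names:
        name_parts = name.lower().split(".")
        if name_parts[-len(parts):] == parts:
            matches.append(name)
    return matches


if __name__ == "__main__":
    # Stand-in for names that would come from the API documentation database.
    candidates = ["pandas.DataFrame.merge", "pandas.merge", "numpy.array"]
    print(link_mention("DataFrame.merge()", candidates))
    print(link_mention("merge", candidates))
```

A short mention such as "merge" returns more than one candidate, which is why a linking tool needs context from the post (for example its tags or surrounding text) to choose among them.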

Data

The folder mysql contains two SQL files holding the API documents that we crawled from the internet. They cover the following libraries: matplotlib, numpy, and pandas. Set up the database using the following steps:

Create a database schema named link_api.
Import the two SQL files into the database.
Change the database username and password in the Python file apilink.py.

Experimental data:

experiment.xlxs in the folder api_link (60, 30, 30 records for Pandas, Numpy, and Matplotlib, respectively)
