Parsers

astminer supports multiple parsers for various programming languages. Here we describe the integrated parsers and their peculiarities.

ANTLR

ANTLR provides infrastructure for generating lexers and parsers from language grammars. astminer currently supports ANTLR-based parsers for Java, Python, JavaScript, and PHP.

GumTree

GumTree is a framework to work with source code as trees and to compute the differences between the trees in different versions of code. It also builds language-agnostic representations of code. For now, astminer supports GumTree-based parsers for Java and Python.

python-parser

Running GumTree with Python requires python-parser. You can set it up as follows:

  1. Download the sources from GitHub
  2. Install the dependencies
pip install -r requirements.txt
  3. Make the python-parser script executable
chmod +x src/main/python/pythonparser/pythonparser_3.py
  4. Add python-parser to PATH
cp src/main/python/pythonparser/pythonparser_3.py src/main/python/pythonparser/pythonparser
export PATH="<path>/src/main/python/pythonparser:${PATH}"
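
A quick way to confirm that the last step worked is to ask the shell whether it can resolve the `pythonparser` command. The helper below is a minimal sketch; `check_on_path` is a hypothetical name introduced here for illustration, not part of astminer or python-parser.

```shell
# Hypothetical helper: report where a command resolves on PATH,
# or print a "not found" note and return a non-zero status.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$1" "$(command -v "$1")"
    else
        printf '%s: not found on PATH\n' "$1"
        return 1
    fi
}

# Should print the location of the copied script once PATH is set up.
check_on_path pythonparser || true
```

If the command is reported as not found, check that the exported PATH entry points at a directory containing an executable file named `pythonparser`.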

srcML backend

GumTree supports many additional languages through its srcML backend, so astminer exposes GumTree with srcML as a separate parser. Running it requires installing srcML: https://www.srcml.org/

If you have any problems with the installation, check the Dockerfile in the project root.
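
Before running astminer with this parser, it can help to verify that the `srcml` command-line client works on its own. The sketch below assumes the client is installed under the name `srcml` and accepts an input file with `-o` for the output file; `srcml_smoke_test` is a hypothetical helper name, and the sample file is made up for the test.

```shell
# Hypothetical smoke test: convert a tiny C file to srcML XML and report the result.
srcml_smoke_test() {
    if ! command -v srcml >/dev/null 2>&1; then
        echo "srcml not found on PATH"
        return 1
    fi
    smoke_dir="$(mktemp -d)"
    printf 'int main() { return 0; }\n' > "$smoke_dir/sample.c"
    # Run the conversion; print the client version only if it succeeds.
    srcml "$smoke_dir/sample.c" -o "$smoke_dir/sample.xml" \
        && echo "srcml OK: $(srcml --version | head -n 1)"
    smoke_status=$?
    rm -rf "$smoke_dir"
    return "$smoke_status"
}

srcml_smoke_test || true
```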

Fuzzy

Originally fuzzyc2cpg, Fuzzy is now part of codepropertygraph. astminer uses it to parse C/C++ code. g++ is required for this parser.

JavaParser

JavaParser is a parser for Java that is used to build trees for the code2seq and code2vec models, as well as in many other studies that collect and process trees. When working with JavaParser, astminer implements an algorithm similar to the one in the JavaExtractor module of the code2vec repository, so that the resulting trees are similar.

JavaLang parser

A Java parser written in pure Python. To work with it, you need to install our translator package, which converts javalang's internal AST into a JSON AST that astminer can understand. To install this package, run the following in the root of the project:

pip install src/main/python/parse/javalang

Other languages and parsers

Support for a new programming language can be implemented in a few simple steps.

If there is an ANTLR grammar for the language:

  1. Add the corresponding ANTLR4 grammar file to the antlr directory.
  2. Run the generateGrammarSource Gradle task to generate the parser.
  3. Implement a small wrapper around the generated parser. See JavaParser, AntlrJavaParsingResultFactory, and getParsingResultFactory for an example of building such a wrapper and integrating it in the pipeline.
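
Step 2 above can be run from a terminal roughly as follows. This is a sketch that assumes the project is built through a Gradle wrapper script (`./gradlew`) in the project root; `generate_grammar_sources` is a hypothetical helper name used only for this example.

```shell
# Hypothetical helper: run the parser generation task via the Gradle wrapper.
generate_grammar_sources() {
    if [ ! -x ./gradlew ]; then
        echo "gradlew not found; run this from the astminer project root" >&2
        return 1
    fi
    # generateGrammarSource is the Gradle task named in step 2.
    ./gradlew generateGrammarSource
}

generate_grammar_sources || echo "parser generation skipped"
```

The guard only checks the working directory; it does not validate the grammar file itself, so errors in the grammar still surface from the Gradle task.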

If the language has a parsing tool that is available as a Java library:

  1. Add the library as a dependency in build.gradle.kts.
  2. Implement a wrapper for the parsing tool. See FuzzyCppParser for an example of such a wrapper.