Python programming language logo. Scikit Learn logo.

Information Retrieval for Fault Localization Using Latent Semantic Indexing (LSI) and Other Methods

Table of Contents

Introduction

Several approaches to Fault Localization use Information Retrieval (IR) methods to find the source files most likely to contain a reported bug. A common technique is BugLocator, which ranks source files using both direct and indirect links between a bug report and the source files that may need fixing. Because it is well known, BugLocator is a relevant benchmark for comparison against Latent Semantic Indexing (LSI). We analyzed the performance of these methods by comparing their evaluation metrics. The BugLocator approach was split into two methods: a simplified version (method 1), which serves as the benchmark, and the full implementation (method 2). Both the full BugLocator (method 2) and LSI (method 3) are compared against method 1.

All methods were trained and tested on the bug reports and source files of open-source Java project packages. Python was used to pre-process the data and to create, train, and test the models.

Overall there are three methods that were implemented and evaluated:

  • Method 1: Simplified BugLocator
  • Method 2: Full BugLocator
  • Method 3: Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD)
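As a rough illustration of the idea behind method 3, the sketch below ranks source files against a bug report in a low-rank latent space obtained with SVD. It is not the notebook's implementation: the function name `rank_files_lsi`, the whitespace tokenizer, the raw term counts, and the default `k=2` are all assumptions made for this example, and only NumPy is used to keep it self-contained.

```python
import numpy as np


def rank_files_lsi(source_texts, bug_report, k=2):
    """Rank source files against a bug report with a minimal LSI sketch.

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split and the term weights are raw counts (assumptions for brevity).
    """
    docs = [text.lower().split() for text in source_texts]
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}

    # Term-document matrix: rows are terms, columns are source files.
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc:
            A[index[word], j] += 1.0

    # Truncated SVD: keep the k strongest latent "topics".
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]

    # Project the source files and the query into the latent space.
    doc_vecs = (U_k.T @ A).T
    q = np.zeros(len(vocab))
    for word in bug_report.lower().split():
        if word in index:  # words unseen in the source files are ignored
            q[index[word]] += 1.0
    q_vec = U_k.T @ q

    # Cosine similarity between the query and every source file.
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    scores = np.divide(doc_vecs @ q_vec, norms,
                       out=np.zeros(len(docs)), where=norms > 0)

    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]
```

Calling `rank_files_lsi(files, report)` returns `(file_index, score)` pairs sorted from most to least similar. Projecting both the files and the query onto the same `k` left singular vectors is what lets files match a report on shared latent topics instead of exact word overlap alone.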

The pre-processing code up to the Markdown heading "More Pre-processing (Team 7)" in the Jupyter notebook was provided by a course instructor.

Overall, method 2 performed best on both evaluation metrics, Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP). Visualizations of these results are shown in the Screenshots section of this README.
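For readers unfamiliar with the two metrics, here is a minimal sketch of how MRR and MAP are commonly computed from ranked results. The function names are hypothetical and this is not the notebook's code; it also assumes average precision is normalized by the total number of relevant files for each bug report, which is one common convention.

```python
def reciprocal_rank(ranked, relevant):
    """Return 1 / position of the first relevant item, or 0.0 if none."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked, relevant):
    """Average the precision measured at each relevant item's position."""
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    # Assumption: normalize by the number of relevant items.
    return total / len(relevant) if relevant else 0.0


def mean_metric(metric, rankings, relevants):
    """Average a per-query metric over all queries (bug reports)."""
    scores = [metric(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0
```

Here each entry of `rankings` is the ordered list of source files returned for one bug report, and the matching entry of `relevants` is the set of files actually changed to fix it. `mean_metric(reciprocal_rank, rankings, relevants)` then gives MRR and `mean_metric(average_precision, rankings, relevants)` gives MAP.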

Features

  • Pre-processes bug reports (queries) and source files (query results) to train machine learning algorithms.
  • Ranks source files (query results) related to a bug report (query) to find the location of bugs related to the bug report.
  • NumPy-style documentation for maintainability and clarity of the application.

Launch

Setup

To prepare a dataset for the application to process, follow the "Getting Started" instructions here.

Once the data has been processed as described in the "Getting Started" instructions, you must use Python 3 to run our notebook.

To run the application, first install JupyterLab, then open a new console and enter:

jupyter lab

This opens a JupyterLab tab in your default browser, from which you can run the application.

Screenshots

MRR Results (Mean Reciprocal Rank)

MRR (Mean Reciprocal Rank) vs. Package.

MRR Difference (method 2 - method 1) vs. Package.

MRR Difference (method 3 - method 1) vs. Package.

MAP Results (Mean Average Precision)

MAP (Mean Average Precision) vs. Package.

MAP (Mean Average Precision) Difference (method 2 - method 1) vs. Package.

MAP (Mean Average Precision) Difference (method 3 - method 1) vs. Package.

Technologies

Contributors

Nicolas Mora (NikelausM), Connor Britton (ConnorBritton), Philip Rea, Joseph Park