DRAT Statistics

DRAT (Distributed Release Audit Tool) Statistics

What

This is a simple utility, written in Python, that uses DRAT to scan multiple code repositories sequentially, collects statistics, and writes them to both Apache Solr (the "statistics" core) and a user-defined directory.

What Statistics

Crawl start time
Crawl end time
Index start time
Index end time
Mapper start time
Mapper end time
Reducer start time
Reducer end time
Notes (count from RatAggregator)
Binaries (count from RatAggregator)
Archives (count from RatAggregator)
Standards (count from RatAggregator)
Apache (count from RatAggregator)
Generated (count from RatAggregator)
Unknown (count from RatAggregator)
Mimetypes (count from "drat" core by doing a facet on "mimetype")

All license types are stored as "license_*" and mimetypes as "mime_*"
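
As a rough illustration of the mimetype facet mentioned above, the sketch below builds a standard Solr facet query against the "drat" core and turns the response into "mime_*" entries. The function names, the `/select` handler path, and the exact key format are assumptions made for illustration; they are not taken from dratstats.py.

```python
from urllib.parse import urlencode


def mimetype_facet_url(solr_url, core="drat"):
    """Build a Solr select URL that facets on the "mimetype" field.

    Assumption: the core is reachable at <solr_url>/<core>/select.
    """
    params = urlencode({
        "q": "*:*",          # match every document
        "rows": 0,           # only the facet counts are needed
        "wt": "json",
        "facet": "true",
        "facet.field": "mimetype",
    })
    return "{0}/{1}/select?{2}".format(solr_url.rstrip("/"), core, params)


def facet_to_stats(response, field="mimetype", prefix="mime_"):
    """Convert Solr's flat [value, count, value, count, ...] facet list
    into a dict keyed as "mime_<value>" (key format is an assumption)."""
    flat = response["facet_counts"]["facet_fields"][field]
    return {prefix + value: count for value, count in zip(flat[0::2], flat[1::2])}
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `facet_to_stats` would give one count per mimetype.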

Why

DRAT runs on a single code repository and generates its output. But what if a large number of repositories need to be scanned and their individual statistics recorded? This utility can be used for such large-scale tasks. Storing the statistics in a Solr core makes it possible to explore and visualize them through function and facet queries.

How To Use

Set the following environment variables:

DRAT_HOME - (eg: ~/drat/deploy)
JAVA_HOME - (where your Java resides; the same value you use for your DRAT installation)
OPSUI_URL - (eg: http://localhost:8080/opsui)
SOLR_URL - (eg: http://localhost:8080/solr)
WORKFLOW_URL - (eg: http://localhost:9001)
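
Before running the script, it can help to confirm that all five variables are set. The following is a minimal sketch of such a check; the helper name `missing_env_vars` is hypothetical and is not part of dratstats.py.

```python
import os

# The five variables listed above.
REQUIRED_VARS = ("DRAT_HOME", "JAVA_HOME", "OPSUI_URL", "SOLR_URL", "WORKFLOW_URL")


def missing_env_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```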

Run the script as below:

python dratstats.py <path to list of repository URLs> <path to output directory>

The two arguments are described below:

Path to a flat file containing a list of repositories to traverse. Each line in the file represents one source code repository and begins with its absolute path. For example, the entries below reference the Apache Tika and Apache Nutch codebases on a local file system.

/apacheSvn/tika ApacheTika http://github.com/apache/tika.git The digital babel fish.
/apacheSvn/nutch ApacheNutch http://github.com/apache/nutch.git The open source web crawler.

A sample repos.txt file is available.
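
To make the line layout concrete, here is a small sketch that splits one line of the repository list into its parts. It assumes, based only on the two example entries above, that each line holds a path, a name, a URL, and a free-text description separated by whitespace; the field names and the `parse_repo_line` helper are hypothetical.

```python
# Field names are illustrative; they are inferred from the example entries.
FIELDS = ("path", "name", "url", "description")


def parse_repo_line(line):
    """Split one line into path, name, URL and description.

    The first three fields are assumed to contain no whitespace; everything
    after them is treated as the description. Blank lines return None.
    """
    parts = line.strip().split(None, len(FIELDS) - 1)
    if not parts:
        return None
    # Pad short lines so every field is present.
    parts += [""] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))
```

For the Tika entry, the `path` field would be `/apacheSvn/tika` and the `description` field would be the trailing sentence.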

Path to the output directory to which the contents of ${DRAT_HOME}/data are copied for each repository. Each folder in the output directory follows a standard naming convention:
- The first character, i.e. ‘/’, is removed.
- Every remaining ‘/’ is replaced with ‘_’.
- The current timestamp is appended. For example, the output folder for the ‘/apacheSvn/tika’ repository could be named apacheSvn_tika_2016-01-15T23:14:39Z
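
The naming convention above can be sketched as a small function. This is an illustration rather than the code used by dratstats.py: the function name is hypothetical, and the timestamp is assumed to be UTC because of the trailing "Z" in the example.

```python
from datetime import datetime, timezone


def output_folder_name(repo_path, now=None):
    """Derive an output folder name from a repository path.

    Drops the leading '/', replaces every remaining '/' with '_',
    and appends a timestamp (assumed here to be UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    name = repo_path[1:] if repo_path.startswith("/") else repo_path
    return name.replace("/", "_") + "_" + stamp
```

Passing a fixed `now` value makes the result reproducible, which is convenient when checking the convention against the example name shown above.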
