Mailing List Community Analysis Tool (MLCAT)


Parsing mailing lists to detect communication patterns.

Welcome to the Mailing List Community Analysis Tool! The main objective of this project is to develop a mailing list parser that extracts information such as senders, receivers, and time stamps from a mailing list, and then uses it to construct an organizational (or communication) structure, such as conversation thread hypergraphs, for further analysis. We use social network analysis techniques to examine the relationships between authors on common mailing lists like LKML and Sakai. Studying these interactions at different levels helps us derive the local and global communication patterns between users on different threads.

This project draws its inspiration from fields such as data mining, graph theory, information retrieval, and inferential modelling in order to form predictive models that help in understanding certain intricate characteristics of a social network. This involves analyzing graphs with, say, authors as nodes and their correspondences as edges, to study network graph properties such as density, size, node centrality, degree, and connectedness.
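To make the idea concrete, here is a minimal sketch that uses only the Python standard library. It is not MLCAT's own API: the function names (`parse_message`, `build_graph`, `density`, `degree_centrality`) and the two sample messages are made up for illustration. It extracts the sender, receivers, and time stamp from raw messages, builds an undirected graph with authors as nodes and correspondences as edges, and computes two of the properties mentioned above.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime

# Two made-up messages standing in for a mailing list archive.
RAW_MESSAGES = [
    "From: Alice <alice@example.org>\n"
    "To: Bob <bob@example.org>, Carol <carol@example.org>\n"
    "Date: Mon, 01 Jan 2018 10:00:00 +0000\n"
    "Subject: [PATCH] fix scheduler\n"
    "\n"
    "Body text\n",
    "From: Bob <bob@example.org>\n"
    "To: Alice <alice@example.org>\n"
    "Date: Mon, 01 Jan 2018 11:30:00 +0000\n"
    "Subject: Re: [PATCH] fix scheduler\n"
    "\n"
    "Reply text\n",
]


def parse_message(raw):
    """Return (sender, receivers, timestamp) for one raw message (hypothetical helper)."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1]
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    receivers = [addr for _, addr in getaddresses(recipient_headers)]
    timestamp = parsedate_to_datetime(msg["Date"])
    return sender, receivers, timestamp


def build_graph(messages):
    """Build an undirected author graph as a dict: node -> set of neighbours."""
    graph = defaultdict(set)
    for raw in messages:
        sender, receivers, _ = parse_message(raw)
        graph[sender]  # touching the key registers senders that have no receivers
        for receiver in receivers:
            if receiver != sender:
                graph[sender].add(receiver)
                graph[receiver].add(sender)
    return dict(graph)


def density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in graph.values()) / 2
    return 2 * edges / (n * (n - 1))


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {
        node: (len(neighbours) / (n - 1) if n > 1 else 0.0)
        for node, neighbours in graph.items()
    }


if __name__ == "__main__":
    author_graph = build_graph(RAW_MESSAGES)
    print(density(author_graph))
    print(degree_centrality(author_graph))
```

With the sample messages, Alice is connected to both Bob and Carol, so she has the highest degree centrality, while Bob and Carol are not linked to each other. A real archive would call for directed, weighted edges and thread-aware grouping, which this sketch leaves out.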

Figure: Vertex Clustering

Usage

The library is organised into several modules, namely input, analysis and util. The driver file presents a brief overview of how to use the Mailing List Community Analysis Tool for parsing and analysis.

Some sample outputs that can be visualised can be seen here, categorized by dataset. They give an idea of this project's capabilities and of the kinds of insights we can, and plan to, derive from the datasets.

Documentation

Mailing List Community Analysis Tool uses the Sphinx Python Documentation Generator to generate the library's documentation. The documentation is set up to work with Google Style Docstrings, which eases the documentation writing process.

The documentation can be found under 'docs/'. It is updated automatically by a bash script; without the script, you would need to run make html in the 'docs/' directory after every commit.

You can view the documentation hosted on gh-pages here.

Testing Framework

Presently, Mailing List Community Analysis Tool has various end-to-end tests, which reside in the test directory. We use py.test as the test framework, and the tests for each module give a comprehensive idea of that module's use cases.
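As a rough illustration of the py.test style, the sketch below tests a small helper. The helper `extract_sender` and both test functions are hypothetical and are not taken from the project's test directory; they only show how plain `assert` statements in `test_`-prefixed functions are picked up by py.test.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_sender(raw):
    """Hypothetical helper: return the bare sender address of a raw message."""
    return parseaddr(message_from_string(raw).get("From", ""))[1]


def test_extract_sender_returns_bare_address():
    raw = "From: Alice <alice@example.org>\nSubject: hi\n\nbody\n"
    assert extract_sender(raw) == "alice@example.org"


def test_extract_sender_handles_missing_header():
    # A message without a From header yields an empty address string.
    raw = "Subject: no sender\n\nbody\n"
    assert extract_sender(raw) == ""
```

Saved in a file whose name starts with `test_`, these functions would be collected and run by invoking py.test on that file.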

Figure: Conversation Characteristics

Installation

Mailing List Community Analysis Tool depends on various third-party libraries, which are listed in requirements.txt. These libraries, along with other dependencies, are installed by install_package_dependencies.sh. Run cd etc && bash install_package_dependencies.sh in the project root directory to install them.

License

Mailing List Community Analysis Tool is available under the GPL 3.0 License.