This Text Analytics framework is built on top of other open source packages. It provides a flexible and extensible way to extract Entities, Topics, Categories, Sentiments, and Keywords from unstructured text, regardless of its source.

SubhasisDutta/Text-Analysis

Overview

DisKoveror is a Text Analytics framework developed by Serendio. Built on top of other open source packages, it provides a flexible and extensible way to extract Entities, Topics, Categories, Sentiments, and Keywords from unstructured text, regardless of its source. The key advantage of DisKoveror over the numerous open source options is that it provides access to best-of-breed components through a plug-and-play approach and a unified programming interface. In some cases, DisKoveror also improves output quality through training sets, domain-specific ontologies, and folksonomies.

Demo

App Link : http://demo.serendio.com:3031/

Blog Post

Link : http://subhasisproject.blogspot.com/2015/07/text-analysis-using-stanford-nlp-and.html

DisKoveror has been used to mine brand sentiment from social media, understand customer satisfaction from emails, extract topics from Tweets, compute social influence scores, auto-categorize legal documents, and much more.

DisKoveror can be accessed through a Command Line API, a Java API, or a RESTful interface.

License: Apache 2.0

Key Functionalities

System Architecture

The system architecture is shown below.

[Figure: System Architecture]

DisKoveror supports Java APIs and a RESTful interface.

##### DisKoveror leverages the open source modules shown below:

###### Named Entity extraction

###### Sentiment extraction

###### Topic extraction

###### Keyword extraction

### Getting Started

Software Requirements
  • JDK (Version 7 or above)
  • Maven (Apache Maven 3.0.5 or above)
  • Thrift server (Apache Thrift 0.9.2)
  • Python (version 2.7.X)
  • Pip (version 7.1.X)
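
Before going further, it can help to confirm that the tools listed above are on the `PATH`. The following is a minimal sketch, not part of DisKoveror: the helper name `missing_tools` is hypothetical, and the command names (`java`, `mvn`, `thrift`, `python`, `pip`) are assumptions about how the tools are installed on your machine. It only checks that each command exists; versions still need to be compared by hand against the list above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on PATH (one name per line).
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed command names for the listed requirements.
missing="$(missing_tools java mvn thrift python pip)"
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All required tools were found on PATH."
fi

# To inspect versions manually, run whichever of these apply:
#   java -version
#   mvn -v
#   thrift -version
#   python --version
#   pip --version
```
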

Workspace to Download

DisKoveror-ta

Starting Thrift servers for Sentiment and Topics in DisKoveror-ta

The requirements.txt file lists the Python packages, with their versions, that must be installed. Run the command below to install all Python dependencies for the Sentiment and Topics servers.

/text-analysis/sentiment-topic-keyword-server/python$ sudo pip install -r requirements.txt

Start the Thrift servers for Topics and Sentiments:

/text-analysis/sentiment-topic-keyword-server/python$ python server.py

Compiling DisKoveror TA Engine

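To package the engine as a single executable .jar file for distribution, run the following command from the command line. Note that `clean` must come first: Maven runs the listed phases and goals in order, so a trailing `clean` would delete the freshly built artifacts.

 /diskoveror-ta$ mvn clean package dependency:copy-dependencies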
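
After a successful build, the output can be checked with a short script. This is a minimal sketch under stated assumptions: `list_jars` is a hypothetical helper, and the paths `target` and `target/dependency` are Maven's default build directory and the default output directory of `dependency:copy-dependencies`. If the project's `pom.xml` overrides either location, adjust the paths accordingly.

```shell
#!/bin/sh
# Hypothetical helper: print every .jar file found directly
# inside the given directory, one path per line.
list_jars() {
    for f in "$1"/*.jar; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -f "$f" ] && echo "$f"
    done
    return 0
}

# Run from the diskoveror-ta directory after the build.
echo "Project jar(s):"
list_jars target
echo "Copied dependencies:"
list_jars target/dependency
```

If the first list is empty, the build did not leave a packaged jar in `target`.
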

The diskoveror-ta package can be used through any of the methods provided below.
