Automatic-Text-Summarizer

Author: Ayush Pareek

Access the publication here: https://www.aclweb.org/anthology/W/W16/W16-63.pdf#page=157

This is an implementation of the "Automatic Consensus-Based Text Summarizer", with text-organizing capabilities, that can generate genre-specific, generic, or user-configured summaries of a large amount of unorganized text. It currently runs several independent text-mining algorithms, each based on a different statistical model, and combines their summaries using configurable consensus techniques.

# How to Use

**STEP 1:** Put the text you want to summarize in "raw.txt" (sample text is already provided). Compile with g++ 5.1 and run the resulting executable.
**STEP 2:** You will be asked to assign a fractional ratio (weight) to each algorithm. Choose these values based on your requirements.
**STEP 3:** You will be asked for the number of lines you want in each summary. Enter as many as you want (this determines the compression ratio).
**STEP 4:** The output is displayed on the screen and also written to "summary.txt".
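
The weighting in STEP 2 and the line count in STEP 3 can be illustrated with a minimal sketch. It assumes that each algorithm produces one score per sentence and that the consensus is a plain weighted sum; the repository may combine results differently. The function name `consensus_top_k` and its parameters are hypothetical and do not come from the project's source.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not taken from the repository): combines the
// per-sentence scores of several algorithms with user-supplied weights and
// returns the indices of the k best sentences in original document order.
// ASSUMPTION: consensus is a weighted sum of scores.
std::vector<std::size_t> consensus_top_k(
    const std::vector<std::vector<double>>& scores,  // scores[a][s]
    const std::vector<double>& weights,              // weights[a]
    std::size_t k) {
    if (scores.empty() || scores.size() != weights.size()) return {};
    const std::size_t n = scores[0].size();
    std::vector<double> combined(n, 0.0);
    for (std::size_t a = 0; a < scores.size(); ++a) {
        if (scores[a].size() != n) return {};  // malformed input
        for (std::size_t s = 0; s < n; ++s) {
            combined[s] += weights[a] * scores[a][s];
        }
    }
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    // Stable sort keeps earlier sentences first when combined scores tie.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t x, std::size_t y) {
                         return combined[x] > combined[y];
                     });
    order.resize(std::min(k, n));
    std::sort(order.begin(), order.end());  // restore reading order
    return order;
}
```

Returning the selected indices in ascending order keeps the summary sentences in the order they appear in the source text, which is usually what a reader expects from an extractive summary.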

Requirements

g++ 5.1 compiler (slightly older versions may also work).

Frequently Asked Questions

Q1. How can I track the progress of the program to check its correctness?

CHECK 1: Each sentence is written to its own text file, named after the sentence's position in the original text. E.g., 5.txt contains only the 5th sentence of the original text.
CHECK 2: Pseudo full stops (e.g., within acronyms like M.B.B.S. and after titles like Mr., Mrs., Dr., Ms.) do not break sentences. To verify this, copy the contents of 'raw2.txt' into 'raw.txt'.
CHECK 3: Several text files are created. They are described below:
- post_formatting.txt: Contains each document after removing symbols and numbers and converting each ' ' (space) to a '\n' (newline) character. This is done by the Format() function in the code.
- post_stemming.txt: Contains each document after passing it through the Porter stemmer (see the PreStem() and stemfile() functions).
- post_sorting.txt: Contains the words of each document after lexicographical sorting, with only unique words kept.
- post_stopword_filtering.txt: Contains the sorted words with the stop words removed.

CHECK 4: post_stemming and post_formatting are two folders that must not be deleted. They hold each sentence file after the operation indicated by the folder name.
CHECK 5: "matrix.txt" shows a word vs. sentence matrix.
CHECK 6: "coefficient_matrix.txt" contains an N*N matrix in which each element c(i,j) is the Pearson correlation coefficient of sentence i and sentence j. For diagonal elements (i == j), c(i,j) == 1.
CHECK 7: Finally, the summary is written to summary.txt.
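
The coefficient matrix from CHECK 6 can be sketched as follows. This is a minimal illustration that assumes each sentence is represented by one column of word counts from the word vs. sentence matrix. The names `pearson` and `correlation_matrix` are hypothetical and are not the functions used in the repository.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Pearson correlation of two equal-length vectors.
// Returns 0.0 when the inputs are empty, differ in length, or have zero
// variance (the coefficient is undefined in those cases).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n == 0 || y.size() != n) return 0.0;
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return sxy / std::sqrt(sxx * syy);
}

// Hypothetical helper: sentences[s] holds the word counts of sentence s
// (one column of the word vs. sentence matrix). Returns the N x N matrix c.
std::vector<std::vector<double>> correlation_matrix(
    const std::vector<std::vector<double>>& sentences) {
    const std::size_t n = sentences.size();
    std::vector<std::vector<double>> c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            // The coefficient is symmetric, so fill both halves at once.
            c[i][j] = c[j][i] = pearson(sentences[i], sentences[j]);
        }
    }
    return c;
}
```

With this sketch, a diagonal element equals 1 (up to floating-point rounding) whenever the sentence's word counts are not all identical. The zero-variance fallback to 0.0 is a choice made for this example only.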

# Paper accepted at the Thirteenth International Conference on Natural Language Processing
