Implementation of a language model for parallel n-gram extraction from large text corpora

romanyshyn-natalia/Word_prediction_system_based_on_n-Gram

Project name: Word prediction system based on an N-gram Language Model

Semester project for the Architecture of Computer System course

Project description

The aim of the project is to implement an autosuggestion system based on an n-gram language model. First, the user uploads a text corpus to train the probabilistic model. Once the n-grams are extracted and their probabilities are calculated, the user starts typing and the interface suggests possible endings of the phrase.
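
The train-then-suggest flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names `tokenize`, `train`, `suggest` and `Model` are hypothetical, tokenization is plain whitespace splitting, and probabilities are unsmoothed relative frequencies, P(word | context) = count(context, word) / count(context).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split text on whitespace (no case or punctuation handling).
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string word; in >> word;) tokens.push_back(word);
    return tokens;
}

// Maps a context of n-1 words to the counts of the words seen right after it.
using Model = std::map<std::vector<std::string>, std::map<std::string, std::size_t>>;

// Slide a window of n words over the corpus and count each (context, next word) pair.
Model train(const std::vector<std::string>& tokens, std::size_t n) {
    assert(n >= 2);  // a context needs at least one word
    Model model;
    for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
        auto first = tokens.begin() + static_cast<std::ptrdiff_t>(i);
        auto last = first + static_cast<std::ptrdiff_t>(n - 1);
        std::vector<std::string> context(first, last);
        ++model[context][*last];
    }
    return model;
}

// Return up to k continuations of the context, most probable first.
std::vector<std::pair<std::string, double>> suggest(
        const Model& model, const std::vector<std::string>& context, std::size_t k) {
    std::vector<std::pair<std::string, double>> result;
    auto it = model.find(context);
    if (it == model.end()) return result;  // unseen context: no suggestions

    std::size_t total = 0;
    for (const auto& entry : it->second) total += entry.second;
    for (const auto& entry : it->second)
        result.emplace_back(entry.first, static_cast<double>(entry.second) / total);

    // stable_sort keeps alphabetical order among equally probable words.
    std::stable_sort(result.begin(), result.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    if (result.size() > k) result.resize(k);
    return result;
}
```

For example, after `train(tokenize("the cat sat the cat ran the dog sat"), 2)`, calling `suggest(model, {"the"}, 2)` ranks "cat" (seen twice after "the") ahead of "dog" (seen once).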

The application also shows statistics on the uploaded corpus, such as the most frequent n-grams.

We implemented three algorithms: a simple one that works with the strings (words) themselves, one that works with hashes of the strings, and one that uses a database. For the final release, we chose the best of the three.
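
To illustrate the difference between the string-based and hash-based variants, here is a rough sketch of counting n-grams by a 64-bit hash instead of by the words themselves. The README does not describe the project's actual hashing scheme, so everything here is an assumption: `hash_ngram` and `count_hashed` are hypothetical names, and the mixing step is a common hash-combine formula applied to `std::hash` values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Combine the hashes of n consecutive words, starting at index `first`,
// into one order-sensitive 64-bit key.
std::uint64_t hash_ngram(const std::vector<std::string>& tokens,
                         std::size_t first, std::size_t n) {
    std::uint64_t seed = 0;
    for (std::size_t i = first; i < first + n; ++i) {
        std::uint64_t h = std::hash<std::string>{}(tokens[i]);
        // Hash-combine style mixing; the constant is the 64-bit golden ratio.
        seed ^= h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
    }
    return seed;
}

// Count n-grams keyed by hash, so each table key takes 8 bytes
// regardless of how long the words are.
std::unordered_map<std::uint64_t, std::size_t> count_hashed(
        const std::vector<std::string>& tokens, std::size_t n) {
    assert(n >= 1);
    std::unordered_map<std::uint64_t, std::size_t> counts;
    for (std::size_t i = 0; i + n <= tokens.size(); ++i)
        ++counts[hash_ngram(tokens, i, n)];
    return counts;
}
```

The trade-off in a scheme like this is that fixed-size keys are cheaper to store and compare, but two different n-grams can collide on the same hash, and the original words can only be shown again if a separate hash-to-word mapping is kept.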

Requirements

To build and run the program, you need:

  • Qt 5
  • CMake 3.17
  • A compiler that supports C++17
  • Boost
  • OpenMP

Usage

  1. Clone the repository and cd into its folder.
     git clone https://github.com/romanyshyn-natalia/Word_prediction_system_based_on_n-Gram.git
     cd Word_prediction_system_based_on_n-Gram
  2. Build the project.
     mkdir build && cd build
     cmake ..
     cmake --build .
  3. Run the application from the build directory.
     ./n_gram
  4. Have fun! >^-^<
