Here is the link to the FreeTxt tool, which is currently under development.
Below is a summary of the features currently available:
-
Data input feature: At the moment, the tool supports two modes of input:
a. Use example data: A collection of example data files in different formats (.xlsx, .txt and .tsv), intended mainly for testing and demonstration.
b. Upload data file: Users can upload their own data in any of the formats above.
In both cases, each file can be up to 200 MB in size. Multiple files can be uploaded and processed simultaneously. For better memory and processing efficiency, users can select the sections of the data (i.e. columns) they wish to visualise or work with. Outputs from multiple files can be viewed in dynamically managed tabs.
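As an illustration of the column selection described above, here is a minimal sketch that keeps only the requested columns of a .tsv file. It is not FreeTxt's own loading code: the helper name `select_columns` and the sample data are hypothetical, and only the Python standard library is used.

```python
import csv
import io


def select_columns(tsv_text, columns):
    """Return the rows of a TSV document restricted to the requested columns.

    Hypothetical helper for illustration; not part of FreeTxt.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Fail early if the user asks for a column the file does not contain.
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # Keep only the selected columns, in the order the user asked for them.
    return [{c: row[c] for c in columns} for row in reader]


# Made-up sample data with three columns; only two are kept.
sample = "id\treview\trating\n1\tLovely castle\t5\n2\tToo crowded\t2\n"
rows = select_columns(sample, ["review", "rating"])
for row in rows:
    print(row)
```

Dropping unused columns before any further processing is what keeps memory use down when several large files are open at once.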
-
Data Visualizer: This is one of the key features of the tool and has three core components:
a. Data View: This displays the selected columns from the data file. The user can dynamically change the selection or the order of the columns before performing any other task on them.
b. Word Cloud: This creates a word cloud from the content of the selected columns. The user can choose the column(s) to build the word cloud from as well as the word cloud type: 'All words', 'Bigrams', 'Trigrams', '4-grams', 'Nouns', 'Proper nouns', 'Verbs', 'Adjectives', 'Adverbs' or 'Numbers'.
c. Keyword in Context and Collocation: This extracts the keywords in the review text from the selected columns, together with the contexts in which they appear, and lets the user adjust the size of the context window. It also shows the words that collocate with the selected keywords.
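The keyword-in-context and collocation ideas can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's implementation: the function names `tokenize`, `kwic` and `collocates` are hypothetical, tokenization is a plain regular expression, and collocates are counted by raw frequency inside the context window.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


def collocates(tokens, keyword, window=2):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Made-up review text for demonstration.
text = "The castle was lovely. The castle gardens were lovely too."
tokens = tokenize(text)
lines = kwic(tokens, "castle", window=2)
neighbours = collocates(tokens, "castle", window=2)
for left, key, right in lines:
    print(f"{left:>12} [{key}] {right}")
print(neighbours.most_common(3))
```

Changing `window` corresponds to adjusting the context window in the interface: a larger value shows more surrounding text and counts more distant words as collocates.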
-
Text Summarizer: This tool, adapted from the Welsh Summarization project, produces a basic extractive summary of the review text from the selected columns.
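To show what "extractive" means here, the sketch below scores each sentence by the average frequency of its words and returns the top-scoring sentences in their original order. This is a generic frequency-based technique chosen for illustration; it is not claimed to be the method used by the Welsh Summarization project, and the name `summarize` is hypothetical.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Pick the highest-scoring sentences and return them in document order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


# Made-up review text for demonstration.
review = (
    "The castle is old. "
    "The castle gardens are large and the castle is busy. "
    "Parking was easy."
)
summary = summarize(review, max_sentences=1)
print(summary)
```

An extractive summary only reuses sentences that already exist in the text, which is why the output is always a subset of the input.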
-
Sentiment Analyzer: This feature performs sentiment classification on reviews from the selected column(s) and displays the results as a pie chart.
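The pie chart is essentially a view of how the predicted labels are distributed. The sketch below assumes the classifier has already produced one label per review and shows only the aggregation step. The label names and the helper `sentiment_shares` are hypothetical, and the classifier itself is not shown.

```python
from collections import Counter


def sentiment_shares(labels):
    """Convert a list of predicted labels into percentage shares."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up predictions, one per review.
labels = ["positive", "positive", "negative", "neutral"]
shares = sentiment_shares(labels)
print(shares)
```

These percentages are the values a pie chart would plot, one slice per label.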
-
POS and Semantic Tagger: This feature uses the PyMUSAS pipeline on spaCy to generate and display POS (CyTag) tags as well as semantic (USAS) tags. It currently works only on the Ucrel-freetxt-VM, because setting up Docker on the Streamlit cloud is somewhat complex.
-
Language Identification: We have implemented a basic language identification feature that detects whether the text is written in English or Welsh.
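One simple way to tell English from Welsh is to count common function words of each language. The sketch below illustrates that idea only; it is an assumption, not a description of how FreeTxt identifies languages. The word lists are small hand-picked samples and the name `identify_language` is hypothetical.

```python
import re

# Small, hand-picked function-word lists (illustrative, far from complete).
ENGLISH_WORDS = {"the", "and", "is", "of", "to", "in", "it", "was", "that", "with", "for", "this"}
WELSH_WORDS = {"yn", "y", "yr", "mae", "ac", "o", "ar", "yw", "oedd", "roedd", "wedi", "gyda", "ond", "ni"}


def identify_language(text):
    """Return 'English', 'Welsh' or 'unknown' based on function-word counts."""
    tokens = re.findall(r"\w+", text.lower())
    english_hits = sum(tok in ENGLISH_WORDS for tok in tokens)
    welsh_hits = sum(tok in WELSH_WORDS for tok in tokens)
    if english_hits > welsh_hits:
        return "English"
    if welsh_hits > english_hits:
        return "Welsh"
    # No evidence either way, or a tie.
    return "unknown"


print(identify_language("The castle was lovely and the staff were friendly"))
print(identify_language("Mae'r castell yn hyfryd ac roedd y staff yn gyfeillgar"))
```

A word-list approach like this is easy to inspect but can be unreliable on very short texts, so a tie is reported as 'unknown' rather than guessed.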
- This work with all the accompanying resources is licensed under a Creative Commons Attribution 4.0 International License.