Description

This utility generates a list of topics related to a given term by searching a given list of Greek news sites. The list of RSS feeds can be found here: s3://rawlabs-public-test-data/competition/pp/topic-modelling/rss-feeds.csv Feel free to add new RSS feeds, but doing so may cause instability.

The search process is as follows:

Read all RSS feeds and keep only the items whose descriptions contain the given search term
Fetch the actual web pages of the matching items
Keep Greek-only content
Filter out Greek stopwords
Find the top-N most frequently used terms
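The filtering and counting steps above can be sketched in Python. This is a minimal illustration, not the service's actual implementation: the function names, the tiny stopword set, and the use of Unicode ranges to detect Greek words are all assumptions, and fetching the feeds and web pages is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; the list used by the
# service itself is not shown in this README.
GREEK_STOPWORDS = {"και", "το", "η", "ο", "να", "με", "για", "την", "της", "του"}

# Runs of characters from the Greek and Greek Extended Unicode blocks
# (an approximation of "Greek-only content").
GREEK_WORD = re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]+")


def matching_items(items, term):
    """Keep feed items whose description contains the term, ignoring case."""
    needle = term.casefold()
    return [item for item in items if needle in item["description"].casefold()]


def top_terms(texts, top_n):
    """Return the top_n most frequent Greek non-stopword terms in the texts."""
    counts = Counter()
    for text in texts:
        words = (word.lower() for word in GREEK_WORD.findall(text))
        counts.update(word for word in words if word not in GREEK_STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

Here `matching_items` corresponds to the first step and `top_terms` to the last three; the page text passed to `top_terms` would come from the fetched web pages.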

Prerequisites

Access to RAW UI
Knowledge of Greek
Bash with curl and jq installed
Patience

Usage

The generic syntax is the following:

curl -v 'https://<hostname>/arsenal/v1/topic-extraction?term=<search_term>&topN=<top_n>'

For example:

curl -v 'https://<hostname>/arsenal/v1/topic-extraction?term=ΚΙΝΑΛ&topN=25' | jq .

At the time of writing, the results are the following:

[
  "ελλάδα",
  "χπα",
  "νέα",
  "μετοχών",
  "δεικτών",
  "κιναλ",
  "δόση",
  "χαρτοφυλάκια",
  "ελλαδα",
  "αγοράς",
  "πράξεις",
  "γραφήματα",
  "χα",
  "ισοτιμίες",
  "αεκ",
  "όπως",
  "αναζήτηση",
  "εκλογές",
  "κύπελλο",
  "εγγραφή"
]
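When the request is issued from a script rather than from curl, the Greek search term may need to be percent-encoded, since some clients and proxies do not accept raw non-ASCII characters in a query string. The sketch below builds the request URL with the Python standard library; the helper name and the `https://example.invalid` base URL are placeholders, not part of the service.

```python
from urllib.parse import urlencode


def topic_extraction_url(base_url, term, top_n):
    """Build the topic-extraction URL with a percent-encoded query string."""
    query = urlencode({"term": term, "topN": top_n})
    return f"{base_url}/arsenal/v1/topic-extraction?{query}"


# Placeholder host; substitute the real <hostname>.
print(topic_extraction_url("https://example.invalid", "ΚΙΝΑΛ", 25))
# prints https://example.invalid/arsenal/v1/topic-extraction?term=%CE%9A%CE%99%CE%9D%CE%91%CE%9B&topN=25
```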

Another example is:

http://localhost:54325/arsenal/v1/topic-extraction?term=Ολυμπιακός&topN=20

Results:

[
  "χπα",
  "ελλάδα",
  "δεικτών",
  "μετοχών",
  "νέα",
  "αεκ",
  "κιναλ",
  "δόση",
  "χαρτοφυλάκια",
  "λάρισα",
  "αγοράς",
  "πράξεις",
  "γραφήματα",
  "χα",
  "ισοτιμίες",
  "αναζήτηση",
  "εκλογές",
  "εγγραφή",
  "ελλάδας",
  "ολυμπιακός"
]

TODO

Many improvements are possible, such as better HTML parsing and processing, stopword removal, and the topic generation algorithm. From the user's perspective, the most problematic part is backend timeouts, which the client cannot control. So feel free to make an initial call, wait about half a minute, and then repeat it. You should then be able to see results.
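The call, wait, and repeat workaround can be automated on the client side. The following is a rough sketch under stated assumptions: the function names, the three-attempt limit, and the 60-second request timeout are illustrative choices, and the 30-second wait simply mirrors the half minute suggested above.

```python
import json
import time
import urllib.request


def fetch_topics(url, timeout=60):
    """Call the endpoint once and decode the JSON list of topics."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_with_retry(url, attempts=3, wait_seconds=30,
                     fetch=fetch_topics, sleep=time.sleep):
    """Retry the call after a pause when it fails with a network error."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            # URLError, HTTPError and socket timeouts all derive from OSError.
            if attempt == attempts:
                raise
            sleep(wait_seconds)
```

The `fetch` and `sleep` parameters exist only so the retry logic can be exercised without a network connection or real delays.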
