🤓 ArxivHero

This is a hyper-specialized arXiv summarizer.

The source of inspiration was this video.

A script that uses this library runs daily at 8:00 am (if my server is up and my GPU is not otherwise busy) and saves the updates here.

🤔 FAQ

How is this different from searching arxiv and reading the abstracts?

The point of this summarizer is to provide further customization options, allowing for a more personalized digest.

Eventually, the goal is to add some basic paper metrics and better topic modeling based on citations and/or the reader's background knowledge. For example, I don't need an explanation of transformers, whereas someone else might. Or an electrical engineer may be interested in a different kind of transformer altogether (unless they are an electrical engineer working on LLMs...).

How is this different from other summarizers?

There is a similar and more mature paper summarizer called arxivDigest.

There are two main differences from the existing repo:

arxiv_hero re-ranks the retrieved results using both embeddings and topic modeling (NMF over bag-of-words/tf-idf features), and filters them according to the query;
arxiv_hero performs ontologically configurable summarization that takes the user's intent into account.
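To make the first difference more concrete, here is a minimal sketch of topic-based query filtering, assuming tf-idf features and an NMF fitted with Lee-Seung multiplicative updates in plain NumPy. It is not arxiv_hero's implementation: `filter_by_query_topics`, its hypothetical `thresh` parameter (a fraction of the best query-topic similarity) and the "keep papers whose dominant topic matches the query" rule are illustrative choices; the repo's own `TFIDFNMFTopicModeler` and `q_topic_thresh_val` may work differently.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Lowercase alphanumeric tokens; a real pipeline would also drop stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_matrix(docs):
    # Build an (n_docs x n_terms) tf-idf matrix with a smoothed idf.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    X = np.zeros((n, len(vocab)))
    for row, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            idf = math.log((1 + n) / (1 + df[term])) + 1.0
            X[row, index[term]] = count * idf
    # L2-normalize rows so long abstracts do not dominate.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms), index


def nmf(X, k, iters=500, seed=0):
    # Lee-Seung multiplicative updates for X ~= W @ H with W, H >= 0.
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 1e-3
    H = rng.random((k, X.shape[1])) + 1e-3
    eps = 1e-10
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def filter_by_query_topics(docs, query, k=2, thresh=0.5):
    # Hypothetical helper: fit topics, find the ones close to the query,
    # and keep the documents whose dominant topic is among them.
    X, index = tfidf_matrix(docs)
    W, H = nmf(X, k)
    # Bag-of-words query vector over the corpus vocabulary (unknown words are ignored).
    q = np.zeros(X.shape[1])
    for term in tokenize(query):
        if term in index:
            q[index[term]] += 1.0
    # Cosine similarity between the query and each topic's term profile.
    q_norm = np.linalg.norm(q) or 1.0
    topic_sim = (H @ q) / (np.linalg.norm(H, axis=1) * q_norm + 1e-10)
    relevant_topics = set(np.flatnonzero(topic_sim >= thresh * topic_sim.max()))
    dominant = W.argmax(axis=1)
    return [i for i, t in enumerate(dominant) if t in relevant_topics]


papers = [
    "efficient llm inference with quantization",
    "llm inference with quantization and batching",
    "power transformer winding insulation failure",
    "transformer winding insulation in power grids",
]
print(filter_by_query_topics(papers, "efficient llm inference"))
```

In practice scikit-learn's `TfidfVectorizer` and `NMF` would replace the hand-written pieces; they are spelled out here only to keep the sketch dependency-light.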

See, for instance, the Enum classes SummaryFocus and FlatDomainOntology. These enums contain simple strings that are inserted into the prompts to encode the target user's intent, in both the top-level summary and the per-abstract summaries.
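A minimal sketch of the idea of splicing enum values into a prompt. The member names and strings below are hypothetical placeholders, not the actual contents of arxiv_hero's SummaryFocus and FlatDomainOntology, and `build_abstract_prompt` is an illustrative helper rather than a function from the repo.

```python
from enum import Enum


class SummaryFocus(Enum):
    # Hypothetical members: the real enum's values may differ.
    METHODS = "the methods and algorithms proposed"
    RESULTS = "the main empirical results"
    APPLICATIONS = "possible practical applications"


class FlatDomainOntology(Enum):
    # Hypothetical members describing the reader's background.
    ML_RESEARCHER = "a machine learning researcher"
    ELECTRICAL_ENGINEER = "an electrical engineer"


def build_abstract_prompt(abstract: str, focus: SummaryFocus,
                          reader: FlatDomainOntology) -> str:
    # The enum values are plain strings that encode the reader's intent in the prompt.
    return (
        f"You are summarizing an arXiv abstract for {reader.value}. "
        f"Focus on {focus.value}.\n\n"
        f"Abstract:\n{abstract}"
    )


prompt = build_abstract_prompt(
    "We propose a faster attention kernel.",
    SummaryFocus.METHODS,
    FlatDomainOntology.ML_RESEARCHER,
)
print(prompt)
```

Keeping the intent in enums rather than free text makes the set of supported reader profiles explicit and easy to extend.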

Usage

Currently the repo contains a single notebook with everything needed (arXiv query manager, topic modeler, topic filtering, re-ranker, embedder, generative language model engine, etc.).
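The re-ranking step in that pipeline can be sketched as cosine similarity between a query embedding and each abstract's embedding, keeping the top n. This sketch rests on assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and `rerank`/`top_n` are hypothetical names (loosely mirroring `top_n_relevant`), not the repo's API.

```python
import re
import zlib

import numpy as np


def toy_embed(text: str, dim: int = 4096) -> np.ndarray:
    # Stand-in for a sentence-embedding model: hashed bag of words, L2-normalized.
    # zlib.crc32 is used because Python's built-in hash() is salted per process.
    v = np.zeros(dim)
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def rerank(query, abstracts, top_n=10, embed=toy_embed):
    # With unit-norm vectors the dot product equals the cosine similarity.
    q = embed(query)
    scores = np.array([embed(a) @ q for a in abstracts])
    order = np.argsort(-scores, kind="stable")[:top_n]
    return [(int(i), float(scores[i])) for i in order]


abstracts = [
    "A study of power transformer insulation",
    "Speculative decoding for efficient LLM inference",
    "Quantization makes LLM inference efficient",
]
for idx, score in rerank("efficient llm inference", abstracts, top_n=2):
    print(idx, round(score, 3))
```

Swapping `toy_embed` for a neural sentence embedder leaves `rerank` unchanged, as long as the embedder returns unit-norm vectors.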

The lines that actually produce the output are the following (see the "results" section in the notebook):

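```python
# The classes below are defined in the notebook; run those cells first.
interests_query = "llm chatgpt efficient inference"  # free-text description of the reader's interests
# Retrieve papers, then apply topic modeling/filtering and re-ranking
sr = ArxivCustomRetrieval(topic_modeler=TFIDFNMFTopicModeler(), q_topic_thresh_val=0.5, top_n_relevant=10)
sr.run(interests_query)
# Generate the summarized digest from the retrieval results
d = DocGenerationEngine(sr)
doc = d.make_document()
```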

Here is an example output:

Limitations/planned extensions

This project is still a work in progress. The plan is to operationalize it in some form.

It will probably be extended to support more sources and to perform more OSINT on the retrieved papers.

mylonasc/arxiv_llm_assistant

Folders and files

Latest commit

History

Repository files navigation

🤓 ArxivHero

How is this different from searching arxiv and reading the abstracts?

How is this different from other summarizers?

Usage

Limmitations/planned extensions

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages