Pass in multiple lists of extracted keywords to LLM in Representative step #2126

diazdata · 2024-08-20T21:13:02Z

I'd like to experiment with sending multiple lists of keywords, created by c-TF-IDF, CountVectorizer, or even MaximalMarginalRelevance, into the LLM call with a prompt such as:

prompt = """
I have a topic that contains the following documents: \n[DOCUMENTS]
The topic is described by the following keywords, generated by different methods:
c-TF-IDF -> [KEYWORDS1]
CountVectorizer -> [KEYWORDS2]
MaximalMarginalRelevance -> [KEYWORDS3]

Based on the above information, can you give a short label of the topic?
"""

I don't see anything specifically within the docs referencing something like this, but is it possible?

Answered by MaartenGr

MaartenGr (Maintainer) · 2024-08-21T06:53:23Z

Unfortunately, this is not possible at the moment; only the main representations are given. Do note, though, that I haven't actually seen use cases where this is needed, since LLMs tend to derive the topic labels more from the representative documents than from the keywords. Moreover, although the keywords are likely to differ between c-TF-IDF and, for instance, MMR, they will still be quite similar and overlap.
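To get a rough feel for how much two keyword lists overlap, one option is a simple Jaccard score. The sketch below is only an illustration: `jaccard_overlap` is a hypothetical helper, not part of BERTopic, and the keyword lists are made up rather than taken from a fitted model.

```python
def jaccard_overlap(keywords_a, keywords_b):
    """Return the share of distinct keywords that both lists have in common (0.0 to 1.0)."""
    set_a = {keyword.lower() for keyword in keywords_a}
    set_b = {keyword.lower() for keyword in keywords_b}
    if not set_a and not set_b:
        return 0.0  # two empty lists: treat as no overlap
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up keyword lists, standing in for two representations of the same topic
ctfidf_keywords = ["battery", "charging", "phone", "charger", "power"]
mmr_keywords = ["battery", "charger", "usb", "power", "cable"]

overlap = jaccard_overlap(ctfidf_keywords, mmr_keywords)
print(f"Jaccard overlap: {overlap:.2f}")
```

A score close to 1.0 would suggest the second list adds little new information to the prompt; a low score would suggest the representations really do differ.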

Having said that, it would indeed be interesting to test whether there would actually be an effect of having multiple representations. I should note that I am a bit hesitant to implement this until there is clear proof it actually has a positive (and perhaps significant) effect on the resulting labels.
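For anyone who wants to run that test outside of BERTopic, a minimal sketch of filling the proposed prompt by hand could look like the following. Everything here is an assumption for illustration: `build_multi_keyword_prompt` is a hypothetical helper, the `[KEYWORDS1]`, `[KEYWORDS2]`, `[KEYWORDS3]` placeholders come from the question rather than from BERTopic, and the documents and keywords are made up. Sending the prompt to an LLM is left out.

```python
TEMPLATE = """
I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords, generated by different methods:
c-TF-IDF -> [KEYWORDS1]
CountVectorizer -> [KEYWORDS2]
MaximalMarginalRelevance -> [KEYWORDS3]

Based on the above information, can you give a short label of the topic?
"""


def build_multi_keyword_prompt(template, documents, keyword_lists):
    """Fill [DOCUMENTS] and the numbered [KEYWORDSn] placeholders in a prompt template."""
    # One bullet per representative document
    document_block = "\n".join(f"- {document}" for document in documents)
    prompt = template.replace("[DOCUMENTS]", document_block)
    # keyword_lists[0] fills [KEYWORDS1], keyword_lists[1] fills [KEYWORDS2], and so on
    for index, keywords in enumerate(keyword_lists, start=1):
        prompt = prompt.replace(f"[KEYWORDS{index}]", ", ".join(keywords))
    return prompt


# Made-up inputs; in practice these would come from a fitted topic model
documents = ["My phone battery drains overnight.", "Which charger is safe to use?"]
keyword_lists = [
    ["battery", "charging", "phone"],   # stand-in for c-TF-IDF keywords
    ["battery", "phone", "charger"],    # stand-in for CountVectorizer keywords
    ["battery", "usb", "cable"],        # stand-in for MMR keywords
]

prompt = build_multi_keyword_prompt(TEMPLATE, documents, keyword_lists)
print(prompt)
```

Comparing the labels an LLM returns for this prompt against those from a single-keyword-list prompt, over the same topics, would be one way to check whether the extra representations have any effect.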
