🪞 Generative Reflections

A two-model system for reasonable text generation (via vector scoring)

Overview

"Think before you speak"

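This repo demonstrates how to use two language models (LMs) to produce more lucid and coherent generated text: a Causal-LM generates candidate texts, and a Masked-LM embeds them as vectors so that each candidate can be scored.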

...these best-scoring texts are given output priority:

...these worst-scoring ones get filtered out:

1. Select a specific domain to build generators for (e.g. Machine Learning Ideas).
2. Acquire a text corpus for the domain at hand (e.g. aalksii/ml-arxiv-papers).
3. Fine-tune the Causal-LM on the corpus (e.g. finetune_causal.ipynb).
4. Fine-tune the Masked-LM on the corpus (e.g. finetune_masked.ipynb).
5. Compute Masked-LM embedding vectors for the corpus, check their quality, and save them (e.g. vectors.ipynb).
6. Define the generation objective of the Causal-LM with respect to the Masked-LM embeddings; for example:
   - Novelty: a generated ML idea should be at least 0.05 cosine distance away from every existing idea vector.
   - Feasibility: a generated ML idea should not be too isolated; it should have at least 10 neighbors within 0.1 cosine distance.
7. Generate texts from the Causal-LM and output only those that pass the objective (e.g. generation.ipynb).
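
The example objective above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the notebooks: the names `cosine_distance`, `passes_objective`, and `filter_generations` are hypothetical, and `embed` stands in for whatever function maps a text to its Masked-LM vector. The thresholds are the ones from the Novelty and Feasibility examples.

```python
import math
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def passes_objective(
    candidate: Vector,
    corpus_vectors: Sequence[Vector],  # assumed non-empty
    novelty_min: float = 0.05,         # Novelty threshold from the example
    neighbor_radius: float = 0.1,      # Feasibility radius from the example
    min_neighbors: int = 10,           # Feasibility neighbor count
) -> bool:
    """Check the example Novelty and Feasibility criteria for one vector."""
    distances = [cosine_distance(candidate, v) for v in corpus_vectors]
    # Novelty: no existing vector may be closer than novelty_min.
    is_novel = min(distances) >= novelty_min
    # Feasibility: enough existing vectors must lie within neighbor_radius.
    is_feasible = sum(d <= neighbor_radius for d in distances) >= min_neighbors
    return is_novel and is_feasible


def filter_generations(
    texts: Iterable[str],
    embed: Callable[[str], Vector],  # hypothetical text -> Masked-LM vector
    corpus_vectors: Sequence[Vector],
) -> List[str]:
    """Keep only the generated texts whose embedding passes the objective."""
    return [t for t in texts if passes_objective(embed(t), corpus_vectors)]


# Toy 2-D corpus: ten unit vectors at angles between 20 and 25 degrees.
toy_corpus = [
    (math.cos(math.radians(20 + 5 * i / 9)), math.sin(math.radians(20 + 5 * i / 9)))
    for i in range(10)
]
print(passes_objective((1.0, 0.0), toy_corpus))     # near the cluster, not a duplicate
print(passes_objective(toy_corpus[0], toy_corpus))  # exact duplicate of a corpus vector
```

For a corpus of realistic size, a vectorized or nearest-neighbor-index implementation would likely be preferable to this per-vector loop; the loop is kept here only to make the two criteria easy to read.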

There is potential for Reinforcement Learning-inspired improvements here.