Skip to content

georgetown-cset/emerging-tech-topics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Identifying Emerging Technology Research

This repository contains materials for identifying research publications relevant to LLM development and chip design or fabrication.

Our approach is to chain two prompts using Gemini 1.5 Flash. We first generate a one-sentence summary of the research described in a publication's title and abstract. Then, we generate a relevance prediction given the summary.

Prompt text can be found in the prompts directory:

In deployment on Google Cloud, we use Vertex batch prediction with BigQuery. The sql directory contains the queries used in our pipeline, and the demo notebook illustrates generation of summaries and classifications for a small set of publications. The sequence looks like:

  • udfs.sql: Define UDFs for wrangling batch prediction inputs and outputs.
  • chip_corpus.sql and llm_corpus.sql: Create tables holding the scholarly literature that we'll generate predictions for, given titles and abstracts.
  • summary_inputs.sql: Create a table of batch prediction requests.
  • Run the first batch job, yielding one-sentence summaries for each publication.
  • classify_inputs.sql: Create a table of batch prediction requests for the classification task.
  • Run the second batch job for predicted relevance.
  • labels.sql: Parse the responses from the classification task to create a table of labels for the input publications.
  • usage.sql: Estimate pipeline costs based on the input and output token counts contained in response metadata.

Two other repositories hold related materials. For identifying publications relevant to AI, CV, NLP, robotics, and cybersecurity, see here. For field of study classification, see here.

About

Identifying LLM & chip design/fabrication research

Resources

Stars

Watchers

Forks