Code for "Syntax-based Contextual Visualizations for LLM Interpretability"

rkique/syntax-sae

Syntax-based Contextual Visualizations for SAE & LLM Interpretability

Project Overview

This project aims to improve interpretability by developing a new visualization method for Sparse Autoencoder (SAE) feature contexts. Specifically, we propose using syntactic dependencies to illuminate similarities between contexts.

Project Features

We developed three novel views for activation contexts: two use syntactic dependency structures and one uses branching trees. The backend uses the spaCy dependency parser and sentence tagger. These views are meant to supplement activation context lists, e.g. those developed by Anthropic:

Anthropic text contexts
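
As a rough illustration of how syntactic dependencies can expose similarity between contexts, the sketch below compares two contexts by the chain of dependency labels from the activating token up to the sentence root. This is only a minimal sketch under stated assumptions: the `Tok` class, the `path_to_root` helper, and the hand-annotated parses are hypothetical stand-ins for parser output and are not taken from this repository. The labels follow the spaCy convention in which the root token is its own head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tok:
    """Hypothetical stand-in for one parsed token."""
    text: str
    dep: str   # dependency label, e.g. "nsubj"
    head: int  # index of the head token; the root points at itself


def path_to_root(tokens, i):
    """Return the dependency labels from token i up to the sentence root."""
    labels = []
    while tokens[i].head != i:
        labels.append(tokens[i].dep)
        i = tokens[i].head
    labels.append(tokens[i].dep)  # label of the root itself
    return labels


# Hand-annotated parses (assumed, for illustration only).
# "The cat sat on the mat", activating token: "mat"
ctx_a = [
    Tok("The", "det", 1), Tok("cat", "nsubj", 2), Tok("sat", "ROOT", 2),
    Tok("on", "prep", 2), Tok("the", "det", 5), Tok("mat", "pobj", 3),
]
# "A dog slept under a tree", activating token: "tree"
ctx_b = [
    Tok("A", "det", 1), Tok("dog", "nsubj", 2), Tok("slept", "ROOT", 2),
    Tok("under", "prep", 2), Tok("a", "det", 5), Tok("tree", "pobj", 3),
]

path_a = path_to_root(ctx_a, 5)
path_b = path_to_root(ctx_b, 5)
same_role = path_a == path_b  # the two tokens share no words but fill the same syntactic role
```

The two contexts have no words in common, yet the activating tokens have identical dependency paths, which is the kind of similarity a plain context list does not make visible.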

The joint view shows individual contexts side by side. You can enable part-of-speech tagging or show inactive tokens from the top panel:

Joint view with syntactic contexts
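
The two toggles in the joint view can be pictured with a small text-only sketch. It assumes each token is a `(text, pos, activation)` tuple and that "inactive" means an activation of zero; the `render_context` function and the sample data are hypothetical and do not reflect the actual frontend code.

```python
def render_context(tokens, show_pos=False, show_inactive=True):
    """Render one context as a string, honoring the two display toggles."""
    parts = []
    for text, pos, activation in tokens:
        if activation == 0.0 and not show_inactive:
            continue  # hide tokens on which the feature did not fire
        parts.append(f"{text}/{pos}" if show_pos else text)
    return " ".join(parts)


# Assumed sample context: the feature fires on "river" and "bank" only.
context = [
    ("the", "DET", 0.0),
    ("river", "NOUN", 0.4),
    ("bank", "NOUN", 0.9),
]

plain = render_context(context)
tagged = render_context(context, show_pos=True)
active_only = render_context(context, show_inactive=False)
```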

The merged view aggregates commonly occurring contexts and displays them in a branching format. The trees are instantiated as list structures, and subtree matches are located where possible.

Merged view with linear trees
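
One way to picture a branching tree built from list structures is a prefix tree in which every node is a list `[token, count, children]`. The sketch below merges contexts that share a leading token sequence. It is an assumption-laden simplification: the `insert` and `render` helpers are hypothetical, and it only merges shared prefixes, whereas the subtree matching used in the merged view is not specified here and may work differently.

```python
def insert(children, tokens):
    """Insert a token sequence into a list-based tree, merging shared prefixes."""
    if not tokens:
        return
    head, rest = tokens[0], tokens[1:]
    for node in children:
        if node[0] == head:
            node[1] += 1          # one more context passes through this node
            insert(node[2], rest)
            return
    node = [head, 1, []]          # [token, count, children]
    children.append(node)
    insert(node[2], rest)


def render(children, depth=0):
    """Return indented lines, one per node, showing how contexts branch."""
    lines = []
    for token, count, kids in children:
        lines.append("  " * depth + f"{token} ({count})")
        lines.extend(render(kids, depth + 1))
    return lines


# Assumed sample contexts for a single feature.
contexts = [
    ["in", "the", "house"],
    ["in", "the", "garden"],
    ["in", "a", "hurry"],
]

forest = []
for context in contexts:
    insert(forest, context)

tree_lines = render(forest)
print("\n".join(tree_lines))
```

The counts on each node indicate how many contexts share that branch, which is what lets frequently occurring contexts stand out in a merged display.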

The updated merged view simplifies the presentation to focus primarily on co-occurrence information, giving an overall picture of the relevant contexts for a feature. It also fixes the overlap issues of the earlier merged view:

Updated merged view with even spacing
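
Co-occurrence information of this kind can be approximated by counting, for each context token, how many contexts it shares with the activating token. The sketch below is a minimal illustration under that assumption; the `cooccurrence` function and the sample contexts are hypothetical, and the actual view may weight or filter tokens differently.

```python
from collections import Counter


def cooccurrence(contexts, target):
    """Count in how many contexts each token appears alongside the target token."""
    counts = Counter()
    for tokens in contexts:
        if target in tokens:
            # Use a set so a token repeated within one context counts once.
            counts.update(set(tokens) - {target})
    return counts


# Assumed sample contexts for a feature that fires on "bank".
contexts = [
    ["the", "bank", "of", "the", "river"],
    ["river", "bank", "erosion"],
    ["the", "bank", "loan"],
]

counts = cooccurrence(contexts, "bank")
```

Sorting the result with `counts.most_common()` would give a ranked list of companion tokens that could drive an evenly spaced layout.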

Acknowledgements

This work was done for David Laidlaw's CSCI2370: Interdisciplinary Scientific Visualization class.
