<!doctype html>
<html lang="en">
<head>
<title>Measures of Information Reflect Memorization Patterns</title>
<meta name="viewport" content="width=device-width,initial-scale=1" />
<meta name="description" content="Measures of Information Reflect Memorization Patterns" />
<meta property="og:title" content="Measures of Information Reflect Memorization Patterns" />
<meta property="og:url" content="https://rachitbansal.github.io/information-measures" />
<!-- <meta property="og:url" content="https://rome.baulab.info/" />
<meta property="og:image" content="https://rome.baulab.info/images/ct-thumb.gif" /> -->
<meta property="og:description" content="Evaluating model generalization using intrinsic, information-theoretic metrics." />
<meta property="og:type" content="website" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="Measures of Information Reflect Memorization Patterns" />
<meta name="twitter:description" content="Cracking open the black box of huge autoregressive transformer neural network language models." />
<meta name="twitter:image" href="/resources/websites/information-measures/paper-thumb.png" />
<!-- <link rel="apple-touch-icon" sizes="180x180" href="/images/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/images/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/images/favicon-16x16.png">
<link rel="manifest" href="/site.webmanifest"> -->
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-rwoIResjU2yc3z8GV/NPeZWAv56rSmLldC3R/AZzGRnGxQQKnKkoFVhFQhNUwEyJ" crossorigin="anonymous">
<script src="https://code.jquery.com/jquery-3.2.1.min.js" integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" crossorigin="anonymous"></script>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:300,400,700" rel="stylesheet">
<link href="/resources/websites/style.css" rel="stylesheet">
<style>
.relatedthumb {
float:left; width: 200px; margin: 3px 10px 7px 0;
}
.relatedblock {
clear: both;
display: inline-block;
}
.bold-sc {
font-variant: small-caps;
font-weight: bold;
}
.cite, .citegroup {
margin-bottom: 8px;
}
:target {
background-color: yellow;
}
</style>
<script async src="https://www.googletagmanager.com/gtag/js?id=G-FD12LWN557"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date()); gtag('config', 'G-FD12LWN557');
</script>
</head>
<body class="nd-docs">
<div class="nd-pageheader">
<div class="container">
<h1 class="lead">
<nobr class="widenobr">Measures of Information</nobr>
<nobr class="widenobr">Reflect Memorization Patterns</nobr>
</h1>
<address>
<nobr><a href="https://rachitbansal.github.io/" target="_blank"
>Rachit Bansal</a><sup>1</sup>,</nobr>
<nobr><a href="https://baulab.info/" target="_blank"
>Danish Pruthi</a><sup>2</sup>,</nobr>
<nobr><a href="https://www.cs.technion.ac.il/~belinkov/" target="_blank"
>Yonatan Belinkov</a><sup>3</sup></nobr>
<br>
<nobr><sup>1</sup><a href="http://dtu.ac.in/" target="_blank"
>Delhi Technological University (Now at Google AI)</a>,</nobr>
<nobr><sup>2</sup><a href="https://www.cmu.edu/" target="_blank"
>Carnegie Mellon University (Now at Amazon Web Services)</a>,</nobr>
<nobr><sup>3</sup><a href="https://www.cs.technion.ac.il/" target="_blank"
>Technion - Israel Institute of Technology</a></nobr>
</address>
</div>
</div><!-- end nd-pageheader -->
<div class="container">
<!-- <div class="row justify-content-center" style="margin-bottom: 20px">
<p class="text-center">
<a href="https://memit.baulab.info/"
>Update! See our <b>MEMIT</b> paper on scaling to thousands of facts:<br>
Mass Editing Memory in a Transformer</a>
</p>
</div> -->
<div class="row justify-content-center text-center">
<p>
<a href="https://arxiv.org/pdf/2210.09404.pdf" class="d-inline-block p-3 align-top" target="_blank"><img height="100" width="78" src="/resources/websites/information-measures/paper-thumb.png" style="border:1px solid" alt="ArXiv Preprint thumbnail" data-nothumb=""><br>ArXiv<br>Preprint</a>
<a href="https://github.com/technion-cs-nlp/Information-Reflects-Memorization" class="d-inline-block p-3 align-top" target="_blank"><img height="100" width="78" src="/resources/websites/information-measures/code-thumb.png" style="border:1px solid" alt="Github code thumbnail" data-nothumb=""><br>Code and Datasets<br>Github</a>
<a href="https://drive.google.com/file/d/10wU8AcDR5LtUcXcOJkAjDJfPQHo6xROc/view?usp=sharing" class="d-inline-block p-3 align-top" target="_blank"><img height="100" width="78" src="/resources/websites/information-measures/slides-thumb.png" style="border:1px solid" alt="Slides thumbnail" data-nothumb=""><br>Slides</a>
<!-- <a href="/data" class="d-inline-block p-3 align-top" target="_blank"><img height="100" width="78" src="images/data-thumb.png" style="border:1px solid" data-nothumb="" alt="Data directory thumbnail"><br>Dataset<br>and Models</a>
<a href="https://colab.research.google.com/github/kmeng01/rome/blob/main/notebooks/rome.ipynb" class="d-inline-block p-3 align-bottom" target="_blank"><img height="78" width="136" src="images/colab-thumb.png" style="border:1px solid" alt="Colab demo thumbnail" data-nothumb=""><br>Demo Colab:<br> Model Editing</a>
<a href="https://colab.research.google.com/github/kmeng01/rome/blob/main/notebooks/causal_trace.ipynb" class="d-inline-block p-3 align-bottom" target="_blank"><img height="78" width="136" src="images/colab-thumb.png" style="border:1px solid" alt="Colab demo thumbnail" data-nothumb=""><br>Demo Colab:<br> Causal Tracing</a> -->
</p>
</div><!-- end container -->
<!-- <div class="card" style="max-width: 1020px;"> -->
<!-- <div class="card-block"> -->
<h3>Are Intrinsic Activation Patterns Indicative of Memorization?</h3>
<p>
In this work, we posit that the organization of information across the internal activations of a
network can be indicative of memorization.
Consider the toy task of separating concentric circles, illustrated
in the figure below. A two-layer feed-forward network can learn the circular decision boundary for this
task. However, if the nature of the learning task is changed, the network may resort to memorization.
When a spurious feature is introduced into this dataset such that its value (+/−) correlates with the
label (0/1), the network memorizes the feature-to-label mapping, which is reflected in a uniform
activation pattern across neurons. In our work, we refer to this kind of model behavior as <em>heuristic memorization</em>.
In contrast, when the labels of the original set are shuffled,
the same network memorizes individual examples during training and shows high
diversity in its activation patterns. This example demonstrates how memorizing
behavior is reflected in the diversity of neuron activations.
We refer to this form of memorization as <em>example-level memorization</em>.
</p>
<img src="/resources/websites/information-measures/circles-reps.gif" class="medfig" alt="A toy task demonstrating that activation patterns are representative of memorization.">
<div class="row">
<div class="col">
<h2>Why Intrinsic Evaluation?</h2>
<p>
The impetus to study intrinsic evaluation of neural networks is two-fold:
<ol>
<li>The practical perspective.
Evaluating models for the forms of memorization defined above generally
requires researchers to laboriously collect specialized,
labeled datasets to measure the extent of suspected failure modes in models.
While such sets make it possible to assess model behavior over a chosen set of features,
the much larger space of remaining features is hard to identify and study.
<li>The interpretability perspective.
The performance measures over evaluation sets that are typically used to assess model behavior
lack interpretability and are not indicative of the internal workings
that give rise to a given model behavior.
</ol>
In this work, we find that activation patterns over in-distribution example sets
are, on their own, indicative of generalization behavior.
</p>
<h2>How do we Quantify Neuron Diversity?</h2>
<p>
We formalize the notion of diversity across neuron activations through two measures:
(i) <b>intra-neuron diversity</b>: the variation of activations for a neuron across a set of examples, and
(ii) <b>inter-neuron diversity</b>: the dissimilarity between pairwise neuron activations on the same set of examples.
We hypothesize that the nature of these quantities for two networks could point to underlying differences
in their generalizing behavior. In order to quantify intra-neuron and inter-neuron diversity, we adopt
the information-theoretic measures of <b>entropy</b> and <b>mutual information</b> (MI), respectively.
</p>
<p class="text-center">
<img src="/resources/websites/information-measures/hypothesis.png" class="smallfig" alt="Summarizing our hypothesis.">
</p>
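<p>
The sketch below shows one straightforward way to estimate both quantities from a matrix of
activations of shape [n_examples, n_neurons] using simple histogram-based estimators. The binning
and estimator choices here are illustrative assumptions for this page rather than the exact
procedure from the paper; please refer to the paper and code release for details.
</p>
<pre>
# Illustrative histogram-based estimators (assumed setup, not necessarily the paper's recipe).
import numpy as np
from itertools import combinations
from sklearn.metrics import mutual_info_score

def neuron_entropies(acts, bins=20):
    """Intra-neuron diversity: entropy (in bits) of each neuron's activation distribution."""
    ents = []
    for a in acts.T:                                   # acts: [n_examples, n_neurons]
        p, _ = np.histogram(a, bins=bins)
        p = p / p.sum()
        p = p[p > 0]
        ents.append(-(p * np.log2(p)).sum())
    return np.array(ents)

def pairwise_mi(acts, bins=20):
    """Inter-neuron diversity: mutual information (in nats) for every pair of neurons."""
    # Discretize each neuron's activations, then compute MI over all pairs
    # (quadratic in the number of neurons; fine for a sketch).
    disc = np.stack([np.digitize(a, np.histogram_bin_edges(a, bins=bins)) for a in acts.T])
    return np.array([mutual_info_score(disc[i], disc[j])
                     for i, j in combinations(range(disc.shape[0]), 2)])
</pre>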
<h2>What did we find?</h2>
<h3>Heuristic Memorization</h3>
<p>
Through our experiments, we first note that <b>low entropy across neural activations
indicates heuristic memorization in networks</b>. This is evident from the figure below, where we see that
(1) as we increase α, the validation performance decreases, indicating heuristic memorization (see the
solid line in the plots); and (2) with an increase in this heuristic memorization, we see lower entropy
across neural activations. We show the entropy values of neural activations for the three layers of an MLP
trained on Colored MNIST (left sub-plot) and for the last two layers of DistilBERT on Sentiment
Adjectives (right sub-plot).
In both scenarios, we see a consistent drop in entropy
with increasing degrees of heuristic memorization (α), with a particularly sharp decline at α = 1.0.
</p>
<p class="text-center">
<img src="/resources/websites/information-measures/entropy-hm.png" class="bigfig" alt="Entropy results for heuristic memorization.">
</p>
<p>
Furthermore, we observe that <b>networks with higher heuristic memorization exhibit higher mutual information</b> across pairs of neurons.
In the figure below, networks with higher memorization (↑ α)
have a larger density of neurons in the high mutual information region. While this trend is consistent
across the two settings, we see some qualitative differences: the memorizing (α = 1.0) MLP network on Colored MNIST (left) has a uniform distribution across the entire scale of MI values, while
DistilBERT on Sentiment Adjectives (right) largely has a high-density peak at an MI of ∼0.9.
</p>
<p class="text-center">
<img src="/resources/websites/information-measures/mi-hm.png" class="bigfig" alt="MI results for heuristic memorization.">
</p>
<h3>Example-level Memorization</h3>
<p>
We simulate example-level memorization through label shuffling on a variety of datasets.
First, we note that model performance on the validation set decreases with increased label
shuffling, confirming an increase in example-level memorization (figure below). Interestingly, this dip in
validation accuracy is accompanied by a consistent rise in entropy across neurons. For MLP
networks trained on MNIST (left), we see a distinct rise in entropy even with a small amount of label
shuffling (β = 0.25), followed by a steady increase (layers 2 and 3) or no change (layer 1) in entropy.
A different trend is seen for DistilBERT fine-tuned on IMDb (right): a distinct rise for high values of
β and a roughly constant value for low or no label shuffling.
</p>
<p class="text-center">
<img src="/resources/websites/information-measures/entropy-em.png" class="bigfig" alt="Entropy results for example-level memorization.">
</p>
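<p>
The label-shuffling setup itself is simple; a minimal sketch is shown below (the fraction β of
affected labels is the only knob; the function name and interface are illustrative, not taken from
our code release).
</p>
<pre>
# Minimal sketch of label shuffling to induce example-level memorization:
# a fraction beta of the training labels is shuffled among themselves.
import numpy as np

def shuffle_labels(y, beta, seed=0):
    rng = np.random.default_rng(seed)
    y = np.array(y, copy=True)
    idx = np.flatnonzero(rng.random(len(y)) &lt; beta)   # examples whose labels are shuffled
    y[idx] = rng.permutation(y[idx])                  # permute labels within that subset
    return y
</pre>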
<p>
Our hypothesis about the relation between example-level memorization and MI is supported by the figure below.
In both settings, networks trained with higher β values contain neuron pairs that show low values of
MI (left side of the plots). In line with the previous observations, we find that MLPs trained with any
amount of label noise (β > 0.00) on MNIST (left sub-plot) have a higher density of neuron pairs
concentrated at low values of MI. Meanwhile, for DistilBERT on IMDb (right sub-plot), we observe
that the neuron pair density gradually shifts towards lower values of MI with increasing β.
</p>
<p class="text-center">
<img src="/resources/websites/information-measures/mi-em.png" class="bigfig" alt="MI results for example-level memorization.">
</p>
<div class="row justify-content-center" style="margin-bottom: 20px">
<p class="text-center">
<a href="https://arxiv.org/abs/2210.09404"
>Please see our paper for experiments on real-world instances of memorization
and more detailed discussions on the larger implications of our findings.</a>
</p>
</div>
<h2>How to Cite</h2>
<p>This work was presented at NeurIPS 2022. It can be cited as follows.
</p>
<div class="card">
<h3 class="card-header">bibliography</h3>
<div class="card-block">
<p style="text-indent: -3em; margin-left: 3em;" class="card-text clickselect">
Rachit Bansal, Danish Pruthi, and Yonatan Belinkov. "<em>Measures of Information Reflect Memorization Patterns.</em>" Advances in Neural Information Processing Systems 35 (2022).
</p>
</div>
<h3 class="card-header">bibtex</h3>
<div class="card-block">
<pre class="card-text clickselect">
@article{bansal2022measures,
title={Measures of Information Reflect Memrorization Patterns},
author={Rachit Bansal and Danish Pruthi and Yonatan Belinkov},
journal={Advances in Neural Information Processing Systems},
volume={35},
year={2022}
}
</pre>
</div>
</div>
</div>
</div><!--row -->
</div> <!-- container -->
<footer class="nd-pagefooter">
<div class="row">
<div class="col-6 col-md text-center">
<a href="https://rachitbansal.github.io/">Home</a>
</div>
</div>
</footer>
</body>
<script>
$(document).on('click', '.clickselect', function(ev) {
var range = document.createRange();
range.selectNodeContents(this);
var sel = window.getSelection();
sel.removeAllRanges();
sel.addRange(range);
});
</script>
</html>