Commit

ptimons44 committed Dec 13, 2023
2 parents 8c7b1ae + 299f1ce commit e2f5ab7
Showing 288 changed files with 12,823 additions and 26 deletions.
24 changes: 0 additions & 24 deletions .github/pull_request_template.md
@@ -1,25 +1 @@
<!-- Please make sure you are opening a pull request against the `accepted` branch (not master!) of the STAGING repo (not 2023!) -->

## OpenReview Submission Thread

<!-- link to your OpenReview submission -->

## Checklist before requesting a review

<!-- To tick a box, put an 'x' inside it (e.g. [x]) -->

- [ ] I am opening a pull request against the `accepted` branch of the `staging` repo.
- [ ] I have de-anonymized my post, added author lists, etc.
- [ ] My post matches the formatting requirements
- [ ] I have a short 2-3 sentence abstract in the `description` field of my front-matter ([example](https://github.com/iclr-blogposts/staging/blob/aa15aa3797b572e7b7bb7c8881fd350d5f76fcbd/_posts/2022-12-01-distill-example.md?plain=1#L4-L5))
- [ ] I have a table of contents, formatted using the `toc` field of my front-matter ([example](https://github.com/iclr-blogposts/staging/blob/aa15aa3797b572e7b7bb7c8881fd350d5f76fcbd/_posts/2022-12-01-distill-example.md?plain=1#L33-L42))
- [ ] My bibliography is correctly formatted, using a `.bibtex` file as per the sample post

## Changes implemented in response to reviewer feedback

- [ ] Tick this box if you received a conditional accept
- [ ] I have implemented the necessary changes in response to reviewer feedback (if any)

<!-- briefly add your changes in response to reviewer feedback -->

## Any other comments
1 change: 0 additions & 1 deletion _config.yml
@@ -16,7 +16,6 @@ footer_text: >
keywords: machine-learning, ml, deep-learning, reinforcement-learning, iclr # add your own keywords or leave empty

lang: en # the language of your site (for example: en, fr, cn, ru, etc.)
icon: iclr_favicon.ico # the emoji used as the favicon (alternatively, provide image name in /assets/img/)
url: https://deep-learning-mit.github.io # the base hostname & protocol for your site
baseurl: /staging # the subpath of your site, e.g. /blog/

180 changes: 180 additions & 0 deletions _posts/2022-11-09-how-cnns-learn-shapes.md


93 changes: 93 additions & 0 deletions _posts/2023-11-01-Symmetry-Optimization.md
@@ -0,0 +1,93 @@
---
layout: distill
title: Investigating the Impact of Symmetric Optimization Algorithms on Learnability
description: Recent theoretical papers in machine learning have raised concerns about the impact of symmetric optimization algorithms on learnability, citing hardness results from theoretical computer science. This project aims to empirically investigate and validate these theoretical claims by designing and conducting experiments at scale. Understanding the role of optimization algorithms in the learning process is crucial for advancing the field of machine learning.
date: 2023-11-09
htmlwidgets: true

# Anonymize when submitting
# authors:
# - name: Anonymous

authors:
- name: Kartikesh Mishra
url: ""
affiliations:
name: MIT
- name: Divya P Shyamal
url: ""
affiliations:
name: MIT

# must be the exact same name as your blogpost
bibliography: 2023-11-01-Symmetry-Optimization.bib

# Add a table of contents to your post.
# - make sure that TOC names match the actual section names
# for hyperlinks within the post to work correctly.
toc:
- name: Introduction
- name: Experimental design
subsections:
- name: Learning Tasks and Datasets
- name: Learning Algorithms
- name: Evaluation Metrics

# Below is an example of injecting additional post-specific styles.
# This is used in the 'Layouts' section of this post.
# If you use this post as a template, delete this _styles block.
_styles: >
.fake-img {
background: #bbb;
border: 1px solid rgba(0, 0, 0, 0.1);
box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
margin-bottom: 12px;
}
.fake-img p {
font-family: monospace;
color: white;
text-align: left;
margin: 12px 0;
text-align: center;
font-size: 16px;
}
---

## Introduction

In practice, the majority of machine learning algorithms exhibit symmetry. Our objective is to explore the impact of introducing asymmetry to different components of a machine learning algorithm, such as architecture, loss function, or optimization, and assess whether this asymmetry enhances overall performance.

Andrew Ng's research <d-cite key="ng2004feature"></d-cite> suggests that in scenarios requiring feature selection, asymmetric (more precisely, non-rotationally-invariant) algorithms can achieve lower sample complexity. For instance, for regularized logistic regression, the sample complexity with the L1 norm is O(log n), while with the L2 norm it is O(n). This underscores the potential benefit of incorporating asymmetry, particularly in tasks involving feature selection, to achieve improved learning outcomes. Can asymmetry be advantageous in other learning tasks as well? What costs come with using symmetric or asymmetric learning algorithms?
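
As a rough illustration of this gap (a minimal sketch, not Ng's original experiment; the dataset sizes, regularization strength `C`, and seeds are arbitrary assumptions):

```python
# Sketch: L1 vs. L2 regularized logistic regression when only a few of many
# features are relevant. Hyperparameters are placeholder assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_relevant = 200, 500, 5

X = rng.normal(size=(n_samples, n_features))
w = np.zeros(n_features)
w[:n_relevant] = 3.0  # only the first few features carry signal
y = (X @ w + rng.normal(size=n_samples) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, solver="liblinear", C=0.1)
    clf.fit(X_tr, y_tr)
    print(penalty, "test accuracy:", clf.score(X_te, y_te))
```

With few samples and many irrelevant features, the L1-regularized (non-rotationally-invariant) model typically generalizes better, consistent with the sample-complexity gap above.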

## Experimental Design

Our experiments will proceed as follows. We will train models using a set of learning algorithms (both symmetric and asymmetric) on a set of datasets, then test them on validation data drawn from the same distribution they were trained on. We will analyze both the learning process and the performance of the resulting models.

### Learning Tasks and Datasets

We plan to use MNIST, CIFAR-100, small tabular datasets such as Iris and Banknote, and a subset of ImageNet. If we complete our training on the image datasets, we may add some text-based datasets from Kaggle. Using these datasets, we plan to analyze several learning tasks: classification, regression, feature selection, and reconstruction.

### Learning Algorithms

We define a gradient-descent parametric learning algorithm to be symmetric if it uses the same function to update every parameter. We are currently considering CNN models with varying numbers of convolutional layers, Vision Transformers with varying numbers of attention blocks, and multilayer perceptrons of varying depth. To introduce asymmetry into the architecture, we will use dropout, skip connections, and variation in activation functions and initialization across layers. We will use cross-entropy and MSE as our asymmetric and symmetric loss functions, respectively. For our optimizers, we will use batch gradient descent, stochastic gradient descent, and Adam, and to introduce asymmetry we will vary the learning rates, momentum, and weight decay across parameters.

For our initial tests, we plan to compare a few pairs of multilayer perceptrons on the MNIST dataset. Each pair is described below; a minimal sketch of how such a pair might be configured follows the list.

- A 3-layer perceptron with a single learning rate l shared by all layers vs. a 3-layer perceptron where each layer k has its own learning rate l_k
- A 4-layer perceptron vs. a 4-layer perceptron where some neurons in the 2nd layer connect directly to the 4th layer
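
A minimal PyTorch sketch of the two asymmetric variants (layer sizes, learning rates, and module names are illustrative assumptions, not our final configuration):

```python
# Sketch: the two asymmetric variants above. All hyperparameters are
# placeholder assumptions for illustration.
import torch
import torch.nn as nn

# Pair 1: per-layer learning rates l_k via optimizer parameter groups.
mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
symmetric_opt = torch.optim.SGD(mlp.parameters(), lr=1e-2)  # one l for all
asymmetric_opt = torch.optim.SGD(
    [{"params": layer.parameters(), "lr": lr}
     for layer, lr in zip([mlp[0], mlp[2], mlp[4]], [1e-2, 5e-3, 1e-3])],
    lr=1e-2,  # default, overridden by each group's l_k
)

# Pair 2: a 4-layer perceptron where layer-2 activations also feed layer 4.
class SkipMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(784, 256)
        self.l2 = nn.Linear(256, 128)
        self.l3 = nn.Linear(128, 128)
        self.l4 = nn.Linear(128 + 128, 10)  # sees layer-3 and layer-2 outputs

    def forward(self, x):
        h1 = torch.relu(self.l1(x))
        h2 = torch.relu(self.l2(h1))
        h3 = torch.relu(self.l3(h2))
        return self.l4(torch.cat([h3, h2], dim=-1))
```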


## Evaluation Metrics

We will evaluate the trained models using the following metrics, comparing the models generated by symmetric algorithms with those from asymmetric algorithms on the same dataset (a sketch of the perturbed-accuracy metric follows the list).
- Validation accuracy (percentage of correct classifications)
- Negative mean squared error for regression and reconstruction
- k-fold cross-validation accuracy
- Accuracy on a perturbed dataset (we will use Gaussian noise)
- Convergence speed during training
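
A minimal sketch of the perturbed-accuracy metric (the noise scale `sigma` is a placeholder we would tune, not a fixed choice):

```python
# Sketch: accuracy on a Gaussian-noise-perturbed copy of the validation set.
import torch

@torch.no_grad()
def perturbed_accuracy(model, loader, sigma=0.1, device="cpu"):
    """Classification accuracy after adding N(0, sigma^2) noise to inputs."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x + sigma * torch.randn_like(x)).argmax(dim=-1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```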

## Compute Resources

We plan to use Google Colab for our initial experiments and then MIT SuperCloud for training and inference on large models.
@@ -0,0 +1,87 @@
---
layout: distill
title: Visualization of CLIP's Learning and Perceiving Dynamics
description: This project aims to develop methods and tools to enhance the interpretability of AI systems, focusing on how these systems make decisions and predictions. By creating more transparent AI models, the research seeks to bridge the communication gap between humans and AI, fostering trust and efficiency in various applications, from healthcare to autonomous driving. Such advancements would not only demystify AI operations for non-experts but also aid in the ethical and responsible development of AI technologies.
date: 2023-11-01
htmlwidgets: true

# Anonymize when submitting
# authors:
# - name: Anonymous

authors:
- name: Chi-Li Cheng
url: "https://chilicheng.com"
affiliations:
name: Massachusetts Institute of Technology

# must be the exact same name as your blogpost
bibliography: 2023-11-01-Visualization of CLIP's Learning and Perceiving Dynamics.bib

# Add a table of contents to your post.
# - make sure that TOC names match the actual section names
# for hyperlinks within the post to work correctly.
toc:
- name: Project Proposal
subsections:
- name: Introduction
- name: Methodology
- name: Potential Contributions

# Below is an example of injecting additional post-specific styles.
# This is used in the 'Layouts' section of this post.
# If you use this post as a template, delete this _styles block.
_styles: >
.fake-img {
background: #bbb;
border: 1px solid rgba(0, 0, 0, 0.1);
box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
margin-bottom: 12px;
}
.fake-img p {
font-family: monospace;
color: white;
text-align: left;
margin: 12px 0;
text-align: center;
font-size: 16px;
}
---

## Project Proposal
In this project, I delve into the intricate capabilities of the CLIP (Contrastive Language–Image Pre-training) model<d-cite key="radford2021learning"></d-cite>, renowned for its human-like ability to process both visual and textual data. Central to my research is the belief that visualization plays a crucial role in understanding complex AI systems. With this in mind, I have set two primary objectives: first, to develop innovative visualization techniques that can provide a deeper, more intuitive understanding of CLIP's learning and perception processes; and second, to analyze how the CLIP model dynamically processes sequential images or videos, focusing on visualizing and interpreting the flow field during training and the trajectory characteristics during video content processing.


### Introduction

The CLIP model, which stands for Contrastive Language–Image Pre-training, represents a groundbreaking approach to integrating visual and textual data within the realm of artificial intelligence. In my project, I undertake an in-depth exploration of this model through a two-fold approach. Initially, my focus is on developing advanced visualization techniques tailored to decode and highlight the intricate learning and perception mechanisms at the core of CLIP. This is inspired by detailed investigations <d-cite key="wang2020understanding"></d-cite> <d-cite key="shi2023understanding"></d-cite> <d-cite key="zhao2017exact"></d-cite> into the behavior of features on the unit sphere, offering a unique and insightful understanding of the model's operations.

Furthermore, this research extends to a thorough analysis of how the CLIP model processes sequential visual content, with a specific focus on video data. This part of my study goes beyond merely visualizing the model's feature embeddings; it involves a meticulous examination of its dynamic interpretive behaviors. By emphasizing innovative visualization methods, my aim is to demystify the complex and often abstract functionalities of the CLIP model, making these processes more accessible and understandable.

In essence, my project seeks to bridge the gap between the sophisticated computational processes of the CLIP model and our comprehension of these processes. By focusing on groundbreaking visualization techniques, I aspire to deepen our understanding of AI's learning behaviors, thereby contributing significantly to the advancement of artificial intelligence research.

### Methodology

The project involves several key methodologies:

Innovative Visualization of CLIP's Feature Embeddings: Developing intuitive visual representations of CLIP's embeddings on a hypersphere to demystify high-dimensional data processing and understand the model's predictive mechanisms.
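
A minimal sketch of this step (assuming the Hugging Face `transformers` CLIP checkpoint and a PCA projection to 3D; both are illustrative choices, and the image paths are hypothetical):

```python
# Sketch: embed images and captions with CLIP, L2-normalize them onto the
# unit hypersphere, then project to 3D with PCA for plotting.
import torch
from PIL import Image
from sklearn.decomposition import PCA
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ["img_0.jpg", "img_1.jpg"]]  # hypothetical
texts = ["a photo of a dog", "a photo of a cat"]

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# Normalize, as CLIP does before its contrastive loss: embeddings then live
# on the unit hypersphere, which is the object we want to visualize.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

points_3d = PCA(n_components=3).fit_transform(
    torch.cat([img_emb, txt_emb]).numpy()
)
```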

Analyzing Factors Influencing CLIP’s Learning: Examining the impact of pretrained data quality and training dataset composition on CLIP’s learning efficacy.

Visualizing Dynamic Behavior with Sequential Images: Focusing on visualizing CLIP's processing of videos to observe learning patterns and trajectory characteristics, including the creation of a specialized interface for 3D visualization.
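
One possible sketch of the trajectory analysis (reusing `model` and `processor` from the sketch above; the frame paths are hypothetical):

```python
# Sketch: treat per-frame CLIP embeddings as a trajectory on the unit sphere
# and measure how far the model's perception moves between consecutive frames.
import torch
from PIL import Image

frames = [Image.open(f"frames/{i:04d}.jpg") for i in range(64)]  # hypothetical

inputs = processor(images=frames, return_tensors="pt")
with torch.no_grad():
    emb = model.get_image_features(pixel_values=inputs["pixel_values"])
emb = emb / emb.norm(dim=-1, keepdim=True)

# Cosine distance between consecutive frames: spikes may indicate scene cuts,
# plateaus semantically stable shots.
step = 1.0 - (emb[:-1] * emb[1:]).sum(dim=-1)
print(step.mean().item(), step.max().item())
```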

Experimental Analysis with Movie Clips: Testing various movie clips to explore if trajectory patterns can reveal video themes or genres, and understanding the correlation between these trajectories and cinematic content.


### Potential Contributions

The research is poised to offer significant contributions:

Enhanced Understanding of CLIP’s Learning Dynamics: Insights into how data quality and dataset composition influence CLIP's learning process.

Evaluating Training Dataset Quality: Providing valuable information on the effectiveness of training datasets, potentially guiding data selection and preparation strategies.

Semantic Trajectory Analysis in Video Content: New insights into CLIP's semantic interpretations of dynamic content, including the evolution of model perception and the formation of 'data islands'.

Implications for Model Training and Content Analysis: The findings could lead to improved training methods for CLIP and similar models, as well as novel methods for content analysis in understanding cinematic themes and narrative structures.
