Commit 72ba683
1 parent: 40bdefb
Showing 87 changed files with 213 additions and 16 deletions.
Binary file added (BIN, +78.2 KB): ...rdCourses/01.CS224N自然语言处理/src/02.Neural_Classifiers/image-20221206111247762.png
Binary file added (BIN, +106 KB): ...rdCourses/01.CS224N自然语言处理/src/02.Neural_Classifiers/image-20221206152922620.png
File renamed without changes.
docs/DeepLearning/04.PaperReading/Attention/01.attention-is-all-you-need.md (30 additions, 0 deletions)
@@ -0,0 +1,30 @@
---
title: Attention is All You Need
tags:
  - Attention
---

# Attention is All You Need

## Abstract

Mainstream sequence transduction models (sequence-to-sequence generation models) are based on recurrent or convolutional neural networks and include an encoder and a decoder. The best current models also connect the encoder and decoder through an attention mechanism. The paper proposes a simple architecture based solely on attention, dispensing with CNNs and RNNs entirely. The architecture also applies well to other domains. (Machine translation is used for validation.)
## Introduction

Recurrent language models and encoder-decoder models were the two most widely used approaches at the time. When the output carries a lot of structured information, the encoder-decoder architecture is the more general solution. Two problems with these models:

1. Language-model solutions usually learn with an RNN, where the state of the $t$-th token is computed from the state of the $(t-1)$-th token. This process is inherently sequential, so it is hard to parallelize and performs poorly; information from early states may also be lost (forgotten) later in training.
2. Encoder-decoder models have a similar problem: when there is a lot of structured information, the strong correlations between elements are hard to decouple, which makes learning coarse and hinders convergence.

The paper optimizes both structures by replacing the RNN with attention, solving the parallelization problem.
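The parallelism argument can be made concrete with a minimal sketch of scaled dot-product attention (NumPy; shapes and sizes are illustrative, not the paper's configuration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Every position attends to every other position in a single matrix
    multiply, so there is no sequential dependence on step t-1 as in an RNN.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 64))                 # 5 tokens, d_model = 64
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)                             # (5, 64)
```

Because the whole sequence is processed in one batched matrix product, the computation parallelizes trivially on modern hardware, which is exactly the property the RNN lacks.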
## Background

In traditional seq2seq tasks,
## Conclusion

This is the first sequence transduction model based entirely on attention, replacing the recurrent layers in the encoder-decoder architecture with multi-head self-attention. For translation tasks, the Transformer also trains faster than recurrent or convolutional layers, and it converges quickly.

The model can be applied to other tasks and fits anywhere an attention-based model fits. The Transformer can handle inputs of different data types, such as images, audio, and video.
docs/DeepLearning/04.PaperReading/VLMs/01.Grounded_Language-Image_Pre-training.md (45 additions, 0 deletions)
@@ -0,0 +1,45 @@
---
title: Grounded Language-Image Pre-training
tags:
  - VLMs
---

# Grounded Language-Image Pre-training

## Abstract

GLIP learns object-level, language-aware, and semantically rich visual representations by unifying object detection and phrase grounding for pre-training. The benefits:

1. It lets GLIP learn from both detection and grounding data, improving both tasks and bootstrapping a good grounding model.
2. GLIP can exploit massive image-text pairs by generating grounding boxes through self-training, making the learned representations semantically rich.

The learned representations transfer strongly to a variety of object-level recognition tasks in zero-shot and few-shot settings. The experiments also show that this approach performs best.
||
## Motivation | ||
|
||
提高模型的 zero-shot 和 few-shot 能力,利用目前的 pre-trained 大模型来实现。 | ||
|
||
## Method: how is it used?

![Alt text](./src/01.Grounded_Language-Image_Pre-training/image.png)

The per-region classification task in object detection is reformulated as aligning each region to one of the c phrases in the text prompt, casting visual detection as a grounding task. Given image-text pairs, the two modalities are fused so that objects in the image align with the corresponding prompt words, which completes the classification. (Essentially CLIP with a detection module.)

1. Define the new paradigm and how the prompt is used as part of the feature information.
2. Design the corresponding image encoder and text encoder. The paper uses DyHead (Dynamic Head: Unifying Object Detection Heads with Attentions) as the image encoder and BERT as the text encoder.
3. Additionally, design a cross-modality multi-head attention module (X-MHA), in which each head computes a context vector for one modality by attending to the other.
4. Prepare a large amount of data for pre-training.
5. Transfer to other benchmarks for validation.
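The detection-as-grounding reformulation can be sketched roughly as follows (a hedged illustration with made-up names and shapes, not GLIP's actual code or API):

```python
import numpy as np

def alignment_scores(region_feats, word_feats):
    """Score each image region against each phrase token in the prompt.

    region_feats: (num_regions, d) features from the image encoder
    word_feats:   (num_words, d)   features from the text encoder
    Returns (num_regions, num_words) alignment logits, which stand in
    for the usual fixed-vocabulary classification logits.
    """
    # L2-normalize so the dot product becomes a cosine similarity
    r = region_feats / np.linalg.norm(region_feats, axis=-1, keepdims=True)
    w = word_feats / np.linalg.norm(word_feats, axis=-1, keepdims=True)
    return r @ w.T

rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 256))   # 4 candidate boxes
words = rng.normal(size=(3, 256))     # prompt with 3 phrase tokens
logits = alignment_scores(regions, words)
pred = logits.argmax(axis=-1)         # best-matching phrase per region
print(logits.shape)                   # (4, 3)
```

Because the "classes" are just phrases in the prompt, swapping the prompt changes the label space without retraining a classification head, which is where the zero-shot flexibility comes from.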
## Heuristic Thinking

`GLIP unifies phrase grounding and object detection, because object detection can be viewed as context-free phrase grounding, while phrase grounding can be viewed as contextualized object detection.`
Could phrase grounding be used to enrich the visual semantics (i.e., to strengthen the context)?

Could the relations within visual features (ROI features, union features, etc.) be elicited through careful prompt design?

Earlier work has already used GloVe phrase features to improve zero-shot ability. Does this suggest that NLP models are inherently strong zero-shot learners?
Binary file added (BIN, +143 KB): ...ning/04.PaperReading/VLMs/src/01.Grounded_Language-Image_Pre-training/image.png
File renamed without changes.
...-Compositional-Learning-towards-Unbiased-Training-for-Scene-Graph-Generation.md (28 additions, 0 deletions)
@@ -0,0 +1,28 @@
---
title: State-aware Compositional Learning towards Unbiased Training for Scene Graph Generation
tags:
  - PseudoLabels
  - SceneGraphGeneration
---

# State-aware Compositional Learning towards Unbiased Training for Scene Graph Generation

## Motivation

1. The causes of biased predictions in SGG have not been explored enough.
2. Identify the factors that actually affect SGG.
## Introduction

The model should not rely too heavily on the object identity feature, otherwise the biased predictions get worse. The main idea is to decompose the object class feature into one part representing its category and one part representing its state (the intrinsic feature of the relation).
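The decomposition idea can be sketched as follows (a rough illustration under assumed names and sizes; the tables, class counts, and composition by concatenation are hypothetical, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128

# Hypothetical embedding tables: one for the object's category identity,
# one for its state (the relation-relevant intrinsic part).
category_table = rng.normal(size=(150, d))   # e.g. 150 object classes
state_table = rng.normal(size=(20, d))       # e.g. 20 coarse states

def object_feature(category_id, state_id):
    """Compose the object representation from separate category and state
    parts instead of a single entangled identity feature."""
    return np.concatenate([category_table[category_id],
                           state_table[state_id]])

f = object_feature(3, 7)
print(f.shape)   # (256,)
```

Keeping the two parts separate lets objects of similar categories share the category component while the state component carries the relation-specific information, which is the decoupling the Methods section asks about.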
## Methods

What is the purpose of the decoupling? It lets the model recognize objects of similar categories and model more concrete relations.
## Conclusions

Visual features have little influence on SGG models; removing them actually improves performance. The reasons:
- Visual features contain too much redundant information.
- The object identity embedding is not a determining factor for SGG models either; it does not change the metrics.
- The object identity feature is the key factor affecting SGG models.
File renamed without changes