Merge branch 'master' of https://github.yuuza.net/PommesPeter/memo.po…
Showing 21 changed files with 248 additions and 8,005 deletions.
@@ -108,4 +108,6 @@ dist
docs/privated
blog/privated
.vscode
yarn.lock
14 changes: 0 additions & 14 deletions
...-Resistance-Training-using-Prior-Bias-toward-Unbiased-Scene-Graph-Generation.md
This file was deleted.
18 changes: 18 additions & 0 deletions
...Resistance-Training-using-Prior-Bias-toward-Unbiased-Scene-Graph-Generation.mdx
@@ -0,0 +1,18 @@
---
title: Prior Bias for Unbiased Scene Graph Generation
authors: [peter]
tags: [SceneGraphGeneration]
---

import SumCard from '@site/src/components/SumCard';

<SumCard
  title="Resistance Training using Prior Bias: toward Unbiased Scene Graph Generation"
  authors="Chao Chen / Yibing Zhan / Baosheng Yu / Liu Liu / Yong Luo / Bo Du"
  abstract="Scene Graph Generation (SGG) aims to build a structured representation of a scene using objects and pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for the scene graph generation. Specifically, RTPB uses a distributed-based prior bias to improve models' detecting ability on less frequent relationships during training, thus improving the model generalizability on tail categories. In addition, to further explore the contextual information of objects and relationships, we design a contextual encoding backbone network, termed as Dual Transformer (DTrans). We perform extensive experiments on a very popular benchmark, VG150, to demonstrate the effectiveness of our method for the unbiased scene graph generation. In specific, our RTPB achieves an improvement of over 10% under the mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all state-of-the-art methods with a large margin."
  url="http://arxiv.org/abs/2201.06794"/>

## Motivation

- Use a resistance bias to strengthen the ability to relate two relationship classes with a large association gap
- Use a dual Transformer to strengthen the global contextual features for relationship recognition, so the encoder captures richer feature information
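The resistance-bias idea can be illustrated with a minimal sketch: derive a bias from relationship-class frequencies so that tail classes receive a larger additive term during training. The function name, the count values, and the `alpha` exponent below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def resistance_bias(class_counts, alpha=1.0):
    """Distribution-based prior bias: rarer relationship classes receive a
    larger bias, 'resisting' the pull of head classes during training.
    (Sketch only; RTPB's actual bias is defined in the paper.)"""
    total = sum(class_counts)
    return [-alpha * math.log(c / total) for c in class_counts]

# Hypothetical long-tailed relationship counts: head, body, tail classes.
counts = [9000, 900, 100]
bias = resistance_bias(counts)

# The bias is added to the relationship logits during training only,
# forcing the classifier to work harder on tail categories.
logits = [2.0, 0.5, -1.0]
biased_logits = [z + b for z, b in zip(logits, bias)]
```

The monotone relation (rarer class, larger bias) is the key property; at inference the bias would be dropped so the model keeps its improved tail-class discrimination.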
File renamed without changes
45 changes: 45 additions & 0 deletions
docs/DeepLearning/04.PaperReading/VLMs/01.Grounded_Language-Image_Pre-training.md
@@ -0,0 +1,45 @@
---
title: Grounded Language-Image Pre-training
tags:
  - VLMs
---

# Grounded Language-Image Pre-training

## Abstract

Learn object-level, language-aware, and semantically rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The benefits:

1. It lets GLIP learn from both detection and grounding data, improving both tasks and bootstrapping a good grounding model;
2. GLIP can exploit massive image-text pairs by generating grounding boxes in a self-training manner, making the learned representations semantically rich.

The learned representations show strong zero-shot and few-shot transferability to a variety of object-level recognition tasks, making them well suited to zero-shot and few-shot settings. The experiments also show it achieves the best results.

## Motivation

Improve the model's zero-shot and few-shot ability by leveraging current large pre-trained models.

## Method

![Alt text](./src/01.Grounded_Language-Image_Pre-training/image.png)

The per-region classification task in object detection is reformulated as aligning each region to one of the c phrases in a text prompt, casting visual detection as a grounding task. Given image-text pairs, the fused representations align objects in the image with the corresponding words in the prompt, which completes the classification. (Essentially CLIP with a detection module.)
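This detection-as-grounding reformulation can be sketched as a dot product between region features and prompt-word features; the alignment scores then play the role of classification logits. The shapes, names, and random features below are illustrative assumptions, not GLIP's actual code:

```python
import numpy as np

def alignment_logits(region_feats, word_feats):
    """Detection as grounding: score region i against prompt phrase j.
    region_feats: (N, d) from the image encoder (e.g. DyHead),
    word_feats:   (M, d) from the text encoder (e.g. BERT)."""
    return region_feats @ word_feats.T  # (N, M) region-word alignment

rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))  # 4 candidate regions (hypothetical)
phrases = rng.standard_normal((3, 8))  # e.g. prompt "person. bicycle. car."
scores = alignment_logits(regions, phrases)  # each row = logits over phrases
```

Replacing a fixed classifier weight matrix with text features is what lets new categories be added at inference time simply by editing the prompt.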
1. Define the new paradigm and how the prompt is used as part of the feature information.

2. Design the corresponding image encoder and text encoder. This paper uses DyHead (Dynamic Head: Unifying Object Detection Heads with Attentions) as the image encoder and BERT as the text encoder.

3. Additionally, design a cross-modality multi-head attention module (X-MHA). Each head computes a context vector for one modality by attending to the other modality.

4. Prepare a large amount of data for pre-training.

5. Transfer to other benchmarks for validation.
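The cross-modality fusion in step 3 can be sketched as attention in both directions, where each modality pools context from the other. This is a single-head simplification with no learned projections, an assumption for illustration; the real X-MHA uses multiple heads and linear projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context):
    """One direction of cross-modality attention: each query vector
    attends over the other modality and pools a context vector."""
    scale = np.sqrt(queries.shape[-1])
    attn = softmax(queries @ context.T / scale)  # (Nq, Nc) weights
    return attn @ context                        # (Nq, d) pooled context

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))  # region features
txt = rng.standard_normal((3, 8))  # prompt token features
img_ctx = cross_attend(img, txt)   # image attends to text
txt_ctx = cross_attend(txt, img)   # text attends to image
```

Running the two directions lets detection features become language-aware and text features become image-conditioned before the final alignment.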
## Heuristic Thinking

`GLIP unifies phrase grounding and object detection: object detection can be viewed as context-free phrase grounding, while phrase grounding can be viewed as contextualized object detection.`

Could phrase grounding be used to enrich the semantic richness of the visual features (i.e., strengthen the context)?

Could relations in visual features (ROI features, union features, etc.) be elicited by designing suitable prompts?

Previous work has already used GloVe phrase features to improve zero-shot ability. Does this suggest that NLP models are inherently strong zero-shot learners?
Binary file added (+143 KB)
...ning/04.PaperReading/VLMs/src/01.Grounded_Language-Image_Pre-training/image.png
3cfe3fe
Successfully deployed to the following URLs:
memo-pommespeter-space – ./
memo-pommespeter-space-git-master-pommespeter.vercel.app
memo-pommespeter-space.vercel.app
memo-pommespeter-space-pommespeter.vercel.app
memo.sylin.host
memo2.xiee.ltd
memo.pommespeter.space