-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
425 lines (392 loc) · 31.6 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
<!DOCTYPE html>
<html lang="ch">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>Computer Vision Reading Group</title>
<link rel="shortcut icon" type="image/png" href="/favicon.png">
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico">
<link rel="stylesheet" href="/theme/css/screen.css" type="text/css" />
<link rel="stylesheet" href="/theme/css/pygments.css" type="text/css" />
<link rel="stylesheet" href="/theme/css/print.css" type="text/css" media="print" />
<meta name="generator" content="Pelican" />
<meta name="description" content="" />
<meta name="author" content="hustcv" />
</head>
<body>
<header>
<nav>
<ul>
<li class="selected"><a href="/">Home</a></li>
<li><a href="https://github.com/hustcv">GitHub</a></li>
<li><a href="/archives">Archives</a></li>
<li><a href="/tags">Tags</a></li>
<li><a href="/authors">Authors</a></li>
<li><a href="/categories">Categories</a></li>
<li><a href="/notes">Notes</a></li>
</ul>
</nav>
<div class="header_box">
<h1><a href="/">Computer Vision Reading Group</a></h1>
</div>
</header>
<div id="wrapper">
<div id="content"> <h4 class="date">7月 18, 2017</h4>
<article class="post">
<h2 class="title">
<a href="/a-fast-rcnn-hard-positive-generation-via-adversary-for-object-detection.html" rel="bookmark" title="Permanent Link to "A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection"">A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection</a>
</h2>
<p><a href="https://arxiv.org/abs/1704.03414">paper</a></p>
<h2>一、论文思想</h2>
<p>训练一个目标检测器,对遮挡和形变鲁棒,目前的主要方法是增加不同场景下的图像数据,但这些数据有时又特别少。作者提出使用对抗生成有遮挡或形变的样本,这些样本对检测器来说识别比较困难,使用这些困难的正样本训练可以增加检测器的鲁棒性。 使用对抗网络生成有遮挡和有形变的两种特征,分别对应网络ASDN和ASTN。使用对抗网络生成有遮挡和有形变的两种特征,分别对应网络ASDN和ASTN。</p>
<h3>1.ASDN</h3>
<ul>
<li>FAST-RCNN中RoI-池化层之后的每个目标proposal卷积特征作为对抗网络的输入,给定一个目标的特征,ASDN尝试生成特征某些部分被dropout的掩码,导致检测器无法识别该物体。</li>
<li><code>ASDN网络初始化</code>:
给定尺寸大小为d×d的特征图X,使用d3×d3的滑动窗,并将滑动窗位置映射到原图,将原图对应位置清零,生成新的特征向量,传入到分类层计算损失,选择具有最大损失的滑动窗,用这个窗口生成二值掩码M(滑动窗位置为1,其余位置为0),用n个目标proposal生成n对对抗网络的训练样本(x1,M1),...,(xn,Mn) ,使用二值交叉熵损失训练ASDN: </li>
</ul>
<p><img alt="cross_extropy" src="images/cross_etropy.jpg"></p>
<ul>
<li>在前向传播过程中,首先使用ASDN在RoI-池化层之后生成特征掩码,然后使用重要性采样法生成二值掩码,使用该掩码将特征对应部位值清零,修改后的特征继续前向传播计算损失。这个过程生成了困难的特征,用于训练检测器。训练过程流程图如下所示:</li>
</ul>
<p><img alt="ASDN" src="images/ASDN.jpg"></p>
<h3>2.ASTN</h3>
<ul>
<li>STN网络包含三部分:定位网络,网格生成器,采样器。定位网络估计出形变的参数(旋转角度、平移距离和缩放因子)。这三个参数作为后两部分的输入,输出是形变后的特征图。论文主要学习定位网络的三个参数。</li>
<li><code>ASTN</code>:
主要关注特征旋转,定位网络包含三层全连接层,前两层是ImageNet预训练的fc6和fc7,训练过程与ASDN类似,ASTN对特征进行形变,使得ASTN将正样本识别成负样本。将特征图划分为4个block,每个block估计四个方向的旋转,增加了任务的复杂度。</li>
<li>两种对抗网络可以相结合,使得检测器更鲁棒,RoI-池化层提取的特征首先传入ASDN丢弃一些激活,之后使用ASTN对特征进行形变,如下图所示:
<img alt="ASTNandASDN" src="images/ASTNandASDN.jpg"></li>
<li>ASDN 与 ASTN 网络组合架构示意。首先创建遮挡蒙版,随后旋转路径以产生用于训练的例子。</li>
</ul>
<h2>二、训练</h2>
<h3>stage1:training a standard Fast-RCNN</h3>
<div class="highlight"><pre><span></span>./experiments/scripts/fast_rcnn_std.sh [GPU_ID] VGG16 pascal_voc
</pre></div>
<h3>stage2:pre-training stage for the adversarial network</h3>
<div class="highlight"><pre><span></span>./experiments/scripts/fast_rcnn_adv_pretrain.sh [GPU_ID] VGG16 pascal_voc
</pre></div>
<h3>stage3:copy the weights of the above two models to initialize the joint model</h3>
<div class="highlight"><pre><span></span>./copy_model.h
</pre></div>
<h3>stage4: joint training of the detector and the adversarial network</h3>
<div class="highlight"><pre><span></span>./experiments/scripts/fast_rcnn_adv.sh [GPU_ID] VGG16 pascal_voc
</pre></div>
<h2>三、代码解析</h2>
<h3>1.sigmod交叉熵</h3>
<ul>
<li>交叉熵化简:<br>
<img alt="交叉熵化简" src="images/f1.jpg"><br>
<img alt="交叉熵化简" src="images/f2.jpg"> </li>
<li>进一步可化简为:<br>
<img alt="交叉熵化简" src="images/f3.jpg"><br>
<img alt="交叉熵化简" src="images/f4.jpg"> </li>
<li>对应本文中的代码是: <strong><em>adversarial-frcnn/lib/roi_data_layer/layer.py</em></strong><br>
<img alt="代码1" src="images/code1.jpg"></li>
<li>代码解析: </li>
</ul>
<h5>1.注意绝对值使用的巧妙之处:</h5>
<p>** lZ = np.log(1+np.exp(-np.abs(f))) * mask
Lz对应化简公式的第二项,其中e的指数项x在两种情况下,均为非正,可以概括为代码中np.exp(-np.abs(f))</p>
<h5>2.注意判断语句使用的巧妙之处:</h5>
<p>*<em> ((f>0)-t)</em>f * mask该项对应化简公式的第一项,对应caffe源码为:<br>
<img alt="代码2" src="images/code2.jpg"></p>
<h2>四、参考链接</h2>
<ul>
<li>caffe网络可视化工具:http://ethereon.github.io/netscope/#/editor</li>
<li>交叉熵公式推导:http://caffecn.cn/?/question/25</li>
<li>交叉熵公式说明:http://blog.csdn.net/u014114990/article/details/47975739</li>
<li>论文代码:https://github.com/xiaolonw/adversarial-frcnn </li>
</ul>
<p><strong>关于作者</strong>
Edited by <a href="https://github.com/orgs/hustcv/people/galaxy-fangfang">fangfang</a> <a href="https://github.com/orgs/hustcv/people/summerZXH">xiuhong</a></p>
<div class="clear"></div>
<div class="info">
<a href="/a-fast-rcnn-hard-positive-generation-via-adversary-for-object-detection.html">posted at 19:54</a>
· <a href="/category/paper.html" rel="tag">Paper</a>
·
<a href="/tag/detection.html" class="tags">detection</a>
<a href="/tag/adversarial.html" class="tags">adversarial</a>
</div>
<a href="/a-fast-rcnn-hard-positive-generation-via-adversary-for-object-detection.html#disqus_thread">Click to read and post comments</a>
</article> <h4 class="date">7月 16, 2017</h4>
<article class="post">
<h2 class="title">
<a href="/object-detection-in-cvpr2017.html" rel="bookmark" title="Permanent Link to "Object detection in CVPR2017"">Object detection in CVPR2017</a>
</h2>
<p><a href="http://www.cvpapers.com/cvpr2017.html">cvpr2017</a></p>
<h2>detector</h2>
<ol>
<li>Accurate Single Stage Detector Using Recurrent Rolling Convolution <a href="https://arxiv.org/pdf/1704.05776.pdf">paper</a></li>
<li>Training Object Class Detectors With Click Supervision <a href="http://calvin.inf.ed.ac.uk/wp-content/uploads/Publications/papadopoulos17cvpr.pdf">paper</a></li>
<li>Self-Learning Scene-Specific Pedestrian Detectors Using a Progressive Latent Model <a href="https://arxiv.org/pdf/1611.07544.pdf">paper</a></li>
<li>EAST: An Efficient and Accurate Scene Text Detector <a href="https://arxiv.org/pdf/1704.03155.pdf">paper</a></li>
<li>Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors <a href=""></a></li>
<li>What Is and What Is Not a Salient Object? Learning Salient Object Detector by Ensembling Linear Exemplar Regressors <a href=""></a></li>
<li>Expecting the Unexpected: Training Detectors for Unusual Pedestrians With Adversarial Imposters <a href=""></a></li>
<li>Learning Discriminative and Transformation Covariant Local Feature Detectors <a href=""></a></li>
</ol>
<h2>detection</h2>
<ol>
<li>SRN: Side-output Residual Network for Object Symmetry Detection in the Wild</li>
<li>Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes From 2D Ones in RGB-Depth Images</li>
<li>Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection</li>
<li>Deep Level Sets for Salient Object Detection</li>
<li>Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes</li>
<li>Object Detection in Videos With Tubelet Proposal Networks</li>
<li>Feature Pyramid Networks for Object Detection</li>
<li>Fast Boosting Based Detection Using Scale Invariant Multimodal Multiresolution Filtered Features</li>
<li>Temporal Convolutional Networks for Action Segmentation and Detection</li>
<li>Discriminative Bimodal Networks for Visual Localization and Detection With Natural Language Queries</li>
<li>Interspecies Knowledge Transfer for Facial Keypoint Detection</li>
<li>Deep Joint Rain Detection and Removal From a Single Image</li>
<li>CASENet: Deep Category-Aware Semantic Edge Detection</li>
<li>Image Splicing Detection via Camera Response Function Analysis</li>
<li>Scale-Aware Face Detection</li>
<li>Perceptual Generative Adversarial Networks for Small Object Detection</li>
<li>Predictive-Corrective Networks for Action Detection (project, abstract, PDF)</li>
<li>Unified Embedding and Metric Learning for Zero-Exemplar Event Detection</li>
<li>A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection (PDF)</li>
<li>Multiple Instance Detection Network With Online Instance Classifier Refinement</li>
<li>Visual Translation Embedding Network for Visual Relation Detection</li>
<li>SCC: Semantic Context Cascade for Efficient Action Detection</li>
<li>End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering</li>
<li>Joint Detection and Identification Feature Learning for Person Search</li>
<li>Deep Matching Prior Network: Toward Tighter Multi-Oriented Text Detection</li>
<li>Visual-Inertial-Semantic Scene Representation for 3D Object Detection</li>
<li>A Deep Regression Architecture With Two-Stage Re-Initialization for High Performance Facial Landmark Detection</li>
<li>Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection</li>
<li>Polyhedral Conic Classifiers for Visual Object Detection and Classification</li>
<li>Incremental Kernel Null Space Discriminant Analysis for Novelty Detection</li>
<li>Straight to Shapes: Real-Time Detection of Encoded Shapes</li>
<li>Learning Cross-Modal Deep Representations for Robust Pedestrian Detection</li>
<li>Spatio-Temporal Self-Organizing Map Deep Network for Dynamic Object Detection From Videos</li>
<li>Provable Self-Representation Based Outlier Detection in a Union of Subspaces</li>
<li>Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection</li>
<li>CityPersons: A Diverse Dataset for Pedestrian Detection</li>
<li>Hand Keypoint Detection in Single Images Using Multiview Bootstrapping</li>
<li>Minimum Delay Moving Object Detection</li>
<li>Weakly Supervised Affordance Detection</li>
<li>RON: Reverse Connection With Objectness Prior Networks for Object Detection</li>
<li>Deeply Supervised Salient Object Detection With Short Connections</li>
<li>Simultaneous Facial Landmark Detection, Pose and Deformation Estimation Under Facial Occlusion</li>
<li>Joint Gap Detection and Inpainting of Line Drawings</li>
<li>MCMLSD: A Dynamic Programming Approach to Line Segment Detection</li>
<li>Richer Convolutional Features for Edge Detection</li>
<li>What Can Help Pedestrian Detection?</li>
<li>UntrimmedNets for Weakly Supervised Action Recognition and Detection</li>
<li>Multi-View 3D Object Detection Network for Autonomous Driving</li>
<li>Non-Local Deep Features for Salient Object Detection</li>
<li>Unsupervised Vanishing Point Detection and Camera Calibration From a Single Manhattan Image With Radial Distortion</li>
<li>Action Unit Detection With Region Adaptation, Multi-Labeling Learning and Optimal Temporal Fusing</li>
<li>Mimicking Very Efficient Network for Object Detection</li>
<li>Learning Detection With Diverse Proposals</li>
<li>YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video</li>
</ol>
<div class="clear"></div>
<div class="info">
<a href="/object-detection-in-cvpr2017.html">posted at 07:44</a>
· <a href="/category/list.html" rel="tag">List</a>
·
<a href="/tag/detection.html" class="tags">detection</a>
<a href="/tag/cvpr.html" class="tags">cvpr</a>
</div>
<a href="/object-detection-in-cvpr2017.html#disqus_thread">Click to read and post comments</a>
</article> <h4 class="date">7月 01, 2017</h4>
<article class="post">
<h2 class="title">
<a href="/people-in-object-detection.html" rel="bookmark" title="Permanent Link to "People in object detection"">People in object detection</a>
</h2>
<h2><a href="http://kaiminghe.com/">Kaiming He</a></h2>
<h4><a href="https://arxiv.org/find/all/1/all:+AND+Kaiming+He/0/1/0/all/0/1">arxiv paper list</a> :white_check_mark:</h4>
<ol>
<li>Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition <a href="https://arxiv.org/pdf/1406.4729.pdf">paper</a> <a href="https://github.com/ShaoqingRen/SPP_net">code</a></li>
<li>Convolutional Neural Networks at Constrained Time Cost <a href="https://arxiv.org/pdf/1412.1710.pdf">paper</a> </li>
<li>Efficient and Accurate Approximations of Nonlinear Convolutional Networks <a href="https://arxiv.org/pdf/1411.4229.pdf">paper</a></li>
<li>ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation <a href="http://arxiv.org/abs/1604.05144">paper</a></li>
<li>Instance-aware Semantic Segmentation via Multi-task Network Cascades <a href="https://arxiv.org/abs/1512.04412">paper</a></li>
<li>Deep Residual Learning for Image Recognition <a href="http://arxiv.org/abs/1512.03385">paper</a></li>
<li>Identity Mappings in Deep Residual Networks <a href="http://arxiv.org/abs/1603.05027">paper</a></li>
<li>Instance-sensitive Fully Convolutional Networks <a href="http://arxiv.org/abs/1603.08678">paper</a></li>
<li>Is Faster R-CNN Doing Well for Pedestrian Detection? <a href="http://arxiv.org/abs/1607.07032">paper</a></li>
<li>R-FCN: Object Detection via Region-based Fully Convolutional Networks <a href="http://arxiv.org/abs/1605.06409">paper</a></li>
<li>Aggregated Residual Transformations for Deep Neural Networks <a href="https://arxiv.org/abs/1611.05431">paper</a></li>
<li>Feature Pyramid Networks for Object Detection <a href="https://arxiv.org/abs/1612.03144">paper</a></li>
<li>Mask R-CNN <a href="https://arxiv.org/abs/1703.06870">paper</a></li>
<li>Detecting and Recognizing Human-Object Interactions <a href="https://arxiv.org/abs/1704.07333">paper</a> </li>
</ol>
<h2><a href="http://www.rossgirshick.info/">Ross Girshick</a></h2>
<h4><a href="https://arxiv.org/find/cs/1/au:+Girshick_R/0/1/0/all/0/1">arxiv paper list</a> :white_check_mark:</h4>
<h2><a href="https://www2.eecs.berkeley.edu/Faculty/Homepages/darrell.html">Trevor Darrell</a></h2>
<h4><a href="https://arxiv.org/find/all/1/all:+AND+Trevor+Darrell/0/1/0/all/0/1">arxiv paper list</a> :white_check_mark:</h4>
<h2><a href="http://rogerioferis.com">Rogerio Feris</a></h2>
<ol>
<li>S3Pool: Pooling with Stochastic Spatial Sampling <a href="https://arxiv.org/pdf/1611.05138.pdf">paper</a> <a href="https://github.com/Shuangfei/s3pool">code</a></li>
<li>Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes <a href="http://rogerioferis.com/publications/DDANCVPR2015.pdf">paper</a></li>
<li>A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection <a href="http://rogerioferis.com/publications/ZhaoweiECCV2016.pdf">paper</a> <a href="https://github.com/zhaoweicai/mscnn">caffe</a></li>
<li>Shape Classification Through Structured Learning of Matching Measures <a href="http://rogerioferis.com/publications/FerisCVPR09a.pdf">paper</a></li>
<li>Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance <a href="http://rogerioferis.com/publications/FerisICME2012.pdf">paper</a></li>
<li>Boosting Object Detection Performance in Crowded Surveillance Videos <a href="http://rogerioferis.com/publications/FerisWACV2013.pdf">paper</a></li>
<li>Efficient Maximum Appearance Search for Large-Scale Object Detection <a href="http://rogerioferis.com/publications/ChenCVPR2013.pdf">paper</a></li>
<li>Fast Face Detector Training Using Tailored Views <a href="http://rogerioferis.com/publications/KristinaICCV2013.pdf">paper</a></li>
<li>Attribute-based People Search: Lessons Learnt from a Practical Surveillance System <a href="http://rogerioferis.com/publications/FerisICMR2014.pdf">paper</a></li>
<li>Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification <a href="https://arxiv.org/pdf/1611.05377.pdf">paper</a></li>
<li>[BOOK] Visual Attributes <a href="https://www.springer.com/us/book/9783319500751">link</a> <a href="http://rogerioferis.com/publications/IntroductionToVisualAttributes.pdf">introduce</a></li>
</ol>
<h2><a href="https://pdollar.github.io/">Piotr Dollar</a></h2>
<h4><a href="https://pdollar.wordpress.com/">blog</a></h4>
<ol>
<li>Supervised Learning of Edges and Object Boundaries <a href="https://pdollar.github.io/files/papers/DollarCVPR06edge.pdf">paper</a></li>
<li>Multiple Component Learning for Object Detection <a href="https://pdollar.github.io/files/papers/DollarECCV08mcl.pdf">paper</a></li>
<li>Fast Feature Pyramids for Object Detection <a href="https://pdollar.github.io/files/papers/DollarPAMI14pyramids.pdf">paper</a></li>
<li>Detecting Objects using Deformation Dictionaries <a href="https://pdollar.github.io/files/papers/HariharanCVPR14deformdict.pdf">paper</a></li>
<li>Edge Boxes: Locating Object Proposals from Edges <a href="https://pdollar.github.io/files/papers/ZitnickDollarECCV14edgeBoxes.pdf">paper</a></li>
<li>What makes for effective detection proposals? <a href="https://arxiv.org/pdf/1502.05082.pdf">paper</a></li>
<li>Learning to Segment Object Candidates <a href="https://arxiv.org/pdf/1506.06204.pdf">paper</a></li>
<li>Semantic Amodal Segmentation <a href="https://arxiv.org/pdf/1509.01329.pdf">paper</a></li>
<li>Unsupervised Learning of Edges <a href="http://arxiv.org/pdf/1511.04166.pdf">paper</a></li>
<li>A MultiPath Network for Object Detection <a href="http://arxiv.org/pdf/1604.02135.pdf">paper</a></li>
<li>Learning to Refine Object Segments <a href="http://arxiv.org/pdf/1603.08695.pdf">paper</a></li>
</ol>
<h2><a href="http://www.xiaoyumu.com">Xiaoyu Wang</a></h2>
<ol>
<li>Regionlets for Generic Object Detection <a href="http://www.xiaoyumu.com/s/PDF/Regionlets.pdf">ICCV 2013</a> <a href="http://www.xiaoyumu.com/s/PDF/Regionlets-pami.pdf">T-PAMI 2015</a></li>
<li>Generic Object Detection with Dense Neural Patterns and Regionlets <a href="http://www.xiaoyumu.com/s/PDF/cnnRegionlet.pdf">paper</a></li>
<li>Accurate Object Detection with Location Relaxation and Regionlets Relocalization <a href="http://www.xiaoyumu.com/s/PDF/Regionlets_relocalization.pdf">paper</a></li>
<li>Deep Reinforcement Learning-based Image Captioning with Embedding Reward
<a href="http://www.xiaoyumu.com/s/PDF/CVPR17.pdf">paper</a></li>
<li>SEP-Nets: Small and Effective Pattern Networks
<a href="http://www.xiaoyumu.com/s/PDF/SEP-Nets.pdf">paper</a></li>
</ol>
<h2><a href="https://rodrigob.github.io/">Rodrigo Benenson</a></h2>
<ol>
<li>Traffic Sign Recognition – How far are we from the solution <a href="https://rodrigob.github.io/documents/2013_ijcnn_traffic_signs.pdf">paper</a></li>
<li>Seeking the strongest rigid detector <a href="https://rodrigob.github.io/documents/2013_cvpr_roerei_with_supplementary_material.pdf">paper</a></li>
<li>How good are detection proposals, really? <a href="https://rodrigob.github.io/documents/2014_bmvc_selective_search_with_supplementary_material.pdf">paper</a></li>
<li>Ten Years of Pedestrian Detection, What Have We Learned? <a href="https://rodrigob.github.io/documents/2014_eccvw_ten_years_of_pedestrian_detection_with_supplementary_material.pdf">paper</a></li>
<li>Taking a Deeper Look at Pedestrians <a href="https://arxiv.org/pdf/1501.05790.pdf">paper</a></li>
<li>Filtered Channel Features for Pedestrian Detection <a href="https://arxiv.org/pdf/1501.05759.pdf">paper</a></li>
<li>What makes for effective detection proposals? <a href="https://arxiv.org/pdf/1502.05082.pdf">paper</a></li>
<li>What is Holding Back Convnets for Detection? <a href="https://arxiv.org/pdf/1508.02844.pdf">paper</a> </li>
<li>Weakly Supervised Object Boundaries <a href="https://arxiv.org/pdf/1511.07803.pdf">paper</a></li>
<li>How Far are We from Solving Pedestrian Detection? <a href="https://arxiv.org/pdf/1602.01237.pdf">paper</a></li>
<li>The Cityscapes Dataset <a href="https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2015cvprw.pdf">paper</a></li>
<li>Detecting Surgical Tools by Modelling Local Appearance and Global Shape <a href="https://rodrigob.github.io/documents/2015_tmi_bouget_et_al_detecting_surgical_tools.pdf">paper</a></li>
</ol>
<h2><a href="https://janhosang.com/">Jan Hosang</a></h2>
<ol>
<li>A convnet for non-maximum suppression <a href="https://arxiv.org/pdf/1511.06437.pdf">paper</a></li>
<li>Simple does it: Weakly supervised instance and semantic segmentation <a href="https://arxiv.org/pdf/1603.07485.pdf">paper</a></li>
<li>Learning non-maximum suppression <a href="https://arxiv.org/pdf/1705.02950.pdf">paper</a></li>
</ol>
<h2>Workshop</h2>
<ol>
<li><a href="http://mp7.watson.ibm.com/ICCV2015/ObjectDetectionICCV2015.html">ICCV 2015 Tutorial on Tools for Efficient Object Detection</a></li>
</ol>
<div class="clear"></div>
<div class="info">
<a href="/people-in-object-detection.html">posted at 14:40</a>
· <a href="/category/list.html" rel="tag">list</a>
·
<a href="/tag/detection.html" class="tags">detection</a>
<a href="/tag/people.html" class="tags">people</a>
</div>
<a href="/people-in-object-detection.html#disqus_thread">Click to read and post comments</a>
</article> <h4 class="date">4月 14, 2017</h4>
<article class="post">
<h2 class="title">
<a href="/reading-list.html" rel="bookmark" title="Permanent Link to "Reading List"">Reading List</a>
</h2>
<h2>Object detection</h2>
<ol>
<li>Rich feature hierarchies for accurate object detection and semantic segmentation <a href="https://arxiv.org/pdf/1311.2524">paper</a></li>
<li>Fast R-CNN <a href="https://arxiv.org/pdf/1504.08083">paper</a></li>
<li>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks <a href="https://arxiv.org/pdf/1506.01497">paper</a></li>
<li>[<a href="http://hustcv.github.io">read</a>]R-FCN: Object Detection via Region-based Fully Convolutional Networks <a href="https://arxiv.org/pdf/1605.06409">paper</a></li>
<li>[<a href="http://hustcv.github.io">read</a>]Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks <a href="https://arxiv.org/pdf/1512.04143.pdf">paper</a></li>
<li>Feature Pyramid Networks for Object Detection <a href="https://arxiv.org/pdf/1612.03144.pdf">paper</a></li>
<li>[<a href="http://hustcv.github.io">read</a>] A-Fast-RCNN: Hard positive generation via adversary for object detection <a href="https://arxiv.org/pdf/1704.03414.pdf">paper</a> <a href="https://github.com/xiaolonw/adversarial-frcnn">github</a></li>
<li>[<a href="">read</a>] Generative Adversarial Networks <a href="https://arxiv.org/abs/1406.2661">paper</a></li>
<li>[<a href="http://hustcv.github.io">read</a>] Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization <a href="https://arxiv.org/abs/1606.04446">paper</a> <a href="https://github.com/gidariss/AttractioNet">caffe</a></li>
<li>[<a href="http://hustcv.github.io">read</a>] Spatial Memory for Context Reasoning in Object Detection <a href="https://arxiv.org/pdf/1704.04224.pdf">paper</a></li>
<li>Accurate Single Stage Detector Using Recurrent Rolling Convolution <a href="https://arxiv.org/pdf/1704.05776.pdf">paper</a></li>
<li>ME R-CNN: Multi-Expert Region-based CNN for Object Detection <a href="https://arxiv.org/pdf/1704.01069.pdf">paper</a></li>
<li>[<a href="http://hustcv.github.io">read</a>] Beyond Skip Connections: Top-Down Modulation for Object Detection <a href="https://arxiv.org/pdf/1612.06851.pdf">paper</a></li>
<li>Improving Object Detection With One Line of Code <a href="https://arxiv.org/pdf/1704.04503.pdf">paper</a></li>
<li>S-OHEM: Stratified Online Hard Example Mining for Object Detection <a href="https://arxiv.org/pdf/1705.02233.pdf">paper</a></li>
<li>Adaptive Object Detection Using Adjacency and Zoom Prediction <a href="https://arxiv.org/pdf/1512.07711.pdf">paper</a> <a href="https://github.com/luyongxi/az-net">caffe</a></li>
<li>You Only Look Once: Unified, Real-Time Object Detection <a href="https://arxiv.org/pdf/1506.02640.pdf">paper</a> </li>
<li>YOLO9000: Better, Faster, Stronger <a href="https://arxiv.org/pdf/1612.08242.pdf">paper</a> </li>
<li>Deformable Convolutional Networks <a href="https://arxiv.org/abs/1703.06211">paper</a> <a href="https://github.com/msracver/Deformable-ConvNets">mxnet</a></li>
<li>Learning Detection with Diverse Proposals <a href="https://arxiv.org/pdf/1704.03533.pdf">paper</a> <a href="https://github.com/azadis/LDDP">caffe</a></li>
<li>Feature Pyramid Networks for Object Detection <a href="https://arxiv.org/pdf/1612.03144.pdf">paper</a></li>
<li>[<a href="http://blog.csdn.net/duinodu/article/details/75013528">read</a>] RON: Reverse Connection with Objectness Prior Networks for Object Detection <a href="Towards lightweight convolutional neural networks for object detection">paper</a></li>
</ol>
<h2>Text detection</h2>
<ol>
<li>Detecting Text in Natural Image with Connectionist Text Proposal Network <a href="https://arxiv.org/pdf/1609.03605.pdf">paper</a></li>
<li>EAST: An Efficient and Accurate Scene Text Detector <a href="https://arxiv.org/pdf/1704.03155.pdf">paper</a></li>
</ol>
<h2>Semantic Image Segmentation</h2>
<ol>
<li>Fully Convolutional Networks for Semantic Segmentation <a href="https://people.eecs.berkeley.edu/%7Ejonlong/long_shelhamer_fcn.pdf">paper</a> <a href="https://github.com/shelhamer/fcn.berkeleyvision.org">caffe</a></li>
<li>Semantic Image Sementation with Deep Convolutional Nets and Fully Connected CRF <a href="https://arxiv.org/pdf/1412.7062.pdf">paper</a></li>
<li>Conditional Random Fields as Recurrent Neural Networks <a href="http://www.robots.ox.ac.uk/~szheng/papers/CRFasRNN.pdf">paper</a> <a href="https://github.com/torrvision/crfasrnn">caffe</a></li>
<li>DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs <a href="https://arxiv.org/pdf/1606.00915.pdf">paper</a> <a href="https://github.com/isht7/pytorch-deeplab-resnet">pytorch</a></li>
<li>[<a href="">read</a>] Fully Convolutional Instance-aware Semantic Segmentation <a href="https://arxiv.org/pdf/1611.07709.pdf">paper</a> <a href="https://github.com/msracver/FCIS">mxnet</a></li>
<li>Loss Max-Pooling for Semantic Image Segmentation <a href="https://arxiv.org/pdf/1704.02966.pdf">paper</a> </li>
<li>[<a href="">read</a>] Mask R-CNN <a href="https://arxiv.org/pdf/1703.06870.pdf">paper</a> <a href="https://github.com/CharlesShang/FastMaskRCNN">tf</a></li>
</ol>
<h2>Recognition and Detection in 3D</h2>
<ol>
<li>3D ShapeNets: A Deep Representation for Volumetric Shapes <a href="https://arxiv.org/pdf/1406.5670.pdf">paper</a> <a href="https://github.com/zhirongw/3DShapeNets">matlab</a></li>
<li>VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition <a href="http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7353481">paper</a> <a href="https://github.com/dimatura/voxnet">Lasagne</a></li>
</ol>
<h2>Visual Reasoning</h2>
<ol>
<li>Inferring and Executing Programs for Visual Reasoning <a href="https://arxiv.org/pdf/1705.03633.pdf">paper</a> <a href="https://github.com/facebookresearch/clevr-iep">pytorch</a></li>
</ol>
<h2>Human motion</h2>
<ol>
<li>Unsupervised Learning of Depth and Ego-Motion from Video <a href="https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner/cvpr17_sfm_final.pdf">paper</a> <a href="https://github.com/tinghuiz/SfMLearner">github</a></li>
</ol>
<h2>CNN and its property</h2>
<ol>
<li>Group Invariant Scattering <a href="https://arxiv.org/abs/1101.2286">paper</a></li>
<li>Invariant Scattering Convolution Networks <a href="https://arxiv.org/pdf/1203.1513.pdf">paper</a></li>
<li>Structured Receptive Fields in CNNs <a href="https://arxiv.org/pdf/1605.02971.pdf">paper</a></li>
<li>Dynamic Filter Networks <a href="https://arxiv.org/pdf/1605.09673.pdf">paper</a></li>
<li>Multiscale Hierarchical Convolutional Networks <a href="https://arxiv.org/abs/1703.04140">paper</a></li>
<li>Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors <a href="https://arxiv.org/abs/1603.04733">paper</a></li>
</ol>
<h2>Image classification</h2>
<ol>
<li>Deep Residual Learning for Image Recognition <a href="https://arxiv.org/pdf/1512.03385.pdf">paper</a></li>
</ol>
<h2>Lightweight CNN</h2>
<ol>
<li>Towards lightweight convolutional neural networks for object detection <a href="https://arxiv.org/pdf/1707.01395.pdf">paper</a></li>
</ol>
<div class="clear"></div>
<div class="info">
<a href="/reading-list.html">posted at 14:40</a>
· <a href="/category/list.html" rel="tag">List</a>
·
<a href="/tag/detection.html" class="tags">detection</a>
<a href="/tag/segmentation.html" class="tags">segmentation</a>
<a href="/tag/lightweight.html" class="tags">lightweight</a>
</div>
<a href="/reading-list.html#disqus_thread">Click to read and post comments</a>
</article>
<div class="clear"></div>
<footer>
<p>
<a href="https://github.com/jody-frankowski/blue-penguin">Blue Penguin</a> Theme
·
Powered by <a href="http://getpelican.com">Pelican</a>
</footer>
</div>
<div class="clear"></div>
</div>
</body>
</html>