From d2b5cfce3dd9e8c0bc1690697fb898fdfcb5a8ea Mon Sep 17 00:00:00 2001 From: greatzh Date: Wed, 6 Nov 2024 00:12:32 +0800 Subject: [PATCH] add papers --- README.md | 14 ++++++- SUMMARY.md | 7 +--- image-forgery/2024/README.md | 16 +++---- image-forgery/2024/eitlnet.md | 71 ++++++++++++++++++++++++++++++++ image-forgery/2024/fakebench.md | 25 +++++++++++ image-forgery/2024/fakeshield.md | 7 ++++ image-forgery/2024/forgerygpt.md | 6 +++ image-forgery/2024/forgeryttt.md | 6 +++ image-forgery/2024/miml.md | 7 ++++ image-forgery/2024/omg-fuser.md | 5 +++ image-splicing/mstaf.md | 30 ++++++++++++++ 11 files changed, 180 insertions(+), 14 deletions(-) create mode 100644 image-forgery/2024/fakebench.md create mode 100644 image-forgery/2024/fakeshield.md create mode 100644 image-forgery/2024/forgerygpt.md create mode 100644 image-forgery/2024/forgeryttt.md create mode 100644 image-forgery/2024/miml.md create mode 100644 image-forgery/2024/omg-fuser.md create mode 100644 image-splicing/mstaf.md diff --git a/README.md b/README.md index 44d4b00..a40a4c3 100644 --- a/README.md +++ b/README.md @@ -37,6 +37,7 @@ description: >- * [ ] EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention (_CVPR '23_) **\[**[**Paper**](https://arxiv.org/abs/2305.07027)**]** **\[**[**Code**](https://github.com/microsoft/Cream/tree/main/EfficientViT)**]** **\[**[**Note\_community**](https://blog.csdn.net/P\_LarT/article/details/130687567)**]** * [ ] Vision Transformers Need Registers _(ICLR '24)_ **[[Paper](https://openreview.net/forum?id=2dnO3LLiJ1)]** * [ ] LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors _(ECCV '24)_ **[[Paper](https://www.cs.umd.edu/~sakshams/LiFT/)]** +* [ ] DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain [![paper](https://img.shields.io/badge/NeurIPS_'24-dc3545)](https://arxiv.org/abs/2410.14980) [![GitHub](https://img.shields.io/github/stars/w2kun/DCDepth?style=flat)](https://github.com/w2kun/DCDepth) ### Image Tampering @@ -47,6 +48,8 @@ description: >-
2024 +* [ ] Learning Universal Features for Generalizable Image Forgery Localization [![Static Badge](https://img.shields.io/badge/OpenReview-6c757d)](https://openreview.net/forum?id=OKzvovmUbh) +* [ ] A Large-scale Interpretable Multi-modality Benchmark for Image Forgery Localization [![Static Badge](https://img.shields.io/badge/OpenReview-6c757d)](https://openreview.net/forum?id=7AvYFqcNfn) * [ ] ForgeryTTT: Zero-Shot Image Manipulation Localization with Test-Time Training [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.04032) * [ ] ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.10238) * [ ] FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2404.13306) [![GitHub](https://img.shields.io/github/stars/Yixuan423/FakeBench?style=flat)](https://github.com/Yixuan423/FakeBench) @@ -157,6 +160,7 @@ description: >-
2021

+* [ ] Multi-modality image manipulation detection (_ICME '21_) **\[**[**Paper**](https://doi.org/10.1109/ICME51207.2021.9428232)**]**
* [ ] MSTA-Net: Forgery Detection by Generating Manipulation Trace Based on Multi-Scale Self-Texture Attention (_TCSVT '21_) **\[**[**Paper**](https://ieeexplore.ieee.org/document/9643421)**]**
* [ ] Image Manipulation Detection by Multi-View Multi-Scale Supervision (_ICCV '21_) **\[**[**Paper**](https://arxiv.org/abs/2104.06832)**]** **\[**[**Code**](https://github.com/dong03/MVSS-Net)**]**
* [x] [TransForensics: Image Forgery Localization with Dense Self-Attention](image-forgery/2021/transforensics.md) (_ICCV '21_) **\[**[**Paper**](https://arxiv.org/abs/2108.03871)**]**
@@ -207,7 +211,7 @@ _Some of the above papers also contain methods to detect tampered images generat
* [ ] D-Net: A dual-encoder network for image splicing forgery detection and localization [![Static Badge](https://img.shields.io/badge/PR_'24-ffc107)](https://arxiv.org/abs/2012.01821)
* [ ] UGEE-Net: Uncertainty-Guided and Edge-Enhanced Network for Image Splicing Localization (_Neural Networks '24_) **[[Paper](https://doi.org/10.1016/j.neunet.2024.106430)]** **[[Dataset](https://github.com/QixianHao/-HTSI12K-dataset)]**
* [ ] Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics (_arXiv '24_) **[[Paper](https://arxiv.org/abs/2404.16296)]**
-* [ ] A Visually Attentive Splice Localization Network with Multi-Domain Feature Extractor and Multi-Receptive Field Upsampler
+* [ ] A Visually Attentive Splice Localization Network with Multi-Domain Feature Extractor and Multi-Receptive Field Upsampler (_arXiv '24_) **[[Paper](https://arxiv.org/abs/2401.06995)]**
* [ ] Feature Aggregation and Region-Aware Learning for Detection of Splicing Forgery _(SPL '24)_ **[[Paper](https://ieeexplore.ieee.org/abstract/document/10378732/)]**
* [ ] Towards Effective Image Forensics via A Novel Computationally Efficient Framework and A New Image Splice Dataset _(Signal, Image and Video Processing (IF: 2.3, not included in CCFs) '24)_ **[[Paper](https://arxiv.org/abs/2401.06998)]**
@@ -288,6 +292,7 @@ _Some of the above papers also contain methods to detect tampered images generat

**Face forgery**: forgery methods and detection problems

+* [ ] Can We Leave Deepfake Data Behind in Training Deepfake Detector? [![arXiv](https://img.shields.io/badge/arXiv_'24-6c757d)](http://arxiv.org/abs/2408.17052)
* [ ] Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [![arXiv](https://img.shields.io/badge/arXiv_'24-6c757d)](http://arxiv.org/abs/2408.12791)
* [ ] Hierarchical Forgery Classifier On Multi-modality Face Forgery Clues [![Static Badge](https://img.shields.io/badge/TMM_'24-ffc107)](https://arxiv.org/abs/2212.14629) [![GitHub](https://img.shields.io/github/stars/EdWhites/HFC-MFFD?style=flat)](https://github.com/EdWhites/HFC-MFFD)
* [ ] Identity-Driven Multimedia Forgery Detection via Reference Assistance [![paper](https://img.shields.io/badge/MM_'24-dc3545)](https://openreview.net/forum?id=aspe8HE0ZA)
@@ -376,6 +381,8 @@ Low-level tasks include super-resolution, denoise, dehaze, low-light enhancement,

> > \*Equal contribution. #Corresponding author. 
+* [ ] Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models [![arXiv](https://img.shields.io/badge/CVPR_'24-dc3545)](https://openaccess.thecvf.com/content/CVPR2024/html/Wu_Q-Instruct_Improving_Low-level_Visual_Abilities_for_Multi-modality_Foundation_Models_CVPR_2024_paper.html) [![GitHub](https://img.shields.io/github/stars/Q-Future/Q-Instruct?style=flat)](https://github.com/Q-Future/Q-Instruct/)
+
* [ ] (**EVP**) Explicit Visual Prompting for Low-Level Structure Segmentations (_CVPR '23_) [📖](https://arxiv.org/abs/2303.10883), [👨‍💻](https://github.com/NiFangBaAGe/Explicit-Visual-Prompt) (_including defocus blur, shadow, forgery, camouflaged detection_)

> [Weihuang Liu](https://github.com/nifangbaage)1, [Xi Shen](https://xishen0220.github.io/)2, [Chi-Man Pun](https://www.cis.um.edu.mo/\~cmpun/)#,1, [Xiaodong Cun](https://vinthony.github.io/)#,2
@@ -439,6 +446,11 @@ Low-level tasks include super-resolution, denoise, dehaze, low-light enhancement,

* [ ] PromptAD: Zero-Shot Anomaly Detection Using Text Prompts _(WACV '24)_ **[[Paper](https://openaccess.thecvf.com/content/WACV2024/html/Li_PromptAD_Zero-Shot_Anomaly_Detection_Using_Text_Prompts_WACV_2024_paper.html)]**
* [ ] Holistic Representation Learning for Multitask Trajectory Anomaly Detection _(WACV '24)_ **[[Paper](https://arxiv.org/abs/2311.01851)]** **[[Code](https://alexandrosstergiou.github.io/project_pages/TrajREC/index.html)]**

+### Image Steganography
+
+* [ ] Finding Incompatible Blocks for Reliable JPEG Steganalysis [![paper](https://img.shields.io/badge/TIFS_'24-dc3545)](https://arxiv.org/abs/2402.13660)
+* [ ] LiDiNet: A Lightweight Deep Invertible Network for Image-in-Image Steganography [![paper](https://img.shields.io/badge/TIFS_'24-dc3545)](https://doi.org/10.1109/TIFS.2024.3463547)
+
### Useful Links

1. 
IJCAI 2024 Main Track Accepted Papers diff --git a/SUMMARY.md b/SUMMARY.md index cc33327..819c575 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -24,7 +24,8 @@ ## Image Forgery * [2024](image-forgery/2024/README.md) - + * [ForgeryTTT](image-forgery/2024/forgeryttt.md) + * [ForgeryGPT](image-forgery/2024/forgerygpt.md) * [2023](image-forgery/2023/README.md) * [HIFI_IFDL](image-forgery/2023/hifi_ifdl.md) * [ReLoC](image-forgery/2023/reloc.md) @@ -33,7 +34,6 @@ * [TruFor](image-forgery/2023/trufor.md) * [CFL-Net](image-forgery/2023/cfl-net.md) * [ERMPC](image-forgery/2023/ermpc.md) - * [2022](image-forgery/2022/README.md) * [ObjectFormer](image-forgery/2022/objectformer.md) * [IF-OSN](image-forgery/2022/ifosn.md) @@ -43,13 +43,10 @@ * [MSFF-AES](image-forgery/2022/msff.md) * [CA-IFL](image-forgery/2022/caifl.md) * [CAT-Net v2](image-forgery/2022/catnetv2.md) - * [2021](image-forgery/2021/README.md) * [TransForensics](image-forgery/2021/transforensics.md) - * [2020](image-forgery/2020/README.md) * [GSRNet](image-forgery/2020/gsrnet.md) - * [2019](image-forgery/2019/README.md) * [HLED](image-forgery/2019/hled.md) diff --git a/image-forgery/2024/README.md b/image-forgery/2024/README.md index 511282d..6c634bf 100644 --- a/image-forgery/2024/README.md +++ b/image-forgery/2024/README.md @@ -1,9 +1,9 @@ # 2024 -* [ ] ForgeryTTT: Zero-Shot Image Manipulation Localization with Test-Time Training [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.04032) -* [ ] ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.10238) -* [ ] FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2404.13306) [![GitHub](https://img.shields.io/github/stars/Yixuan423/FakeBench?style=flat)](https://github.com/Yixuan423/FakeBench) -* [ ] FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.02761) [![GitHub](https://img.shields.io/github/stars/zhipeixu/FakeShield?style=flat)](https://github.com/zhipeixu/FakeShield) +* [x] [ForgeryTTT: Zero-Shot Image Manipulation Localization with Test-Time Training](forgeryttt.md) [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.04032) +* [x] [ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization](forgerygpt.md) [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.10238) +* [x] [FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models](fakebench.md) [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2404.13306) [![GitHub](https://img.shields.io/github/stars/Yixuan423/FakeBench?style=flat)](https://github.com/Yixuan423/FakeBench) +* [x] [FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models](fakeshield.md) [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2410.02761) [![GitHub](https://img.shields.io/github/stars/zhipeixu/FakeShield?style=flat)](https://github.com/zhipeixu/FakeShield) * [ ] EL-FDL: Improving Image Forgery Detection and Localization via Ensemble Learning 
[![conf](https://img.shields.io/badge/ICANN_'24-28a745)](https://link.springer.com/chapter/10.1007/978-3-031-72335-3_17) * [ ] Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [![paper](https://img.shields.io/badge/IJCV_'24-dc3545)](https://arxiv.org/abs/2309.09667) * [ ] Detecting and Grounding Multi-Modal Media Manipulation and Beyond [![paper](https://img.shields.io/badge/TPAMI_'24-dc3545)](https://ieeexplore.ieee.org/abstract/document/10440475/) [![GitHub](https://img.shields.io/github/stars/rshaojimmy/MultiModal-DeepFake?style=flat)](https://github.com/rshaojimmy/MultiModal-DeepFake) @@ -23,20 +23,20 @@ * [ ] EC-Net: General image tampering localization network based on edge distribution guidance and contrastive learning [![Static Badge](https://img.shields.io/badge/KBS_'24-28a745)](https://doi.org/10.1016/j.knosys.2024.111656) * [ ] Frequency-constrained transferable adversarial attack on image manipulation detection and localization [![Static Badge](https://img.shields.io/badge/TVC_'24-28a745)](https://link.springer.com/article/10.1007/s00371-024-03482-4) * [ ] A Contribution-Aware Noise Feature representation model for image manipulation localization [![Static Badge](https://img.shields.io/badge/KBS_'24-28a745)](https://doi.org/10.1016/j.knosys.2024.111988) -* [ ] Effective Image Tampering Localization via Enhanced Transformer and Co-attention Fusion [![Static Badge](https://img.shields.io/badge/ICASSP_'24-ffc107)](https://arxiv.org/abs/2309.09306) [![GitHub](https://img.shields.io/github/stars/multimediaFor/EITLNet?style=flat)](https://github.com/multimediaFor/EITLNet) +* [x] [Effective Image Tampering Localization via Enhanced Transformer and Co-attention Fusion](eitlnet.md) [![Static Badge](https://img.shields.io/badge/ICASSP_'24-ffc107)](https://arxiv.org/abs/2309.09306) [![GitHub](https://img.shields.io/github/stars/multimediaFor/EITLNet?style=flat)](https://github.com/multimediaFor/EITLNet) * [ ] PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation Models Through Prompt Tuning [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2401.00653) * [ ] Diffusion models meet image counter-forensics [![Static Badge](https://img.shields.io/badge/WACV_'24-ffc107)](https://arxiv.org/abs/2311.13629) [![GitHub](https://img.shields.io/github/stars/mtailanian/diff-cf?style=flat)](https://github.com/mtailanian/diff-cf) * [ ] Research about the Ability of LLM in the Tamper-Detection Area [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2401.13504) * [ ] Deep Image Restoration For Image Anti-Forensics [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2405.02751) [![GitHub](https://img.shields.io/github/stars/99eren99/DIRFIAF?style=flat)](https://github.com/99eren99/DIRFIAF) * [ ] Deep Image Composition Meets Image Forgery [![Static Badge](https://img.shields.io/badge/arXiv_'24-6c757d)](https://arxiv.org/abs/2404.02897) [![GitHub](https://img.shields.io/github/stars/99eren99/DIS25k?style=flat)](https://github.com/99eren99/DIS25k) -* [ ] Fusion Transformer with Object Mask Guidance for Image Forgery Analysis [![Static Badge](https://img.shields.io/badge/CVPRW_'24-dc3545)](https://arxiv.org/abs/2403.12229) [![GitHub](https://img.shields.io/github/stars/mever-team/omgfuser?style=flat)](https://github.com/mever-team/omgfuser) +* [ ] [Fusion Transformer with Object Mask Guidance for Image Forgery Analysis](omg-fuser.md) 
[![Static Badge](https://img.shields.io/badge/CVPRW_'24-dc3545)](https://arxiv.org/abs/2403.12229) [![GitHub](https://img.shields.io/github/stars/mever-team/omgfuser?style=flat)](https://github.com/mever-team/omgfuser) * [ ] Exploring Multi-Modal Fusion for Image Manipulation Detection and Localization [![arXiv](https://img.shields.io/badge/MMM_'24-28a745)](https://arxiv.org/abs/2312.01790) [![GitHub](https://img.shields.io/github/stars/idt-iti/mmfusion-iml?style=flat)](https://github.com/idt-iti/mmfusion-iml) -* [ ] A New Benchmark and Model for Challenging Image Manipulation Detection [![arXiv](https://img.shields.io/badge/AAAI_'24-dc3545)](https://arxiv.org/abs/2311.14218) [![GitHub](https://img.shields.io/github/stars/ZhenfeiZ/CIMD?style=flat)](https://github.com/ZhenfeiZ/CIMD) +* [x] [A New Benchmark and Model for Challenging Image Manipulation Detection](cimd.md) [![arXiv](https://img.shields.io/badge/AAAI_'24-dc3545)](https://arxiv.org/abs/2311.14218) [![GitHub](https://img.shields.io/github/stars/ZhenfeiZ/CIMD?style=flat)](https://github.com/ZhenfeiZ/CIMD) * [ ] MGQFormer: Mask-Guided Query-Based Transformer for Image Manipulation Localization [![arXiv](https://img.shields.io/badge/AAAI_'24-dc3545)](https://ojs.aaai.org/index.php/AAAI/article/view/28520) [![arXiv](https://img.shields.io/badge/News-4096ff.svg)](https://dml.fudan.edu.cn/d1/65/c35285a643429/page.htm) * [ ] Learning Discriminative Noise Guidance for Image Forgery Detection and Localization [![arXiv](https://img.shields.io/badge/AAAI_'24-dc3545)](https://ojs.aaai.org/index.php/AAAI/article/view/28608) * [ ] CatmullRom Splines-Based Regression for Image Forgery Localization [![arXiv](https://img.shields.io/badge/AAAI_'24-dc3545)](https://ojs.aaai.org/index.php/AAAI/article/view/28548) * [ ] UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization [![arXiv](https://img.shields.io/badge/CVPR_'24-dc3545)](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_UnionFormer_Unified-Learning_Transformer_with_Multi-View_Representation_for_Image_Manipulation_Detection_CVPR_2024_paper.pdf) -* [ ] Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods [![arXiv](https://img.shields.io/badge/CVPR_'24-dc3545)](https://openaccess.thecvf.com/content/CVPR2024/papers/Qu_Towards_Modern_Image_Manipulation_Localization_A_Large-Scale_Dataset_and_Novel_CVPR_2024_paper.pdf) [![GitHub](https://img.shields.io/github/stars/qcf-568/MIML?style=flat)](https://github.com/qcf-568/MIML) +* [x] [Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods](miml.md) [![arXiv](https://img.shields.io/badge/CVPR_'24-dc3545)](https://openaccess.thecvf.com/content/CVPR2024/papers/Qu_Towards_Modern_Image_Manipulation_Localization_A_Large-Scale_Dataset_and_Novel_CVPR_2024_paper.pdf) [![GitHub](https://img.shields.io/github/stars/qcf-568/MIML?style=flat)](https://github.com/qcf-568/MIML) * [ ] EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [![arXiv](https://img.shields.io/badge/CVPR_'24-dc3545)](https://arxiv.org/abs/2312.08883) [![GitHub](https://img.shields.io/github/stars/xuanyuzhang21/EditGuard?style=flat)](https://github.com/xuanyuzhang21/EditGuard) * [ ] DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization 
[![arXiv](https://img.shields.io/badge/CVPR_'24-dc3545)](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_DiffForensics_Leveraging_Diffusion_Prior_to_Image_Forgery_Detection_and_Localization_CVPR_2024_paper.pdf)
* [ ] IML-ViT: Image Manipulation Localization by Vision Transformer [![arXiv](https://img.shields.io/badge/AAAI_'24-dc3545)](https://arxiv.org/abs/2307.14863) [![GitHub](https://img.shields.io/github/stars/SunnyHaze/IML-ViT?style=flat)](https://github.com/SunnyHaze/IML-ViT)
diff --git a/image-forgery/2024/eitlnet.md b/image-forgery/2024/eitlnet.md
index 1a7fef1..002b788 100644
--- a/image-forgery/2024/eitlnet.md
+++ b/image-forgery/2024/eitlnet.md
@@ -1,4 +1,75 @@
 # EITLNet
+
+![image-20241025100938723](https://s2.loli.net/2024/10/25/DjalEIcNUAwuKhm.png)
+
+> Guo, K., Zhu, H., & Cao, G. (2024). Effective Image Tampering Localization via Enhanced Transformer and Co-Attention Fusion. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 4895–4899. https://doi.org/10.1109/ICASSP48485.2024.10446332
+
+Code: https://github.com/multimediaFor/EITLNet
+
+The authors again use a transformer encoder, here a Segformer backbone, further strengthened with attention-based feature fusion. Building on the Mix Transformer encoder MiT-B2, features extracted from the RGB stream and the noise stream (the latter first passed through a channel-wise high-pass filter, cw-hpf) are enhanced and then fused at multiple scales by an attention-based fusion module. The authors point out that, unlike this design, most methods combining the two feature types do so only at the front or the back of the network, ignoring the interaction between the two modalities, and their decoder attention modules usually operate at a single scale. They therefore design two modules, FE (feature enhancement) and CAF (coordinate attention-based fusion), used respectively to enhance the feature representation ability of the decoder and to integrate RGB and noise features effectively at multiple scales.
+
+![image-20241025102852159](https://s2.loli.net/2024/10/25/LQHv9CcoeDaKuSG.png)
+
+The coordinate attention module at the core of CAF is shown below (imports and the `h_swish` activation are filled in here so the snippet runs standalone):
+
+```python
+import torch
+import torch.nn as nn
+
+
+class h_sigmoid(nn.Module):
+    """Piecewise-linear approximation of sigmoid: ReLU6(x + 3) / 6."""
+    def __init__(self, inplace=True):
+        super(h_sigmoid, self).__init__()
+        self.relu = nn.ReLU6(inplace=inplace)
+
+    def forward(self, x):
+        return self.relu(x + 3) / 6
+
+
+class h_swish(nn.Module):
+    """Hard-swish activation, x * h_sigmoid(x), used inside CoordAtt."""
+    def __init__(self, inplace=True):
+        super(h_swish, self).__init__()
+        self.sigmoid = h_sigmoid(inplace=inplace)
+
+    def forward(self, x):
+        return x * self.sigmoid(x)
+
+
+class CoordAtt(nn.Module):
+    def __init__(self, inp, oup, reduction=32):
+        super(CoordAtt, self).__init__()
+        # Directional pooling: an (H, 1) height profile and a (1, W) width profile.
+        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
+        self.pool_w = nn.AdaptiveAvgPool2d((1, None))
+
+        mip = max(8, inp // reduction)
+
+        # The module takes inp * 2 channels, i.e. two feature streams
+        # concatenated along the channel dimension.
+        self.conv1 = nn.Conv2d(inp * 2, mip, kernel_size=1, stride=1, padding=0)
+        self.bn1 = nn.BatchNorm2d(mip)
+        self.act = h_swish()
+
+        self.conv_h = nn.Conv2d(mip, oup * 2, kernel_size=1, stride=1, padding=0)
+        self.conv_w = nn.Conv2d(mip, oup * 2, kernel_size=1, stride=1, padding=0)
+
+    def forward(self, x):
+        identity = x
+
+        n, c, h, w = x.size()
+        x_h = self.pool_h(x)                      # (N, C, H, 1)
+        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (N, C, W, 1)
+
+        # Process both directional profiles with one shared bottleneck.
+        y = torch.cat([x_h, x_w], dim=2)
+        y = self.conv1(y)
+        y = self.bn1(y)
+        y = self.act(y)
+
+        x_h, x_w = torch.split(y, [h, w], dim=2)
+        x_w = x_w.permute(0, 1, 3, 2)
+
+        # Direction-wise attention maps, broadcast over the other axis.
+        a_h = self.conv_h(x_h).sigmoid()          # (N, 2*oup, H, 1)
+        a_w = self.conv_w(x_w).sigmoid()          # (N, 2*oup, 1, W)
+
+        out = identity * a_w * a_h
+
+        return out
+```
+
+(A quick shape check of this module appears at the end of this note.)
+
+### Feature Enhancement Module
+
+### Coordinate Attention-based Fusion Module
+
+## Results
+
+![image-20241025103758595](https://s2.loli.net/2024/10/25/MYx9zPKHRshCQWk.png)
+
+The authors test on Columbia (2006), CASIA v1 (2013), DSO (2013), NIST (2019), Coverage (2016), and IMD (2020) and compare against the methods above, performing quite well across the board. These are all fairly old datasets, though; apart from Columbia, where the method is surpassed, the results on the other datasets still leave considerable room for improvement. I ran this method on the test set proposed by CIMD and found its results also fall short of what the CIMD paper reports, although that paper has not released code yet.
+
+In the ablation study the authors convincingly show that their two proposed modules, FE and CAF, are effective, although the gain from FE does not appear especially large.
+
+Beyond that, the authors run some robustness experiments: the test images are recompressed by four social media platforms, Facebook, Weibo, WeChat, and WhatsApp (these compressed test sets come from the earlier OSN work), and performance is evaluated again.
+
+In the introduction the authors note that some content-preserving manipulations do not affect semantic information, so operations such as JPEG compression, illumination changes, and Gaussian blur are not considered.
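+
+To make the tensor flow of `CoordAtt` concrete, here is a minimal shape check. This is an illustrative sketch, not from the paper or its repository: the batch size and channel counts are assumptions, and `CoordAtt` is the module defined above.
+
+```python
+import torch
+
+# Hypothetical feature maps from the two streams (illustrative sizes).
+rgb_feat = torch.randn(2, 64, 32, 32)    # RGB-stream features
+noise_feat = torch.randn(2, 64, 32, 32)  # noise-stream features
+
+# CoordAtt's first conv expects inp * 2 channels, i.e. the two streams
+# concatenated along the channel dimension.
+fused_in = torch.cat([rgb_feat, noise_feat], dim=1)  # (2, 128, 32, 32)
+
+caf = CoordAtt(inp=64, oup=64)
+out = caf(fused_in)
+print(out.shape)  # torch.Size([2, 128, 32, 32]): attention-reweighted fusion
+```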
diff --git a/image-forgery/2024/fakebench.md b/image-forgery/2024/fakebench.md
new file mode 100644
index 0000000..eccc58e
--- /dev/null
+++ b/image-forgery/2024/fakebench.md
@@ -0,0 +1,25 @@
+# FakeBench
+
+![image-20241027170055242](https://s2.loli.net/2024/10/27/ioRJIBfF4H8Z9CV.png)
+
+> Li, Y., Liu, X., Wang, X., Lee, B. S., Wang, S., Rocha, A., & Lin, W. (2024). FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models. arXiv preprint.
+
+Code: https://github.com/Yixuan423/FakeBench
+
+The paper points out that current forgery detection methods output only a mask, which is not very accessible to the general public and is not persuasive enough for image forensics work; meanwhile, the image-text capabilities of large language models have not been brought to tasks like forgery detection. From the perspective of human perception, the authors therefore propose FakeBench, a multimodal dataset with textual authenticity descriptions. FakeBench probes large multimodal models with human-in-the-loop criteria covering detection, reasoning, interpretation, and fine-grained forgery analysis, in order to reach a deeper understanding of their fake image detection abilities. The authors run experiments on 14 different models, such as GPT-4V, Q-Instruct, Gemini Pro, LLaVA-v1.5, and Qwen-VL, to illustrate how this work shifts image forgery detection from a black box toward a more transparent paradigm.
+
+Main contributions:
+
+1. A dataset for explainability-oriented image forensics evaluation that probes LMMs on detection, reasoning, interpretation, and fine-grained forgery analysis.
+2. A fine-grained taxonomy for generated fake images.
+3. Experiments on 14 large multimodal models, yielding findings on model performance and the influence of prompting, fine-tuning, and related factors.
+
+![image-20241029220624557](https://s2.loli.net/2024/10/29/C629Sn5ywBNjIe4.png)
+
+Compared with other datasets, FakeBench clearly focuses more on deepfake- and AIGC-style forgeries.
+
+![image-20241029220745686](https://s2.loli.net/2024/10/29/9Jgwateo4svWuFN.png)
+
+This figure shows the image details of FakeBench: the images are generated with GANs, diffusion models, or dedicated methods such as ProGAN, StyleGAN, DALL·E 2, and Midjourney, and cover categories such as portraits, artworks, landscapes, and digital sketches.
+
+Besides the images themselves, FakeBench is organized into three parts, FakeClass, FakeClue, and FakeQA, corresponding to the detection, causal investigation, and fine-grained analysis components of the white-box forensics the authors describe.
diff --git a/image-forgery/2024/fakeshield.md b/image-forgery/2024/fakeshield.md
new file mode 100644
index 0000000..dccd704
--- /dev/null
+++ b/image-forgery/2024/fakeshield.md
@@ -0,0 +1,7 @@
+# FakeShield
+
+![image-20241027170153062](https://s2.loli.net/2024/10/27/SfXiw6PhYks78ao.png)
+
+> Xu, Z., Zhang, X., Li, R., Tang, Z., Huang, Q., & Zhang, J. (2024). FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models. arXiv preprint.
+
+Code: https://github.com/zhipeixu/FakeShield
\ No newline at end of file
diff --git a/image-forgery/2024/forgerygpt.md b/image-forgery/2024/forgerygpt.md
new file mode 100644
index 0000000..b31c7f3
--- /dev/null
+++ b/image-forgery/2024/forgerygpt.md
@@ -0,0 +1,6 @@
+# ForgeryGPT
+
+![image-20241027163430079](https://s2.loli.net/2024/10/27/b5Z6nHTwCIOc2pQ.png)
+
+> Li, J., Zhang, F., Zhu, J., Sun, E., Zhang, Q., & Zha, Z.-J. (2024). ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization. arXiv preprint.
+
diff --git a/image-forgery/2024/forgeryttt.md b/image-forgery/2024/forgeryttt.md
new file mode 100644
index 0000000..ea6bbbe
--- /dev/null
+++ b/image-forgery/2024/forgeryttt.md
@@ -0,0 +1,6 @@
+# ForgeryTTT
+
+![image-20241027163141744](https://s2.loli.net/2024/10/27/6lXVdHKonwiZGDh.png)
+
+> Liu, W., Shen, X., Pun, C.-M., & Cun, X. (2024). ForgeryTTT: Zero-Shot Image Manipulation Localization with Test-Time Training. arXiv preprint.
+
diff --git a/image-forgery/2024/miml.md b/image-forgery/2024/miml.md
new file mode 100644
index 0000000..9e9ee1b
--- /dev/null
+++ b/image-forgery/2024/miml.md
@@ -0,0 +1,7 @@
+# MIML
+
+![image-20241027162707114](https://s2.loli.net/2024/10/27/HUv4P8uJxLXY7iW.png)
+
+> Qu, C., Zhong, Y., Liu, C., Xu, G., Peng, D., Guo, F., & Jin, L. (2024). Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
+
+Code: https://github.com/qcf-568/MIML
\ No newline at end of file
diff --git a/image-forgery/2024/omg-fuser.md b/image-forgery/2024/omg-fuser.md
new file mode 100644
index 0000000..f981d4e
--- /dev/null
+++ b/image-forgery/2024/omg-fuser.md
@@ -0,0 +1,5 @@
+# OMG-Fuser
+
+
+
+Code: https://github.com/mever-team/omgfuser
\ No newline at end of file
diff --git a/image-splicing/mstaf.md b/image-splicing/mstaf.md
new file mode 100644
index 0000000..26bc1fb
--- /dev/null
+++ b/image-splicing/mstaf.md
@@ -0,0 +1,30 @@
+# MSTAF
+
+> Tan, Y., Li, Y., Zeng, L., Ye, J., Wang, W., & Li, X. (2023). Multi-scale Target-Aware Framework for Constrained Splicing Detection and Localization. Proceedings of the 31st ACM International Conference on Multimedia. https://doi.org/10.1145/3581783.3613763
+
+![image-20231112103316516](https://s2.loli.net/2023/11/12/ynqRs5ILNtXalTo.png)
+
+## Abstract
+
+The authors observe that existing network frameworks for constrained image splicing detection and localization design **feature extraction** and **correlation matching** as independent processes, which they argue hinders the model from learning discriminative features for matching. They therefore propose a multi-scale target-aware framework that merges feature extraction and correlation matching into one unified pipeline. A target-aware attention mechanism jointly learns and promotes both the features and the correlation matching over the donor and recipient images. To cope with scale transformations, the authors also introduce a multi-scale projection method that lets the attention process run across tokens carrying information at different scales and integrates easily into the target-aware framework.
+
+![image-20231112103038565](https://s2.loli.net/2023/11/12/gP2er8QAniG1fat.png)
+
+1. Motivation / problem to solve: separating feature extraction from correlation matching prevents the model from exploiting the correlation between similar patches across the two images to perceive target features; on the matching side, the independent processes likewise allow no exchange of feature information between the two images.
+2. Inspiration for the framework: MixFormer (CVPR '22) and SimTrack (ECCV '22). Since, as noted above, the two input images are otherwise processed for detection outside a unified procedure, the authors take these two frameworks as inspiration and design a unified framework that couples feature learning with correlation matching. While performing correlation matching, the model can also perceive the correlated regions during feature learning and then use the attention mechanism to suppress the influence of uncorrelated and background regions.
+3. Key component of the framework: unlike ordinary multi-head attention, the **target-aware attention mechanism** takes the features of both input images as input and splits the attention process into two heads: one handles self-attention within an image, while the other handles cross-attention to compute correlation matching and extract features from the other image (see the sketch after the Results section).
+4. Experiments and results: the authors generate training data from MS COCO following earlier methods; evaluation covers three datasets: their own synthesized test set, the CASIA dataset, and MFC2018. Localization is measured with IoU, NMM, and MCC, while detection uses Precision, Recall, and F1. In the ablations, they mainly contrast separate feature extraction and correlation matching with the unified pipeline, and use test sets of different sizes to analyze the model's robustness to scale changes.
+
+## Framework
+
+![image-20231112103226759](https://s2.loli.net/2023/11/12/p1rRSHbhw6gjtsK.png)
+
+![image-20231115090958880](https://s2.loli.net/2023/11/15/Ptfya9YSvzCcuNQ.png)
+
+![image-20231115091032027](https://s2.loli.net/2023/11/15/QBzxanhwVAypkRd.png)
+
+## Results
+
+![image-20231115090655196](https://s2.loli.net/2023/11/15/5vQuzeH3M1SFwP4.png)
+
+![image-20231115090715877](https://s2.loli.net/2023/11/15/vboGI3fLZCpSBrt.png)
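+
+The target-aware attention from point 3 above can be sketched as follows. This is a hypothetical reconstruction for illustration only, not the authors' implementation: the module name, the head split, and all dimensions are my assumptions. Half of the heads attend within an image while the other half attend to the partner image, so feature learning and correlation matching happen in a single pass.
+
+```python
+import torch
+import torch.nn as nn
+
+
+class TargetAwareAttention(nn.Module):
+    """Hypothetical sketch: for each image, half of the heads do
+    self-attention and the other half do cross-attention to the
+    partner image, unifying feature learning and matching."""
+
+    def __init__(self, dim, num_heads=8):
+        super().__init__()
+        assert dim % num_heads == 0 and num_heads % 2 == 0
+        self.num_heads = num_heads
+        self.head_dim = dim // num_heads
+        self.scale = self.head_dim ** -0.5
+        self.qkv = nn.Linear(dim, dim * 3)
+        self.proj = nn.Linear(dim, dim)
+
+    def _split(self, t):
+        # (B, N, C) -> (B, heads, N, head_dim)
+        B, N, _ = t.shape
+        return t.reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
+
+    @staticmethod
+    def _attend(q, k, v, scale):
+        attn = (q @ k.transpose(-2, -1)) * scale
+        return attn.softmax(dim=-1) @ v
+
+    def forward(self, x1, x2):
+        # Token sequences of the donor and recipient images.
+        q1, k1, v1 = (self._split(t) for t in self.qkv(x1).chunk(3, dim=-1))
+        q2, k2, v2 = (self._split(t) for t in self.qkv(x2).chunk(3, dim=-1))
+        h = self.num_heads // 2
+
+        # First h heads: self-attention; last h heads: cross-attention.
+        out1 = torch.cat([self._attend(q1[:, :h], k1[:, :h], v1[:, :h], self.scale),
+                          self._attend(q1[:, h:], k2[:, h:], v2[:, h:], self.scale)], dim=1)
+        out2 = torch.cat([self._attend(q2[:, :h], k2[:, :h], v2[:, :h], self.scale),
+                          self._attend(q2[:, h:], k1[:, h:], v1[:, h:], self.scale)], dim=1)
+
+        def merge(t):  # (B, heads, N, head_dim) -> (B, N, C)
+            B, H, N, D = t.shape
+            return t.transpose(1, 2).reshape(B, N, H * D)
+
+        return self.proj(merge(out1)), self.proj(merge(out2))
+
+
+# Illustrative usage: 196 tokens per image, embedding dim 256.
+x1 = torch.randn(2, 196, 256)
+x2 = torch.randn(2, 196, 256)
+f1, f2 = TargetAwareAttention(dim=256)(x1, x2)
+print(f1.shape, f2.shape)  # torch.Size([2, 196, 256]) for both outputs
+```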