Multimodal Article Summarization

This project performs multimodal article summarization using pretrained models. It was done with Prof. Vasudev Varma and Balaji Vasan Srinivasan (Adobe). Unlike previous methods, our method takes both the article and its image as input and produces a text summary.

Detailed Report

Method

In this codebase we use OSCAR as the pretrained encoder and GPT2 as the pretrained decoder, and we generate text with nucleus sampling. OSCAR constructs a shared image-text embedding and minimizes the distance between the Faster-RCNN features of an object and the corresponding word embedding. However, you can replace OSCAR with any other visio-linguistic transformer such as LXMERT or UNITER. Similarly, you can replace GPT2LMHead with any other LM head to generate logits. The components are modular.
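To illustrate the decoding step, here is a minimal sketch of nucleus (top-p) sampling over a single vector of next-token logits. It is not the code used in this repository: the function names `softmax`, `nucleus_filter`, and `sample_nucleus` and the default `top_p=0.9` are assumptions made for illustration, and the sketch uses only the Python standard library rather than the tensors a real decoder would produce.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches top_p, and renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_nucleus(logits, top_p=0.9, rng=random):
    """Sample one token id from the nucleus of the distribution.
    top_p=0.9 is an illustrative default, not a value from this project."""
    nucleus = nucleus_filter(softmax(logits), top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical logits for a 4-token vocabulary.
    print(sample_nucleus([2.0, 1.0, -5.0, -5.0], top_p=0.9))
```

In a full decoder this step would be applied to the logits produced by the LM head at each position, with the sampled token appended to the sequence before the next step.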

Installation

This codebase uses the vilio library as its backbone, which in turn uses huggingface-3.5.0. To install, run pip3 install -r requirements.txt. Further instructions are in GETTING_STARTED.md. To run the code, run bash exp.sh.


darthgera123/Multimodal-Summarization