Book Summaries

This repository is for book enthusiasts📕🪱

The input is a text file of book records called booksummaries.txt. The project covers three main tasks:

Exploratory Data Analysis (EDA):
- Identify and handle missing values in the dataset.
NLP Summarization:
- Implement NLP techniques to generate condensed summaries of books from the provided text data.
Computer Vision:
- Investigate methods for converting text summaries into images.
- Experiment with text-to-image models to visualize book summaries.

EDA Phase

In the Exploratory Data Analysis (EDA) phase, I used regular expressions to extract the key information about each book. I then used together.ai with the Llama large language model to convert the unstructured file into two structured JSON files: bookinfo.json and bookInfo(withSummary).json.
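As an illustration of the regex step, here is a minimal sketch. The layout of booksummaries.txt is not described in this README, so the sketch assumes, purely for illustration, one tab-separated record per line (id, title, author, date, genres, summary) with genres stored as a JSON-like mapping. The names `parse_line` and `to_json` are hypothetical, and the LLM step is not shown.

```python
import json
import re

# Hypothetical record layout (an assumption, not taken from the repository):
# id <TAB> title <TAB> author <TAB> date <TAB> genres <TAB> summary
LINE_RE = re.compile(
    r"^(?P<id>\d+)\t(?P<title>[^\t]*)\t(?P<author>[^\t]*)\t"
    r"(?P<date>[^\t]*)\t(?P<genres>[^\t]*)\t(?P<summary>.*)$"
)
YEAR_RE = re.compile(r"\b(\d{4})\b")
# Captures the quoted value of each "key": "value" pair in the genres field.
GENRE_RE = re.compile(r'"\s*:\s*"([^"]+)"')


def parse_line(line):
    """Turn one raw line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    year = YEAR_RE.search(m["date"])
    return {
        "title": m["title"].strip() or None,
        "author": m["author"].strip() or None,
        "year": int(year.group(1)) if year else None,
        "genres": GENRE_RE.findall(m["genres"]),
        "summary": m["summary"].strip() or None,
    }


def to_json(lines):
    """Serialize every line that parses; lines that do not match are skipped."""
    records = [rec for rec in map(parse_line, lines) if rec is not None]
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Empty fields come back as None (or an empty list for genres), which keeps missing values easy to spot in later steps.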

Dataframe Creation

The obtained data allowed me to create a dataframe with the following columns:

Book Name: Name of each book.
Author Name: Name of the author.
Publication Date: Date when the book was published.
Book Genres: Genres of the book.
Summary: Summary of the book.
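A dataframe with these columns could be built from the JSON file roughly as follows. This is a sketch under assumptions: the JSON key names (`title`, `author`, `date`, `genres`, `summary`) are hypothetical, the file is assumed to hold a JSON array of objects, and pandas is imported lazily so the key-renaming helper can be used without it.

```python
import json

# Hypothetical JSON keys (assumed) mapped to the column names listed above.
COLUMNS = {
    "title": "Book Name",
    "author": "Author Name",
    "date": "Publication Date",
    "genres": "Book Genres",
    "summary": "Summary",
}


def to_rows(records):
    """Rename keys to column names; keys absent from a record become None."""
    return [{col: rec.get(key) for key, col in COLUMNS.items()} for rec in records]


def load_dataframe(path="bookInfo(withSummary).json"):
    """Read the JSON file and return a pandas DataFrame with the five columns."""
    import pandas as pd  # third-party dependency, imported only when needed

    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)  # assumes a JSON array of objects
    return pd.DataFrame(to_rows(records), columns=list(COLUMNS.values()))
```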

Data Cleaning

Using this dataframe, I performed data cleaning operations on the columns to ensure data quality and consistency.
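The specific cleaning operations are not listed in this README. As a sketch of what such steps might look like, the hypothetical helpers below normalize whitespace, extract a four-digit year from a date string, and de-duplicate genre names case-insensitively.

```python
import re


def clean_text(value):
    """Collapse runs of whitespace; return None for empty or non-string input."""
    if not isinstance(value, str):
        return None
    value = re.sub(r"\s+", " ", value).strip()
    return value or None


def clean_year(value):
    """Return the first standalone four-digit number as an int, or None."""
    text = "" if value is None else str(value)
    m = re.search(r"\b(\d{4})\b", text)
    return int(m.group(1)) if m else None


def clean_genres(values):
    """Drop empty entries and case-insensitive duplicates, keeping first-seen order."""
    seen, out = set(), []
    for genre in values or []:
        genre = clean_text(genre)
        if genre and genre.lower() not in seen:
            seen.add(genre.lower())
            out.append(genre)
    return out
```

With pandas, helpers like these could be applied column by column, for example with `Series.map`.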

Dataset Report

Finally, using the ydata_profiling library, I generated a report on the dataset, which can be found in the bookReport.html file.
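Generating such a report takes only a few lines. The sketch below assumes `df` is the cleaned dataframe; the function name, the report title, and the `minimal` switch are illustrative choices, not settings taken from the notebook.

```python
def write_profile_report(df, output_path="bookReport.html",
                         title="Book Summaries", minimal=False):
    """Profile the dataframe and write the report as a standalone HTML file."""
    from ydata_profiling import ProfileReport  # third-party dependency

    # minimal=True skips the more expensive computations on large datasets.
    report = ProfileReport(df, title=title, minimal=minimal)
    report.to_file(output_path)
    return output_path
```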

Missing Values Handling

I considered several approaches to handling missing values, but none has produced the desired result so far. Detailed explanations can be found in the MissingValue.ipynb notebook. I tried NER language models from Hugging Face🤗, but the ones I could run fell short in performance, and the more powerful models demanded robust hardware resources.
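To make the idea concrete, here is a sketch of counting missing fields and of pulling person names out of text with a Hugging Face NER pipeline. How NER was actually applied is described in the notebook, not here, so this assumes the goal was to find person names as candidates for a missing field. The model `dslim/bert-base-NER`, the score threshold, and all function names are assumptions.

```python
FIELDS = ("Book Name", "Author Name", "Publication Date", "Book Genres", "Summary")


def missing_counts(rows):
    """Count, per column, how many rows have no usable value."""
    counts = {field: 0 for field in FIELDS}
    for row in rows:
        for field in FIELDS:
            if row.get(field) in (None, "", []):
                counts[field] += 1
    return counts


def pick_person(entities, min_score=0.9):
    """Return the highest-scoring PER entity at or above the threshold, else None."""
    people = [
        e for e in entities
        if e.get("entity_group") == "PER" and e.get("score", 0.0) >= min_score
    ]
    return max(people, key=lambda e: e["score"])["word"] if people else None


def load_ner(model="dslim/bert-base-NER"):
    """Build the NER pipeline once; the model name is an assumed example."""
    from transformers import pipeline  # third-party dependency

    # aggregation_strategy="simple" merges word pieces into whole entities.
    return pipeline("ner", model=model, aggregation_strategy="simple")


def person_candidate(text, ner):
    """Run NER over the text and keep the most confident person name, if any."""
    return pick_person(ner(text))
```

Creating the pipeline once with `load_ner()` and reusing it for every row avoids reloading the model on each call.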

NLP Summarization

In the second phase, I experimented with summarization models available on Hugging Face🤗. Specifically, I used the facebook/bart-large-cnn model to condense the book summaries, and the results were satisfactory.
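A minimal sketch of this step with the Transformers summarization pipeline follows. BART accepts at most 1024 input tokens, so the sketch splits long text into word-based chunks first; the chunk size of 400 words, the length limits, and the function names are illustrative assumptions rather than the settings used in NLP.ipynb.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize(text, max_words=400, model="facebook/bart-large-cnn"):
    """Summarize each chunk separately and join the partial summaries."""
    from transformers import pipeline  # third-party dependency

    summarizer = pipeline("summarization", model=model)
    parts = []
    for chunk in chunk_words(text, max_words):
        result = summarizer(
            chunk,
            max_length=130,   # upper bound on generated tokens (assumed value)
            min_length=30,    # lower bound on generated tokens (assumed value)
            do_sample=False,  # deterministic decoding
            truncation=True,  # guard against chunks that still exceed the limit
        )
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Chunking by words is only an approximation of the token limit, which is why `truncation=True` is kept as a safety net.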

Computer Vision

To generate visual representations of book summaries, I used the Stable Diffusion model available on Hugging Face🤗. Stable Diffusion is a deep-learning text-to-image model based on diffusion techniques, released in 2022 by Stability AI. It is considered part of the ongoing artificial intelligence boom. Although the model produces excellent results, it demands substantial hardware resources. Due to memory constraints, I used the Clipdrop platform instead, which let me generate images through its Stable Diffusion API.
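For readers with enough hardware to run the model locally, a sketch using the diffusers library might look like this. It is not the Clipdrop workflow used here. The model ID `CompVis/stable-diffusion-v1-4`, the prompt wording, the 40-word cap (the text encoder reads only 77 tokens), and the output path are all assumptions.

```python
def build_prompt(title, summary, max_words=40):
    """Build a short prompt; the text encoder ignores anything past 77 tokens."""
    words = summary.split()[:max_words]
    return f"Book cover illustration for '{title}': " + " ".join(words)


def generate_image(prompt, model_id="CompVis/stable-diffusion-v1-4",
                   out_path="summary.png"):
    """Generate one image for the prompt and save it to out_path."""
    import torch  # third-party dependency
    from diffusers import StableDiffusionPipeline  # third-party dependency

    use_gpu = torch.cuda.is_available()
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        # Half precision roughly halves GPU memory use; CPU needs float32.
        torch_dtype=torch.float16 if use_gpu else torch.float32,
    )
    pipe = pipe.to("cuda" if use_gpu else "cpu")
    pipe.enable_attention_slicing()  # trades some speed for lower peak memory
    image = pipe(prompt).images[0]
    image.save(out_path)
    return out_path
```

Half precision and attention slicing reduce memory pressure, but the model may still not fit on small GPUs, which is the constraint that motivated a hosted API.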

Results

The first generated image is available below. Additionally, I provide some statistics regarding the dataset used for evaluation.

ShayanDarabi/Book-Summaries
