This project performs sentiment analysis on Sephora skincare reviews using a pre-trained DistilBERT model fine-tuned on a custom dataset.
Sentiment analysis is a natural language processing technique that determines the sentiment expressed in a piece of text. In this project, a pre-trained DistilBERT model from Hugging Face is fine-tuned on a custom dataset of Sephora skincare product reviews to improve performance in this specific domain.
The dataset used in this project is available on Kaggle: Sephora Products and Skincare Reviews.
The dataset contains various columns, such as `product_id`, `user_id`, `rating`, and `review_text`. For our sentiment analysis task, we focus on the `review_text` column.
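
As a rough illustration of focusing on the `review_text` column, the sketch below keeps only non-empty review texts from a DataFrame. The helper name `extract_review_text` and the tiny in-memory sample are assumptions made for this example; they are not taken from the project's scripts.

```python
import pandas as pd


def extract_review_text(df: pd.DataFrame) -> pd.Series:
    """Return the non-empty review texts as a clean, re-indexed Series.

    Hypothetical helper for illustration; not part of the project's code.
    """
    # Drop missing reviews, normalise to str, and trim surrounding whitespace.
    texts = df["review_text"].dropna().astype(str).str.strip()
    # Discard reviews that are empty after trimming.
    return texts[texts != ""].reset_index(drop=True)


# Small in-memory stand-in for the Kaggle data (made-up rows).
sample = pd.DataFrame(
    {
        "rating": [5, 4, 3, 1],
        "review_text": ["Love this moisturizer!", None, "   ", "Broke me out."],
    }
)

reviews = extract_review_text(sample)
print(reviews.tolist())
```

With the real data, the same helper could be applied to the result of `pd.read_csv(...)` on whichever CSV file was placed in the `data` folder.
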
To run this project, you will need the following:
- Python 3.7 or later
- PyTorch 1.9.0 or later
- TorchText 0.6.0
- Transformers 4.0.0 or later
- Pandas
- CUDA (if using a GPU)
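
One way to check which of these packages are already installed is sketched below. It uses only the standard library's `importlib.metadata`, which requires Python 3.8 or later. The distribution names in `REQUIRED` (`torch`, `torchtext`, `transformers`, `pandas`) are the usual PyPI names and are an assumption about how this project installs them.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed PyPI distribution names for the libraries listed above.
REQUIRED = ["torch", "torchtext", "transformers", "pandas"]


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```
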
- Clone this repository: `git clone https://github.com/yourusername/sephora-skincare-reviews-sentiment-analysis.git`
- Change to the project directory: `cd sephora-skincare-reviews-sentiment-analysis`
- Install the required libraries: `pip install -r requirements.txt`
- Download the dataset from Kaggle and place it in the `data` folder.
- Preprocess the dataset by running `preprocess_data.py`: `python preprocess_data.py`. This script splits the dataset into training, validation, and test sets.
- Fine-tune the DistilBERT model on the training data by running `train.py`: `python train.py`
- Evaluate the model on the test data by running `evaluate.py`: `python evaluate.py`
- To perform sentiment analysis on new data, use the `predict_sentiment` function in the `predict.py` script.
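
The split into training, validation, and test sets described in the preprocessing step could look roughly like the sketch below. It is an illustration only: the function name `split_dataset`, the 80/10/10 proportions, and the fixed seed are assumptions, and the actual `preprocess_data.py` may split the data differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T],
    val_frac: float = 0.1,   # assumed proportion, not taken from the project
    test_frac: float = 0.1,  # assumed proportion, not taken from the project
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows reproducibly and return (train, validation, test) lists."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


rows = list(range(100))
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

Shuffling before slicing avoids any ordering in the source file (for example, reviews grouped by product) leaking into a single split.
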
- Download or open the .ipynb file in a notebook environment of your choice.
- Run the cells in order.
- To perform sentiment analysis on new data, use the `predict_sentiment` function from the dedicated cell at the end.
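
For orientation, a function like `predict_sentiment` might be structured as in the sketch below. This is not the project's actual implementation: the signature, the `model_dir` path, the two-class `LABELS` order, and the helper `label_from_logits` are all assumptions. The Hugging Face calls used (`AutoTokenizer.from_pretrained`, `AutoModelForSequenceClassification.from_pretrained`) expect a directory containing a fine-tuned checkpoint.

```python
import math
from typing import List, Tuple

# Assumed label order; the real mapping depends on how the model was fine-tuned.
LABELS = ["negative", "positive"]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits: List[float]) -> Tuple[str, float]:
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def predict_sentiment(text: str, model_dir: str = "model/") -> Tuple[str, float]:
    """Classify one review with a fine-tuned checkpoint in model_dir (hypothetical path)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)
```

A call such as `predict_sentiment("This serum cleared my skin in a week.")` would then return a label and a probability. The sketch reloads the model on every call for simplicity; when classifying many reviews, loading the tokenizer and model once and reusing them would be preferable.
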
This project is licensed under the MIT License. See the LICENSE file for details.