This project automates web scraping of refrigerator data from four major e-commerce platforms (Exito, Falabella, Alkosto, and Sodimac) using Selenium and stores the scraped data in a PostgreSQL database. The data is visualized in Power BI; the final dashboard is available here or in the .pbix file. The project is developed in Python and aims to streamline data collection for market analysis and research.
This project involves developing a PostgreSQL database by web scraping product data from the following e-commerce websites:
- Exito
- Falabella
- Alkosto
- Sodimac
The data is collected using Selenium for automation and then processed and stored in a PostgreSQL database. This facilitates market analysis and data-driven decision-making.
- Python 3.11.2
- PostgreSQL 16.3, compiled by Visual C++ build 1938, 64-bit
- Google Chrome or Mozilla Firefox
It's recommended to use a virtual environment to manage dependencies. You can create and activate a virtual environment as follows:
```sh
# Create a virtual environment
python -m venv venv

# Activate the virtual environment (Windows)
.\venv\Scripts\activate

# Activate the virtual environment (macOS/Linux)
source venv/bin/activate
```
Install the required packages using the requirements.txt file:
```sh
pip install -r requirements.txt
```
Before running the scripts, set the following environment variables:
- PYTHONPATH (use src as the working directory)
- POSTGRES_PASSWORD
- POSTGRES_PORT
- POSTGRES_DB
- POSTGRES_SERVER
- USER_AGENT
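As a minimal sketch of how the database variables above might be consumed, the helper below assembles a connection string from them. The function name and the `postgres` username are assumptions for illustration (the list above does not include a user variable); the repository's own code may read these differently.

```python
import os

def postgres_dsn() -> str:
    """Build a PostgreSQL connection string from the environment variables
    listed above. The `postgres` username is an assumed default."""
    return (
        f"postgresql://postgres:{os.environ['POSTGRES_PASSWORD']}"
        f"@{os.environ['POSTGRES_SERVER']}:{os.environ['POSTGRES_PORT']}"
        f"/{os.environ['POSTGRES_DB']}"
    )
```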
You can run the scraping scripts using either Chrome or Firefox. Ensure that the appropriate WebDriver (e.g., chromedriver or geckodriver) is installed and added to your PATH.
Example command to run a script:
```sh
python alkosto_scraper.py
```
Full run: The script may take between 15 minutes and 1 hour to complete.
Quick run: Scraping about 50 links per store/script takes approximately 5 to 10 minutes.
The scraping scripts are compatible with both Google Chrome and Mozilla Firefox. Make sure you have the corresponding WebDriver:
- Chrome: Download Chromedriver
- Firefox: Download Geckodriver
The scripts will scrape product details such as:
- Product name
- Price
- Capacity (liters)
- Energy consumption
- Product URL
The following entity diagram shows the expected output of information scraped from websites:
The data is then saved into the PostgreSQL database configured in your setup.
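The save step could look like the sketch below, which inserts a batch of scraped rows through any DB-API connection (e.g. one opened with psycopg2). The `products` table name and its columns are assumptions mirroring the fields listed above; the repository's actual schema may differ.

```python
# Hypothetical insert statement; column names are assumed, not the repo's schema.
INSERT_SQL = """
    INSERT INTO products (name, price, capacity_liters, energy_consumption, url)
    VALUES (%(name)s, %(price)s, %(capacity_liters)s, %(energy_consumption)s, %(url)s)
"""

def save_products(conn, products: list[dict]) -> None:
    """Insert scraped product rows using a DB-API connection.

    Each dict in `products` supplies the named parameters in INSERT_SQL.
    The `with conn` block commits on success and rolls back on error.
    """
    with conn, conn.cursor() as cur:
        cur.executemany(INSERT_SQL, products)
```

Using `executemany` with named parameters keeps the SQL parameterized (no string formatting of scraped values) and inserts the whole batch in one call.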
The interactive project dashboard is available here.