Skip to content

alokkr016/LatestStories

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Time.com Latest Stories API

Project Overview

This Spring Boot application uses Selenium to scrape the latest story headlines from Time.com. The scraped data is returned via a REST API, providing both the titles and links of the latest stories in a JSON format.

Features

  • Web Scraping with Selenium: Automatically opens Time.com, scrolls to the latest stories, and extracts their headlines and links.
  • REST API: Exposes an endpoint to retrieve the latest story headlines in JSON format.

Example API response:

[
    {
        "link": "https://time.com/7023265/israel-gaza-school-strike/",
        "title": "Israeli Strike on a School Kills at Least 22, Gaza Health Ministry Says"
    },
    {
        "link": "https://time.com/7023258/arizona-supreme-court-citizenship-vote-ruling/",
        "title": "Arizonans Whose Citizenship Hadn't Been Confirmed Can Vote"
    },
    {
        "link": "https://time.com/7023255/stairs-exercise-calories-weight-loss/",
        "title": "Climbing Stairs Might Be the Most Effective Exercise for You"
    },
    {
        "link": "https://time.com/7022935/kamala-harris-israel-gaza-west-bank/",
        "title": "How Kamala Harris Can Craft a Fair Middle East Strategy"
    },
    {
        "link": "https://time.com/7023226/pesto-penguin-australia/",
        "title": "Move Over Moo Deng, Meet Pesto the Penguin"
    },
    {
        "link": "https://time.com/7023138/sean-diddy-combs-music-back-catalog/",
        "title": "What Happens to Diddy's Back Catalog?"
    }
]

Technologies Used

  • Spring Boot: Java-based framework for building the REST API.
  • Selenium: Automates the browser for web scraping.
  • ChromeDriver: WebDriver for controlling the Chrome browser.
  • WebDriverManager: Simplifies the setup and management of WebDriver binaries.

How It Works

  1. Scraping Process:

    • The application launches Chrome in headless mode.
    • It navigates to Time.com.
    • Scrolls to the latest stories section and extracts the headlines and their respective links.
  2. API Endpoint:

    • The GET /latest-stories endpoint returns the latest stories as a list of JSON objects containing the title and link.

How to Run

No installation is required beyond running a typical Spring Boot application.

  1. Clone the repository:

    git clone https://github.com/alokkr016/LatestStories.git
    cd LatestStories
  2. Run the application: You can run the Spring Boot app using Maven or Gradle:

    ./mvnw spring-boot:run
  3. Access the API: Once the application is running, visit the following endpoint in your browser or via tools like Postman or curl:

    http://localhost:8080/latest-stories
    

Future Enhancements

  • Add error handling for cases where the page structure changes.
  • Implement retry mechanisms for failed scraping attempts.
  • Add caching to reduce the number of requests made to the site.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages