Commit

Done with all checks
msali123 committed Apr 30, 2024
1 parent d523491 commit 0b0faa7
Showing 12 changed files with 253 additions and 6 deletions.
4 changes: 3 additions & 1 deletion .github/styles/HouseStyle/tech-terms/db.txt
@@ -13,4 +13,6 @@ timedelta
sqlalchemy
ssn

-sql
+sql
+
+groupby
9 changes: 8 additions & 1 deletion .github/styles/HouseStyle/tech-terms/general.txt
@@ -651,4 +651,11 @@ microframeworks
minimalistic
upscaling
Microframeworks
-microframework
+microframework
+
+Scrapy
+Colab
+statsmodels
+Keras
+seaborn
+Bokeh
6 changes: 5 additions & 1 deletion blog/_data/authors.yml
@@ -592,4 +592,8 @@ Vivek Singh:
Jura Gorohovsky:
name : "Jura Gorohovsky"
bio : "Jura is an experienced product manager and marketer, technical writer, and amateur software developer. After a decade at JetBrains in various capacities, he believes he knows a thing or two about developer tools."
-avatar : "/assets/images/authors/jura-gorohovsky.jpg"
+avatar : "/assets/images/authors/jura-gorohovsky.jpg"
+Alen Kalac:
+name : "Alen Kalac"
+bio : "Alen is a data scientist working in finance. He's a freelance data scientist, too, and writes about data science and machine learning."
+avatar : "/assets/images/authors/person.jpg"
6 changes: 3 additions & 3 deletions blog/_posts/2024-04-26-python-web-scraping.md
@@ -42,7 +42,7 @@ Beautiful Soup is a Python library used to extract text from HTML and XML data.

In contrast, Selenium is a browser automation tool that interacts with a website programmatically. It retrieves information by replicating user interactions like keyboard input and mouse clicks. It helps users gather data rendered by client-side JavaScript or data behind paywalls.

-## Implementing Web Scraping with Python
+## Implementing Web Scraping With Python

So, let's create a web scraping solution from scratch with both Selenium and Beautiful Soup. All the code is available in [this GitHub repository](https://github.com/vivekthedev/python-web-scraping-tutorial).

@@ -81,7 +81,7 @@ To view the classes or IDs of individual books, you need to click the arrow icon

You can access an element's details simply by hovering over it.

-We will scrape the URL, title, and price of each book. Here, you can see that each book is encapsulated within an `<li>` tag:
+We will scrape the URL, title, and price of each book. Here, you can see that each book is encapsulated within an `<li>` tag:

~~~{.html caption=""}
<li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
@@ -209,7 +209,7 @@ If you're looking to scrape dynamic content or content rendered with JavaScript,

Let's scrape the [freeCodeCamp YouTube channel](https://www.youtube.com/@freecodecamp/videos). This involves scraping the top hundred recently uploaded videos, capturing each video's URL, title, duration, upload info, and views. Scraping YouTube videos is a complex process because the content is dynamically loaded as users scroll down to view more videos. Selenium handles this dynamic behavior by replicating user scroll interactions. As before, you must analyze the website to pinpoint all the elements that you want to scrape.

-#### Analyzing the freeCodeCamp YouTube Channel
+#### Analyzing the FreeCodeCamp YouTube Channel

When you access developer tools for the freeCodeCamp YouTube channel, you'll notice the structure of the HTML is much more complicated than for the Books to Scrape website. As before, utilize the hover-to-inspect feature to pinpoint the tags responsible for rendering each web element:

234 changes: 234 additions & 0 deletions blog/_posts/2024-04-30-python-libraries.md

Large diffs are not rendered by default.

Binary file added blog/assets/images/python-libraries/I7H1Peq.png
Binary file added blog/assets/images/python-libraries/Tq4EaIF.png
Binary file added blog/assets/images/python-libraries/header.jpg
Binary file added blog/assets/images/python-libraries/rDpyFSR.png
Binary file added blog/assets/images/python-libraries/vt0crHe.png
Binary file added blog/assets/images/python-libraries/xjI2urK.png

