Merge pull request #38 from UI-Research/web-scraping-workshop
Web scraping workshop
judah-axelrod authored May 3, 2024
2 parents 4e91e98 + 6ffbb2b commit c341e0f
Showing 3 changed files with 14 additions and 4 deletions.
@@ -546,9 +546,10 @@ <h2>A note on ~AI~</h2>
<h2>Homework: Installations for Next Time</h2>
<ul>
<li>Install Python via Anaconda - see guidance from PUG’s Python Installation training <a href="https://ui-research.github.io/python-at-urban/content/installation.html">here</a></li>
-<li>Install the following Python packages: <code>requests</code>, <code>beautifulsoup4</code>, <code>lxml</code>, and <code>selenium</code></li>
+<li>Install the following Python packages: <code>requests</code>, <code>beautifulsoup4</code>, <code>lxml</code>, <code>selenium</code>, and <code>webdriver-manager</code>.</li>
<li>Launch a new Jupyter Notebook if you’ve never done so before - see guidance from PUG’s Intro to Python training <a href="https://ui-research.github.io/python-at-urban/content/intro-to-python.html">here</a></li>
<li>If you have any issues, please use the #python-users channel and we’d love to help. Someone else probably has the same question!</li>
+<li>Sign up for GitHub using <a href="https://ui-research.github.io/reproducibility-at-urban/git-installation.html">this guide</a> if you haven’t so that you can access these workshop materials!</li>
</ul>
</section>
<section id="next-session" class="slide level2">
@@ -110,9 +110,10 @@ sm.report('display')

## Homework: Installations for Next Time
- Install Python via Anaconda - see guidance from PUG's Python Installation training [here](https://ui-research.github.io/python-at-urban/content/installation.html)
-- Install the following Python packages: `requests`, `beautifulsoup4`, `lxml`, and `selenium`
+- Install the following Python packages: `requests`, `beautifulsoup4`, `lxml`, `selenium`, and `webdriver-manager`.
- Launch a new Jupyter Notebook if you've never done so before - see guidance from PUG's Intro to Python training [here](https://ui-research.github.io/python-at-urban/content/intro-to-python.html)
- If you have any issues, please use the #python-users channel and we'd love to help. Someone else probably has the same question!
+- Sign up for GitHub using [this guide](https://ui-research.github.io/reproducibility-at-urban/git-installation.html) if you haven't so that you can access these workshop materials!

## Next Session
- How to scrape text from static webpages using BeautifulSoup
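
Note on the homework change above: a quick way to confirm the five packages are installed is to check that each one is importable. The following is a minimal sketch, not part of this commit. The helper name `missing_packages` and the `REQUIRED` mapping are hypothetical, and the mapping exists because the pip name and the import name differ for `beautifulsoup4` (`bs4`) and `webdriver-manager` (`webdriver_manager`).

```python
from importlib.util import find_spec

# Maps the pip distribution name to the top-level module you import.
# The two differ for beautifulsoup4 and webdriver-manager.
# (Hypothetical helper for illustration; not part of the workshop materials.)
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "selenium": "selenium",
    "webdriver-manager": "webdriver_manager",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still to install: pip install " + " ".join(missing))
    else:
        print("All workshop packages are importable.")
```

Running this in a notebook cell or as a script lists anything still missing, which may be a useful first thing to share in #python-users when asking for help.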
12 changes: 10 additions & 2 deletions site/content/web-scraping-dynamic.ipynb
@@ -48,14 +48,22 @@
"from selenium.webdriver.common.by import By\n",
"from selenium.webdriver.chrome.service import Service \n",
"from webdriver_manager.chrome import ChromeDriverManager\n",
"from selenium.webdriver.chrome.options import Options\n",
"\n",
"## NOTE: Some users may want to try a Firefox Driver instead;\n",
"## Can comment above two lines and uncomment the below two lines\n",
"# from selenium.webdriver.firefox.service import Service\n",
"# from webdriver_manager.firefox import GeckoDriverManager\n",
"from selenium.webdriver.support import expected_conditions as EC\n",
"from selenium.webdriver.support.ui import Select, WebDriverWait\n",
"import pandas as pd\n",
-"import time"
+"import time\n",
+"\n",
+"# Set Chrome options - NOTE: you can remove these options and still have the code work when running things locally\n",
+"options = Options()\n",
+"options.add_argument(\"--headless\") # Run Chrome in headless mode\n",
+"options.add_argument(\"--no-sandbox\")\n",
+"options.add_argument(\"--disable-dev-shm-usage\")"
]
},
{
@@ -71,7 +79,7 @@
},
{
"cell_type": "code",
-"execution_count": 2,
+"execution_count": 3,
"id": "3b5b3848",
"metadata": {},
"outputs": [],
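
Note on the notebook change above: the diff adds an `options` object but does not show where it is handed to the driver. The following is a minimal sketch of how that could look, assuming the Selenium 4 `webdriver.Chrome(service=..., options=...)` call and the `ChromeDriverManager` import already present in the notebook. The helper names `chrome_arguments` and `build_driver` are hypothetical, and the `headless` switch reflects the commit's note that the options can be dropped when running locally.

```python
def chrome_arguments(headless=True):
    """Return the Chrome flags added in this commit; --headless is optional."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        args.insert(0, "--headless")
    return args


def build_driver(headless=True):
    """Start Chrome with the flags above (hypothetical helper, not from the notebook)."""
    # Imported here so the sketch still loads where Selenium is not installed yet.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    # webdriver-manager downloads a matching chromedriver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `build_driver(headless=False)` opens a visible browser window, which can make it easier to watch what a scraper is doing while debugging locally. Remember to call `driver.quit()` when finished.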