Dashboard Web Scraper

smsmith97 edited this page Mar 15, 2021 · 6 revisions

This script opens every unique link to the longest postcodes/journeys for a given council_id (e.g. DER), which is passed as a command-line argument. It only seems to work with Chrome, and it requires selenium.

N.B.: Recently, webdriver has been unable to control Chrome version 89 and later. My workaround has been to download Chrome version 87 and increase Chrome's auto-update check interval (see below for the command to do this).

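import sys
from selenium import webdriver
from selenium.webdriver.common.by import By

# The council_id (e.g. DER) is the first command-line argument.
url = "http://127.0.0.1:8000/dashboard/council/" + sys.argv[1] + "/"

driver = webdriver.Chrome()
driver.get(url)
# Collect hrefs into a set so that each link is only opened once.
links = set(link.get_attribute('href') for link in driver.find_elements(By.PARTIAL_LINK_TEXT, ' '))
for link in links:
    # get_attribute returns None for anchors without an href, so skip those.
    if link and '/dashboard/postcode/' in link:
        # Open the postcode page in a new tab; passing the URL as an argument avoids quoting problems.
        driver.execute_script("window.open(arguments[0])", link)
        driver.switch_to.window(driver.current_window_handle)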
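
The link filtering in the script can also be tried without a browser. The following is a minimal sketch, not part of the original script: `postcode_links` and `POSTCODE_PATH` are hypothetical names, and the sample URLs are made up for illustration.

```python
# Path fragment the script looks for in each href.
POSTCODE_PATH = "/dashboard/postcode/"


def postcode_links(hrefs):
    """Return the unique postcode-page URLs from an iterable of href values.

    hrefs may contain None (anchors without an href) and duplicates.
    """
    unique = {h for h in hrefs if h and POSTCODE_PATH in h}
    # Sorting gives a stable order, which a plain set does not guarantee.
    return sorted(unique)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        "http://127.0.0.1:8000/dashboard/postcode/AB1/",
        "http://127.0.0.1:8000/dashboard/postcode/AB1/",
        "http://127.0.0.1:8000/dashboard/council/DER/",
        None,
    ]
    for url in postcode_links(sample):
        print(url)
```

With selenium, the input would be the `href` values collected from the page, and each returned URL could then be opened in a new tab as the script does.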

Changing the Chrome update check interval (may only work on OSX):

 defaults write com.google.Keystone.Agent checkInterval 200000000000000
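
Because the workaround depends on Chrome staying below version 89, it may help to check the installed version before running the scraper. The sketch below only parses a version string; it assumes the string looks like `Google Chrome 87.0.4280.88`, and how that string is obtained is left out. The names `chrome_major_version`, `is_controllable` and `MAX_SUPPORTED_MAJOR` are hypothetical.

```python
import re

# Assumption taken from the note above: version 89 and later could not be controlled.
MAX_SUPPORTED_MAJOR = 88


def chrome_major_version(version_output):
    """Extract the major version from a string such as 'Google Chrome 87.0.4280.88'.

    Returns None if no four-part version number is found.
    """
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def is_controllable(version_output):
    """Return True if the reported Chrome version is at or below the assumed limit."""
    major = chrome_major_version(version_output)
    return major is not None and major <= MAX_SUPPORTED_MAJOR


if __name__ == "__main__":
    for text in ("Google Chrome 87.0.4280.88", "Google Chrome 89.0.4389.90"):
        print(text, "->", is_controllable(text))
```

If the check fails, Chrome has probably updated itself and version 87 needs to be reinstalled.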