Commit

Update scraping code for BEM changes to cf.gov
wpears committed May 1, 2024
1 parent 90796c3 commit ef16675
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions .github/workflows/save-iregs.yml
@@ -28,15 +28,14 @@ jobs:
run: |
IFS="," read -a BASH_PARTS <<< "$PARTS"
for PART in "${BASH_PARTS[@]}"; do
echo "Chunking part $PART"
CHUNKS=($(curl -sSL "https://www.consumerfinance.gov/rules-policy/regulations/${PART}/" |
- htmlq -t '.o-secondary-nav_link' -a href -b 'https://www.consumerfinance.gov'
+ htmlq -t '.o-secondary-nav__link' -a href -b 'https://www.consumerfinance.gov'
))
SUBPARTS=($(for chunk in "${CHUNKS[@]}"; do echo "$chunk" | awk -F '/' '{print $(NF-1)}'; done))
echo "${SUBPARTS[@]} found for part $PART at $(date '+%X')"
curl -sSL "${CHUNKS[@]}" |
- htmlq -r '.regulation-meta, .inline-interpretation, .block__sub, .o-regulations-wayfinder' |
- htmlq -t '.u-layout-grid_main' |
+ htmlq -r '.regulation-meta, .inline-interpretation, .block--sub, .o-regulations-wayfinder' |
+ htmlq -t '.u-layout-grid__main' |
./.github/workflows/sed.sh |
awk '{$1=$1};1' > "./iregs/${PART}.txt"
sleep 1
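
The two awk steps in this hunk can be tried in isolation, without the network or htmlq. The sketch below is only an illustration: the URLs in `CHUNKS` are placeholder values standing in for whatever the nav-link scrape returns, and the helper names `subpart_of` and `squeeze` are hypothetical, not part of the workflow.

```shell
#!/usr/bin/env bash
# Placeholder chunk URLs (assumed shape: base URL + part + subpart + trailing slash).
CHUNKS=(
  "https://www.consumerfinance.gov/rules-policy/regulations/1002/1/"
  "https://www.consumerfinance.gov/rules-policy/regulations/1002/2/"
)

# Hypothetical helper: with '/' as the field separator and a trailing slash,
# the last field is empty, so $(NF-1) is the final path segment.
subpart_of() {
  printf '%s\n' "$1" | awk -F '/' '{print $(NF-1)}'
}

# Hypothetical helper: assigning $1 to itself makes awk rebuild the record,
# which trims leading/trailing whitespace and collapses runs to single spaces.
squeeze() {
  awk '{$1=$1};1'
}

SUBPARTS=()
for chunk in "${CHUNKS[@]}"; do
  SUBPARTS+=("$(subpart_of "$chunk")")
done

echo "${SUBPARTS[@]}"
printf '  some   scraped    text \n' | squeeze
```

Note that `subpart_of` depends on each URL ending in a slash; without it, `$(NF-1)` would return the part number rather than the subpart.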