-
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
15 fix cle design review #34
15 fix cle design review #34
Conversation
2. There is no mention of the year anywhere in the text of the site. We can extract it from the agenda link - at least | ||
for now. But it will be important to keep an eye on how the site is changed in January. | ||
|
||
3. Meetings are currently not being held in person but over webex. I've included this information in the time_notes section of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds good for now (re: point 3).
Thanks soo much for the detailed write-up here! To answer some of the questions that you've raised, I'll try to answer them below: I've repeated some of your code comments below so that @lcaswell and @PaulR-Docs from the Documenters can pick it up easier. I think for our use case, it would probably continue to split out the Planning Commission and Design Review meetings as separate organizations; (meaning Planning Commission is listed separately at https://cleveland.documenters.org/agencies/cleveland-city-planning-commission-179/ and Design Review Committees at https://cleveland.documenters.org/agencies/cleveland-design-review-advisory-committees-157/ ; Would like to hear @PaulR-Docs thoughts on that ^^ Regarding Point 3 at
Putting the web_ex meeting information in the time_notes field is probably a good short-term solution. The schema that we use to parse out the information about a meeting (start time of meeting, end time, day of meeting, physical location, etc) and assign it to a corresponding field does not have a field for information about how to access a meeting online. Other information that doesn't fall in neatly into one of the meeting fields in the schema has been included in time_notes field for other scrapers. (For example, Cuyahoga County Library - code
@lcaswell, @PaulR-Docs Regarding:
To sum up; what @zabarahms has written: Design Review Committees meets on fixed days (e.g. "the 1st and 3rd Friday of the month") and the specific meeting dates (e.g. January 1, 2022) are not explicitly listed on the website (to scrape). From a technical perspective, there hasn't been a best practice yet emerge of how to automatically scrape these meetings; across other cities' scrapers, even in other cities. #5 As I understand (please correct me if I'm wrong!) The risk with this approach (assuming the code works exactly as intended), as @zabrahams mentioned, is that meeting dates would be listed at https://cleveland.documenters.org/meetings/ cc'ing @PaulR-Docs and @lcaswell would like your thoughts on this. ^^^ Would this be problematic for the Documenters? |
Lastly I know my python is kind of rough so I'm totally open to any stylistic comments! Like with setting the tmp var on line 121 in the scraper - I'm not sure that it's needed but I did it to be safe! Just let me know any changes that would help! (@RaghavRao, interested in your thoughts from a technical perspective on this ^^^ ) |
@zabrahams @skorasaurus Thanks for working on this! I think it is perfectly fine to keep the Design Review Board meetings and Planning commission as separate agencies. We also keep the board of zoning appeals as a separate agency even though they are part of the Planning commission too. As far as the Web-ex meeting links go, that doesn't really need to be scraped, since our assignment-making process includes manually entering instructions and meeting links for the documenters to access the meetings. Also, these are instructions that only assigned documenters can see, so there is less of a risk of zoombombing by bots that could potentially scrape our site for meeting links. For example, some GCRTA meetings, like the CAC meetings, are held via web-ex only and are not streamed on FB live. For those we will add the web-ex links when we turn the scraped meeting into an assignment. The RTA/transit scraper never scrapes the meeting links directly. I would suggest making the meeting description for these Design Review Board meetings say something simple like "Meetings will be held via Web-ex." Then when our team creates those assignments, we can just go to the source site, copy the meeting webex or zoom link and put that directly into the assignment instructions. Let me know if that works. |
@zabrahams @skorasaurus We also check that meetings are happening before we make them into assignments so having the scraper just stick with the published schedule (1st and 3rd friday) is fine for now as far as assigning work to documenters is concerned. Having a line in the description about checking with staff is great too, in case any member of the public looking at our list of public meetings and wants to attend one. |
@skorasaurus @PaulR-Docs Thanks for the comments! The discussion is super helpful! I updated the PR to move what had been in the time notes into the meeting description - and I added a sentence to the upcoming meeting description saying to check with staff! I also removed that |
so I don't forget: I think this is ready to go, but proposed to @zabrahams in slack about abstracting the I totally defer to you and don't want to hold this up much. |
Hi! Sorry I've had a super busy week but I'm actually working on this now! I think it totally makes sense to abstract them out. I think there are 4 candidates I can see for where to put them:
I think my preference would be to go with (4)? (I put my reasoning below) Let me know what you think? And either way it wouldn't be to hard to move it around if someone changes their mind? (1) feels to me like maybe we'd building functionality that only applies to small number of scraper into all our scrapers - it seems better to me to separate the functionality so that only the scrapers that need it pull it in? That way we keep the core api lean and flexible and treat this more like an extension? |
Though maybe it would make sense to make it a utility in core if we think other cities are going to need it too? |
@skorasaurus I think this is ready for rereview with abstracting out the calculator! Let me know what you think! |
Thanks; on initial thought; I like option 4; since it doesn't shove everything into one file; I'll try to take a look at it more in depth tomorrow, or more likely Friday |
Thanks so much for all of this work! We should keep an eye on this to ensure that it's working as intended in 2022. |
Summary
Issue: #15
Checklist
All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.
Questions
I had a couple of questions!
The first is that the page changed in such a way that it's not trivial to figure out upcoming meetings. There are lists of existing agendas and then it just says the meeting is on the 1st and 3rd Tuesday of the month, or something like that. The strategy I adopted was to treat existing links as authoritative and then to calculate out meeting dates for 60 days following the last agenda item for each subcommittee. I'm not sure if that's reliable enough - since there's no way to know if meetings are cancelled until after the next agenda pops up? But maybe it's better then not including future meetings? I don't know?
I think this problem comes up in a few places - the city planning scraper (#12) will have a similar issue, I saw something similar discussed in #30 and #5. If we want to do something like the calculator I wrote, it might make sense to extract it into a mixin or utility package so that we can reuse it in multiple scrapers? But again, I'm not super sure!
I also wasn't sure about how to handle the webex and covid information. I put it in the time notes since that seemed to be the only place for generic notes about how the meeting would occur? But I'm not sure if that will work, especially with how the meeting gets displayed to end users in the calendar?
Lastly I know my python is kind of rough so I'm totally open to any stylistic comments! Like with setting the tmp var on line 121 in the scraper - I'm not sure that it's needed but I did it to be safe! Just let me know any changes that would help!
Thanks!!!!