This repository has been archived by the owner on Mar 17, 2020. It is now read-only.

API scraper #3

Open
wants to merge 5 commits into
base: master


siennathesane
Owner

@siennathesane siennathesane commented Feb 21, 2019

Adding the docs scraper to go through and express all the Windows APIs. It's highly concurrent, but there is a lot of room for optimisation. So far I've only tested it against Windows Desktop, but once I iron out the last few bugs, it should be good to go.

@aaronmsft @bitcrazed @erikstmartin this should really help grease the wheels on some of our conversations.

Signed-off-by: Mike Lloyd [email protected]

…s. it's highly concurrent, but there is a lot of room for optimisation improvements.

Signed-off-by: Mike Lloyd <[email protected]>
@todo

todo bot commented Feb 21, 2019

(mxplusb): this should stream more efficiently.

https://github.com/mxplusb/windows/blob/9f11f25c4629389c7623d333e1c6f860ff8e6aeb/docs-scraper/main.go#L73-L78


This comment was generated by todo based on a TODO comment in 9f11f25 in #3. cc @mxplusb.

@todo

todo bot commented Feb 21, 2019

(mxplusb): figure out why this only works sometimes and not others. it seems to be specific to the DX libraries for some reason.

https://github.com/mxplusb/windows/blob/85a57a03ebc35ed86fe98b3fa63bb66eba96b888/docs-scraper/main.go#L220-L225


This comment was generated by todo based on a TODO comment in 85a57a0 in #3. cc @mxplusb.

@siennathesane siennathesane marked this pull request as ready for review February 21, 2019 08:56
@siennathesane siennathesane added this to the API Template Generation milestone Feb 21, 2019
added nil pointer checks due to dereference panics.
added version regular expressions for later.
added global counter for humanity's sake.
split out version table buffer from code block buffer.
removed remarks regular expression due to garbage data.

there's still a deadlock or race condition somewhere, but it's really
inconsistent. sometimes it stalls at ~3k functions found, other times
at ~17k functions found. I don't know where the problem is. I'm
wondering if it's on the Microsoft side, due to all of the crawling
traffic. I wonder if it might be a DoS concern.

Signed-off-by: Mike Lloyd <[email protected]>
@siennathesane
Owner Author

Welp, I got blacklisted from the docs website, haha.

image

@siennathesane
Owner Author

I need to add a rate limiter.

@siennathesane siennathesane self-assigned this Mar 12, 2019