Fetching status blocks for Tweets2011 - hits twitter api limit #13
Comments
This is a longstanding issue that the API-based crawler can't fix. The answer is to use the HTML-based crawlers, which need updating to Twitter's current HTML layout.
Thank you for the feedback. I have tried the HTML-based crawler and found that it does not work for users who have closed their accounts or made their tweets private since the original data was gathered. I am resigned to the conclusion that fetching status blocks, whether from the API or directly from HTML, will no longer work because of API limits and expired or private accounts and tweets.
We are working on updating the HTML crawler. Stay tuned, or alternatively, patches accepted ;-)
I have submitted pull request #12, which uses the API to fetch.
Could you commit the HTML scraping code somewhere for reference? From what I understand, there's no way around the fact that private tweets and deleted tweets aren't available.
@andrewyates @isoboroff
Hi Nigel, I tried the code at https://github.com/nigel-v-thomas/twitter-tools and am still getting the error "Unable to parse text from this, possible change in format". I am using HTML scraping.
Hi Shakirak, it is quite likely that the HTML markup has changed since I updated the code. As I said, this solution is far from ideal; it would be much better to use the standard API.
I have been trying to get this tool to download the blocks from the Tweets2011 collection. Unfortunately, the current implementation hits the Twitter API limit each time.
The Twitter limit on the read API is 180 hits per hour for authenticated requests and 150 for unauthenticated ones; see http://twitter4j.org/en/api-support.html.
I have tried
Given the number of requests generated by this solution, I am not sure how to build the Tweets2011 corpus.
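To make the scale of the problem concrete, here is a rough sketch of the arithmetic and of a client-side throttle that spaces calls evenly under an hourly cap. It is only an illustration, not the crawler's actual code: `estimated_hours`, `throttled`, and the `fetch` callback are hypothetical names, and the sketch assumes one request per status with the cap expressed in requests per hour.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def estimated_hours(num_requests: int, limit_per_hour: int) -> float:
    """Lower bound on wall-clock hours needed under a per-hour request cap."""
    return num_requests / limit_per_hour


def throttled(
    items: Iterable[T],
    fetch: Callable[[T], R],           # hypothetical: performs one API request
    limit_per_hour: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[R]:
    """Call fetch(item) for each item, spacing calls so the cap is not exceeded."""
    min_interval = 3600.0 / limit_per_hour  # seconds between consecutive calls
    last_call = None
    for item in items:
        if last_call is not None:
            # Wait out whatever remains of the interval since the previous call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        yield fetch(item)
```

At 180 requests per hour, a hypothetical one million single-status requests would need roughly 5,556 hours, or about 231 days, so throttling alone keeps the crawler within the limit but does not make the download practical.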