size of Kinetics-600 #28
Training set is …
Thank you for the reply. What do you mean by "scaling by the number of clips"?
No problem. I just meant that I had only downloaded the training set videos, so I was estimating the validation and test set sizes using the number of clips as given by the annotation files (i.e., multiplying by 30/392 to get the validation size, and 60/392 to get the test set size). I ended up downloading the validation clips and can confirm they're just over …
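For reference, a back-of-the-envelope version of that scaling, as a minimal sketch. The 392k/30k/60k split counts come from the 30/392 and 60/392 factors mentioned above, and the 589 GB training size is taken from a later comment in this thread; substitute your own measured numbers:

```python
# Estimate validation/test sizes from the training download, assuming
# size scales linearly with clip count (counts per the annotation files).
train_size_gb = 589                                   # measured training size
train_clips, val_clips, test_clips = 392_000, 30_000, 60_000

val_size_gb = train_size_gb * val_clips / train_clips    # ~45 GB
test_size_gb = train_size_gb * test_clips / train_clips  # ~90 GB
print(f"est. val: {val_size_gb:.0f} GB, est. test: {test_size_gb:.0f} GB")
```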
Got it. Thank you.
I see. Thank you so much.
Note: I did not manage to get clean* videos with that script. It would be nice to see the effect of that on classification accuracy.
@cmhungsteve I think I fixed the issue that @escorciav mentioned with my latest commits. https://github.com/jremmons/ActivityNet/blob/master/Crawler/Kinetics/download.py
@jremmons Thank you!!
@jremmons FWIW I sampled about 20 videos downloaded with the old script, and never saw the artifacts referred to in the other post (viewing in QuickTime Player). Any insight into why I might not have had the issue? Am I just getting lucky in sampling videos with no artifacts?
@chrischute out of curiosity, did you take care to sample videos with t_start significantly different from zero?
@escorciav I did try sampling a video with t_start greater than 0 (…
@escorciav I also didn't notice any issues with the first script I wrote. The current version of my script does re-encoding now, like the original. It would be a huge performance win for most people if the script didn't have to re-encode. If we can't reproduce this problem, it might be worth going back to a version that doesn't do re-encoding.
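For anyone weighing the trade-off discussed here, a minimal sketch of the two approaches. The file names, clip boundaries, and codec choices below are assumptions for illustration, not necessarily the exact flags the script uses:

```python
import subprocess

t_start, duration = 12.5, 10  # hypothetical clip boundaries

# Stream copy: fast and lossless, but seeking with -ss under -c copy
# snaps to the nearest keyframe, so clips whose t_start is far from a
# keyframe can start at the wrong frame or show artifacts in playback.
subprocess.run(["ffmpeg", "-ss", str(t_start), "-i", "full_video.mp4",
                "-t", str(duration), "-c", "copy", "clip_copy.mp4"],
               check=True)

# Re-encode: much slower and slightly lossy, but frame-accurate at any
# t_start, which is plausibly why re-encoding avoids the artifacts.
subprocess.run(["ffmpeg", "-ss", str(t_start), "-i", "full_video.mp4",
                "-t", str(duration), "-c:v", "libx264", "-c:a", "aac",
                "clip_reencode.mp4"],
               check=True)
```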
TL;DR: as we agreed in the other PR, it's great to have this alternative to download and clip the videos. If anyone wants to start quickly, please use it. Take my words as a disclaimer note, like the ones in the agreements we usually don't read 😉 I am really happy with your contribution and comments. My original comment was more a scientific/engineering question than a note to discourage use of this script. I don't have much bandwidth to test it in the next two weeks; I will try, but I can't promise anything. If you have a Docker image or conda environment, please share it; that would reduce the amount of work.
@jremmons I tried your script, but all the downloaded files are just empty. I also tried printing "status" at line 137, and it always showed something like this: "('QcVuxQAgrzU_000007_000017', False, b'')".
@cmhungsteve that is strange... I literally just used the same script today to download Kinetics-600 without any issues. Are you sure you are using the most up-to-date version here? If you have questions about my script, though, we should chat on #16.
I am not really sure why. Is it because my FFmpeg version is not correct, or am I missing some library? My youtube-dl version is "2018.6.25", which I think is the newest.
I know this is a rather open-ended question, but I was looking for some guidance on how long it takes to download the entire dataset (e.g., with num-jobs=24 on an AWS p2 instance with 8 cores). Thank you, @cmhungsteve @jremmons
Did you manage to download all the clips? When I try to download the dataset, around 10% of the clips cannot be downloaded because the video is unavailable, there are copyright issues, or the user closed their account. Is that normal?
@okankop yeah... there are lots of videos with copyright issues. I think it's normal.
An update on the stats using @jremmons' version of the download script. Training set: 589GB (380802 clips) …
While inspecting downloaded videos, I found out that joblib's parallelism would interfere with ffmpeg's transcoding (from URL streams) and yield corrupted videos. The problem was solved by replacing joblib with Python's built-in multiprocessing module.
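A minimal sketch of that swap, assuming a per-clip worker like the one in download.py; the function name, its argument, and the job count are placeholders for whatever the script actually defines:

```python
from multiprocessing import Pool

def download_clip(row):
    # Placeholder for the real worker: fetch the YouTube video described
    # by `row` and cut/transcode the clip with ffmpeg, returning a status.
    ...

if __name__ == "__main__":
    rows = [...]  # parsed annotation entries (placeholder)
    # Instead of:
    #   joblib.Parallel(n_jobs=24)(joblib.delayed(download_clip)(r) for r in rows)
    with Pool(processes=24) as pool:
        statuses = pool.map(download_clip, rows)
```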
Hi, can you share the Kinetics-600 data file? Thanks a lot! @MannyKayy
What is the recommended way of sharing this dataset? It's roughly 0.6 TB and I am having trouble making a torrent of this dataset.
Maybe we should reach out to the CVDF foundation; they could probably host the data, as they have done for other datasets. Please thumbs-up this message if you deem it essential. It would help make a strong case.
@sophia-wright-blue were you able to make it run on the p2 instance? I was running into a too-many-requests issue, which I created an issue for: #51 (comment)
Hi @hollowgalaxy , I was able to download it on an Azure VM, good luck!
Thanks for letting me know, @sophia-wright-blue, …
@hollowgalaxy , I think I played around with that number a bit; I don't remember what finally worked. You might also want to try this repo: https://github.com/Showmax/kinetics-downloader which worked for me
Apparently, YouTube has recently started to extensively block large-scale downloading with youtube-dl. I have tried using the Crawler code for Kinetics and always get an HTTP 429 error. So it does not matter which approach/code you use; YouTube apparently just does not allow systematic downloading. It would be great if ActivityNet hosted the videos on some server so researchers could still use Kinetics.
@MahdiKalayeh Could you please confirm whether #51 is the same error message that you got?
@escorciav Yes, the same error.
Let's track it there. Thanks 😉
@MannyKayy were you able to upload your download of Kinetics-600 so we may download it from there? Thanks
@MStumpp Unfortunately not. @escorciav It may be worth it if the CVDF reaches out to the authors for a copy of the full original Kinetics dataset.
I contacted the Kinetics maintainers, and they are aware of the request. The ball is in their court. I will follow up with them by the end of the week.
It's been a month, so I guess that possibility's gone out of the window by now...?
Any update on this track?
Regarding ☝️ , I haven't heard back from them officially. My feeling is that the maintainers have knocked on multiple doors and have not found any solution yet. The most viable solutions that I'm aware of are:
…
I would like to ask some questions (they might repeat someone else's): …
Thanks in advance.
I am also working with the Kinetics dataset for academic purposes, and I got the same error (429). Can you please share it with us via Google Drive or something else? @kaiqiangh
Hi @mahsunaltin, I also have this issue and cannot download the whole dataset. Not sure how to solve it.
When you wrote like …
Hi, I checked the log files and also found some errors that led to the incompleteness of the val dataset. I then re-ran the code, and the val dataset was overwritten. I am still working on it. By the way, I tried to download videos from my other server, but still get the 429 error. Do you have any solution for that?
I already tried all kinds of techniques to download the dataset, and like everyone else I got the 429 error. In fact, if we can change the IP address every 50 or so videos, there is no problem. So I have a somewhat tricky way to download the dataset using Colab. (I know it's not very logical :) but so far so good)
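A sketch of that batching idea under stated assumptions: the file name, output template, and 429 handling below are illustrative, and the IP change itself is manual on Colab (reset the runtime), so the script simply stops when blocked and records where to resume:

```python
import subprocess

def download(video_id):
    """Return False if youtube-dl fails (e.g., on an HTTP 429 block)."""
    proc = subprocess.run(
        ["youtube-dl", "-f", "mp4", "-o", f"{video_id}.mp4",
         f"https://www.youtube.com/watch?v={video_id}"])
    return proc.returncode == 0

# remaining_ids.txt: one YouTube ID per line, the clips still to fetch
with open("remaining_ids.txt") as f:
    ids = f.read().split()

done = 0
for vid in ids:
    if not download(vid):
        print(f"blocked at {vid}; reset the Colab runtime to get a new IP")
        break
    done += 1

# Persist what's left so the next fresh-IP session resumes where we stopped.
with open("remaining_ids.txt", "w") as f:
    f.write("\n".join(ids[done:]))
```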
Regarding ActivityNet …
If anyone cannot download samples due to error 429, you can use --cookies to download them. The URL is https://daveparrish.net/posts/2018-06-22-How-to-download-private-YouTube-videos-with-youtube-dl.html. BTW, it seems that a lot of videos are private and cannot be accessed. How can we download the private videos to make the dataset complete?
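For concreteness, youtube-dl's --cookies flag takes a Netscape-format cookie file exported from a logged-in browser session (the linked post explains how to export it). A minimal sketch; the cookie file name and the video ID (taken from an earlier comment in this thread) are illustrative:

```python
import subprocess

# cookies.txt must be a Netscape-format export of your browser's
# youtube.com cookies from a signed-in session.
video_id = "QcVuxQAgrzU"
subprocess.run(
    ["youtube-dl", "--cookies", "cookies.txt",
     f"https://www.youtube.com/watch?v={video_id}"],
    check=True)
```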
Thanks @Katou2!
I am wondering how large Kinetics-600 is. I am downloading it now and finished around 330G.
I saw someone say that Kinetics-400 is around 311G. Does that mean Kinetics-600 is around 470G?
Just curious about that.
Thank you.