Convert to package #1
Comments
Thanks! We are in the middle of several upgrades, mostly pertaining to the front-end, e.g. allowing users to configure which publication venues to support and then only loading those. That said, I can certainly make the scraper available as a Python package, but that will take a few weeks. Can you suggest the kind of API you are expecting the Python package to support?
I was thinking more along the lines of a CLI application. I had written something similar to this (as a bet XD), but it was a much more naive solution that went through the crossref/DOI API, which was prone to getting my IP blocked. I am just looking to use the metadata you collect for quick searches with regex or any other extension. This is my current implementation, which is still super jank: https://github.com/ahmed-shariff/acm-dl-searcher That being said, when I have some time, I can also help out with a few PRs if you have a roadmap or wishlist of what you want this repo to do.
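For concreteness, here is a rough sketch of the kind of command-line interface I mean. The command and option names here are hypothetical, not part of any existing tool:

```python
import argparse

def build_parser():
    # Hypothetical CLI: "get" would scrape metadata for a venue,
    # "search" would run a regex over the locally cached metadata.
    parser = argparse.ArgumentParser(prog="vis-scraper")
    sub = parser.add_subparsers(dest="command", required=True)

    get = sub.add_parser("get", help="scrape metadata for a publication venue")
    get.add_argument("venue", help="venue identifier, e.g. 'vis'")
    get.add_argument("--year", type=int, default=None,
                     help="restrict to a single year")

    search = sub.add_parser("search", help="regex search over cached metadata")
    search.add_argument("pattern", help="regular expression to match")
    search.add_argument("--field", default="title",
                        choices=["title", "abstract", "authors"],
                        help="metadata field to search")
    return parser
```

Something like `vis-scraper get vis --year 2020` followed by `vis-scraper search "uncertainty" --field abstract` is the workflow I have in mind.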
I see; the CLI you implemented looks great, and we can definitely do something like that for the scrapers here. Also, while we are definitely encouraging contributions from the community going forward, first, to get the ball rolling, let me discuss a roadmap/milestones with my colleagues and prepare a plan. I will get back to you here after Thanksgiving!
@arpitnarechania any updates on this?
Hi @ahmed-shariff, apologies for not getting back to you earlier. We have prepared an internal timeline of multiple new features for the user interface as well as the scraper, many of which are currently in the pipeline. Unfortunately for this discussion, a Python package to scrape data from the command line was voted a low-priority item. However, would you like to collaborate with me on it? I have a major deadline at the end of March but can use part of my weekends thereafter to work on it with you, at least on designing the CLI spec, commands, and documentation; I am obviously happy to port the actual scraper-related aspects.
It's no problem, I can certainly relate 😅 Having used your current implementation, I see why it was voted down. It's quite memory-intensive, at least in the first few steps. The current implementation makes sense for running once in a while to update the back-end's database, but it's going to need some optimization to run as a stand-alone CLI (and offline GUI?) application. I am also fighting a few deadlines; I'll set up what I have done so far as a PR and we can discuss the details there. I'll also create a separate issue to discuss possible optimizations.
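One memory-side optimization worth raising in that issue: streaming records to disk as they arrive instead of accumulating the whole dataset in memory first. A generic sketch (not the repo's actual pipeline; the record shape is a placeholder):

```python
import json

def stream_records(records, out_path):
    # Write one JSON object per line as records arrive, so peak memory
    # is bounded by a single record rather than the whole dataset.
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
            count += 1
    return count
```

If `records` is a generator that yields papers one at a time from the scraper, the whole run stays flat in memory regardless of corpus size, and the resulting JSON Lines file is easy to grep or regex-search afterwards.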
I agree, it was designed for long-term batch updates, but that too can be made efficient; I had plans to move to asynchronous threads or pyspark-based map-reduce operations. I will go through your WIP PR and get back to you.
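As a rough illustration of the threaded direction, per-paper requests can be fanned out over a thread pool so network latency overlaps instead of serializing. The fetch function below is a placeholder standing in for the real per-paper scrape:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_one(paper_id):
    # Placeholder for the real per-paper request; here we just echo the id.
    return {"id": paper_id, "title": f"paper-{paper_id}"}

def fetch_batch(paper_ids, max_workers=8):
    # Submit every paper to a thread pool and collect results as they
    # complete; completion order is arbitrary, so callers should not
    # rely on the input ordering.
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, pid): pid for pid in paper_ids}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```

Since the scraper is I/O-bound, threads should already give a large speedup without reaching for pyspark; a polite rate limit per host would be the main thing to add on top.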
Hey there, awesome paper! And thank you so very much for making all of this available publicly.
I was wondering if there is a reason why this was not conceived as a Python package? Or do you have any plans on doing this in the future?