
Convert to package #1

Open
ahmed-shariff opened this issue Nov 19, 2021 · 7 comments

@ahmed-shariff
Hey there, awesome paper. And thank you so very much for making all of this publicly available!

I was wondering if there is a reason why this was not conceived as a Python package? Or do you have any plans on doing this in the future?

@arpitnarechania
Member

Thanks! We are in the middle of several upgrades, mostly pertaining to the front-end, e.g., allowing users to configure which publication venues to support and then only load those.

That said, I can certainly make the scraper available as a Python package, but that will take a few weeks. Can you suggest the kind of API you are expecting the Python package to support?

@ahmed-shariff
Author

I was thinking more along the lines of a CLI application. I had written something similar to this (as a bet XD), but it was a much more naive solution (going through the Crossref/DOI API) that was prone to getting my IP blocked. I am just looking to use the metadata you collect to quickly search with regex or any other extension.

This is the current implementation I have, which is still super janky: https://github.com/ahmed-shariff/acm-dl-searcher
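To make the ask concrete, here is the rough shape of the interface I am imagining. This is a minimal sketch; the program name, subcommands, and options are all made up by me for illustration, not anything that exists in this repo or in acm-dl-searcher:

```python
# Hypothetical CLI surface, sketched with argparse; all names are illustrative.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="pubs-cli",
        description="Fetch and search locally cached publication metadata.")
    sub = parser.add_subparsers(dest="command", required=True)

    # "fetch": download or update the metadata for one venue.
    fetch = sub.add_parser("fetch", help="download/update metadata for a venue")
    fetch.add_argument("--venue", required=True, help="venue key, e.g. CHI or VIS")

    # "search": run a regex over the cached metadata, entirely offline.
    search = sub.add_parser("search", help="regex search over cached metadata")
    search.add_argument("pattern", help="regular expression matched against titles")

    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```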

That being said, when I have some time, I can also help out with a few PRs if you have a roadmap or wishlist of what you want this repo to do.

@arpitnarechania
Member

I see, the CLI you implemented looks great; we can definitely do something like that for the scrapers here.

Also, while we definitely encourage contributions from the community going forward, to get that ball rolling let me first discuss the roadmap/milestones with my colleagues and prepare a plan. I will then get back to you here after Thanksgiving!

@ahmed-shariff
Author

@arpitnarechania any updates on this?

@arpitnarechania
Member

Hi @ahmed-shariff, apologies for not getting back to you earlier. We have prepared an internal timeline of multiple new features for the user interface as well as the scraper, many of which are currently in the pipeline. Unfortunately for this discussion, a Python package to scrape data from the command line was voted a low-priority item.

However, would you like to collaborate with me on it? I have a major deadline at the end of March but can use part of my weekends thereafter to work on it with you, at least on designing the CLI spec, commands, and documentation; I am of course happy to port the actual scraper-related aspects.

@ahmed-shariff
Author

It's no problem, I can certainly relate 😅

Having used your current implementation, I see why it would be voted down. It's quite memory-intensive, at least in the first few steps. The current implementation makes sense for running once in a while to update the back-end's database, but it's going to need some optimization to run as a stand-alone CLI (and offline GUI?) application.
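For example, one cheap win might be streaming records instead of loading everything at once. A minimal sketch, assuming a newline-delimited JSON dump; the file name and the "title" field are guesses on my part, not the repo's actual layout:

```python
# Sketch: stream one record at a time from an NDJSON dump instead of
# holding the whole metadata set in memory. File/field names are hypothetical.
import json
import re
from typing import Iterator

def search_records(path: str, pattern: str) -> Iterator[dict]:
    """Yield records whose title matches `pattern`, reading one line at a time."""
    regex = re.compile(pattern, re.IGNORECASE)
    with open(path, encoding="utf-8") as fh:
        for line in fh:                      # only one record in memory at a time
            record = json.loads(line)
            if regex.search(record.get("title", "")):
                yield record

for hit in search_records("metadata.jsonl", r"visual analytics"):
    print(hit["title"])
```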

I am also fighting a few deadlines; I'll set up what I have done so far as a PR and we can discuss the details there.

I'll also create a separate issue to discuss the possible optimizations.

@arpitnarechania
Member

I agree, it was designed for long-running batch updates; but that too can be made efficient -- I had plans to move to asynchronous requests or PySpark-based map-reduce operations. I will go through your WIP PR and get back to you.
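Roughly, the asynchronous idea looks like this. A minimal sketch using asyncio/aiohttp with bounded concurrency; the endpoint URL is a placeholder, not the scraper's actual target:

```python
# Sketch of bounded-concurrency asynchronous fetching; URLs are placeholders.
import asyncio
import aiohttp

async def fetch_json(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json()

async def fetch_all(urls, limit: int = 10):
    sem = asyncio.Semaphore(limit)           # cap the number of in-flight requests
    async with aiohttp.ClientSession() as session:
        async def bounded(url):
            async with sem:
                return await fetch_json(session, url)
        return await asyncio.gather(*(bounded(u) for u in urls))

if __name__ == "__main__":
    pages = [f"https://example.org/api/papers?page={i}" for i in range(5)]
    results = asyncio.run(fetch_all(pages))
    print(len(results), "pages fetched")
```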
