
chapter sorting #17

Open
mortbauer opened this issue Jan 28, 2012 · 5 comments

Comments

@mortbauer

Sometimes the chapters of a book are sorted alphabetically on the SpringerLink contents page. Since the script uses only that listing to determine the chapter order, the downloaded chapters end up out of order, which isn't very nice.
Maybe the chapters could be sorted by their page numbers instead. I think it should be possible, but I'm not very good with regex, so I can't offer a solution myself.

@thriqon

thriqon commented Jan 28, 2012

It would be pretty helpful if you could provide us with an example. A simple URL will be enough...

But I am not sure if this is possible. Are you asking to extract page numbers out of the PDFs?

@mortbauer
Author

OK, sorry about that. Here is an example: 978-3-540-23957-4 is the ISBN of the Springer Handbook of Robotics. It has 66 chapters, and when I download it with the script, they are not in the correct order. However, the book's contents page (http://www.springerlink.com/content/978-3-540-23957-4/contents/) lists each chapter's page numbers next to it, so I thought it shouldn't be too difficult to order the chapters based on these numbers.
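A minimal sketch of the idea in Python 3 (the script itself targets Python 2): once each chapter's page range has been scraped from the contents page, the chapters can be sorted by their first page instead of being kept in listing order. The record layout, the `first_page` and `sort_by_page` helpers, and the sample titles and links are all hypothetical; SpringerLink's actual markup is not reproduced here.

```python
import re

# Hypothetical records scraped from a contents page: (title, link, page range).
# They are listed alphabetically, mimicking the ordering problem in this issue.
chapters = [
    ("Alpha", "/hypothetical/3/", "35-58"),
    ("Beta", "/hypothetical/2/", "9-33"),
    ("Gamma", "/hypothetical/1/", "1-8"),
]

def first_page(page_range):
    """Return the first page of an Arabic range such as '35-58' as an int."""
    match = re.match(r"\s*(\d+)", page_range)
    if match is None:
        raise ValueError("no Arabic page number in %r" % page_range)
    return int(match.group(1))

def sort_by_page(chapters):
    """Order chapters by starting page instead of by listing order."""
    return sorted(chapters, key=lambda chapter: first_page(chapter[2]))

for title, link, pages in sort_by_page(chapters):
    print(pages, title, link)
```

Because Python's `sorted()` is stable, entries with the same starting page keep their original relative order. Note that a Roman-numeral range such as `i-xii` makes `first_page` raise `ValueError`, since it only looks for Arabic digits.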

@thriqon

thriqon commented Jul 13, 2012

I have implemented something that might handle this. Please give it a try and report back whether it is what you intended.

@mortbauer
Author

The sorting seems to work, although I have only tried it with one example so far. However, when I try without sorting, I now get the following error:

$ python2 springer_download.py -l http://www.springerlink.com/content/978-3-540-77876-9/
fetching book information...
    http://springerlink.com/content/978-3-540-77876-9/contents/

Now Trying to download book 'VDI Heat Atlas'

found 68 chapters
Traceback (most recent call last):
  File "springer_download.py", line 310, in <module>
    main(sys.argv[1:])
  File "springer_download.py", line 194, in main
    chapterLink = baseLink + chapterLink
TypeError: cannot concatenate 'str' and 'tuple' objects
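The traceback suggests that `chapterLink` is now a tuple rather than a string. One common way this happens, offered here as a guess rather than a diagnosis of the actual patch, is `re.findall` with more than one capture group: it then returns a list of tuples, one element per group. The Python 3 sketch below reproduces that with a hypothetical HTML fragment and a placeholder base URL, then fixes it by unpacking each tuple. `base_link` and `chapter_urls` are illustrative names only.

```python
import re

# Hypothetical contents-page fragment (not SpringerLink's real markup).
html = """
<a href="/content/abc123/">Chapter One</a> <span>1-20</span>
<a href="/content/def456/">Chapter Two</a> <span>21-40</span>
"""

# Placeholder base URL, standing in for the script's baseLink.
base_link = "http://springerlink.com"

# Two capture groups (link and page range), so findall yields tuples like
# ('/content/abc123/', '1-20') instead of plain strings.
matches = re.findall(r'<a href="([^"]+)">[^<]*</a> <span>([^<]+)</span>', html)

# Reproduce the bug: str + tuple raises TypeError.
try:
    base_link + matches[0]
except TypeError as err:
    print("TypeError:", err)

# Fix: unpack each tuple and concatenate only the link string.
chapter_urls = [base_link + link for link, _pages in matches]
print(chapter_urls)
```

Under Python 2 the failing concatenation produces the `cannot concatenate 'str' and 'tuple' objects` message seen in the log; Python 3 raises the same `TypeError` with different wording.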

@LinkM

LinkM commented Oct 14, 2012

As I already commented inline, your modification cannot handle the front matter because of its Roman page numbers. Additionally, there are back-matter sections whose page numbers start at 1, e.g. www.springerlink.com/content/978-3-540-25202-3/
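A sketch of one way to handle both cases in Python 3, assuming (hypothetically) that front-matter ranges use Roman numerals and that back-matter entries can be recognised by a title containing `Back Matter`. Instead of sorting on the page number alone, each entry gets a key of (section rank, first page), so Roman-numbered front matter sorts first and back matter sorts last even though its numbering restarts at 1. The helper names and sample entries are made up for illustration.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral such as 'xiv' (any case) to an int."""
    total = 0
    previous = 0
    # Walk right to left: a smaller value before a larger one is subtracted.
    for char in reversed(numeral.lower()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total

def first_page(page_range):
    """Return (is_roman, value) for the first page of 'xi-xv' or '9-33'."""
    match = re.match(r"\s*(\d+|[ivxlcdm]+)", page_range, re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognised page range %r" % page_range)
    token = match.group(1)
    if token.isdigit():
        return False, int(token)
    return True, roman_to_int(token)

def chapter_sort_key(title, page_range):
    """Key of (section rank, first page); the title test is an assumption."""
    is_roman, page = first_page(page_range)
    if "back matter" in title.lower():
        rank = 2  # back matter: numbering restarts at 1, so place it last
    elif is_roman:
        rank = 0  # front matter: Roman-numbered pages come first
    else:
        rank = 1  # regular chapters, in Arabic page order
    return (rank, page)

# Hypothetical entries in the order a contents page might list them.
entries = [
    ("Back Matter", "1-25"),
    ("Chapter B", "31-60"),
    ("Front Matter", "i-xii"),
    ("Chapter A", "1-30"),
]
ordered = sorted(entries, key=lambda entry: chapter_sort_key(*entry))
print([title for title, _pages in ordered])
```

Ranking by section first means the page comparison only ever happens within one numbering scheme, which is what breaks when front matter, chapters and back matter are compared by page number directly.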


3 participants