NEXTPY-569 -- Make pdfquery compatible with Python 3.9 and 3.11 #91

Open
wants to merge 17 commits into
base: master
15 changes: 5 additions & 10 deletions .travis.yml
@@ -1,23 +1,18 @@
 language: python
 python:
-  - "3.5"
-  - "3.6"
-  - "3.7"
-  - "3.8"
+  - "3.9"
+  - "3.11"
 env: CFLAGS="-O0"

 cache:
   directories:
     - $HOME/.cache/pip

 install:
-  - if [[ $TRAVIS_PYTHON_VERSION < 3 ]]; then pip install -r requirements_py2.txt; fi
-  - if [[ $TRAVIS_PYTHON_VERSION > 3 ]]; then pip install -r requirements_py3.txt; fi
-  - if [[ $TRAVIS_PYTHON_VERSION == '2.6' ]]; then pip install unittest2; fi
-script:
-  python setup.py test
+  - pip install -e .
+script: python setup.py test
 after_success:
   - coveralls

 # See: http://docs.travis-ci.com/user/migrating-from-legacy/
-sudo: false
+sudo: false
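
A side note on the install conditionals this change drops: inside `[[ ]]`, the shell operators `<` and `>` compare strings lexicographically. That happens to work when the right-hand side is just `3`, but it would misorder versions such as "3.9" and "3.11" if anyone reintroduced a similar check. The sketch below illustrates the pitfall in Python; `version_tuple` is a hypothetical helper for illustration and is not part of this PR.

```python
def version_tuple(version):
    """Turn a dotted version string such as "3.11" into (3, 11)."""
    return tuple(int(part) for part in version.split("."))


# String comparison is lexicographic: "1" sorts before "9",
# so "3.11" is considered smaller than "3.9".
assert "3.11" < "3.9"

# Comparing integer tuples gives the intended numeric ordering.
assert version_tuple("3.11") > version_tuple("3.9")
assert version_tuple("3.9") > version_tuple("3")
```

Installing with a single `pip install -e .`, as the new configuration does, sidesteps the comparison entirely.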
15 changes: 15 additions & 0 deletions CHANGES.txt
@@ -1,3 +1,18 @@
+v0.5.0, 2021-05-03 --
+0.5.1 (unreleased)
+
+
+- Make pdfquery compatible with Python 3.9 and 3.11
+
+
+0.5.0 (2021-05-04)
+- #67 Fix range() page numbers for Python3 & prevent long cache file names
+- Remove references to old version of PDFMiner
+- Fixed an isort issue
+- Update (un)supported Python versions
+- Improve performance on large pdfs
+- Remove reference to deprecated easy_install
+- Fix two broken testcases
 v0.4.3, 2016-03-27 -- Add laparams parameter to __init__.
 v0.4.2, 2016-02-07 -- Annotations bugfix.
 v0.4.1, 2015-12-21 -- Annotations bugfix.
12 changes: 9 additions & 3 deletions README.rst
@@ -18,10 +18,16 @@ PDFs with as little code as possible.

 .. contents:: **Table of Contents**

-Installation
-============
+Installation as a package
+=========================
+
+``pip install pdfquery``
+
+
+Installation for development
+============================

-``easy_install pdfquery`` or ``pip install pdfquery``.
+``pip install -e ".[test,flake8,docs,release]"``

 Quick Start
 ===========
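
The development install command in this README hunk relies on the package declaring extras named `test`, `flake8`, `docs`, and `release`. The setup.py changes are not visible in this part of the diff, so the following is only a sketch of how such extras are commonly declared through setuptools' `extras_require`. Only the four extra names come from the README; every package listed, and the helper `packages_for`, are placeholder assumptions rather than the project's actual configuration.

```python
# Hypothetical extras table; only the four extra names come from the README.
EXTRAS_REQUIRE = {
    "test": ["pytest"],    # placeholder package
    "flake8": ["flake8"],  # placeholder package
    "docs": ["sphinx"],    # placeholder package
    "release": ["twine"],  # placeholder package
}


def packages_for(extras, table=EXTRAS_REQUIRE):
    """Return the sorted, de-duplicated packages pulled in by the given extras."""
    unknown = [name for name in extras if name not in table]
    if unknown:
        raise KeyError("undeclared extras: {}".format(", ".join(unknown)))
    return sorted({pkg for name in extras for pkg in table[name]})


# In setup.py the table would be passed as setup(..., extras_require=EXTRAS_REQUIRE).
dev_packages = packages_for(["test", "flake8", "docs", "release"])
```

The quotes around `".[test,flake8,docs,release]"` in the README command keep shells that treat square brackets as glob patterns from expanding them.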
10 changes: 3 additions & 7 deletions appveyor.yml
@@ -1,13 +1,9 @@
 environment:
   matrix:
-    # https://www.appveyor.com/docs/windows-images-software/#python
-    # currently lxml does not successfully install in 3.5 and 3.8
-    # - PYTHON: "C:\\Python35"
-    - PYTHON: "C:\\Python36"
-    - PYTHON: "C:\\Python37"
-    # - PYTHON: "C:\\Python38"
+    - PYTHON: "C:\\Python39"
+    - PYTHON: "C:\\Python311"

 build: off

 test_script:
-  - "%PYTHON%\\python.exe setup.py test"
+  - "%PYTHON%\\python.exe setup.py test"
1 change: 0 additions & 1 deletion dev_requirements.txt

This file was deleted.

Binary file removed dist/pdfquery-0.1.0.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.1.1.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.1.2.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.1.3.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.1.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.2.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.3.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.4.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.5.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.6.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.7.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.2.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.3.0.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.3.1.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.4.0.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.4.1.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.4.2.tar.gz
Binary file not shown.
Binary file removed dist/pdfquery-0.4.3.tar.gz
Binary file not shown.
2 changes: 1 addition & 1 deletion pdfquery/__init__.py
@@ -1 +1 @@
-from .pdfquery import PDFQuery
+from .pdfquery import PDFQuery
25 changes: 15 additions & 10 deletions pdfquery/cache.py
@@ -1,9 +1,10 @@
 import hashlib
 import zipfile
+
 from lxml import etree

-class BaseCache(object):

+class BaseCache(object):
     def __init__(self):
         self.hash_key = None

@@ -32,30 +33,34 @@ class DummyCache(BaseCache):


 class FileCache(BaseCache):
-
-    def __init__(self, directory='/tmp/'):
+    def __init__(self, directory="/tmp/"):
         self.directory = directory
         super(FileCache, self).__init__()

     def get_cache_filename(self, page_range_key):
         return "pdfquery_{hash_key}{page_range_key}.xml".format(
-            hash_key=self.hash_key,
-            page_range_key=page_range_key
+            hash_key=self.hash_key, page_range_key=page_range_key
         )

     def get_cache_file(self, page_range_key, mode):
         try:
-            return zipfile.ZipFile(self.directory+self.get_cache_filename(page_range_key)+".zip", mode)
+            return zipfile.ZipFile(
+                self.directory + self.get_cache_filename(page_range_key) + ".zip", mode
+            )
         except IOError:
             return None

     def set(self, page_range_key, tree):
-        xml = etree.tostring(tree, encoding='utf-8', pretty_print=False, xml_declaration=True)
-        cache_file = self.get_cache_file(page_range_key, 'w')
+        xml = etree.tostring(
+            tree, encoding="utf-8", pretty_print=False, xml_declaration=True
+        )
+        cache_file = self.get_cache_file(page_range_key, "w")
         cache_file.writestr(self.get_cache_filename(page_range_key), xml)
         cache_file.close()

     def get(self, page_range_key):
-        cache_file = self.get_cache_file(page_range_key, 'r')
+        cache_file = self.get_cache_file(page_range_key, "r")
         if cache_file:
-            return etree.fromstring(cache_file.read(self.get_cache_filename(page_range_key)))
+            return etree.fromstring(
+                cache_file.read(self.get_cache_filename(page_range_key))
+            )
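
For reviewers who want to exercise the cache logic touched here: `FileCache` stores each parsed tree as a single XML member inside a zip archive whose name combines the hash key and the page range key. The following is a minimal, self-contained sketch of that round trip. It is not the project's code: it swaps lxml for the standard library's `xml.etree.ElementTree` so it runs without extra dependencies, and the class name `SketchFileCache`, the hash key `"abc123"`, and the page range keys `"_0"` and `"_1"` are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
import zipfile


class SketchFileCache:
    """Illustrative stand-in for FileCache (stdlib only, hypothetical names)."""

    def __init__(self, directory, hash_key):
        self.directory = directory
        self.hash_key = hash_key

    def get_cache_filename(self, page_range_key):
        return "pdfquery_{hash_key}{page_range_key}.xml".format(
            hash_key=self.hash_key, page_range_key=page_range_key
        )

    def _archive_path(self, page_range_key):
        return os.path.join(
            self.directory, self.get_cache_filename(page_range_key) + ".zip"
        )

    def set(self, page_range_key, tree):
        xml = ET.tostring(tree, encoding="utf-8")
        # The context manager closes the archive even if writestr raises.
        with zipfile.ZipFile(self._archive_path(page_range_key), "w") as archive:
            archive.writestr(self.get_cache_filename(page_range_key), xml)

    def get(self, page_range_key):
        try:
            archive = zipfile.ZipFile(self._archive_path(page_range_key), "r")
        except IOError:  # alias of OSError on Python 3; covers a missing file
            return None
        with archive:
            return ET.fromstring(
                archive.read(self.get_cache_filename(page_range_key))
            )


with tempfile.TemporaryDirectory() as tmp:
    cache = SketchFileCache(tmp, hash_key="abc123")
    root = ET.Element("pdfxml")
    ET.SubElement(root, "LTPage", page_index="0")
    cache.set("_0", root)
    restored = cache.get("_0")  # tree read back from the zip archive
    missing = cache.get("_1")   # no archive was written for this key
```

One difference from the diff above: the sketch closes the archive after reading, whereas `get` in the PR returns without closing it. Whether the real class should do the same is a question for the maintainers.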