
Commit

Switched from my branch of PAWLS to a new repo for PDF processors using the PAWLS preprocessor code. Realized my branch of PAWLS had some changes beyond fixing the issue with processing empty pages. There was some dynamic resizing and scaling that was getting the token coordinate system out of sync with the PDF coordinate system. Some of the improvements I made to document processing were useful, so those OpenCV improvements should be re-incorporated later for better parsing of poor-quality docs. For now, though, the application is working 100% once again :-)
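The coordinate mismatch described in the commit message can be illustrated with a minimal sketch. It assumes OCR runs on a rendered page image whose pixel size differs from the PDF page size in points, and that both systems use a top-left origin; under those assumptions every token box has to be rescaled by the page-to-image ratio, and any extra resizing step that changes the image size without updating that ratio puts the tokens out of sync. The names `Box` and `image_box_to_pdf_box` are hypothetical and are not taken from PAWLS or from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: top-left corner (x, y) plus width and height."""

    x: float
    y: float
    width: float
    height: float


def image_box_to_pdf_box(
    box: Box,
    image_size: tuple[float, float],
    pdf_page_size: tuple[float, float],
) -> Box:
    """Rescale a box from image pixel space to PDF point space.

    Assumes both spaces share a top-left origin, so only scaling is needed.
    """
    image_width, image_height = image_size
    pdf_width, pdf_height = pdf_page_size
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    # One scale factor per axis; these must reflect the size of the image
    # that OCR actually saw, including any resizing done beforehand.
    scale_x = pdf_width / image_width
    scale_y = pdf_height / image_height

    return Box(
        x=box.x * scale_x,
        y=box.y * scale_y,
        width=box.width * scale_x,
        height=box.height * scale_y,
    )


# A 612 x 792 point page rendered at twice that size in pixels.
token_in_pixels = Box(x=100, y=200, width=50, height=20)
token_in_points = image_box_to_pdf_box(token_in_pixels, (1224, 1584), (612, 792))
```

If the image were resized again before OCR while `image_size` still held the earlier dimensions, the resulting boxes would be shifted and scaled relative to the PDF text, which is the kind of drift the commit message describes.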
JSv4 committed Sep 13, 2023
1 parent c2b2902 commit 39c0f49
Showing 3 changed files with 10 additions and 4 deletions.
10 changes: 8 additions & 2 deletions frontend/src/index.tsx
@@ -23,8 +23,13 @@ const onRedirectCallback = (appState: any) => {
);
};

-const { REACT_APP_USE_AUTH0, REACT_APP_API_ROOT_URL: api_root_url } =
-process.env;
+// const { REACT_APP_USE_AUTH0, REACT_APP_API_ROOT_URL: api_root_url } =
+// process.env;
+
+const { REACT_APP_USE_AUTH0 } = process.env;
+
+
+const api_root_url='http://localhost:8000'

console.log("OpenContracts is using Auth0: ", REACT_APP_USE_AUTH0);
console.log("OpenContracts frontend target api root", api_root_url);
@@ -40,6 +45,7 @@ const authLink = new ApolloLink((operation, forward) => {
return forward(operation);
});

+console.log("api_root_url", api_root_url)
const httpLink = createHttpLink({
uri: `${api_root_url}/graphql/`,
});
2 changes: 1 addition & 1 deletion opencontractserver/utils/pdf.py
@@ -100,7 +100,7 @@ def extract_pawls_from_pdfs_bytes(
pdf_bytes: bytes,
) -> list[PawlsPagePythonType]:

-from pawls.commands.preprocess import process_tesseract
+from pdfpreprocessor.preprocessors.tesseract import process_tesseract

Check warning on line 103 in opencontractserver/utils/pdf.py (Codecov / codecov/patch): added line #L103 was not covered by tests.

pdf_fragment_folder_path = pathlib.Path("/tmp/user_0/pdf_fragments")
pdf_fragment_folder_path.mkdir(parents=True, exist_ok=True)
2 changes: 1 addition & 1 deletion requirements/base.txt
@@ -38,7 +38,7 @@ drf-extra-fields==3.4.1 # https://github.com/Hipo/drf-extra-fields
# ------------------------------------------------------------------------------
# Pawls preprocessors are available as a command line utility in their repo for now
# BUT we can install them from their github repo subdirectory using the syntax below:
-git+https://github.com/JSv4/pawls#egg=pawls&subdirectory=cli
+git+https://github.com/JSv4/[email protected]
scikit-learn==1.1.3
pdfplumber
pytesseract
