Extract information trapped in PDF invoices. This project uses the Adobe PDF Services Extract API to pull the data out of each invoice and writes the important fields to a CSV file.
- The main folder contains all the logical operations.
- The ProductHandler.js file creates or initializes the Products.json file.
- The index.js file extracts the PDF data as zip files and stores them in the ExtractedZip folder.
- The unzip.js file unzips the extracted zip files and stores them in the ExtractedUnzip folder.
- The jsonHandler.js file processes the extracted data and stores it in Products.json.
- The csvHandler.js file converts Products.json to CSV.
- The Products.json file contains the extracted data in JSON format.
- The ExtractedProduct.csv file contains the final output.
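To make the JSON processing step more concrete, here is a minimal sketch of how text could be collected from an unzipped Extract result. It is not the code in jsonHandler.js: the helper name `collectText` and the sample values are hypothetical, and the sketch assumes the result is a parsed structuredData.json object with an `elements` array whose entries may carry `Path` and `Text` fields.

```javascript
// Hypothetical helper: a sketch, not the repository's jsonHandler.js.
// Assumes `structuredData` is the parsed structuredData.json from the
// Extract API, with an `elements` array whose entries may have `Path` and `Text`.
function collectText(structuredData) {
  const elements = Array.isArray(structuredData.elements)
    ? structuredData.elements
    : [];
  return elements
    // Keep only elements that actually carry non-empty text.
    .filter((el) => typeof el.Text === "string" && el.Text.trim() !== "")
    // Keep the structural path next to the trimmed text.
    .map((el) => ({ path: el.Path, text: el.Text.trim() }));
}

// Example input shaped like a small Extract result (values are made up).
const sample = {
  elements: [
    { Path: "//Document/H1", Text: "Invoice " },
    { Path: "//Document/Figure" },
    { Path: "//Document/P", Text: "Total Due $120" },
  ],
};

console.log(collectText(sample));
```

In a real run the object would come from a file in the ExtractedUnzip folder, for example via `JSON.parse(fs.readFileSync(path, "utf8"))`. Mapping the collected text to invoice fields depends on the invoice layout and is left out here.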
Run the following in a terminal on your local machine:
- Clone the Repository :
git clone https://github.com/Anand-shreya/AdobeHackathon_pdfExtractor.git
- Go to folder :
cd AdobeHackathon_pdfExtractor
- Install the Node packages :
npm install
- To extract data from invoices, place all the invoices in the resources folder.
- Update the pdfservices-api-credentials.json file with your Adobe PDF Services API credentials.
- Run the script to create or update (if it already exists) the Products.json file.
npm start
- Run the script to extract JSON data.
npm run createJson
- Run the script to convert JSON data to a CSV file.
npm run createCsv
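The JSON-to-CSV conversion in the last step can be sketched as follows. This is an illustration under stated assumptions, not the code in csvHandler.js: the helpers `escapeCsvField` and `toCsv` are hypothetical, the records are assumed to be flat objects, and the field names in the sample are made up.

```javascript
// Hypothetical helpers: a sketch, not the repository's csvHandler.js.
// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (RFC 4180 style).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Convert an array of flat objects to CSV text.
// The keys of the first record are used as the header row.
function toCsv(records) {
  if (records.length === 0) return "";
  const headers = Object.keys(records[0]);
  const lines = [headers.map(escapeCsvField).join(",")];
  for (const record of records) {
    lines.push(headers.map((h) => escapeCsvField(record[h])).join(","));
  }
  return lines.join("\n");
}

// Made-up records; the real field names in Products.json may differ.
const products = [
  { name: "Widget, large", quantity: 2, price: 9.5 },
  { name: 'Cable 6" black', quantity: 1, price: 4 },
];

console.log(toCsv(products));
```

Quoting matters for invoice data because product names and addresses often contain commas, which would otherwise shift the columns when the file is opened in a spreadsheet.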