Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an analyze receipt notebook for v3.0.preview1 #277

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file not shown.
Binary file modified python/FormRecognizer/images/how-to-find-endpoint-and-key.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
262 changes: 262 additions & 0 deletions python/FormRecognizer/quickstart-receipt-analyze-preview.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook gives you an end-to-end example on how to get started using Python SDK (preview) to analyze a receipt with Azure Form Recognizer."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Prerequistes\n",
"- Azure subscription - [Create one for free](https://azure.microsoft.com/en-us/free/cognitive-services/)\n",
"- [Python 3.x](https://www.python.org/) - Your Python installation should include [pip](https://pip.pypa.io/en/stable/). You can check if you have pip installed by running `pip --version` on the command line. Get pip by installing the latest version of Python.\n",
"- Once you have your Azure subscription, [create a Form Recognizer resource](https://ms.portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) in the Azure portal to get your **key** and **endpoint**. After it deploys, click **Go to resource** - You will need the key and endpoint from the resource you create to connect your application to the Form Recognizer API. Later in the quickstart, you will paste your key and endpoint into the code below. You can use the free pricing tier (`F0`) to try the service, and upgrade later to a paid tier (`S0`) for production."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up - Install the client library\n",
"After installing Python, you can install the preview version of Form Recognier client library with:\n",
"`pip install azure-ai-formrecognizer --pre`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install azure-ai-formrecognizer --pre"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get the key and endpoint\n",
"Refer to the screenshot on how to get the key and endpoint of your Form Recognizer resource.\n",
"![How to find endpoint and key](./images/how-to-find-endpoint-and-key.png)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"FORMRECOGNIZER_ENDPOINT = \"{enter your endpoint}\"\n",
"FORMRECOGNIZER_KEY = \"{enter your key}\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Authenticate a FormRecognizerClient for document analysis\n",
"[FormRecognizerClient](https://docs.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.formrecognizerclient?view=azure-python) is used to query the service to recongize document fields and conent like key-value pairs, tables with prebuilt or custom trained models."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from azure.core.exceptions import ResourceNotFoundError\n",
"from azure.ai.formrecognizer import DocumentAnalysisClient\n",
"from azure.core.credentials import AzureKeyCredential\n",
"\n",
"# Initiate client with given endpoint and credential\n",
"document_analysis_client = DocumentAnalysisClient(endpoint=FORMRECOGNIZER_ENDPOINT, credential=AzureKeyCredential(FORMRECOGNIZER_KEY))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start an analyze request for your local files with `begin_analyze_document`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Read the sample image file into memory and begin analyzing the file\n",
"IMAGE_FILE = 'sample-hotel-receipt.png'\n",
"with open(IMAGE_FILE, 'rb') as f:\n",
" poller = document_analysis_client.begin_analyze_document(\n",
" \"prebuilt-receipt\", document=f\n",
" )\n",
"\n",
"# Get the analyze result\n",
"receipts = poller.result()\n",
"print('Status: {}, Document(s): {}'.format(poller.status(), receipts.documents))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## You can also analyze files from the web using `begin_analyze_document_from_url`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"IMAGE_URL = 'https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/contoso-receipt.png'\n",
"\n",
"# Send request to Form Recognizer service to process data\n",
"poller = document_analysis_client.begin_analyze_document_from_url(\n",
" \"prebuilt-receipt\", document_url=IMAGE_URL\n",
")\n",
"\n",
"# Get the analyze result\n",
"receipts = poller.result()\n",
"print('Status: {}, Document(s): {}'.format(poller.status(), receipts.documents))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extract information from analyzed result"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------Recognizing receipt #1--------\n",
"Receipt Type: Hotel has confidence: 0.988\n",
"ArrivalDate: 2021-03-27 has confidence 0.986\n",
"Currency: USD has confidence 0.995\n",
"DepartureDate: 2021-03-28 has confidence 0.986\n",
"Receipt Items:\n",
"...Item #1\n",
"......Category: Room has confidence 0.988\n",
"......Date: 2021-03-27 has confidence 0.99\n",
"......Description: Room Charge has confidence 0.989\n",
"......TotalPrice: 88.0 has confidence 0.99\n",
"...Item #2\n",
"......Category: Tax has confidence 0.986\n",
"......Date: 2021-03-27 has confidence 0.99\n",
"......Description: County Tax 6% has confidence 0.987\n",
"......TotalPrice: 5.28 has confidence 0.99\n",
"...Item #3\n",
"......Category: Tax has confidence 0.986\n",
"......Date: 2021-03-27 has confidence 0.99\n",
"......Description: State Tax 6% has confidence 0.985\n",
"......TotalPrice: 5.0 has confidence 0.99\n",
"...Item #4\n",
"......Category: Other has confidence 0.988\n",
"......Date: 2021-03-27 has confidence 0.99\n",
"......Description: Daily Parking has confidence 0.989\n",
"......TotalPrice: 8.0 has confidence 0.989\n",
"...Item #5\n",
"......Category: Tax has confidence 0.988\n",
"......Date: 2021-03-27 has confidence 0.989\n",
"......Description: Parking Tax has confidence 0.989\n",
"......TotalPrice: 0.38 has confidence 0.99\n",
"...Item #6\n",
"......Category: Credit has confidence 0.988\n",
"......Date: 2021-03-28 has confidence 0.99\n",
"......Description: American Express has confidence 0.989\n",
"......TotalPrice: 104.92 has confidence 0.99\n",
"Locale: en-US has confidence 0.992\n",
"MerchantAddress: 5600 148th Ave NE, Redmond, WA 98052 has confidence 0.988\n",
"MerchantAliases: [DocumentField(value_type=string, value='Contoso', content=Contoso, bounding_regions=[BoundingRegion(page_number=1, bounding_box=[Point(x=265.0, y=75.0), Point(x=343.0, y=76.0), Point(x=343.0, y=96.0), Point(x=266.0, y=96.0)])], spans=[DocumentSpan(offset=0, length=7)], confidence=0.813)] has confidence None\n",
"MerchantName: Contoso Inn has confidence 0.986\n",
"MerchantPhoneNumber: +19876544321 has confidence 0.99\n",
"ReceiptType: Hotel has confidence 0.988\n",
"Total: 104.92 has confidence 0.994\n"
]
}
],
"source": [
"for idx, receipt in enumerate(receipts.documents):\n",
" print(\"--------Recognizing receipt #{}--------\".format(idx + 1))\n",
" receipt_type = receipt.fields.get(\"ReceiptType\")\n",
" if receipt_type:\n",
" print(\n",
" \"Receipt Type: {} has confidence: {}\".format(\n",
" receipt_type.value, receipt_type.confidence\n",
" )\n",
" )\n",
" merchant_name = receipt.fields.get(\"MerchantName\")\n",
" for name, field in receipt.fields.items():\n",
" if name == \"Items\":\n",
" print(\"Receipt Items:\")\n",
" for id_item, items in enumerate(field.value):\n",
" print(\"...Item #{}\".format(id_item + 1))\n",
" for item_name, item in items.value.items():\n",
" print(\"......{}: {} has confidence {}\".format(item_name, item.value, item.confidence))\n",
" else:\n",
" print(\"{}: {} has confidence {}\".format(name, field.value, field.confidence))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"- Learn [Receipt concept](https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/concept-receipts)\n",
"- Explore the [different offerings](https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/overview) in Form Recognizer\n",
"- Try Form Recognizer with [Form Recognizer Studio](https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/quickstarts/try-v3-form-recognizer-studio)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"interpreter": {
"hash": "be34584a914411b92112902d7e0134e77b56151a1dd3ca41a681ece390d32dda"
},
"kernelspec": {
"display_name": "Python 3.7.11 64-bit ('fr-demo': conda)",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading