grammar and review comments
eaidova committed Oct 20, 2023
1 parent e091b64 commit ffd3e60
Showing 1 changed file with 79 additions and 25 deletions.
104 changes: 79 additions & 25 deletions notebooks/260-pix2struct-docvqa/260-pix2struct-docvqa.ipynb
@@ -14,16 +14,10 @@
 "* Secondly, DocVQA can handle documents with complex layouts and structures, like tables and diagrams, which can be challenging for traditional OCR systems.\n",
 "* Finally, DocVQA can automate many document-based workflows, like document routing and approval processes, to make employees focus on more meaningful work. The potential applications of DocVQA include automating tasks like information retrieval, document analysis, and document summarization.\n",
 "\n",
-"[Pix2Struct](https://arxiv.org/pdf/2210.03347.pdf) is a multimodal model for understanding visually-situated language that easily copes with extracting information from images. The model is trained using the novel learning technique to parse masked screenshots of web pages into simplified HTML, providing a significantly well-suited pretraining data source for the range of downstream activities such as OCR, visual question answering and image captioning.\n",
+"[Pix2Struct](https://arxiv.org/pdf/2210.03347.pdf) is a multimodal model for understanding visually situated language that easily copes with extracting information from images. The model is trained using a novel learning technique to parse masked screenshots of web pages into simplified HTML, providing a well-suited pretraining data source for a range of downstream activities such as OCR, visual question answering, and image captioning.\n",
 "\n",
-"In this tutorial we consider how to run Pix2Struct model using OpenVINO for solving document visual question answering task. We will use pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. To simplify the user experience, the [Hugging Face Optimum](https://huggingface.co/docs/optimum) library is used to convert the model to OpenVINO™ IR format.\n",
+"In this tutorial, we consider how to run the Pix2Struct model using OpenVINO for solving the document visual question answering task. We will use a pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. To simplify the user experience, the [Hugging Face Optimum](https://huggingface.co/docs/optimum) library is used to convert the model to OpenVINO™ IR format.\n",
 "\n",
-"The tutorial consist of the following steps:\n",
-"\n",
-"- Install prerequisites\n",
-"- Download and convert model from a public source using the [OpenVINO integration with Hugging Face Optimum](https://huggingface.co/blog/openvino).\n",
-"- Test model inference\n",
-"- Launch interactive demo\n",
 "### Table of content:\n",
 "- [About Pix2Struct](#About-Pix2Struct-Uparrow)\n",
 "- [Prerequisites](#Prerequisites-Uparrow)\n",
@@ -63,7 +57,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu\n",
+"%pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu\n",
 "%pip install -q \"git+https://github.com/huggingface/optimum-intel.git\" \"openvino>=2023.1.0\" transformers onnx gradio"
 ]
 },
@@ -74,12 +68,12 @@
 "source": [
-"## Download and Convert Model [$\Uparrow$](#Table-of-contents:)\n",
+"## Download and Convert Model [$\Uparrow$](#Table-of-content:)\n",
 "\n",
-"Optimum Intel can be used to load optimized models from the [Hugging Face Hub](https://huggingface.co/docs/optimum/intel/hf.co/models) and create pipelines to run an inference with OpenVINO Runtime using Hugging Face APIs. The Optimum Inference models are API compatible with Hugging Face Transformers models. This means we just need to replace `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.\n",
+"Optimum Intel can be used to load optimized models from the [Hugging Face Hub](https://huggingface.co/docs/optimum/intel/hf.co/models) and create pipelines to run inference with OpenVINO Runtime using Hugging Face APIs. The Optimum Inference models are API compatible with Hugging Face Transformers models. This means we just need to replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.\n",
 "\n",
-"Model class initialization starts with calling `from_pretrained` method. When downloading and converting Transformers model, the parameter `export=True` should be added. We can save the converted model for the next usage with the `save_pretrained` method. After model saving using `save_pretrained` method, you can load your converted model without `export` parameter, avoiding model conversion for the next time. \n",
+"Model class initialization starts with calling the `from_pretrained` method. When downloading and converting the Transformers model, the parameter `export=True` should be added. We can save the converted model for later use with the `save_pretrained` method. After saving the model with the `save_pretrained` method, you can load your converted model without the `export` parameter, avoiding model conversion the next time. To reduce memory consumption, we can compress the model to fp16 using the `half()` method.\n",
 "\n",
-"In this tutorial, we separate model export and loading for demonstration how to work with model in both modes.\n",
-"We will use [pix2struct-docvqa-base](https://huggingface.co/google/pix2struct-docvqa-base) model as example in this tutorial, but the same steps for running is applicable for other models from pix2struct family."
+"In this tutorial, we separate model export and loading to demonstrate how to work with the model in both modes.\n",
+"We will use the [pix2struct-docvqa-base](https://huggingface.co/google/pix2struct-docvqa-base) model as an example in this tutorial, but the same steps are applicable to other models from the Pix2Struct family."
 ]
 },
 {
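Editor's note: the export-and-reload flow described in the cell above can be sketched as follows. This is a minimal sketch assuming the Optimum Intel API the notebook references; the `model_dir` save location is an illustrative assumption, not part of this commit.

```python
from pathlib import Path

from optimum.intel.openvino import OVModelForPix2Struct
from transformers import Pix2StructProcessor

model_id = "google/pix2struct-docvqa-base"
model_dir = Path("pix2struct_docvqa_ov")  # hypothetical save location

if not model_dir.exists():
    # First run: download the PyTorch checkpoint and export it to OpenVINO IR.
    model = OVModelForPix2Struct.from_pretrained(model_id, export=True)
    model.half()  # compress weights to fp16 to reduce memory consumption
    model.save_pretrained(model_dir)
else:
    # Later runs: load the already converted model, no `export` parameter needed.
    model = OVModelForPix2Struct.from_pretrained(model_dir)

processor = Pix2StructProcessor.from_pretrained(model_id)
```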
@@ -100,10 +94,10 @@
 "output_type": "stream",
 "text": [
 "No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'\n",
-"2023-10-18 18:10:30.960785: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
-"2023-10-18 18:10:31.000305: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
+"2023-10-20 13:49:09.525682: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
+"2023-10-20 13:49:09.565139: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
 "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
-"2023-10-18 18:10:31.812766: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
+"2023-10-20 13:49:10.397504: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
 "/home/ea/work/ov_venv/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations\n",
 " warnings.warn(\n"
 ]
@@ -144,12 +138,12 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "ba0ee376dbb544b2b84b5e07292f03d8",
+"model_id": "1678eac140ca4cb1b41dfa624d29ae85",
 "version_major": 2,
 "version_minor": 0
 },
 "text/plain": [
-"Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO')"
+"Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')"
 ]
 },
 "execution_count": 3,
@@ -164,7 +158,7 @@
 "core = ov.Core()\n",
 "\n",
 "device = widgets.Dropdown(\n",
-"    options=core.available_devices + [\"AUTO\"],\n",
+"    options=[d for d in core.available_devices if \"GPU\" not in d] + [\"AUTO\"],\n",
 "    value='AUTO',\n",
 "    description='Device:',\n",
 "    disabled=False,\n",
@@ -180,10 +174,10 @@
 "source": [
 "## Test model inference [$\Uparrow$](#Table-of-content:)\n",
 "\n",
-"The diagram bellow demonstrates how model works:\n",
+"The diagram below demonstrates how the model works:\n",
 "![pix2struct_diagram.png](https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/c7456b17-0687-4aa9-851b-267bff3dac79)\n",
 "\n",
-"For running model inference we should preprocess data first. `Pix2StructProcessor` is responsible for preparing input data and decoding output for original PyTorch model and easily can be reused for running with Optimum Intel model. Then `OVModelForPix2Struct.generate` method will launch answer generation. Finally, generated answer token indices should be decoded in text format by `Pix2StructProcessor.decode`"
+"To run model inference, we should preprocess the data first. `Pix2StructProcessor` is responsible for preparing input data and decoding output for the original PyTorch model and can easily be reused with the Optimum Intel model. Then, the `OVModelForPix2Struct.generate` method will launch answer generation. Finally, the generated answer token indices should be decoded into text format by `Pix2StructProcessor.decode`."
 ]
 },
 {
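Editor's note: the preprocess → generate → decode flow from the cell above, as a minimal sketch. The image path and question are placeholders, and `model` and `processor` are assumed to come from the conversion step sketched earlier.

```python
from PIL import Image

# `model` and `processor` are assumed to be created as in the conversion step above.
image = Image.open("test_image.png")  # hypothetical screenshot file
question = "What performance hints are recommended?"

# Preprocess: render the question together with the image into model inputs.
inputs = processor(images=image, text=question, return_tensors="pt")

# Launch answer generation with the OpenVINO model.
answer_tokens = model.generate(**inputs, max_new_tokens=32)

# Decode the generated token indices back into text.
answer = processor.decode(answer_tokens[0], skip_special_tokens=True)
print(answer)
```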
@@ -214,7 +208,7 @@
 "id": "83a2ac29-80d9-4df0-b4e5-584a735f09e7",
 "metadata": {},
 "source": [
-"Let's see model in action. For testing model, we will use screenshot from [OpenVINO documentation](https://docs.openvino.ai/2023.1/get_started.html#openvino-advanced-features)"
+"Let's see the model in action. To test the model, we will use a screenshot from the [OpenVINO documentation](https://docs.openvino.ai/2023.1/get_started.html#openvino-advanced-features)."
 ]
 },
 {
Expand Down Expand Up @@ -305,7 +299,37 @@
"execution_count": null,
"id": "fba16902-72dc-41d2-8af1-2683e91d1679",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_1997913/1327125706.py:23: GradioDeprecationWarning: `enable_queue` is deprecated in `Interface()`, please use it within `launch()` instead.\n",
" demo = gr.Interface(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running on local URL: http://127.0.0.1:7860\n",
"\n",
"To create a public link, set `share=True` in `launch()`.\n"
]
},
{
"data": {
"text/html": [
"<div><iframe src=\"http://127.0.0.1:7860/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import gradio as gr\n",
"\n",
@@ -320,7 +344,7 @@
 "for img_url, image_file in zip(example_images_urls, file_names):\n",
 "    load_image(img_url).save(image_file)\n",
 "\n",
-"questions = [\"What is Eiffel tower tall?\", \"When does exsibition open?\", \"What the population of Stoddard?\"] \n",
+"questions = [\"How tall is the Eiffel tower?\", \"When is the coffee break?\", \"What is the population of Stoddard?\"]\n",
 "\n",
 "examples = [list(pair) for pair in zip(file_names, questions)]\n",
 "\n",
Expand Down Expand Up @@ -370,7 +394,37 @@
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"state": {
"1678eac140ca4cb1b41dfa624d29ae85": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "DropdownModel",
"state": {
"_options_labels": [
"CPU",
"AUTO"
],
"description": "Device:",
"index": 1,
"layout": "IPY_MODEL_cb3c0fe397d34c1fb41b7f5c67a04af3",
"style": "IPY_MODEL_33404eddef10437ca70e07bacf45b777"
}
},
"33404eddef10437ca70e07bacf45b777": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"cb3c0fe397d34c1fb41b7f5c67a04af3": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {}
}
},
"version_major": 2,
"version_minor": 0
}
