diff --git a/examples/basic-link-1.pdf b/examples/basic-link-1.pdf deleted file mode 100644 index 0553cf3..0000000 Binary files a/examples/basic-link-1.pdf and /dev/null differ diff --git a/examples/parsing_modes/demo_auto_mode.ipynb b/examples/parsing_modes/demo_auto_mode.ipynb index bbf8333..e915c0b 100644 --- a/examples/parsing_modes/demo_auto_mode.ipynb +++ b/examples/parsing_modes/demo_auto_mode.ipynb @@ -8,7 +8,7 @@ "\n", "\"Open\n", "\n", - "Many documents can have varying complexity across pages - some pages have text, and other pages have images. The text-only pages only require cheap parsing modes, whereas the image-based pages require more advanced modes. In this notebook we show you how to take advantage of \"auto-mode\" in LlamaParse which adaptively parses different pages according to different modes, which lets you get optimal performance at the cheapest cost.\n" + "Many documents can have varying complexity across pages - some pages have text, and other pages have images. The text-only pages only require cheap parsing modes, whereas the image-based pages require more advanced modes. In this notebook we show you how to take advantage of \"auto mode\" in LlamaParse which adaptively parses different pages according to different modes, which lets you get optimal performance at the cheapest cost.\n" ] }, { @@ -27,28 +27,10 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2024-12-08 14:28:09-- https://assets.amazon.science/9f/a3/ae41627f4ab2bde091f1ebc6b830/the-amazon-nova-family-of-models-technical-report-and-model-card.pdf\n", - "Resolving assets.amazon.science (assets.amazon.science)... 18.155.192.66, 18.155.192.102, 18.155.192.84, ...\n", - "Connecting to assets.amazon.science (assets.amazon.science)|18.155.192.66|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 21222963 (20M) [application/pdf]\n", - "Saving to: ‘./data/nova_technical_report.pdf’\n", - "\n", - "./data/nova_technic 100%[===================>] 20.24M 36.1MB/s in 0.6s \n", - "\n", - "2024-12-08 14:28:10 (36.1 MB/s) - ‘./data/nova_technical_report.pdf’ saved [21222963/21222963]\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "!mkdir -p data\n", - "!wget 'https://assets.amazon.science/9f/a3/ae41627f4ab2bde091f1ebc6b830/the-amazon-nova-family-of-models-technical-report-and-model-card.pdf' -O './data/nova_technical_report.pdf'" + "!wget 'https://www.dropbox.com/scl/fi/sterddtajrf844ytvwlim/the-amazon-nova-family-of-models-technical-report-and-model-card.pdf?rlkey=0if0ct5diw70jifr9m8fikpsc&dl=0' -O './data/nova_technical_report.pdf'" ] }, { @@ -79,10 +61,10 @@ "import os\n", "\n", "# API access to llama-cloud\n", - "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\"\n", + "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-xxx\"\n", "\n", "# Using OpenAI API for embeddings/llms\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"" + "os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-xxx\"" ] }, { @@ -109,7 +91,7 @@ "source": [ "## Using `LlamaParse` with Auto-Mode\n", "\n", - "We feed the Uber March 2022 10QA into LlamaParse with auto-mode enabled to get back the Markdown representation." + "We feed the Nova technical report into LlamaParse with auto-mode enabled to get back the Markdown representation." ] }, { @@ -121,7 +103,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "Started parsing the file under job_id 1dcfb080-9ee8-4e61-904d-2c94b0dad1cf\n" + "Started parsing the file under job_id 638e9b31-eb09-43d2-bf08-b54cef51ddb1\n", + "......." 
] } ], @@ -134,12 +117,19 @@ " result_type=\"markdown\",\n", " auto_mode=True,\n", " auto_mode_trigger_on_image_in_page=True,\n", - " # auto_mode_trigger_on_table_in_page=False,\n", + " auto_mode_trigger_on_table_in_page=True,\n", " # auto_mode_trigger_on_text_in_page=\"\"\n", " # auto_mode_trigger_on_regexp_in_page=\"\"\n", ").load_data(file_path)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll also run the same document through the default markdown mode to compare." + ] + }, { "cell_type": "code", "execution_count": null, @@ -149,33 +139,22 @@ "name": "stdout", "output_type": "stream", "text": [ - "Started parsing the file under job_id 01a5bbdf-744b-4ac1-8fab-ea963d722164\n", - "...." + "Started parsing the file under job_id ddca58f6-4919-4069-b864-616a5956696e\n", + "........" ] } ], "source": [ - "base_documents = LlamaParse(result_type=\"markdown\", invalidate_cache=True).load_data(\n", - " file_path\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# base_documents" + "base_documents = LlamaParse(result_type=\"markdown\").load_data(file_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Show Example Pages\n", + "## Creating page nodes from our parsed documents\n", "\n", - "Here we show example pages that are parsed with auto-mode. " + "This is just a convenience to make it easy to inspect the parsed pages." ] }, { @@ -218,9 +197,136 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Page 11** contains images and tables, and we can see that auto-mode automatically switches to higher-quality parsing vs. 
the default parsed page.\n", + "## Triggering on images\n", "\n", - "![](page_11.png)" + "Let's look at the first page, which has a complex diagram:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The auto-mode parsed page has automatically converted the diagram into a [Mermaid](https://mermaid.js.org/) chart:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "# The Amazon Nova Family of Models:\n", + "# Technical Report and Model Card\n", + "\n", + "Amazon Artificial General Intelligence\n", + "\n", + "```mermaid\n", + "graph TD\n", + " A[Text] --> B[Nova Lite]\n", + " C[Image] --> B\n", + " D[Video] --> E[Nova Pro]\n", + " F[Code] --> E\n", + " G[Docs] --> E\n", + " B --> H[Text]\n", + " B --> I[Code]\n", + " E --> H\n", + " E --> I\n", + " J[Text] --> K[Nova Micro]\n", + " L[Code] --> K\n", + " K --> M[Text]\n", + " K --> N[Code]\n", + " O[Text] --> P[Nova Canvas]\n", + " Q[Image] --> P\n", + " P --> R[Image]\n", + " S[Text] --> T[Nova Reel]\n", + " U[Image] --> T\n", + " T --> V[Video]\n", + " \n", + " style B fill:#f9f,stroke:#333,stroke-width:2px\n", + " style E fill:#f9f,stroke:#333,stroke-width:2px\n", + " style K fill:#f9f,stroke:#333,stroke-width:2px\n", + " style P fill:#f9f,stroke:#333,stroke-width:2px\n", + " style T fill:#f9f,stroke:#333,stroke-width:2px\n", + " \n", + " classDef input fill:#lightblue,stroke:#333,stroke-width:1px;\n", + " class A,C,D,F,G,J,L,O,Q,S,U input;\n", + " \n", + " classDef output fill:#lightgreen,stroke:#333,stroke-width:1px;\n", + " class H,I,M,N,R,V output;\n", + "```\n", + "\n", + "Figure 1: The Amazon Nova family of models\n", + "\n", + "## Abstract\n", + "\n", + "We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. 
Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.\n" + ] + } + ], + "source": [ + "print(page_nodes[0].get_content())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This chart renders the diagram accurately:\n", + "\n", + "![Mermaid chart](./mermaid_render.png)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For comparison, standard mode does not address the chart at all:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "# The Amazon Nova Family of Models: Technical Report and Model Card\n", + "\n", + "# Amazon Artificial General Intelligence\n", + "\n", + "# Abstract\n", + "\n", + "We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. 
Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(base_page_nodes[0].get_content())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Triggering on tables\n",
+    "\n",
+    "**Page 11** contains a table, and we can see that auto-mode automatically switches to higher-quality parsing vs. the default parsed page.\n",
+    "\n",
+    ""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Auto mode has accurately captured Table 6, including its headings and subheadings:"
   ]
  },
@@ -237,7 +343,6 @@
    "| Nova Micro | Nova Lite | Nova Pro |\n",
    "|------------|-----------|----------|\n",
    "| Nova Micro performance chart | Nova Lite performance chart | Nova Pro performance chart |\n",
-    "| Context Length | Context Length | Context Length |\n",
    "\n",
    "Figure 2: Text Needle-in-a-Haystack recall performance for Nova Micro (up-to 128k), Nova Lite (up-to 300k) and Nova Pro (up-to 300k) models.\n",
    "\n",
@@ -269,6 +374,13 @@
    "print(page_nodes[10].get_content())"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "While in standard mode the subheadings end up merged into the first row:"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -282,37 +394,33 @@
    "\n",
    "| |Nova Micro|Nova Lite|Nova Pro|\n",
    "|---|---|---|---|\n",
-    "|10|10|10|100|\n",
+    "|10|10|10| |\n",
    "|20|20|20| |\n",
-    "|30|30|30|75|\n",
+    "|30|30|30| |\n",
+    "|2| | |75|\n",
    "|40|40|40| |\n",
-
"|50|50|50|50|\n", - "|60|60|60| |\n", - "|70|70|70|25|\n", - "|80|80|80| |\n", - "|90|90|90| |\n", + "|[ 50|50|50|50|\n", + "|1 60|60|60| |\n", + "|70|70|70| |\n", + "| |80|80|80|\n", + "| |90|90|90|\n", "|100|100|100| |\n", - "\n", - "Context Length\n", + "|3 3 8|3 3 4 %|3 8|3 3 8 8 3 8|\n", "\n", "Figure 2: Text Needle-in-a-Haystack recall performance for Nova Micro (up-to 128k), Nova Lite (up-to 300k) and Nova Pro (up-to 300k) models.\n", "\n", - "# SQuALITY\n", - "\n", - "# LVBench\n", - "\n", - "| |ROUGE-L|accuracy|\n", - "|---|---|---|\n", - "|Nova Pro|19.8 ±8.7|41.6 ±2.5|\n", - "|Nova Lite|19.2 ±8.6|40.4 ±2.4|\n", - "|Nova Micro|18.8 ±8.6|-|\n", - "|Claude 3.5 Sonnet (Jun)|13.4 ±7.5|-|\n", - "|Gemini 1.5 Pro (001)|-|33.1 ±2.3|\n", - "|Gemini 1.5 Pro (002)|19.1 ±8.6 M|-|\n", - "|Gemini 1.5 Flash (002)|18.1 ±8.4 M|-|\n", - "|GPT-4o|18.8 ±8.6|30.8 ±2.3|\n", - "|Llama 3 - 70B|16.4 ±8.1|-|\n", - "|Llama 3 - 8B|15.3 ±7.9|-|\n", + "| |SQuALITY|LVBench| |\n", + "|---|---|---|---|\n", + "|ROUGE-L|Nova Pro|19.8 ±8.7|41.6 ±2.5|\n", + "| |Nova Lite|19.2 ±8.6|40.4 ±2.4|\n", + "| |Nova Micro|18.8 ±8.6|-|\n", + "| |Claude 3.5 Sonnet (Jun)|13.4 ±7.5|-|\n", + "| |Gemini 1.5 Pro (001)|-|33.1 ±2.3|\n", + "| |Gemini 1.5 Pro (002)|19.1 ±8.6 M|-|\n", + "| |Gemini 1.5 Flash (002)|18.1 ±8.4 M|-|\n", + "| |GPT-4o|18.8 ±8.6|30.8 ±2.3|\n", + "| |Llama 3 - 70B|16.4 ±8.1|-|\n", + "| |Llama 3 - 8B|15.3 ±7.9|-|\n", "\n", "Table 6: Text and Multimodal long context performance on SQuALITY (ROUGE-L) and LVBench (Accuracy). For SQuALITY, measurements for Claude 3.5 Sonnet, GPT-4o, Llama 3 70B and Llama 3 8B are taken from the Llama 3 report [45]. Gemini results were measured by us2 (M). For LVBench, Gemini and GPT-4o numbers were taken from the corresponding benchmark leaderboard [77].\n", "\n", @@ -320,7 +428,9 @@ "\n", "# 2.4 Functional expertise\n", "\n", - "In addition to core capabilities, foundation models must perform well in particular specialties and domains. 
Across our many areas of performance analyses, we have selected four domains for which to present benchmarking results: Software engineering, financial analysis, and retrieval-augmented generation. Prompt templates for all benchmarks can be found in Appendix B.3.\n"
+    "In addition to core capabilities, foundation models must perform well in particular specialties and domains. Across our many areas of performance analyses, we have selected four domains for which to present benchmarking results: Software engineering, financial analysis, and retrieval-augmented generation. Prompt templates for all benchmarks can be found in Appendix B.3.\n",
+    "\n",
+    "11\n"
    ]
   }
  ],
   "source": [
    "print(base_page_nodes[10].get_content())"
   ]
  },
@@ -332,6 +442,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
+    "## Rendering charts\n",
+    "\n",
    "**Page 14** contains all charts. Auto-mode detects these charts and uses premium processing to convert these charts into both tabular and mermaid format. Whereas the markdown mode has a few more challenges in converting the chart to markdown.\n",
    "\n",
    "![](page_14.png)"
   ]
  },
@@ -427,6 +539,21 @@
    "print(page_nodes[13].get_content())"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Auto mode renders the three charts very neatly as a single Markdown table:\n",
+    "\n",
+    "| Model Family | Meta | Amazon | Google | Mistral AI | OpenAI | Anthropic |\n",
+    "|--------------|------|--------|--------|------------|--------|-----------|\n",
+    "| Time to First Token (sec) | 0.72 | 0.37 | 0.35 | 0.53 | 0.62 | 0.98 |\n",
+    "| Output Tokens per Second | 58 | 115 | 190 | 73 | 64 | 29 |\n",
+    "| Total Response Time (sec) | 2.9 | 1.4 | 0.9 | 2.4 | 2.7 | 4.0 |\n",
+    "\n",
+    "While standard mode does not do nearly as well:"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
@@ -480,9 +607,11 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
+    "## Text-only pages\n",
+    "\n",
    "**Page 3** is fully text, and we can see there's no difference between the auto-mode parsed page vs. 
the default markdown-mode parsed page. \n",
-    "![](page_3.png)"
+    "\n"
   ]
  },
@@ -565,6 +694,226 @@
    "print(base_page_nodes[2].get_content())"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Triggering on strings\n",
+    "\n",
+    "Instead of triggering on specific structures, we can specify particular strings of interest that will switch the parser to Premium mode. In this case, we'll re-parse the document but this time look for the word \"agents\" and switch to Premium mode for any page that contains that word."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Started parsing the file under job_id dae5b4bf-25c4-4373-98bf-91325d2de5e2\n"
+     ]
+    }
+   ],
+   "source": [
+    "file_path = \"data/nova_technical_report.pdf\"\n",
+    "agent_parser = LlamaParse(\n",
+    "    result_type=\"markdown\",\n",
+    "    auto_mode=True,\n",
+    "    # auto_mode_trigger_on_image_in_page=True,\n",
+    "    # auto_mode_trigger_on_table_in_page=True,\n",
+    "    auto_mode_trigger_on_text_in_page=\"agents\"\n",
+    "    # auto_mode_trigger_on_regexp_in_page=\"\"\n",
+    ")\n",
+    "agent_documents = agent_parser.load_data(file_path)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this example, these pages aren't going to be that different when parsed, but we can verify which pages triggered auto-mode by looking at the [JSON output](https://github.com/run-llama/llama_parse/blob/main/examples/demo_json_tour.ipynb) of LlamaParse:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "json_output = agent_parser.get_json_result(file_path)[0]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As you can see, the pages that contain the word \"agents\" are marked as `triggeredAutoMode=True`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+
"name": "stdout", + "output_type": "stream", + "text": [ + "Page 1: False\n", + "Page 2: False\n", + "Page 3: True\n", + "Page 4: False\n", + "Page 5: False\n", + "Page 6: False\n", + "Page 7: False\n", + "Page 8: True\n", + "Page 9: True\n", + "Page 10: False\n", + "Page 11: False\n", + "Page 12: False\n", + "Page 13: False\n", + "Page 14: False\n", + "Page 15: False\n", + "Page 16: False\n", + "Page 17: False\n", + "Page 18: False\n", + "Page 19: False\n", + "Page 20: False\n", + "Page 21: True\n", + "Page 22: False\n", + "Page 23: True\n", + "Page 24: False\n", + "Page 25: False\n", + "Page 26: False\n", + "Page 27: True\n", + "Page 28: False\n", + "Page 29: False\n", + "Page 30: False\n", + "Page 31: False\n", + "Page 32: False\n", + "Page 33: False\n", + "Page 34: False\n", + "Page 35: False\n", + "Page 36: False\n", + "Page 37: False\n", + "Page 38: False\n", + "Page 39: False\n", + "Page 40: False\n", + "Page 41: False\n", + "Page 42: False\n", + "Page 43: False\n" + ] + } + ], + "source": [ + "for page in json_output[\"pages\"]:\n", + " print(f\"Page {page['page']}: {page['triggeredAutoMode']}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Triggering on regular expressions\n", + "\n", + "Finally, if we have a more complicated pattern of interest, we can specify a regular expression to trigger on. 
In this case, we'll look for any page that contains the word \"agents\" or \"agentic\" in the text:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Started parsing the file under job_id 1a1749ca-5fb0-430e-a2cf-23a4f34f8b09\n" + ] + } + ], + "source": [ + "file_path = \"data/nova_technical_report.pdf\"\n", + "agentic_parser = LlamaParse(\n", + " result_type=\"markdown\",\n", + " auto_mode=True,\n", + " # auto_mode_trigger_on_image_in_page=True,\n", + " # auto_mode_trigger_on_table_in_page=True,\n", + " # auto_mode_trigger_on_text_in_page=\"agents\"\n", + " auto_mode_trigger_on_regexp_in_page=\"/(A|a)gent(s|ic)/g\",\n", + ")\n", + "agentic_json_output = agentic_parser.get_json_result(file_path)[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And if we once again examine the JSON output, you can see that a different set of pages have been upgraded:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Page 1: True\n", + "Page 2: True\n", + "Page 3: True\n", + "Page 4: False\n", + "Page 5: False\n", + "Page 6: False\n", + "Page 7: False\n", + "Page 8: True\n", + "Page 9: True\n", + "Page 10: True\n", + "Page 11: False\n", + "Page 12: False\n", + "Page 13: False\n", + "Page 14: False\n", + "Page 15: False\n", + "Page 16: False\n", + "Page 17: False\n", + "Page 18: False\n", + "Page 19: False\n", + "Page 20: False\n", + "Page 21: True\n", + "Page 22: False\n", + "Page 23: True\n", + "Page 24: False\n", + "Page 25: False\n", + "Page 26: False\n", + "Page 27: True\n", + "Page 28: False\n", + "Page 29: False\n", + "Page 30: False\n", + "Page 31: False\n", + "Page 32: False\n", + "Page 33: False\n", + "Page 34: False\n", + "Page 35: False\n", + "Page 36: False\n", + "Page 37: False\n", + "Page 38: False\n", + "Page 39: 
False\n", + "Page 40: False\n", + "Page 41: False\n", + "Page 42: False\n", + "Page 43: False\n" + ] + } + ], + "source": [ + "for page in agentic_json_output[\"pages\"]:\n", + " print(f\"Page {page['page']}: {page['triggeredAutoMode']}\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -636,9 +985,9 @@ ], "metadata": { "kernelspec": { - "display_name": "llama_parse", + "display_name": "Python 3", "language": "python", - "name": "llama_parse" + "name": "python3" }, "language_info": { "codemirror_mode": { diff --git a/examples/parsing_modes/mermaid_render.png b/examples/parsing_modes/mermaid_render.png new file mode 100644 index 0000000..6635ff8 Binary files /dev/null and b/examples/parsing_modes/mermaid_render.png differ diff --git a/examples/parsing_modes/page_1.png b/examples/parsing_modes/page_1.png new file mode 100644 index 0000000..efcae51 Binary files /dev/null and b/examples/parsing_modes/page_1.png differ
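Reviewer note on the `for page in json_output["pages"]` loops the notebook adds: the same inspection can be wrapped in a small helper that collects just the upgraded page numbers. This is a sketch against the JSON shape the notebook prints (a top-level `"pages"` list with `"page"` and `"triggeredAutoMode"` keys); the `sample` dict below is illustrative data, not real parser output, and `triggered_pages` is our own helper, not a LlamaParse API.

```python
def triggered_pages(json_output: dict) -> list[int]:
    """Return page numbers that auto mode upgraded to Premium parsing.

    Assumes the JSON result shape used in the notebook: a top-level
    "pages" list whose entries carry "page" and "triggeredAutoMode".
    """
    return [
        page["page"]
        for page in json_output.get("pages", [])
        if page.get("triggeredAutoMode")
    ]


# Illustrative sample only (not real parser output).
sample = {
    "pages": [
        {"page": 1, "triggeredAutoMode": False},
        {"page": 2, "triggeredAutoMode": False},
        {"page": 3, "triggeredAutoMode": True},
        {"page": 8, "triggeredAutoMode": True},
    ]
}
print(triggered_pages(sample))  # [3, 8]
```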
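The JS-style pattern the notebook passes to `auto_mode_trigger_on_regexp_in_page` (`"/(A|a)gent(s|ic)/g"`) can be approximated locally with Python's `re` to sanity-check which pages should upgrade. This is purely illustrative: the real matching happens server-side in LlamaParse, and `would_trigger` below is our own stand-in, not a library function.

```python
import re

# The notebook's JS-style pattern, with the /.../g wrapper stripped;
# the "g" flag is implicit in re.search.
PATTERN = re.compile(r"(A|a)gent(s|ic)")


def would_trigger(page_text: str) -> bool:
    """Local approximation of the server-side regexp trigger check."""
    return PATTERN.search(page_text) is not None


# "agentic" matches; a bare "agent" (no trailing s/ic) does not.
print(would_trigger("We report agentic performance."))  # True
print(would_trigger("A single agent was used."))        # False
```

Note the pattern requires a trailing `s` or `ic`, which is why the regexp run upgrades a different set of pages than the plain-text `"agents"` trigger.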