DeepDive

A command-line tool for in-depth research on a topic or question. It combines automated web searches with analysis by a large language model (LLM).

Overview

DeepDive helps users understand a subject in depth by automatically running multiple web searches, aggregating the results, and generating a knowledge graph. It is intended for researchers, students, and anyone who wants to explore a complex topic in a structured way.

Getting Started

Installation

To install DeepDive, clone this repository and build the executable with the following command:

go build bin/single-search/single-search.go 

Usage

Run DeepDive with the following command:

./single-search [options]

Options

  • --engine-url: Specify the LLM API endpoint (default: http://localhost:7999/v1/chat/completions)
  • --model: Choose the AI model to use (default: qwen2-72b-32k:latest)
  • --page-cache: Set the path to the web cache file (default: page-cache.db)
  • --token: Provide the token for the endpoint (optional)
  • --question: Specify the question or topic to research (default: OSINT analytics)
  • --output-graph-path: Set the path to the generated knowledge graph (default: tree.md)
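
The options above map naturally onto Go's standard flag package. The following is a minimal sketch under that assumption; the Options struct and the parseOptions helper are hypothetical names used for illustration and are not taken from DeepDive's source. Only the flag names and default values come from the list above.

```go
package main

import (
	"flag"
	"fmt"
)

// Options mirrors the command-line options documented above.
// The struct itself is hypothetical, not DeepDive's actual type.
type Options struct {
	EngineURL       string
	Model           string
	PageCache       string
	Token           string
	Question        string
	OutputGraphPath string
}

// parseOptions parses args (without the program name) and fills in
// the documented defaults for any option that is not given.
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("single-search", flag.ContinueOnError)
	fs.StringVar(&o.EngineURL, "engine-url", "http://localhost:7999/v1/chat/completions", "LLM API endpoint")
	fs.StringVar(&o.Model, "model", "qwen2-72b-32k:latest", "AI model to use")
	fs.StringVar(&o.PageCache, "page-cache", "page-cache.db", "path to the web cache file")
	fs.StringVar(&o.Token, "token", "", "token for the endpoint (optional)")
	fs.StringVar(&o.Question, "question", "OSINT analytics", "question or topic to research")
	fs.StringVar(&o.OutputGraphPath, "output-graph-path", "tree.md", "path to the generated knowledge graph")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := parseOptions([]string{
		"--question", "What are the best seaside beaches near Milan, Italy?",
		"--output-graph-path", "milan_beaches.md",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

The flag package treats one and two leading dashes as equivalent, so -question and --question would both be accepted in this sketch.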

Example Usage

./single-search --question "What are the best seaside beaches near Milan, Italy?" --output-graph-path milan_beaches.md

This command will generate a knowledge graph for the specified question and save it to milan_beaches.md.

Example Output

Here's an example of the output generated by DeepDive:

Got these questions in 6.656039333s
0000. Q: What is considered the most scenic coastal beach within a 2-hour drive from Milan?
0001. Q: Which lesser-known seaside gems near Milan offer pristine waters and peaceful surroundings for visitors seeking solitude?
0002. Q: How do expert travelers recommend accessing the Italian Riviera's finest beaches from central Milan by public transport?
...
Generated 30 search queries in 9.108865709s
0003. best Ligurian Riviera beaches for day trips from Milan
0001. hidden coastal treasures close to Milan for tranquility
0002. unexplored beach spots near Milan with serene atmosphere
0000. lesser-known seaside destinations near Milan with clear waters
        Search took: 1.225090709s, got 43 results.
        Search took: 1.242248292s, got 12 results.
        Search took: 1.380481584s, got 9 results.

Example Knowledge Graph

Here's an example of the knowledge graph generated by DeepDive:

graph LR
    node0[Alassio]
    node1[Bagni Beta Beach]
    node2[Baia dei Saraceni]
    node3[Camogli]
    node4[Cinque Terre]
    node5[Excelsior Palace]
    node6[Excelsior Palace hotel]
    node7[Finale Ligure]
    node8[Italian Riviera]
    node9[Italy]
    node10[Lake Como]
    node11[Liguria]
    node12[Ligurian Riviera]
    node13[Milan]
    node14[Portofino]
    node15[Portofino Bay]
    node16[Sestri Levante]
    node17[Spiaggia di San Fruttuoso]
    node18[Spiaggia di Varigotti]
    node19[Ticino]
    node20[Trebbia]
    node21[Varazze]
    node22[a tiny peninsula]
    node23[best seaside beaches]
    node24[country]
    node25[crystal clear green/blue waters]
    node26[just north of Cinque Terre]
    node27[luxury 5-star hotel]
    node28[mix of sand and rocks]
    node29[nice restaurant and bar]
    node30[one of the best beach towns near Milan]
    node31[palm-lined beaches]
    node32[part of Italian Riviera]
    node33[region]
    node34[respite]
    node35[sea beaches]
    node36[seaside beach]
    node37[seaside beaches]
    node38[serene atmosphere]
    node39[train]
    node12 -->|near| node7
    node12 -->|has beaches| node1
    node12 -->|is| node32
    node12 -->|contains| node18
    node12 -->|near| node13
    node12 -->|has beaches| node17
    node12 -->|part of| node9
    node12 -->|location of| node21
    node12 -->|location of| node3
    node12 -->|location of| node0
    node12 -->|has beaches| node2
    node18 -->|near| node9
    node18 -->|part of| node12
    node18 -->|part of| node8
    node18 -->|provides| node34
    node18 -->|is a| node36
    node18 -->|near| node13
    node7 -->|home to| node2
    node7 -->|part of| node12
    node7 -->|part of| node8
    node8 -->|related to| node7
    node8 -->|part of| node12
    node8 -->|includes| node2
    node8 -->|includes| node18
    node8 -->|near| node13
    node8 -->|near| node9
    node11 -->|close to| node13
    node11 -->|part of| node9
    node11 -->|known for| node35
    node11 -->|is| node33
    node11 -->|has| node3
    node14 -->|has| node5
    node14 -->|is| node36
    node14 -->|near| node13
    node14 -->|near| node9
    node14 -->|part of| node15
    node13 -->|in country| node9
    node13 -->|near| node12
    node13 -->|nearby water bodies| node20
    node13 -->|nearby beaches| node11
    node13 -->|near| node21
    node13 -->|near| node2
    node13 -->|nearby beaches| node18
    node13 -->|related to| node8
    node13 -->|near| node14
    node13 -->|near| node5
    node13 -->|near| node1
    node13 -->|nearby beaches| node16
    node13 -->|distance to| node4
    node13 -->|near bay| node15
    node13 -->|near| node17
    node13 -->|near| node0
    node13 -->|near hotel| node6
    node13 -->|nearby water bodies| node10
    node13 -->|nearby water bodies| node19
    node13 -->|near| node3
    node2 -->|near| node13
    node2 -->|near| node9
    node2 -->|is a| node36
    node2 -->|located in| node7
    node2 -->|part of| node12
    node2 -->|known for| node25
    node2 -->|known for| node28
    node2 -->|known for| node29
    node15 -->|located in| node14
    node15 -->|near| node13
    node15 -->|near| node9
    node15 -->|overlooked by| node5
    node15 -->|is| node36
    node3 -->|located in| node11
    node3 -->|near| node13
    node3 -->|type| node36
    node3 -->|requires travel by| node39
    node16 -->|involves| node22
    node16 -->|near| node13
    node16 -->|part of| node9
    node16 -->|location| node26
    node16 -->|offers| node38
    node16 -->|offers| node31
    node16 -->|considered| node30
    node4 -->|near| node13
    node4 -->|located just north of| node16
    node4 -->|part of| node9
    node9 -->|is home to| node23
    node9 -->|home to| node7
    node9 -->|includes| node6
    node9 -->|is| node24
    node9 -->|nearby locations| node16
    node9 -->|home to| node12
    node9 -->|home to| node8
    node9 -->|contains| node19
    node9 -->|contains| node20
    node9 -->|location for| node37
    node9 -->|contains| node4
    node9 -->|home to| node2
    node9 -->|home to| node18
    node9 -->|home to| node14
    node9 -->|has coastline in| node11
    node9 -->|includes| node13
    node9 -->|home to| node15
    node9 -->|home to| node5
    node9 -->|contains| node3
    node9 -->|contains| node10
    node5 -->|part of| node14
    node5 -->|near| node13
    node5 -->|described as| node27
    node5 -->|overlooks| node15
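
The graph above is Mermaid flowchart syntax: a list of numbered nodes followed by labelled edges. As an illustration of that format, here is a minimal sketch that renders subject-relation-object triples the same way. The Triple type and renderMermaid function are hypothetical and are not DeepDive's actual code; sorting the labels before numbering them is an assumption based on the ordering visible in the sample.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Triple is one subject-relation-object edge (hypothetical type).
type Triple struct {
	Subject, Relation, Object string
}

// renderMermaid emits a Mermaid "graph LR" block: one numbered node
// per distinct label, in sorted order, then one labelled edge per triple.
func renderMermaid(triples []Triple) string {
	// Collect each distinct label once.
	seen := map[string]bool{}
	var labels []string
	for _, t := range triples {
		for _, l := range []string{t.Subject, t.Object} {
			if !seen[l] {
				seen[l] = true
				labels = append(labels, l)
			}
		}
	}
	sort.Strings(labels) // byte order: uppercase sorts before lowercase

	ids := make(map[string]string, len(labels))
	var b strings.Builder
	b.WriteString("graph LR\n")
	for i, l := range labels {
		ids[l] = fmt.Sprintf("node%d", i)
		fmt.Fprintf(&b, "    %s[%s]\n", ids[l], l)
	}
	// Edges keep the order in which the triples were supplied.
	for _, t := range triples {
		fmt.Fprintf(&b, "    %s -->|%s| %s\n", ids[t.Subject], t.Relation, ids[t.Object])
	}
	return b.String()
}

func main() {
	fmt.Print(renderMermaid([]Triple{
		{"Milan", "in country", "Italy"},
		{"Liguria", "close to", "Milan"},
		{"Liguria", "is", "region"},
	}))
}
```

This sketch does not escape labels, so a label containing brackets or pipe characters would need quoting before it could be rendered by Mermaid.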

How it Works

DeepDive uses a combination of web searches and AI-driven insights to generate a comprehensive knowledge graph. The tool:

  1. Runs multiple web searches derived from the specified question or topic through SearXNG, a locally run metasearch engine that aggregates results from multiple search engines.
  2. Aggregates the search results and extracts relevant information.
  3. Uses the chosen AI model to analyze the extracted information and generate insights.
  4. Constructs a knowledge graph representing the relationships between the extracted concepts.
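
To make the first two steps more concrete, here is a minimal sketch of querying SearXNG through its JSON search API and decoding the results. It is an illustration only: the instance address http://localhost:8080, the SearchResult type, and the helper names are assumptions and are not taken from DeepDive's source. The sketch builds the request URL and parses a sample payload without touching the network.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// SearchResult holds the result fields used in this sketch.
type SearchResult struct {
	Title   string `json:"title"`
	URL     string `json:"url"`
	Content string `json:"content"`
}

// buildSearchURL returns the search URL for query on the SearXNG
// instance at base, asking for JSON output.
func buildSearchURL(base, query string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/search"
	q := url.Values{}
	q.Set("q", query)
	q.Set("format", "json")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// parseResults decodes the "results" array of a JSON search response.
func parseResults(body []byte) ([]SearchResult, error) {
	var resp struct {
		Results []SearchResult `json:"results"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return resp.Results, nil
}

func main() {
	// The instance address is an assumption; adjust it to your setup.
	u, err := buildSearchURL("http://localhost:8080", "best Ligurian Riviera beaches for day trips from Milan")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)

	// A hand-written sample payload, not real search output.
	sample := []byte(`{"results":[{"title":"Example","url":"https://example.com","content":"snippet"}]}`)
	results, err := parseResults(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range results {
		fmt.Println(r.Title, r.URL)
	}
}
```

Note that a SearXNG instance only answers format=json requests if the JSON format is enabled in its settings.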

Prompt Compiler

DeepDive includes a prompt compiler, mk-prompt, which is used to generate prompts for the AI model. The prompt compiler takes a task description and generates a prompt template in XML format.

mk-prompt Options

  • --engine-url: Specify the LLM API endpoint (default: http://localhost:7999/v1/chat/completions)
  • --model: Choose the AI model to use (default: qwen2:72b-instruct-q6_K)
  • --token: Provide the token for the endpoint (optional)
  • --task-description: Specify the task description (default: Bot will be provided with two variables - Context and Name. The context is a list of names of different persons in various forms and variations. The name is a name of a person. The goal is to provide a JSON list of all forms and variations of the person's name mentioned in context.)
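
The --engine-url, --model, and --token options suggest an OpenAI-style chat completions endpoint. The sketch below shows how such a request could be assembled. It rests on two assumptions that this README does not confirm: that the request body uses the common model/messages JSON shape, and that the token is sent as a Bearer Authorization header. The type and function names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest follow the common chat completions
// request shape (an assumption about the endpoint).
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds, but does not send, a POST request carrying
// prompt as a single user message. The Authorization header is only
// set when token is non-empty, since the token is optional.
func newChatRequest(engineURL, model, token, prompt string) (*http.Request, error) {
	payload, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, engineURL, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		// Bearer scheme is an assumption, not confirmed by the README.
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newChatRequest(
		"http://localhost:7999/v1/chat/completions",
		"qwen2:72b-instruct-q6_K",
		"", // no token
		"Task description goes here",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```

Sending the request would then be a matter of passing it to an http.Client and decoding the response body.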

Example Prompt Output

Here's an example of the prompt output generated by mk-prompt:

<!-- Begin Prompt Template -->
<instructions>
To complete this task effectively, follow these steps:
1. Begin by analyzing the `Context`, which is a list of names in various forms.
2. Identify all occurrences of variations or exact matches of `Name` within the context.
3. Compile a comprehensive JSON representation of all found forms and variations of the name.
4. Ensure that your output does not contain any XML tags, maintaining clean JSON format.
5. To avoid repetition, only include each unique variation once in the final list.

</instructions>
<!-- Define Examples to Enhance Understanding -->
<examples>
    <!-- Example 1 -->
    <example>
        Context: ["John Doe", "Doe John", "doe john", "Jane Doe", "[email protected]", "JohN"]
        Name: "John Doe"
        
        Resulting JSON list should contain: ['John Doe', 'Doe John', 'doe john', 'JohN']
    </example>
    
    <!-- Example 2 -->
    <example>
        Context: ["William Shakespeare", "shakespeare", "Shakespear"]
        Name: "William Shakespeare"
        
        Resulting JSON list should contain: ['William Shakespeare', 'shakespeare']
    </example>
    
    <!-- Example 3 -->
    <example>
        Context: ["Madonna Louise Ciccone", "MDNNA", "MADONNA", "madonna"]
        Name: "Madonna"
        
        Resulting JSON list should contain: ['Madonna Louise Ciccone', 'MDNNA', 'MADONNA', 'madonna']
    </example>
</examples>

<!-- End of Prompt Template -->

Acknowledgments

We would like to thank the developers of Dify for their work on prompt engineering, which inspired our prompt compiler. Dify is released under the Apache 2.0 license.

Contributing

We welcome contributions to DeepDive! If you'd like to report a bug, suggest a feature, or submit a pull request, please see our CONTRIBUTING.md file for guidelines.

License

DeepDive is released under the MIT License.