Click Here: If you are looking for the legacy PrivateGPT walkthrough
If you are running on CPU only, use one of the default ollama models (e.g. 'ollama pull mistral:7b-instruct-v0.2-q4_0'). Each query can take a few minutes.
I'm running this project at home using an RTX 3060 12GB. Each query typically takes 30 seconds to answer.
I'm using Calibre and VS Code for text editing.
- If using Linux, be sure to install Calibre from the official source.
The neat thing about ollama is that it's easy to set parameters on a per-model basis.
As mentioned above, if you are running on a CPU or an underpowered GPU, you can try Ollama's preloaded models. Be sure to check the Tags page to find the precise model variant that is optimal for your system.
ollama pull mistral:7b-instruct-v0.2-q4_0
ollama run mistral:7b-instruct-v0.2-q4_0 "How do magnets work?"
That simple example shows how easy it is to get started with ollama.
Be sure to check the modelfile to see what parameters are loaded by default. These are set by humans, and may not perfectly reflect your preferred parameter set.
ollama show mistral:7b-instruct-v0.2-q4_0 --modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM mistral:7b-instruct-v0.2-q4_0
FROM /usr/share/ollama/.ollama/models/blobs/sha256:e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730
TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
This command shows where the ollama parameterized version of the model is stored, and its default parameters.
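If you only care about the parameters, the Modelfile text can be filtered down to its PARAMETER lines. This is a minimal sketch; `modelfile_params` is a name I made up for illustration, and it only reads Modelfile text from stdin.

```bash
# modelfile_params: hypothetical helper, not part of ollama.
# Reads Modelfile text on stdin and prints each PARAMETER line
# as "name value", dropping everything else.
modelfile_params() {
  awk '$1 == "PARAMETER" { $1 = ""; sub(/^ /, ""); print }'
}

# Usage (needs ollama installed and the model pulled):
#   ollama show mistral:7b-instruct-v0.2-q4_0 --modelfile | modelfile_params
```

An empty result means the model ships with no explicit parameters, so ollama's built-in defaults apply.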
Besides the fact that the default parameters may not fit my ideals, the tags only go up to q6, whereas I prefer q8. You may also want to try a model not included in ollama's repertoire. As long as it's supported in llama.cpp, and you have the latest version of ollama, you should be able to use it with a custom Modelfile.
To do so, I make a directory containing the gguf I want added to ollama, and a file called Modelfile.
FROM ./mistral-7b-instruct-v0.2.Q8_0.gguf
TEMPLATE """
{{ if .First }}<s>{{ if .System }}[INST]{{ .System }}[/INST]{{ end }}</s>{{ end }}[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """"""
PARAMETER num_ctx 8000
PARAMETER num_predict 4000
PARAMETER num_gpu -1
I don't use a system message with Mistral 0.2, because it works better for summaries without one. You can add a system prompt there if you want a custom persona, or to include few-shot prompting to teach the LLM how to work (I skip that because I am filling up the context with the text to summarize).
Moreover, Mistral expects to see those <s></s> tags at the beginning of a session, so even if I don't use the system prompt, I still include those tags.
ollama create mistralq8 -f Modelfile
ollama run mistralq8 "How is the weather in San Francisco?"
There you can see how to load the modelfile into ollama, and how to call it.
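If you try several GGUF files, writing the Modelfile by hand gets repetitive. Here is a minimal sketch that writes the same Modelfile shown above next to a given GGUF. `write_modelfile` and its optional context-size argument are my own invention for illustration; the template and parameters are copied from the Modelfile above.

```bash
# write_modelfile: hypothetical helper that writes a Modelfile next to a GGUF.
# $1 = path to the .gguf file, $2 = num_ctx (defaults to 8000 as above).
write_modelfile() {
  local gguf="$1" num_ctx="${2:-8000}"
  local dir
  dir=$(dirname -- "$gguf")
  {
    # FROM is relative, so run "ollama create" from the same directory.
    printf 'FROM ./%s\n' "$(basename -- "$gguf")"
    # Quoted heredoc: the Go template braces are written out untouched.
    cat <<'EOF'
TEMPLATE """
{{ if .First }}<s>{{ if .System }}[INST]{{ .System }}[/INST]{{ end }}</s>{{ end }}[INST] {{ .Prompt }} [/INST]
"""
SYSTEM """"""
EOF
    printf 'PARAMETER num_ctx %s\n' "$num_ctx"
    printf 'PARAMETER num_predict 4000\n'
    printf 'PARAMETER num_gpu -1\n'
  } > "$dir/Modelfile"
}

# Usage (the models/ directory is just an example path):
#   write_modelfile models/mistral-7b-instruct-v0.2.Q8_0.gguf
#   (cd models && ollama create mistralq8 -f Modelfile)
```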
- Modelfiles - Ollama Github
- Go Templating Syntax
This is my current prompt for bulleted notes summaries:
Write bulleted notes summarizing the following text, with headings and terms in bold. \n\nTEXT: {content}
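The {content} placeholder is where each chunk goes. A minimal sketch of that substitution in bash follows; `fill_prompt` is a hypothetical helper name, and it inserts the text verbatim, leaving the literal \n\n in the prompt as written.

```bash
# fill_prompt: hypothetical helper that replaces the first {content}
# placeholder in a prompt template with the chunk text.
fill_prompt() {
  local template="$1" text="$2"
  case "$template" in
    *'{content}'*)
      # Split around the placeholder so the text is inserted verbatim,
      # whatever characters it contains.
      printf '%s%s%s\n' "${template%%\{content\}*}" "$text" "${template#*\{content\}}"
      ;;
    *)
      # No placeholder found: append the text after the prompt instead.
      printf '%s %s\n' "$template" "$text"
      ;;
  esac
}

template='Write bulleted notes summarizing the following text, with headings and terms in bold. \n\nTEXT: {content}'
fill_prompt "$template" "Sample chunk of text."
```

The filled prompt can then be passed as the single quoted argument to `ollama run`.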
Here is an example command for summarizing a single chunk of text:
ollama run mistralq8 "Write bulleted notes summarizing the following text, with headings terms and key concepts in bold. \n\nTEXT: FOREWORD As Chairman of the Winnicott Clinic of Psychotherapy my task this evening is very simple: to welcome you to the second Donald Winnicott Memorial Lecture, tell you a little about Winnicott, and hand you over to Brett Kahr, who will intro-duce the speakers. When we booked this particular venue we did not anticipate a problem with numbers—but we had one; there were 100 more applications than could be accommodated; that alone is surely testimony to the nature of tonight's topic and those who are participating. It's marvellous to see such a wide range in age and interest represented in the audience. Thank you for being here tonight. We are a very small charitable trust and have, in the last two or three years, focused our attention primarily on the work and ideas of Donald Winnicott. In furtherance of that objective we have two main activities: a Senior Research Fellowship—the first of which was awarded to Brett Kahr, who is going to produce the definitive biography of Donald Winnicott in, we hope, 2004–2005; the second is this Memorial Lecture. We held our first such event almost exactly a year ago; it was a remarkable evening. Giving the Lecture was Dr Joyce McDougall, who spoke on the theme: \"Donald Bowlby & King/1st proofs 5/2/04 11:21 am Winnicott the Man: Reflections and Recollections\", which has now been published on our behalf by Karnac Books, to whom we are most grateful. We hope this is but the first in a long series of published Winnicott Memorial Lectures, and that tonight's deliberations will be the second such publication."
Which dutifully produces the following output:
- Introduction
- Second Donald Winnicott Memorial Lecture
- Chairman of Winnicott Clinic welcomes audience
- About Winnicott
- Focus of charitable trust on Winnicott's work and ideas
- Two main activities: Senior Research Fellowship, Memorial Lecture
- Previous Year's Event
- First Memorial Lecture held last year
- Speaker: Dr Joyce McDougall
- Topic: "Donald Winnicott the Man"
- Publication by Karnac Books
- Future Goals
- Hope for long series of published Memorial Lectures
- Upcoming speaker introduction by Brett Kahr.
If you need to chapterize a PDF, I have a script, split.sh, that works in conjunction with split.awk to pull out the chapter headings and split the PDF accordingly. It is easy to modify to your needs.
bash split.sh input.pdf
ebook-convert file.epub file.txt --enable-heuristics
The above command performs much better on the epub format than on pdf, producing clean output that preserves formatting and does not add tons of line breaks.
Check the options here: Calibre Docs: Heuristic Processing.
Although I prefer command-line tools, I find that Okular export to text provides the best PDF to Text transformation.
To divide the text into chunks, I select the desired amount of text in VS Code and use its "Join Lines" function, which I've mapped to a convenient key combo.
Today, I prefer 2000-9000 characters (500-2250 tokens) per chunk when processing whole books. I lose quality using larger context.
According to Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (2024-02-19; Mosh Levy, Alon Jacoby, Yoav Goldberg), these models' reasoning capacity drops off pretty sharply from 250 to 1000 tokens, and begins flattening out from 2000-3000 tokens.
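I do the chunking by hand, but the size budget can also be sketched in a script. This is a rough illustration, not my actual workflow: `chunk_paragraphs` and `estimate_tokens` are hypothetical helpers, the input is assumed to have one paragraph per line, and the 4 characters per token ratio is just the one implied by the numbers above.

```bash
# chunk_paragraphs: hypothetical helper. Reads one paragraph per line on stdin
# and greedily joins paragraphs into chunks of at most $1 characters
# (default 9000), printing one chunk per output line. A single paragraph
# longer than the limit is printed as its own oversized chunk.
# Note: some awk builds count bytes rather than characters in length().
chunk_paragraphs() {
  awk -v max="${1:-9000}" '
    NF == 0 { next }    # skip blank lines
    {
      if (chunk != "" && length(chunk) + 1 + length($0) > max) {
        print chunk     # adding this paragraph would exceed the limit
        chunk = ""
      }
      chunk = (chunk == "" ? $0 : chunk " " $0)
    }
    END { if (chunk != "") print chunk }
  '
}

# estimate_tokens: rough token count for a string at about 4 characters
# per token (an approximation, not a real tokenizer).
estimate_tokens() {
  local text="$1"
  echo $(( ${#text} / 4 ))
}

# Usage: chunk_paragraphs 9000 < book.txt > chunks.txt
```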
One thing you want to be sure to check for is control characters. Scan your document for them (regex: [\x00-\x1F\x7F-\x9F]) because they can invalidate your JSON, and can be invisible.
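A minimal sketch of that check using `tr`, which is available everywhere. The helper names are hypothetical. It works on raw bytes, so it only covers \x00-\x1F and \x7F; the \x80-\x9F part of the regex is skipped on purpose, because at the byte level those values are also legitimate pieces of UTF-8 characters. Newlines are left alone since they separate the chunks.

```bash
# count_controls: counts control bytes (including tabs, excluding newlines)
# in the text on stdin.
count_controls() {
  # -c complements the set and -d deletes, so only control bytes survive.
  LC_ALL=C tr -cd '\000-\011\013-\037\177' | wc -c
}

# strip_controls: turns tabs into spaces, then deletes the remaining
# control bytes (carriage returns included), keeping newlines.
strip_controls() {
  LC_ALL=C tr '\t' ' ' | LC_ALL=C tr -d '\000-\010\013-\037\177'
}

# Usage:
#   count_controls < summarize.txt
#   strip_controls < summarize.txt > summarize.clean.txt
```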
Importantly, escape any double quotes (\") and surround each line with double quotes, particularly if you are using the accompanying script.
I have just discovered that an exclamation mark will screw up prediction, and not just for the current try but until you restart the ollama server. I will be adding a line in my script to remove ! before feeding text into ollama.
When you are ready you should have a text file with each chunk for summary on its own line surrounded by double-quotes.
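The preparation steps above can be sketched as one `sed` pass. This is an illustration under my own assumptions, not part of the script below: `prepare_chunks` is a hypothetical name, the input is assumed to already have one chunk per line, and the exclamation mark removal is the workaround mentioned above.

```bash
# prepare_chunks: hypothetical helper. Reads one chunk per line on stdin,
# drops blank lines, removes exclamation marks, escapes double quotes,
# and wraps each remaining line in double quotes.
prepare_chunks() {
  sed -e '/^[[:space:]]*$/d' \
      -e 's/!//g' \
      -e 's/"/\\"/g' \
      -e 's/^/"/' \
      -e 's/$/"/'
}

# Usage: prepare_chunks < chunks.txt > summarize.txt
```

For example, the line `He said "hi"!` comes out as `"He said \"hi\""`.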
Here is the script I use with the following syntax:
./sum.sh mistralq8 summarize.txt
You can change the prompt inside or modify to your liking.
Notable features:
- It makes a markdown file and a csv including input, output, and time spent, for each summary. This makes analysis much easier.
- I add a plus sign at the beginning of a text selection to mark a heading, which is added to the output as an H3 markdown heading. Otherwise the heading just includes the first 150 characters of text, which is your business how to deal with in post-processing.
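The heading rule comes down to two bash parameter expansions. Here they are isolated in a small function so you can see what a given chunk will produce; `make_heading` is a name used only for this illustration. As the script reads it, whatever comes before the first plus sign (within the first 150 characters) becomes the heading.

```bash
# make_heading: mirrors the two expansions the script uses for headings.
make_heading() {
  local text="$1"
  local heading="${text:0:150}"   # keep only the first 150 characters
  heading="${heading%%+*}"        # drop everything from the first plus sign on
  printf '### %s\n' "$heading"
}

make_heading "Foreword+ As Chairman of the Winnicott Clinic of Psychotherapy my task this evening is very simple"
# prints "### Foreword"
```

Without a plus sign in the first 150 characters, the heading is simply those 150 characters.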
#!/bin/bash
# List of models I'm using: mistralq8, snorkel, openhermis, openhermnc31, openhermnc33s, starling
# List of prompts I switch between:
prompt="Write comprehensive bulleted notes summarizing the following text, with headings, terms, and key concepts in bold. \n\nTEXT: "
#prompt="You are the most sophisticated, semantic routing, large language model known to man. Write comprehensive bulleted notes on the following text, breaking it into hierarchical categories, with headings terms and key concepts in bold.\n\nTEXT: "
#prompt="Make a bulleted list, extract these elements of guided imagery (Characters and their features, Scene, Theme, Components and Intention.) from the following text.\n\nTEXT: "
#prompt="Summarize the following text."
# Check that a model name and an input file are provided
if [ $# -lt 2 ]; then
echo "Usage: $0 model input_file"
exit 1
fi
# Input file
input_file="$2"
# Extract filename without extension
filename=$(basename -- "$input_file")
filename_no_ext="${filename%.*}"
# Markdown file
markdown_file="${filename_no_ext}.md"
echo "# $filename_no_ext" > "$markdown_file"
echo "" >> "$markdown_file"
echo "$prompt" >> "$markdown_file"
echo "" >> "$markdown_file"
echo "## $1" >> "$markdown_file"
echo "" >> "$markdown_file"
# CSV file
csv_file="${filename_no_ext}.csv"
# Start the CSV with a fresh header row (overwrites any previous file)
echo "Input,$1,Time" > "$csv_file"
# Loop through each line in the input file
while read -u 9 line; do
# Remove Surrounding Quotes
trimmed="${line:1}"
trimmed="${trimmed%?}"
clean=$(sed -r 's/"/\"/g; s/\|/I/g' <<< "$trimmed")
# Record the start time
start_time=$(date +%s.%N)
# Run the command for each line
output=$(ollama run "$1" "$prompt $clean")
# Record the end time
end_time=$(date +%s.%N)
# Calculate the processing time
elapsed_time=$(echo "$end_time - $start_time" | bc)
# Trim by keeping only the first 150 characters
heading="${trimmed:0:150}"
# Trim by removing any characters after the first plus sign
heading="${heading%%+*}"
heading="### $heading"
# Append the output to the markdown file
echo "$heading" >> "$markdown_file"
echo "" >> "$markdown_file"
echo "$output" >> "$markdown_file"
echo "" >> "$markdown_file"
# Format Input + Output for CSV Format
cout=$(echo "$output" | sed ':a;N;$!ba;s/\n/\\n/g')
cout=$(echo "$cout" | sed 's/\([^"]\)"\([^"]\|$\)/\1""\2/g')
cout=$(echo "$cout" | sed 's/\([^"]\)"\([^"]\|$\)/\1""\2/g')
trimmed=$(echo "$trimmed" | sed 's/\([^"]\)"\([^"]\|$\)/\1""\2/g')
trimmed=$(echo "$trimmed" | sed 's/\([^"]\)"\([^"]\|$\)/\1""\2/g')
# Append input, output, and time to the CSV file
echo "\"$trimmed\",\"$cout\",\"${elapsed_time%.*}\"" >> "$csv_file"
done 9< "$input_file"
echo "Processing completed. Output saved to $markdown_file and $csv_file."
Check out the Ollama.ai website, GitHub repository, and docs for more information.