explosion · svlandeg · Jan 5, 2024 · Dec 27, 2023 · Dec 27, 2023 · Dec 29, 2023
diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
@@ -9,8 +9,8 @@ menu:
   - ['Various Functions', 'various-functions']
 ---
 
-[The spacy-llm package](https://github.com/explosion/spacy-llm) integrates Large
-Language Models (LLMs) into spaCy, featuring a modular system for **fast
+[The `spacy-llm` package](https://github.com/explosion/spacy-llm) integrates
+Large Language Models (LLMs) into spaCy, featuring a modular system for **fast
 prototyping** and **prompting**, and turning unstructured responses into
 **robust outputs** for various NLP tasks, **no training data** required.
 
@@ -202,13 +202,82 @@ not require labels.
 
 ## Tasks {id="tasks"}
 
-### Task implementation {id="task-implementation"}
+In `spacy-llm`, a _task_ defines an NLP problem or question and its solution
+using an LLM. It does so by implementing the following responsibilities:
 
-A _task_ defines an NLP problem or question, that will be sent to the LLM via a
-prompt. Further, the task defines how to parse the LLM's responses back into
-structured information. All tasks are registered in the `llm_tasks` registry.
+1. Loading a prompt template and injecting documents' data into the prompt.
+   Optionally, include few-shot examples in the prompt.
+2. Splitting the prompt into several pieces following a map-reduce paradigm,
+   _if_ the prompt is too long to fit into the model's context and the task
+   supports sharding prompts.
+3. Parsing the LLM's responses back into structured information and validating
+   the parsed output.
 
-#### task.generate_prompts {id="task-generate-prompts"}
+Two different task interfaces are supported: `ShardingLLMTask` and
+`NonShardingLLMTask`. Only the former supports the sharding of documents, i.e.
+splitting up prompts if they are too long.
+
+All tasks are registered in the `llm_tasks` registry.
+
+### On Sharding {id="task-sharding"}
+
+"Sharding" describes, generally speaking, the process of distributing parts of a
+dataset across multiple storage units for easier processing and lookups. In
+`spacy-llm` we use this term (synonymously: "mapping") to describe the splitting
+up of prompts if they are too long for a model to handle, and "fusing"
+(synonymously: "reducing") to describe how the model responses for several shards
+are merged back together into a single document.
+
+Prompts are broken up in a manner that _always_ keeps the prompt in the template
+intact, meaning that the instructions to the LLM will always stay complete. The
+document content, however, will be split if the length of the fully rendered
+prompt exceeds the model's context length.
+
+A toy example: let's assume a model has a context window of 25 tokens and the
+prompt template for our fictional, sharding-supporting task looks like this:
+
+```
+Estimate the sentiment of this text:
+"{text}"
+Estimated sentiment:
+```
+
+Depending on how tokens are counted exactly (this is a config setting), we might
+count `n = 12` tokens in the prompt instructions. Furthermore, let's assume that
+our `text` is "This has been amazing - I can't remember the last time I left the
+cinema so impressed.", which has roughly 19 tokens.
+
+Since we can only add 13 tokens (25 - 12) to our prompt before we hit the
+context limit, the 19-token text doesn't fit into a single prompt. Thus
+`spacy-llm`, assuming the task used supports sharding, will split the prompt
+into two (the default splitting strategy splits by tokens, but alternative
+splitting strategies, e.g. by sentences, can be configured):
+
+_(Prompt 1/2)_
+
+```
+Estimate the sentiment of this text:
+"This has been amazing - I can't remember "
+Estimated sentiment:
+```
+
+_(Prompt 2/2)_
+
+```
+Estimate the sentiment of this text:
+"the last time I left the cinema so impressed."
+Estimated sentiment:
+```
+
+The reduction step is task-specific - a sentiment estimation task might, e.g.,
+compute a weighted average of the sentiment scores. Note that prompt sharding
+introduces potential inaccuracies, as the LLM won't have access to the entire
+document at once. Depending on your use case this might or might not be
+problematic.
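The map and reduce steps described above can be sketched in a few lines of plain Python. This is only an illustration, not how `spacy-llm` implements sharding: the names `count_tokens`, `shard_prompts` and `fuse_scores` are hypothetical, and whitespace-separated words stand in for model tokens, so the counts differ from the 12 and 19 tokens assumed in the toy example.

```python
from typing import List

TEMPLATE = 'Estimate the sentiment of this text:\n"{text}"\nEstimated sentiment:'


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def shard_prompts(text: str, context_length: int) -> List[str]:
    """Render one prompt per shard so that every prompt fits the context."""
    # Tokens taken up by the instructions alone. The quotes attach to the
    # neighbouring words, so render with a one-word text and subtract it.
    overhead = count_tokens(TEMPLATE.format(text="x")) - 1
    budget = context_length - overhead
    if budget <= 0:
        raise ValueError("The template alone exceeds the context length.")
    words = text.split()
    # Map step: split only the document content, never the instructions.
    shards = [words[i : i + budget] for i in range(0, len(words), budget)]
    return [TEMPLATE.format(text=" ".join(shard)) for shard in shards]


def fuse_scores(scores: List[float], shard_lengths: List[int]) -> float:
    """Reduce step: length-weighted average of per-shard sentiment scores."""
    total = sum(shard_lengths)
    return sum(s * n for s, n in zip(scores, shard_lengths)) / total
```

With this word-based counting the instructions take up 8 tokens, so calling `shard_prompts` on the example text with `context_length=20` produces two prompts that both repeat the full instructions.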
+
+### `NonShardingLLMTask` {id="task-nonsharding"}
+
+#### task.generate_prompts {id="task-nonsharding-generate-prompts"}
 
 Takes a collection of documents, and returns a collection of "prompts", which
 can be of type `Any`. Often, prompts are of type `str` - but this is not
@@ -219,7 +288,7 @@ enforced to allow for maximum flexibility in the framework.
 | `docs`      | The input documents. ~~Iterable[Doc]~~   |
 | **RETURNS** | The generated prompts. ~~Iterable[Any]~~ |
 
-#### task.parse_responses {id="task-parse-responses"}
+#### task.parse_responses {id="task-nonsharding-parse-responses"}
 
 Takes a collection of LLM responses and the original documents, parses the
 responses into structured information, and sets the annotations on the
@@ -230,19 +299,44 @@ defined fields.
 The `responses` are of type `Iterable[Any]`, though they will often be `str`
 objects. This depends on the return type of the [model](#models).
 
-| Argument    | Description                                |
-| ----------- | ------------------------------------------ |
-| `docs`      | The input documents. ~~Iterable[Doc]~~     |
-| `responses` | The generated prompts. ~~Iterable[Any]~~   |
-| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~ |
+| Argument    | Description                                            |
+| ----------- | ------------------------------------------------------ |
+| `docs`      | The input documents. ~~Iterable[Doc]~~                 |
+| `responses` | The responses received from the LLM. ~~Iterable[Any]~~ |
+| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~             |
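A minimal sketch of the two methods a non-sharding task provides, assuming a yes/no sentiment prompt. `ToyDoc` is a hypothetical stand-in for spaCy's `Doc` so that the example runs without spaCy installed, and `ToySentimentTask` is not one of the built-in tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class ToyDoc:
    # Hypothetical stand-in for spacy.tokens.Doc (only `text` and `cats`).
    text: str
    cats: Dict[str, float] = field(default_factory=dict)


class ToySentimentTask:
    """Illustrative non-sharding task: one prompt and one response per doc."""

    def generate_prompts(self, docs: Iterable[ToyDoc]) -> Iterable[str]:
        for doc in docs:
            yield f'Is this text positive? Answer "yes" or "no".\n"{doc.text}"'

    def parse_responses(
        self, docs: Iterable[ToyDoc], responses: Iterable[str]
    ) -> Iterable[ToyDoc]:
        # Responses arrive in the same order as the docs they belong to.
        for doc, response in zip(docs, responses):
            is_positive = response.strip().lower().startswith("yes")
            doc.cats["POSITIVE"] = 1.0 if is_positive else 0.0
            yield doc
```

A real task would additionally be registered in the `llm_tasks` registry and operate on `Doc` objects.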
 
-### Raw prompting {id="raw"}
+### `ShardingLLMTask` {id="task-sharding"}
 
-Different to all other tasks `spacy.Raw.vX` doesn't provide a specific prompt,
-wrapping doc data, to the model. Instead it instructs the model to reply to the
-doc content. This is handy for use cases like question answering (where each doc
-contains one question) or if you want to include customized prompts for each
-doc.
+#### task.generate_prompts {id="task-sharding-generate-prompts"}
+
+Takes a collection of documents, breaks them up into shards if necessary to fit
+all content into the model's context, and returns a collection of collections of
+"prompts" (i.e. each doc can have multiple shards, each of which has exactly one
+prompt), which can be of type `Any`. Often, prompts are of type `str` - but this
+is not enforced to allow for maximum flexibility in the framework.
+
+| Argument    | Description                                        |
+| ----------- | -------------------------------------------------- |
+| `docs`      | The input documents. ~~Iterable[Doc]~~             |
+| **RETURNS** | The generated prompts. ~~Iterable[Iterable[Any]]~~ |
+
+#### task.parse_responses {id="task-sharding-parse-responses"}
+
+Receives a collection of collections of LLM responses (i.e. each doc can have
+multiple shards, each of which has exactly one prompt / prompt response) and
+the original shards, parses the responses into structured information, sets the
+annotations on the shards, and merges the doc shards back into single docs. The
+`parse_responses` function is free to set the annotations in any way, including
+`Doc` fields like `ents`, `spans` or `cats`, or using custom defined fields.
+
+The `responses` are of type `Iterable[Iterable[Any]]`, though the individual
+responses will often be `str` objects. This depends on the return type of the
+[model](#models).
+
+| Argument    | Description                                                      |
+| ----------- | ---------------------------------------------------------------- |
+| `shards`    | The input document shards. ~~Iterable[Iterable[Doc]]~~           |
+| `responses` | The responses received from the LLM. ~~Iterable[Iterable[Any]]~~ |
+| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~                       |
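The nested shapes can be illustrated with a toy sharding task. This is a sketch under several assumptions: `ToyDoc` is a hypothetical stand-in for `Doc`, shards are cut by whitespace-separated words, and the `shard` helper is invented here so the example can hand shards to `parse_responses` itself (in `spacy-llm` the framework is responsible for that hand-over).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ToyDoc:
    # Hypothetical stand-in for spacy.tokens.Doc (only `text` and `cats`).
    text: str
    cats: Dict[str, float] = field(default_factory=dict)


class ToyShardingSentimentTask:
    """Illustrative sharding task: several prompts and responses per doc."""

    def __init__(self, max_words: int = 8):
        self.max_words = max_words

    def shard(self, doc: ToyDoc) -> List[ToyDoc]:
        # Hypothetical helper: cut the doc into pieces of at most max_words.
        words, n = doc.text.split(), self.max_words
        pieces = [" ".join(words[i : i + n]) for i in range(0, len(words), n)]
        return [ToyDoc(piece) for piece in pieces] or [ToyDoc("")]

    def generate_prompts(self, docs: Iterable[ToyDoc]) -> Iterable[Iterable[str]]:
        for doc in docs:
            # One inner collection per doc, one prompt per shard.
            yield [f'Is this text positive?\n"{s.text}"' for s in self.shard(doc)]

    def parse_responses(
        self, shards: Iterable[Iterable[ToyDoc]], responses: Iterable[Iterable[str]]
    ) -> Iterable[ToyDoc]:
        for doc_shards, doc_responses in zip(shards, responses):
            doc_shards = list(doc_shards)
            scores = [
                1.0 if r.strip().lower().startswith("yes") else 0.0
                for r in doc_responses
            ]
            weights = [len(s.text.split()) for s in doc_shards]
            total = sum(weights)
            # Merge the shards back into a single doc with a weighted score.
            merged = ToyDoc(" ".join(s.text for s in doc_shards))
            merged.cats["POSITIVE"] = (
                sum(s * w for s, w in zip(scores, weights)) / total if total else 0.0
            )
            yield merged
```

Note that the number of returned docs equals the number of input docs, regardless of how many shards each doc was split into.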
 
 ### Translation {id="translation"}
 
@@ -295,6 +389,14 @@ target_lang = "Spanish"
 path = "translation_examples.yml"
 ```
 
+### Raw prompting {id="raw"}
+
+Unlike all other tasks, `spacy.Raw.vX` doesn't wrap the doc data in a specific
+prompt before sending it to the model. Instead it instructs the model to reply
+to the doc content. This is handy for use cases like question answering (where
+each doc contains one question) or if you want to include customized prompts for
+each doc.
+
 #### spacy.Raw.v1 {id="raw-v1"}
 
 Note that since this task may request arbitrary information, it doesn't do any
@@ -1239,9 +1341,15 @@ A _model_ defines which LLM model to query, and how to query it. It can be a
 simple function taking a collection of prompts (consistent with the output type
 of `task.generate_prompts()`) and returning a collection of responses
 (consistent with the expected input of `parse_responses`). Generally speaking,
-it's a function of type `Callable[[Iterable[Any]], Iterable[Any]]`, but specific
+it's a function of type
+`Callable[[Iterable[Iterable[Any]]], Iterable[Iterable[Any]]]`, but specific
 implementations can have other signatures, like
-`Callable[[Iterable[str]], Iterable[str]]`.
+`Callable[[Iterable[Iterable[str]]], Iterable[Iterable[str]]]`.
+
+Note: the model signature expects a nested iterable so it's able to deal with
+sharded docs. Unsharded docs (i.e. those produced by
+[non-sharding tasks](/api/large-language-models#task-nonsharding)) are reshaped
+to fit the expected data structure.
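As a rough illustration of the nested signature, the sketch below defines a stand-in model that returns one response per prompt while preserving the per-doc nesting. Both `echo_model` and `as_single_shards` are hypothetical names; the latter only shows one plausible way to reshape unsharded prompts and is not taken from the `spacy-llm` code base.

```python
from typing import Iterable, List


def echo_model(prompts: Iterable[Iterable[str]]) -> Iterable[Iterable[str]]:
    """Hypothetical model: one response per prompt, nesting preserved."""
    for prompts_for_doc in prompts:
        # A real model would query an LLM here instead of echoing the prompt.
        yield [f"response to: {prompt}" for prompt in prompts_for_doc]


def as_single_shards(prompts: Iterable[str]) -> List[List[str]]:
    """Wrap each unsharded prompt in its own one-element list."""
    return [[prompt] for prompt in prompts]
```

Prompts from a non-sharding task can then be passed as `echo_model(as_single_shards(prompts))`, while prompts from a sharding task are passed as they are.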
 
 ### Models via REST API {id="models-rest"}
 

diff --git a/website/docs/usage/large-language-models.mdx b/website/docs/usage/large-language-models.mdx
@@ -340,15 +340,45 @@ A _task_ defines an NLP problem or question, that will be sent to the LLM via a
 prompt. Further, the task defines how to parse the LLM's responses back into
 structured information. All tasks are registered in the `llm_tasks` registry.
 
-Practically speaking, a task should adhere to the `Protocol` `LLMTask` defined
-in [`ty.py`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/ty.py).
-It needs to define a `generate_prompts` function and a `parse_responses`
-function.
+Practically speaking, a task should adhere to the `Protocol` named `LLMTask`
+defined in
+[`ty.py`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/ty.py). It
+needs to define a `generate_prompts` function and a `parse_responses` function.
 
-| Task                                                                        | Description                                                                                                                                                  |
-| --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| [`task.generate_prompts`](/api/large-language-models#task-generate-prompts) | Takes a collection of documents, and returns a collection of "prompts", which can be of type `Any`.                                                          |
-| [`task.parse_responses`](/api/large-language-models#task-parse-responses)   | Takes a collection of LLM responses and the original documents, parses the responses into structured information, and sets the annotations on the documents. |
+Tasks may support prompt sharding (for more info see the API docs on
+[sharding](/api/large-language-models#task-sharding) and
+[non-sharding](/api/large-language-models#task-nonsharding) tasks). The function
+signatures for `generate_prompts` and `parse_responses` depend on whether they
+do.
+
+_For tasks *not supporting* sharding:_
+
+| Task | Description |
+| ---- | ----------- |
+| [`task.generate_prompts`](/api/large-language-models#task-nonsharding-generate-prompts) | Takes a collection of documents, and returns a collection of prompts, which can be of type `Any`. |
+| [`task.parse_responses`](/api/large-language-models#task-nonsharding-parse-responses) | Takes a collection of LLM responses and the original documents, parses the responses into structured information, and sets the annotations on the documents. |
+
+_For tasks *supporting* sharding:_
+
+| Task | Description |
+| ---- | ----------- |
+| [`task.generate_prompts`](/api/large-language-models#task-sharding-generate-prompts) | Takes a collection of documents, and returns a collection of collections of prompt shards, which can be of type `Any`. |
+| [`task.parse_responses`](/api/large-language-models#task-sharding-parse-responses) | Takes a collection of collections of LLM responses (one per prompt shard) and the original documents, parses the responses into structured information, sets the annotations on the doc shards, and merges those doc shards back into a single doc instance. |
 
 Moreover, the task may define an optional [`scorer` method](/api/scorer#score).
 It should accept an iterable of `Example` objects as input and return a score
@@ -370,7 +400,9 @@ evaluate the component.
 | [`spacy.TextCat.v2`](/api/large-language-models#textcat-v2)             | Version 2 builds on v1 and includes an improved prompt template.                                                  |
 | [`spacy.TextCat.v1`](/api/large-language-models#textcat-v1)             | Version 1 of the built-in TextCat task supports both zero-shot and few-shot prompting.                            |
 | [`spacy.Lemma.v1`](/api/large-language-models#lemma-v1)                 | Lemmatizes the provided text and updates the `lemma_` attribute of the tokens accordingly.                        |
+| [`spacy.Raw.v1`](/api/large-language-models#raw-v1)                     | Sends the raw doc content as a prompt to the LLM.                                                                 |
 | [`spacy.Sentiment.v1`](/api/large-language-models#sentiment-v1)         | Performs sentiment analysis on provided texts.                                                                    |
+| [`spacy.Translation.v1`](/api/large-language-models#translation-v1)     | Translates doc content into the specified target language.                                                        |
 | [`spacy.NoOp.v1`](/api/large-language-models#noop-v1)                   | This task is only useful for testing - it tells the LLM to do nothing, and does not set any fields on the `docs`. |
 
 #### Providing examples for few-shot prompts {id="few-shot-prompts"}