Commit

Merge branch 'main' into update-badges
maykcaldas authored Jan 18, 2025
2 parents 22dfeb9 + a1c53d3 commit 1db194a
Show file tree
Hide file tree
Showing 13 changed files with 336 additions and 61 deletions.
163 changes: 163 additions & 0 deletions docs/tutorials/querying_with_clinical_trials.md
@@ -0,0 +1,163 @@
# PaperQA2 for Clinical Trials

PaperQA2 now natively supports querying clinical trials in addition to any documents supplied by the user, via a new tool aptly named `clinical_trials_search`. Users don't have to provide any clinical trials themselves; the tool retrieves them on the fly from the `clinicaltrials.gov` API. As of
January 2025, the tool is not enabled by default, but it's easy to configure. Here's an example
where we query only clinical trials, without using any documents:

```python
from paperqa import Settings, agent_query

answer_response = await agent_query(
    query="What drugs have been found to effectively treat Ulcerative Colitis?",
    settings=Settings.from_name("search_only_clinical_trials"),
)

print(answer_response.session.answer)
```

### Output

Several drugs have been found to effectively treat Ulcerative Colitis (UC),
targeting different mechanisms of the disease.

Golimumab, a tumor necrosis factor (TNF) inhibitor marketed as Simponi®, has demonstrated efficacy
in treating moderate-to-severe UC. Administered subcutaneously, it was shown to maintain clinical
response through Week 54 in patients, as assessed by the Partial Mayo Score (NCT02092285).

Mesalazine, an anti-inflammatory drug, is commonly used for UC treatment. In a study comparing
mesalazine enemas to faecal microbiota transplantation (FMT) for left-sided UC,
mesalazine enemas (4g daily) were effective in inducing clinical remission (Mayo score ≤ 2) (NCT03104036).

Antibiotics have also shown potential in UC management. A combination of doxycycline,
amoxicillin, and metronidazole induced remission in 60-70% of patients with moderate-to-severe
UC in prior studies. These antibiotics are thought to alter gut microbiota, reducing pathobionts
and promoting beneficial bacteria (NCT02217722, NCT03986996).

Roflumilast, a phosphodiesterase-4 (PDE4) inhibitor, is being investigated for mild-to-moderate UC.
Preliminary findings suggest it may improve disease severity and biochemical markers when
added to conventional treatments (NCT05684484).

These treatments highlight diverse therapeutic approaches, including immunosuppression,
microbiota modulation, and anti-inflammatory mechanisms.
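
Note that `agent_query` is a coroutine, so the snippets in this tutorial assume an already-running event loop (e.g., a Jupyter notebook). As a minimal sketch, a standalone script could wrap the same call with `asyncio.run`:

```python
import asyncio

from paperqa import Settings, agent_query


async def main() -> None:
    answer_response = await agent_query(
        query="What drugs have been found to effectively treat Ulcerative Colitis?",
        settings=Settings.from_name("search_only_clinical_trials"),
    )
    print(answer_response.session.answer)


asyncio.run(main())
```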

Each clinical trial used in the response is cited in-line. If you'd like to see more data on the
specific contexts that were used to answer the query:

```python
print(answer_response.session.contexts)
```

[Context(context='The excerpt mentions that a search on ClinicalTrials.gov for clinical trials related to drugs
treating Ulcerative Colitis yielded 689 trials. However, it does not provide specific information about which
drugs have been found effective for treating Ulcerative Colitis.', text=Text(text='', name=...
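
Each `Context` pairs a summarized excerpt with its source. As a rough sketch for inspecting them, assuming each `Context` exposes the `context` string and `text.name` shown in the repr above plus a numeric relevance `score`:

```python
# Rough sketch: list which sources informed the answer.
# Assumes Context exposes .context, .text.name, and .score attributes.
for ctx in answer_response.session.contexts:
    print(f"{ctx.text.name} (score={ctx.score})")
    print(f"  {ctx.context[:120]}...")
```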

Using `Settings.from_name('search_only_clinical_trials')` is a shortcut, but note that you can easily
add `clinical_trials_search` to any custom `Settings` by explicitly naming it as a tool:

```python
from pathlib import Path

from paperqa import Settings, agent_query
from paperqa.agents.tools import DEFAULT_TOOL_NAMES

# you can start with the default list of PaperQA tools
print(DEFAULT_TOOL_NAMES)
# >>> ['paper_search', 'gather_evidence', 'gen_answer', 'reset', 'complete']

# we can start with a directory with a potentially useful paper in it
print(list(Path("my_papers").iterdir()))

# now let's query using the standard tools + clinical_trials_search
answer_response = await agent_query(
    query="What drugs have been found to effectively treat Ulcerative Colitis?",
    settings=Settings(
        paper_directory="my_papers",
        agent={"tool_names": DEFAULT_TOOL_NAMES + ["clinical_trials_search"]},
    ),
)

# let's check out the formatted answer (with references included)
print(answer_response.session.formatted_answer)
```

Question: What drugs have been found to effectively treat Ulcerative Colitis?

Several drugs have been found effective in treating Ulcerative Colitis (UC), with treatment
strategies varying based on disease severity and extent. For mild-to-moderate UC, 5-aminosalicylic
acid (5-ASA) is the first-line therapy. Topical 5-ASA, such as mesalazine suppositories (1 g/day),
is effective for proctitis or distal colitis, inducing remission in 31-80% of patients. Oral mesalazine
at higher doses (e.g., 4.8 g/day) can accelerate clinical improvement in more extensive disease
(meier2011currenttreatmentof pages 1-2; meier2011currenttreatmentof pages 3-4).

For moderate-to-severe cases, corticosteroids are commonly used. Oral steroids like prednisolone
(40-60 mg/day) or intravenous steroids such as methylprednisolone (60 mg/day) and hydrocortisone
(400 mg/day) are standard for inducing remission (meier2011currenttreatmentof pages 3-4). Tumor
necrosis factor (TNF)-α blockers, such as infliximab, are effective for steroid-refractory cases
(meier2011currenttreatmentof pages 2-3; meier2011currenttreatmentof pages 3-4).

Immunosuppressive agents, including azathioprine and 6-mercaptopurine, are used for maintenance
therapy in steroid-dependent or refractory cases (meier2011currenttreatmentof pages 2-3;
meier2011currenttreatmentof pages 3-4). Antibiotics, such as combinations of penicillin,
tetracycline, and metronidazole, have shown promise in altering the microbiota and inducing
remission in some patients, though their efficacy varies (NCT02217722).

References

1. (meier2011currenttreatmentof pages 2-3): Johannes Meier and Andreas Sturm. Current treatment
of ulcerative colitis. World journal of gastroenterology, 17 27:3204-12, 2011.
URL: https://doi.org/10.3748/wjg.v17.i27.3204, doi:10.3748/wjg.v17.i27.3204.

2. (meier2011currenttreatmentof pages 3-4): Johannes Meier and Andreas Sturm. Current treatment
of ulcerative colitis. World journal of gastroenterology, 17 27:3204-12, 2011. URL:
https://doi.org/10.3748/wjg.v17.i27.3204, doi:10.3748/wjg.v17.i27.3204.

3. (NCT02217722): Prof. Arie Levine. Use of the Ulcerative Colitis Diet for Induction of
Remission. Prof. Arie Levine. 2014. ClinicalTrials.gov Identifier: NCT02217722.

4. (meier2011currenttreatmentof pages 1-2): Johannes Meier and Andreas Sturm. Current
treatment of ulcerative colitis. World journal of gastroenterology, 17 27:3204-12, 2011.
URL: https://doi.org/10.3748/wjg.v17.i27.3204, doi:10.3748/wjg.v17.i27.3204.

We now see both papers and clinical trials cited in the response. For convenience, there's a
`Settings.from_name` shortcut for this configuration as well:

```python
from paperqa import Settings, agent_query

answer_response = await agent_query(
    query="What drugs have been found to effectively treat Ulcerative Colitis?",
    settings=Settings.from_name("clinical_trials"),
)
```
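
As before, `answer_response.session.formatted_answer` will include both paper and clinical-trial citations, with the references appended.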

And this works with the `pqa` CLI as well:

```bash
$ pqa --settings 'search_only_clinical_trials' ask 'what is Ibuprofen effective at treating?'
```

...
[13:29:50] Completing 'what is Ibuprofen effective at treating?' as 'certain'.
Answer: Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) effective
in treating various conditions, including pain, inflammation, and fever.
It is widely used for tension-type
headaches, with studies showing that ibuprofen sodium provides significant
pain relief and reduces pain intensity compared to standard ibuprofen and placebo
over a 3-hour period (NCT01362491).
Intravenous ibuprofen is effective in managing postoperative pain, particularly
in orthopedic surgeries, and helps control the inflammatory process. When combined
with opioids, it reduces opioid
consumption and associated side effects, making it a key component of
multimodal analgesia (NCT05401916, NCT01773005).

Ibuprofen is also effective in pediatric populations as a first-line
anti-inflammatory and antipyretic agent due to its relatively
low adverse effects compared to other NSAIDs (NCT01478022).
Additionally, it has been studied for its potential use in managing
chronic periodontitis through subgingival irrigation with a 2% ibuprofen
mouthwash, which reduces periodontal pocket depth and
bleeding on probing, improving periodontal health (NCT02538237).

These findings highlight ibuprofen's versatility in treating pain, inflammation,
fever, and specific conditions like tension headaches, postoperative pain, and periodontal diseases.
6 changes: 5 additions & 1 deletion paperqa/agents/main.py
@@ -31,6 +31,7 @@
from .models import AgentStatus, AnswerResponse, SimpleProfiler
from .search import SearchDocumentStorage, SearchIndex, get_directory_index
from .tools import (
DEFAULT_TOOL_NAMES,
Complete,
EnvironmentState,
GatherEvidence,
@@ -117,7 +118,10 @@ async def run_agent(
)

# Build the index once here, and then all tools won't need to rebuild it
await get_directory_index(settings=settings)
# only build if a search tool is requested
if PaperSearch.TOOL_FN_NAME in (settings.agent.tool_names or DEFAULT_TOOL_NAMES):
await get_directory_index(settings=settings)

if isinstance(agent_type, str) and agent_type.lower() == FAKE_AGENT_TYPE:
session, agent_status = await run_fake_agent(
query, settings, docs, **runner_kwargs
21 changes: 21 additions & 0 deletions paperqa/configs/clinical_trials.json
@@ -0,0 +1,21 @@
{
"answer": {
"evidence_k": 15,
"answer_max_sources": 5,
"max_concurrent_requests": 10
},
"agent": {
"tool_names": [
"gather_evidence",
"search_papers",
"gen_answer",
"clinical_trials_search",
"complete"
]
},
"parsing": {
"use_doc_details": true,
"chunk_size": 9000,
"overlap": 750
}
}
20 changes: 20 additions & 0 deletions paperqa/configs/search_only_clinical_trials.json
@@ -0,0 +1,20 @@
{
"answer": {
"evidence_k": 15,
"answer_max_sources": 5,
"max_concurrent_requests": 10
},
"agent": {
"tool_names": [
"gather_evidence",
"gen_answer",
"clinical_trials_search",
"complete"
]
},
"parsing": {
"use_doc_details": true,
"chunk_size": 9000,
"overlap": 750
}
}
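
These JSON files appear to back the `Settings.from_name(...)` shortcuts used in the tutorial above. A minimal sketch of loading one, assuming named configs resolve to the files in `paperqa/configs/`:

```python
from paperqa import Settings

# Assumption: Settings.from_name resolves names against paperqa/configs/,
# so this loads the search_only_clinical_trials.json shown above.
settings = Settings.from_name("search_only_clinical_trials")
print(settings.agent.tool_names)  # should include 'clinical_trials_search'
```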
4 changes: 2 additions & 2 deletions paperqa/core.py
@@ -2,7 +2,7 @@

import json
import re
from collections.abc import Callable
from collections.abc import Callable, Sequence
from typing import Any

from paperqa.llms import PromptRunner
@@ -41,7 +41,7 @@ async def map_fxn_summary(
prompt_runner: PromptRunner | None,
extra_prompt_data: dict[str, str] | None = None,
parser: Callable[[str], dict[str, Any]] | None = None,
callbacks: list[Callable[[str], None]] | None = None,
callbacks: Sequence[Callable[[str], None]] | None = None,
) -> tuple[Context, LLMResult]:
"""Parses the given text and returns a context object with the parser and prompt runner.
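
The `list` → `Sequence` widenings in this file (and in the files below) let callers pass any sequence of callbacks, not just a `list`. A minimal illustration of the difference under standard typing semantics, not specific to paperqa:

```python
from collections.abc import Callable, Sequence


def notify_list(callbacks: list[Callable[[str], None]]) -> None:
    for cb in callbacks:
        cb("done")


def notify_seq(callbacks: Sequence[Callable[[str], None]]) -> None:
    for cb in callbacks:
        cb("done")


cbs: tuple[Callable[[str], None], ...] = (print,)
# notify_list(cbs)  # rejected by type checkers: a tuple is not a list
notify_seq(cbs)  # accepted: a tuple is a Sequence
```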
37 changes: 13 additions & 24 deletions paperqa/docs.py
@@ -6,7 +6,7 @@
import re
import tempfile
import urllib.request
from collections.abc import Callable
from collections.abc import Callable, Sequence
from datetime import datetime
from functools import partial
from io import BytesIO
@@ -42,6 +42,7 @@
from paperqa.settings import MaybeSettings, get_settings
from paperqa.types import Doc, DocDetails, DocKey, PQASession, Text
from paperqa.utils import (
citation_to_docname,
gather_with_concurrency,
get_loop,
maybe_is_html,
@@ -306,23 +307,7 @@ async def aadd( # noqa: PLR0912
):
citation = f"Unknown, {os.path.basename(path)}, {datetime.now().year}"

if docname is None:
# get first name and year from citation
match = re.search(r"([A-Z][a-z]+)", citation)
if match is not None:
author = match.group(1)
else:
# panicking - no word??
raise ValueError(
f"Could not parse docname from citation {citation}. "
"Consider just passing key explicitly - e.g. docs.py "
"(path, citation, key='mykey')"
)
year = ""
match = re.search(r"(\d{4})", citation)
if match is not None:
year = match.group(1)
docname = f"{author}{year}"
docname = citation_to_docname(citation) if docname is None else docname
docname = self._get_unique_name(docname)

doc = Doc(docname=docname, citation=citation, dockey=dockey)
@@ -549,7 +534,7 @@ def get_evidence(
query: PQASession | str,
exclude_text_filter: set[str] | None = None,
settings: MaybeSettings = None,
callbacks: list[Callable] | None = None,
callbacks: Sequence[Callable] | None = None,
embedding_model: EmbeddingModel | None = None,
summary_llm_model: LLMModel | None = None,
partitioning_fn: Callable[[Embeddable], int] | None = None,
@@ -571,7 +556,7 @@ async def aget_evidence(
query: PQASession | str,
exclude_text_filter: set[str] | None = None,
settings: MaybeSettings = None,
callbacks: list[Callable] | None = None,
callbacks: Sequence[Callable] | None = None,
embedding_model: EmbeddingModel | None = None,
summary_llm_model: LLMModel | None = None,
partitioning_fn: Callable[[Embeddable], int] | None = None,
@@ -668,7 +653,7 @@ def query(
self,
query: PQASession | str,
settings: MaybeSettings = None,
callbacks: list[Callable] | None = None,
callbacks: Sequence[Callable] | None = None,
llm_model: LLMModel | None = None,
summary_llm_model: LLMModel | None = None,
embedding_model: EmbeddingModel | None = None,
@@ -690,12 +675,16 @@ async def aquery( # noqa: PLR0912
self,
query: PQASession | str,
settings: MaybeSettings = None,
callbacks: list[Callable] | None = None,
callbacks: Sequence[Callable] | None = None,
llm_model: LLMModel | None = None,
summary_llm_model: LLMModel | None = None,
embedding_model: EmbeddingModel | None = None,
partitioning_fn: Callable[[Embeddable], int] | None = None,
) -> PQASession:
# TODO: remove list cast after release of https://github.com/Future-House/llm-client/pull/36
callbacks = cast(
list[Callable] | None, list(callbacks) if callbacks else callbacks
)

query_settings = get_settings(settings)
answer_config = query_settings.answer
@@ -797,8 +786,8 @@ async def aquery( # noqa: PLR0912
answer_text = answer_result.text
session.add_tokens(answer_result)
# it still happens
if prompt_config.EXAMPLE_CITATION in answer_text:
answer_text = answer_text.replace(prompt_config.EXAMPLE_CITATION, "")
if (ex_citation := prompt_config.EXAMPLE_CITATION) in answer_text:
answer_text = answer_text.replace(ex_citation, "")
for c in filtered_contexts:
name = c.text.name
citation = c.text.doc.formatted_citation
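
For reference, a sketch of the extracted `citation_to_docname` helper, reconstructed from the inline logic it replaces above (the actual implementation lives in `paperqa/utils.py` and may differ):

```python
import re


def citation_to_docname(citation: str) -> str:
    """Infer a docname like 'Meier2011' from a citation string."""
    # the first capitalized word is treated as the lead author's name
    match = re.search(r"([A-Z][a-z]+)", citation)
    if match is None:
        # panicking - no word??
        raise ValueError(
            f"Could not parse docname from citation {citation}. "
            "Consider just passing key explicitly - e.g. docs.py "
            "(path, citation, key='mykey')"
        )
    author = match.group(1)
    # the first 4-digit number, if present, is treated as the year
    year_match = re.search(r"(\d{4})", citation)
    year = year_match.group(1) if year_match is not None else ""
    return f"{author}{year}"
```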
2 changes: 1 addition & 1 deletion paperqa/llms.py
@@ -47,7 +47,7 @@
qdrant_installed = False

PromptRunner = Callable[
[dict, list[Callable[[str], None]] | None, str | None],
[dict, Sequence[Callable[[str], None]] | None, str | None],
Awaitable[LLMResult],
]

9 changes: 5 additions & 4 deletions paperqa/settings.py
@@ -3,7 +3,7 @@
import os
import pathlib
import warnings
from collections.abc import Callable, Mapping
from collections.abc import Callable, Mapping, Sequence
from enum import StrEnum
from pydoc import locate
from typing import Any, ClassVar, Self, TypeAlias, assert_never, cast
@@ -194,7 +194,7 @@ class ParsingSettings(BaseModel):
),
)
chunking_algorithm: ChunkingOptions = ChunkingOptions.SIMPLE_OVERLAP
doc_filters: list[dict] | None = Field(
doc_filters: Sequence[Mapping[str, Any]] | None = Field(
default=None,
description=(
"Optional filters to only allow documents that match this filter. This is a"
@@ -259,6 +259,7 @@ def get_formatted_variables(s: str) -> set[str]:
class PromptSettings(BaseModel):
model_config = ConfigDict(extra="forbid", validate_assignment=True)

# MLA parenthetical in-text citation, SEE: https://nwtc.libguides.com/citations/MLA#s-lg-box-707489
EXAMPLE_CITATION: ClassVar[str] = "(Example2012Example pages 3-4)"

summary: str = summary_prompt
@@ -498,7 +499,7 @@ class AgentSettings(BaseModel):
description="If set to true, run the search tool before invoking agent.",
)

tool_names: set[str] | None = Field(
tool_names: set[str] | Sequence[str] | None = Field(
default=None,
description=(
"Optional override on the tools to provide the agent. Leaving as the"
@@ -521,7 +522,7 @@
)
index: IndexSettings = Field(default_factory=IndexSettings)

callbacks: Mapping[str, list[Callable[[_EnvironmentState], Any]]] = Field(
callbacks: Mapping[str, Sequence[Callable[[_EnvironmentState], Any]]] = Field(
default_factory=dict,
description="""
A mapping that associates callback names with lists of corresponding callable functions.