
[RFC] Conversations and Generative AI in OpenSearch #1150

Closed
austintlee opened this issue Jul 20, 2023 · 92 comments
Labels
RFC Request For Comments from the OpenSearch Community v2.12.0 Issues targeting release v2.12.0

Comments

@austintlee
Collaborator

austintlee commented Jul 20, 2023

Introduction

Recent advances in Large Language Models (LLMs) have enabled developers to use natural language in their applications with much better quality and capability. As ChatGPT has shown, LLMs are a strong fit for use cases involving summarization and conversation. However, when prompting LLMs to answer fact-based questions (applications we call “conversational search”), we find significant shortcomings for enterprise-grade applications.

First, the major LLMs are trained only on data exposed to the internet, and therefore lack the context to answer questions about private data. Most enterprise data falls into this category. Second, the way LLMs answer questions from their training data gives rise to “hallucinations” and false answers, which are unacceptable in mission-critical applications.

End-users love the ability to converse with an application in colloquial language to get answers to questions or find interesting search results, but they require up-to-date information and accuracy. A solution to this problem is Retrieval Augmented Generation (RAG), where an application sends an LLM a superset of correct information in response to a prompt, and the LLM is used to summarize and extract information from this set (instead of probabilistically determining an answer).

We believe OpenSearch could be a great platform for building conversational search applications, and it aligns well with the RAG approach. It already offers semantic search capabilities through its vector database and k-NN plugin, alongside enterprise-grade security and scalability. This is a great building block for the “source of truth” information retrieval component of RAG. However, it currently lacks the primitives and crisp APIs needed to easily enable the conversational element.

Although there are libraries that allow building this functionality at the application layer (e.g. LangChain), we believe the best developer experience would be to enable this directly in OpenSearch. We consider the “G” in a RAG pipeline to be LLM-based post-processing that enables direct question answering, summarization, and a conversational experience on top of OpenSearch semantic search. This lets end-users interact with their data in OpenSearch in new ways. Furthermore, we believe developers may want to use different LLMs, so the choice of model should be pluggable.

Using plugins and search pipelines, we propose in this RFC an architecture that exposes easily consumable APIs for conversational search, history, and storage. We segment it into a few components: 1/ search query rewriting using generative AI and conversational context, 2/ question answering and summarization of OpenSearch semantic search results using generative AI, and 3/ a concept of “conversational memory” to easily store the state of conversations and add additional interactions. Conversational memory will also support conversational applications that have multiple agents operating together, giving a single source of truth for conversation state.

Goals

1/ Developers can easily build conversational search applications (e.g. knowledge-base search, informational chatbots, etc.) using OpenSearch and their choice of generative AI model, via well-defined REST APIs. Some of these applications will be ongoing conversations, while others will be one-shot (where the history of interactions is not important).

2/ Developers can use OpenSearch to support multi-agent conversational architectures, which require a single “source of truth” for conversational history. Multi-agent architectures will include agents besides the OpenSearch semantic search agent (e.g. an agent that queries the public internet). These developers need an easy API to manage conversational history, both for adding interactions to conversations and for exploring the history of those conversations.

3/ Developers can easily obtain OpenSearch (semantic) search results alongside the generative AI question answering, so they can show the source documents and enable the end user to explore the source material.

Non-Goals

1/ Building a general LLM application toolkit in OpenSearch. Our goal is only to enable conversational search and its related dependency, conversational memory.

2/ LLM hosting. LLMs take significant resources and should be operated outside of an OpenSearch cluster. We also hope to use the ML-Commons remote inference feature rather than implement our own connectors.

3/ A conversational search application platform. Our goal is to expose crisp APIs to make building applications that use conversational search easy, but not create the end application itself.

Proposed Architecture

  • An addition to the ml-commons plugin that provides a CRUD API to store and access conversation history (“memory”).
  • An addition of a new Search Pipeline implementation to ml-commons that uses conversational memory and large language models for question answering.
  • An addition to ml-commons that enables users to have conversations through a new Conversation API.

Aryn Conversation Plugins v2

Conversational Memory API (Chat History)

Conversational memory is the storage for conversations, which are ordered lists of interactions. Conversational memory makes it easy to add new interactions to a conversation or explore previous interactions. For example, a chatbot needs conversational memory, since it takes the previous interactions in a conversation as part of the context for generating the next response. At a high level, this mostly resembles a generic read/write store, and we will use an OpenSearch index for it. However, the interesting nuance is in the data itself, which we describe next.

A conversation is represented as a list of interactions, ordered chronologically. Each conversation will also include some metadata, like the start time and the number of interactions.

The basic elements of an interaction are an input and a response, representing the human input to an AI agent and that agent’s response. We’ll also include any additional prompting that was used in the interaction, the agent that was used, and any arbitrary metadata the agent may want to include. For example, a conversational search agent may include the actual search results as metadata for a user search query (which is an interaction).

Each ConversationMetadata and Interaction will have access controls linked to the specific user that creates them. Only Alice can add to and read from conversations that Alice owns. The main rationale for this is that Alice’s conversation will potentially include information from all documents Alice has access to, so her conversations’ access controls are maximally the intersection of Alice’s access rights. We plan to leverage OpenSearch’s existing access control mechanisms for this.

The plan is to maintain two indices: one for ConversationMetadata and one for Interaction.

structure ConversationMetadata {
    conversationId: ConversationId
    numInteractions: Integer
    createTime: Timestamp
    lastInteractionTime: Timestamp
    name: String
}

structure Interaction {
    conversationId: ConversationId
    interactionId: InteractionId
    input: String
    prompt: String
    response: String
    agent: String
    time: Timestamp
    attributes: InteractionAttributes
}

API

The operations for conversational memory are similar to the usual CRUD operations for a datastore. CreateInteraction will update the appropriate ConversationMetadata with a correct lastInteractionTime and numInteractions.

/// Creates a new conversation and returns its id
operation CreateConversation {
    input: CreateConversationInput
    output: CreateConversationOutput
}

@input
structure CreateConversationInput {
    name: String
}

@output
structure CreateConversationOutput {
    conversationId: ConversationId
}

/// Returns the list of all conversations
operation GetConversations {
    input: GetConversationsInput
    output: GetConversationsOutput
}

@input
structure GetConversationsInput {
    nextToken: String
    maxResults: Integer
}

@output 
structure GetConversationsOutput {
    conversations: List[ConversationMetadata]
    nextToken: String
}

/// Adds an interaction to a conversation and returns its id
operation CreateInteraction {
    input: CreateInteractionInput
    output: CreateInteractionOutput
}

@input 
structure CreateInteractionInput  {
    @required
    @httpLabel
    conversationId: ConversationId
    input: String
    prompt: String
    response: String
    agent: String
    attributes: InteractionAttributes
}

@output
structure CreateInteractionOutput {
    interactionId: InteractionId
}

/// Returns the list of interactions associated with a conversation
operation GetInteractions {
    input: GetInteractionsInput
    output: GetInteractionsOutput
}

@input
structure GetInteractionsInput {
    @required 
    @httpLabel
    conversationId: ConversationId
    nextToken: String
    maxResults: Integer
}

@output
structure GetInteractionsOutput {
    metadata: ConversationMetadata
    interactions: List[Interaction]
    nextToken: String
}

operation DeleteConversation {
    input: DeleteConversationInput
    output: DeleteConversationOutput
}

@input
structure DeleteConversationInput {
    @required
    @httpLabel
    conversationId: ConversationId
}

@output
structure DeleteConversationOutput {
    success: Boolean
}

We do not propose an update API for conversation metadata; we treat it as immutable. We believe users would rather create a new conversation than update parameters on an existing one.
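To make these shapes concrete, here is a minimal sketch of how the REST calls might look. The endpoint paths below are illustrative assumptions for discussion, not finalized routes:

# Create a conversation (path is a placeholder, not final)
POST /_plugins/conversational/memory
{
    "name": "lincoln-research"
}
# returns: { "conversationId": "<conversation-id>" }

# Add an interaction to that conversation
POST /_plugins/conversational/memory/<conversation-id>
{
    "input": "When was Abraham Lincoln born?",
    "prompt": "Answer only from the provided documents.",
    "response": "Abraham Lincoln was born on February 12, 1809.",
    "agent": "conversational-search"
}
# returns: { "interactionId": "<interaction-id>" }

# Page through the interactions in a conversation
GET /_plugins/conversational/memory/<conversation-id>?maxResults=10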

Search Pipeline extension

The conversational search path essentially consists of an OpenSearch query, with some pre- and post-processing. Search Pipelines, introduced in 2.8, are a tool for pre- and post-processing in the query path, so we have chosen to use that mechanism to implement conversational search.

We have chosen to implement the question answering component of RAG in the form of query result rewrites. We are introducing a new response processor that sends the top search results, and optionally some previous conversation history, to the LLM to generate a response in the conversation. We are also introducing a new response processor that iterates over search hits and interacts with an LLM to produce a scored answer for each result. Finally, we are introducing a request processor that rephrases the user’s query, taking the conversation history into account. We will rely on the remote inference feature proposed in #882 for answer generation.

Based on different patterns we have seen in applications, we designed this API to support “one-off” and “multi-shot” conversations. Users can have “one-off” question answering interactions, where prior context is not included, via a search pipeline that uses the new question answering processor. Users can also have “multi-shot” conversations, where interactions are stored in conversational memory and used as additional context sent to the model along with each search query. Users will need to use the Conversational Search plugin to create a conversation and pass the conversationId to the search pipeline in order to retain all the interactions associated with it.

In addition to the conversation ID, users can also pass a “prompt” parameter for any prompt engineering alongside their search query.

GET wiki-simple-paras/_search?search_pipeline=convo_qa_pipeline
{
  "_source": ["title", "text"],
  "query" : {
    "neural": {
      "text_vector": {
         "query_text": "When was Abraham Lincoln born?",
         "k": 10,
         "model_id": "<text-embedding-model-id>"
      }
    }
  },
  "ext": {
     "question_answering_parameters": {
         "question": "When was Abraham Lincoln born?"
     },
     "conversation" : {
         "id": "...",
         "prompt": "..."
     }
  }
}

The search pipeline includes pre- and post-processing steps. The pre-processing step uses generative AI to rewrite the search query submitted by the user, taking into account the conversation history if a conversation was specified. This allows things like antecedent replacement (“When was he born?” → “When was Abraham Lincoln born?”, if the prior question was “Who was Abraham Lincoln?”).

The post-processing step is a processor that takes the search results, optionally performs a lookup against the conversational memory, and then sends this data to the LLM configured by the user. We believe different users will want to use different LLMs, so this will be pluggable.
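As a sketch of how such a pipeline could be defined (the processor names and parameters here are assumptions for illustration, not final names):

PUT /_search/pipeline/convo_qa_pipeline
{
    "request_processors": [
        {
            "conversational_query_rewrite": {
                "model_id": "<llm-model-id>"
            }
        }
    ],
    "response_processors": [
        {
            "retrieval_augmented_generation": {
                "model_id": "<llm-model-id>",
                "context_field_list": ["text"]
            }
        }
    ]
}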

Conversation API

The point of this API is to provide conversational search as a relatively simple endpoint, hooking the pieces together so that users can easily build applications with it. It takes a search query (or some other kind of human input), performs a search against OpenSearch, feeds those search results into an LLM, and returns the answer. All of this work is done in the search pipeline underneath - the API is just a wrapper - but we feel this kind of API would be helpful to developers who just want an easy REST API.

We would like to return search results as well as the LLM response. This differs from most existing systems that return only answers, and it allows clients to perform validations or additional downstream processing.

/// Ask a question and get a GenAI response grounded in search results
operation Query {
    input: QueryInput
    output: QueryOutput
}

structure QueryInput {
    index: String
    conversationId: ConversationId
    query: String
    prompt: String
    filter: String
    numResults: Integer
}

structure QueryOutput {
    response: String
    rewrittenQuery: String
    searchResults: DocList
    interactionId: InteractionId
}

/// List of docs used to answer the question
list DocList {
    member: Document
}
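
A usage sketch of this wrapper, with a hypothetical endpoint path; note how the rewritten query and the search results come back alongside the generated response:

POST /_plugins/conversational/query
{
    "index": "wiki-simple-paras",
    "conversationId": "<conversation-id>",
    "query": "When was he born?",
    "numResults": 5
}
# returns the LLM response, the rewritten query
# (e.g. "When was Abraham Lincoln born?"), the search
# results used, and the interactionId recorded in memory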

Discussion

  1. Performance: LLM inference takes on the order of seconds; with sufficiently high traffic, that can stretch to minutes or more as an LLM hosting service rate-limits requests or a hosted model becomes resource-constrained. Five people using this at the same time could potentially stall each other out completely. We’ll try to be fault-tolerant in this regard, but much of the onus may fall on users and LLM hosts to work out how to get higher LLM throughput.
  2. Ordering: Since LLM inference can take a while, a user might get impatient and issue several search queries before the first one has returned an answer, and the answers might come back from the LLM out of order. We will write only complete interactions, in the order that responses come back from the LLM. The client should disallow concurrent queries within a conversation to prevent this.
  3. Dependencies: This relies on the relatively new search pipeline and remote inference features. Accordingly, this probably only works for OpenSearch ≥ 2.9, with the appropriate ML-Commons installation. We’re also hoping to get the pipelines themselves into Search-Processors; in which case that plugin also becomes a dependency. Lastly, the high-level Conversational API depends on the Conversational Pipeline, and they both depend on the Conversational Memory plugin, which we think should be its own plugin. We’ll put out some resources on building once we figure it out.

Summary

In this RFC we presented a proposal for bringing conversational search to OpenSearch. Our proposal consists of three components: 1/ an API for conversational memory stored in OpenSearch, 2/ an OpenSearch search pipeline for Retrieval-Augmented Generation (RAG), and 3/ a simple one-shot API for building conversational search applications. We would appreciate any feedback, suggestions, and comments toward integrating this cleanly with the rest of the OpenSearch ecosystem and making it the best it can be.

Thanks!

Requested Feedback

  • Does this feature set cover the set of use cases for generative AI applications that you want to build? We have been focused on search applications, and we’re interested in how much the community wants to go beyond exposing conversational search and conversational memory building blocks at this time.
  • We believe the search pipeline is a great mechanism for defining RAG pipelines, but we also felt that a conversational API that invokes this pipeline would help developers more easily build conversational search applications. We’d love feedback on whether we should add more to this API, or conversely whether it’s even needed to provide an easy developer experience.
  • This approach to RAG introduces several cross-plugin dependencies. There has been talk in the community about moving away from the plugin architecture for OpenSearch, and we want to make sure this approach is aligned with the higher-level architectural goals of the project. We’d appreciate feedback on this topic.
@austintlee austintlee changed the title [RFC] Conversations in OpenSearch [RFC] Conversations and Generative AI in OpenSearch Jul 20, 2023
@davidlago

Each ConversationMetadata and Interaction will have access controls linked to the specific user that creates them. Only Alice can add to and read from conversations that Alice owns. The main rationale for this is that Alice’s conversation will potentially include information from all documents Alice has access to, so her conversations’ access controls are maximally the intersection of Alice’s access rights. We plan to leverage OpenSearch’s existing access control mechanisms for this.

There is an important nuance to this statement: her conversations’ access controls are maximally the intersection of Alice’s access rights at the time of the interaction.

If Alice's permissions change from the time of the interaction in a way that makes some of the captured information off-limits to her, this access control will no longer be appropriate.

We don't currently have the security primitives/functionality needed to support this level of access control on derived data natively (it's on our radar though!), so limiting access to interactions to the owning user is the best we can do without them.

With that said, am I correct in interpreting that the indices holding the Conversations and Interactions will be restricted to just the plugin, with all access to the data gated by the new API? If so, they are missing a field for the user who owns them so that we can enforce that access control.

@macohen

macohen commented Jul 20, 2023

This is a very thorough RFC. Thanks, Austin.

Dependencies: This relies on the relatively new search pipeline and remote inference features. Accordingly, this probably only works for OpenSearch ≥ 2.9, with the appropriate ML-Commons installation. We’re also hoping to get the pipelines themselves into Search-Processors; in which case that plugin also becomes a dependency. Lastly, the high-level Conversational API depends on the Conversational Pipeline, and they both depend on the Conversational Memory plugin, which we think should be its own plugin. We’ll put out some resources on building once we figure it out.

Confirming that Search Pipelines is only available in 2.9+. When you say "...hoping to get the pipelines themselves into Search-Processors..." do you mean the search-processor GH repo? That repo has two processors that we will eventually factor out into separate repos. Our current thinking on search processors is that they can be included in core (https://github.com/opensearch-project/OpenSearch) if they have no external dependencies. If there are dependencies, a separate repo as a self-install plugin is the right approach. Some of this may belong in ml-commons, but I would leave that up to the maintainers of this repo.

You may also need to build a search processor that is ALSO a plugin to gain access to resources via the plugin interface; accessing the conversation memory, for example, may require this. One analogy for search pipelines is to think of them like piping together *NIX commands. Each command (processor in pipeline speak) can be as complex as needed, but still really only does one thing, and you compose functionality by sending the stdin (request in pipeline speak) or stdout (response in pipeline speak) from one processor to the next. Some of the processors needed may end up in core; some may end up in a separate repo.
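To make the pipes analogy concrete, here is a minimal composition using two of the built-in processors that shipped with Search Pipelines (filter_query and rename_field); each one does one small thing, and the request/response flows through them in order:

PUT /_search/pipeline/example_pipeline
{
    "request_processors": [
        {
            "filter_query": {
                "query": { "term": { "visible": true } }
            }
        }
    ],
    "response_processors": [
        {
            "rename_field": {
                "field": "message",
                "target_field": "notification"
            }
        }
    ]
}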

cc: @msfroh

@jngz-es
Collaborator

jngz-es commented Jul 20, 2023

Regarding the Conversation API, it looks like it only wraps a search pipeline, and I think the search pipeline APIs already work well. Meanwhile, I think the conversation function can support not only search applications but also other applications like chatbots. That means the conversation function could help users build any conversational application.

@jngz-es
Collaborator

jngz-es commented Jul 20, 2023

Regarding performance, we should consider not only traffic (for multiple users) but also latency (for a single user). The search experience is latency-sensitive; if we introduce LLM interactions in pre-processors and post-processors, the latency may not be acceptable. We should reduce the number of rounds of interaction with the LLM to improve latency and save cost, as LLM API calls are expensive.

@jngz-es
Collaborator

jngz-es commented Jul 20, 2023

Actually, we also have an RFC about a conversation plugin in OpenSearch to support conversational application building.

@HenryL27
Collaborator

Thanks @davidlago.

I agree that limiting access to the interactions to the owning user is the best we can do currently. We would love to collaborate on building the necessary security primitives to support access control on derived data. Please keep us informed on any future RFC on this.

Yes, we're planning on restricting access to the conversational memory indices to the plugin / API. We'll be keeping track of the user under the covers in the index.
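
For example, the backing index mapping might carry the owner alongside each interaction so the plugin can enforce that check on every read and write (index and field names here are illustrative, not final):

PUT /.plugins-conversational-interactions
{
    "mappings": {
        "properties": {
            "conversation_id": { "type": "keyword" },
            "user":            { "type": "keyword" },
            "input":           { "type": "text" },
            "prompt":          { "type": "text" },
            "response":        { "type": "text" },
            "time":            { "type": "date" }
        }
    }
}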

@ylwu-amzn ylwu-amzn added RFC Request For Comments from the OpenSearch Community and removed untriaged labels Jul 21, 2023
@austintlee
Collaborator Author

@macohen Thanks for the clarification and suggestions.

@austintlee
Collaborator Author

@jngz-es We agree that latency is important, and we will certainly look for ways to reduce unnecessary round-trips. That said, we have seen good results in some cases by using an LLM to both rewrite the query and summarize the response. We believe that some users will be okay with the additional latency for better results, and we want that to at least be an option.

@macohen

macohen commented Jul 21, 2023

In a conversational search application, I also think users may expect some latency compared with keyword search. Think time in conversations is acceptable in general, right?

@jonfritz

Thanks to the folks who have responded to the RFC we posted a couple of days ago for Conversations and Generative AI in OpenSearch. @jngz-es - I noticed that you recently posted an RFC on the same topic (#1151). I'm concerned that the overlap will cause confusion in the community and make it difficult to align our development.

We would love to find a process where we can work together. The process I’m used to in open source communities is to start with one RFC and then iterate and incorporate feedback, rather than creating multiple RFCs. This approach has some benefits - it drives alignment in the open, enables the community to share and iterate on ideas, and makes the end product easy to understand and use.

My suggestion is that we adopt this approach to work together on the RFC for conversational features in OpenSearch. We greatly appreciate the feedback you've already given on this original RFC, and we'd be happy to do the work to update it and continue iterating to incorporate any other technical suggestions you have. Let us know what you think! We are excited to find ways to work together to make OpenSearch the best platform for building conversational applications.

@dblock
Member

dblock commented Jul 21, 2023

Love seeing multiple proposals for similar outcomes! Personally I don't think there's anything wrong with two competing implementations that potentially converge into the best in class. Without diving too much into details, @austintlee and @jngz-es, what are the similarities and differences between the two proposals? What do you think is better in the one you didn't write?

@dylan-tong-aws

dylan-tong-aws commented Jul 21, 2023

Hi Austin,

I love how you intuitively architected your application, using the new building blocks like search pipelines, AI connectors, and vector database capabilities in the intended way. I expected that we would need to document this better.

With that said, we are working on the next iteration of the framework to simplify and improve the developer experience. Some concepts that we're considering:

  1. We'd like to introduce the notion of use case templates. Imagine a single declarative interface to describe a use case like semantic search or RAG, with prescriptive default configurations for search pipelines (e.g. RAG), prompt engineering routines, and AI service connectors, which developers can selectively reconfigure.

  2. We're exploring extending the idea in (1) to provide a no-code interface like LangFlow or Flowise, but scoped to OpenSearch-powered AI apps. You'll have the option of a no-code interface to configure and prime OpenSearch for your specific use case by modifying or generating your use case template.

What are your thoughts?

@austintlee
Collaborator Author

@macohen @msfroh Do you have any suggestions for how we might return answers generated by LLMs in the SearchResponse?

I think there are largely three approaches.

1/ The most "intrusive" approach would be to introduce a new field in the SearchResponse, e.g.

{
  "conversation": {
    "id": "...",
    "answer": "..."
   },
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3205,
      "relation": "eq"
    },
    "max_score": 3.641852,
    "hits": [
      {
        ...
       }
    ]
  }
}

2/ We can return it as one of the SearchHits by inserting the answer into the Hits array in the response processor (which means we would need to reconstruct the response object on the way out).
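
For illustration, option 2 might look something like this, with the generated answer injected as a synthetic first hit (the _index and _id values are invented placeholders):

"hits": [
  {
    "_index": "_conversational",
    "_id": "generated-answer",
    "_score": null,
    "_source": {
      "answer": "...",
      "conversation_id": "..."
    }
  },
  ...
]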

3/ A middle ground would be an extension ("ext") to the response that can be customized by Search Pipelines:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3205,
      "relation": "eq"
    },
    "max_score": 3.641852,
    "hits": [
      {
        ...
       }
    ]
  },
  "ext": {
    "conversation": {
      "id": "...",
      "answer": "..."
     }
  }
}

Would this option be made possible as part of perhaps this work - opensearch-project/OpenSearch#8635?

@msfroh

msfroh commented Jul 25, 2023

I really like the proposal and have a few questions / comments, mostly around the conversation memory:

Data store

we will use an OpenSearch index for it

Is this a hard requirement? It does feel like the most obvious place for it (since we're already running on OpenSearch, it adds no additional dependencies), but maybe someone might benefit from some other data store? Each conversation is an append-only log, if I'm understanding correctly, so another data store might be a good fit. (Of course, I hear that a lot of people like storing their append-only logs in OpenSearch indices, so maybe it really is the best option.)

Metadata

structure ConversationMetadata {
    conversationId: ConversationId
    numInteractions: Integer
    createTime: Timestamp
    lastInteractionTime: Timestamp
    name: String
}

If numInteractions and lastInteractionTime are left out of the explicit schema of the persisted entity, then ConversationMetadata is immutable, which is nice. I'll kind of contradict my comment above and say that they're "pretty cheap" to compute on the fly if the interactions are stored in an index. Maybe computing those fields dynamically at read time was mentioned in the RFC and I missed it -- I still have some brain fog from jet lag after vacation.
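
For what it's worth, both fields would amount to a single cheap aggregation over the interactions index at read time; a minimal sketch, assuming the field names from the RFC's Interaction structure:

GET /<interactions-index>/_search
{
    "size": 0,
    "query": { "term": { "conversationId": "<conversation-id>" } },
    "aggs": {
        "numInteractions": { "value_count": { "field": "interactionId" } },
        "lastInteractionTime": { "max": { "field": "time" } }
    }
}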

Other uses?

At the risk of opening a can of worms, I'm wondering if such a proposal could help for other "session-based" search refinements. I'm thinking of an e-commerce application where someone searches for "black shoes", doesn't click on any search results, and then searches for "nike basketball shoes" -- you may want to rank the black shoes higher, on the assumption that the two queries are related.

If you include a user identifier in the conversation metadata, the system could provide a more personalized experience based on prior conversations with the user (subject to the usual privacy concerns where you would need to let the user delete some or all conversations). Probably out of scope, though.

There's been some discussion around interaction logging, incorporating the user's "post-search" actions (see opensearch-project/OpenSearch#4619), which feels like it overlaps a bit, though a "one size fits all" solution probably wouldn't be ideal. Still, I'm wondering if there's some opportunity for reuse or at least sharing lessons learned.

@jngz-es
Collaborator

jngz-es commented Jul 26, 2023

Compared with RFC-1151, the common part is a new plugin to store chat history.

The differences from RFC-1151 are:

  1. Having generic conversation APIs to support building all conversational applications, like conversational search, chatbots, etc.
  2. Using ReAct once to get results, so the latency could be more under control; the search pipelines can be used as a tool in ReAct.

Basically, I don't see major conflicts between these two RFCs from the implementation perspective; we can have both. We can have a new conversation plugin to store chat history and, meanwhile, provide a chat API for applications. We can also have a new ML processor to run conversation/ml-commons APIs in search pipelines. Users can build conversational search either way.

@jonfritz

@jngz-es thanks for sharing this; I'm excited to get more into the details in the GenAI meeting on Friday. I would encourage the community to have one way to build a conversational application, unless we see a true need for multiple approaches. It will make the developer experience easier to learn for users interested in building applications.

From proposal #1151, it seems like it could perhaps be split (and renamed) to make the idea more crisp. For items that relate to building conversational search, we can use the comments on #1150 and iterate on that RFC to create the approach. The big, net-new question in #1151 seems to be whether OpenSearch should add the ability to create multi-agent architectures (in a similar direction to what LangChain does). I think this warrants a deeper discussion, as I wonder whether OpenSearch should incorporate this versus having customers do it in their application stack, letting OpenSearch focus on a different set of primitives. By repurposing #1151 (and renaming it to channel this theme), I think we'd be able to more crisply outline each area and theme in the RFCs. Thoughts?

@jngz-es
Collaborator

jngz-es commented Jul 26, 2023

@jonfritz I agree we should have one way to build a conversational application, and I believe conversational search is one of those applications. It looks like #1150 is specific to conversational search; what about other applications like chatbots? If customers want to build chatbots on OpenSearch, should we provide another framework to support that? I don't think so, as we should have one way to build conversational applications. What do you think?

@jonfritz

@jngz-es clarifying question - how do you define a "chatbot", and how is that different from a conversational search interaction? From a customer perspective, I see customers wanting a natural-language way to interact with their data stored in OpenSearch and to leverage the generative aspects of LLMs to enrich and summarize those interactions and better understand the search query submitted (e.g. rewrites). We use the term "conversational search" to describe this, and a customer application could be considered a "chatbot" because it's a conversation with a natural language application. What use cases for natural language/chat interactions do you think would make sense for OpenSearch outside of this pattern?

@HenryL27
Collaborator

Compared with #1151, another thing we'd like to have in common: prompt template management.
Pretty much every conversational application will need some kind of prompt engineering, and this presents a good way to manage it at scale, so we'd love to incorporate some version of it into #1150.

I'll flesh out what I'm imagining in a little more detail than I think either RFC gives.

  1. Prompt templates are essentially just f-strings, so let's not overcomplicate things.
  2. Template lifecycle: first, register the template like it's a model. Then various components (pipeline, ml-predict) will invoke it. The invocations are specified in either the configuration or the parameters of the components. Templates can be updated, as prompt engineering (being more art than science) should be highly iterable.
  3. The prompt invocation may be hidden from the user, so the user must know what placeholders to include. We can probably just publish this (or borrow an existing protocol if one exists).

example template: "Summarize this list of documents from opensearch: {doc_list}"
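
To sketch the lifecycle in item 2 concretely (the endpoints and parameter names below are invented for discussion, not a real API):

# Register the template, like registering a model
POST /_plugins/_ml/prompt_templates
{
    "name": "doc_summarizer",
    "template": "Summarize this list of documents from opensearch: {doc_list}"
}

# Invoke it by name from a component, filling the placeholder
POST /_plugins/_ml/models/<model-id>/_predict
{
    "prompt_template": "doc_summarizer",
    "template_parameters": { "doc_list": "<top search hits>" }
}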

Am I missing anything here?

@jngz-es
Collaborator

jngz-es commented Jul 27, 2023

@jonfritz the use case I imagine is an e-commerce customer using OpenSearch who wants to build a chatbot to improve the experience of customers on their e-commerce platform. It would be easy for an OpenSearch user to build a chatbot if we could support conversation-based application building.

@jngz-es
Collaborator

jngz-es commented Jul 27, 2023

@HenryL27 I agree. Actually, whether we only support conversational search or other conversational applications as well, we probably need something similar to LangChain as a framework to support building conversational applications, including conversational search. So from the implementation perspective, I don't see major conflicts.

@jonfritz

@jngz-es interesting idea. I'm more interested in the specifics of how you see this chatbot being different from a conversational search interaction, though. Can you share a more detailed vision of what an e-commerce chatbot would do (e.g. what questions or commands it would respond to, and with what information)? FWIW - for me, it feels like a general chatbot application platform is outside the scope of how most customers would want to use OpenSearch. An arbitrary chat application (e.g. one that generates poetry) that's decoupled from the core OpenSearch purpose (accessing unstructured data) may be best suited for a different application stack. On the other hand, conversational search is more closely tied to OpenSearch, because it's a different way for customers to interact with their data on the platform (through natural language search queries). I'd love to learn more about what your customers are asking for (and get into the details of a "chatbot"), and whether they want to build these types of apps in OpenSearch versus using other methods - it'll be a good discussion for Friday's meeting.

@jngz-es
Collaborator

jngz-es commented Jul 27, 2023

@jonfritz a chatbot could not only provide conversational search results but also improve the entire shopping experience from different perspectives. On top of search results, customers could ask questions about product comparisons, coupon combination recommendations, product bundle discounts, return/refund policies, etc. - basically anything covered by the specific knowledge stored in OpenSearch.

HenryL27 pushed a commit to HenryL27/ml-commons that referenced this issue Sep 7, 2023
Use Search Pipeline processors, Remote Inference and HttpConnector to enable Retrieval Augmented Generation (RAG) (opensearch-project#1195)
dhrubo-os pushed a commit that referenced this issue Sep 7, 2023
* Conversational Memory for GenAI Apps (#1196)
* Feature/conversation memory feature flag (#1271)
* Use Search Pipeline processors, Remote Inference and HttpConnector to enable Retrieval Augmented Generation (RAG) (#1195)
* [Feature] Add Retrieval Augmented Generation search processors (#1275)
* Allow RAG pipeline feature flag to be enabled and disabled dynamically (#1293)
opensearch-trigger-bot bot pushed a commit that referenced this issue Sep 7, 2023 (cherry picked from commit 1112612; same change set as above)
dhrubo-os pushed a commit that referenced this issue Sep 7, 2023 (cherry picked from commit 1112612; same change set as above)
@DarshitChanpura
Member

@austintlee Should this issue be moved to 2.11?

@austintlee
Collaborator Author

Let me just quickly highlight what is being released in 2.10.

  1. A new CRUD API for conversational memory. You can create and store "conversations" and "interactions" in an OpenSearch cluster (see the sketch right after this list).
    • Access control on conversations and interactions is enforced at the conversation level.
    • Currently, we only support "private" mode, meaning access to a conversation is restricted to its owner/creator.
  2. A new search processor that performs Retrieval Augmented Generation by combining search results, a remote inference service (e.g. OpenAI), and conversational memory.
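
For those who want to poke at the memory API right away, here is a minimal sketch of the CRUD calls as exposed in the experimental release; paths, field names, and the example values below may still change before GA, so treat them as indicative rather than authoritative:

# Create a conversation; the response returns a conversation_id
POST /_plugins/_ml/memory/conversation
{
  "name": "demo-conversation"
}

# Record one question/answer exchange (an "interaction") in that conversation
POST /_plugins/_ml/memory/conversation/<conversation_id>
{
  "input": "When was Abraham Lincoln born?",
  "prompt_template": "...",
  "response": "Abraham Lincoln was born on February 12, 1809.",
  "origin": "demo_pipeline"
}

# Retrieve the interactions stored in a conversation
GET /_plugins/_ml/memory/conversation/<conversation_id>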

So, most of what we described above in the RFC should be coming out in 2.10 as an experimental feature. It is being made available via the ml-commons plugin, so it should be fairly easy for people to try out. We will publish a tutorial on how to use this feature alongside the release.
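
For reference, wiring the new processor into a search pipeline looks roughly like the following. This is a sketch against the experimental release (the tutorial will have the authoritative version), and the model_id must point to a remote model registered through the connector framework:

PUT /_search/pipeline/demo_pipeline
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "demo_rag",
        "description": "RAG over the demo index using a remote LLM",
        "model_id": "<remote model id>",
        "context_field_list": ["text"]
      }
    }
  ]
}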

Our work is not done. We want to make sure this feature goes GA by 2.11, and we have several improvements in mind. We are excited to make this available in 2.10 and look forward to feedback and suggestions. There are a lot of interesting things people are doing in the RAG space, and we would love to work with the community to bring these ideas to OpenSearch!

@dylan-tong-aws

@austintlee, can you clarify the purpose of this query block in the example that you provided in the RFC?

"ext": {
"question_answering_parameters": {
"question": "When was Abraham Lincoln born?"
},

It's not clear why it repeats the query context:
"query_text": "When was Abraham Lincoln born?",

Reference:

GET wiki-simple-paras/_search?search_pipeline=convo_qa_pipeline
{
  "_source": ["title", "text"],
  "query": {
    "neural": {
      "text_vector": {
        "query_text": "When was Abraham Lincoln born?",
        "k": 10,
        "model_id": ""
      }
    }
  },
  "ext": {
    "question_answering_parameters": {
      "question": "When was Abraham Lincoln born?"
    },
    "conversation": {
      "id": "...",
      "prompt": "..."
    }
  }
}

@austintlee
Collaborator Author

austintlee commented Sep 14, 2023

@dylan-tong-aws

Oftentimes, you may want to customize your query to OpenSearch (e.g. hybrid search) and feed the results as additional context to an LLM, so the current interface lets applications construct the OpenSearch query and the LLM question as two separate inputs.

In trying to keep the example simple, I may have made it a bit confusing since it repeats the same question twice. But say you want to ask a follow-up question - "when did he die?" In that case, you wouldn't want to pass the question as-is to OpenSearch, since it won't know what you mean by "he"; the LLM, however, will figure it out from the chat history.

Using the 2.10 Release Candidate, I made some sample queries to demonstrate the point:

Query 1 (BM25 + KNN)

POST demo/_search?size=5&search_pipeline=demo_pipeline
{
  "_source": ["title", "text"], 
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text":  "american presidents"
          }
        },
        {
          "neural": {
            "text_vector": {
              "query_text": "Was Abraham Lincoln a good politician?",
              "k": 10,
              "model_id": "<...>"
            }
          }
        }
      ],
      "boost": 1
    }
  },
  "ext": {
      "generative_qa_parameters": {
        "llm_model": "gpt-3.5-turbo",
        "llm_question": "Was Abraham Lincoln a good politician"
      }
  }
}

Query 2 (Term only)

POST demo/_search?size=5&search_pipeline=demo_pipeline
{
  "_source": ["title", "text"], 
  "query": {
    "hybrid": {
      "queries": [
        {
          "term": {
            "text": {
              "value": "president",
              "boost": 1
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "text": {
                    "value": "character",
                    "boost": 1
                  }
                }
              },
              {
                "term": {
                  "text": {
                    "value": "politician",
                    "boost": 1
                  }
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "boost": 1
    }
  },
  "ext": {
      "generative_qa_parameters": {
        "llm_model": "gpt-3.5-turbo",
        "llm_question": "Was Abraham Lincoln a good politician"
      }
  }
}

We can introduce question rewriting (when did he die -> when did Abraham Lincoln die), but this may require some new work in SearchQueryBuilder, maybe an extension similar to what neural search and hybrid search did (e.g. ConversationalSearchQueryBuilder).
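
To make the idea concrete, a hypothetical conversational query clause (neither the clause name nor the fields below exist today; they are made up purely for illustration) could collapse the two inputs into one:

POST demo/_search?search_pipeline=demo_pipeline
{
  "query": {
    "conversational": {
      "question": "When did he die?",
      "conversation_id": "<conversation id>",
      "model_id": "<embedding model id>"
    }
  }
}

A request processor would rewrite the question against the chat history (he -> Abraham Lincoln), run retrieval with the rewritten text, and then hand both the rewritten question and the retrieved context to the LLM.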

@dylan-tong-aws

@austintlee, so the query clause is the retriever part of the RAG workflow, correct? When a neural search query is used with this pipeline, the initial query will probably be redundant. Is the idea that subsequent queries like "when did he die" will be passed via "llm_question" while the neural search query keeps the original query context, like "what was Abraham Lincoln's life like?"

What controls do I have around history context? I see examples where you can provide a conversation (session) id. Can I dynamically specify the context history, like "last=N" exchanges?

Also, have you thought about extending the neural search interface so that we can avoid repeated questions in the query syntax?

@austintlee
Collaborator Author

Also, have you thought about extending the neural search interface so that we can avoid repeated questions in the query syntax?

Yes, we want to tackle this in the next iteration, and it will simplify the experience. I think the confusion here comes from the fact that you currently have to enter each question twice when it doesn't have to be that way. As I stated above, I am considering a new search query type that gives the user the flexibility to ask one question, or one question plus an OpenSearch query (I gave two examples of this above).
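
On the history controls: a follow-up query carries the conversation id in the ext block, and we are adding knobs for how much history and context to include, e.g. an interaction_size parameter that caps the number of past exchanges pulled from conversational memory (effectively the "last=N" control you asked about). Here is a sketch; these parameter names come from the experimental iterations and may change before GA:

POST demo/_search?size=5&search_pipeline=demo_pipeline
{
  "_source": ["title", "text"],
  "query": {
    "neural": {
      "text_vector": {
        "query_text": "Abraham Lincoln death",
        "k": 10,
        "model_id": "<...>"
      }
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_model": "gpt-3.5-turbo",
      "llm_question": "When did he die?",
      "conversation_id": "<conversation id>",
      "interaction_size": 5
    }
  }
}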

HenryL27 added a commit to HenryL27/ml-commons that referenced this issue Oct 3, 2023
* Conversational Memory for GenAI Apps (opensearch-project#1196)
* Feature/conversation memory feature flag (opensearch-project#1271)
* Use Search Pipeline processors, Remote Inference and HttpConnector to enable Retrieval Augmented Generation (RAG) (opensearch-project#1195)
* [Feature] Add Retrieval Augmented Generation search processors (opensearch-project#1275)
* Allow RAG pipeline feature flag to be enabled and disabled dynamically (opensearch-project#1293)
HenryL27 added a commit to HenryL27/ml-commons that referenced this issue Oct 3, 2023
* Conversational Memory for GenAI Apps (opensearch-project#1196)
* Feature/conversation memory feature flag (opensearch-project#1271)
* Use Search Pipeline processors, Remote Inference and HttpConnector to enable Retrieval Augmented Generation (RAG) (opensearch-project#1195)
* [Feature] Add Retrieval Augmented Generation search processors (opensearch-project#1275)
* Allow RAG pipeline feature flag to be enabled and disabled dynamically (opensearch-project#1293)
dhrubo-os pushed a commit that referenced this issue Oct 4, 2023
* Feature/conversation backport to 2.x (#1286)
* Conversational Memory for GenAI Apps (#1196)
* Feature/conversation memory feature flag (#1271)
* Use Search Pipeline processors, Remote Inference and HttpConnector to enable Retrieval Augmented Generation (RAG) (#1195)
* [Feature] Add Retrieval Augmented Generation search processors (#1275)
* Allow RAG pipeline feature flag to be enabled and disabled dynamically (#1293)
* fix http library version
@khoaisohd

Hi @austintlee, since conversational memory will grow over time with the number of conversational searches customers make, do we have any idea about conversation memory scalability?

@HenryL27
Collaborator

HenryL27 commented Jan 9, 2024

@ylwu-amzn how's the appsec review going? Are we gonna hit GA for 2.12? thx

@mashah

mashah commented Jan 9, 2024

Folks,

We know that the conversational memory feature is experimental because of internal AWS processes needed to test the integrity of the feature.

Can we get a status update on how the review and testing process is going? I believe these features were scheduled to go GA in 2.12. Since 2.12 is now delayed until 20 Feb, I am assuming we have almost cleared the hurdle.

@sean-zheng-amazon @ylwu-amzn

@sean-zheng-amazon
Contributor

@mashah yes, we are on track to GA the feature in 2.12. The pentest is scheduled to start 23 Jan and finish by 31 Jan. That still leaves a couple of weeks to fix any security issues caught in the test.

@ylwu-amzn
Collaborator

Yes, hopefully the pentest doesn't surface many issues. If it finds any, we will share them with your team.

@mashah

mashah commented Jan 19, 2024

Is the pentest for the conversational memory features still on track to start next week, on 23 Jan?

@sean-zheng-amazon
Contributor

yes we are on track

@dagneyb dagneyb added the v2.12.0 Issues targeting release v2.12.0 label Jan 22, 2024
@dblock
Member

dblock commented Mar 12, 2024

Can this be closed?

@austintlee
Collaborator Author

It's GA in 2.12. Closing.
