What is the difference between LlamaIndex text splitters and node parsers? #13271

SinghCoder · 2024-05-04T15:49:30Z

Aren't they both the same thing? Given a document, each one chunks it down into nodes.

@dosu · 2024-05-04T16:05:25Z

dosubot[bot]
bot May 4, 2024

LlamaIndex text splitters and node parsers overlap, but they work at different levels of document processing. A text splitter focuses on dividing text into smaller segments: it takes a string and returns a list of smaller strings, without directly considering the document's metadata. The base interface leaves the splitting logic abstract, so each subclass has to supply its own implementation.
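To make the "string in, list of strings out" contract concrete, here is a minimal pure-Python sketch. It does not use LlamaIndex itself: `ToyTextSplitter` and `FixedWordSplitter` are hypothetical names invented for illustration, and splitting on a fixed word count is an assumed rule, not the library's algorithm.

```python
from abc import ABC, abstractmethod


class ToyTextSplitter(ABC):
    """Hypothetical stand-in for a text splitter: one string in, list of strings out."""

    @abstractmethod
    def split_text(self, text: str) -> list[str]:
        """Subclasses decide how the text is actually divided."""


class FixedWordSplitter(ToyTextSplitter):
    """Illustrative subclass that cuts the text every `words_per_chunk` words."""

    def __init__(self, words_per_chunk: int) -> None:
        if words_per_chunk <= 0:
            raise ValueError("words_per_chunk must be positive")
        self.words_per_chunk = words_per_chunk

    def split_text(self, text: str) -> list[str]:
        words = text.split()
        step = self.words_per_chunk
        # Group consecutive words into chunks of at most `step` words.
        return [" ".join(words[i:i + step]) for i in range(0, len(words), step)]


chunks = FixedWordSplitter(words_per_chunk=3).split_text(
    "one two three four five six seven"
)
print(chunks)
```

Note that nothing here knows about documents, metadata, or relationships between chunks; the result is just a list of strings.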

Node parsers, on the other hand, parse documents into nodes. That involves more than splitting text: a node parser also carries the document's metadata over to each node, records relationships between nodes (for example, which document a node came from and which nodes precede and follow it), and can take the document's structure into account. The process is therefore more involved than plain text splitting.
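A rough sketch of what "parsing into nodes" adds on top of splitting is shown below. It is a simplified model in plain Python, not LlamaIndex code: `ToyDocument`, `ToyNode`, and `parse_nodes` are hypothetical names, and the fixed word-count split is only a placeholder for whatever splitting rule a real parser uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyDocument:
    """Hypothetical document: raw text plus a metadata dictionary."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ToyNode:
    """Hypothetical node: a chunk plus metadata and links to its neighbours."""
    node_id: str
    text: str
    metadata: dict
    source_id: str
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def parse_nodes(doc_id: str, doc: ToyDocument, words_per_chunk: int = 3) -> list[ToyNode]:
    """Split the document, then wrap each chunk in a node with metadata and links."""
    words = doc.text.split()
    pieces = [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
    nodes = [
        ToyNode(
            node_id=f"{doc_id}-{i}",
            text=piece,
            metadata=dict(doc.metadata),  # each node gets its own copy
            source_id=doc_id,
        )
        for i, piece in enumerate(pieces)
    ]
    # Record previous/next relationships between adjacent nodes.
    for prev_node, next_node in zip(nodes, nodes[1:]):
        prev_node.next_id = next_node.node_id
        next_node.prev_id = prev_node.node_id
    return nodes


doc = ToyDocument("one two three four five", {"file_name": "example.txt"})
nodes = parse_nodes("doc0", doc)
for node in nodes:
    print(node)
```

The splitting step is only one line of the function; the rest is bookkeeping that a bare text splitter does not do.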

The key difference is scope: a text splitter only breaks text into smaller pieces, while a node parser turns a document into nodes and also handles structure, metadata, and relationships. The two are not mutually exclusive, since in the library's class hierarchy text splitters are themselves a kind of node parser. The MetadataAwareTextSplitter subclass sits between the two: it takes the metadata attached to the text into account when splitting, so that each chunk leaves room for that metadata.
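One way to picture metadata-aware splitting is as a smaller chunk budget: whatever the metadata consumes is subtracted before the text is split. The sketch below illustrates that idea under loud assumptions: `split_words` and `toy_split_metadata_aware` are hypothetical helpers, and whitespace-separated words stand in for tokens. It is not the library's implementation.

```python
def split_words(text: str, max_words: int) -> list[str]:
    """Plain splitting: chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_split_metadata_aware(text: str, metadata_str: str, chunk_size: int) -> list[str]:
    """Reserve part of the chunk budget for the metadata, then split the text."""
    metadata_words = len(metadata_str.split())
    effective_size = chunk_size - metadata_words
    if effective_size <= 0:
        raise ValueError("metadata is too long for the requested chunk size")
    return split_words(text, effective_size)


text = "a b c d e f"
plain = split_words(text, 4)
aware = toy_split_metadata_aware(text, "file_name: example.txt", 4)
print(plain)
print(aware)
```

With the same nominal chunk size, the metadata-aware call produces smaller text chunks because two of the four "slots" are reserved for the metadata string.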

jay-joshy Jun 3, 2024

Is it possible to combine text splitters and node parsers?
For example, I have a set of PDF documents, some of which include tables. I've loaded the data using LlamaParse.
I like the node parser MarkdownElementNodeParser(), specifically for its ability to extract tables, but I find that its chunks of the main text are otherwise too large and 'pollute' the context I want to send to another model with too much extra information.

In contrast, the regular text splitters (e.g. SentenceSplitter) can be configured to produce smaller, more granular chunks of context, but they cannot extract a table accurately...

Can these two approaches be combined? @dosu
