Refactor/loaders #1116
Merged
collindutter force-pushed the refactor/artifacts branch from 164ffde to 3685f77 (August 30, 2024 15:27)
collindutter force-pushed the refactor/loaders branch from 7e97c48 to 63469be (August 30, 2024 15:28)
collindutter force-pushed the refactor/artifacts branch from 3685f77 to 15bb112 (September 3, 2024 19:23)
collindutter force-pushed the refactor/loaders branch from b04b2a8 to 6c5062d (September 3, 2024 19:23)
collindutter force-pushed the refactor/artifacts branch from 15bb112 to af07071 (September 4, 2024 00:04)
collindutter force-pushed the refactor/loaders branch 2 times, most recently from a568189 to ded5cd9 (September 4, 2024 17:05)
collindutter force-pushed the refactor/artifacts branch 11 times, most recently from 96b9752 to 9669ba8 (September 4, 2024 23:01)
collindutter force-pushed the refactor/loaders branch from 87ea7fa to cd88b5c (September 4, 2024 23:30)
collindutter force-pushed the refactor/artifacts branch from 9669ba8 to 2904e50 (September 4, 2024 23:33)
collindutter force-pushed the refactor/loaders branch from cd88b5c to 62c7f7a (September 4, 2024 23:58)
collindutter force-pushed the refactor/artifacts branch from 2904e50 to d6eb8ac (September 5, 2024 15:44)
collindutter force-pushed the refactor/loaders branch 3 times, most recently from 019ebbd to 62b6e17 (September 5, 2024 15:45)
collindutter force-pushed the refactor/artifacts branch from d6eb8ac to 8e8b8e3 (September 5, 2024 18:21)
collindutter force-pushed the refactor/loaders branch from 6b3bc8b to bb1ac3d (September 5, 2024 18:21)
collindutter force-pushed the refactor/artifacts branch 2 times, most recently from 57fa0f7 to 31ba036 (September 5, 2024 20:55)
collindutter force-pushed the refactor/loaders branch from 0b731fc to a087355 (September 5, 2024 21:00)
collindutter force-pushed the refactor/loaders branch from 3c5374a to d940a98 (September 17, 2024 16:49)
collindutter force-pushed the refactor/loaders branch 4 times, most recently from 13da4e8 to 506fe6a (October 1, 2024 17:05)
collindutter force-pushed the refactor/loaders branch 6 times, most recently from d9c8c86 to 3a0e653 (October 3, 2024 17:55)
dylanholmes previously approved these changes (Oct 3, 2024)
collindutter force-pushed the refactor/loaders branch from e877f21 to 0418e7c (October 4, 2024 20:28)
dylanholmes approved these changes (Oct 4, 2024)
Describe your changes
Added
- `BaseFileLoader` for Loaders that load from a path.
- `BaseLoader.fetch()` method for fetching data from a source.
- `BaseLoader.parse()` method for parsing fetched data.
- `BaseFileManager.encoding` to specify the encoding when loading and saving files.
- `BaseWebScraperDriver.extract_page()` method for extracting data from an already scraped web page.
- `TextLoaderRetrievalRagModule.chunker` for specifying the chunking strategy.
- `file_utils.get_mime_type` utility for getting the MIME type of a file.

Changed
- Removed `BaseFileManager.default_loader` and `BaseFileManager.loaders`.
- Removed `fileutils.load_file` and `fileutils.load_files`.
- Removed `loaders-dataframe` and `loaders-audio` extras as they are no longer needed.
- `TextLoader`, `PdfLoader`, `ImageLoader`, and `AudioLoader` now take a `str | PathLike` instead of `bytes`.
- Removed `DataframeLoader`.
- `LocalFileManagerDriver.workdir` is now optional.
- `filetype` is now a core dependency.
- `FileManagerTool` now uses `filetype` for more accurate file type detection.
- `BaseFileLoader.load_file()` will now return either a `TextArtifact` or a `BlobArtifact`, depending on whether `BaseFileManager.encoding` is set.

The purpose of this PR is to clean up the Loader interface and define what Loaders are for.
Loaders fetch data from a source and parse it into Artifacts. Loaders do not chunk data; that is the role of Chunkers.
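The split between fetching, parsing, and chunking can be sketched roughly as below. This is only an illustration of the idea, not Griptape's implementation: `SketchLoader`, `SketchTextArtifact`, `InMemoryTextLoader`, and `chunk_by_words` are hypothetical names invented for this example, and the real `BaseLoader` signatures may differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

S = TypeVar("S")  # source type, e.g. a path or a URL
F = TypeVar("F")  # raw fetched data type
A = TypeVar("A")  # parsed artifact type


@dataclass
class SketchTextArtifact:
    """Hypothetical stand-in for a text Artifact (not Griptape's class)."""

    value: str


class SketchLoader(ABC, Generic[S, F, A]):
    """Hypothetical base class: load() is fetch() followed by parse()."""

    @abstractmethod
    def fetch(self, source: S) -> F:
        """Retrieve raw data from the source."""

    @abstractmethod
    def parse(self, data: F) -> A:
        """Turn raw data into an artifact."""

    def load(self, source: S) -> A:
        return self.parse(self.fetch(source))


class InMemoryTextLoader(SketchLoader[str, bytes, SketchTextArtifact]):
    """Toy loader that fetches bytes from a dict and decodes them to text."""

    def __init__(self, store: dict[str, bytes], encoding: str = "utf-8") -> None:
        self.store = store
        self.encoding = encoding

    def fetch(self, source: str) -> bytes:
        return self.store[source]

    def parse(self, data: bytes) -> SketchTextArtifact:
        return SketchTextArtifact(data.decode(self.encoding))


def chunk_by_words(artifact: SketchTextArtifact, max_words: int) -> list[SketchTextArtifact]:
    """Chunking lives outside the loader: split an artifact into word-limited pieces."""
    words = artifact.value.split()
    return [
        SketchTextArtifact(" ".join(words[i : i + max_words]))
        for i in range(0, len(words), max_words)
    ]
```

In this sketch the loader never decides how text is split; a caller loads an artifact first and then passes it to a separate chunking step.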
We provide four top-level Loaders that load from a variety of sources.
BaseFileLoader then has subclasses that provide file-type-specific parsing logic.
In a future PR, I'd like this parsing logic to live in a new class of Driver, File Parser Driver(?), instead of Loaders.
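As a rough illustration of the encoding-dependent return type mentioned in the change list, the sketch below returns a text artifact when an encoding is given and a blob artifact otherwise. It is an assumption about the behavior based only on the description above; `load_file`, `SketchTextArtifact`, and `SketchBlobArtifact` here are hypothetical stand-ins, not the actual Griptape classes or signatures.

```python
from __future__ import annotations

import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchTextArtifact:
    """Hypothetical text artifact holding decoded content."""

    value: str


@dataclass
class SketchBlobArtifact:
    """Hypothetical blob artifact holding raw bytes."""

    value: bytes


def load_file(
    path: str | os.PathLike, encoding: str | None = None
) -> SketchTextArtifact | SketchBlobArtifact:
    """Read a file from a path; decode it only if an encoding is set."""
    data = Path(path).read_bytes()
    if encoding is None:
        # No encoding configured: hand back the raw bytes untouched.
        return SketchBlobArtifact(data)
    # Encoding configured: decode to text.
    return SketchTextArtifact(data.decode(encoding))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "note.txt"
        sample.write_bytes(b"hello")
        print(load_file(sample))                    # blob artifact with raw bytes
        print(load_file(sample, encoding="utf-8"))  # text artifact with decoded text
```

Note that the function accepts either a `str` or a path-like object, mirroring the `str | PathLike` change described for the file-based Loaders.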
Issue ticket number and link
Closes #1102
📚 Documentation preview 📚: https://griptape--1116.org.readthedocs.build//1116/