Commit
[connectors] - refactor(google_drive): streamline `file.ts` (#6210)
* [connectors/google_drive] - refactor: streamline file processing for Google Drive
  - Extract Google document export logic into a separate function to improve modularity
  - Implement generic file download handling based on MIME type with dedicated functions
  - Introduce specialized functions for handling PDF, DOCX, PPTX, and CSV file conversions
  - Ensure proper error handling and logging across different file processing scenarios

* [connectors] - refactor: remove PDF.js dependency and switch to pdftotext for PDF handling
  - Removed the @types/pdfjs-dist and pdfjs-dist packages from dependencies, as they are no longer used
  - Extracting PDF text is now handled using the pdftotext command-line utility instead of PDF.js
  - Updated the `handlePdfFile` function to use the new dpdf2text method for PDF text extraction
  - Created a new dpdf2text utility function that spawns a child process to run pdftotext and handle its output
  - Removed the deprecated createPdfTextStream and extractTextFromPDF functions, as they were part of the old PDF.js-based implementation
  - Added error handling for the pdftotext child process in dpdf2text, resolving the promise based on the exit code

  [connectors] - chore: clean up dependencies and mark some as peers
  - Marked multiple dependencies as "peer", presumably to align with package requirements and ensure correct installation in consumer packages
  - Performed cleanup in package-lock.json to reflect the removal of the pdfjs-dist package and its types
  - Removed unused path2d package from package-lock.json, as it was likely related to the previous PDF handling method

* [google_drive/temporal] - fix: return empty section on csv file upsert success
  - Updated the handleCsvFile function to return an "empty" CoreAPIDataSourceDocumentSection upon successful table upsert instead of null
  - This change allows distinguishing between failed and successful table upsert operations for CSV files

* [connectors] - refactor: update logger import path in google_drive connector
  - Replace the direct pino logger import with a custom logger module path
  - Ensure consistency in logger usage across the connectors module

  [connectors] - fix: add newline at end of package-lock.json
  - Conform to standard JSON formatting by ensuring a newline at the end of package-lock.json

* [connectors/google_drive] - refactor: rename functions to reflect broader export scope
  - Renamed handleGoogleDocExport to handleGoogleDriveExport to generalize export function usage beyond Google Docs
  - Changed handleFileDownload to handleFileExport to match the renaming convention and clarify functionality

* [types] - refactor: remove cheerio dependency from package
  - Cheerio and its related dependencies have been removed, reducing the package complexity and potential security issues
  - This change likely indicates a shift away from HTML/XML parsing within this scope

  [front] - refactor: synchronize package-lock with types scope changes
  - Alignment of front end package-lock.json with the recent removal of cheerio in the types scope

* fix: lint/format

* [connectors] - feature: improve handling of unsupported mimeTypes for text extraction
  - Added a check to skip text extraction for files with mimeTypes not supported by the text extraction service
  - Enhanced logging for cases when an unexpected mimeType is encountered to aid in debugging issues

* fix: lint/format

---------

Co-authored-by: Jules <[email protected]>
Co-authored-by: Flavien David <[email protected]>
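
The MIME-type dispatch, the skip for unsupported types, and the CSV "empty section versus null" convention described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from `file.ts`: the `Section` type is a simplified, assumed stand-in for CoreAPIDataSourceDocumentSection, and `handleFileExportSketch`, `handleCsvSketch`, and `UpsertResult` are hypothetical names. Text extraction and the table upsert are assumed to have happened already and are passed in as plain values.

```typescript
// Simplified, assumed stand-in for CoreAPIDataSourceDocumentSection.
type Section = {
  prefix: string | null;
  content: string | null;
  sections: Section[];
};

// Hypothetical outcome of a table upsert for a CSV file.
type UpsertResult = { ok: boolean };

const PDF = "application/pdf";
const DOCX =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
const PPTX =
  "application/vnd.openxmlformats-officedocument.presentationml.presentation";
const CSV = "text/csv";

// An "empty" section: a valid object with no content.
function emptySection(): Section {
  return { prefix: null, content: null, sections: [] };
}

function textSection(text: string): Section {
  return { prefix: null, content: text, sections: [] };
}

// CSV files are upserted as tables, so there is no text to return.
// Success yields an empty section; failure yields null, which lets
// the caller tell the two outcomes apart.
function handleCsvSketch(upsert: UpsertResult): Section | null {
  return upsert.ok ? emptySection() : null;
}

// Dispatch on MIME type; unsupported types are logged and skipped.
function handleFileExportSketch(
  mimeType: string,
  extractedText: string,
  upsert: UpsertResult = { ok: true }
): Section | null {
  switch (mimeType) {
    case PDF:
    case DOCX:
    case PPTX:
      return textSection(extractedText);
    case CSV:
      return handleCsvSketch(upsert);
    default:
      console.warn(
        `Skipping text extraction for unexpected mimeType: ${mimeType}`
      );
      return null;
  }
}
```

In this sketch, `handleFileExportSketch("text/csv", "", { ok: false })` returns `null`, while the same call with `{ ok: true }` returns an empty section object, so a caller can check for `null` to detect a failed upsert.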