Skip to content

Commit

Permalink
Staging to main - bug fixes (#959)
Browse files Browse the repository at this point in the history
* Update README.md

* Dev to staging  (#859)

* processing count updated on cancel

* format fixes

* remove whitespace for enviroment variable which due to an error "xxx may not contain whitespace" (#707)

* updated disconnected nodes

* updated disconnected nodes

* fix: Processed count update on failed condition

* added disconnected and up nodes

* removed __Entity__ labels

* removed graph_object

* removed graph object in the function

* resetting the alert message on success scenario

* Modified queries

* populate graph schema

* not clearing the password when there is error scenario

* fixed the vector index loading issue

* fix: empty credentials payload for recreate vector index api

* chatbot status (#676)

* chatbot status

* connection status check for ASK button

* refresh disable check

* review comment resolved

* format fixes

* added properties and modified to entity labels

* Post processing call after all files completion (#716)

* Dev To STAGING (#532)

* format fixes and graph schema indication fix

* Update README.md

* added chat modes variable in env updated the readme

* spell fix

* added the chat mode in env table

* added the logos

* fixed the overflow issues

* removed the extra fix

* Fixed specific scenario  "when the text from schema closes it should reopen the previous modal"

* readme changes

* removed dev console logs

* added new retrieval query (#533)

* format fixes and tab rendering fix

* fixed the setting modal reopen issue

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>

* Dev (#535)

* format fixes and graph schema indication fix

* Update README.md

* added chat modes variable in env updated the readme

* spell fix

* added the chat mode in env table

* added the logos

* fixed the overflow issues

* removed the extra fix

* Fixed specific scenario  "when the text from schema closes it should reopen the previous modal"

* readme changes

* removed dev console logs

* added new retrieval query (#533)

* format fixes and tab rendering fix

* fixed the setting modal reopen issue

---------

Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>

* Dev (#537)

* format fixes and graph schema indication fix

* Update README.md

* added chat modes variable in env updated the readme

* spell fix

* added the chat mode in env table

* added the logos

* fixed the overflow issues

* removed the extra fix

* Fixed specific scenario  "when the text from schema closes it should reopen the previous modal"

* readme changes

* removed dev console logs

* added new retrieval query (#533)

* format fixes and tab rendering fix

* fixed the setting modal reopen issue

---------

Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>

* Fix typo: correct 'josn_obj' to 'json_obj' (#697)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: Pravesh Kumar <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: Ajay Meena <[email protected]>
Co-authored-by: Morgan Senechal <[email protected]>
Co-authored-by: karanchellani <[email protected]>

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket uplaod

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructuredfiles to gcs (#455)

* Mofified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <[email protected]>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket uplaod

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructuredfiles to gcs (#455)

* Mofified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <[email protected]>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: aashipandya <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: Ajay Meena <[email protected]>
Co-authored-by: Morgan Senechal <[email protected]>
Co-authored-by: karanchellani <[email protected]>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket uplaod

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructuredfiles to gcs (#455)

* Mofified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <[email protected]>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: aashipandya <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: Ajay Meena <[email protected]>
Co-authored-by: Morgan Senechal <[email protected]>
Co-authored-by: karanchellani <[email protected]>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket uplaod

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructuredfiles to gcs (#455)

* Mofified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unus…
  • Loading branch information
1 parent 4f743dd commit e5ca90a
Show file tree
Hide file tree
Showing 37 changed files with 793 additions and 586 deletions.
21 changes: 2 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,32 +31,14 @@ If you are using Neo4j Desktop, you will not be able to use the docker-compose b
### Local deployment
#### Running through docker-compose
By default only OpenAI and Diffbot are enabled since Gemini requires extra GCP configurations.
Accoroding to enviornment we are configuring the models which is indicated by VITE_LLM_MODELS_PROD variable we can configure model based on our need.
According to enviornment we are configuring the models which is indicated by VITE_LLM_MODELS_PROD variable we can configure model based on our need.
EX:
```env
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
```

if you only want OpenAI:
```env
VITE_LLM_MODELS_PROD="diffbot,openai-gpt-3.5,openai-gpt-4o"
```

Backend ENV
```env
OPENAI_API_KEY="your-openai-key"
```
if you only want Diffbot:
```env
VITE_LLM_MODELS_PROD="diffbot"
```

Backend ENV
```env
DIFFBOT_API_KEY="your-diffbot-key"
```

You can then run Docker Compose to build and start all components:
```bash
docker-compose up --build
Expand Down Expand Up @@ -89,6 +71,7 @@ VITE_CHAT_MODES=""
If however you want to specify the only vector mode or only graph mode you can do that by specifying the mode in the env:
```env
VITE_CHAT_MODES="vector,graph"
VITE_CHAT_MODES="vector,graph"
```

#### Running Backend and Frontend separately (dev environment)
Expand Down
5 changes: 3 additions & 2 deletions backend/example.env
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,9 @@ NEO4J_USER_AGENT=""
ENABLE_USER_AGENT = ""
LLM_MODEL_CONFIG_model_version=""
ENTITY_EMBEDDING="" True or False
DUPLICATE_SCORE_VALUE = ""
DUPLICATE_TEXT_DISTANCE = ""
DUPLICATE_SCORE_VALUE =0.97
DUPLICATE_TEXT_DISTANCE =3
DEFAULT_DIFFBOT_CHAT_MODEL="openai_gpt_4o" #whichever model specified here , need to add config for that model in below format)
#examples
LLM_MODEL_CONFIG_openai_gpt_3.5="gpt-3.5-turbo-0125,openai_api_key"
LLM_MODEL_CONFIG_openai_gpt_4o_mini="gpt-4o-mini-2024-07-18,openai_api_key"
Expand Down
51 changes: 28 additions & 23 deletions backend/score.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,24 +18,19 @@
from src.communities import create_communities
from src.neighbours import get_neighbour_nodes
import json
from typing import List, Mapping, Union
from typing import List
from starlette.middleware.sessions import SessionMiddleware
import google_auth_oauthlib.flow
from google.oauth2.credentials import Credentials
import os
from src.logger import CustomLogger
from datetime import datetime, timezone
import time
import gc
from Secweb import SecWeb
from Secweb.StrictTransportSecurity import HSTS
from Secweb.ContentSecurityPolicy import ContentSecurityPolicy
from Secweb.XContentTypeOptions import XContentTypeOptions
from Secweb.XFrameOptions import XFrame
from fastapi.middleware.gzip import GZipMiddleware
from src.ragas_eval import *
from starlette.types import ASGIApp, Message, Receive, Scope, Send
import gzip
from langchain_neo4j import Neo4jGraph

logger = CustomLogger()
Expand Down Expand Up @@ -493,11 +488,13 @@ async def connect(uri=Form(), userName=Form(), password=Form(), database=Form())
start = time.time()
graph = create_graph_database_connection(uri, userName, password, database)
result = await asyncio.to_thread(connection_check_and_get_vector_dimensions, graph, database)
gcs_file_cache = os.environ.get('GCS_FILE_CACHE')
end = time.time()
elapsed_time = end - start
json_obj = {'api_name':'connect','db_url':uri, 'userName':userName, 'database':database,'status':result, 'count':1, 'logging_time': formatted_time(datetime.now(timezone.utc)), 'elapsed_api_time':f'{elapsed_time:.2f}'}
logger.log_struct(json_obj, "INFO")
result['elapsed_api_time'] = f'{elapsed_time:.2f}'
result['gcs_file_cache'] = gcs_file_cache
return create_api_response('Success',data=result)
except Exception as e:
job_status = "Failed"
Expand Down Expand Up @@ -571,6 +568,8 @@ async def generate():
uri = url
if " " in url:
uri= url.replace(" ","+")
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
while True:
try:
if await request.is_disconnected():
Expand All @@ -579,8 +578,6 @@ async def generate():
# get the current status of document node

else:
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
if len(result) > 0:
status = json.dumps({'fileName':file_name,
Expand Down Expand Up @@ -968,22 +965,30 @@ async def fetch_chunktext(
gc.collect()


@app.post("/backend_connection_configuation")
async def backend_connection_configuation():
@app.post("/backend_connection_configuration")
async def backend_connection_configuration():
try:
graph = Neo4jGraph()
logging.info(f'login connection status of object: {graph}')
if graph is not None:
graph_connection = True
isURI = os.getenv('NEO4J_URI')
isUsername= os.getenv('NEO4J_USERNAME')
isDatabase= os.getenv('NEO4J_DATABASE')
isPassword= os.getenv('NEO4J_PASSWORD')
encoded_password = encode_password(isPassword)
graphDb_data_Access = graphDBdataAccess(graph)
gds_status = graphDb_data_Access.check_gds_version()
write_access = graphDb_data_Access.check_account_access(database=isDatabase)
return create_api_response('Success',message=f"Backend connection successful",data={'graph_connection':graph_connection,'uri':isURI,'user_name':isUsername,'database':isDatabase,'password':encoded_password,'gds_status':gds_status,'write_access':write_access})
uri = os.getenv('NEO4J_URI')
username= os.getenv('NEO4J_USERNAME')
database= os.getenv('NEO4J_DATABASE')
password= os.getenv('NEO4J_PASSWORD')
gcs_file_cache = os.environ.get('GCS_FILE_CACHE')
if all([uri, username, database, password]):
print(f'uri:{uri}, usrName:{username}, database :{database}, password: {password}')
graph = Neo4jGraph()
logging.info(f'login connection status of object: {graph}')
if graph is not None:
graph_connection = True
encoded_password = encode_password(password)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.connection_check_and_get_vector_dimensions(database)
result["graph_connection"] = graph_connection
result["uri"] = uri
result["user_name"] = username
result["database"] = database
result["password"] = encoded_password
result['gcs_file_cache'] = gcs_file_cache
return create_api_response('Success',message=f"Backend connection successful",data=result)
else:
graph_connection = False
return create_api_response('Success',message=f"Backend connection is not successful",data=graph_connection)
Expand Down
6 changes: 6 additions & 0 deletions backend/src/entities/source_node.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,12 @@ class sourceNode:
gcsBucketFolder:str=None
gcsProjectId:str=None
awsAccessKeyId:str=None
chunkNodeCount:int=None
chunkRelCount:int=None
entityNodeCount:int=None
entityEntityRelCount:int=None
communityNodeCount:int=None
communityRelCount:int=None
node_count:int=None
relationship_count:str=None
model:str=None
Expand Down
111 changes: 61 additions & 50 deletions backend/src/graphDB_dataAccess.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,14 +46,24 @@ def create_source_node(self, obj_source_node:sourceNode):
d.relationshipCount = $r_count, d.model= $model, d.gcsBucket=$gcs_bucket,
d.gcsBucketFolder= $gcs_bucket_folder, d.language= $language,d.gcsProjectId= $gcs_project_id,
d.is_cancelled=False, d.total_chunks=0, d.processed_chunk=0,
d.access_token=$access_token""",
d.access_token=$access_token,
d.chunkNodeCount=$chunkNodeCount,d.chunkRelCount=$chunkRelCount,
d.entityNodeCount=$entityNodeCount,d.entityEntityRelCount=$entityEntityRelCount,
d.communityNodeCount=$communityNodeCount,d.communityRelCount=$communityRelCount""",
{"fn":obj_source_node.file_name, "fs":obj_source_node.file_size, "ft":obj_source_node.file_type, "st":job_status,
"url":obj_source_node.url,
"awsacc_key_id":obj_source_node.awsAccessKeyId, "f_source":obj_source_node.file_source, "c_at":obj_source_node.created_at,
"u_at":obj_source_node.created_at, "pt":0, "e_message":'', "n_count":0, "r_count":0, "model":obj_source_node.model,
"gcs_bucket": obj_source_node.gcsBucket, "gcs_bucket_folder": obj_source_node.gcsBucketFolder,
"language":obj_source_node.language, "gcs_project_id":obj_source_node.gcsProjectId,
"access_token":obj_source_node.access_token})
"access_token":obj_source_node.access_token,
"chunkNodeCount":obj_source_node.chunkNodeCount,
"chunkRelCount":obj_source_node.chunkRelCount,
"entityNodeCount":obj_source_node.entityNodeCount,
"entityEntityRelCount":obj_source_node.entityEntityRelCount,
"communityNodeCount":obj_source_node.communityNodeCount,
"communityRelCount":obj_source_node.communityRelCount
})
except Exception as e:
error_message = str(e)
logging.info(f"error_message = {error_message}")
Expand Down Expand Up @@ -108,7 +118,7 @@ def update_source_node(self, obj_source_node:sourceNode):
self.graph.query(query,param)
except Exception as e:
error_message = str(e)
self.update_exception_db(self.file_name,error_message)
self.update_exception_db(self,self.file_name,error_message)
raise Exception(error_message)

def get_source_list(self):
Expand Down Expand Up @@ -463,51 +473,52 @@ def update_node_relationship_count(self,document_name):
param = {"document_name": document_name}
result = self.execute_query(NODEREL_COUNT_QUERY_WITHOUT_COMMUNITY, param)
response = {}
for record in result:
filename = record["filename"]
chunkNodeCount = record["chunkNodeCount"]
chunkRelCount = record["chunkRelCount"]
entityNodeCount = record["entityNodeCount"]
entityEntityRelCount = record["entityEntityRelCount"]
if (not document_name) and (community_flag):
communityNodeCount = record["communityNodeCount"]
communityRelCount = record["communityRelCount"]
else:
communityNodeCount = 0
communityRelCount = 0
nodeCount = int(chunkNodeCount) + int(entityNodeCount) + int(communityNodeCount)
relationshipCount = int(chunkRelCount) + int(entityEntityRelCount) + int(communityRelCount)
update_query = """
MATCH (d:Document {fileName: $filename})
SET d.chunkNodeCount = $chunkNodeCount,
d.chunkRelCount = $chunkRelCount,
d.entityNodeCount = $entityNodeCount,
d.entityEntityRelCount = $entityEntityRelCount,
d.communityNodeCount = $communityNodeCount,
d.communityRelCount = $communityRelCount,
d.nodeCount = $nodeCount,
d.relationshipCount = $relationshipCount
"""
self.execute_query(update_query,{
"filename": filename,
"chunkNodeCount": chunkNodeCount,
"chunkRelCount": chunkRelCount,
"entityNodeCount": entityNodeCount,
"entityEntityRelCount": entityEntityRelCount,
"communityNodeCount": communityNodeCount,
"communityRelCount": communityRelCount,
"nodeCount" : nodeCount,
"relationshipCount" : relationshipCount
})

response[filename] = {"chunkNodeCount": chunkNodeCount,
"chunkRelCount": chunkRelCount,
"entityNodeCount": entityNodeCount,
"entityEntityRelCount": entityEntityRelCount,
"communityNodeCount": communityNodeCount,
"communityRelCount": communityRelCount,
"nodeCount" : nodeCount,
"relationshipCount" : relationshipCount
}

if result:
for record in result:
filename = record.get("filename",None)
chunkNodeCount = int(record.get("chunkNodeCount",0))
chunkRelCount = int(record.get("chunkRelCount",0))
entityNodeCount = int(record.get("entityNodeCount",0))
entityEntityRelCount = int(record.get("entityEntityRelCount",0))
if (not document_name) and (community_flag):
communityNodeCount = int(record.get("communityNodeCount",0))
communityRelCount = int(record.get("communityRelCount",0))
else:
communityNodeCount = 0
communityRelCount = 0
nodeCount = int(chunkNodeCount) + int(entityNodeCount) + int(communityNodeCount)
relationshipCount = int(chunkRelCount) + int(entityEntityRelCount) + int(communityRelCount)
update_query = """
MATCH (d:Document {fileName: $filename})
SET d.chunkNodeCount = $chunkNodeCount,
d.chunkRelCount = $chunkRelCount,
d.entityNodeCount = $entityNodeCount,
d.entityEntityRelCount = $entityEntityRelCount,
d.communityNodeCount = $communityNodeCount,
d.communityRelCount = $communityRelCount,
d.nodeCount = $nodeCount,
d.relationshipCount = $relationshipCount
"""
self.execute_query(update_query,{
"filename": filename,
"chunkNodeCount": chunkNodeCount,
"chunkRelCount": chunkRelCount,
"entityNodeCount": entityNodeCount,
"entityEntityRelCount": entityEntityRelCount,
"communityNodeCount": communityNodeCount,
"communityRelCount": communityRelCount,
"nodeCount" : nodeCount,
"relationshipCount" : relationshipCount
})

response[filename] = {"chunkNodeCount": chunkNodeCount,
"chunkRelCount": chunkRelCount,
"entityNodeCount": entityNodeCount,
"entityEntityRelCount": entityEntityRelCount,
"communityNodeCount": communityNodeCount,
"communityRelCount": communityRelCount,
"nodeCount" : nodeCount,
"relationshipCount" : relationshipCount
}

return response
4 changes: 3 additions & 1 deletion backend/src/graph_query.py
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,7 @@ def get_graph_results(uri, username, password,database,document_names):

def get_chunktext_results(uri, username, password, database, document_name, page_no):
"""Retrieves chunk text, position, and page number from graph data with pagination."""
driver = None
try:
logging.info("Starting chunk text query process")
offset = 10
Expand Down Expand Up @@ -254,4 +255,5 @@ def get_chunktext_results(uri, username, password, database, document_name, page
logging.error(f"An error occurred in get_chunktext_results. Error: {str(e)}")
raise Exception("An error occurred in get_chunktext_results. Please check the logs for more details.") from e
finally:
driver.close()
if driver:
driver.close()
Loading

0 comments on commit e5ca90a

Please sign in to comment.