In addition to URLs, also bring in full-text from Sinequa servers #1016

Open
code-geek opened this issue Sep 6, 2024 · 3 comments

@code-geek
Contributor

Description

Right now, when we bring in URLs, we only get the URL and title. We also want to store the full text in the database (without necessarily showing it in the table). This will let us track when the full text of a page changes, and also use this data for LLM purposes.

Implementation Considerations

  • Needs a new field in the CandidateURL model
  • Might need to preserve the previous full text so we can see deltas
  • Perhaps some sort of hashing could help us (MD5)
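
A minimal sketch of how the three considerations above could fit together, assuming a stored hash is compared on each import and the previous text is kept for deltas. `StoredPage` is a hypothetical stand-in for a `CandidateURL` row, and the field names `full_text`, `previous_full_text`, and `full_text_hash` are assumptions, not existing model fields. MD5 is used here only as a change fingerprint, not for security.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def text_hash(text: str) -> str:
    """Return the MD5 hex digest of the text, used only as a change fingerprint."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


@dataclass
class StoredPage:
    # Hypothetical stand-in for a CandidateURL row; all field names are assumptions.
    url: str
    title: str
    full_text: str = ""
    previous_full_text: Optional[str] = None
    full_text_hash: str = ""

    def update_text(self, new_text: str) -> bool:
        """Store new_text if its hash differs from the stored one.

        The old text is kept in previous_full_text so a delta can be computed.
        Returns True when the text changed, False otherwise.
        """
        new_hash = text_hash(new_text)
        if new_hash == self.full_text_hash:
            return False
        self.previous_full_text = self.full_text or None
        self.full_text = new_text
        self.full_text_hash = new_hash
        return True
```

Comparing hashes avoids loading and comparing two large text fields on every import; the previous text is only needed when someone actually wants to inspect the delta.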

Deliverable

  • Notification when there are new URLs, new titles, or new text, with a clear way to tell which of these changed
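
One possible shape for the "clear way to tell" part of the deliverable, sketched under the assumption that each import can be compared against the previously stored state. The `Snapshot` type and the `diff_snapshots` function are hypothetical names used for illustration only; how the resulting report is delivered as a notification is left open.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: url -> (title, full_text)
Snapshot = Dict[str, Tuple[str, str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Group URLs from the new import by what changed since the old one."""
    report: Dict[str, List[str]] = {"new_urls": [], "new_titles": [], "new_text": []}
    for url, (title, text) in new.items():
        if url not in old:
            # A URL seen for the first time is reported once, as a new URL.
            report["new_urls"].append(url)
            continue
        old_title, old_text = old[url]
        if title != old_title:
            report["new_titles"].append(url)
        if text != old_text:
            report["new_text"].append(url)
    return report
```

A URL whose title and text both changed appears in both lists, so each category can be reviewed independently.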

Dependencies

No response

@CarsonDavis assigned Kirandawadi and saifrk and unassigned Kirandawadi Sep 6, 2024
saifrk pushed a commit that referenced this issue Sep 19, 2024
saifrk pushed a commit that referenced this issue Sep 19, 2024
@CarsonDavis
Collaborator

We will probably need to use the sql endpoint from sinequa. Documentation can be found at this link: https://doc.sinequa.com/en.sinequa-es.v11/Content/en.sinequa-es.devDoc.webservice.rest-search.html#engine-sql

@CarsonDavis
Collaborator

Existing code using this endpoint can be found in the following files:

@CarsonDavis
Collaborator

You may need a token from the server that includes SQL engine access. In the admin console, search for tokens, create a new token, and give it the required permissions.
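
A rough sketch of how a request to the SQL endpoint might be assembled once a token exists. Everything specific here is an assumption to be checked against the linked Sinequa documentation and the existing code: the `/api/v1/engine.sql` path, the JSON body with an `sql` key, and bearer-token authentication. The function name `build_sql_request` and the index, column, and collection names in the usage example are placeholders. The request is only built, not sent.

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying an SQL query.

    Assumptions: the endpoint path, the {"sql": ...} body shape, and the
    bearer-token header are not confirmed by this issue.
    """
    url = base_url.rstrip("/") + "/api/v1/engine.sql"
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder server, token, index, column, and collection names.
request = build_sql_request(
    "https://sinequa.example.org",
    "MY_TOKEN",
    "SELECT url1, title, text FROM my_index WHERE collection = '/source/name/'",
)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before pointing the code at a real server.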
