Exporting doc information to use in models like SBERT #1271

mikesol · 2024-08-22T04:14:35Z

I'm experimenting with a different way of indexing the docs, using SBERT and FAISS. This is a popular approach for indexing and querying data these days, and while lots of companies offer paid versions of it, it's also fairly straightforward to do with open-source tools.

I'm wondering if it's possible to tweak doc search to dump the entire index in XML form. For example:

<entry>
  <package>purescript-lists</package>
  <module>Data.List</module>
  <id>singleton</id>
  <def>singleton :: forall a. a -> List a</def>
  <doc>Create a list with a single element.

Running time: `O(1)`</doc>
</entry>

It's easier to construct a data set for SBERT when data is formatted like this.
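
As a rough sketch of what I mean (not a definitive implementation), here is how a dump in that shape could be turned into one string per entry, ready for embedding. The `<entries>` root element, the `load_entries` and `to_sentence` helpers, and the way the fields are joined into a sentence are all my own assumptions for illustration; only the `<entry>` layout comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical dump: the <entries> wrapper is an assumption, only the
# <entry> shape is taken from the example in this issue.
SAMPLE = """<entries>
<entry>
  <package>purescript-lists</package>
  <module>Data.List</module>
  <id>singleton</id>
  <def>singleton :: forall a. a -> List a</def>
  <doc>Create a list with a single element.

Running time: `O(1)`</doc>
</entry>
</entries>"""

FIELDS = ("package", "module", "id", "def", "doc")


def load_entries(xml_text):
    """Parse the dump into a list of dicts, one per <entry>."""
    root = ET.fromstring(xml_text)
    return [
        {name: (entry.findtext(name) or "").strip() for name in FIELDS}
        for entry in root.iter("entry")
    ]


def to_sentence(entry):
    """Flatten one entry into a single string suitable for embedding."""
    # Collapse the newlines inside the doc comment into single spaces.
    doc = " ".join(entry["doc"].split())
    return f'{entry["module"]}.{entry["id"]}: {entry["def"]}. {doc}'


entries = load_entries(SAMPLE)
sentences = [to_sentence(e) for e in entries]
```

The `sentences` list is what I would then hand to an SBERT encoder and store in a FAISS index; that step is left out here because it needs third-party packages.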

I see that a lot of the plumbing to do something like this is already there, but as I don't know the code well, it's tough to come up with a plan for a clean way to do it.

If you have pointers on how to hack at the repo to get there, I can give it a shot!
