bleve to get full text search and facets #377

Open
gedw99 opened this issue Mar 20, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@gedw99

gedw99 commented Mar 20, 2021

Proposal

Add bleve support ( https://github.com/blevesearch )

Motivation

Provides full-text search (FTS), as SQLite and other databases have, to allow searching over documents. https://sqlite.org/fts3.html

Provides facet-based data analysis. A good demo of that concept is in the video at https://datasette.io/
In the datasette demo, every column can be faceted: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants

  • this is a very powerful construct for developers and users
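
To make the facet idea concrete, here is a minimal sketch in plain Go with no search library. It is not bleve's API: the `Doc` type, the `Facet` function and the sample data are all made up for illustration. A facet is just a count of documents per distinct value of a field.

```go
package main

import (
	"fmt"
	"sort"
)

// Doc is a hypothetical document: a flat set of field/value pairs.
type Doc map[string]string

// FacetCount is one facet value with the number of documents that carry it.
type FacetCount struct {
	Value string
	Count int
}

// Facet counts how many documents have each distinct value of field.
// Results are sorted by descending count, then by value for a stable order.
func Facet(docs []Doc, field string) []FacetCount {
	counts := map[string]int{}
	for _, d := range docs {
		// Documents that lack the field do not contribute to any bucket.
		if v, ok := d[field]; ok {
			counts[v]++
		}
	}
	out := make([]FacetCount, 0, len(counts))
	for v, n := range counts {
		out = append(out, FacetCount{Value: v, Count: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	// Made-up sample rows, loosely modelled on the power plants demo.
	docs := []Doc{
		{"country": "FR", "fuel": "Nuclear"},
		{"country": "FR", "fuel": "Hydro"},
		{"country": "DE", "fuel": "Coal"},
		{"country": "FR", "fuel": "Nuclear"},
	}
	for _, fc := range Facet(docs, "fuel") {
		fmt.Printf("%s: %d\n", fc.Value, fc.Count)
	}
}
```

A real search engine computes these counts over the current result set rather than over every document, which is what lets a user drill down by clicking facet values.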

Design

For example, SQLite uses a special (virtual) table in order to do FTS. Note that facets are a different feature and would need a different DSL.

For example, if each of the 517430 documents in the "Enron E-Mail Dataset" is inserted into both an FTS table and an ordinary SQLite table created using the following SQL script:

CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT);     /* FTS3 table */
CREATE TABLE enrondata2(content TEXT);                        /* Ordinary table */

Then either of the two queries below may be executed to find the number of documents in the database that contain the word "linux" (351). Using one desktop PC hardware configuration, the query on the FTS3 table returns in approximately 0.03 seconds, versus 22.5 for querying the ordinary table.

SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux';  /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
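
The speed difference comes from the index structure. As a rough sketch in plain Go (not SQLite's or bleve's actual implementation; `Build`, `CountMatch` and `CountLike` are names invented here), an inverted index answers a word query with one map lookup, while the LIKE-style query has to scan every document:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Index maps a lower-cased token to the set of document IDs containing it.
type Index map[string]map[int]struct{}

// tokenize splits text into lower-cased runs of letters and digits.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Build tokenizes every document once and records which documents hold each token.
func Build(docs []string) Index {
	idx := Index{}
	for id, doc := range docs {
		for _, tok := range tokenize(doc) {
			if idx[tok] == nil {
				idx[tok] = map[int]struct{}{}
			}
			idx[tok][id] = struct{}{}
		}
	}
	return idx
}

// CountMatch is the MATCH analogue: one map lookup, no document scan.
func (idx Index) CountMatch(word string) int {
	return len(idx[strings.ToLower(word)])
}

// CountLike is the LIKE '%word%' analogue: it scans every document in full.
func CountLike(docs []string, word string) int {
	n := 0
	w := strings.ToLower(word)
	for _, doc := range docs {
		if strings.Contains(strings.ToLower(doc), w) {
			n++
		}
	}
	return n
}

func main() {
	docs := []string{
		"Install Linux on the server",
		"Windows only",
		"linux, linux everywhere",
		"Use linuxbrew",
	}
	idx := Build(docs)
	fmt.Println("match:", idx.CountMatch("linux"))
	fmt.Println("like: ", CountLike(docs, "linux"))
}
```

Note that the two counts can differ: the token lookup only matches whole words, while the substring scan also matches "linuxbrew".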

Prior work:

https://github.com/mosuka/blast
https://github.com/mosuka/blast#search-documents

$ ./bin/blast search '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .
@gedw99 gedw99 added the enhancement New feature or request label Mar 20, 2021
@gedw99 gedw99 changed the title bleve bleve to get full text search snd facets Mar 29, 2021
@gedw99 gedw99 changed the title bleve to get full text search snd facets bleve to get full text search and facets Mar 30, 2021
@tomasweigenast

A good implementation might be to store the FTS properties in special tables and map them to the genjidb table. While this could be the simplest implementation, it would limit queries to FTS only, and it would not be possible to pass other WHERE conditions.

@gedw99
Author

gedw99 commented Aug 10, 2021

Thanks for the suggestion.

A simple example of bleve is here along with a simple gui. It’s a good baseline.

https://github.com/blugelabs/beer-search

It creates a File System store.

There is no reason why the genji store and this bleve store can't operate separately.

In terms of linking between the two, the metadata of the bleve store could be stored in genji, as you suggest.

In your middle tier you would then update bleve and genji with separate calls.

In terms of GUI, a typical use, as you see in the beer search example, is faceted search, which does not map well to database patterns. So a facet search would tend to be on a different page from a page that needs the genji DB.

Faceted search could be used to do high-level cross-object searches. Once you identify the objects you're interested in from the results, your GUI then starts to use the genji DB.

So the genji table name and document / row ID would need to be saved in the bleve store, which I suspect just requires some hooks. When data is updated in genji, you then need to tell bleve the ID and data so that it can reindex the bleve store.
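
A rough sketch of that hook in plain Go, under loud assumptions: `DocStore` and `SearchIndex` are hypothetical interfaces standing in for genji and bleve, not their real APIs, and the in-memory types exist only so the example runs. The point is that every index entry carries the table name and ID, so a search hit can be resolved back to the database row.

```go
package main

import (
	"errors"
	"fmt"
)

// Ref points back at the database record a search hit came from.
type Ref struct {
	Table string
	ID    string
}

// DocStore is a hypothetical stand-in for the genji side.
type DocStore interface {
	Put(table, id string, data map[string]string) error
}

// SearchIndex is a hypothetical stand-in for the bleve side.
type SearchIndex interface {
	Index(ref Ref, data map[string]string) error
}

// Synced is the middle tier: it writes to the store, then reindexes the same record.
type Synced struct {
	Store  DocStore
	Search SearchIndex
}

// Put performs the two separate calls. They are not atomic: if indexing fails
// after the store write succeeded, the caller has to retry or rebuild the index.
func (s Synced) Put(table, id string, data map[string]string) error {
	if err := s.Store.Put(table, id, data); err != nil {
		return fmt.Errorf("store put %s/%s: %w", table, id, err)
	}
	if err := s.Search.Index(Ref{Table: table, ID: id}, data); err != nil {
		return fmt.Errorf("reindex %s/%s: %w", table, id, err)
	}
	return nil
}

// memStore and memIndex are in-memory fakes used only for this example.
type memStore map[string]map[string]string

func (m memStore) Put(table, id string, data map[string]string) error {
	m[table+"/"+id] = data
	return nil
}

type memIndex map[Ref]map[string]string

func (m memIndex) Index(ref Ref, data map[string]string) error {
	if ref.ID == "" {
		return errors.New("empty document ID")
	}
	m[ref] = data
	return nil
}

func main() {
	index := memIndex{}
	s := Synced{Store: memStore{}, Search: index}
	if err := s.Put("beers", "42", map[string]string{"name": "IPA"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	// A search hit carries the Ref, which the GUI uses to load the row from the DB.
	for ref := range index {
		fmt.Printf("indexed %s/%s\n", ref.Table, ref.ID)
	}
}
```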

That would at least get them working together.

Later I can imagine a tighter integration, but I am almost certain the genji maintainers would not be up for this. Instead, a driver for bleve could perhaps be built that uses genji as the store.

Right now I am working on getting genji working with IndexedDB, so that it's possible to build golang GUIs by cross-compiling to wasm with genji embedded, just like how you can use genji to build golang mobile and desktop apps.

Then it would make sense to look at a bleve driver that uses genji, because then we would be able to build for wasm, desktop and mobile and have a genji DB embedded.

The GUI is gioui. It's pretty cool for a golang dev to be able to use golang, and only golang, to build their GUI and server. I am currently working on this and will put up a full working demo on GitHub soon.

https://github.com/hack-pad/hackpadfs

uses

https://github.com/hack-pad/go-indexeddb

Contact me if you're curious.

Or let me know what you think in general.

@gedw99
Author

gedw99 commented Aug 10, 2021

If you're curious, I raised an issue about getting Bluge running in a browser as a first step:

blugelabs/bluge#72

@tomasweigenast

Hey, is this still active? Sorry I stopped answering; I was busy with work, but now I can help with anything you need. @gedw99

@gedw99
Author

gedw99 commented Nov 12, 2024

Hey @tomasweigenast

Way late getting back to this.

I am using Pocketbase for most projects, as it has real-time events and a sort of denormalised system. I think I am going to have to add Bleve to PB in some way. I really don't want to, but PB can't do facets from what I can see, and I really need them.

So this means that a change in PB then causes the bleve / bluge system to update as well. This is why events in PB are so useful and save your arse: you can create pipelines that update other semi-DB systems from PB, almost like an event stream.
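
Roughly what I mean, as a plain Go sketch. This does not use PocketBase's real hook API: the `Event` type, the `Run` consumer and the toy `Index` are all hypothetical, and in practice the events would come from PB's hooks and the index would be bleve or bluge.

```go
package main

import (
	"fmt"
	"sync"
)

// Op is the kind of change that happened in the primary database.
type Op int

const (
	Upsert Op = iota
	Delete
)

// Event is a hypothetical change notification emitted by the primary database.
type Event struct {
	Op         Op
	Collection string
	ID         string
	Fields     map[string]string
}

// Index is a toy stand-in for the search index, keyed by "collection/id".
type Index struct {
	mu   sync.Mutex
	docs map[string]map[string]string
}

func NewIndex() *Index {
	return &Index{docs: map[string]map[string]string{}}
}

// Apply mirrors one database change into the index.
func (ix *Index) Apply(ev Event) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	key := ev.Collection + "/" + ev.ID
	switch ev.Op {
	case Upsert:
		ix.docs[key] = ev.Fields
	case Delete:
		delete(ix.docs, key)
	}
}

// Len reports how many documents are currently indexed.
func (ix *Index) Len() int {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	return len(ix.docs)
}

// Run applies events in arrival order until the channel is closed,
// then closes the returned channel to signal that it has drained.
func Run(ix *Index, events <-chan Event) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for ev := range events {
			ix.Apply(ev)
		}
	}()
	return done
}

func main() {
	ix := NewIndex()
	events := make(chan Event)
	done := Run(ix, events)

	events <- Event{Op: Upsert, Collection: "posts", ID: "1", Fields: map[string]string{"title": "hello"}}
	events <- Event{Op: Upsert, Collection: "posts", ID: "2", Fields: map[string]string{"title": "world"}}
	events <- Event{Op: Delete, Collection: "posts", ID: "1"}
	close(events)
	<-done

	fmt.Println("indexed documents:", ix.Len())
}
```

The index only ever sees the stream of changes, so the primary DB does not need to know the search side exists.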
