Integrate Halyard with SANSA Stack #71
Have you taken this idea any further, got any sample code, etc.? I am interested in prototyping something, so anything to help me kick-start would be useful.
It is just an idea to implement a SANSA-compatible Spark library that would include Halyard as a node-local SPARQL evaluation engine - not pointing to some central endpoint, but an actual in-place query evaluation engine communicating directly with HBase, as already used in several Halyard MapReduce applications.
SANSA could then decide to direct whole queries, query fragments, subqueries, or just specific statement patterns to Halyard, based on the knowledge that the underlying data files are also indexed in Halyard.
Any SANSA user could then decide which datasets are worth indexing in Halyard and which will be evaluated dynamically by SANSA.
As an example, suppose you have some custom, frequently changing data and you want to analyze it with the help of, for example, DBpedia.
Halyard currently requires you to index both datasets first (and to periodically re-index the rapidly changing data), then run the queries - and if your analytics requires something outside of SPARQL, you are probably stuck.
SANSA, on the other hand, requires large computational power and will full-scan all of the DBpedia data whenever any query merely touches it.
A hybrid solution would allow you to index DBpedia in Halyard, and SANSA would delegate related queries, subqueries, or statement pattern requests to the embedded Halyard library, which would fetch the data directly from HBase and evaluate them using the local Spark compute node's resources, as an inlined Spark function.
Later, the existing Halyard MapReduce applications could also be moved to Spark/SANSA, so even indexing might become an integral part.
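The routing decision described above can be sketched in a few lines. This is a hypothetical illustration only - `HybridRouter` and its method names are made up and not part of Halyard or SANSA; it just shows the dispatch rule: patterns over Halyard-indexed datasets go to the embedded engine, everything else stays in SANSA's Spark pipeline.

```java
import java.util.Set;

// Hypothetical sketch of the proposed hybrid routing: datasets the user chose
// to index in Halyard are delegated to the embedded node-local engine, while
// rapidly changing data is evaluated dynamically by SANSA on Spark.
public class HybridRouter {
    // Datasets known to be indexed in Halyard (e.g. a stable DBpedia dump).
    private final Set<String> halyardIndexed;

    public HybridRouter(Set<String> halyardIndexed) {
        this.halyardIndexed = halyardIndexed;
    }

    // Decide, per query fragment or statement pattern, which engine evaluates it.
    public String route(String dataset) {
        return halyardIndexed.contains(dataset) ? "HALYARD" : "SANSA";
    }

    public static void main(String[] args) {
        HybridRouter router = new HybridRouter(Set.of("dbpedia"));
        System.out.println("dbpedia -> " + router.route("dbpedia"));
        System.out.println("sensor-stream -> " + router.route("sensor-stream"));
    }
}
```

In a real integration the decision would of course sit inside SANSA's query planner rather than a standalone class, and would operate on statement patterns rather than dataset names.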
I'm not familiar with SANSA. It looks like it partitions its RDF vertically, unlike Halyard. Because of this, as I understand it, it prefers to reload the RDF from the source and partition it into per-predicate tables, with the subject as key and the object as value. I assume you would not want to reload the Halyard triples simply to repartition them. Therefore, are you suggesting that Halyard be injected into the SANSA query planner/executor so that SANSA uses Halyard's SPARQL engine in preference? There are a few problems that SPARQL does not solve. For example, shortest-path requires, at a minimum, iterative SPARQL. I am hoping that SANSA on Spark would offer an alternative.
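The vertical partitioning mentioned here - one table per predicate, keyed by subject with the object as value - can be illustrated with plain collections standing in for Spark RDDs/DataFrames. The class and record names below are illustrative, not part of SANSA:

```java
import java.util.*;
import java.util.stream.Collectors;

// Minimal illustration of vertical partitioning: each predicate gets its own
// table of (subject, object) rows. Plain Java collections are used here in
// place of Spark structures, purely to show the data layout.
public class VerticalPartition {
    record Triple(String s, String p, String o) {}

    // predicate -> list of (subject, object) rows
    static Map<String, List<Map.Entry<String, String>>> partition(List<Triple> triples) {
        return triples.stream().collect(Collectors.groupingBy(
                Triple::p,
                Collectors.mapping(t -> Map.entry(t.s(), t.o()), Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple(":alice", "foaf:knows", ":bob"),
                new Triple(":alice", "foaf:name", "\"Alice\""),
                new Triple(":bob", "foaf:knows", ":carol"));
        // Produces two predicate tables: foaf:knows (2 rows), foaf:name (1 row).
        partition(data).forEach((p, rows) -> System.out.println(p + " -> " + rows));
    }
}
```

The concern in the comment is exactly that Halyard's HBase layout is not organized this way, so feeding Halyard data into SANSA naively would mean re-reading and repartitioning it.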
Halyard is a powerful distributed triplestore, instantly answering the majority of SPARQL queries; however, it is weak in some complex operations (like ORDER BY and GROUP BY), and implementing custom code that goes beyond SPARQL is complicated.
SANSA Stack (and similar Spark-based SPARQL frameworks) seems to be complementary to Halyard - powerful in ordering and aggregations, and easy to integrate custom transformation logic into the pipeline - however, it is slow in ad-hoc SPARQL queries and unable to serve as a SPARQL endpoint.
The idea is to provide a hybrid solution, where SANSA Stack (or any other Spark framework) can directly use Halyard data and the Halyard query engine as a (distributed) source of RDF data for further processing.
Parallelization of the query evaluation could be driven by the halyard:forkAndFilterBy function (used in Halyard BulkExport), so the Spark engine would be able to directly manage Halyard parallelization (transparently for the user). This is an idea of a potential synergy between Halyard and SANSA Stack that seems worth testing.
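The fork-and-filter idea can be sketched as follows: each of N parallel workers (Spark tasks, in the proposed integration) evaluates the same query but keeps only the bindings that fall into its own fork. The hash-based scheme below is an illustrative assumption, not Halyard's actual implementation of halyard:forkAndFilterBy:

```java
import java.util.*;
import java.util.stream.Collectors;

// Sketch of fork-and-filter parallelization: each worker keeps only the
// bindings that hash into its own fork, so the forks partition the work
// with no coordination. The hashing scheme here is purely illustrative.
public class ForkAndFilter {
    static boolean keep(String binding, int fork, int totalForks) {
        return Math.floorMod(binding.hashCode(), totalForks) == fork;
    }

    public static void main(String[] args) {
        List<String> subjects = List.of(":s1", ":s2", ":s3", ":s4", ":s5");
        int forks = 3;
        // Every subject lands in exactly one fork.
        for (int f = 0; f < forks; f++) {
            final int fork = f;
            System.out.println("fork " + fork + ": " + subjects.stream()
                    .filter(s -> keep(s, fork, forks))
                    .collect(Collectors.toList()));
        }
    }
}
```

In the hybrid setup, Spark would assign one fork index per task, and each task's embedded Halyard engine would evaluate only its own slice directly against HBase.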