
Introduce Bulk Load CSV Files Mode #31

Open
zprobst opened this issue Sep 9, 2024 · 0 comments

zprobst commented Sep 9, 2024

Currently, users of nodestream are looking to leverage the performance of the bulk load functionality in Neptune while relying on nodestream for data mapping and data-source abstraction. It would be great to introduce a mode for the Neptune plugin that builds up bulk load CSV files and then, when the ingestion is done, loads them into the graph. This would bypass the main bottleneck, openCypher query performance, and give a significant performance boost.
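As a rough illustration, the CSV files the connector would need to emit follow Neptune's Gremlin bulk-load format: node files require `~id` and `~label` columns with typed property headers, and edge files require `~id`, `~from`, `~to`, and `~label`. The helpers below are a minimal sketch (the `name:String` property column and the record shapes are placeholder assumptions, not nodestream's actual model):

```python
import csv
import io


def write_nodes_csv(nodes):
    """Render nodes in Neptune's Gremlin bulk-load CSV format.

    Every row needs ~id and ~label; property columns are declared
    as name:Type in the header. The single name:String property
    here is just an illustrative placeholder.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["~id", "~label", "name:String"])
    for node in nodes:
        writer.writerow([node["id"], node["label"], node["name"]])
    return buf.getvalue()


def write_edges_csv(edges):
    """Render edges; Neptune requires ~id, ~from, ~to, and ~label."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["~id", "~from", "~to", "~label"])
    for i, edge in enumerate(edges):
        # Synthesize a unique edge id; real code would derive a stable one.
        writer.writerow([f"e{i}", edge["from"], edge["to"], edge["label"]])
    return buf.getvalue()
```

In practice these files would be written incrementally to disk (or streamed) rather than built in memory, since bulk loads are attractive precisely when the data is large.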

However, this would require a somewhat roundabout process: the database connector would take the nodes and edges, write out CSV files, copy them to S3, and then invoke the bulk loader through the AWS SDK. Users would need to provide an S3 bucket to stage the CSV files and attach a role to their Neptune cluster that allows it to read from S3.
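Sketching the final step of that flow, the loader invocation would presumably go through boto3's `neptunedata` client (`start_loader_job`). The helper below only builds the request parameters, which keeps it testable without AWS credentials; the bucket URI, role ARN, and endpoint shown are placeholders, and the specific option values (`mode`, `parallelism`) are assumptions about reasonable defaults:

```python
def build_loader_job_params(s3_uri: str, iam_role_arn: str, region: str) -> dict:
    """Build keyword arguments for Neptune's bulk loader, shaped for
    boto3's neptunedata client: client.start_loader_job(**params).

    The IAM role must be attached to the Neptune cluster and allow
    s3:GetObject on the staging bucket.
    """
    return {
        "source": s3_uri,            # e.g. s3://my-staging-bucket/ingest/
        "format": "csv",             # Gremlin bulk-load CSV format
        "s3BucketRegion": region,
        "iamRoleArn": iam_role_arn,
        "mode": "AUTO",              # resume a failed load or start fresh
        "failOnError": False,
        "parallelism": "HIGH",
    }


# With boto3 available (not executed here), the connector could then run:
#   client = boto3.client("neptunedata", endpoint_url="https://<cluster>:8182")
#   job = client.start_loader_job(**build_loader_job_params(...))
#   status = client.get_loader_job_status(loadId=job["payload"]["loadId"])
# and poll the status until the load completes before reporting success.
```

Polling for completion matters here: the ingestion is not really "done" until the loader job finishes, so the plugin would need to surface loader errors back to the pipeline.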

The effort required is comparable to translating from openCypher to Gremlin, which is also expected to give a performance boost and would allow nodestream to connect to any TinkerPop-compliant database.
