
Introduce Bulk Load CSV Files Mode #31

Open
zprobst opened this issue Sep 9, 2024 · 0 comments

zprobst commented Sep 9, 2024

Currently, users of nodestream are looking to leverage the performance of the bulk load functionality in Neptune while relying on nodestream for data mapping and data-source abstraction. It would be great to introduce a mode for the Neptune plugin that builds up bulk load CSV files and then, when the ingestion is done, loads them into the graph. This would bypass the main bottleneck, openCypher query performance, and give a significant performance boost.
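As a rough illustration, the CSV files the connector would need to emit follow Neptune's Gremlin bulk-load format: node files require `~id` and `~label` columns with typed property headers, and edge files require `~id`, `~from`, `~to`, and `~label`. The helpers below are a minimal sketch (the `name:String` property column and the record shapes are placeholder assumptions, not nodestream's actual model):

```python
import csv
import io


def write_nodes_csv(nodes):
    """Render nodes in Neptune's Gremlin bulk-load CSV format.

    Every row needs ~id and ~label; property columns are declared
    as name:Type in the header. The single name:String property
    here is just an illustrative placeholder.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["~id", "~label", "name:String"])
    for node in nodes:
        writer.writerow([node["id"], node["label"], node["name"]])
    return buf.getvalue()


def write_edges_csv(edges):
    """Render edges; Neptune requires ~id, ~from, ~to, and ~label."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["~id", "~from", "~to", "~label"])
    for i, edge in enumerate(edges):
        # Synthesize a unique edge id; real code would derive a stable one.
        writer.writerow([f"e{i}", edge["from"], edge["to"], edge["label"]])
    return buf.getvalue()
```

In practice these files would be written incrementally to disk (or streamed) rather than built in memory, since bulk loads are attractive precisely when the data is large.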

However, this would require a somewhat roundabout process: the database connector would take the nodes and edges, write out CSV files, copy them to S3, and then invoke the bulk loader through the AWS SDK. Users would need to provide an S3 bucket to stage the CSV files and attach a role to their Neptune cluster that allows it to read from S3.
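Sketching the final step of that flow, the loader invocation would presumably go through boto3's `neptunedata` client (`start_loader_job`). The helper below only builds the request parameters, which keeps it testable without AWS credentials; the bucket URI, role ARN, and endpoint shown are placeholders, and the specific option values (`mode`, `parallelism`) are assumptions about reasonable defaults:

```python
def build_loader_job_params(s3_uri: str, iam_role_arn: str, region: str) -> dict:
    """Build keyword arguments for Neptune's bulk loader, shaped for
    boto3's neptunedata client: client.start_loader_job(**params).

    The IAM role must be attached to the Neptune cluster and allow
    s3:GetObject on the staging bucket.
    """
    return {
        "source": s3_uri,            # e.g. s3://my-staging-bucket/ingest/
        "format": "csv",             # Gremlin bulk-load CSV format
        "s3BucketRegion": region,
        "iamRoleArn": iam_role_arn,
        "mode": "AUTO",              # resume a failed load or start fresh
        "failOnError": False,
        "parallelism": "HIGH",
    }


# With boto3 available (not executed here), the connector could then run:
#   client = boto3.client("neptunedata", endpoint_url="https://<cluster>:8182")
#   job = client.start_loader_job(**build_loader_job_params(...))
#   status = client.get_loader_job_status(loadId=job["payload"]["loadId"])
# and poll the status until the load completes before reporting success.
```

Polling for completion matters here: the ingestion is not really "done" until the loader job finishes, so the plugin would need to surface loader errors back to the pipeline.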

The effort required is comparable to translating from openCypher to Gremlin, which is also expected to give a performance boost and would allow nodestream to connect to any TinkerPop-compliant database.
