Currently, users of nodestream want to leverage the performance of the bulk load functionality in Neptune DB while relying on nodestream for data mapping and data source abstraction. It would be great to introduce a mode for the Neptune plugin that builds up bulk load CSV files and then, once the ingestion is done, loads them into the graph. This would bypass the main bottleneck of openCypher query performance and give a significant performance boost.
However, this requires a somewhat roundabout process: the database connector would take the nodes and edges, write out CSV files, copy them to S3, and then invoke the bulk loader via the AWS SDK. Users would need to provide an S3 bucket to stage the CSV files and attach a role to their Neptune cluster that allows it to read from S3.
The effort required is comparable to translating from openCypher to Gremlin, which is also expected to give a performance boost and would allow nodestream to connect to any TinkerPop-compliant database.
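A minimal sketch of what that staging flow could look like, assuming Neptune's openCypher CSV bulk-load format (`:ID`/`:LABEL` system columns for nodes). The bucket name, role ARN, region, and cluster endpoint below are placeholders, and the property column used is just an example:

```python
import csv
import io


def nodes_to_opencypher_csv(nodes):
    """Render node dicts as a Neptune openCypher bulk-load CSV string.

    Neptune's openCypher CSV format uses :ID and :LABEL system columns;
    property columns can carry a type suffix such as `name:String`.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([":ID", ":LABEL", "name:String"])
    for node in nodes:
        writer.writerow([node["id"], node["label"], node["name"]])
    return buf.getvalue()


# The staged files would then be copied to S3 and the loader kicked off
# through the AWS SDK -- roughly (all identifiers here are placeholders):
#
#   s3 = boto3.client("s3")
#   s3.upload_file("nodes.csv", "my-staging-bucket", "load/nodes.csv")
#
#   neptune = boto3.client("neptunedata",
#                          endpoint_url="https://<cluster>:8182")
#   neptune.start_loader_job(
#       source="s3://my-staging-bucket/load/",
#       format="opencypher",
#       s3BucketRegion="us-east-1",
#       iamRoleArn="arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
#   )
```

The loader call is the part that needs the S3 bucket and the cluster role mentioned above; the plugin would poll the returned load job until it completes.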