Consistency of spark and rmr backends #68

piccolbo · 2015-02-05T06:05:09Z

Because of the deep differences between the backends, some semantic differences have trickled into the API despite best efforts:

- The path argument of the output function is mandatory for spark, as we don't have a system of temp files as we do for rmr (we use rdds instead). Related to this, the output function returns a path on the spark backend and a big data object (temp file) on rmr. The big data object can encapsulate either a temporary or a permanent location. The equivalent on spark is the rdd, which is always temporary.
- The list of supported formats is different.
- The system of custom formats is much more restricted in SparkR.
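
To make the first difference concrete, here is a toy Python model of the two behaviours described above. It is only a sketch, not the actual API of either backend: `BigDataObject`, `output_rmr_like` and `output_spark_like` are hypothetical names invented for illustration, and a local temp file stands in for whatever the rmr backend really allocates.

```python
import os
import tempfile
from typing import Optional


class BigDataObject:
    """Hypothetical handle that wraps either a temporary or a permanent location."""

    def __init__(self, path: str, temporary: bool):
        self.path = path
        self.temporary = temporary


def output_rmr_like(path: Optional[str] = None) -> BigDataObject:
    # rmr-like behaviour: the path is optional, and a temp file is used without one.
    if path is None:
        fd, tmp_path = tempfile.mkstemp(prefix="rmr-like-")
        os.close(fd)
        return BigDataObject(tmp_path, temporary=True)
    return BigDataObject(path, temporary=False)


def output_spark_like(path: Optional[str] = None) -> str:
    # spark-like behaviour: no temp-file system to fall back on, so the path is required.
    if path is None:
        raise ValueError("output path is mandatory on the spark-like backend")
    # The caller gets back a plain path rather than a wrapping object.
    return path
```

Code written against the first function can omit the path and inspect the returned object, while the same call on the second fails and, when it succeeds, yields a different return type. That is the kind of divergence this issue is meant to track.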

The goal of this issue is to list these differences so that they can spawn specific efforts to reduce or eliminate them or, if necessary, document them.

piccolbo added the enhancement label Feb 5, 2015
