update gemspec and README (#1)
jychen7 authored Jul 6, 2022
2 parents 976262e + 928d196 commit 8aee593
Showing 4 changed files with 28 additions and 26 deletions.
18 changes: 0 additions & 18 deletions CONTRIBUTING.md
@@ -28,21 +28,3 @@ make test
```
make fmt
```

# FAQ

## Why Magnus?

As of 2022-07, there are a few popular Ruby bindings for Rust: [Rutie](https://github.com/danielpclark/rutie), [Magnus](https://github.com/matsadler/magnus) and [other alternatives](https://github.com/matsadler/magnus#alternatives). Magnus was picked because its API seems cleaner and it seems clearer about safe vs. unsafe code. The author of Magnus gives a "maybe biased" comparison in this [reddit thread](https://www.reddit.com/r/ruby/comments/uskibb/comment/i98rds4/?utm_source=share&utm_medium=web2x&context=3). The choice is subjective, and it should not take much effort if we decide to switch to different Ruby bindings for Rust in the future.

## Why are the module name and gem name different?

The module name `Datafusion` follows [datafusion](https://github.com/apache/arrow-datafusion) and [datafusion-python](https://github.com/datafusion-contrib/datafusion-python). The gem name `datafusion` [has been taken on rubygems.org since 2016](https://rubygems.org/gems/datafusion), so our gem is called `arrow-datafusion`.

Similarly, for the Ruby bindings of Arrow, the gem is called [red-arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) and the module is called `arrow`.

## What is the relationship between Datafusion Ruby and Red Arrow?

Datafusion Ruby is the Ruby bindings of Arrow Datafusion (Rust). Red Arrow is the Ruby bindings of Arrow (C++). To keep Datafusion Ruby simpler, we try not to couple core features with Red Arrow. If needed, we can add additional gems (e.g. red-arrow-datafusion) to support Red Arrow in Datafusion Ruby, similar to how [red-parquet](https://github.com/apache/arrow/blob/2c7c12fd408339817f0322f137d25e9f60a87a26/ruby/red-parquet/red-parquet.gemspec#L44) uses red-arrow.

P.S. Datafusion Python was coupled with PyArrow. There is a proposal to separate them in the medium to long term. For details, please refer to [Can datafusion-python be used without pyarrow?](https://github.com/datafusion-contrib/datafusion-python/issues/22).
26 changes: 23 additions & 3 deletions README.md
@@ -1,8 +1,8 @@
# DataFusion in Ruby

This is a Ruby library that binds to the [Apache Arrow](https://arrow.apache.org/) in-memory query engine [DataFusion](https://github.com/apache/arrow-datafusion).
This is yet another Ruby library that binds to the [Apache Arrow](https://arrow.apache.org/) in-memory query engine [DataFusion](https://github.com/apache/arrow-datafusion).

It allows you to build a plan through SQL or a DataFrame API against in-memory data, parquet or CSV files, run it in a multi-threaded environment, and obtain the result back in Ruby.
This is an alternative to [datafusion-contrib/datafusion-ruby](https://github.com/datafusion-contrib/datafusion-ruby). Please refer to the FAQ below.

## Quick Start

@@ -46,4 +46,24 @@ Dataframe

## Contribution Guide

Please see [Contribution Guide](CONTRIBUTING.md) for information about contributing to DataFusion in Ruby.
Please see [Contribution Guide](CONTRIBUTING.md).

## FAQ

### Why Magnus?

As of 2022-07, there are a few popular Ruby bindings for Rust: [Rutie](https://github.com/danielpclark/rutie), [Magnus](https://github.com/matsadler/magnus) and [other alternatives](https://github.com/matsadler/magnus#alternatives). Magnus was picked because its API seems cleaner and it seems clearer about safe vs. unsafe code. The author of Magnus gives a "maybe biased" comparison in this [reddit thread](https://www.reddit.com/r/ruby/comments/uskibb/comment/i98rds4/?utm_source=share&utm_medium=web2x&context=3). The choice is subjective, and it should not take much effort if we decide to switch to different Ruby bindings for Rust in the future.

### Why are the module name and gem name different?

The module name `Datafusion` follows [datafusion](https://github.com/apache/arrow-datafusion) and [datafusion-python](https://github.com/datafusion-contrib/datafusion-python). The gem name `datafusion` [has been taken on rubygems.org since 2016](https://rubygems.org/gems/datafusion), so our gem is called `arrow-datafusion`.

Similarly, for the Ruby bindings of Arrow, the gem is called [red-arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) and the module is called `arrow`.

### Why another set of Ruby bindings for Arrow DataFusion?

[datafusion-contrib/datafusion-python](https://github.com/datafusion-contrib/datafusion-python) was the first set of bindings for Arrow Datafusion (Rust). It was implemented using [pyo3](https://github.com/PyO3/pyo3) for `Rust -> Python`. Besides Python, the Datafusion community also wants Java and other language bindings. To share development resources, [datafusion-contrib/datafusion-c](https://github.com/datafusion-contrib/datafusion-c) was created and is used in [datafusion-contrib/datafusion-ruby](https://github.com/datafusion-contrib/datafusion-ruby). This `Rust -> C -> Ruby/Python/Java/etc` implementation is published as the gem "red-datafusion" and is coupled with "red-arrow".

Around the time "red-datafusion" was created, I wanted to use Arrow DataFusion in Ruby, mainly to query object stores such as S3/GCS, so I created `Rust -> Ruby` bindings using [Magnus](https://github.com/matsadler/magnus). I am keeping this `Rust -> Ruby` implementation as an alternative and publishing it as the gem `arrow-datafusion`. To keep it simple, "arrow-datafusion" is not coupled with "red-arrow" at the moment.

P.S. Datafusion Python was coupled with PyArrow. There is a proposal to separate them in the medium to long term. For details, please refer to [Can datafusion-python be used without pyarrow?](https://github.com/datafusion-contrib/datafusion-python/issues/22).
8 changes: 4 additions & 4 deletions arrow-datafusion.gemspec
@@ -3,12 +3,12 @@ require_relative "lib/datafusion/version"
Gem::Specification.new do |spec|
spec.name = "arrow-datafusion"
spec.version = Datafusion::VERSION
spec.authors = ["Datafusion Contrib Developers"]
spec.homepage = "https://github.com/datafusion-contrib/datafusion-ruby"
spec.authors = ["jychen7"]
spec.homepage = "https://github.com/jychen7/arrow-datafusion-ruby"

spec.summary = "Ruby bindings of Apache Arrow Datafusion"
spec.summary = "yet another Ruby bindings of Apache Arrow Datafusion"
spec.description =
"Ruby bindings of Apache Arrow Datafusion"
"yet another Ruby bindings of Apache Arrow Datafusion"
spec.license = "Apache-2.0"

spec.files = ["README.md", "#{spec.name}.gemspec", "LICENSE"]
2 changes: 1 addition & 1 deletion lib/datafusion/version.rb
@@ -1,3 +1,3 @@
module Datafusion
VERSION = "0.0.1"
VERSION = "0.0.2"
end
