Get your data from Teradata AND GET OUTTA THERE!
A beautifully thin wrapper around the jdbc-teradata driver that encapsulates the ugly java bits and gives you back a nice ruby enumerable thing. Because you want to get out of Java Territory as soon as you can.
The JDBC::Teradata adapter helps to make connecting and querying Teradata pretty easy, but dealing with the results is still very Java-centric, as it returns a java.sql.ResultSet object (http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html). Rather than dealing with the awkward parsing and use of metadata, we will just give you an enumerable hash array, or a CSV string.
Since connecting to Teradata from MRI ruby is not really a thing yet, this gem wraps jdbc-teradata, which of course only runs on JRuby.
#Gemfile
gem 'teradata-extractor'
bundle install
extractor = TeradataExtractor::Query.new("server_name", "user", "password")
> #ruby Enumerator
> enum = extractor.enumerable("select Top 2 name, id, email_address, favorite_liquor from td.people_stuff")
=> #<Enumerator: #<JRuby::Generator::Threaded:...>
> enum.to_a
=> [{:name => "Steve", :id => 111, :email_address => "[email protected]", :favorite_liquor => "ALL"},
{:name => "Jerry", :id => 231, :email_address => "[email protected]", :favorite_liquor => "none"}]
> #You get the idea...it's a ruby Enumberable
> #ruby String in CSV format
> headers, rows = extractor.csv_string_io("select Top 2 name, id, email_address, favorite_liquor from td.people_stuff")
=> [[:name, :id, :email_address, :favorite_liquor],
<Enumerator: #<JRuby::Generator::Threaded:...>]
> rows.class
=> Enumerator
> rows.next
=> "Steve,111,[email protected],ALL\nJerry,231,[email protected],none\n"
> #Next returns MORE THAN ONE ROW in CSV format. See note on fetch_size
Both #enumerator and #csv_string_io have an optional second parameter, "fetch_size". When calling #enumerator, fetch_size is purely a performance concern. The enumerator returned will still yeild only 1 row when iterated using enum.next. Fetch size is an instruction to the Teradata resultSet object that tells it how many results it should fetch from the database at a time.
When calling #csv_string_io, fetch_size is significant. For convenience, #csv_string_io bundles rows into groups. So each call to rows.next will yeild a StringIO representing 1000 rows by default. If you like you can pass fetch_size of 1 to get a single row at a time. But if you're using something like https://github.com/theSteveMitchell/postgres_upsert, getting rows in a group is much more efficient, and convenient. You can just...
extractor = TeradataExtractor::Query.new(server_name, user_name, password)
headers, enum = extractor.csv_string_io("select name, id, email_address, favorite_liquor from td.people_stuff")
enum.each do |csv_stringio|
Person.pg_upsert(csv_stringio, {header: false, columns: headers})
end
- support more robust conversion from java sql datatypes to ruby objects. Currently only Date and BigDecimal are handled, other data types like String and Integer are done implicitly.
- Fork the project
- add your feature/fix to your fork(specs please)
- submit a PR
- Lay back and bask in the karma you've earned.
- If you find an issue but can't fix in in a PR, please log an issue. I'll do my best.