Chris Lu edited this page Apr 27, 2016 · 3 revisions

Input Data Type

One of the key features of the Glow system is that all data can be strongly typed, and this includes the input data. Every field of an input data type must be a simple serializable type. Pointers and channels are not supported.
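As an illustration, a record type that fits this description might look like the sketch below. The `WordCount` struct and its fields are hypothetical, and the round trip through `encoding/gob` is only a stand-in to show that such a type serializes cleanly; it is not a statement about how Glow encodes data internally.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// WordCount is a hypothetical input record. Every field is a plain,
// exported value type: no pointers and no channels.
type WordCount struct {
	Word  string
	Count int
	Score float64
}

// roundTrip encodes a record with gob and decodes it again, which only
// works when all fields are serializable.
func roundTrip(in WordCount) (WordCount, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return WordCount{}, err
	}
	var out WordCount
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(WordCount{Word: "glow", Count: 3, Score: 0.5})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out.Word, out.Count, out.Score)
}
```

A struct holding a `chan` field would fail the same round trip, which is one way to check a type before feeding it into a flow.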

Data Sources

There are two ways to feed data into the Glow system:

  1. Pull from a known location
  2. Pushed through Go channel

Pull from a known location

This is useful when you already know where to fetch the data. For example, you may know an HDFS folder that contains many files.

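  import (
    "log"
    "os"
    "github.com/colinmarc/hdfs"
  )
  ...
  flow.New().Source(func(outFiles chan os.FileInfo) {
    client, err := hdfs.New("namenode:8020")
    if err != nil {
      log.Fatal(err)
    }
    // list every entry in the directory (replaces the Open + Readdir(0) pair)
    res, err := client.ReadDir("/_test/fulldir3")
    if err != nil {
      log.Fatal(err)
    }
    for _, entry := range res {
      outFiles <- entry
    }
  })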

Pushed through Go channel

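  import (
    "log"
    "os"

    "github.com/colinmarc/hdfs"
  )
  ...
  // create the channel with make(); sending on a nil channel blocks forever
  outFiles := make(chan os.FileInfo)
  go func() {
    defer close(outFiles) // tell the flow that no more input is coming
    client, err := hdfs.New("namenode:8020")
    if err != nil {
      log.Fatal(err)
    }
    res, err := client.ReadDir("/_test/fulldir3")
    if err != nil {
      log.Fatal(err)
    }
    for _, entry := range res {
      outFiles <- entry
    }
  }()
  flow.New().Channel(outFiles)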
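The push pattern rests on two plain Go rules: the channel has to be allocated with `make`, and the producer has to close it when it is done. The following sketch shows both without any Glow or HDFS dependency; `produce` and `consume` are hypothetical helpers, with `consume` standing in for whatever reads the channel on the flow side.

```go
package main

import "fmt"

// produce starts a goroutine that pushes items into a channel owned by
// the caller's side, then closes it so the reader knows input is finished.
func produce(items []string) chan string {
	// A channel declared with a bare `var` is nil, and a send on a nil
	// channel blocks forever, so allocate it with make().
	out := make(chan string)
	go func() {
		defer close(out)
		for _, it := range items {
			out <- it
		}
	}()
	return out
}

// consume is a hypothetical stand-in for the flow reading the channel.
// The range loop ends only because the producer closes the channel.
func consume(in chan string) []string {
	var got []string
	for it := range in {
		got = append(got, it)
	}
	return got
}

func main() {
	fmt.Println(consume(produce([]string{"/foo/bar_1", "/foo/bar_2"})))
}
```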

Use Slice()

Slice() is a convenience method that uses a channel underneath.

  // process each file in its own mapper process
  flow.New().Slice(
    []string{"/foo/bar_1", "/foo/bar_2", "/foo/bar_3"},
  ).Partition(3).Map(...)