Prototype SshSpoolSource for Flume - think Spooling Directory Source over SSH. Caveat Emptor: It is very much pre-alpha.
The SshSpoolSource mirrors many semantics from the SpoolingDirectorySource. Here's what it supports at the moment:
- SSH authentication using username/password (SSH keys are not supported yet)
- A remote directory can be specified to monitor for newly added files
- Any new file added there is considered complete and is ingested
- Once a file has been processed it is considered done; later changes to it will not be picked up
- The source persists the state of processed files to disk, so it will not reprocess any files, and it can pick up where it left off after a restart.
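The "process once, remember across restarts" behaviour in the list above can be sketched in a few lines of shell. This is only an illustration of the semantics, not the source's actual implementation: the state file name, its one-name-per-line format, and the function names `is_processed`, `mark_processed`, and `ingest_new` are all assumptions made for this example.

```shell
#!/bin/sh
# Illustrative sketch only. The real source's on-disk state format is not
# documented here; this assumes a plain text file with one file name per line.
STATE_FILE="${STATE_FILE:-./processed.state}"   # hypothetical location

# Succeeds if the given file name is already recorded as processed.
is_processed() {
    [ -f "$STATE_FILE" ] && grep -Fqx -- "$1" "$STATE_FILE"
}

# Record a file name so later runs (including after a restart) skip it.
mark_processed() {
    printf '%s\n' "$1" >> "$STATE_FILE"
}

# Print each argument that has not been seen before, then record it.
# Names that were already recorded are silently skipped, even if the
# underlying file has changed since.
ingest_new() {
    for name in "$@"; do
        if ! is_processed "$name"; then
            printf '%s\n' "$name"
            mark_processed "$name"
        fi
    done
}
```

Calling `ingest_new a.log b.log` twice prints both names the first time and nothing the second time, because the state survives in `$STATE_FILE` between calls.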
- Build or Download the custom Flume Source
  The flume-sources directory contains a Maven project with a custom Flume source designed to connect to the specified SSH remote path and ingest the contents of the files there into HDFS. To build the flume-sources JAR, from the root of the git repository:
  $ cd flume-sources
  $ mvn package
  $ cd ..
  This will generate a file called flume-sources-1.0-SNAPSHOT.jar in the target directory.
- Add the JAR to the Flume classpath
  $ sudo cp /etc/flume-ng/conf/flume-env.sh.template /etc/flume-ng/conf/flume-env.sh
  Edit the flume-env.sh file, uncomment the FLUME_CLASSPATH line, and enter the path to the JAR. If adding multiple paths, separate them with a colon.
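As a rough sketch of what the uncommented line in flume-env.sh might look like: the `/path/to/...` locations below are placeholders, not real paths, and the second JAR is included only to show the colon separator. Substitute the actual location of the JAR built in the previous step.

```shell
# Placeholder paths (assumptions for illustration); replace with real locations.
SSH_SOURCE_JAR="/path/to/flume-sources/target/flume-sources-1.0-SNAPSHOT.jar"
EXTRA_JAR="/path/to/another.jar"   # hypothetical second entry

# Multiple entries are joined with a colon, the same way as in PATH.
FLUME_CLASSPATH="${SSH_SOURCE_JAR}:${EXTRA_JAR}"
```

With a single JAR, the value is just that one path with no colon.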
- Set the Flume agent name to SshAgent in /etc/default/flume-ng-agent
  If you don't see the /etc/default/flume-ng-agent file, it likely means that you didn't install the flume-ng-agent package. In the file, you should have the following:
  FLUME_AGENT_NAME=SshAgent
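One way to confirm the setting before starting the agent is a small check like the one below. The helper name `check_agent_name` is made up for this example; it only looks for an exact `FLUME_AGENT_NAME=<name>` line and does not handle quoting or trailing comments.

```shell
# Hypothetical helper: succeeds only if the defaults file contains the
# exact line FLUME_AGENT_NAME=<expected name>.
#   $1: path to the defaults file
#   $2: expected agent name
check_agent_name() {
    [ -f "$1" ] && grep -Fqx -- "FLUME_AGENT_NAME=$2" "$1"
}
```

For instance, `check_agent_name /etc/default/flume-ng-agent SshAgent || echo "agent name not set"` prints the warning when the file is missing or names a different agent.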
- Modify the provided Flume configuration and copy it to /etc/flume-ng/conf
  There is a file called flume.conf in the flume-sources directory, which needs some minor editing. There are five fields which need to be filled in with values.
  $ sudo cp flume.conf /etc/flume-ng/conf
- Start the Flume agent
  Create the HDFS directory hierarchy for the Flume sink.
  $ hadoop fs -mkdir /user/flume/ssh
  $ hadoop fs -chown -R flume:flume /user/flume/ssh
  $ hadoop fs -chmod -R 770 /user/flume
  Then start the agent:
  $ sudo /etc/init.d/flume-ng-agent start