This repository has been archived by the owner on Feb 25, 2020. It is now read-only.

Remove Kafka dependency #524

Open
jdegoes opened this issue Oct 5, 2013 · 1 comment

@jdegoes
Contributor

jdegoes commented Oct 5, 2013

Kafka is a large, complex piece of software which requires installation and maintenance. There are many ways for Kafka to fail, and Kafka requires ongoing management in order to prevent disk overflow and make tradeoffs between recoverability and resource usage.

While Kafka is very appropriate for a large-scale distributed ingest system that has to keep up with fluctuating loads and be fully redundant, it is less appropriate for a single-node analytics engine like Precog. When Precog becomes distributed, the focus will be on reading data from HDFS, not on the ingest of that data, so even in the long term, the direct use of Kafka in the Precog project is an unnecessary distraction.

To reduce the number of moving pieces in Precog, Kafka needs to be eliminated as a dependency.

Ingest can be as simple as batching up a chunk of data and writing it out to the (abstract) file system -- e.g. appending to the relevant file.
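
As a rough illustration of the "batch and append" idea above, here is a minimal sketch. `FileIngest` and `appendBatch` are hypothetical names, not part of Precog, and the sketch writes to the local file system through `java.nio` rather than through whatever abstract file system layer Precog would actually use. Records are assumed to be newline-delimited strings.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical helper (not Precog code): appends one batch of
// newline-delimited records to a single target file.
object FileIngest {
  def appendBatch(target: Path, records: Seq[String]): Unit = {
    if (records.nonEmpty) {
      // Join the whole batch first so it goes out in a single write call.
      val bytes = records.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8)
      // CREATE makes the file on first use; APPEND adds to the end afterwards.
      Files.write(target, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
    }
  }
}
```

A caller would collect events into a `Seq[String]` and call `FileIngest.appendBatch(path, batch)` once per batch. Durability guarantees (fsync, partial-write recovery) are deliberately left out of this sketch.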

This ticket will be considered complete when Kafka is neither a dependency of the project nor referenced or used anywhere in the source code, unit tests, or documentation.

See @nuttycom's comment below.

@nuttycom
Contributor

nuttycom commented Oct 6, 2013

This is very easy to achieve: create a new subproject that derives
from the ingest and bifrost projects and, when you mix the cake together,
replace the KafkaEventStore with an EventStore implementation that passes
messages directly to the routing actor, and exclude the
KafkaShardIngestActor from the cake entirely.
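
A minimal sketch of what that cake wiring could look like follows. Everything here is an assumption made for illustration: the trait names, the `Event` type, and the `save`/`route` signatures are invented and do not reflect Precog's real `EventStore` API, and the routing actor is stood in for by a plain method so the example needs no actor library.

```scala
// All signatures below are illustrative assumptions, not Precog's real API.
final case class Event(path: String, payload: String)

// Cake slice that exposes an event store to the rest of the system.
trait EventStoreComponent {
  trait EventStore {
    def save(event: Event): Unit
  }
  def eventStore: EventStore
}

// Cake slice standing in for the routing actor; a plain method keeps
// the sketch free of an actor-library dependency.
trait RoutingComponent {
  def route(event: Event): Unit
}

// Replacement for a Kafka-backed store: each saved event is handed
// straight to routing, with no broker in between.
trait DirectEventStoreComponent extends EventStoreComponent {
  self: RoutingComponent =>

  lazy val eventStore: EventStore = new EventStore {
    def save(event: Event): Unit = self.route(event)
  }
}

// The assembled cake: no Kafka-specific slice is mixed in at all.
object KafkaFreeShard extends DirectEventStoreComponent with RoutingComponent {
  val received = scala.collection.mutable.ArrayBuffer.empty[Event]
  def route(event: Event): Unit = received += event
}
```

Calling `KafkaFreeShard.eventStore.save(Event("/a", "{}"))` delivers the event to `route` synchronously. In a real build the `route` method would forward to the routing actor instead of a buffer.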


@ghost ghost assigned jdegoes Dec 3, 2013